An Enhanced Transportation System for People of Determination

Perumal, Uma; Jeribi, Fathe; Alhameed, Mohammed Hameed

doi:10.3390/s24196411

Open AccessArticle

An Enhanced Transportation System for People of Determination

by

Uma Perumal

,

Fathe Jeribi

^*

and

Mohammed Hameed Alhameed

College of Engineering and Computer Science, Jazan University, Jazan 45142, Saudi Arabia

^*

Author to whom correspondence should be addressed.

Sensors 2024, 24(19), 6411; https://doi.org/10.3390/s24196411

Submission received: 1 July 2024 / Revised: 21 August 2024 / Accepted: 27 August 2024 / Published: 3 October 2024

(This article belongs to the Section Intelligent Sensors)

Download

Browse Figures

Versions Notes

Abstract

:

Visually Impaired Persons (VIPs) have difficulty in recognizing vehicles used for navigation. Additionally, they may not be able to identify the bus to their desired destination. However, the bus bay in which the designated bus stops has not been analyzed in the existing literature. Thus, a guidance system for VIPs that identifies the correct bus for transportation is presented in this paper. Initially, speech data indicating the VIP’s destination are pre-processed and converted to text. Next, utilizing the Arctan Gradient-activated Recurrent Neural Network (ArcGRNN) model, the number of bays at the location is detected with the help of a Global Positioning System (GPS), input text, and bay location details. Then, the optimal bay is chosen from the detected bays by utilizing the Experienced Perturbed Bacteria Foraging Triangular Optimization Algorithm (EPBFTOA), and an image of the selected bay is captured and pre-processed. Next, the bus is identified utilizing a You Only Look Once (YOLO) series model. Utilizing the Sub-pixel Shuffling Convoluted Encoder–ArcGRNN Decoder (SSCEAD) framework, the text is detected and segmented for the buses identified in the image. From the segmented output, the text is extracted, based on the destination and route of the bus. Finally, regarding the similarity value with respect to the VIP’s destination, a decision is made utilizing the Multi-characteristic Non-linear S-Curve-Fuzzy Rule (MNC-FR). This decision informs the bus conductor about the VIP, such that the bus can be stopped appropriately to pick them up. During testing, the proposed system selected the optimal bay in 247,891 ms, which led to deciding the bus stop for the VIP with a fuzzification time of 34,197 ms. Thus, the proposed model exhibits superior performance over those utilized in prevailing works.

Keywords:

bus identification; mean cross-covariance spectral subtraction (MCC-SS); people of determination; bus route identification; recurrent neural network (RNN); radio frequency identification (RFID); bus bay detection and transport guidance system

1. Introduction

The World Health Organization has stated that there are approximately 290 million VIPs worldwide. Self-navigation is a major issue for these individuals in society [1]. VIPs depend on others to accomplish daily tasks, due to their inability to recognize objects, and have difficulty in locating transport services and bus stations [2]. Primarily, research has focused on detecting obstacles to enable safer movement of VIPs. To inform VIPs about the type of obstacle, its distance from the person, and its position, these objects can be detected using deep learning techniques [3].

Through utilizing a transfer learning technique, various objects can be detected in real time [4]. However, to ensure the safety of VIPs, it is necessary to detect moving objects near them. Thus, to alert users, a voice-assisted smart stick has been designed using Radio Frequency Identification (RFID) technology and ultrasonic sensors [5]. Utilizing OpenStreetMap and the General Transit Feed Specification (GTFS), a multi-modal route planning system has been developed to guide VIPs [6]. After planning routes, the safer navigation of VIPs is ensured through the utilization of a Web-Based Application (WBA). The object information is forwarded to the MobileNet architecture by the WBA camera. Regarding objects, MobileNet produces audio messages via a speech module [7]. In this way, the user can recognize obstacles, including vehicles, shopping stores, and traffic signals, using a deep attention network [8].

To date, various wearable devices based on GPS have been developed to facilitate access to public transport [9,10]. VIPs can be informed about the current location and arrival of buses through an Application Interface (API) and learned database of the device [11]. Generally, bus detection systems are constructed with microcontrollers, RF modules, sensors, and Bluetooth [12]. Utilizing the BlindMobi app, bus travel in urban centers is made easier for VIPs [13]. A VIP at a bus station can be recognized by the corresponding bus driver via the user’s RFID tag [14]. The data from such a tag are forwarded to an RFID reader and alerts the bus driver about the person’s destination via voice messages [15]. However, no study in the existing literature has concentrated on guiding VIPs at a bus station which contains multiple bus lines. Thus, this study proposes an optimal bay selection method using the EPBFTOA approach, enabling VIPs to better access bus transport.

Problem Statement

The problems noticed in prevailing works are described below:

-: A bus station containing more than one bus line is not considered in most of the prevailing works, thus causing difficulty for VIPs in recognizing the relevant bus line.
-: The route information accessed from a single source is inefficient for VIPs who are making journeys.
-: The VIP still needs human guidance to identify bus routes and bus numbers, or they need to enquire of bus coordinators.
-: When there is a larger queue of buses at the bus station, it is difficult for VIPs to identify the correct bus, especially when the desired bus is farther away.
-: The conversion of the VIP’s voice data can be affected by the presence of background noise, resulting in the wrong information being attained.
-: In some existing works, computer vision has been used to capture the user’s surroundings for identification of buses. However, this approach is inefficient when nearby buses are not the appropriate transport for VIPs.
-: Main goal: The main goal of the proposed study is to detect and select the optimal bay, as per the visually impaired person’s query, in order to promote the accessibility of bus transport. Through selection of the optimal bay, the difficulties and obstacles faced by the VIP regarding the determination of the location of the desired bus in the bus station are mitigated. Thus, based on the recognition of the optimal bay, bus transportation becomes more efficient for the visually impaired. The major contributions or objectives of the proposed transportation system for the visually challenged are further mentioned.

The proposed work’s main objectives are as follows:

-: The optimal bay among the detected bus bays is selected using the EPBFTOA method, in order to recognize the appropriate bus line in the bus station.
-: To access the correct bus, information from multiple sources is obtained, including the GPS location, voice data, bus image, and bus route.
-: To minimize the need for human guidance, the VIP’s speech data are acquired using an Internet of Things (IoT) application via a mobile device to retrieve the details of the desired bus.
-: The VIPs face difficulty in reaching the correct bus among the large queue of buses. Thus, an RFID sensor is utilized to inform the respective bus drivers who should carry the VIP.
-: To retrieve correct information from the VIP, their voice data are pre-processed with the Mean Cross-Covariance Spectral Subtraction (MCC-SS) approach in order to remove background noise.
-: Utilizing the SSCEAD method, the text is segmented from the bus, and the Levenshtein Distance (LD) measure is used to determine the route similarity.

The remainder of this paper is arranged as follows: The related works of this paper are discussed in Section 2. The proposed method is described in Section 3, and its performance is examined in Section 4. Finally, Section 5 concludes this paper with future scope.

2. Literature Survey

Authors [16] proposed a technique named ‘My Vision’, which enables VIPs to identify bus route numbers. Through the Lucas–Kanade tracker of My Vision images, the arriving bus is captured. Next, the bus board area is acquired utilizing the Random Forest (RF) technique. Moreover, the route number is extracted utilizing the pattern-matching approach. In addition, the detection rate of the proposed method was enhanced, when compared to traditional techniques. However, the relationships among features could not be evaluated by RF, potentially resulting in the inaccurate detection of bus route numbers.

Authors [17] recommended a navigation system for VIPs in a multi-obstacle scenario. Through a query processor, the person’s query was accessed. The YOLO version 3 (YOLOv3) model is used to detect various obstacles, and the optimal path is selected utilizing the Environment-aware Bald Eagle Search algorithm. The performance was improved regarding latency and detection accuracy; however, the Actor–critic algorithm utilized for navigation decisions led to trade-offs, and the decisions made were not reliable.

Authors [18] established a wearable device for the safe traveling of VIPs. The bus is detected utilizing the YOLOv3 technique, and the bus board is segmented with a transfer learning technique. Next, the bus number obtained is transformed into a voice. The bus board detection accuracy was improved over prevailing techniques; however, smaller objects could not be recognized by the anchor box of YOLOv3, thus limiting the efficiency.

Authors [19] explored a device to assist in the navigation of blind persons. The user is informed to make a decision regarding the safer path through the use of a speech generation device. The Robot Operating System minimized the occurrence of distractions. Next, through a Fuzzy logic system, the safe directions are issued to the VIP. When compared to other approaches, the recommended device attained a lower collision rate; however, the Fuzzy rules were simply developed based on assumptions, and thus, an accurate decision was not produced.

Authors [20] suggested a smart glass system for the independent movement of blind persons at night-time. To detect the object accurately using the U2-Net model, the path image is pre-processed and represented as a tactile graph. Moreover, the text in the image is converted to speech. When compared to other networks, this model detected objects with higher accuracy and precision; however, the Tesseract model could not effectively extract text from low-quality and poorly lit images.

Authors [21] proposed a Commute Booster (CB) mobile application for the navigation support of blind persons. From the GTFS dataset, way-finding footage is obtained. Next, the Optimal Character Recognition (OCR) system is utilized to enable VIPs to identify the relevant path. When compared to prevailing techniques, the performance of the presented approach was improved, regarding precision, accuracy, and f1-score; however, the OCR could not process image data in different formats.

Authors [22] presented a lightweight bus detection approach for VIPs utilizing the YOLO network. To detect buses in real-time, the structure of YOLO is modified with a slim scale detection module. This model detected the bus with higher accuracy and precision; however, the bus detection performance was non-optimal, as the YOLO model was processed with fewer parameters.

Authors [23] proposed a framework named ‘Vision Navigator’ for the blind and VIPs utilizing a Recurrent Neural Network (RNN). To detect the presence of an object in the path, a stick utilizing the single-shot mechanism is created. Moreover, obstacles within shorter distances are detected via sensor-equipped lightweight shoes. The developed framework could detect obstacles with a high accuracy rate; however, the RNN did not learn complex data patterns due to the gradient vanishing problem, which degraded the framework’s efficiency.

Authors [24] established a mobile application for VIPs to identify bus stops. The bus stop signs were detected through a mobile camera with the All-Aboard Application (AAA) utilizing a neural network. The VIP can assess a bus stop within a distance of 30 to 50 m via AAA. Regarding the distance between actual bus stop locations and the indicated location, the performance was analyzed. However, for accurate detection, the application needed a large number of labeled data and high-quality images.

Authors [25] explored a wearable device to assist VIPs in navigation. The device was developed for use with a camera embedded on a smartphone or eyeglasses. Utilizing a Convolutional Neural Network (CNN), the object detection system was developed and deployed in the smart phone. The performance was evaluated regarding its efficiency and safety. However, its capacity for object detection was limited by the usage of a CNN, as it could not efficiently learn sequential data.

The related works are comparatively summarized in Table 1.

3. Proposed Methodology

Using the MNC-FR and EPBFTOA methods, the proposed model comprises a transportation system that better enables VIPs to travel by bus. First, the bus bays are detected, and an image is captured from the optimal bay. Then, the text on buses is extracted from the captured image, in order to determine their destinations. Finally, a decision is made, and the VIP and bus conductor are consequently informed. Figure 1 depicts the proposed model’s architecture.

3.1. Speech Input

Initially, through utilizing an IoT application, the destination details of the VIP are gathered in the form of speech from their mobile device. Let the input speech

(A)

be signified as

A = [A^{1}, A^{2}, A^{3}, A^{4}, \dots, A^{q}],

(1)

where the number of input words is denoted as

(q)

. The input

(A)

is then pre-processed as follows.

3.2. Speech Pre-Processing

Next, employing the MCC-SS method, the voice data

(A)

are pre-processed to remove the background noise. To obtain a clean speech signal, Spectral Subtraction (SS) is used, which subtracts the noise spectrum present in the original audio. Nevertheless, fixed parameters that do not adapt to the noise level in the input are utilized as a subtraction factor. To mitigate this issue, the Mean Cross-Covariance (MCC), which analyzes each signal of the input, is employed to determine the subtraction parameter. The MCC-SS process is described as follows:

The cross-covariance

(α)

between signals

(E, F)

present in the speech input

(A)

is computed as

α = \sum_{q}^{A} (E - \hat{E}) \times (F - \hat{F}),

(2)

where the mean value of

(E, F)

is denoted as

(\hat{E}, \hat{F})

. The value

(α)

, which is the obtained subtraction factor, is utilized to remove the input’s background noise. It is calculated as

A^{'} = A - α,

(3)

where the pre-processed audio is signified as

(A^{'})

.

3.3. Speech-to-Text Conversion

The pre-processed speech

(A^{'})

is additionally converted into the text format, in order to obtain the VIP’s destination and detect the bus bay in the road traffic. The text format is also employed to check the similarity between the VIP’s bus destination and bus route. Let the text converted from speech

(A^{'})

be denoted as

(T)

.

3.4. Bus Bay Detection

Utilizing ArcGRNN, the number of bus bays for the VIP is detected. The inputs employed for detecting the bus bays are as follows:

-: The VIP’s destination $(T)$ , in the form of text.
-: The VIP’s location $(β)$ , based on GPS.
-: Bay details $(d)$ concerning numerous locations, obtained from the cloud database.

An RNN, which can handle inputs of varying lengths and provides an output ranging from single- to multi-class classification, was utilized for bus bay detection. Even though the RNN can process memory and binary data, the vanishing and exploding gradient problems may occur during the backpropagation process. Therefore, to prevent this issue, the Arctan Gradient Activation Function (Arc-GAF), which automatically enlarges small inputs and blocks large inputs, was used as an activation function in the RNN. Figure 2 illustrates the framework of ArcGRNN.

The ArcGRNN process is described as follows:

-: Input Layer

The inputs required for the detection of bus bays are combined as follows:

D_{s} = T + β + d,

(4)

where the VIP’s location is denoted as

(β)

, which is employed to identify the number of bays present in the specified location. From the cloud database, the bay details

(d)

of various locations are obtained. The input

(D_{s})

concerning time

(s)

is passed to the hidden layer for further processing.

-: Hidden Layer

The hidden layer collects the information from the previous output and computes the input along with the activation function. The designed hidden layer’s input

[η_{H}]

is given by

η_{H} = [(D_{s} \times ω_{D}) + (H_{s - 1} \times ω_{H})] + b,

(5)

where the hidden layer’s previous output concerning time

(s - 1)

is signified as

(H_{s - 1})

,

(ω_{D}, ω_{H})

are the weights of the input and hidden layers, respectively, and the bias value is denoted as

(b)

.

Next, utilizing Arc-GAF, the computed value of the hidden state is activated. The Arc-GAF takes in the required input to avoid the gradient-related issues of RNNs. The Arc-GAF activation function

(ϕ)

is given by

ϕ = \frac{g \times h}{1 + {[h * (D_{s})]}^{2}} - 1,

(6)

where the parameters of the input

(D_{s})

are denoted as

(g, h)

. Then, the hidden layer’s output

(H_{s})

is calculated as follows:

H_{s} = η_{H} (H_{s}) * ϕ .

(7)

The output

(H_{s})

is further passed on to the output layer to obtain the final classification output.

-: Output layer

Regarding

(H_{s})

and the weight

(ω_{H})

of the hidden layer, the output layer

[η_{B}]

is computed as follows:

η_{B} = (H_{s} \times ω_{H}) + b .

(8)

Utilizing

(ϕ)

, the value

[η_{B}]

is activated to give the final output

(B_{s})

, as follows:

B_{s} = [η_{B}] * ϕ,

(9)

B_{s} = \{B_{s}^{1}, B_{s}^{2}, B_{s}^{3}, \dots, B_{s}^{k - 1}, B_{s}^{k}\},

(10)

where

(B_{s})

are the detected bays for the respective VIP, and the number of detected bays is denoted as

(k)

. The pseudocode for the ArcGRNN model is given below as Algorithm 1:

Algorithm 1 Pseudocode for ArcGRNN

Input: VIP’s destination

(T)

, location

(β)

, bay details

(d)

Output: Detected bay

(B_{s})

Begin
Initialize parameters

(ω_{D}, ω_{H})

,

(b)

For

(s)

While input

D_{s} = T + β + d

Calculate Hidden layer input

η_{H} = [(D_{s} \times ω_{D}) + (H_{s - 1} \times ω_{H})] + b

Evaluate activation function

(ϕ)

ϕ = \frac{g \times h}{1 + {[h * (D_{s})]}^{2}} - 1

Hidden layer output

H_{s} = η_{H} (H_{s}) * ϕ

Vectorize output layer’s input

η_{B} = (H_{s} \times ω_{H}) + b

Find final output

B_{s} = [η_{B}] * ϕ

End while
End for
Obtain detected bays

(B_{s})

End

Next, optimal bays are chosen from the detected bus bays

(B_{s})

, as described below.

3.5. Optimal Bay Selection

Subsequently, utilizing EPBFTOA, the optimal bay is chosen from the detected bays

(B_{s})

. The optimal bay is the bay line of the bus station, which should be utilized by the VIP to catch the desired bus. In this context, for optimal bay selection, the Bacteria Foraging Optimization Algorithm (BFOA), which mimics the foraging strategy of bacteria to select the best value, is utilized. Nevertheless, BFOA may suffer from premature convergence, thereby affecting the outcome of the optimizer. To solve this issue, the Experienced Perturbed Adaptive Search (EPAS) mechanism with Triangular Mutation is utilized in BFOA, which computes health and reproduction parameters for the bacteria to overcome the abovementioned premature convergence problem.

The EPBFTOA is explained below:

-: Initialization

The detected bays

(B_{s})

, which are considered as bacteria, are the search agents. There are

(k)

number of bacteria to search the nutrients (number of bays). The

(f th)

bacterium’s position is initialized as follows:

B_{s}^{f} (δ, Z, μ) = (B_{s}^{1}, B_{s}^{2}, B_{s}^{3}, \dots, B_{s}^{k}),

(11)

where

(B_{s}^{f})

is the bacteria’s position concerning the chemotaxis

(δ)

,

(Z)

is the reproduction value, and

(μ)

is the elimination dispersal step parameters.

-: Fitness

Concerning the minimum distance

(λ)

between bacteria and nutrients, the fitness value

(θ)

that is employed to obtain the optimal bay is computed. The fitness function

(θ)

is defined as

θ = \min (λ) .

(12)

The EPBFTOA encompasses four foraging strategies; namely, chemotaxis, swarming, reproduction, and elimination and dispersal.

-: Chemotaxis

In this strategy, a bacterium chooses a favorable environment by swimming and tumbling. The bacterium’s movement concerning the swimming step

(R)

and swimming direction

(v)

is given by

B_{s}^{f} (δ + 1, Z, μ) = B_{s}^{f} (δ, Z, μ) + \frac{R (f) \times l}{\sqrt{v^{T} (f) \times v (f)}} * v (f),

(13)

where the number of swimming steps taken by the

(f th)

bacteria is signified as

(l)

.

-: Swarming

The bacteria’s swarming behavior

C (B_{s}^{f})

after chemotaxis is centered on the attraction

(a)

and repulsion

(r)

of the bacteria, defined by

\begin{array}{l} C (B_{s}^{f}) = \sum_{f = 1}^{k} \{- x^{a} \times \exp [y^{a} \sum {(B_{s}^{f} (δ, Z, μ) - B_{s}^{f} (δ + 1, Z, μ))}^{2}]\} \\ + \sum_{f = 1}^{k} \{x^{r} \times \exp [y^{r} \sum {(B_{s}^{f} (δ, Z, μ) - B_{s}^{f} (δ + 1, Z, μ))}^{2}]\} \end{array},

(14)

where the depth and width regarding bacterial attraction are denoted as

(x^{a}, y^{a})

, and

(x^{r}, y^{r})

are the depth and width regarding bacterial repulsion.

The bacteria’s new position

{\hat{B}}_{s}^{f} (δ + 1, Z, μ)

regarding swarming is given by

{\hat{B}}_{s}^{f} (δ + 1, Z, μ) = B_{s}^{f} (δ + 1, Z, μ) + C (B_{s}^{f}) .

(15)

-: Reproduction

In this strategy, healthier bacteria are located to attain better optimization outcomes. Initially, centered on EPAS, the healthier bacteria

({\hat{B}}_{b e s t})

are computed. Regarding the fitness function, the best bacteria are identified by the EPAS as follows:

{\hat{B}}_{b e s t} = {\hat{B}}_{m e a n} + (τ \times {\hat{B}}_{d e v}),

(16)

{\hat{B}}_{m e a n} = \frac{({\hat{B}}_{s}^{f} + θ)}{2},

(17)

{\hat{B}}_{d e v} = {\hat{B}}_{s}^{f} - θ,

(18)

where the mean and deviation of the healthier bacteria with random values

(τ)

are denoted as

({\hat{B}}_{m e a n}, {\hat{B}}_{d e v})

. Next, the reproduction value

{\hat{B}}_{s}^{f} (δ + 1, Z + 1, μ)

is obtained by Triangular Mutation, which gives the output concerning the best

({\hat{B}}_{b e s t})

, the worst

({\hat{B}}_{w o r})

, and better

({\hat{B}}_{b e t})

bacteria, as follows:

{\hat{B}}_{s}^{f} (δ + 1, Z + 1, μ) = {\hat{B}}_{b e s t} + p_{1} ({\hat{B}}_{b e s t} - {\hat{B}}_{b e t}) + Z,

(19)

Z = p_{2} ({\hat{B}}_{b e s t} - {\hat{B}}_{w o r}) + p_{3} ({\hat{B}}_{b e t} - {\hat{B}}_{w o r}),

(20)

{\hat{B}}_{w o r} = {\hat{B}}_{s}^{f} - {\hat{B}}_{b e s t},

(21)

{\hat{B}}_{b e t} = {\hat{B}}_{s}^{f} \approx {\hat{B}}_{b e s t},

(22)

where the mutation factors of the bacteria are signified as

(p_{1}, p_{2}, p_{3})

.

-: Elimination and Dispersal

After reproduction, the bacteria with lower probability dies and provide their optimal solution, with respect to their random dispersion

(κ)

and probability

(P)

, as follows:

{\hat{B}}_{s}^{f} (δ + 1, Z + 1, μ + 1) = \{\begin{cases} {\hat{B}}_{s}^{f} (δ + 1, Z + 1, μ) \forall (κ > P) \\ B^{*} \forall (κ < P) \end{cases} .

(23)

Thus, through using the EPBFTOA approach, the optimal bus bay value

(B^{*})

is obtained. The pseudocode for this optimizer is given below in Algorithm 2.

Algorithm 2 Pseudocode for EPBFTOA

Input: Detected Bay

(B_{s})

Output: Optimal Bus Bay

(B^{*})

Begin
Initialize bacteria population

(B_{s}^{f})

, Iteration

(\int, \int^{\max})

,

(κ, P)

B_{s}^{f} (δ, Z, μ) = (B_{s}^{1}, B_{s}^{2}, B_{s}^{3}, \dots, B_{s}^{k})

Calculate fitness

θ = \min (λ)

While

(\int \leq \int^{\max})

For

(θ)

Move bacteria by chemotaxis

B_{s}^{f} (δ + 1, Z, μ)

Swarm bacteria

{\hat{B}}_{s}^{f} (δ + 1, Z, μ) = B_{s}^{f} (δ + 1, Z, μ) + C (B_{s}^{f})

Reproduce bacteria

{\hat{B}}_{s}^{f} (δ + 1, Z + 1, μ) = {\hat{B}}_{b e s t} + p_{1} ({\hat{B}}_{b e s t} - {\hat{B}}_{b e t}) + Z

Eliminate and disperse

{\hat{B}}_{s}^{f} (δ + 1, Z + 1, μ + 1)

If

(κ < P)

Optimal bus bay

(B^{*})

Else
Original position

{\hat{B}}_{s}^{f}

End if
End for
End while
Return optimal bus bay

(B^{*})

End

After the selection of the optimal bay

(B^{*})

, the image is captured using the mobile device of the VIP. Then, the image is pre-processed as detailed below.

3.6. Image Capturing and Preprocessing

From the optimal bay

(B^{*})

, bus images are captured. These are then pre-processed for noise removal and contrast enhancement. Let

(j)

the number of images

(G)

captured via the mobile device of VIPs be represented as

G = [G^{1}, G^{2}, G^{3}, \dots, G^{j}] .

(24)

In this way, unwanted artifacts are removed and the quality of the image is enhanced to make further processing more effective. The pre-processing of

(G)

is described below.

3.6.1. Step 1: Noise Removal

Due to noises such as salt and pepper and speckle noise, the quality of the image is affected. Therefore, the Median Filter (MF)—which eliminates noisy pixels—is utilized for the removal of noise from the image

(G)

, which is calculated as follows:

G^{″} = G (t, u) \times \frac{(t + u)}{2},

(25)

where the coordinates of pixels in the image

(G)

are denoted as

(t, u)

, and the noise-removed image, which is used for subsequent contrast enhancement, is denoted as

(G^{″})

.

3.6.2. Step 2: Contrast Enhancement

Histogram Equalization (HE), which adjusts the contrast using the image’s histogram, is used to make the image clearer for further processing. The HE is elaborated as follows:

Regarding histogram values such as the pixel value

(c)

and number of pixels

(Π)

in the image

(G^{″})

, the probability of occurrence

(J)

is given by

J = \frac{{(Π)}_{c}}{Π} \forall (0 \leq c \leq W),

(26)

where the number of pixels regarding

(c)

is denoted as

(Π_{c})

, and the total grayscale value of

(G^{″})

is denoted as

(W)

. Then, the cumulative distribution function

(ψ)

is computed as:

ψ = \sum_{c} J (G^{″}) .

(27)

Finally, the image’s contrast

(G^{″})

is improved as follows:

G^{*} = ψ \times G^{″} .

(28)

The contrast-enhanced image

(G^{*})

is the pre-processed image, and the bus present in this image is detected as explained below.

3.7. Bus Detection

Through employing a YOLO series model, buses are detected in the pre-processed image

(G^{*})

. The YOLO model splits the images into grids and detects objects concerning the bounding boxes of the grids. The YOLO model’s process is explained in the following.

Initially, the image

(G^{*})

is split into a

(M \times M)

grid pattern. These grids have a number

(e)

of bounding boxes

(U)

, which are signified as:

U = 〈U_{1}, U_{2}, U_{3}, \dots, U_{e}〉 .

(29)

Each bounding box, with height

(z)

, width

(w)

, and coordinates

(l, m)

is denoted as:

U \to U (z, w, l, m) .

(30)

The bounding boxes might overlap each other, and therefore, the degree of overlap

(ς)

concerning bounding boxes

(U_{1}, U_{2})

is computed as:

ς = \frac{U_{1} \cap U_{2}}{U_{1} \cup U_{2}} .

(31)

Finally, utilizing

(U)

and

(ς)

, the objects can be detected, which are given as:

N = U (z, w, l, m) * ς .

(32)

A bus detected in the image is symbolized as

(N)

and from this, the text is identified and segmented for further analysis.

3.8. Text Identification and Segmentation

Next, for identification of the destination of the bus, the text present in the detected bus image

(N)

is identified and segmented. For this purpose, Encoder–Decoder (ED) processing is carried out. In this process, a CNN, which analyzes each input accurately, is utilized for encoding. To enhance the learning ability of the CNN, the Sub-pixel Shuffling Convolution (SSC) strategy is used in the convolution layer, thus expanding the receptive field of the CNN. A Bidirectional Long Short-Term Memory (BiLSTM) model is typically used as a decoder in the ED process; however, text identification becomes difficult, as BiLSTM models have high complexity. Thus, to enhance the text identification and segmentation effect, ArcGRNN is used as the decoder in this study. The SSCEAD process is elaborated further in the following.

-: Encoder

Utilizing the Sub-pixel Shuffling Convolution Encoder (SSCE), the text is first recognized in the image

(N)

. Initially, the pixels

(K \times L)

of the input image are reshuffled in the convolutional layer of the CNN using SSC. Then, the encoder process is executed for the reshuffled image. The image

(\bar{N})

that is obtained using SSC, with an up-sampling factor

(i)

, is given as

\bar{N} = [i (K) \times i (L)] .

(33)

Then, regarding the image

(\bar{N})

and the Rectified Linear Unit (ReLU) activation function

(℘)

, the convolutional layer’s output

(Q)

is calculated as

Q = \{[\bar{N} \times ϖ_{Q}] + o\} * ℘,

(34)

℘ = \max (ε^{1}, ε^{2}),

(35)

where the coordinates of the image

(\bar{N})

are

(ε^{1}, ε^{2})

, the weight value is denoted as

(ϖ_{Q})

, and the bias value of the image

(\bar{N})

is signified as

(o)

. The value

(Q)

is max-pooled and fully connected to give the encoded output

(I)

:

I = [\sum (\max (Q) \times ϖ_{I}) + o] * ℘,

(36)

where the text-detected image is denoted as

(I)

. This image is then decoded as follows:

-: Decoder

In order to determine the destination of the bus, the text-identified image

(I)

is decoded. The ArcGRNN decoder methodology, which processes the input concerning the prior information, enlarges the input size and blocks the large input (note that the process of the ArcGRNN technique is explained in Section 3.4). The image

(I)

is processed via the hidden and output layers, along with Arc-GAF. The text-segmented image

(S)

, which represents the bus destination, is the obtained output. Subsequently, the text is extracted from

(S)

as detailed below.

3.9. Text Extraction

Next, regarding the bus destination image

(S)

, the bus route is identified utilizing the Tesseract API, which is an OCR model. In this model, the text is extracted automatically from

(S)

. Initially, the fixed pitch of the text present in the image is found. After that, the characters are split into words and are automatically recognized and extracted in text form for the fixed pitch. Let the extracted text be signified as

(n)

, which is the destination of the bus in text form. Therefore, with respect to the bus route details obtained from the cloud database, the bus route

(V)

is identified from

(n)

. Subsequently, to make the decision command for transportation, the similarity between the VIP’s destination and the route and destination of the bus is checked.

3.10. Similarity Check

Next, the similarity analysis between the bus destination

(n)

, bus route

(V)

, and VIP’s destination

(T)

is performed. For similarity analysis, the LD, which measures the similarity between two parameters, is used. First, the similarity between

(n)

and

(T)

is checked as follows. Let the length of

(n)

be

(γ)

and length of

(T)

be

(ƛ)

. Then, the similarity

(O)

between

(n)

and

(T)

is calculated as

O (n, T) = \{\begin{cases} 0 & i f (ƛ = 0) \\ 0 & i f (γ = 0) \\ 1 & i f (γ = ƛ) \\ \min \{\begin{cases} O (γ - 1, ƛ) + 1 \\ O (γ, ƛ - 1) + 1 \\ O (γ - 1, ƛ - 1) + 1 \end{cases} & (o t h e r w i s e) \end{cases} .

(37)

Equation (37) states that the similarity value

(O)

becomes higher when the length of the destinations is similar. In addition, if there are changes in lengths, then the minimum conditions are followed to obtain the similarity score. If

(O)

is high, this similarity value is used for decision-making. Otherwise, the similarity

(O^{*})

between

(V)

and

(T)

is checked further:

O (n, T) = \{\begin{cases} 0 & i f (ƛ = 0) \\ 0 & i f (σ = 0) \\ 1 & i f (σ = ƛ) \\ \min \{\begin{cases} O (σ - 1, ƛ) + 1 \\ O (σ, ƛ - 1) + 1 \\ O (σ - 1, ƛ - 1) + 1 \end{cases} & (o t h e r w i s e) \end{cases} .

(38)

Here, the length of the words present in the bus route is denoted as

(σ)

. The similarity value is finally given for decision-making, as detailed below.

3.11. Decision-Making

Finally, regarding the similarity score between the VIP destination and the bus destination

(O)

, as well as the similarity score between the bus route and the VIP destination

(O^{*})

, the decision is made to inform both the VIP and the bus conductor. In this context, the Fuzzy Rule (FR), which is used to analyze the input parameters efficiently, is utilized for decision-making. However, a probability measure that might make the output value zero is utilized in the Fuzzy logic approach. Therefore, a solution in decision-making may not be attained with the FR. Therefore, to mitigate this issue, a Multi-characteristic Index (MI) that gives non-zero output in Fuzzy logic is employed to compute the Fuzzy relationship. In addition, the Non-linear S-Curve (NC) membership function, which explains the certainty of the Fuzzy inputs, is used rather than the Fuzzy membership function. The MNC-FR approach is described further below.

-: Rule

Initially, based on the if–then condition, the rules

(ℜ)

are set for the decision-making as follows:

ℜ = \{\begin{cases} i f O i s h i g h t h e n Ω \\ i f O i s l o w t h e n Θ \\ i f O^{*} i s h i g h t h e n Ω \\ i f O^{*} i s h i g h t h e n Θ \end{cases},

(39)

where the decision-making factor is denoted as

(Ω)

and the non-decision factor is signified as

(Θ)

. The rule states that the decision to inform the VIP and the bus conductor is made when the similarity scores are high; otherwise, no decision is made. The image capturing process continues until a high similarity is obtained.

-: Membership Function

The NC membership function

(X)

is used to change the output value automatically, concerning the input data. To find the degree of relationship to the input parameter, the NC membership function is used. Therefore,

(X)

is calculated as

X = \{\begin{cases} 1 & \forall (ℵ < \bar{ℵ}) \\ 0.999 & \forall (ℵ = \bar{ℵ}) \\ ℏ / (1 + (υ * e^{ℵ})) & \forall (\bar{ℵ} < ℵ < ℵ^{″}) \\ 0.0001 & \forall (ℵ = ℵ^{″}) \\ 0 & \forall (ℵ > ℵ^{″}) \end{cases},

(40)

where the scaling parameters of the input are denoted as

(ℵ, \bar{ℵ}, ℵ^{″})

, and the constant values are signified as

(ℏ, υ)

. According to Equation (40), the membership function values are obtained in the range of 0 to 1 regarding the scaling parameter conditions.

-: Fuzzification

In this step, the input data are converted to Fuzzy data

(ϑ)

, such that the FR methodology can be used to enable further processing. Let the inputs

(O)

and

(O^{*})

be combined and represented by

(Ξ)

. The fuzzification is then performed as follows:

Ξ \to ϑ .

(41)

-: Fuzzy Relationship

To make the final decision, the relationship between the Fuzzy data

(ϑ)

is determined. To calculate the Fuzzy relationship, the MI that gives the optimal decision accurately is used, which is given as

Y = \frac{(\sum ϑ * X (ℜ))}{X},

(42)

where the final decision obtained using the MNC-FR methodology is denoted as

(Y)

. These data are then converted back into crisp data.

-: Defuzzification

For the purpose of intimation, the data

(Y)

are converted into measurement data

(Y^{*})

as follows:

Y^{*} = (\sum Y \times X) / \sum X .

(43)

Hence, the decision is made from

(Y^{*})

to guide the VIP onto the designated bus through giving information to the VIP through an RFID signal. Simultaneously, it also informs the bus conductor when to stop the bus, taking the VIP’s current location into consideration. Therefore, an enhanced transportation system for the VIP to travel by bus can be designed through the utilization of the proposed methodology. In Section 4, the performance analysis of the proposed approach is detailed.

4. Results and Discussions

The performance of the proposed model was evaluated by comparing it with existing approaches. The proposed work was implemented using the PYTHON 3.11 software in order to analyze the performance.

4.1. Dataset Description

Bus identification was performed utilizing the data in a public bus transport dataset regarding Dubai Bus Transportation. The dataset contains the route ID, trip ID, stop ID, bus stop name, and number of boardings data. By utilizing the Geospatial Bus Route Analysis (GBRA) dataset, the bus route similarity based on the speech requirements of VIPs was determined. In the GBRA dataset, the Bus Route Transit (BRT) route details, non-BRT route details, route type, route description, and bus stop names are present. Moreover, the proposed model was validated utilizing the Microsoft Common Objects in Context (MS-COCO) dataset. This dataset consists of 328,000 images in more than 80 object categories. From all these datasets, the data were split with a ratio of 70:20:10 for training, testing, and validation of the proposed model, respectively.

Table 2 displays the obtained results for the captured bus images, noise-removed images, contrast-enhanced images, and text-segmented images using the proposed bus identification method.

4.2. Performance Assessment

The proposed system’s performance was examined with respect to optimal bay selection, decision-making, text detection, and similarity between the bus route and voice data of the VIPs. Through comparing the proposed method with state-of-the-art algorithms including BFOA, Manta Ray Foraging Optimization (MRFO), African Vultures Optimization Algorithm (AVOA), and Bald Eagle Search Optimization (BESO), the time taken to select the optimal bay and fitness over iteration were evaluated.

Table 3 analyzes the time and fitness metrics of the proposed technique. In order to select the optimal bay, the proposed EPBFTOA took 247,891 ms, less than that of the prevailing techniques; in particular, for optimal bay selection, the BFOA, MRFO, AVOA, and BESO approaches took 389,124 ms, 596,757 ms, 804,628 ms, and 997,245 ms, respectively. The proposed algorithm selected the optimal bay within a minimum duration as the proposed algorithm compensates for the premature convergence problem in the selection process. When compared to the prevailing algorithms, the average fitness (9.068) attained by the proposed algorithm was lower, indicating the proposed EPBFTOA technique had enhanced performance in selecting the optimal bay.

Figure 3 represents the performance of decision-making to reach the relevant bus for the VIP using the proposed MNC-FR technique. It can be observed that the MNC-FR technique performed fuzzification within 34,197 ms and de-fuzzification in 31,073 ms, and thus, generated the decision rule within 79,932 ms. When compared to prevailing techniques such as the FR, Trapezoidal Fuzzy Rule (Tp-FR), Triangular Fuzzy Rule (Tr-FR), and Decision Rule (DR), the decision time of the proposed technique was shorter. The proposed MNC-FR takes less time to determine the relevant bus as the similarity between the bus route and voice data is analyzed before generating a decision.

Figure 4 illustrates the proposed model’s performance in detecting the desired bus in comparison with the existing models. Regarding its accuracy, precision, recall, and f-measure, the proposed technique’s performance was analyzed in comparison with other techniques including the RNN, Long Short-Term Memory (LSTM), Deep Neural Network (DNN), and Artificial Neural Network (ANN) models. The proposed system detected bus bays with 96.932% accuracy, 95.872% precision, 96.328% recall, and 95.363% f-measure, which were all higher when compared to those of conventional network models. In particular, the average accuracy, precision, recall, and f-measure attained by the prevailing techniques were 92.743%, 92.294%, 92.891%, and 93.016%. The ArcGRNN model effectively detected the bay, as it was detected based on the GPS location and cloud database.

Table 4 details the bus bay detection performance with respect to specificity, sensitivity, and processing time. It can be observed that the proposed approach achieved 95.714% specificity and 96.328% sensitivity, which are higher compared to those of the prevailing networks. In particular, the detection performance of the proposed model was enhanced due to the mitigation of the gradient vanishing problem. Moreover, the proposed model carried out processing within a much shorter time (174,405 ms), when compared to the average time taken by the existing models (505,242 ms). This indicates that the bus bay is efficiently identified by the proposed network.

As specified in Figure 5, the bus bay detection performance was evaluated according to the False Positive Rate (FPR) and False Negative Rate (FNR). The ArcGRNN can properly detect the bus bay, as the person’s voice data are pre-processed and converted to text before detection. It can be seen that the proposed technique achieved an FPR and FNR of 0.0652% and 0.0529%, respectively, while the prevailing networks achieved an average FPR and FNR of 0.0820% and 0.0811%, respectively, higher than the proposed technique. This demonstrates that the performance of the proposed technique in terms of bus bay detection was improved, when compared to traditional approaches.

In Table 5, the performance of the proposed method for bus bay detection is detailed, regarding the obtained Positive Predictive Value (PPV) and Negative Predictive Value (NPV). As the speech data are preprocessed for noise removal and converted to text prior to data training, the learning efficiency of the model is improved. Further, tackling the gradient vanishing issue through the use of the Arctan Gradient activation function aids in improving the detection performance, leading to a PPV of 96.74% and an NPV of 95.83%. For comparison, the existing RNN and LSTM attained PPVs of 94.20% and 93.41%, respectively. Furthermore, the existing DNN and ANN attained NPVs of 91.37% and 89.36%, respectively, which are lower than that of the proposed technique. Thus, the detection performance of the proposed method is better than those of the existing methods.

Figure 6 illustrates the performance of the proposed ArcGRNN, in terms of training time, in comparison to the traditional networks. It can be seen that the proposed model consumed a shorter time (of 43,148 ms) for training on the data, due to suppression of the overfitting problem through the use of the Arctan gradient activation function. Meanwhile, the RNN, LSTM, DNN, and ANN models had longer training times of 50,159 ms, 56,921 ms, 62,086 ms, and 67,935 ms, respectively. This is because these existing methods fail to focus on the overfitting or gradient vanishing problem during the backpropagation of data among the neuron layers. Hence, it was also verified that the proposed method is more efficient than the existing techniques.

Figure 7 analyzes the proposed SSCEAD framework’s performance by weighing its similarity score against those of the models used in the comparison. It was found that the proposed technique attained a similarity score of 0.98973, which is higher than that of the prevailing text detection models. In particular, the Encoder–Decoder Network (EDN), General Adversarial Network (GAN), Hierarchical Attention Network (HAN), and CNN achieved similarity scores of 0.93565, 0.90721, 0.83862, and 0.77638, respectively, all lower than that attained by the proposed model. The learning capability of the proposed model was enhanced due to the use of SSC. Thus, the performance of SSCEAD in text detection provides an improvement over the other approaches.

With respect to detection time, the performance of the proposed technique regarding text detection was also evaluated. Table 6 shows that the proposed SSCEAD technique took 160,738 ms to detect text, while the existing techniques took an average of 270,851 ms, longer than that of the proposed approach. The usage of the ArcGRNN model in text decoding helps in the rapid detection of text from images. Hence, when compared to the state-of-the-art approaches, the proposed technique could detect text in less time.

Figure 8 analyzes the presented SSCEAD technique for text identification regarding its accuracy, True Positive Rate (TPR), and True Negative Rate (TNR). The proposed model detected text with an accuracy of 97.19%, TPR of 96.55%, and TNR of 97.61%, which are all higher than the metrics obtained with the existing approaches. Among the prevailing techniques, the CNN displayed the lowest performance in text detection, with 89.25% accuracy, 86.56% TPR, and 89.02% TNR. The proposed technique efficiently recognizes text as the receptive field is expanded through pixel shuffling in the convolution kernel.

4.3. Comparative Analysis with Related Works

In the framework of Suman et al. [23], who developed a stick using an RNN for VIPs, the obstacles on the path are analyzed for independent navigation. Utilizing this system, obstacles were detected with 91.6% accuracy; however, safer navigation after detecting obstacles was not considered. In this line, Bai et al. [25] presented a wearable travel aid device. To improve the perception of VIPs, this device was made using a CNN to be deployed on eyeglasses and smartphones. However, this device was not sufficient to enable travel from one place to another using a transport system. Hence, based on an improved YOLOv5 network, a bus detection model was proposed by Arifando et al. [22]. This model efficiently assists VIPs in detecting buses with 93.9% precision. Although the bus could be detected by the VIP, it only became efficient once they recognized the bus route name and number. Hence, Sujata Dash Subhendu Kumar Pani [18] detected the bus with the YOLOv3 network and segmented the bus board utilizing a transfer learning technique. Through the LSTM-based Tesseract tool, the route details were converted to text. This technique can detect buses with 90% accuracy and segment the bus board with 80% accuracy. However, to properly guide the VIP, recognizing the route number from the bus is necessary. Using RF and Haar-like filter-based approaches, the bus board area was identified. Next, the route number was recognized through the use of a pattern-matching approach. Tan et al. [16] assessed the bus route number with a detection rate of 56%. Nevertheless, none of these previous works have concentrated on the selection of an appropriate bus bay among the queue of buses in the bus station. Hence, the proposed method selects the optimal bus bay from the bus line utilizing multiple data sources. When identifying the relevant bus, the optimal bay could be selected within 247,891 ms. Moreover, the bus route number was shown to be segmented with 97.19% accuracy. This signifies that the proposed transportation system for VIPs outperforms the state-of-the-art techniques in this area.

Comparative Analysis with Similar Works Based on the MS-COCO Dataset

Next, the performance of the proposed system was assessed on the MS-COCO dataset, comparing it with related existing methods.

Table 7 shows a comparative analysis of the proposed system with respect to accuracy, precision, and f-measure. The proposed model exhibited a higher accuracy (96.93%), precision (95.87%), and f-measure (95.36%) for bus bay detection. Meanwhile, the existing ResNet 50 attained 61.50% precision, YOLO version 5 attained 94% f-measure, Neural Architecture Search (NAS) attained 86.30% precision, and Efficient Featurized Image Pyramid Network (EPIFNet) attained 31.60% precision. Further, the ANN attained 83% accuracy and 80% f-measure, which are both lower than those of the proposed system. The superior performance of the proposed method is due to its utilization of the Arctan Gradient Activation Function (Arc-GAF), which improves its learning ability through expanding smaller input data and converging against larger values. In this way, the vanishing or exploding of gradients during the backpropagation is suppressed. Therefore, the performance of the proposed network was found to be enhanced, when compared to the related object detection techniques.

4.4. Practical Applicability of the Proposed System

In practice, the voice data of the VIP are collected through a voice assistance module that has an inbuilt microphone. Then, the collected data can be pre-processed using a device that is embedded with the proposed pre-processing approaches. Hence, the redundancy or noise in the voice data is removed, and the resulting clear signal is translated into text format using an installed text converter tool. Further, the retrieved text is sent as a message to the sensor device, which is installed in the wearable aid of the VIP. Then, according to the received text, the GPS location of the VIP and the bay details—namely, the bus bay and optimal bay—are intimated to the VIP. Simultaneously, the sensor captures an image of the destination bus in the bay via GPS. As per the proposed model, the text on the bus is detected, segmented, and extracted, and its similarity with the VIP’s voice data is assessed by the sensor. If the similarity is high, then the sensor transmits a message to the RFID tag, which is fixed in the bus. This informs the bus drivers and conductors about the VIP waiting at the bus station so that they can pick them up accordingly. In the case that the similarity level is very low, the process is iterated from recapturing of the VIP’s voice data. Therefore, the proposed model can be practically applied in the real world to assist VIPs in accurately boarding their desired bus.

4.5. Discussions and Limitations

From the simulated outcomes, the bus bay was exactly detected, and the optimal bay was selected with better performance by the proposed model, when compared to existing methods, as the inclusion of the Arctan gradient activation function in the traditional RNN allowed for effective processing of the input data. Thus, the bus bay was detected with higher accuracy, precision, and recall. Furthermore, introducing the EPAS into the proposed algorithm aids in selecting the optimal bay among the bus lines within a short time. The destination bus image is pre-processed, and the text in it is detected and segmented using the proposed SSCEAD, leading to higher accuracy and lower detection time. The learning efficiency of the CNN was also improved by expanding its receptive field using SSC, enabling more effective text detection. Subsequently, the similarity analysis between the extracted text and the pre-processed input voice aided in the decision to inform the bus coordinators. As various data sources and optimal bus bays were taken in to consideration to develop an effective transportation navigation system, the proposed model achieved better performance than those proposed in prevailing works in the related literature.

However, while applying the proposed framework in real-time, environmental factors can affect the transmission of data and may cause delays. Furthermore, the safety of the VIP after reaching and boarding the bus is not considered in this work. These are considered as limitations of this study. We will attempt to rectify these limitations in future work through the use of advanced deep learning techniques. In this regard, various obstacles—including the height of the bus stop from the ground, objects in the step, and environmental data—need to be analyzed to resolve some of the limitations of this study.

5. Conclusions

This study proposed an effective system to assist in the navigation of VIPs via bus transport, based on MNC-FR and EPBFTOA. The speech data of the VIP are first subjected to voice pre-processing, followed by text conversion. Then, to identify the desired bus and inform the VIP, optimal bay selection, image pre-processing, bus identification, text detection and segmentation, route similarity analysis, and decision-making processes are carried out. The superior performance of the proposed work was examined through comparing it with state-of-the-art techniques. The desired bus was accurately detected by the VIP using the proposed approach, due to selection of the optimal bay line. The obtained outcomes validated that the proposed model presents improved performance when compared to the existing techniques. Using the proposed technique, the optimal bay was selected within 247,891 ms, and the decision rule was generated within 79,932 ms. Moreover, to access the bus, the bus bay was detected with 96.932% accuracy, and text was identified with 97.19% accuracy. Thus, the proposed approach enables the development of an effective guidance system for blind and Visually Impaired People.

Future Scope

Although the bus was detected for the VIP using multiple data sources, we did not concentrate on the person’s safety when boarding the bus. Thus, this work will be extended in the future through determining the obstacles faced by VIPs while boarding the bus.

Author Contributions

All authors contributed equally. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the funding by the Deanship of Graduate Studies and Scientific Research, Jazan University, Saudi Arabia, through Project Number: GSSRD-24.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Choudhary, S.; Bhatia, V.; Ramkumar, K.R. IoT Based Navigation System for Visually Impaired People. In Proceedings of the ICRITO 2020—IEEE 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Direction), Noida, India, 13–14 October 2022; pp. 521–525. [Google Scholar] [CrossRef]
Yadav, D.K.; Mookherji, S.; Gomes, J.; Patil, S. Intelligent Navigation System for the Visually Impaired—A Deep Learning Approach. In Proceedings of the 4th International Conference on Computing Methodologies and Communication, ICCMC, Erode, India, 11–13 March 2020; pp. 652–659. [Google Scholar] [CrossRef]
Kuriakose, B.; Shrestha, R.; Sandnes, F.E. DeepNAVI: A deep learning based smartphone navigation assistant for people with visual impairments. Expert Syst. Appl. 2023, 212, 118720. [Google Scholar] [CrossRef]
Khan, W.; Hussain, A.; Khan, B.M.; Crockett, K. Outdoor mobility aid for people with visual impairment: Obstacle detection and responsive framework for the scene perception during the outdoor mobility of people with visual impairment. Expert Syst. Appl. 2023, 228, 120464. [Google Scholar] [CrossRef]
Poornima, J.; Vishnupriyan, J.; Vijayadhasan, G.K.; Ettappan, M. Voice assisted smart vision stick for visually impaired. Int. J. Control. Autom. 2020, 13, 512–519. [Google Scholar]
Costa, C.; Paiva, S.; Gavalas, D. Multimodal Route Planning for Blind and Visually Impaired People; Lecture Notes in Intelligent Transportation and Infrastructure; Springer Nature: Cham, Switzerland, 2023; pp. 1017–1026. [Google Scholar] [CrossRef]
Ashiq, F.; Asif, M.; Bin Ahmad, M.; Zafar, S.; Masood, K.; Mahmood, T.; Mahmood, M.T.; Lee, I.H. CNN-Based Object Recognition and Tracking System to Assist Visually Impaired People. IEEE Access 2022, 10, 14819–14834. [Google Scholar] [CrossRef]
Yohannes, E.; Lin, P.; Lin, C.Y.; Shih, T.K. Robot Eye: Automatic Object Detection and Recognition Using Deep Attention Network to Assist Blind People. In Proceedings of the 2020 International Conference on Pervasive Artificial Intelligence, ICPAI, Taipei, Taiwan, 3–5 December 2020; pp. 152–157. [Google Scholar] [CrossRef]
Gowda, M.C.P.; Hajare, R.; Pavan, P.S.S. Cognitive IoT System for visually impaired: Machine learning approach. Mater. Today Proc. 2021, 49, 529–535. [Google Scholar] [CrossRef]
Martinez-Cruz, S.; Morales-Hernandez, L.A.; Perez-Soto, G.I.; Benitez-Rangel, J.P.; Camarillo-Gomez, K.A. An Outdoor Navigation Assistance System for Visually Impaired People in Public Transportation. IEEE Access 2021, 9, 130767–130777. [Google Scholar] [CrossRef]
Akanda, M.R.R.; Khandaker, M.M.; Saha, T.; Haque, J.; Majumder, A.; Rakshit, A. Voice-controlled smart assistant and real-time vehicle detection for blind people. Lect. Notes Electr. Eng. 2020, 672, 287–297. [Google Scholar] [CrossRef]
Agarwal, A.; Agarwal, K.; Agrawal, R.; Patra, A.K.; Mishra, A.K.; Nahak, N. Wireless bus identification system for visually impaired person. In Proceedings of the 1st Odisha International Conference on Electrical Power Engineering, Communication and Computing Technology, ODICON, Bhubaneswar, India, 8–9 January 2021; pp. 1–6. [Google Scholar] [CrossRef]
de Andrade, H.G.V.; Borges, D.d.M.; Bernardes, L.H.C.; de Albuquerque, J.L.A.; da Silva-Filho, A.G. BlindMobi: A system for bus identification, based on Bluetooth Low Energy, for people with visual impairment. In Proceedings of the XXXVII Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos, Gramado, Brazil, 6–10 May 2019; pp. 391–402. [Google Scholar] [CrossRef]
Ramaswamy, T.; Vaishnavi, M.; Prasanna, S.S.; Archana, T. Bus identification for blind people using rfid. Int. Res. J. Mod. Eng. Technol. Sci. 2022, 4, 3478–3483. [Google Scholar]
Kumar, K.A.; Sreekanth, P.; Reddy, P.R. Bus Identification Device for Blind People using Arduino. CVR J. Sci. Technol. 2019, 16, 48–52. [Google Scholar] [CrossRef]
Tan, J.K.; Hamasaki, Y.; Zhou, Y.; Kazuma, I. A method of identifying a public bus route number employing MY VISION. J. Robot. Netw. Artif. Life 2021, 8, 224–228. [Google Scholar] [CrossRef]
Mueen, A.; Awedh, M.; Zafar, B. Multi-obstacle aware smart navigation system for visually impaired people in fog connected IoT-cloud environment. Health Inform. J. 2022, 28, 14604582221112609. [Google Scholar] [CrossRef] [PubMed]
Dash, S.; Pani, S.K.; Abraham, A.; Liang, Y. Advanced Soft Computing Techniques in Data Science, IoT and Cloud Computing. In Studies in Big Data; Springer International Publishing: Cham, Switzerland, 2021; Volume 89. [Google Scholar] [CrossRef]
Bouteraa, Y. Design and development of a wearable assistive device integrating a fuzzy decision support system for blind and visually impaired people. Micromachines 2021, 12, 1082. [Google Scholar] [CrossRef] [PubMed]
Mukhiddinov, M.; Cho, J. Smart glass system using deep learning for the blind and visually impaired. Electronics 2021, 10, 2756. [Google Scholar] [CrossRef]
Feng, J.; Beheshti, M.; Philipson, M.; Ramsaywack, Y.; Porfiri, M.; Rizzo, J.R. Commute Booster: A Mobile Application for First/Last Mile and Middle Mile Navigation Support for People with Blindness and Low Vision. IEEE J. Transl. Eng. Health Med. 2023, 11, 523–535. [Google Scholar] [CrossRef]
Arifando, R.; Eto, S.; Wada, C. Improved YOLOv5-Based Lightweight Object Detection Algorithm for People with Visual Impairment to Detect Buses. Appl. Sci. 2023, 13, 5802. [Google Scholar] [CrossRef]
Suman, S.; Mishra, S.; Sahoo, K.S.; Nayyar, A. Vision Navigator: A Smart and Intelligent Obstacle Recognition Model for Visually Impaired Users. Mob. Inf. Syst. 2022, 2022, 9715891. [Google Scholar] [CrossRef]
Pundlik, S.; Shivshanker, P.; Traut-Savino, T.; Luo, G. Field evaluation of a mobile app for assisting blind and visually impaired travelers to find bus stops. arXiv 2023, arXiv:2309.10940. [Google Scholar] [CrossRef]
Bai, J.; Liu, Z.; Lin, Y.; Li, Y.; Lian, S.; Liu, D. Wearable travel aid for environment perception and navigation of visually impaired people. Electronics 2019, 8, 697. [Google Scholar] [CrossRef]
Dubey, S.; Olimov, F.; Rafique, M.A.; Jeon, M. Improving small objects detection using transformer. J. Vis. Commun. Image Represent. 2022, 89, 103620. [Google Scholar] [CrossRef]
Ancha, V.K.; Sibai, F.N.; Gonuguntla, V.; Vaddi, R. Utilizing YOLO Models for Real-World Scenarios: Assessing Novel Mixed Defect Detection Dataset in PCBs. IEEE Access 2024, 12, 100983–100990. [Google Scholar] [CrossRef]
Said, Y.; Atri, M.; Albahar, M.A.; Ben Atitallah, A.; Alsariera, Y.A. Obstacle Detection System for Navigation Assistance of Visually Impaired People Based on Deep Learning Techniques. Sensors 2023, 23, 5262. [Google Scholar] [CrossRef] [PubMed]
Quang, T.N.; Lee, S.; Song, B.C. Object detection using improved bi-directional feature pyramid network. Electronics 2021, 10, 746. [Google Scholar] [CrossRef]
Naseer, A.; Almujally, N.A.; Alotaibi, S.S.; Alazeb, A.; Park, J. Efficient Object Segmentation and Recognition Using Multi-Layer Perceptron Networks. Comput. Mater. Contin. 2024, 78, 1381–1398. [Google Scholar] [CrossRef]

Figure 1. Framework of the proposed model.

Figure 2. Architecture of ArcGRNN.

Figure 3. Graphical representation of the proposed decision generation approach.

Figure 4. Comparison of the proposed bus bay detection method with existing methods.

Figure 5. Graphical analysis of FPR and FNR for bus bay detection.

Figure 6. Training time evaluation.

Figure 7. Graphical depiction of similarity scores obtained by text detection approaches.

Figure 8. Performance evaluation of the proposed SSCEAD.

Table 1. Comparative summary of related works.

Related Works	Objective	Methods Used	Advantages	Result	Drawbacks
[16]	Identification of bus route numbers	RF algorithm and pattern matching method	Bus details were accurately captured using the Lucas–Kanade tracker	Achieved a higher detection rate	Features relationships were not recognized by RF, resulting in inaccurate detection.
[17]	Navigation system for the VIP	YOLOv3, Environment-aware Bald Eagle Search, and Actor–Critic algorithm	The person’s query was analyzed, and the obstacles on the path were detected	Higher latency and detection accuracy	Trade-offs occurred when using the Actor–critic algorithm for navigation decisions.
[18]	Wearable device formulation for the safe traveling of VIPs	YOLOv3 and transfer learning method	The detected bus board number is communicated to the person in a voice format	Attained more accuracy	Smaller objects were not detected by the anchor box of YOLOv3.
[19]	Navigation device to assist the blind person	Robot operating system and Fuzzy logic algorithm	Safer directions were alerted to the person	Obtained lower collision	The Fuzzy rules were developed based on assumptions, thereby degrading the decision efficiency.
[20]	Smart glass system for the independent movement of blind persons at night-time	U2-Net and Tesseract model	Path image was expressed in a tactile graph, and the related information was translated into speech format	Detected objects in the path with improved accuracy and precision	Text was not properly analyzed from the low-quality and poorly lit images.
[21]	Mobile application-based navigation assistance for blind persons	CB application and OCR system	The utilization of way-finding footage provided accurate information on the path	The path was identified with higher accuracy, precision, and f1-score	Image data in different formats could not be processed by the OCR.
[22]	Lightweight bus detection network for VIP	Improved YOLO with slim scale detection module	The lightweight approach facilitated real-time detection of the bus	The bus was detected with high accuracy and precision	YOLO involved fewer parameters, thus resulting in sub-optimal detection.
[23]	Vision Navigator framework for the blind and VIP	RNN and single-shot mechanism	Stick and sensor-equipped shoes were utilized for identifying obstacles on the path	Achieved a high accuracy rate	Complex data patterns were not learned by the RNN due to gradient vanishing.
[24]	Mobile application-centered bus stop identification system for VIPs	Neural network-based AAA	The bus stop signs were processed for recognizing bus stops	Distance or deviation between actual location and identified bus stop location was lower	The application required a large number of labeled data and high-quality images for detection.
[25]	Wearable device assistance for the navigation of VIPs	CNN	The device was developed with an in-built camera for better detection of objects	Depicted higher safety and efficiency	Sequential data were not effectively learned by the CNN, which restricted its real-time object detection performance.

Table 2. Image outcomes of the proposed technique.

Input Speech Signal	.mp3 Audio File
Noise-removed signal
Speech-to-Text conversion
Input image
Noise-removed image
Contrast-enhanced image
Identified Bus
Text identification
Text extraction

Table 3. Performance analysis of the proposed optimal bay selection method.

Methods	Optimal Bay Selection Time (ms)	Average Fitness
Proposed EPBFTOA	247,891	9.068
BFOA	389,124	19.8787
MRFO	596,757	42.0519
AVOA	804,628	67.5498
BESO	997,245	79.8054

Table 4. Performance of bus bay detection.

Techniques	Specificity (%)	Sensitivity (%)	Processing Time (ms)
Proposed ArcGRNN	95.714	96.328	174,405
RNN	94.285	95.019	407,395
LSTM	93.09	93.652	428,012
DNN	91.788	92.301	562,513
ANN	90.394	90.595	623,048

Table 5. Bus bay detection analysis.

Methods	PPV (%)	NPV (%)
Proposed ArcGRNN	96.74	95.83
RNN	94.20	94.12
LSTM	93.41	92.35
DNN	92.08	91.37
ANN	90.16	89.36

Table 6. Performance comparison of text detection.

Techniques	Detection Time (ms)
Proposed SSCEAD	160,738
EDN	286,059
GAN	215,191
HAN	260,425
CNN	321,728

Table 7. Performance comparison on the MS-COCO dataset.

Authors	Technique Used	Precision (%)	Accuracy (%)	F-Measure
Proposed	ArcGRNN	95.87	96.93	95.36
[26]	ResNet 50	61.5	-	-
[27]	YOLO version 5	-	93.00	94.00
[28]	NAS	86.30	-	-
[29]	EFIPNet	31.60	-	-
[30]	ANN	-	83.00	80.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Perumal, U.; Jeribi, F.; Alhameed, M.H. An Enhanced Transportation System for People of Determination. Sensors 2024, 24, 6411. https://doi.org/10.3390/s24196411

AMA Style

Perumal U, Jeribi F, Alhameed MH. An Enhanced Transportation System for People of Determination. Sensors. 2024; 24(19):6411. https://doi.org/10.3390/s24196411

Chicago/Turabian Style

Perumal, Uma, Fathe Jeribi, and Mohammed Hameed Alhameed. 2024. "An Enhanced Transportation System for People of Determination" Sensors 24, no. 19: 6411. https://doi.org/10.3390/s24196411

APA Style

Perumal, U., Jeribi, F., & Alhameed, M. H. (2024). An Enhanced Transportation System for People of Determination. Sensors, 24(19), 6411. https://doi.org/10.3390/s24196411

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Enhanced Transportation System for People of Determination

Abstract

1. Introduction

Problem Statement

2. Literature Survey

3. Proposed Methodology

3.1. Speech Input

3.2. Speech Pre-Processing

3.3. Speech-to-Text Conversion

3.4. Bus Bay Detection

3.5. Optimal Bay Selection

3.6. Image Capturing and Preprocessing

3.6.1. Step 1: Noise Removal

3.6.2. Step 2: Contrast Enhancement

3.7. Bus Detection

3.8. Text Identification and Segmentation

3.9. Text Extraction

3.10. Similarity Check

3.11. Decision-Making

4. Results and Discussions

4.1. Dataset Description

4.2. Performance Assessment

4.3. Comparative Analysis with Related Works

Comparative Analysis with Similar Works Based on the MS-COCO Dataset

4.4. Practical Applicability of the Proposed System

4.5. Discussions and Limitations

5. Conclusions

Future Scope

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI