Sustainability
  • Article
  • Open Access

13 May 2022

Using Bidirectional Long Short-Term Memory Neural Network for Trajectory Prediction of Large Inner Wheel Routes

Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology, Tainan 71005, Taiwan
Author to whom correspondence should be addressed.

Abstract

When a large vehicle turns at an intersection, tragedy often results because the driver fails to notice oncoming traffic or road users in the blind spots created by the vehicle body. Wheel-difference warning systems on the market for large vehicles generally add sensors or cameras to check whether vehicles are present in these blind spots. However, installing such vision-assistance systems has not reduced the accident rate of large vehicles. The main reason is that motorcycle and bicycle riders often fail to account for the inner-wheel difference that forms when a large vehicle turns, leading to collisions with large vehicles at intersections. This paper proposes a bidirectional long short-term memory neural network for predicting the inner-wheel trajectory of large vehicles, mainly from the motorcycle rider's perspective: YOLOv4 is combined with the stacked Bi-LSTM model used in this study to analyze the motion of large vehicles and predict the inner-wheel trajectory. The turning trajectory of large vehicles at intersections is predicted using an object detection algorithm and a recurrent neural network model. Experiments show that the stacked Bi-LSTM trajectory prediction model predicts the next second of trajectory from one second of trajectory data with an accuracy of 87.77%, and two seconds of trajectory with an accuracy of 75.75%. In terms of prediction error, the system outperforms the LSTM and Bi-LSTM models.

1. Introduction

Traffic accidents worldwide cause more than 1.35 million deaths and 20–50 million serious injuries every year [1]. Therefore, many studies are being carried out on how to prevent traffic accidents and reduce the resulting damage. Such research mainly infers accident-prone road sections by analyzing historical road data, in order to minimize losses after an accident [2,3,4]. With the progress of traffic monitoring and image recognition, vehicle trajectory data have gradually been used in recent research. Trajectory data include dynamic vehicle information such as position, speed and acceleration [5,6]. Based on these data, various surrogate safety measures are analyzed and predicted to reduce the occurrence and severity of traffic accidents.
In Taiwan, mass transportation brings great mobility to daily life. All counties and cities have convenient public transport; besides fast modes such as the MRT and trains, urban transport relies mainly on buses, which run at short intervals with clear routes, giving people a convenient way to travel.
However, not every city has a well-developed mass transportation network. In most other regions, people still rely on cars and motorcycles as their main vehicles, which gives Taiwan one of the highest vehicle densities in the world. Combined with narrow roads, route planning issues, heavily mixed traffic flow and local road-user habits, this makes traffic accidents more severe than in other developed countries, especially where large vehicles start in large numbers at intersections or traffic lights; such accidents cause much regrettable loss.
Vehicles are roughly classified by use into buses, trucks and special-purpose vehicles, and by size into large and small vehicles. Trucks and buses are usually built in different forms according to their use, such as box trucks carrying solid bulk cargo and tank trucks carrying liquids; these different body forms create blind spots, determined roughly by the height of the driver's eyes and the structure of the vehicle body. Although the high body of a large vehicle lets the driver see far ahead, the blind spots in its front and rear views are much larger than those of small vehicles. While auxiliary devices such as mirrors, reversing radar, monitors and sensors help widen the driver's field of view, there are still areas with reflections or no coverage.
The inner-wheel difference of large vehicles creates blind spots that claim many lives every year. Figure 1 shows the number of large-vehicle accidents over the past five years; accidents involving large trucks are about twice as numerous as those involving buses. Since many industries rely on large cargo vehicles, adequate safety warning devices would effectively reduce these accidents. As a result, the Ministry of Transport promoted a law amendment requiring trucks and buses, from 2020, to install a driving vision assistance system, which is included in regular inspection items. However, many drivers are still distracted when their eyes are on the screen of the assistance system.
Figure 1. Statistics on the number of large car traffic accidents [7].
Wheel-difference warning systems on the market generally install sensors or cameras on the large vehicle body to check for road users in the driver's blind spots. However, the accident rate of large vehicles has not been reduced by these vision-assistance systems, because motorcycle and bicycle riders still fail to keep a safe driving distance and neglect the inner-wheel difference when a large vehicle turns, causing accidents with large vehicles at intersections.
In recent years, many trajectory prediction models based on deep neural networks (DNN) have been proposed [8,9,10]; the recurrent neural network (RNN), represented by long short-term memory (LSTM), has natural advantages in processing time series, which has promoted research on vehicle trajectory prediction [11,12]. Vehicle trajectory prediction mainly uses the trajectory data of different vehicles to predict the next position or the continuous trajectory along which a vehicle will move. Most existing studies collect and predict trajectories based on aerial photos or camera images, but few do so from the perspective of motorcyclists or cyclists.
As camera resolution improves, image recognition, data analysis and various intelligent technologies are developing rapidly. Our main purpose is to detect large vehicles from the motorcycle rider's perspective using a system mounted on the motorcycle, combining deep learning and machine vision. By analyzing the motion and movement trajectory of a large vehicle during a turn, the system analyzes the inner-wheel trajectory and predicts its future path, alerting the rider to dangerous areas and reducing the chance of accidents with large vehicles.
This paper predicts the trajectory of the inner wheels of large vehicles during turning using deep learning, first training an image recognition model and a recurrent neural network model on data collected from the perspective of motorcyclists and cyclists at intersections. With AI learning and image recognition technology, neural network models can analyze both known and unknown objects. Today's proven algorithms can quickly mine and analyze the features of various objects so that computers, like humans, can analyze and predict objects by their texture, size and direction of motion, demonstrating the ability to learn and handle even unfamiliar things.
This paper focuses on intelligent image recognition, recurrent neural networks and trajectory prediction, analyzing the turning trajectory of large vehicles and predicting their future position and movement. "Intelligent image recognition", in the early stages of its development, meant converting the real-world information observed by humans into digital information that computers can understand, and letting the machine keep learning until it recognizes objects as humans do, so that it can ultimately be applied to more intelligent computing and recognition systems. A "recurrent neural network" improves on the traditional neural network, which can only consider the current input in isolation: information cycles through the network, giving it memory, so it can make predictions and judgments based on past experience. Finally, "trajectory prediction" determines the future direction and trajectory of an object from its movement trajectory data over a period of time, by analyzing the information in those data.
This paper is divided into five sections. The first section presents the background, motivation and purpose of this paper. The second section reviews the literature, including the development of image detection and trajectory prediction, summarizing relevant research as the theoretical basis of this paper. The third section presents the system architecture: an overview in a schematic diagram, followed by a detailed description of the image detection and recurrent neural networks used by the system. The fourth section introduces the simulation environment and the deep learning training process, then analyzes and compares the experimental results. Section 5 summarizes the paper, including its contributions, recommendations and future research directions.

3. System Architecture

Because inner-wheel-difference accidents between large vehicles and motorcyclists often occur at turning intersections, video equipment was installed at intersections to collect images from the perspective of motorcyclists and cyclists, and deep learning machine vision was used to detect the motion of large vehicles; the key task is to predict the trajectory of the inner wheels when a large vehicle turns at an intersection. This paper aims to predict the movement trajectory of large vehicles over the next few seconds by analyzing the collected intersection images.
Figure 5 shows a schematic of the system architecture. The system can be roughly divided into three modules: the data collection module, the object detection module and the trajectory prediction module. The data collection module collects moving images of large vehicles at intersections and provides them to the object detection module for training and testing; the results are then passed to the trajectory prediction module, which is trained to predict the trajectory of a large truck turning at an intersection.
Figure 5. System Architecture Diagram.
Figure 6 shows the usage scenario of the motorcycle rider system, with a driving recorder mounted on the motorcycle or helmet for data collection. Data were collected mainly by mounting a driving recorder from the rider's perspective, or by setting up video equipment at the intersection at the same height, to capture moving images of large vehicles; the footage was segmented according to when trucks passed, and additional large-vehicle images were gathered with web crawlers. The image data are provided to the object detection module for model setup and training.
Figure 6. Motorcycle rider system usage scenario.
For object detection, the Darknet framework is used to train the YOLOv4 object detection model: the training images are prepared, different numbers of images are trained in batches, and the model is finally tested to check whether the network reaches the accuracy required by the system. The trajectory prediction module contains a Bi-LSTM bidirectional recurrent neural network model built with TensorFlow, which processes the detected object trajectory data; the adjusted trajectory data are randomly fed into the model for training to predict the movement trajectory of a truck turning at an intersection.

3.1. Data Collection

Images collected at intersections are traditionally taken by intersection or storefront cameras; it is difficult to collect images from the road-level viewpoint of motorcyclists or pedestrians. Video of turns was therefore recorded by equipment set up at the roadside at the same height as a motorcyclist's viewing angle, to observe and collect actual trajectory data.
The data collection process is shown in Figure 7, a flow diagram of large-vehicle turning-trajectory collection. First, large vehicles passing through the intersection are filmed with a mobile phone to collect intersection footage. The video is segmented according to when a truck arrives, and each segment is fed into the object detection model for "object detection" and "object tracking", which output the corresponding pixel coordinates. From the time records, the inner-wheel trajectory of large vehicles at the intersection is obtained for training and evaluating the trajectory prediction model. Meanwhile, the collected large-vehicle images are labeled and, together with the images gathered by web crawlers, fed into the object detection model for training.
Figure 7. Schematic diagram of the data collection process of large car turning trajectory.
  • Object detection
To detect objects in the captured image, we apply YOLO for vehicle detection; the YOLO detection results (object name, image coordinates and detection confidence) are used to create a new object list that records the information for each frame.
  • Object tracking
YOLO and other object detection algorithms operate on each frame captured by the camera. If the detected objects were counted directly on each frame, the same object would be counted repeatedly. To avoid this, object tracking technology is used to link the information across frames: the x, y pixel coordinates of the bounding-box centers in consecutive frames are compared, and the Euclidean distance between each coordinate in the current frame and those in the previous frame is computed; the pair with the shortest Euclidean distance is considered the same object.
On the road, unless an object leaves the detection range or is occluded by the scene, it basically does not disappear suddenly; therefore, when comparing consecutive frames, the coordinate distance of the same detected object between frames should not be large. This paper stores the data of the previous frame and compares it with the current frame to check whether the same object is still in the detection range. First, the YOLO object detection model detects each frame of the captured video and stores the object information in a list; as each new frame arrives, the bounding-box centers of the two consecutive frames are compared, using the previous frame's list as the benchmark to find the object at the shortest distance in the current frame, and the matched pair is treated as the same object.
Equation (13) calculates the coordinate difference between points (x1, y1), (x2, y2), …, (xt, yt) in the list. This paper uses the Euclidean distance, the length of the straight line between two points, where d is the Euclidean distance between the coordinate point (xt, yt) of the current frame and (xt−1, yt−1) of the previous frame. For each bounding-box center appearing in the image, the point at the shortest distance from a previous bounding-box center is found, and a threshold is applied: if the calculated distance is greater than 30 pixels, the result is a newly detected object, and a new list is created to store its data in subsequent frames; if the distance is less than 30 pixels, the detection is the same object and the result is appended to the existing list. If a target disappears and its list is no longer updated, the recorded object data are exported and the list is deleted.
d = √((xt − xt−1)² + (yt − yt−1)²)
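As a minimal illustration of this matching rule (a sketch, not the authors' exact implementation), the following Python snippet assigns each detected bounding-box center to the nearest stored track using the Euclidean distance and the 30-pixel threshold from the text; the function and variable names are hypothetical.

```python
import math

MATCH_THRESHOLD = 30  # pixels, as stated in the text

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def match_detections(tracks, detections, threshold=MATCH_THRESHOLD):
    """Greedy sketch: each detection joins the nearest existing track if it is
    closer than `threshold`, otherwise it starts a new track. A real tracker
    would also resolve conflicts when two detections claim the same track."""
    next_id = max(tracks, default=-1) + 1
    updated = {}
    for center in detections:
        best_id, best_dist = None, float("inf")
        for tid, prev in tracks.items():
            d = euclidean(center, prev)
            if d < best_dist:
                best_id, best_dist = tid, d
        if best_id is not None and best_dist < threshold:
            updated[best_id] = center   # same object: extend its track
        else:
            updated[next_id] = center   # new object: create a new list
            next_id += 1
    return updated

tracks = {0: (100, 100)}
tracks = match_detections(tracks, [(105, 102), (400, 300)])
# (105, 102) is within 30 px of track 0; (400, 300) starts a new track
```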

3.2. Object Detection Model

There are many neural network models for object detection, but the system in this paper must balance detection accuracy and execution speed; it is therefore necessary to choose a model for which both are within an acceptable range. After comparing the results of various object detection neural network models in terms of FPS and mean average precision, an object detection model with high average precision will be selected.
In machine learning, one needs to collect training data that are as diverse as possible to avoid overfitting the model, which would make it accurate on the training data but poor on unseen data. In addition, the resolution and size of the image must be considered: the higher the resolution, the more information can be extracted from the image. This paper therefore gathers images from the web with different object types, shooting angles, ambient lighting and environmental complexity to increase the recognition ability of the neural network model.
In addition, to simulate the actual scene, a mobile phone was set up at an intersection frequented by large vehicles. The image size must be chosen carefully, neither too small nor too large: this paper rescales the collected images to 1280 × 720 and adds web images of about the same size as training material. Figure 8 shows the types of training images collected, including trucks, buses, tractors and semi-trailer trucks (tractors combined with semi-trailers).
Figure 8. Schematic of the type of training image.
Next, one needs to determine the categories and extents of the objects in the image data, using the LabelImg [34] labeling tool to perform manual labeling. Figure 9 shows the training image analysis diagram. Besides the large vehicle itself, its front and wheels must also be labeled: the main purpose is to capture the area of contact with the pavement when a large vehicle turns, so the movement of its tires must be captured while avoiding confusion with similar, error-prone objects, for example the tires of other vehicle types. Since the labels generated in this process greatly affect the detection accuracy of the trained model, the labeling must be checked carefully by eye.
Figure 9. Schematic analysis of the training image.
We label the three object classes 1, 2 and 3. After labeling, a corresponding TXT file is produced to store the labeling contents; each line contains the label class followed by four values x, y, w, h, where x, y is the center coordinate of the labeled box as a ratio of the image width and height, and w, h is the width and height of the labeled box as a ratio of the input image width and height. If an image is too blurred and the object loses its features, the object is discarded.
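The normalized label line described above can be illustrated with a small Python sketch (an assumption-laden example, not the authors' tooling): it converts a pixel-space box to the YOLO label format, with made-up box values and the 1280 × 720 image size mentioned earlier.

```python
def to_yolo_label(cls_id, box, img_w, img_h):
    """box = (x_min, y_min, x_max, y_max) in pixels; returns the YOLO-style
    'class x y w h' line with all four values normalized to the image size."""
    x_min, y_min, x_max, y_max = box
    x = (x_min + x_max) / 2 / img_w   # box center x as a fraction of width
    y = (y_min + y_max) / 2 / img_h   # box center y as a fraction of height
    w = (x_max - x_min) / img_w       # box width as a fraction of width
    h = (y_max - y_min) / img_h       # box height as a fraction of height
    return f"{cls_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"

print(to_yolo_label(2, (320, 180, 960, 540), 1280, 720))
# → 2 0.500000 0.500000 0.500000 0.500000
```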
Before starting to train the model, one needs to configure the environment parameters for Darknet to construct the neural network model, as shown in the training file setup diagram in Figure 10, including the number of categories, the image input size, the maximum and minimum scales of the anchor-generated bounding boxes, the batch size and the number of training iterations. The batch size is determined by the GPU memory and affects the overall convergence rate of the loss value; based on the 8 GB memory of the GTX 1080, a batch of 24 images is used.
Figure 10. Schematic of training profile setup.
To observe the stability of the neural network model, the number of training steps is set to 15,000, and the prepared images are randomly split at a ratio of 2:8, with 20% of the data as test images and 80% as training images. During training, the data are automatically fed into the model according to the configured parameters; afterwards, the test images are used for cross-validation, and the change of the loss function is displayed on the terminal.
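The 2:8 random split above can be sketched as follows (an illustrative snippet with a made-up seed and file names, not the authors' script):

```python
import random

def split_dataset(items, test_ratio=0.2, seed=42):
    """Shuffle the items and return (train, test) at the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    cut = int(len(shuffled) * test_ratio)
    return shuffled[cut:], shuffled[:cut]

train, test = split_dataset([f"img_{i:04d}.jpg" for i in range(100)])
print(len(train), len(test))  # → 80 20
```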
Figure 11 shows the folder structure used with Darknet. Although Darknet can build neural network models directly and quickly, its data structure is complex and the model can be difficult to maintain and build, so three directories are created under the Object Detection directory: the train_data and val_data directories hold the training and testing image data together with the corresponding txt files, which record the category and bndBox coordinates produced when the images were labeled.
Figure 11. Schematic diagram of Darknet using a folder structure.

3.3. Trajectory Prediction Models

In general, forecasting time-series data requires past data, and future data are equally essential, significantly affecting forecast accuracy. In addition, RNNs are prone to vanishing or exploding gradients during training; LSTM, which solves the exploding and vanishing gradient problems in long-sequence training, is used for prediction. However, in some problems the output at the current moment is related not only to previous states but also to future states. After comparing multiple models, the stacked Bi-LSTM, with higher accuracy, is used to make predictions.
Here, we explain how the captured intersection video files are processed and imported into the object detection model. Figure 12 shows the collection of large-vehicle trajectory data: the object detection model detects large vehicles and obtains the positions of the front and rear wheels in the image. The red dots represent the movement of the front wheel; the blue dots represent the rear wheel.
Figure 12. Roadmap of moving trajectory data collection of large vehicles at intersection.
Note that some large vehicles have more than one set of rear wheels; the object tracking method therefore determines whether a detected rear wheel is the last set of wheels of the vehicle. In this way, the actual trajectories of the points where the front and rear wheels contact the road when a large vehicle turns at the intersection are obtained. The data sampling frequency is set to 10 Hz, i.e., 10 trajectory coordinates per detected target per second. Some data were found to contain incorrect track coordinates, so the erroneous coordinate data are preprocessed.
Figure 13 shows the trajectory data of large vehicles moving through the intersection. Each color represents the trajectory of one large vehicle at the intersection; visualizing the data makes it possible to observe the different trajectories of each vehicle at the same intersection, and also reveals some erroneous trajectories in the figure.
Figure 13. Visualization of moving trajectory data of large vehicle intersection.
Figure 14 shows the data preprocessing diagram. To find erroneous trajectories, the front-wheel pixel coordinates F(xt, yt) and rear-wheel pixel coordinates W(xt, yt) in the original data are separated, the vectors between consecutive coordinates are calculated to find significant outliers, and each outlier is finally removed and replaced by the average of the two neighboring points.
Figure 14. Schematic diagram of the data preprocessor.
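The outlier repair described above might look like the following sketch (the jump threshold and the coordinates are illustrative assumptions; the paper does not state its exact criterion): a point whose jump from its predecessor is implausibly large is replaced by the average of its two neighbors.

```python
def repair_track(points, max_jump=50):
    """points: list of (x, y) pixel coordinates sampled at 10 Hz.
    Replace interior outliers with the average of the surrounding points."""
    fixed = list(points)
    for i in range(1, len(fixed) - 1):
        prev_pt, next_pt = fixed[i - 1], fixed[i + 1]
        dx = abs(fixed[i][0] - prev_pt[0])
        dy = abs(fixed[i][1] - prev_pt[1])
        if dx > max_jump or dy > max_jump:  # implausible jump: treat as error
            fixed[i] = ((prev_pt[0] + next_pt[0]) / 2,
                        (prev_pt[1] + next_pt[1]) / 2)
    return fixed

track = [(10, 10), (12, 11), (400, 300), (16, 13), (18, 14)]
print(repair_track(track))
# the (400, 300) spike is replaced by the midpoint of its neighbors, (14.0, 12.0)
```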
The method proposed in this paper stacks two layers of Bi-LSTM into a Stacked Bi-LSTM structure as the hidden layers. Figure 15 shows the single-layer Bi-LSTM architecture: a single Bi-LSTM layer consists of two LSTM structures (green blocks), one trained in the forward direction and the other in the backward direction. Each layer uses 240 neurons, and the sequence is input from left to right: X1, X2, …, Xt represent the input at each time step, h1, h2, …, ht the output of the Bi-LSTM layer at each time step and t the input time.
Figure 15. Schematic diagram of single-layer Bi-LSTM architecture.
Figure 16 shows a schematic diagram of the Stacked Bi-LSTM architecture. After the input passes through the first Bi-LSTM layer, the output at each time step is used as the input of the second Bi-LSTM layer, which produces outputs in the same way; this corresponds to the many-to-many LSTM architecture. The reason is that the purpose of this paper is to predict the movement trajectory over the next few seconds, not just the position after a few seconds.
Figure 16. Stacked Bi-LSTM architectures.
To improve the learning speed of the network model and avoid overfitting during training, the input data are normalized to values between 0 and 1 before being fed into the model, which improves the comparability of the training data. We also add dropout to avoid overfitting: dropout randomly removes neurons, reducing the dependence between them so that the model does not rely too heavily on any single neuron, which lowers the network complexity and effectively prevents overfitting.
In this paper, the parameters of the Stacked Bi-LSTM neural network are adjusted mainly by rules of thumb. The activation function of the model is set to ReLU based on these rules, and the optimizer is Adam, whose advantage is that after bias correction the learning rate of each iteration stays within a certain range, making the parameters more stable. We observe the change of the loss value and adjust the model parameters, changing only one parameter of the network at a time.
The experiments trained the trajectory prediction model on a GPU using the Keras suite on TensorFlow, taking the 8 GB GTX 1080 graphics card as an example. The optimizer is Adam and the loss function is MSE; the Adam optimizer updates the weights of the network based on the loss value. In addition, a dropout layer with a dropout rate of 0.5 is added after each Bi-LSTM layer to randomly remove neurons during training and prevent overfitting, and finally flatten and fully connected (Dense) layers are added.
In the input layer, one needs to set a fixed time-series length (timestep), which is the fixed length of each input sequence, as well as the input data dimension. This paper takes the smallest number of data points among all turning trajectories in the training set as the timestep length: for example, if the training set contains 50 large-vehicle trajectories and the shortest trajectory has 30 data points, the input timestep of the training set is 30.
This setting ensures that most of the data can be used for training and that all input data within a timestep belong to the trajectory of the same vehicle, consistent with the nature of predicting sequential information. With this method, the data in the training set must be cut or padded to the timestep length. For example, with a timestep of 30, each trajectory is cut into segments of 30 consecutive coordinates, and the remainder is discarded according to a set threshold: a trajectory with 95 coordinates and a threshold of 20 is cut into 3 segments, the network reads one segment of 30 coordinates at a time, and the remaining 95 − 30 × 3 = 5 coordinates are discarded.
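The segmentation rule above can be sketched in a few lines (an illustrative snippet following the text's 95-point example; the padding path for leftovers at or above the threshold is not shown):

```python
def segment_trajectory(coords, timestep=30, threshold=20):
    """Cut a trajectory into fixed-length windows of `timestep` coordinates.
    A leftover shorter than `threshold` points is simply discarded; a longer
    leftover would be padded and kept (padding not shown in this sketch)."""
    segments = [coords[i:i + timestep]
                for i in range(0, len(coords) - timestep + 1, timestep)]
    remainder = len(coords) % timestep
    return segments, remainder

# 95 coordinates, timestep 30: three full segments, 5 leftover points,
# and 5 < 20 so the leftover is dropped.
segments, leftover = segment_trajectory(list(range(95)))
print(len(segments), leftover)  # → 3 5
```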

4. Results

The main purpose of this study is to identify and predict the inner-wheel trajectories of the front and rear wheels of large vehicles turning at intersections from the perspective of motorcyclists and cyclists. Therefore, when collecting data, we needed to find intersections where large vehicles often pass and film without affecting road safety. Finding shooting locations, recording footage and sorting the image data took considerable time.
In this study, a total of 33 h of footage was collected at intersections. Segments of large vehicles turning at the intersection were extracted, and those in which the wheels of the large vehicle were not obscured by other objects and the image was clear were selected as training data for object detection. The trained object detection model was then used to detect the moving track of the inner wheels of large vehicles in the footage, which serves as the trajectory data for the prediction model. A total of 374 groups of large-vehicle trajectory data were screened and fed into the trajectory prediction model for training in the proportion 8:1:1.
This paper uses an object detection model to detect large vehicles and their front and rear wheel positions in images, applying the results to predict the movement trajectory of large vehicles at intersections. Because neural network models differ, this paper trains YOLOv4, YOLOv3, a MobileNetv2 SSD and a Faster R-CNN with the residual network ResNet152 as its backbone on the same training material and compares their training results.
The experimental training parameters are listed in Table 3, the neural network model setting table; the four methods above were trained with the same data. For the training material, the videos taken at the intersection were sampled at 10 Hz; 3389 labeled images were used as the training set, with 847 images taken from the labeled data as the test set. The training results are described below.
Table 3. Neural network model setup.
This experiment uses three categories: the whole truck, called "truck", the front of the truck, called "front", and the wheels of the truck, called "wheel". During training, the average precision (AP) of each category and the mean average precision (mAP) are recorded separately, with a prediction counted as correct when its intersection over union (IoU) with the ground truth exceeds 0.5. Figure 17 illustrates the IoU using the category "front" as an example.
Figure 17. IOU interpretation schematic.
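The IoU used as the 0.5 correctness threshold above is a standard computation; a small sketch with illustrative box coordinates (boxes are (x_min, y_min, x_max, y_max)):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
# → 0.3333…, which would not count as a correct detection at the 0.5 threshold
```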
To find a suitable object detection model, this paper builds four object detection models and trains them under the same conditions; Table 4 shows the comparison of the object detection models, including YOLOv4.
Table 4. Comparison of training results of object detection models.
The training process is shown in the YOLOv4 training curve in Figure 18. The loss drops quickly in the early stage because of the higher initial learning rate; after a period of training, the learning rate is gradually reduced to avoid oscillation. When the training step count approaches 1500, the loss drops below six and the mAP value begins to be calculated. The mAP curve keeps growing and is inversely related to the loss curve, indicating that training goes well with no overfitting; the model begins to level off at around 4500 steps and approaches saturation at around 12,000 steps, with an overall training time of about 7.5 h.
Figure 18. Training curve of YOLOv4.
From the training results in the object detection model comparison table (Table 4), the categories are compared and bar charts are drawn. Figure 19 shows that the accuracy of YOLOv4 in all three categories is higher than that of the other object detection models. In addition, Figure 20 compares the mean average precision (mAP) of each model with its frames per second (FPS); YOLOv4 is higher than the other models in both measures.
Figure 19. Bar chart comparison of object detection accuracy across categories.
Figure 20. Object detection average accuracy and picture update rate comparison.
As shown in Figure 21 and Figure 22, after the object detection model finished training, 74 randomly selected images not used in training were run through the model; the green boxes represent the category truck, blue the category front and red the category wheel. In general, objects that are clear or only slightly obscured are easily detected across all categories. However, under poor lighting, heavy occlusion, very small object sizes, or colors too similar to the surroundings, incorrect detections can occur.
Figure 21. Test image (1~25).
Figure 22. Test image (26~74).
Figure 23. Training curve of Bi-LSTM 10-10.
In Figure 21, images 16 to 25 exhibit wheel misdetections, possibly because image compression removed the wheel features the model relies on; in Figure 22, images 31 to 33 lack front labels because features similar to the wheel were detected while the front itself was not, although the car body was detected. These failures occur for several reasons: the object is too small in the image, its color is too similar to its surroundings, or it is obscured by other objects.

Training Results of the Recurrent Neural Network Prediction Model

This paper uses a recurrent neural network prediction model to predict the movement trajectories of large vehicles at the intersection. The trajectory data are extracted with the object detection model trained in this study; during data analysis and preprocessing, erroneous tracks are deleted or filled in. This paper compares the LSTM, Bi-LSTM and Stacked Bi-LSTM architectures, training them with different input dimensions, numbers of layers, prediction horizons and data preprocessing methods.
This study’s training parameters are listed in the recurrent neural network model setup table (Table 5); the models above are compared in four different modes and with different preprocessing methods. For training data, the video taken at the intersection was sampled at 10 Hz, the sampled images were sent to the object detection model to capture the trajectories of large cars at the junction, and 312 trajectory records were extracted; part of these records served as the test set. The training results are described below.
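The deletion-or-filling step mentioned above can be sketched as linear interpolation over missing trajectory points. This is an illustrative assumption about the preprocessing, not the paper's exact procedure:

```python
def fill_gaps(track):
    """Fill None entries in a list of (x, y) points by linear interpolation.

    Leading/trailing gaps are left as-is; a caller could drop such tracks.
    """
    filled = list(track)
    for i, p in enumerate(filled):
        if p is not None:
            continue
        # nearest known neighbours on both sides of the gap
        lo = next((j for j in range(i - 1, -1, -1) if filled[j] is not None), None)
        hi = next((j for j in range(i + 1, len(track)) if track[j] is not None), None)
        if lo is None or hi is None:
            continue  # cannot interpolate at the edges
        w = (i - lo) / (hi - lo)
        filled[i] = tuple(a + w * (b - a) for a, b in zip(filled[lo], track[hi]))
    return filled

track = [(0.0, 0.0), None, (2.0, 4.0)]
print(fill_gaps(track))  # [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
```

A track whose gaps cannot be interpolated (e.g. missing endpoints) would then be a candidate for deletion rather than filling.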
Table 5. Recurrent neural network model setup table.
We set the prediction horizon either to predict the next second of movement from one second of trajectory data (10-10) or to predict the next two seconds from one second (10-20). The final training accuracy is recorded for each trajectory prediction model. In this paper, three LSTM architectures are trained under these two conditions and the results are compared, as shown in the LSTM model comparison table in Table 6.
Table 6. Comparison of training results of LSTM models.
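The 10-10 and 10-20 settings can be viewed as sliding windows over a trajectory sampled at 10 Hz. A hedged sketch of how input/target pairs might be built from one track (the function name and stride are illustrative):

```python
def make_windows(track, n_in=10, n_out=20, stride=1):
    """Split one trajectory into (input, target) windows, e.g. 10-20 mode."""
    pairs = []
    for start in range(0, len(track) - n_in - n_out + 1, stride):
        x = track[start:start + n_in]            # 1 s of observed points
        y = track[start + n_in:start + n_in + n_out]  # next 2 s to predict
        pairs.append((x, y))
    return pairs

track = [(float(t), float(t)) for t in range(40)]  # 4 s of 10 Hz points
pairs = make_windows(track, n_in=10, n_out=20)
print(len(pairs))                          # 11 windows
print(len(pairs[0][0]), len(pairs[0][1]))  # 10 20
```

Switching to the 10-10 mode is just `n_out=10`; each pair then supplies one training sample (input sequence plus ground-truth future) to the prediction model.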
The trajectory prediction model training process is shown in Figure 23 and Figure 24. After about 2000 training steps, the accuracy curve grows steadily and is inversely related to the loss curve, indicating that training proceeds well with no overfitting; the curves begin to flatten at about 4000 steps. The model nears saturation at about 10,000 steps, the total number of steps is set to 12,000, and the overall model training time is about 13 h. Comparing the two training curves shows that, for the same amount of input data, the predictive accuracy of the model decreases as the prediction horizon is extended.
Figure 24. Training curve of Bi-LSTM 10-20.
Table 6 compares input dimensions and prediction horizons across the models; part of the table is visualized in Figure 25. When the input dimension is four (front and rear wheel coordinates), the accuracy of the Stacked Bi-LSTM is generally higher than that of the other models. In addition, the prediction horizon has a clear influence on accuracy: as Figure 25 shows, predicting 10 trajectory points from 10 input points yields higher accuracy than predicting 20, and the Stacked Bi-LSTM outperforms the other two models at the same input dimension and prediction horizon.
Figure 25. Model accuracy comparison of different data input dimensions and predicted length.
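A Stacked Bi-LSTM of the kind compared above can be sketched in Keras as two bidirectional LSTM layers followed by a dense head that emits the flattened future trajectory. The layer widths (64 units) are assumptions for illustration; the paper's exact hyperparameters are in Table 5:

```python
import numpy as np
import tensorflow as tf

def build_stacked_bilstm(n_in=10, n_out=20, dim=4):
    """Two stacked Bi-LSTM layers; the dense head predicts n_out points
    of dim coordinates each (e.g. front and rear wheel x/y)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_in, dim)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(n_out * dim),
        tf.keras.layers.Reshape((n_out, dim)),
    ])

model = build_stacked_bilstm()
out = model(np.zeros((1, 10, 4), dtype=np.float32))
print(out.shape)  # (1, 20, 4)
```

Dropping one `Bidirectional` wrapper (or one stacked layer) gives the single-layer Bi-LSTM and LSTM baselines used in the comparison.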
In Figure 25, the vertical axis shows the training set accuracy of the three models (stacked LSTM, Bi-LSTM and Stacked Bi-LSTM) trained under the same settings used in this study, making visible the difference in accuracy between the method used in this paper and the other methods.
This paper uses the following three formulas to evaluate model error: Equation (14) is the mean absolute deviation (MAD), Equation (15) is the mean squared error (MSE), and Equation (16) is the mean absolute percentage error (MAPE).
$$\mathrm{MAD} = \frac{\sum_{t=1}^{n}\left|\mathrm{Actual\ value}_t - \mathrm{Predicted\ value}_t\right|}{n} \tag{14}$$

$$\mathrm{MSE} = \frac{\sum_{t=1}^{n}\left(\mathrm{Actual\ value}_t - \mathrm{Predicted\ value}_t\right)^2}{n-1} \tag{15}$$

$$\mathrm{MAPE} = \frac{\sum_{t=1}^{n}\left|\dfrac{\mathrm{Actual\ value}_t - \mathrm{Predicted\ value}_t}{\mathrm{Actual\ value}_t}\right| \times 100}{n} \tag{16}$$
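Equations (14)–(16) translate directly to code. A sketch of the three error metrics (note that the paper's MSE divides by n − 1, which is kept here):

```python
def mad(actual, predicted):
    """Mean absolute deviation, Equation (14)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error with the paper's n - 1 denominator, Equation (15)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / (len(actual) - 1)

def mape(actual, predicted):
    """Mean absolute percentage error, Equation (16)."""
    return sum(abs((a - p) / a) * 100 for a, p in zip(actual, predicted)) / len(actual)

actual, predicted = [1.0, 2.0, 4.0], [2.0, 2.0, 5.0]
print(mad(actual, predicted))   # (1 + 0 + 1) / 3 = 0.666...
print(mse(actual, predicted))   # (1 + 0 + 1) / 2 = 1.0
print(mape(actual, predicted))  # (100 + 0 + 25) / 3 = 41.66...
```

In practice these would be applied per coordinate (or per point distance) of the predicted versus real trajectories and averaged per test group, as in Table 7.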
After the trajectory prediction model training is completed, the movement trajectories of 25 groups of large-car front or rear wheels that were not used in training are predicted in the 10-20 mode, and the average error of each group is recorded for comparison. The comparison results are shown in the LSTM model trajectory prediction error comparison table (Table 7).
Table 7. Trajectory prediction error comparison table for LSTM model.
The trajectory prediction procedure follows the pseudocode in Figure 26: the test set is split according to the model's prediction horizon into an input trajectory and a real trajectory, the input trajectory is fed into the trajectory prediction model, and finally the predicted trajectory is plotted against the real trajectory. Figure 27 shows the Bi-LSTM trajectory prediction comparison, in which the next two seconds (20 trajectory points) are predicted from one second (10 trajectory points). The left side shows the trajectory prediction result; on the right, the predicted trajectory is overlaid on the actual image, with yellow lines representing the previous second of input, red the model's predicted trajectory for the next two seconds, and green the true trajectory of the next two seconds.
Figure 26. Trajectory prediction result output pseudocode (#: indicates a comment).
Figure 27. Bi-LSTM trajectory prediction.
Figure 28 shows the pseudocode for evaluating the prediction model error. It passes the predicted and true trajectories to the evaluation functions, computes and outputs the mean absolute deviation, mean squared error and mean absolute percentage error, and records the results in the LSTM model trajectory prediction error comparison table (Table 7). The prediction errors of each LSTM model can then be compared on these three metrics, and the results are plotted as bar charts for comparison.
Figure 28. Evaluating prediction model error pseudocode (#: indicates a comment).
As shown in Figure 29, Figure 30 and Figure 31, the prediction error of the Stacked Bi-LSTM is generally lower than that of the other LSTM models. When the single-layer Bi-LSTM model predicts longer time-series data, its error is higher than that of the stacked LSTM model. Based on the MAD, MSE and MAPE results above, the Stacked Bi-LSTM predicts the moving trajectory better than the other two LSTM architectures.
Figure 29. Predictive accuracy comparison of three models using MAD.
Figure 30. Comparison of three models using MSE prediction accuracy.
Figure 31. Comparison of three models using MAPE prediction accuracy.
As shown in the trajectory prediction test images (1~20) in Figure 32, after trajectory model training is completed, 20 sets of trajectories not used in training are fed into the Stacked Bi-LSTM model in the 10-20 mode. Yellow marks the one second of input movement (10 points) fed into the model, red the predicted trajectory, and green the true movement trajectory, all overlaid on the real image. The predicted trajectories are generally close to the real trajectories, although some predictions show errors.
Figure 32. Trajectory prediction test image (1~20).

5. Conclusions

The contribution of this paper is to predict the inner wheel path trajectory of large cars from the perspective of the motorcycle rider through object detection models and recurrent neural network models, using deep-learning-based image recognition. The images are collected and analyzed to predict possible hazards by predicting the movement trajectory of the large vehicle's wheels.
This paper predicts the turning trajectory of large vehicles at the intersection by using object detection algorithms and recurrent neural network models. We compared object detection models including YOLOv3, YOLOv4, Faster RCNN and MobileNetV2 SSD; the YOLOv4 model achieved a 95% mean average precision, higher than the other models. For trajectory prediction, the proposed Stacked Bi-LSTM achieves 87.77% accuracy when predicting the next second from one second of trajectory data, and 75.75% accuracy when predicting the next two seconds. The Stacked Bi-LSTM's prediction error is also lower than that of the other two models.
This paper is built by combining several basic technologies, and there is room for improvement. In future work, more object detection categories and vehicle features will be added, so that the vehicle type can still be judged when the vehicle is obscured, improving the accuracy and speed of object detection and tracking; object detection will also be used to automatically annotate and segment video trajectory data. For trajectory prediction, more large-vehicle wheel trajectory data will be collected, road environment and weather factors will be added to the model, and batch normalization and regularization will be applied to reduce overfitting during training. We hope that, with these improvements, the whole system can operate more smoothly. Integrating the system into motorcycles would allow the wheel trajectory of a large vehicle to be predicted from the rider's perspective and the rider to be warned, reducing the probability of accidents with large vehicles.

Author Contributions

Conceptualization, G.-J.H.; methodology, Y.-C.H. and G.-J.H.; software, Y.-C.H.; validation, Y.-C.H., G.-J.H. and Z.-X.Y.; investigation, Y.-C.H.; resources, G.-J.H.; writing—original draft preparation, Y.-C.H. and G.-J.H.; writing—review and editing, G.-J.H. and Z.-X.Y.; supervision, G.-J.H.; project administration, G.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported in part by the Ministry of Science and Technology (MOST) of Taiwan under Grant MOST 110-2221-E-218-007 and in part by the Allied Advanced Intelligent Biomedical Research Center, STUST from Higher Education Sprout Project, Ministry of Education, Taiwan.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
