**3. Methods**

#### *3.1. Trip Chain Building Based on Vehicle License Plate Information Obtained from Video-Imaging Detectors*

In this section, we introduce the original data collected by the video-imaging detectors and establish the corresponding mathematical model based on the actual road network. The Dijkstra algorithm is then used to supplement the missing data, and the trip chain is divided according to the time-cost matrices.

#### 3.1.1. Preparations for the Trip Chain Building

The whole urban road network consists of intersections and road sections. Based on graph theory, the road network can be represented by an ordered pair of a node set and an edge set, as shown in Equation (1).

$$G = (V, E) \tag{1}$$

Here, *V* denotes the set of intersections and *E* denotes the set of sections. *V* and *E* are expressed by Equations (2) and (3), respectively.

$$V = \{v\_1, \dots, v\_i, \dots, v\_M\} \tag{2}$$

$$E = \left\{ \langle v\_i, v\_j \rangle \right\}, \quad i, j \in \{1, \dots, M\} \tag{3}$$

where *M* is the number of intersections in the road network. In Equation (3), ⟨*v<sub>i</sub>*, *v<sub>j</sub>*⟩ denotes that there is a road section between the *i*-th and the *j*-th intersections, i.e., the two intersections are directly connected.

Considering the directions of the sections and the distance between two adjacent intersections, we use the distance and the travelling time as the edge weights. The corresponding cost matrices are given by Equations (4) and (5).

$$Dis = \begin{bmatrix} d\_{1,1} & \cdots & d\_{1,j} & \cdots & d\_{1,M} \\ \vdots & & \vdots & & \vdots \\ d\_{i,1} & \cdots & d\_{i,j} & \cdots & d\_{i,M} \\ \vdots & & \vdots & & \vdots \\ d\_{M,1} & \cdots & d\_{M,j} & \cdots & d\_{M,M} \end{bmatrix} \tag{4}$$

$$Tra = \begin{bmatrix} tr\_{1,1} & \cdots & tr\_{1,j} & \cdots & tr\_{1,M} \\ \vdots & & \vdots & & \vdots \\ tr\_{i,1} & \cdots & tr\_{i,j} & \cdots & tr\_{i,M} \\ \vdots & & \vdots & & \vdots \\ tr\_{M,1} & \cdots & tr\_{M,j} & \cdots & tr\_{M,M} \end{bmatrix} \tag{5}$$

In the cost matrices, *d<sub>i,j</sub>* and *tr<sub>i,j</sub>* denote the distance and travelling time, respectively, from the *i*-th intersection to the *j*-th intersection when they are directly connected as given in Equation (3). If the *i*-th and *j*-th intersections are not directly connected, or if *i* = *j*, *d<sub>i,j</sub>* and *tr<sub>i,j</sub>* are set to ∞. In this paper, the travelling time is calculated from the vehicle license plate data collected by video-imaging detectors, following [22–24].
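As an illustration of Equations (4) and (5), the two cost matrices can be held as nested lists with infinity marking unconnected pairs. This is only a minimal sketch: the function name `build_cost_matrices`, the 0-based indexing, the tuple layout of `sections`, and the numbers in the example network are all assumptions, not part of the original method.

```python
import math


def build_cost_matrices(num_intersections, sections):
    """Build the distance and travel-time matrices of Equations (4) and (5).

    `sections` holds (i, j, distance_m, travel_time_s) tuples, one per
    directed road section, using 0-based intersection indices (assumed layout).
    """
    m = num_intersections
    dis = [[math.inf] * m for _ in range(m)]
    tra = [[math.inf] * m for _ in range(m)]
    for i, j, distance, travel_time in sections:
        if i == j:
            continue  # diagonal entries stay at infinity, as stated in the text
        dis[i][j] = distance
        tra[i][j] = travel_time
    return dis, tra


# Made-up network: sections 0 -> 1, 1 -> 0 and 1 -> 2.
dis, tra = build_cost_matrices(
    3, [(0, 1, 400.0, 45.0), (1, 0, 400.0, 50.0), (1, 2, 650.0, 70.0)]
)
print(tra[0][1], dis[0][2])
```

Storing both directions separately keeps the matrices asymmetric, which matches the text's remark that section direction is taken into account.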

Electronic video detectors deployed at an intersection collect the driving states of each passing vehicle, including the license plate, detection time, lane number, vehicle type, body color and others. Moreover, each video detector has basic installation information, for example, the position (longitude and latitude) where the device is located, the unique ID of the device, the approach direction of the intersection that it monitors, and its association with the intersection and lanes. When a single vehicle is on a trip, it is detected by a series of video detectors on the road, and a set of driving states is formed, expressed as:

$$Ts = \{S\_i\}, i = 1, \dots, N \tag{6}$$

where *N* is the number of samples during the whole trip. Each sample in the series is presented by Equation (7).

$$S\_i = (t\_i, u\_i, g\_i, v\_i, h\_i, l\_i, v'\_i) \tag{7}$$

In Equation (7), the meaning of each field is explained as follows:

*ti* is the detection time.

*ui* is the unique ID of the video detector.

*gi* is the position where the video detector is located. It is expressed by the longitude and latitude.

*vi* is the intersection where the video detector is located.

*hi* is the approach direction information of the intersection. In this paper, the direction code is numbered clockwise from a certain approach.

*li* is the lane information of the approach. In this paper, the lane code is numbered from inside lane to outside.

*v′<sub>i</sub>* is the downstream intersection of the current lane. It is obtained from the connectivity of adjacent intersections and the lane channelization.
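The fields listed above map naturally onto a small record type. The following is a sketch only: the class name `Sample`, its field names, the use of seconds for time, and the example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """One detection record S_i of Equation (7); field names are illustrative."""
    time: float                    # t_i: detection time, in seconds here
    detector_id: str               # u_i: unique ID of the video detector
    position: Tuple[float, float]  # g_i: (longitude, latitude) of the detector
    intersection: int              # v_i: intersection where the detector is located
    approach: int                  # h_i: approach direction code, numbered clockwise
    lane: int                      # l_i: lane code, numbered from the inside lane
    downstream: int                # v'_i: downstream intersection of the lane


def build_trip_chain(samples: List[Sample]) -> List[Sample]:
    """Order one vehicle's samples by detection time, giving Ts of Equation (6)."""
    return sorted(samples, key=lambda s: s.time)


# Two made-up detections of the same vehicle, supplied out of order.
raw = [
    Sample(120.0, "cam-07", (120.40, 36.08), 5, 2, 1, 6),
    Sample(60.0, "cam-03", (120.39, 36.08), 4, 2, 1, 5),
]
chain = build_trip_chain(raw)
print([s.intersection for s in chain])
```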

#### 3.1.2. Trip Chain Optimization and Division Based on Vehicle License Plate

In Equation (6), when all the samples are sorted by detection time, the series represents a whole trip chain in the sampling time period. In this section, the whole trip chain is first optimized and verified. It is then divided into sub-trip chains based on the time interval between adjacent samples.

In actual applications, some intersections are not equipped with video devices, and installed devices may be damaged. Even when the devices work normally, missed or erroneous detections of the vehicle license plate still occur with a certain probability because of poor lighting conditions, the limited performance of the license plate recognition algorithm, and other reasons. Hence, the trip chain obtained from the original license plate data is generally not consecutive. For some adjacent samples, the two intersections where the video devices are located are not directly connected in the road network graph, as shown in Figure 1.

For any two adjacent samples *S<sub>i</sub>* and *S<sub>i+1</sub>* in *Ts*, when there is an undetected intersection between them, the corresponding value in the cost matrix of Equation (4) or (5) is equal to ∞, that is:

$$d\_{v\_i, v\_{i+1}} = \infty \text{ or } tr\_{v\_i, v\_{i+1}} = \infty \tag{8}$$

To obtain a complete trip chain for further analysis of vehicle driving behavior, the data of the undetected intersections should be compensated when there are missing detections between *S<sub>i</sub>* and *S<sub>i+1</sub>*. Assuming that the vehicle follows the shortest path, the Dijkstra algorithm is used to compensate the trip chain, with the two intersections *v<sub>i</sub>* and *v<sub>i+1</sub>* taken as the origin and destination, respectively. In the road network graph, the series of compensating intersections is described by Equation (9), and the situation is shown in Figure 2.

$$V^{i,i+1} = \left\{ v\_1^{i,i+1}, \dots, v\_k^{i,i+1}, \dots, v\_{N\_c}^{i,i+1} \right\} \tag{9}$$

where *v<sub>k</sub><sup>i,i+1</sup>* denotes the *k*-th intersection between *v<sub>i</sub>* and *v<sub>i+1</sub>* in the trip chain, and *N<sub>c</sub>* is the total number of compensating intersections.
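The compensation step can be sketched with a textbook Dijkstra search over the travel-time matrix of Equation (5). The function names, the 0-based indices and the four-intersection example are hypothetical; choosing travel time rather than distance as the edge weight is also an assumption of this sketch.

```python
import heapq
import math


def shortest_path(tra, origin, destination):
    """Dijkstra on a travel-time matrix; returns the node list from origin to
    destination inclusive, or [] if the destination is unreachable."""
    n = len(tra)
    dist = [math.inf] * n
    prev = [None] * n
    dist[origin] = 0.0
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        if node == destination:
            break
        for nxt in range(n):
            w = tra[node][nxt]
            if math.isinf(w):
                continue  # not directly connected
            nd = d + w
            if nd < dist[nxt]:
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if math.isinf(dist[destination]):
        return []
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1]


def compensating_intersections(tra, v_i, v_next):
    """Intermediate intersections of Equation (9); empty if none are needed."""
    return shortest_path(tra, v_i, v_next)[1:-1]


INF = math.inf
tra = [
    [INF, 30.0, INF, 100.0],
    [INF, INF, 40.0, INF],
    [INF, INF, INF, 20.0],
    [INF, INF, INF, INF],
]
print(compensating_intersections(tra, 0, 3))
```

In this made-up network the route through intersections 1 and 2 takes 90 s, which beats the direct 100 s section, so those two intersections are returned as the compensating series.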

**Figure 1.** Presentation of missing data and the topology.

**Figure 2.** Presentation of intersection series after compensation.

After obtaining the compensating intersections, fields such as position, approach direction, lane information and downstream intersection can all be derived from the connectivity of adjacent intersections in the road network graph and the channelization in the actual scenario.

Considering the calculation error and the randomness of vehicle driving features, we take *λ<sub>i,j</sub>* · *tr<sub>i,j</sub>* as the upper limit of the travel time from intersection *i* to intersection *j*, where *λ<sub>i,j</sub>* is the amplification coefficient. If the actual travelling time of a single vehicle is greater than *λ<sub>i,j</sub>* · *tr<sub>i,j</sub>*, the vehicle is considered to have stopped between the *i*-th and *j*-th intersections, and the trip chain should be cut at that place. Following this principle, the effectiveness of the compensating nodes is judged as follows:

When the actual travelling time is greater than the upper threshold of the travel time along the compensating sections between *v<sub>i</sub>* and *v<sub>i+1</sub>*, as described by Equation (10):

$$\lambda\_{i, i+1} \sum\_{k=1}^{N\_c} tr\_{v\_k^{i,i+1}, v\_{k+1}^{i,i+1}} < t\_{i+1} - t\_i \tag{10}$$

the compensating intersection series presented in Equation (9) is ineffective. In this case, *v<sub>i</sub>* is set as the destination of the former trip chain, and *v<sub>i+1</sub>* is set as the origin of a new trip chain.

*Sensors* **2020**, *20*, 1258

Otherwise, when

$$\lambda\_{i, i+1} \sum\_{k=1}^{N\_c} tr\_{v\_k^{i, i+1}, v\_{k+1}^{i, i+1}} \ge t\_{i+1} - t\_i \tag{11}$$

the compensating intersection series presented in Equation (9) is effective, and the detection time of each compensating intersection is calculated by Equation (12):

$$t\_{v\_k^{i,i+1}} = t\_i + \sum\_{j=1}^k \frac{tr\_{v\_j^{i,i+1}}}{N\_c} (t\_{i+1} - t\_i) \tag{12}$$
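The effectiveness test of Equations (10) and (11) amounts to comparing the observed time gap with an amplified sum of section travel times. The sketch below makes two simplifying assumptions that are not stated in the text: the sum runs over every section of the path from *v<sub>i</sub>* to *v<sub>i+1</sub>* inclusive, and a single coefficient `lam` is used for the whole path. The function name and the example values are hypothetical.

```python
import math


def compensation_is_effective(tra, path, t_i, t_next, lam):
    """Return True when the compensated path passes the test of Equation (11).

    tra    : travel-time matrix of Equation (5)
    path   : intersections from v_i to v_{i+1} inclusive (assumed layout)
    lam    : amplification coefficient, one value for the whole path (assumed)
    """
    expected = sum(tra[a][b] for a, b in zip(path, path[1:]))
    return lam * expected >= (t_next - t_i)


INF = math.inf
tra = [
    [INF, 30.0, INF, INF],
    [INF, INF, 40.0, INF],
    [INF, INF, INF, 20.0],
    [INF, INF, INF, INF],
]
path = [0, 1, 2, 3]            # expected travel time: 30 + 40 + 20 = 90 s
print(compensation_is_effective(tra, path, t_i=0.0, t_next=120.0, lam=1.5))
print(compensation_is_effective(tra, path, t_i=0.0, t_next=200.0, lam=1.5))
```

With `lam = 1.5` the upper limit is 135 s, so a 120 s gap keeps the compensated series while a 200 s gap rejects it and would start a new trip chain.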

Through the aforementioned operations, all necessary fields of the compensating samples of a trip chain can be obtained. Since the compensation strategy assumes that the vehicle follows the shortest path, the result may deviate from the actual trajectory. To confirm that the compensating samples are sufficiently accurate, we further propose a verification and optimization scheme based on the turning state and the downstream intersection. After compensation, the new trip chain can be expressed by Equation (13).

$$Ts = \left\{ S\_1, \dots, S\_i, S\_1^{i,i+1}, \dots, S\_k^{i,i+1}, \dots, S\_{N\_c}^{i,i+1}, S\_{i+1}, \dots, S\_N \right\}\_{1 \times (N + N\_c)} \tag{13}$$

In Equation (13), the sample following *S<sub>i</sub>* is *S<sub>1</sub><sup>i,i+1</sup>*. If the downstream intersection *v′<sub>i</sub>* in *S<sub>i</sub>* does not match the intersection *v<sub>1</sub><sup>i,i+1</sup>* in *S<sub>1</sub><sup>i,i+1</sup>*, the acquired *N<sub>c</sub>* samples are incorrect and should be re-compensated. The flowchart of the re-compensation algorithm is presented in Figure 3.

**Figure 3.** Flowchart of re-compensation algorithm.

For simplicity, the whole trip chain presented by Equation (13) is further expressed by a general form, as shown in Equation (14).

$$Ts = \left\{ S\_1, \cdots, S\_i, \cdots, S\_{N+N\_c} \right\} \tag{14}$$

In the actual scenario, the whole trip consists of one or more sub-trip chains, where each sub-trip chain denotes a complete trip from an origin to a destination. The detection time interval of any two adjacent samples in Equation (14) denotes the travelling time of the vehicle on the section between *v<sub>i</sub>* and *v<sub>i+1</sub>*. When the time interval is greater than a certain threshold, it implies that the vehicle finishes the trip at some place between *v<sub>i</sub>* and *v<sub>i+1</sub>*. Under this condition, *v<sub>i</sub>* and *v<sub>i+1</sub>* belong to different sub-trip chains. Similarly, we take *λ<sub>i,i+1</sub>* · *tr<sub>i,i+1</sub>* as the threshold to divide the trip chain. For the series shown in Equation (14), the detection time interval of adjacent samples is checked in order. If

$$t\_{i+1} - t\_i > \lambda\_{i, i+1} \cdot tr\_{i, i+1} \tag{15}$$

the trip chain is divided into two sub-trip chains, in which *v<sub>i</sub>* is the destination of the former sub-trip chain and *v<sub>i+1</sub>* is the origin of the following sub-trip chain, as shown in Equation (16):

$$Ts = \begin{cases} Ts(1) \\ Ts(2) \end{cases} \tag{16}$$

where

$$Ts(1) = \{S\_1, \dots, S\_i\} \tag{17}$$

$$Ts(2) = \{S\_{i+1}, \dots, S\_{N+N\_c}\} \tag{18}$$

Because the data cover a long time range, a vehicle often makes more than two trips. Therefore, by applying the above method repeatedly, the trip chain can be divided into *N<sub>T</sub>* sub-trip chains, as shown by Equation (19).

$$Ts = \begin{cases} Ts(1) \\ \vdots \\ Ts(j) \\ \vdots \\ Ts(N\_T) \end{cases} \tag{19}$$

where *N<sub>T</sub>* is the number of sub-trip chains of the vehicle in the sampling time period.
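The division rule of Equation (15) can be applied in a single pass over the time-ordered chain. The following sketch reduces each sample to a `(time, intersection)` pair; the function name, the dictionary of coefficients, the fallback coefficient of 1.0, and the example values are all assumptions.

```python
import math


def split_trip_chain(chain, tra, lam):
    """Split a time-ordered chain of (time, intersection) pairs by Equation (15).

    lam maps (i, j) to the amplification coefficient; a missing entry falls
    back to 1.0, which is an assumption of this sketch.
    """
    if not chain:
        return []
    sub_chains = [[chain[0]]]
    for (t_prev, v_prev), (t_cur, v_cur) in zip(chain, chain[1:]):
        threshold = lam.get((v_prev, v_cur), 1.0) * tra[v_prev][v_cur]
        if t_cur - t_prev > threshold:
            sub_chains.append([(t_cur, v_cur)])    # v_cur becomes a new origin
        else:
            sub_chains[-1].append((t_cur, v_cur))
    return sub_chains


INF = math.inf
tra = [
    [INF, 45.0, INF],
    [INF, INF, 70.0],
    [60.0, INF, INF],
]
lam = {(0, 1): 1.5, (1, 2): 1.5, (2, 0): 1.5}
chain = [(0.0, 0), (50.0, 1), (3600.0, 2), (3660.0, 0)]
print(split_trip_chain(chain, tra, lam))
```

Here the 3550 s gap between intersections 1 and 2 exceeds the 105 s threshold, so the chain is cut there and two sub-trip chains result.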

#### *3.2. Vehicle Trajectory Prediction Model Based on Turning State Transition Matrix*

The series of intersections in the trip chain contains the turning information of a vehicle at each intersection it passes. The turning state transition matrix gives the probability of each direction that the vehicle may take. Let the series of intersections of the *j*-th sub-trip chain be denoted by Equation (20):

$$V^{j} = \left\{ v\_1^{j}, \dots, v\_i^{j}, \dots, v\_{N\_j}^{j} \right\} \tag{20}$$

From the series presented in Equation (20), it is easy to obtain the downstream intersection of each node in the *j*-th sub-trip chain when the vehicle is driving on the road. Assume that the *k*-th intersection has *N<sub>a</sub>* approaches and *N<sub>e</sub>* exits, with the associated downstream intersections denoted as {*v′<sub>k</sub>*(1), …, *v′<sub>k</sub>*(*N<sub>e</sub>*)}. The turning state of the case vehicle at this intersection can then be described by Equation (21).

$$B^{j} = \begin{bmatrix} b\_{a\_1,e\_1}^{j} & \cdots & b\_{a\_1,e\_{N\_e}}^{j} \\ \vdots & \ddots & \vdots \\ b\_{a\_{N\_a},e\_1}^{j} & \cdots & b\_{a\_{N\_a},e\_{N\_e}}^{j} \end{bmatrix}\_{N\_a \times N\_e} \tag{21}$$

In Equation (21), *b<sub>a<sub>m</sub>,e<sub>n</sub></sub>* denotes the turning relationship when a vehicle passes through the intersection. When the vehicle enters the intersection from the *m*-th approach and leaves from the *n*-th exit, then

$$b\_{a\_m, e\_n} = 1 \tag{22}$$

Otherwise,

$$b\_{a\_m, e\_n} = 0 \tag{23}$$

For the *j*-th sub-trip chain acquired by Equation (19), the turning relationship can be obtained from the series of intersections, and the turning state of the case vehicle (Equation (21)) is established, denoted as *B<sup>j</sup>*. Over an extended period, the case vehicle passes the same intersection many times. Hence, the total turning state of the vehicle at a certain intersection can be calculated as the sum of all the turning state matrices. For a case vehicle, suppose that *N<sub>T</sub>* sub-trip chains pass through the *i*-th intersection. The total turning state of the vehicle is then calculated by Equation (24):

$$B\_{i} = \sum\_{j=1}^{N\_T} B^{j} = \begin{bmatrix} \sum b\_{a\_1, e\_1}^{j} & \cdots & \sum b\_{a\_1, e\_{N\_e}}^{j} \\ \vdots & \ddots & \vdots \\ \sum b\_{a\_{N\_a}, e\_1}^{j} & \cdots & \sum b\_{a\_{N\_a}, e\_{N\_e}}^{j} \end{bmatrix}\_{N\_a \times N\_e} \tag{24}$$

In addition, the turning state transition matrix is acquired by Equation (25):

$$Pr\_i = B\_i / N\_T = \begin{bmatrix} \sum b\_{a\_1, e\_1}^j / N\_T & \cdots & \sum b\_{a\_1, e\_{N\_e}}^j / N\_T \\ \vdots & \ddots & \vdots \\ \sum b\_{a\_{N\_a}, e\_1}^j / N\_T & \cdots & \sum b\_{a\_{N\_a}, e\_{N\_e}}^j / N\_T \end{bmatrix}\_{N\_a \times N\_e} \tag{25}$$

In Equation (25), Σ*b<sup>j</sup><sub>a<sub>m</sub>,e<sub>n</sub></sub>*/*N<sub>T</sub>* denotes the probability that the vehicle chooses the *n*-th exit when it enters from the *m*-th approach. Each row of the matrix in Equation (25) should satisfy the following constraint:

$$\sum\_{n=1}^{N\_e} \sum b\_{a\_m, e\_n}^j / N\_T = 1 \tag{26}$$

Equation (26) implies that a vehicle entering an intersection from a certain approach must leave through one of the exits. However, for some intersections there may be no effective trip chain, i.e., the case vehicle does not pass the intersection during the experiment. When this happens, each exit is assigned the same probability, as shown in Equation (27):

$$\sum b\_{a\_m, e\_n}^j / N\_T = 1 / N\_e \tag{27}$$
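Equations (24) to (27) can be sketched as a counting step followed by a normalization. One choice in this sketch is an assumption rather than something stated in the text: each row is divided by the number of passes observed on that approach, so that the row-sum constraint of Equation (26) holds for every approach. The function name and the example turns are hypothetical.

```python
def turning_transition_matrix(turns, n_approaches, n_exits):
    """Estimate Pr_i from observed (approach, exit) index pairs at one intersection."""
    counts = [[0] * n_exits for _ in range(n_approaches)]
    for m, n in turns:
        counts[m][n] += 1                  # accumulates B_i of Equation (24)
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # No observed pass on this approach: uniform fallback, Equation (27).
            matrix.append([1.0 / n_exits] * n_exits)
        else:
            # Per-row normalization (assumed) so that Equation (26) holds.
            matrix.append([c / total for c in row])
    return matrix


# Made-up history: three passes from approach 0, one from approach 1, none from 2.
turns = [(0, 1), (0, 1), (0, 2), (1, 0)]
pr = turning_transition_matrix(turns, n_approaches=3, n_exits=3)
for row in pr:
    print(row)
```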

When a vehicle enters the intersection from the *m*-th approach, it goes to the *n*-th downstream intersection through the *n*-th exit with a probability of Σ*b<sup>j</sup><sub>a<sub>m</sub>,e<sub>n</sub></sub>*/*N<sub>T</sub>*. Hence, the one-step prediction probability of the vehicle for the next intersection *v′<sub>i</sub>* is calculated by Equation (28).

$$P\_{i+1} = O\_i \cdot Pr\_i = \left[o\_{a\_1}^i \; \cdots \; o\_{a\_{N\_a}}^i\right]\_{1 \times N\_a} \cdot \begin{bmatrix} \sum b\_{a\_1, e\_1}^j / N\_T & \cdots & \sum b\_{a\_1, e\_{N\_e}}^j / N\_T \\ \vdots & \ddots & \vdots \\ \sum b\_{a\_{N\_a}, e\_1}^j / N\_T & \cdots & \sum b\_{a\_{N\_a}, e\_{N\_e}}^j / N\_T \end{bmatrix}\_{N\_a \times N\_e} \tag{28}$$

In Equation (28), *O<sub>i</sub>* is the original state of the vehicle. When the vehicle is originally detected at the *m*-th approach:

$$
o\_{a\_m}^i = 1 \tag{29}
$$

For other approaches,

$$
o^i\_{a\_1,\dots,a\_{m-1},a\_{m+1},\dots,a\_{N\_a}} = 0 \tag{30}
$$

From Equation (28), the turning probabilities of the vehicle to the downstream intersections are acquired, expressed by Equation (31):

$$P\_{i+1} = \left[p\_{e\_1}^{i+1} \; \cdots \; p\_{e\_{N\_e}}^{i+1}\right]\_{1 \times N\_e} \tag{31}$$

Based on the analysis above, the one-step predicted intersection that the vehicle will pass through is the downstream intersection corresponding to the maximum probability max{*p<sup>i+1</sup><sub>e<sub>n</sub></sub>*}. The one-step prediction method is illustrated in Figure 4.

**Figure 4.** One step trajectory prediction.
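The one-step prediction of Equations (28) to (31) is a product of a one-hot row vector and the transition matrix, followed by an argmax. The sketch below assumes a list `downstream` that maps each exit index to its downstream intersection; that list, the function name and the numbers are hypothetical.

```python
def one_step_prediction(pr, approach, downstream):
    """Equation (28) with a one-hot O_i.

    Returns (predicted downstream intersection, exit probabilities P_{i+1}).
    """
    n_approaches, n_exits = len(pr), len(pr[0])
    o = [1.0 if m == approach else 0.0 for m in range(n_approaches)]
    p = [sum(o[m] * pr[m][n] for m in range(n_approaches)) for n in range(n_exits)]
    best_exit = max(range(n_exits), key=lambda n: p[n])
    return downstream[best_exit], p


pr = [
    [0.0, 0.75, 0.25],
    [1.0, 0.0, 0.0],
    [0.5, 0.25, 0.25],
]
downstream = [11, 12, 13]      # made-up IDs of the intersections behind each exit
predicted, probabilities = one_step_prediction(pr, approach=0, downstream=downstream)
print(predicted, probabilities)
```

Because *O<sub>i</sub>* is one-hot, the product simply selects the row of the entered approach. The explicit multiplication is kept to mirror Equation (28).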

The *k*-step prediction probability of the vehicle for the next *k* intersections is calculated by Equation (32).

$$P\_{i+k} = O\_{i+k-1} \cdot Pr\_{i+k-1} \tag{32}$$

In Equation (32), *O<sub>i+k−1</sub>* is the original probability state at *v<sub>i+k−1</sub>*. Based on its direct connection with the upstream intersection *v<sub>i+k−2</sub>*, and assuming that the vehicle enters from the *m*-th approach of *v<sub>i+k−1</sub>*, *o<sub>a<sub>m</sub></sub>* is set to the turning probability obtained at the upstream intersection *v<sub>i+k−2</sub>*. For other approaches,

$$
o\_{a\_1,\ldots,a\_{m-1},a\_{m+1},\ldots,a\_{N\_a}}^{i+k-1} = 0 \tag{33}
$$

Similarly, the *k*-step predicted intersection is the *k*-th downstream intersection corresponding to the maximum probability max{*p<sup>i+k</sup><sub>e<sub>n</sub></sub>*}.

The *k*-step prediction method is illustrated in Figure 5.

**Figure 5.** K-step trajectory prediction.
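One way to read Equations (32) and (33) is as a greedy chain: at each step the most probable exit is followed, and its probability becomes the single non-zero entry of the next state vector. The sketch below follows that reading, which is an interpretation rather than a confirmed implementation. The dictionaries `pr` and `exits`, the function name and the three-intersection network are hypothetical.

```python
def k_step_prediction(pr, exits, start, approach, k):
    """Greedy k-step prediction in the spirit of Equation (32).

    pr[v]    : turning state transition matrix of intersection v
    exits[v] : for each exit of v, (downstream intersection, approach index there)
    Returns up to k pairs (predicted intersection, probability).
    """
    predictions = []
    v, m, weight = start, approach, 1.0
    for _ in range(k):
        if v not in pr or v not in exits:
            break                               # left the modelled network
        p = [weight * q for q in pr[v][m]]      # O has one non-zero entry
        n = max(range(len(p)), key=lambda idx: p[idx])
        v, m = exits[v][n]
        weight = p[n]
        predictions.append((v, weight))
    return predictions


# Made-up network: every intersection has one approach and two exits.
pr = {0: [[0.25, 0.75]], 1: [[0.5, 0.5]], 2: [[0.25, 0.75]]}
exits = {
    0: [(1, 0), (2, 0)],
    1: [(0, 0), (2, 0)],
    2: [(0, 0), (1, 0)],
}
print(k_step_prediction(pr, exits, start=0, approach=0, k=2))
```

The carried probability shrinks at every step (0.75, then 0.5625 here), which is consistent with the lower accuracy reported for longer prediction horizons.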

#### **4. Experiments and Discussion**

In this section, a regional road network in Qingdao, China, is selected for the case study. The network contains 27 intersections, 40 sections and 35 positions equipped with video-imaging cameras, as shown in Figure 6.

**Figure 6.** The positions of video-imaging cameras and road network topology in the case study. (**a**): Distribution of video-imaging cameras. (**b**): Road topology.

The original vehicle license plate data sample is acquired from the video-imaging detectors in an actual traffic scenario. The proposed method is evaluated on one month of actual historical video-imaging data.

#### *4.1. Results of Trip Chain Building and Compensation*

In the proposed method, the travel time threshold between adjacent intersections is essential for dividing the trip chain into sub-trip chains. Since the traffic state differs between time periods of the day, the threshold should be calibrated according to the traffic variation. In this section, we take the morning and evening rush hours as examples to present the calibration process of the amplification coefficient *λ<sub>i,j</sub>*.

Considering the section in Figure 6b as an example, the samples at 7:00–9:00 AM for morning rush hours and 17:00–18:00 for evening rush hours in the original dataset in one month are extracted and analyzed, and the statistics of travelling time for all vehicles are presented in Figure 7.

**Figure 7.** An example of travelling time value distribution of the case section.

From Figure 7, it is evident that the traffic flow of the case section has a typical tidal feature, since the travelling time values in the evening rush hours are much higher than those in the morning. Nevertheless, within each time period the travel time values cluster in a dominant range. If the threshold is set too small, some normal sub-trip chains will be over-segmented and much useful information will be lost for the establishment of the turning state transition matrix. If the threshold is too large, two sub-trip chains will be treated as one, leading to a misjudgment of the vehicle travelling state at the joint points. To avoid this, the amplification coefficient *λ<sub>i,j</sub>* is calculated as the ratio of the upper value to the average travelling time in Equation (5) after excluding data outliers, as shown in Equation (34).

$$
\lambda\_{i,j} = Q\_{\text{max}} / Tr\_{i,j} \tag{34}
$$
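The calibration in Equation (34) can be sketched as follows. The text does not say how outliers are excluded, so this example assumes Tukey's fences (1.5 times the interquartile range beyond the quartiles), takes *Q*<sub>max</sub> as the largest remaining value, and uses the mean of the remaining values as the average travelling time. The function name and the sample travel times are made up.

```python
import statistics


def amplification_coefficient(travel_times):
    """Estimate lambda = Q_max / Tr for one section (Equation (34)).

    Outliers are dropped with Tukey's fences, which is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(travel_times, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [t for t in travel_times if low <= t <= high]
    q_max = max(kept)                 # upper value after excluding outliers
    tr = statistics.mean(kept)        # average travelling time of the section
    return q_max / tr


# Made-up travel times in seconds; 600 s stands for a vehicle that stopped.
times = [40, 42, 45, 47, 50, 52, 55, 58, 60, 600]
lam = amplification_coefficient(times)
print(round(lam, 3))
```

Computing the coefficient separately for each time period, for example morning and evening rush hours, would follow the calibration described above.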

After acquiring the trip chains of a target vehicle, missing data points are compensated by the method proposed in Section 3. To evaluate the performance of the proposed trip chain compensation method, some consecutive sampling nodes are selected and artificially removed from a whole trip chain. In this paper, at most 5 consecutive sampling nodes are compensated. Based on the original trip chain shown in Equation (6), 1 to 5 consecutive sampling nodes are removed in turn to obtain the sample sequences for compensation, as shown in Equation (35).

$$Ts\_{con} = \left\{ S\_1, \dots, S\_{i-1}, S\_{i+n\_{con}}, \dots, S\_N \right\}, \quad i = 2, 3, \dots, N - n\_{con} \tag{35}$$

In Equation (35), the total number of cases is *N* − *n<sub>con</sub>*, and the number of consecutive nodes for compensation is *n<sub>con</sub>*.

Using the compensation method proposed in Section 3.1.2, the removed nodes in Equation (35) are compensated. To assess the performance of the compensation method quantitatively, the compensation accuracy is introduced. It is the ratio of the number of correct nodes after compensation to the total number of nodes to be compensated. The compensation accuracy under different cases is shown in Figure 8.

**Figure 8.** Compensating accuracy under different cases.
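The evaluation procedure around Equation (35) can be sketched as a loop that removes each window of consecutive nodes, asks a compensation routine to fill the gap, and counts position-wise matches. The function name, the `compensate` callback signature and the toy chain are assumptions. The window start follows the index range printed in Equation (35).

```python
def compensation_accuracy(chain, n_con, compensate):
    """Share of removed nodes that are restored correctly.

    chain      : intersection IDs of a fully detected trip chain
    n_con      : number of consecutive nodes removed per case
    compensate : callable (before, after) -> list of intermediate intersections
    """
    correct = total = 0
    # 0-based start index of the removed window; the first and last nodes stay.
    for start in range(1, len(chain) - n_con):
        removed = chain[start:start + n_con]
        guess = compensate(chain[start - 1], chain[start + n_con])
        total += n_con
        correct += sum(1 for truth, g in zip(removed, guess) if truth == g)
    return correct / total if total else 0.0


# Toy stand-in for the Dijkstra-based routine: intersections lie on a line.
def fill_line(before, after):
    return list(range(before + 1, after))


chain = [0, 1, 2, 3, 4, 5]
print(compensation_accuracy(chain, 2, fill_line))
```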

From Figure 8, it is evident that the proposed method performs well in compensating missing nodes. All cases reach an accuracy of more than 80%. Moreover, the accuracy declines as the number of nodes to be compensated increases. This is because, when several consecutive sampling nodes are missing, there are more possible trajectories for the vehicle in the undetected region. The Dijkstra method may therefore not perform very well in a large and complex road network.

#### *4.2. Trajectory Prediction Results and Analysis*

Among all vehicles in the original data sample, a subset of vehicles is selected to verify the performance of trajectory prediction. In this paper, all the trip chains of the case vehicles are acquired from the original data. Part of the historical trip chains is used for training the turning state transition matrix, and the remaining trip chains are used for testing the accuracy of the trajectory prediction results. For one- to four-step prediction, the results of 10 case vehicles are presented in Figure 9.


**Figure 9.** One to four-step trajectory prediction results of case vehicles.

Figure 9 shows that the accuracy varies significantly among different vehicles. Vehicles 1#, 3# and 4# present a much higher prediction accuracy than the others for one- to four-step trajectory prediction. This is mainly caused by the regularity of vehicle driving characteristics. For relatively regular trajectories, such as those of commuters travelling to and from work on each working day, the accuracy is high and stable, while for random trajectories, such as those of taxis, the accuracy is relatively low. For example, vehicle 5# presents a low prediction accuracy and large fluctuations as the amount of training data gradually increases. To show the results more clearly, the average prediction accuracy of the testing vehicles, together with the fitting results, is presented in Figure 10. According to the variation of the accuracy values, a logarithmic function is applied for the fitting, as shown in Equation (36).

$$y = a\ln(\mathbf{x} + b) + c \tag{36}$$
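A curve of the form in Equation (36) can be fitted without external libraries by scanning candidate values of *b* and solving the remaining linear least-squares problem for *a* and *c* in closed form. This is only one possible fitting procedure, and the text does not state which one was used. The function name, the candidate grid and the synthetic data below are assumptions, not the data of Figure 10.

```python
import math


def fit_log_curve(xs, ys, b_candidates):
    """Fit y = a*ln(x + b) + c: grid search over b, closed-form a and c."""
    best = None
    n = len(xs)
    for b in b_candidates:
        if min(xs) + b <= 0:
            continue                      # logarithm undefined for this b
        z = [math.log(x + b) for x in xs]
        z_mean, y_mean = sum(z) / n, sum(ys) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0:
            continue
        a = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, ys)) / szz
        c = y_mean - a * z_mean
        sse = sum((a * zi + c - yi) ** 2 for zi, yi in zip(z, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:] if best else None


# Synthetic points generated from a = 0.1, b = 5, c = 0.2.
xs = [10, 50, 100, 200, 400]
ys = [0.1 * math.log(x + 5) + 0.2 for x in xs]
a, b, c = fit_log_curve(xs, ys, range(0, 21))
print(round(a, 4), b, round(c, 4))
```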

In Figure 10, the accuracy rises as the amount of training data increases. More training data contain more information about the trip chains, so the turning state transition matrix can describe the travelling characteristics more accurately. In the case analysis, vehicles reach an average accuracy of 0.72 for one-step prediction when more than 200 training data samples are available. Hence, the proposed method performs well in trajectory prediction. Moreover, the accuracy presents an overall downward trend as the number of prediction steps increases. The maximum accuracy is about 0.80, 0.63, 0.51 and 0.43 for one-step, two-step, three-step and four-step trajectory prediction, respectively. The reason is that the vehicle has more possible choices of following intersections as the number of prediction steps increases. As the trajectory becomes more unpredictable, the accuracy declines.

**Figure 10.** Average prediction accuracy and the fitting results for different prediction steps.

#### **5. Conclusions and Future Work**

This paper proposes a vehicle trajectory prediction algorithm based on license plate data collected from video-imaging detectors. To obtain more complete vehicle travel information, we use the Dijkstra algorithm for data compensation. The driving characteristics are described by the turning state transition matrix, which is derived from the historical trip chains based on the time series of license plate data. Based on the turning state transition matrix, we make multi-step predictions for specific vehicles. The experimental results show that, although the performance of trajectory prediction varies significantly between vehicles, the proposed algorithm achieves a high average accuracy with only simple calculations, especially for one-step prediction. Compared with traditional schemes, the proposed method exploits the potential value of existing data without any extra investment. This is beneficial for urban traffic feature analysis and traffic management.

In this paper, the vehicle license plate data obtained from video-imaging detectors are the only input of the proposed method. A high-quality license plate data set is therefore a prerequisite for applying the method. Subtle errors in the original data, such as timestamp errors, detector positioning errors and others, should be eliminated. Hence, in actual applications, a careful data pre-processing scheme is indispensable.

Future research will mainly focus on two aspects. Firstly, the proposed method has so far been verified using a license plate data set of 10 vehicles over one month. To draw more precise conclusions, the data sample should be further expanded. Secondly, it is generally understood that the driving characteristics of some vehicles in an urban environment are time-sensitive to some extent. Hence, an analysis of the sensitivity of the prediction accuracy to the historical data will be carried out.

**Author Contributions:** Conceptualization, H.L. and Z.Z.; methodology, H.L. and Z.Z.; software, Z.Z.; validation, H.L., Z.Z. and S.Z.; formal analysis, H.L.; investigation, H.L.; resources, H.L.; data curation, Z.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, L.R., H.L. and Z.Z.; visualization, Z.Z.; supervision, H.L.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Shandong Provincial Natural Science Foundation, grant number ZR2019QF017 and Basic Research Plan on Application of Qingdao Science and Technology, grant number 19-6-2-3-cg.

**Conflicts of Interest:** The authors declare no conflict of interest.
