In this section, we present comprehensive evaluation experiments on the proposed intention vector construction method to confirm its effectiveness and generality through a detailed comparison. In Section 4.1, we first briefly introduce the employed pedestrian trajectory prediction datasets (ETH [25] and UCY [26]) and the current mainstream evaluation metrics. In
Section 4.2, we compare in detail the baseline models with and without our intention module. The displacement coordinate components of the last frame of the input historical trajectory (
) obtained using different forms are experimentally evaluated. The impacts of the random number factors
generated by different distributions are evaluated. At the same time, we remove some components from the random intention module to obtain different variants and evaluate the contributions of different components to the final attained performance. Finally, in
Section 4.3, we present the visualization results obtained on the ETH and UCY datasets.
4.1. Datasets and Evaluation Metrics
Datasets: To train and evaluate our method, we conducted experiments on two public datasets, ETH and UCY, which contain pedestrian trajectories recorded in different real scenes and sampled every 0.4 s. The ETH dataset includes two real scenes, ETH and Hotel. The UCY dataset includes three real scenes: ZARA1, ZARA2, and UNIV. We adopted the mainstream leave-one-out strategy: of the five scenes, we trained and validated the model on four and tested it on the remaining one. In the experiments, the observed historical trajectory lasted 3.2 s (8 frames), and the predicted future trajectory lasted 4.8 s (12 frames).
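The protocol above can be sketched in a few lines of Python. This is only an illustration of the leave-one-out split and the 8/12-frame windowing; the function names and the array layout (one row per frame, columns x and y) are assumptions made for this sketch and are not taken from any of the baseline codebases.

```python
import numpy as np

# Scene names and frame counts are taken from the text; everything else is illustrative.
SCENES = ["ETH", "Hotel", "UNIV", "ZARA1", "ZARA2"]
FRAME_INTERVAL_S = 0.4   # sampling interval in seconds
OBS_LEN = 8              # observed frames (3.2 s)
PRED_LEN = 12            # predicted frames (4.8 s)


def leave_one_out_splits(scenes):
    """Yield (train_val_scenes, test_scene) pairs, holding out one scene at a time."""
    for held_out in scenes:
        yield [s for s in scenes if s != held_out], held_out


def split_window(track):
    """Split one (OBS_LEN + PRED_LEN, 2) position array into observed and future parts."""
    track = np.asarray(track, dtype=float)
    if track.shape != (OBS_LEN + PRED_LEN, 2):
        raise ValueError(f"expected shape {(OBS_LEN + PRED_LEN, 2)}, got {track.shape}")
    return track[:OBS_LEN], track[OBS_LEN:]


if __name__ == "__main__":
    for train_val, test in leave_one_out_splits(SCENES):
        print(f"train/val on {train_val}, test on {test}")
```

Each of the five runs produced by `leave_one_out_splits` corresponds to one scene column in the result tables.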
Evaluation Metrics: We used the same evaluation criteria as the baseline methods, namely, the average displacement error (ADE) [27] and the final displacement error (FDE) [10]. The ADE is the mean Euclidean distance between the predicted and ground-truth trajectory coordinates over all predicted time steps, and the FDE is the Euclidean distance between the predicted endpoint and the ground-truth endpoint. The specific definitions of these metrics are as follows:
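In their standard form, which matches the verbal description above, the two metrics can be written as follows. The notation is assumed here for illustration: $N$ is the number of pedestrians, $T_{obs}$ and $T_{pred}$ are the numbers of observed and predicted frames, and $\hat{p}_t^i$ and $p_t^i$ are the predicted and ground-truth positions of pedestrian $i$ at frame $t$.

$$
\mathrm{ADE} = \frac{1}{N\,T_{pred}} \sum_{i=1}^{N} \sum_{t=T_{obs}+1}^{T_{obs}+T_{pred}} \left\lVert \hat{p}_t^i - p_t^i \right\rVert_2,
\qquad
\mathrm{FDE} = \frac{1}{N} \sum_{i=1}^{N} \left\lVert \hat{p}_{T_{obs}+T_{pred}}^i - p_{T_{obs}+T_{pred}}^i \right\rVert_2
$$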
At the same time, the assessment strategy of the baseline methods was also adopted during testing; that is, 20 samples were generated, and the sample closest to the ground-truth trajectory was selected for evaluation.
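A minimal NumPy sketch of these metrics and of the best-of-20 protocol is given below for a single pedestrian. The function names are hypothetical, and one detail is an assumption: the minimum ADE and the minimum FDE are taken independently over the samples, as is common in public trajectory prediction code, whereas a given baseline may instead pick one sample and report both metrics for it.

```python
import numpy as np


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all predicted frames.

    pred, truth: arrays of shape (T, 2).
    """
    return float(np.linalg.norm(pred - truth, axis=-1).mean())


def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last predicted frame."""
    return float(np.linalg.norm(pred[-1] - truth[-1]))


def best_of_k(samples, truth):
    """Best-of-K evaluation for one pedestrian.

    samples: (K, T, 2) sampled future trajectories; truth: (T, 2).
    Assumption: the minima are taken independently for ADE and FDE.
    """
    dists = np.linalg.norm(samples - truth[None], axis=-1)  # shape (K, T)
    return float(dists.mean(axis=1).min()), float(dists[:, -1].min())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.cumsum(rng.normal(size=(12, 2)), axis=0)
    samples = truth[None] + rng.normal(scale=0.5, size=(20, 12, 2))
    print(best_of_k(samples, truth))
```

Averaging the per-pedestrian values returned by `best_of_k` over a test scene gives the scene-level ADE/FDE.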
4.2. Quantitative Evaluation
In this section, we present all the experiments we performed to determine the effectiveness of the proposed random intention module.
Experimental evaluation of the intention vector construction process: To evaluate the effectiveness of our approach, as shown in Table 2, we conducted experiments on the official source code of three representative CNN-based and RNN-based baseline models (SGCN, STGAT, and SVAE) and compared the results of the baseline models with those of Intention-SGCN, Intention-STGAT, and Intention-SVAE, obtained after adding our approach. An * in
Table 2 indicates results that we reproduced using the official code of the method. Because of differences in hyperparameter settings or experimental equipment, our reproduced results may differ from those reported in the original papers. Therefore, to ensure a fair comparison and to better reflect the actual effect of our method, all the hyperparameters in our evaluations of Intention-SGCN and Intention-STGAT were kept consistent with those of the baseline models. For every baseline model, our intention vector construction method reduced the errors measured during testing. In particular, Intention-SGCN improved the average ADE/FDE over the five real-world scenes by 0.02/0.12 relative to SGCN *. The ADE/FDE of Intention-STGAT improved by 0.03/0.05 over those of STGAT *, and the FDE of Intention-SVAE improved by 0.01 over that of SVAE *. The Stanford Drone Dataset (SDD) [28], which contains eight different real-world scenes, is also used as a pedestrian trajectory prediction benchmark [7,13,29]. For a more comprehensive evaluation, we kept the same dataset split as in [29] and evaluated Intention-SVAE, that is, SVAE with our intention module added, on SDD, using the latest SVAE approach as the baseline model. As shown in Table 3, on SDD, Intention-SVAE achieved an FDE improvement of 0.02 over the baseline SVAE model. In the evaluation of our method, the y-coordinate component of the displacement in the last frame of the historical trajectory was used to construct the intention vector as the benchmark information
.
Table 4 presents our attempts to adopt different forms of displacement coordinate components for the last frame of the historical trajectory (
) and provides the relevant experimental evaluation results.
Evaluation of the results produced when selecting different
. As shown in
Table 4, before determining the final scheme, we tried different forms of the coordinate components in the last frame of the historical trajectory as benchmark information for constructing intention vectors. The experiments included three schemes: Intention-SGCN-X, Intention-SGCN-Y, and Intention-SGCN-XY. Intention-SGCN-X used the x-coordinate of the displacement of frame 8 as
, Intention-SGCN-Y used the y-coordinate as
, and Intention-SGCN-XY used both the x- and y-coordinates as
. All three schemes improved on the baseline model and better captured the randomness of pedestrian intention changes, which also demonstrates the universality of our intention vector construction method. The better overall results of Intention-SGCN-Y may arise because pedestrian movements are usually regular in specific scenes, such as pedestrians walking along a street when shopping or walking parallel to a zebra crossing; that is, pedestrians walk along a “road”. When the given dataset is labeled, under the chosen coordinate system, the forward direction of most pedestrians is consistent with the x-axis, so the x-component of the pedestrian displacement mainly affects the speed, whereas changes in the y-component affect not only the speed but also, to a greater extent, the direction. At the same time, although pedestrian movements are random, walking speeds are relatively low and change continuously; when a change does occur, it appears mostly as a shift in direction. Therefore, Intention-SGCN-Y is more suitable for actual situations; that is, the selected
should be set along the direction perpendicular to the route in an actual application scene.
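The three schemes differ only in which component of the last-frame displacement is taken as benchmark information. The sketch below illustrates that selection step alone. It assumes that the displacement of frame 8 is the difference between the positions at frames 8 and 7, and the function names are hypothetical; how the selected component enters the intention vector is defined in Section 3.2 and is not reproduced here.

```python
import numpy as np


def last_frame_displacement(history):
    """Displacement of the last observed frame.

    history: (8, 2) array of observed positions (columns x, y).
    Assumption: displacement of frame 8 = position at frame 8 minus position at frame 7.
    """
    history = np.asarray(history, dtype=float)
    return history[-1] - history[-2]


def benchmark_component(history, scheme="Y"):
    """Select the benchmark information for the X, Y, or XY scheme."""
    dx, dy = last_frame_displacement(history)
    if scheme == "X":
        return np.array([dx])
    if scheme == "Y":
        return np.array([dy])
    if scheme == "XY":
        return np.array([dx, dy])
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    t = np.arange(8, dtype=float)
    history = np.stack([2.0 * t, -1.0 * t], axis=1)  # moves +2 in x and -1 in y per frame
    for scheme in ("X", "Y", "XY"):
        print(scheme, benchmark_component(history, scheme))
```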
Evaluation of the
generated from different data distributions: To determine the final data distribution we adopted in
Section 3.2 and further verify the validity of our conjecture regarding the standard normal distribution, as shown in
Table 5, we designed relevant experiments to evaluate the actual effects of the random number factors
generated by different data distributions, including the uniform distribution, random normal distribution, and standard normal distribution, on the resulting models. The standard normal distribution achieved the best overall results across the different real scenes, which was consistent with our conjecture. It provides both positive and negative random factors and yields small factors with high probability, so a large proportion of the pedestrian trajectories are only slightly changed, which is in line with the pedestrian movement patterns observed in real scenes.
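The two properties attributed to the standard normal distribution (balanced signs and mostly small magnitudes) can be checked empirically. The sketch below is illustrative only: the helper names are hypothetical, and the range of the uniform distribution is left as a parameter because the range used in the experiments is not stated here.

```python
import numpy as np


def sample_factors(kind, size, rng, low=-1.0, high=1.0):
    """Draw random number factors from the named distribution.

    low/high apply only to the uniform case and are assumed values.
    """
    if kind == "uniform":
        return rng.uniform(low, high, size)
    if kind == "standard_normal":
        return rng.standard_normal(size)
    raise ValueError(f"unknown distribution: {kind!r}")


def summarize(factors):
    """Report sign balance and how often the factor magnitude is small or large."""
    factors = np.asarray(factors, dtype=float)
    magnitude = np.abs(factors)
    return {
        "frac_positive": float(np.mean(factors > 0)),
        "frac_below_1": float(np.mean(magnitude < 1.0)),
        "frac_above_2": float(np.mean(magnitude > 2.0)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(summarize(sample_factors("standard_normal", 200_000, rng)))
```

For the standard normal distribution, roughly half of the factors are positive, about 68% have a magnitude below 1, and fewer than 5% exceed 2 in magnitude, which is the behavior described above.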
Evaluating the contributions of different components in the proposed module: As shown in
Table 6, we evaluated three different variants of our random intention module, where (1) Intention-SGCN w/o RBLL indicates that the random biased linear layer (Linear*) was removed, making it difficult for pedestrians with small initial intention change probabilities to change their intentions; (2) Intention-SGCN w/o RNF means that the random number factor
was not used, so the random intention vector
lost its randomness; and (3) Intention-SGCN w/o POI means that the intention change probabilities were not used, so every pedestrian was given a random intention vector; the constructed random intention vector was therefore no longer targeted, resulting in some redundant effects. As shown by the results of our ablation experiments (presented in
Table 6), removing any component from the model resulted in a substantial performance degradation. In particular, Intention-SGCN w/o POI did not conform to the pedestrian movement patterns in real scenes because every pedestrian was treated as changing their intention, which directly led to worse prediction performance than the baseline model in all scenes other than ETH. This result shows that setting the intention change probability is crucial to the universality of the random intention module.