1. Introduction
With the popularity of mobile smart devices and the development of location-based social networks (LBSNs), such as Foursquare, Gowalla and Yelp, users can find preferred points of interest (POIs) through mobile devices and location-based services, publish their check-in information on social network platforms, and share their experience after visiting POIs. POI recommendation can help users quickly find POIs that they are interested in, improve users’ experience, and help POI providers quickly understand users’ preferences and further improve service quality in a targeted manner. Therefore, POI recommendation has gradually attracted wide attention from researchers. Since users’ visits to POIs are strongly time-dependent, the recommendation list changes as new check-in information arrives. As an extension of general POI recommendation [1], next POI recommendation infers the POIs that users will visit next from their historical check-in trajectories, consequently providing useful assistance to users and merchants. Therefore, next POI recommendation has become one of the research hotspots in academia and industry [2].
In next POI recommendation, most studies focus on exploiting the mobility sequence patterns hidden in users’ historical check-in trajectories. Among sequential POI recommendation methods, Markov-chain-based methods [3] were mainly used in the early stage. However, experiments show that such methods work well in sparse scenarios but fail to capture complex sequence features. Later, researchers tried to improve the accuracy of POI recommendation by using recurrent neural networks (RNNs) [4] with memory mechanisms. To alleviate the vanishing gradient problem in traditional RNNs, sequential POI recommendation methods based on long short-term memory (LSTM) [5] and the gated recurrent unit (GRU) [6] were proposed one after another, which are to some extent able to capture users’ long-term dependencies. In recent years, inspired by the Transformer model from machine translation, researchers have leveraged the self-attention mechanism for sequential POI recommendation [7]. Experiments show that the performance of self-attention-based POI recommendation methods is significantly better than that of methods based on Markov chains and RNNs.
In next POI recommendation methods, the factors most often considered are the temporal factor [8,9] and the spatial factor [10,11]. To this end, researchers have proposed various next POI recommendation methods based on the spatio-temporal context to improve the recommendation performance. Although these methods have shown inspiring results, they still have two major limitations:
(1) The effect of spatio-temporal unequal interval information between any two POIs on the user’s selection of the next POI is ignored. Most studies model users’ mobility sequence patterns according to the check-in temporal sequences in users’ historical check-in trajectories. They either assume an equal temporal/spatial (distance) interval between consecutive check-in activities in users’ check-in trajectories [12], or consider the spatio-temporal information between POIs visited only in consecutive check-in activities [13] or only in non-consecutive check-in activities [14], and thus ignore the impact of the spatio-temporal unequal interval between POIs visited in any two check-in activities, whether consecutive or not. Studies have shown that temporal intervals between POIs differ, and that their impact on the selection of the next POI differs accordingly [15]. Moreover, different users have different maximum tolerances for temporal and spatial intervals between POIs [10]. In fact, even if the historical check-in trajectories of users are the same, the times at which users visit the corresponding POIs may be different [16]. The results observed from Figure 1 support that users have different preferences for spatio-temporal intervals when selecting next POIs. These differing spatio-temporal interval preferences therefore affect users’ selection of next POIs.
(2) The spatial–temporal unequal interval correlation is not considered when recommending the next POI. Most POI recommendation methods based on the spatial–temporal context only analyze the spatial correlation between POIs visited in consecutive or non-consecutive check-in activities [
17,
18], while lacking the correlation analysis of the impact of the temporal interval on the spatial interval between any two POIs from the user check-in sequence. Furthermore, as shown in
Figure 1, User1 prefers to spend a longer time on visiting POIs with larger spatial intervals, while User2 does the opposite. Therefore, we consider that the greater the temporal interval between the user visiting POIs, the more likely he/she is to visit POIs with larger spatial intervals, and vice versa.
To address the issues mentioned above, we propose a spatio-temporal unequal interval correlation-aware self-attention network (STUIC-SAN) model for next POI recommendation. The model integrates POIs information, spatio-temporal unequal interval correlation information between POIs, and the absolute positional information of corresponding POIs in users’ check-in sequences, learns users’ personalized spatio-temporal unequal interval correlation preference features, and then makes next POI recommendations. Extensive experiments are carried out on two public datasets, namely, Foursquare and Gowalla, to verify the effectiveness of STUIC-SAN. Our primary contributions can be summarized as follows.
(1) We propose a novel approach for next POI recommendation, named STUIC-SAN, which significantly improves the performance of the recommendation.
(2) We deeply mine the correlation between spatio-temporal unequal intervals, and develop an embedding method that takes into account POIs information, spatio-temporal unequal interval correlation information between POIs, and the absolute positional information of corresponding POIs. These factors are considered to be the relationship between any two POIs so as to comprehensively model users’ spatio-temporal unequal interval correlation preferences.
(3) We design a correlation-aware self-attention network model to learn users’ personalized spatio-temporal unequal interval correlation preference features, which can automatically measure the relevance of various inputs mentioned in contribution (2) to the model at each step and then adjust the attention weights for the inputs accordingly.
(4) To verify the effectiveness of the proposed model, we conduct extensive experiments based on two real-world datasets. Experimental results show that the proposed STUIC-SAN outperforms the state-of-the-art next POI recommendation approaches.
The remainder of this paper is organized as follows. Section 2 reviews the works related to next POI recommendation. Section 3 gives the preliminaries and dataset analysis. Section 4 introduces the details of the proposed STUIC-SAN model. Section 5 compares our proposed model with existing next POI recommendation models and analyzes the experimental results as well as the threats to validity. Finally, Section 6 concludes this paper and outlines future work.
4. The Proposed Method
In this section, we elaborate on the general architecture of STUIC-SAN (note that the temporal interval and spatial interval in our model are unequal by default). As shown in Figure 4, the STUIC-SAN model mainly consists of four modules: (1) the preference modeling layer, which constructs users’ spatio-temporal unequal interval matrices according to the spatio-temporal interval correlation information between any two POIs from users’ check-in sequences, so as to comprehensively model users’ personalized spatio-temporal interval correlation preferences; (2) the preference embedding layer, which learns dense representations of POI information, spatio-temporal interval correlation information between POIs, and absolute positional information of the corresponding POIs in the user check-in sequence; (3) the Transformer model, which aggregates relatively important POIs from the user check-in sequence and assigns a different weight to each POI to update the representation of the corresponding POIs; and (4) next POI recommendation, which calculates the preference score of the next POI by querying the updated POI representation of a specific user at time t. Then, candidate POIs are sorted according to the corresponding preference scores, and the top-k ranked POIs are recommended.
4.1. User Personalized Spatio-Temporal Unequal Interval Correlation Preference Modeling Layer
Accurately obtaining spatio-temporal unequal interval correlation information between POIs is critical for next POI recommendation. Considering the impact of spatio-temporal unequal interval correlation information on users’ check-in behaviors, we model the spatio-temporal unequal interval correlation information between any two POIs as a relationship between the corresponding POIs. On this basis, users’ personalized temporal unequal interval matrices are constructed based on the corresponding temporal sequences generated by users’ check-in sequences. Subsequently, users are classified by the comparison results of the average temporal interval between POIs from each user check-in sequence, and the average temporal interval between POIs from all users’ check-in sequences. Then, we use linear regression to obtain the spatio-temporal unequal interval correlation of different kinds of users, and calculate the maximum spatial interval of each user to construct the corresponding user personalized spatial unequal interval matrix. Next, we describe the process of obtaining users’ personalized spatio-temporal unequal interval matrices in detail.
4.1.1. Construction of Users’ Personalized Temporal Unequal Interval Matrices
This subsection constructs users’ personalized temporal unequal interval matrices based on the temporal sequences generated from users’ check-in sequences. For each user check-in sequence, we adopt a method similar to that proposed in [15] and apply the same processing to the timestamp of each POI. That is, we divide the temporal interval between any two POIs by the minimum non-zero temporal interval so as to scale it down in equal proportion. Meanwhile, because the temporal interval between two POIs may be very large, a clip operation is further performed on all scaled temporal intervals to better model the personalized temporal unequal interval between POIs.
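Specifically, we generate the corresponding temporal sequence $t^{u} = (t_1, t_2, \ldots, t_n)$ according to the check-in sequence of user $u$, and then calculate the temporal interval between any two POIs, which is represented by $r^{u}_{ij} = |t_i - t_j|$, and $r^{u}_{ij} \in R^{u}$, where $R^{u}$ represents the set of temporal intervals in the check-in sequence of user $u$, and $r^{u}_{\min}$ denotes the minimum (non-zero) temporal interval of user $u$. Then, each element in set $R^{u}$ is scaled down in equal proportion by Formula (1).
$$\hat{r}^{u}_{ij} = \frac{r^{u}_{ij}}{r^{u}_{\min}} \qquad (1)$$
Therefore, the personalized temporal unequal interval matrix $M^{u}_{t}$ of user $u$ is expressed as
$$M^{u}_{t} = \begin{bmatrix} \hat{r}^{u}_{11} & \hat{r}^{u}_{12} & \cdots & \hat{r}^{u}_{1n} \\ \hat{r}^{u}_{21} & \hat{r}^{u}_{22} & \cdots & \hat{r}^{u}_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}^{u}_{n1} & \hat{r}^{u}_{n2} & \cdots & \hat{r}^{u}_{nn} \end{bmatrix} \qquad (2)$$
Note that the elements on the main diagonal in matrix $M^{u}_{t}$ are all 0.
As mentioned above, we consider the case that the temporal interval between any two POIs is too large. So, we set the maximum threshold $k_t$ for matrix $M^{u}_{t}$, and adjust each element in the matrix to $\min(k_t, \hat{r}^{u}_{ij})$. Therefore, matrix $M^{u}_{t}$ is further expressed as $\mathrm{Clip}(M^{u}_{t})$, where $\mathrm{Clip}(\cdot)$ indicates that each element in the matrix is clipped according to the corresponding maximum threshold $k_t$.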
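The construction described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `temporal_interval_matrix` and the parameter `max_interval` are hypothetical, and rounding the scaled gaps down to integers is an assumption made here so that the values could later index an embedding table.

```python
def temporal_interval_matrix(timestamps, max_interval):
    """Build a scaled, clipped matrix of pairwise time gaps for one user."""
    n = len(timestamps)
    # Raw pairwise gaps |t_i - t_j|; the main diagonal is 0 by construction.
    raw = [[abs(timestamps[i] - timestamps[j]) for j in range(n)] for i in range(n)]
    # Smallest non-zero gap, used as the scaling unit (1 if every gap is 0).
    nonzero = [v for row in raw for v in row if v > 0]
    unit = min(nonzero) if nonzero else 1
    # Scale each gap, round down (assumption), then clip at the threshold.
    return [[min(max_interval, int(v // unit)) for v in row] for row in raw]


# Four check-ins with timestamps in seconds; the smallest gap is 1800 s.
matrix = temporal_interval_matrix([0, 3600, 5400, 90000], max_interval=10)
```

In this toy sequence the one-day jump to the last check-in is clipped to the threshold, so a single very long gap does not dominate the matrix.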
4.1.2. Construction of Users’ Personalized Spatial Unequal Interval Matrices
The main work of this subsection is to obtain users’ personalized spatial unequal interval matrices according to the spatio-temporal interval correlation between any two POIs. According to the correlation between temporal and spatial intervals of POIs shown in Figure 2 and Figure 3, we consider that the spatial interval between the POIs a user visits is affected by the corresponding temporal interval. We therefore classify users by comparing the average temporal interval between POIs in each user’s check-in sequence with the average temporal interval between POIs over all users’ check-in sequences, and use linear regression to obtain the spatio-temporal interval correlation for each kind of user. Then, we calculate the maximum spatial interval of each user as the maximum spatial span in the corresponding user-personalized spatial unequal interval matrix. On this basis, users’ personalized spatial unequal interval matrices are constructed.
Specifically, we obtain the corresponding spatial sequence
generated from the user
check-in sequence, and then calculate the spatial interval between any two POIs, which is represented by
, and
, where
denotes the set of spatial intervals in the check-in sequence of user
. Therefore, the personalized spatial unequal interval matrix
of user
is expressed as
Next, we classify users according to the average temporal interval between POIs from each user check-in sequence and the average temporal interval between POIs from all users’ check-in sequences. Among them, the average temporal interval of each user is denoted by
, and the average temporal interval of all users is represented by
. Then, we compare the average temporal interval of user
with the average temporal interval of all users to classify users. Furthermore, we count the number of users when
,
,
, respectively, and find that there are almost no users with
from two datasets. Therefore, according to the corresponding comparison results, users are divided into two categories, as shown in Formula (4).
where
and
represent the sets of the temporal interval and spatial interval of users with
, respectively. Correspondingly,
and
denote the sets of the temporal interval and spatial interval of users with
, respectively.
Based on the above classification of users, we use the linear regression method [
35] to obtain the spatio-temporal interval correlation between any two POIs from the corresponding check-in sequences of two kinds of users, and adopt Formula (5) to optimize the core objective.
where
j represents the number of elements in set
or
.
represents an element in the temporal interval set of a category of users. Similarly,
denotes an element in the spatial interval set of a category of users.
w and
b represent the slope and intercept in the linear regression equation, respectively. We use the minimization mean square error to solve the model, as shown in Formulas (6) and (7).
where
denotes the mean of the temporal intervals in set
.
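As a rough illustration of the user classification and the least-squares fit described above, the following Python sketch splits users into two groups and fits one line per group. All names (`split_users`, `fit_line`, `user_pairs`) are hypothetical. Two details are assumptions not stated in the text: the global average is taken over all gaps of all users, and a user exactly at the global average is placed in the short-gap group. The closed-form expressions are the standard least-squares solution for a slope and an intercept.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def split_users(user_pairs):
    """Split users by comparing each user's mean time gap with the global mean.

    user_pairs maps a user id to a list of (temporal_gap, spatial_gap) pairs.
    Returns the pooled pairs of the long-gap group and the short-gap group.
    """
    global_mean = mean([t for pairs in user_pairs.values() for t, _ in pairs])
    long_gap, short_gap = [], []
    for pairs in user_pairs.values():
        user_mean = mean([t for t, _ in pairs])
        (long_gap if user_mean > global_mean else short_gap).extend(pairs)
    return long_gap, short_gap


def fit_line(pairs):
    """Closed-form least squares for spatial_gap ~ w * temporal_gap + b."""
    j = len(pairs)
    ts = [t for t, _ in pairs]
    t_bar = mean(ts)
    w = sum(s * (t - t_bar) for t, s in pairs) / (
        sum(t * t for t in ts) - sum(ts) ** 2 / j
    )
    b = sum(s - w * t for t, s in pairs) / j
    return w, b


# Toy data: u1 has short gaps, u2 has long gaps.
user_pairs = {
    "u1": [(1, 3), (2, 5), (3, 7)],
    "u2": [(10, 4), (20, 6)],
}
long_gap, short_gap = split_users(user_pairs)
w_short, b_short = fit_line(short_gap)
w_long, b_long = fit_line(long_gap)
```

Fitting one line per group lets the slope differ between users who typically wait longer between check-ins and users who do not.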
For each kind of users, we obtain the corresponding values of
w,
b, denoted by
,
,
,
, respectively. Considering the differences in users’ temporal interval preferences for visiting POIs, we further calculate the corresponding maximum spatial interval of each user in a more fine-grained manner according to the obtained maximum temporal interval of each user, as shown in Formula (8).
where
indicates the linear regression operation,
represents the maximum spatial interval of matrix
of each user with
;
represents the maximum spatial interval of matrix
of each user with
; and
represents the maximum temporal interval between any two POIs from the check-in sequence of user
. Subsequently, each element in matrix
is denoted as
. Therefore, the personalized spatial unequal interval matrix of user
is further represented as
.
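A minimal sketch of this last step, assuming that the maximum spatial interval of a user is obtained by evaluating the fitted line at the user's maximum temporal interval, and that each distance is then capped at that value (by analogy with the temporal clip). The function name `clip_spatial_matrix` and its parameters are hypothetical.

```python
def clip_spatial_matrix(distances, max_temporal_gap, w, b):
    """Cap pairwise distances at the distance predicted for the largest time gap.

    distances is the user's matrix of pairwise spatial intervals; w and b are
    the slope and intercept fitted for the user's group.
    """
    # Predicted largest distance for this user (assumed form of the regression step).
    max_spatial = w * max_temporal_gap + b
    clipped = [[min(d, max_spatial) for d in row] for row in distances]
    return clipped, max_spatial


distances = [
    [0.0, 2.5, 9.0],
    [2.5, 0.0, 7.0],
    [9.0, 7.0, 0.0],
]
clipped, max_spatial = clip_spatial_matrix(distances, max_temporal_gap=3, w=2.0, b=1.0)
```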
4.2. Embedding Layer Fusing Spatio-Temporal Unequal Interval Correlation Preference
The embedding layer is used to encode the POIs information, spatio-temporal interval correlation information between POIs, and absolute positional information of corresponding POIs in each user check-in sequence as latent representations. Firstly, we create an embedding matrix
for POIs, where
represents the number of POIs and
d represents the latent dimension. Then, for the historical check-in trajectory of user
, we use a constant zero vector as the embedding for padding items, and cut off or pad the user check-in trajectory to the first
n check-in activities. As for POIs from the first
n check-in activities, the embedding look-up operation retrieves the previous
n POI embeddings and stacks them together to generate an embedding matrix
as shown in Formula (9).
where
denotes the embedded representation of the POI visited in the
i-th check-in activity from the user check-in sequence.
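The padding and look-up steps can be illustrated as follows. This is a sketch under assumptions: POI id 0 is reserved for padding, padding is placed on the left, and the embedding table is filled with random values in place of learned parameters. The helper names are hypothetical.

```python
import random


def pad_or_truncate(poi_ids, n, pad_id=0):
    """Keep the first n check-ins and left-pad shorter sequences with pad_id."""
    kept = poi_ids[:n]
    return [pad_id] * (n - len(kept)) + kept


def build_embedding_table(num_pois, d, seed=0):
    """One d-dimensional row per POI id; row 0 is the constant zero padding vector."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(num_pois + 1)]
    table[0] = [0.0] * d
    return table


def lookup(table, ids):
    """Stack the embedding rows of the given ids into an n x d matrix."""
    return [table[i] for i in ids]


table = build_embedding_table(num_pois=10, d=4)
padded = pad_or_truncate([5, 2, 7], n=5)
embedded = lookup(table, padded)
```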
Since the self-attention mechanism cannot directly obtain the POIs position from the user check-in sequence, we use two different learnable positional embedding matrices
and
, which represent the keys and values in the self-attention mechanism, respectively. This method is more suitable for the self-attention mechanism without requiring additional linear transformations [
7]. After the retrieval operation, we obtain the absolute positional embedding matrices
,
of the user check-in sequence, as shown in Formula (10).
Similar to the absolute positional embedding, we perform the same operations for temporal interval embedding and spatial interval embedding. Specifically, we use word embedding technology to encode temporal intervals and create two temporal interval embedding matrices
,
. Similarly, we obtain the spatial interval embedding matrices
,
. Then, after retrieving the clipped temporal interval matrix
and spatial interval matrix
, we obtain the corresponding temporal interval embedding matrices
,
as well as the spatial interval embedding matrices
,
of the user check-in sequence, as shown in Formulas (11) and (12).
4.3. Spatio-Temporal Unequal Interval Correlation-Aware Transformer Model
In this section, we elaborate on the spatio-temporal unequal interval correlation-aware Transformer model. It considers POI information, spatio-temporal interval correlation information between POIs, and the absolute positional information of the corresponding POIs in each user check-in sequence as the correlation feature between any two POIs. The features of POIs from each user check-in sequence are fused and updated by the spatio-temporal unequal interval correlation-aware self-attention network and then passed through the point-wise feed-forward network, whose fully connected layers improve the generalization ability of the model.
4.3.1. Spatio-Temporal Unequal Interval Correlation-Aware Self-Attention Network
Inspired by the self-attention mechanism, we propose an extended model of the self-attention mechanism. The model considers the spatio-temporal interval between any two POIs as the relationship between the corresponding POIs. The output of each step of the model can aggregate POIs related to the current step, and adaptively give different weights to each POI to update the representation of each POI.
Specifically, given the POIs embedding matrix
, where
, after the self-attention network, it outputs a new sequence
, where
, to ensure that each element in the new sequence not only contains its own information, but also takes into account the impact of all other POIs from the user check-in sequence on the current step. The
i-th item
of the output sequence is computed as a weighted sum of the linearly transformed POIs embedding, temporal interval embedding, and spatial interval embedding between POIs, as well as the absolute positional embedding of corresponding POIs in the user check-in sequence as shown in Formula (13).
where
n represents the maximum sequence length inputted,
represents the embedding of
,
represents the projection matrix of the corresponding values in POIs embedding matrix.
and
represent the temporal interval embedding and spatial interval embedding between POIs, respectively, and
represents the absolute positional embedding of corresponding POIs in the user check-in sequence.
represents the weight coefficient, which is calculated by the soft-max function shown in Formula (14).
where
represents the attention score, which is calculated by comprehensively considering the POIs information, spatio-temporal interval information between POIs, and absolute positional information of corresponding POIs in the user check-in sequence in Formula (15).
where
and
represent projection matrices of the corresponding queries and keys in the POIs embedding matrices of the user check-in sequence, respectively. The scale factor
is for avoiding the inner product value being too large, which may cause the vanishing gradient after the soft-max function.
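The attention computation described in this subsection can be sketched in plain Python. The sketch assumes that keys and values are formed by adding the temporal interval, spatial interval and positional embeddings to the projected POI embedding, and that scores are scaled by the square root of the latent dimension. All parameter names are hypothetical, and no causal mask is applied.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def matvec(v, W):
    """Row vector v (length d) times matrix W (d x d)."""
    return [sum(v[k] * W[k][c] for k in range(len(v))) for c in range(len(W[0]))]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interval_aware_attention(E, WQ, WK, WV, TK, TV, SK, SV, PK, PV):
    """E: n x d POI embeddings. TK/TV and SK/SV: n x n x d temporal and spatial
    interval embeddings for keys/values. PK/PV: n x d positional embeddings."""
    n, d = len(E), len(E[0])
    output = []
    for i in range(n):
        query = matvec(E[i], WQ)
        keys = [add(matvec(E[j], WK), TK[i][j], SK[i][j], PK[j]) for j in range(n)]
        values = [add(matvec(E[j], WV), TV[i][j], SV[i][j], PV[j]) for j in range(n)]
        # Scaled attention scores, normalized over all positions j.
        alpha = softmax([dot(query, k) / math.sqrt(d) for k in keys])
        output.append([sum(alpha[j] * values[j][c] for j in range(n)) for c in range(d)])
    return output


n, d = 2, 2
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros_pair = [[[0.0] * d for _ in range(n)] for _ in range(n)]
zeros_pos = [[0.0] * d for _ in range(n)]
E = [[1.0, 2.0], [1.0, 2.0]]
Z = interval_aware_attention(
    E, identity, identity, identity,
    zeros_pair, zeros_pair, zeros_pair, zeros_pair, zeros_pos, zeros_pos,
)
```

With identity projections and zero interval and positional embeddings, the layer reduces to ordinary self-attention, which makes the contribution of the added terms easy to isolate.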
4.3.2. Point-Wise Feed-Forward Network
As described in
Section 4.3.1, the spatio-temporal unequal interval correlation-aware self-attention network uses a linear combination-based method to fuse the POIs information, temporal interval information and spatial interval information between POIs, as well as the absolute positional information of corresponding POIs in the user check-in sequence. Inspired by the idea in [
14], we apply two linear transformations with ReLU as the activation function after each spatio-temporal unequal interval correlation-aware self-attention network, consequently making our model nonlinear.
where
are weight matrices, and
are bias terms.
After stacking the self-attention networks and feed-forward networks, problems, such as model overfitting, vanishing gradients, and excessive training time, may occur. Therefore, inspired by reference [
36], we adopt layer normalization, dropout regularization and residual connections to solve these problems as shown in Formula (17).
where
is calculated by Formula (18).
where ⊙ represents the element-wise product.
and
denote the scale factor and the bias term, respectively.
and
represent the mean and variance of
, respectively, while
prevents invalid calculations when the variance is 0.
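The feed-forward network and layer normalization can be sketched as follows. The order in which normalization, the feed-forward network and the residual connection are combined is an assumption of this sketch, dropout is omitted, and all names are hypothetical.

```python
import math


def affine(v, W, b):
    """Row vector v times matrix W, plus bias b."""
    return [sum(v[k] * W[k][c] for k in range(len(v))) + b[c] for c in range(len(b))]


def feed_forward(z, W1, b1, W2, b2):
    """Two linear maps with a ReLU in between, applied to one position."""
    hidden = [max(0.0, x) for x in affine(z, W1, b1)]
    return affine(hidden, W2, b2)


def layer_norm(x, alpha, beta, eps=1e-8):
    """Normalize x by its mean and variance, then scale by alpha and shift by beta."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    # eps keeps the division valid when the variance is 0.
    return [a * (v - mu) / math.sqrt(var + eps) + b for v, a, b in zip(x, alpha, beta)]


def residual_block(z, W1, b1, W2, b2, alpha, beta):
    """Assumed composition: z + FFN(LayerNorm(z)), without dropout."""
    normed = layer_norm(z, alpha, beta)
    return [zi + fi for zi, fi in zip(z, feed_forward(normed, W1, b1, W2, b2))]


identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
one = [1.0, 1.0]
normed = layer_norm([1.0, 3.0], one, zero)
out = residual_block([1.0, 3.0], identity, zero, identity, zero, one, zero)
```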
4.4. Next POI Recommendation
This section computes the preference scores of candidate next POIs from the user preference representations obtained by the spatio-temporal unequal interval correlation-aware Transformer model. After stacking
N self-attention blocks, we obtain the combined representation of POIs information, spatio-temporal interval information between POIs, and absolute positional information of corresponding POIs in each user check-in sequence. In order to recommend the next POI to a user, we use Formula (19) to calculate the user preference score for POI
, sort the candidate POIs according to the corresponding preference scores, then recommend a list of POIs with higher preference scores to the user.
where
represents the combined representation of POIs embedding visited at the first
t time in the user check-in sequence, spatio-temporal interval embedding between POIs mentioned above and POIs visited at the
t+1 time, as well as the absolute positional embedding of the corresponding check-in sequence.
is the embedding of POI
.
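A small sketch of the scoring and ranking step, assuming that the preference score is the inner product between the user's combined representation at the current step and each candidate POI embedding. The names `top_k`, `user_repr` and `poi_embeddings` are hypothetical.

```python
def top_k(user_repr, poi_embeddings, k):
    """Rank candidate POIs by inner-product score and return the k best ids."""
    scores = {
        poi: sum(a * b for a, b in zip(user_repr, emb))
        for poi, emb in poi_embeddings.items()
    }
    # Sort by descending score; ties are broken by POI id for determinism.
    return sorted(scores, key=lambda poi: (-scores[poi], poi))[:k]


poi_embeddings = {1: [0.2, 0.9], 2: [0.8, 0.1], 3: [0.5, 0.5]}
recommended = top_k([1.0, 0.0], poi_embeddings, k=2)
```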
4.5. Model Optimization
This section describes how the proposed model is optimized. According to the user historical check-in trajectory
, we generate a fixed-length check-in sequence
, and further generate the corresponding temporal sequence
as well as the spatial sequence
, and define
as the expected output of the model. Since the interaction information between users and POIs is implicit data, we cannot directly optimize the preference scores of candidate POIs. Moreover, the output of our model is a list of ranked POIs. Therefore, we adopt a negative sampling method to optimize the ranking of candidate POIs. Specifically, for each expected positive output
, a negative sample
is randomly selected and taken to generate a pair of priority
. We normalize the output fraction of the model through soft-max function, and use binary cross entropy as the loss function in Formula (20).
where
is the set of embedding matrices,
denotes the
Frobenius norm, and
is the regularization parameter.
Then we use the Adam optimizer [
37] to optimize our model. Since each training sample
can be constructed independently, we use mini-batch SGD to improve the training efficiency.
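The loss with negative sampling can be illustrated with the sketch below. It is not the released implementation: it uses a sigmoid to map each score to a probability, as a stand-in for the normalization described above, pairs every positive score with one sampled negative score, and adds a squared Frobenius-norm penalty. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_bce(pos_scores, neg_scores, params, lam):
    """Binary cross-entropy over (positive, negative) score pairs plus L2 penalty.

    params is a list of rows of embedding weights; lam is the regularization weight.
    """
    data_term = -sum(
        math.log(sigmoid(p)) + math.log(1.0 - sigmoid(q))
        for p, q in zip(pos_scores, neg_scores)
    )
    # Squared Frobenius norm of the (flattened) parameter matrix.
    reg_term = lam * sum(w * w for row in params for w in row)
    return data_term + reg_term


untrained = pairwise_bce([0.0], [0.0], params=[[0.0]], lam=0.0)
separated = pairwise_bce([5.0], [-5.0], params=[[0.0]], lam=0.0)
regularized = pairwise_bce([0.0], [0.0], params=[[1.0, 2.0]], lam=0.1)
```

The loss decreases as the positive POI is scored higher and the sampled negative lower, which is what drives the ranking of candidate POIs.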
5. Experiments
In this section, we conduct experiments to evaluate the effectiveness of the proposed model STUIC-SAN on two real-world datasets by attempting to answer the following four research questions.
RQ1. Does our approach outperform existing methods in the next-POI-recommendation task?
RQ2. Do personalized spatio-temporal interval information and spatio-temporal interval correlation affect the performance of the model recommendation?
RQ3. How do the parameters of the model, such as the latent dimension, the maximum sequence length, the maximum temporal interval and the maximum spatial interval, affect the recommendation performance?
RQ4. What is the impact of different spatio-temporal interval correlation processing on recommendation performance?
5.1. Experimental Setup
5.1.1. Data Collection and Preprocessing
We evaluated the proposed model on two publicly available LBSN datasets [38], Foursquare and Gowalla, with densities of 0.13% and 0.22%, respectively. The Foursquare dataset contains users’ check-in data from April 2012 to September 2013, while the Gowalla dataset contains users’ check-in data from February 2009 to October 2010. Each check-in activity of each user in both datasets is a five-tuple consisting of the user ID, POI ID, POI latitude, POI longitude, and the corresponding visiting timestamp.
For the two datasets, we first remove inactive users who have checked in at fewer than 5 POIs, and remove cold-start POIs visited by fewer than 5 users, as they carry too little information to be useful. We further summarize the statistics of the preprocessed datasets in Table 2. Next, we sort the check-in activities of each user from both datasets by the corresponding visiting timestamps. To ensure that the timestamp of each user’s first check-in activity starts from 0, we subtract the smallest timestamp among the user’s check-in activities from the timestamp of each check-in activity. For
n check-in activities in each user check-in sequence, we divide them into three parts, namely, training set, validation set and testing set. The number of training set is
, with the first
check-in activities as the input sequence and the
visited POI as the label; the validation set uses the first
check-in activities as the input sequence and the (
)-st visited POI as the label; the testing set uses the first
check-in activities as the input sequence and the
n-th visited POI as the label. The split of datasets follows the causality that no future data are used in the prediction of future data.
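The timestamp shift and the leave-last-out split can be sketched as follows. The test and validation splits follow the description above; the exact size of the training input is an assumption of this sketch (all check-ins before the validation label, with the last of them as the training label). The function name is hypothetical.

```python
def normalize_and_split(checkins):
    """checkins: list of (poi_id, timestamp) for one user with at least 4 check-ins.

    Returns the time-shifted sequence and (input, label) pairs for each split.
    """
    ordered = sorted(checkins, key=lambda c: c[1])
    t0 = ordered[0][1]
    # Shift timestamps so the first check-in of the user starts at 0.
    shifted = [(poi, t - t0) for poi, t in ordered]
    pois = [poi for poi, _ in shifted]
    test = (pois[:-1], pois[-1])    # first n-1 check-ins -> n-th POI
    valid = (pois[:-2], pois[-2])   # first n-2 check-ins -> (n-1)-st POI
    train = (pois[:-3], pois[-3])   # assumed: first n-3 check-ins -> (n-2)-nd POI
    return shifted, train, valid, test


checkins = [(7, 1500), (3, 1000), (9, 2000), (4, 3000), (8, 2500)]
shifted, train, valid, test = normalize_and_split(checkins)
```

No split ever sees a check-in that occurs after its own label, which matches the causality requirement stated above.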
5.1.2. Evaluation Metrics
In order to evaluate the recommendation performance, we adopt two commonly used evaluation metrics [
25,
39]:
NDCG@k and
Recall@k, where
k is the number of recommended POIs.
NDCG@k considers the position of the ground-truth POIs and assigns greater weights to the POIs at higher positions.
Recall@k is used to calculate the ratio of true positive samples from the recommended POIs among all positive samples. In our model,
NDCG@k indicates whether the POIs that users actually check-in rank at the top of the corresponding recommendation lists.
Recall@k indicates whether there are POIs that users actually visit among the
top-k recommended POIs. These metrics are computed as follows:
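$$\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}$$
where $rel_i$ is the graded relevance of the POI at position i. We use simple binary relevance in our work, namely, $rel_i = 1$ if the POI at position i of the recommended list is the one actually visited by the user, and 0 otherwise. $\mathrm{IDCG@}k$ denotes the maximum $\mathrm{DCG@}k$ in an ideal ranking.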
where
denotes the number of positive POIs in the list recommended to user
u. Here,
.
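For a single ground-truth POI per test case, the two metrics reduce to the simple forms sketched below. The single-target setting is an assumption that matches the leave-last-out split, in which case the ideal DCG equals 1. Function names are hypothetical, and per-user values would be averaged over all test users.

```python
import math


def ndcg_at_k(ranked, target, k):
    """NDCG@k with binary relevance and one relevant POI."""
    for position, poi in enumerate(ranked[:k], start=1):
        if poi == target:
            # DCG of a single hit at this position; the ideal DCG is 1.
            return 1.0 / math.log2(position + 1)
    return 0.0


def recall_at_k(ranked, target, k):
    """1 if the visited POI appears in the top-k list, else 0."""
    return 1.0 if target in ranked[:k] else 0.0


ranked = [5, 3, 9, 1]
hit_second = ndcg_at_k(ranked, target=3, k=3)
hit_first = ndcg_at_k(ranked, target=5, k=3)
miss = ndcg_at_k(ranked, target=1, k=3)
```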
5.1.3. Baseline Approaches
For RQ1, we compare STUIC-SAN with the following representative baseline approaches.
UCF [40]: a collaborative filtering method based on the user-POI matrix, which makes recommendations according to the correlation between POIs.
FPMC [
41]: this method combines the Markov chain and matrix factorization methods, which can simultaneously capture temporal information and user long-term preference information, and then perform POI recommendation.
ST-RNN [
42]: this model extends RNN to integrate the temporal context and spatial information in a recurrent neural network for next POI recommendation.
ARNN [
43]: this model leverages semantic information, spatial information and user visiting information to build a knowledge graph, obtains POI neighbors through a random walk based on the meta path, and adopts LSTM to model sequence regularity to improve the recommendation performance.
LSTPM [
44]: a new method for modeling users’ long-term and short-term preferences, which uses LSTM to model users’ long-term preferences and geographical relationship between POIs visited by users recently, and then makes the next POI recommendation.
TiSASRec [
15]: a method based on the self-attention mechanism that explores the impact of different temporal intervals on the prediction of next item, and makes a recommendation in combination with the absolute positional information of items and the temporal interval between items.
Table 3 summarizes the approaches considering different factors in our experiments. In general, these methods can be divided into four categories: first, traditional collaborative filtering recommendation methods, e.g., UCF; second, Markov-chain-based methods, such as FPMC; third, methods based on RNN, such as ST-RNN, ARNN, and LSTPM; and fourth, methods based on the self-attention mechanism, such as TiSASRec.
5.1.4. Configurations
All experiments were conducted on a PC with 2.90 GHz Intel(R) Core(TM) i7, 16 GB RAM, and running on Microsoft Windows 10 (64-bit). The code used in our experiments was written in Python 3.6. In the meantime, we used TensorFlow 1.2.0 as a machine learning framework for the experiments. We stacked a total of two spatio-temporal unequal interval correlation-aware self-attention networks and fine-tuned the hyper-parameters on the validation set. The latent dimension
d was set to 50 for two datasets. For each target POI, the number of negative samples was set to 100. We used the Adam optimizer with default betas; the initial learning rate was set to 0.001, and the dropout rate was set to 0.2 to avoid overfitting. The size of each batch was set to 200 in this model. The settings of other hyper-parameters are shown in
Table 4. The source code of the proposed model is publicly available at https://github.com/huang-0724/STUIC-SAN.git (accessed on 28 August 2022).
5.2. Results and Discussions
5.2.1. Comparison of Recommendation Performance
For RQ1, we compare the effectiveness of seven methods with tuned parameters on two datasets. The mean and standard deviation of
NDCG@k and
Recall@k of all methods are reported in
Table 5 and
Table 6. The numbers shown in bold in
Table 5 and
Table 6 represent the best performance of each column in the corresponding tables.
From the results shown in
Table 5 and
Table 6, we can make the following observations.
First, our proposed model, STUIC-SAN, outperforms the other models in terms of both metrics on both datasets. Specifically, compared with the best baseline method, STUIC-SAN improves Recall@10 on the Foursquare dataset by about 6.57% and achieves a relative improvement of more than 6.87% on NDCG@10. The performance gains on the Gowalla dataset are similarly high. These results demonstrate the competitiveness of our model.
Second, methods considering temporal influence work better than those without temporal influence. Obviously, FPMC performs better than UCF. The performance improvements of FPMC may be due to using the Markov chain model, which incorporates the temporal factor to model users’ check-in sequences, showing good performance compared with the traditional collaborative filtering algorithm.
In addition, methods utilizing spatial influence generally perform better than those without spatial influence. ST-RNN, ARNN, and LSTPM all perform better than FPMC. ST-RNN improves greatly on FPMC because it uses an RNN to model the spatio-temporal context, which outperforms methods based on the Markov chain and matrix factorization. Compared with ST-RNN, ARNN additionally integrates the semantic information of POIs on top of the spatio-temporal context to obtain more related POIs and thereby expand the set of candidate POIs. LSTPM proposes a spatially extended module to obtain users’ short-term preferences by making full use of the spatial relationship between non-consecutive POIs. Therefore, ARNN and LSTPM achieve further improvement over ST-RNN.
Third, the performance increase can also be attributed to the deep mining of spatio-temporal information and to the self-attention mechanism. Taking the experimental results on the Foursquare dataset as an example, TiSASRec performs better than the RNN-based methods ST-RNN, ARNN and LSTPM. The reason is that RNN-based methods usually operate on short trajectories obtained by slicing rather than on long trajectories that carry long-term periodic information about the visited POIs. Given this limitation of the RNN model, it is difficult for RNN-based methods to capture the exact impact of each POI in a user’s historical check-in trajectory on the next POI selection, whereas the self-attention mechanism can effectively model users’ long-term preferences. TiSASRec can therefore capture users’ long-term dependencies across item interactions and selectively combine the information of relevant interactions. Furthermore, TiSASRec deeply mines temporal information and models the temporal interval information as users’ visiting preferences to improve the recommendation performance.
Lastly, the performance of our proposed STUIC-SAN model is significantly better than TiSASRec on both datasets mainly because it comprehensively considers the spatio-temporal unequal interval information between any two POIs. Moreover, the experimental results demonstrate that spatio-temporal unequal interval information between any two POIs helps to better capture spatio-temporal information to accurately infer the spatio-temporal unequal interval correlation preferences of users so as to improve the performance of next POI recommendation.
5.2.2. Effectiveness of Different Components
For RQ2, in order to analyze the impacts of the different modules in our model on the performance of next POI recommendation, we conduct ablation experiments in this section.
We investigate the effectiveness of different components on performance, including sequence influence in users’ check-in trajectories, as well as influences of temporal unequal interval, spatial unequal interval and spatio-temporal unequal interval correlation. Moreover, to further validate the benefits brought by each component, we construct the following variants of STUIC-SAN.
STUIC-SE: The model only considers the sequence influence in users’ historical check-in trajectories. In other words, the model only contains the absolute positional information of the corresponding POIs, and assumes that there is an equal temporal/spatial interval between POIs in consecutive check-in activities.
STUIC-TE: The model considers the influence of temporal unequal interval only. So, we redefine Formulas (13) and (15) as follows:
STUIC-SP: This model integrates the spatial unequal interval information on the basis of STUIC-TE model, without considering spatio-temporal unequal interval correlation information between any two POIs. Therefore, we set a unified maximum spatial interval to perform the corresponding clip operation.
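The clip operation with a unified maximum interval, as used by the variants above, can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the function names are hypothetical, temporal intervals are taken as absolute timestamp differences, and spatial intervals are assumed to be haversine distances in kilometres, whereas the units behind the actual thresholds are not stated in this section.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def clipped_interval_matrix(values, max_interval, distance):
    """Pairwise intervals between all check-ins of one user, clipped at max_interval."""
    n = len(values)
    return [
        [min(distance(values[i], values[j]), max_interval) for j in range(n)]
        for i in range(n)
    ]


# Toy check-in sequence (made-up timestamps and coordinates).
timestamps = [0, 10, 100]
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]

temporal = clipped_interval_matrix(timestamps, 50, lambda a, b: abs(a - b))
spatial = clipped_interval_matrix(coords, 200.0, haversine_km)
print(temporal)  # intervals larger than 50 are replaced by 50
```

Using one unified threshold for every user is what distinguishes this setting from a personalized, correlation-aware choice of the maximum intervals.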
The characteristics of the variant models are shown in
Table 7. We take the
NDCG@k metric to illustrate the effectiveness of three newly designed components on two datasets, as shown in
Table 8. Note that the results on
Recall@k are similar to those on
NDCG@k, so we analyzed the effects of different components on
NDCG@k due to space limitation.
It can be seen from
Table 8 that among the three variants of STUIC-SAN, STUIC-SE suffers the largest performance drop compared with STUIC-TE and STUIC-SP on both datasets. This is because STUIC-SE does not consider the spatio-temporal unequal interval correlation factors, which are particularly important for next POI recommendation, so users’ personalized spatio-temporal unequal interval correlation preferences cannot be accurately obtained. The significant
NDCG@k drop verifies the positive contribution of the spatio-temporal unequal interval information and spatio-temporal unequal interval correlation information integrated into our STUIC-SAN model for the performance gain.
In addition, the performance of STUIC-TE is better than STUIC-SE, because STUIC-TE can obtain temporal unequal intervals between any two POIs from users’ check-in sequences, and accurately simulate users’ temporal unequal interval preferences for selecting next POI by using the self-attention mechanism, which is more reliable than STUIC-SE, which only considers the sequence information.
Moreover, the performance of STUIC-SP is better than STUIC-TE by integrating the spatial unequal interval information between POIs. The reason is that in the next POI recommendation task, the user selection of the next POI will be affected by the spatial distance between the current POI and the next POI. Therefore, the effective integration of the spatial unequal interval information between POIs is beneficial for improving the performance of the recommendation.
In contrast, the performance of our proposed STUIC-SAN is better than STUIC-SE, STUIC-TE and STUIC-SP, which also demonstrates that fully considering the spatio-temporal unequal interval information and spatio-temporal unequal interval correlation information between POIs can help to better capture users’ personalized spatio-temporal unequal interval correlation preferences so as to improve the performance of the next POI recommendation. Note that the components do not conflict with each other and can be utilized to collaboratively learn users’ personalized spatio-temporal unequal interval correlation preferences.
5.3. Sensitivity Analysis of Parameters
For RQ3, we analyze the effects of different model parameters on the performance of STUIC-SAN in this section. Here, we focus on four critical parameters, namely, the number of latent dimensions d, the maximum sequence length n, the maximum temporal interval and the maximum spatial interval. Next, we analyze the effects of these four parameters on NDCG@k. Note that the results on Recall@k are similar to those on NDCG@k.
5.3.1. Effect of Latent Dimension
In this subsection, we study how sensitive STUIC-SAN is to the number of latent dimensions
d, while keeping other hyper-parameters unchanged. As shown in
Figure 5, the performance first grows dramatically with the increase in
d, then improves relatively slowly. This is because
d represents the model complexity. Specifically, the model with a large
d is too complicated to depict the datasets, while the model with a small
d is too simple to describe the datasets. Thus, we set the number of latent dimensions to 50 in this paper.
5.3.2. Effect of Maximum Sequence Length
To illustrate the effects of the maximum sequence length
n, we vary
n from 10 to 100 while keeping other hyper-parameters unchanged.
Figure 6 shows the results. As
n increases, we can see that both curves grow slowly and then gradually flatten on the two datasets. The reason is that when the sequence length is too large, many meaningless POIs are utilized to train the representation vector of the target POI, while when the sequence length is too small, it does not accurately depict the spatio-temporal context. Therefore, we choose the maximum sequence length that can obtain the best performance on two datasets as the default settings in
Section 5.1.4.
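Fixing every trajectory to a maximum sequence length n can be sketched as below. The sketch assumes the convention common in self-attention sequential recommenders, namely keeping the n most recent check-ins and left-padding shorter sequences with a reserved id 0; the function name and the padding id are assumptions and are not taken from the paper.

```python
def fix_length(check_ins, n, pad_id=0):
    """Keep the n most recent check-ins and left-pad shorter sequences with pad_id."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    recent = check_ins[-n:]
    return [pad_id] * (n - len(recent)) + recent


# Toy POI id sequences (made-up ids).
print(fix_length([11, 12, 13, 14, 15], 3))  # long sequence: only the latest 3 are kept
print(fix_length([11, 12], 4))              # short sequence: padded on the left
```

With a large n, short trajectories consist mostly of padding, while a small n discards older check-ins, which matches the trade-off discussed above.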
5.3.3. Effect of Maximum Temporal Interval
In order to demonstrate the effects of the maximum temporal interval
, we set values of the maximum temporal intervals of two datasets as 512, 1024, 2048, 4096, 8192, 12,000, while keeping other hyper-parameters unchanged. As shown in
Figure 7, as the maximum temporal interval increases, the curves show different trends on the two datasets. This is because different maximum temporal intervals have different effects on the recommendation performance. The maximum temporal interval that obtains the best performance is 4096 on Foursquare and 8192 on Gowalla. Therefore, we choose these best-performing values as the maximum temporal intervals on the two datasets, respectively.
5.3.4. Effect of Maximum Spatial Interval
In this subsection, we further demonstrate the effects of the maximum spatial interval
. Considering the difference in spatial unequal interval of users from two datasets, we set different values of the maximum spatial intervals from two datasets while keeping the other hyper-parameters unchanged.
Figure 8 depicts the performance results on two datasets. Similar to the selection of the maximum temporal interval, we set the maximum spatial intervals to 5000 and 15,000, which can achieve the best performance on the two datasets, respectively.
5.4. Effect of Different Spatio-Temporal Interval Correlation Processing
For RQ4, we discuss the impact of different spatio-temporal interval correlation processing on the recommendation in this section. We discuss the following three methods.
STUIC-SP: the maximum temporal interval and the maximum spatial interval are set to a unified value without considering the spatio-temporal interval correlation information between POIs, as described in
Section 5.2.2.
STUIC-LE: users are classified by using the length of each user check-in sequence and the average sequence length of all users from each dataset so as to obtain the linear regression equation of the spatio-temporal unequal interval correlation with different sequence lengths.
STUIC-SAN: the spatio-temporal unequal interval processing we adopt, as described in
Section 4.1.2.
We illustrate the effectiveness of the three methods mentioned above on two datasets in
Figure 9. Among the three methods, STUIC-SP suffers the largest performance drop compared with STUIC-LE and STUIC-SAN on both datasets, while STUIC-SAN has the best performance. This is because STUIC-SP ignores the spatio-temporal interval correlation information between POIs, and sets the maximum temporal interval in all users’ personalized temporal unequal interval matrices to a unified value. Meanwhile, it performs a similar operation for the maximum spatial interval. Thus, it cannot adequately capture users’ personalized spatio-temporal interval correlation preferences. On the basis of STUIC-SP, STUIC-LE and STUIC-SAN consider the correlation between the temporal and spatial intervals. STUIC-LE divides users according to the length of their check-in sequences, whereas STUIC-SAN divides them according to the average temporal interval between any two POIs. Both then obtain the spatio-temporal interval correlation between any two POIs from the check-in sequences of each category of users through the linear regression method. Compared with STUIC-LE, STUIC-SAN can more directly simulate the correlation between the temporal interval and the spatial interval of users visiting POIs so as to better conduct the next POI recommendation.
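The per-group linear regression mentioned above can be illustrated with a short sketch. It fits an ordinary least-squares line that predicts the spatial interval from the temporal interval for each user group. The group labels, the toy data and the function names are hypothetical, and how the fitted line is actually used in the model is defined in Section 4.1.2 and is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_group(pairs_by_group):
    """Fit one line per user group from (temporal interval, spatial interval) pairs."""
    return {
        group: fit_line([t for t, _ in pairs], [d for _, d in pairs])
        for group, pairs in pairs_by_group.items()
    }


# Toy data: two hypothetical user groups with made-up interval pairs.
pairs_by_group = {
    "short_interval_users": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
    "long_interval_users": [(10.0, 4.0), (20.0, 6.0), (30.0, 8.0)],
}
for group, (slope, intercept) in fit_per_group(pairs_by_group).items():
    print(group, slope, intercept)
```

Grouping by sequence length (as in STUIC-LE) or by average temporal interval (as in STUIC-SAN) only changes how `pairs_by_group` is built; the regression step is the same.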
5.5. Threats to Validity
In this section, we discuss some potential threats to the validity of our study. Threats to the effectiveness of our research include three aspects: data selection, experimental setup, and the selection of auxiliary information.
Data selection bias is one of the most common threats to validity. In the next POI recommendation task, we need to simulate users’ sequential visiting patterns according to their check-in sequences so as to model users’ personalized visiting preferences. Therefore, we remove inactive users and unpopular POIs. In addition, in order to facilitate the processing of users’ historical check-in trajectories, we set the historical check-in trajectories of all users to a fixed length (see the analysis in
Section 5.3.2). We also leverage the negative sampling method commonly used in the recommendation system to improve the performance and efficiency of recommendation. Moreover, we also conduct experiments with different maximum temporal intervals and maximum spatial intervals between any two POIs of users from two datasets so as to select the appropriate temporal and spatial intervals to build users’ personalized preference representations. Therefore, we have to admit that the recommendation performance of our model will decrease without appropriate maximum temporal and spatial intervals (see the analysis in
Section 5.3.3 and
Section 5.3.4).
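As an illustration of the negative sampling mentioned above, the sketch below draws, for one user, a POI that the user has not visited. It assumes POI ids are the integers 1 to num_pois and uses uniform sampling; the function name and these conventions are assumptions, and the sampling scheme actually used in the experiments may differ.

```python
import random


def sample_negative(visited, num_pois, rng):
    """Uniformly draw a POI id in [1, num_pois] that is not in the user's visited set."""
    if visited.issuperset(range(1, num_pois + 1)):
        raise ValueError("the user has visited every POI; no negative exists")
    while True:
        candidate = rng.randint(1, num_pois)  # randint is inclusive on both ends
        if candidate not in visited:
            return candidate


# Toy example: a user who visited POIs 1, 2 and 3 out of 5 (made-up ids).
rng = random.Random(0)  # seeded only to make the example reproducible
print(sample_negative({1, 2, 3}, 5, rng))
```

Scoring the target POI against a small set of sampled negatives instead of the full POI catalogue reduces training and evaluation cost, at the price of some bias in the measured metrics.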
In our experiments, we trained the different types of baseline methods with their default hyper-parameter settings. The baseline approaches based on deep neural networks also rely on several implicit tricks, e.g., fine-tuning. Therefore, we cannot ensure that these methods achieve the same performance on the two datasets as reported in their original papers.
In our model, we mainly mine the spatio-temporal unequal interval correlation between any two POIs from users’ check-in sequences and then leverage the self-attention mechanism to perform next POI recommendation. However, we cannot obtain the vehicle information from users’ historical check-in trajectories from the datasets commonly used in next POI recommendation, and we also cannot guarantee that users visit the same spatial distance in a unit of time. Therefore, we consider that each user from both datasets visits the POIs in the corresponding user check-in trajectory by the same or similar means of transportation.
In addition, we cannot obtain the check-out time information of users from public datasets, but if the check-in devices that users adopted can apply more sensitive indoor localization technology [
45], then we can obtain the duration spent on each POI according to the corresponding check-in and check-out times so as to more accurately simulate users’ preferences for temporal unequal intervals when visiting POIs, consequently improving the accuracy of next POI recommendation.
In our model, users’ visiting preferences are simulated based on the users’ historical check-in sequence, so all next POIs recommended to users are those that users have already visited rather than new POIs. In future work, we can enrich the candidate pool for users to select next POI and improve the diversity of recommendation by integrating the similarity information between POIs. For example, we can recommend POIs from the same category based on semantic similarity information between POIs, or recommend adjacent POIs according to the spatial similarity information between POIs.
With the rapid development of LBSNs, users generate a large amount of check-in data every day. In general, the denser the user–POI interaction matrix, the more accurately we can simulate users’ visiting preferences, and thus the more accurate the recommendation. However, our STUIC-SAN model builds temporal and spatial unequal interval matrices for each user to achieve personalized recommendation. When applied to large datasets [
46], the running time of the model will increase to some extent. In future work, we will attempt to incorporate some lightweight models, e.g., LightMove [
47], into our methods to improve the model efficiency without reducing the accuracy of the recommendations.
6. Conclusions and Future Work
In recent years, next POI recommendation has attracted increasing attention in the fields of LBSNs and recommendation systems. In this paper, we propose a spatio-temporal unequal interval correlation-aware self-attention network (STUIC-SAN) to improve the performance of next POI recommendation. More specifically, STUIC-SAN uses the linear regression method to analyze the effect of the temporal unequal interval on the spatial unequal interval between any two POIs from users’ check-in sequences, and learns the effect of the spatio-temporal unequal interval correlation on users’ selection of the next POI through the self-attention mechanism. In this way, it better models users’ personalized spatio-temporal unequal interval correlation preferences and improves the performance of next POI recommendation. In addition, we conducted experiments on two publicly available datasets (namely, Foursquare and Gowalla) to verify the effectiveness of STUIC-SAN. The experimental results validate that STUIC-SAN outperforms the state-of-the-art methods on two commonly used metrics, namely, NDCG@k and Recall@k.
For future work, we will further enrich and optimize STUIC-SAN by considering more information, such as POI neighbor information, category information, and user check-in frequency, which can model users’ personalized visiting preferences more accurately and thus improve next POI recommendation. Moreover, we will try to combine STUIC-SAN with a lightweight model so that the efficiency of the recommendation is improved without sacrificing the accuracy of the POI recommendation.