Multi-View Contrastive Fusion POI Recommendation Based on Hypergraph Neural Network

Hu, Luyao; Han, Guangpu; Liu, Shichang; Ren, Yuqing; Wang, Xu; Liu, Ya; Wen, Junhao; Yang, Zhengyi

doi:10.3390/math13060998


by Luyao Hu ¹, Guangpu Han ¹, Shichang Liu ¹, Yuqing Ren ¹, Xu Wang ¹, Ya Liu ¹, Junhao Wen ²,* and Zhengyi Yang ²,*

¹ Chongqing Division, PetroChina Southwest Oil & Gasfield Company, Chongqing 400707, China

² School of Bigdata and Software Engineering, Chongqing University, Chongqing 400044, China

* Authors to whom correspondence should be addressed.

Mathematics 2025, 13(6), 998; https://doi.org/10.3390/math13060998

Submission received: 6 February 2025 / Revised: 4 March 2025 / Accepted: 11 March 2025 / Published: 19 March 2025

(This article belongs to the Special Issue Advances in Recommender Systems and Intelligent Agents)


Abstract:

In the era of information overload, location-based social software has gained widespread popularity, and the demand for personalized Point of Interest (POI) recommendation services is growing rapidly. Next POI recommendation is a core task in such systems: it suggests suitable next-visit locations based on users’ historical trajectories and check-in data. However, existing research often neglects the diversity and dynamic nature of user preferences, as well as the need for deep modeling of key collaborative relationships across dimensions, which limits recommendation performance. To address these challenges, this paper introduces a Multi-View Contrastive Fusion Hypergraph Learning Model (MVHGAT). The model first constructs three distinct hypergraphs, representing interaction, trajectory, and geographical location, to capture the complex relationships and high-order dependencies between users and POIs from different perspectives. A targeted hypergraph convolutional network is then designed for aggregation and propagation in each view, learning the latent factors within that view. Through multi-view weighted contrastive learning, the model uncovers key collaborative effects between views, enhancing the consistency and discriminative power of both user and POI representations. Experimental results on three public datasets show that MVHGAT significantly outperforms several state-of-the-art methods and effectively mitigates issues such as data sparsity and over-smoothing. The model thus provides new insights and solutions for the next POI recommendation task.

Keywords: next POI recommendation; multi-view learning; hypergraph learning; contrastive learning

MSC: 68U35

1. Introduction

In the era of information explosion, an increasing number of social applications are location based, and most people are willing to use location-based social software (such as Facebook, Instagram, and Weibo) to record their daily lives and share their thoughts and experiences. These applications use users’ geographical location information to provide functions such as recommendations, check-ins, and social interactions based on real-time location. For example, users can view the activities of nearby friends on social platforms, share their location, or check in at specific places. Personalized location-based recommendation services are therefore becoming increasingly important for users. To help users discover Points of Interest (POIs) among massive amounts of location information and to recommend appropriate POIs in real time, personalized POI recommendation systems are essential.

A POI is a specific location in geographic space, typically a place that users are interested in visiting, staying at, or interacting with. POIs include various types of locations such as restaurants, shopping malls, parks, attractions, and historical sites. In location-based social software, the POI is a key concept because it directly influences user behavior and the performance of recommendation systems. POI data include not only basic information about the location, such as its name, address, and category, but may also include user ratings, reviews, photos, and other interaction data. By analyzing these data, recommendation systems can infer user preferences and recommend suitable POIs. With the development of Geographic Information Systems (GIS) and location data technology, POIs are playing an increasingly important role in personalized recommendations, navigation, tourism, and marketing. Among POI recommendation tasks, next POI recommendation holds a particularly critical position. Simply put, next POI recommendation suggests suitable POIs for a user’s next visit based on their check-in data and historical trajectories [1,2,3,4].

Recently, many researchers have studied POI recommendation methods based on deep learning networks. These methods include Recurrent Neural Networks (RNNs) [5,6,7,8], Transformers [9,10], Graph Convolutional Networks (GCNs) [11,12,13,14,15], and Graph Attention Networks (GATs) [16,17]. These approaches capture latent, nonlinear relationships between users and POIs. To address implicit feedback, some researchers have proposed deep learning methods that model local and global relationships separately, enabling personalized preference learning [18] and simulating user choices. While such methods improve recommendation performance to some extent, significant challenges remain.

One key limitation of these methods is their reliance on user–item matrices to model personalized preferences and latent features, which can be flawed. To mitigate the cold-start problem, many researchers have explored user–POI interaction patterns to learn latent relationships between them. While effective to some degree, these methods often overlook high-order collaborative signals, preventing them from capturing the diversity of relationships between users and POIs. Graph-based deep learning methods offer an advantage by capturing high-order collaborative structure information. Given the advances in GNNs for modeling complex relationships, several studies [19,20,21] have employed Hypergraph Neural Networks (HGNNs) to learn latent user and POI representations.

For example, the Temporal Graph Convolutional Network (T-GCN) combines graph and temporal convolutional advantages to capture spatiotemporal information, making it suitable for modeling temporal dynamics in POI recommendations. Hypergraph Convolutional Networks (HGCNs), inspired by hypergraph structures, capture high-order collaborative relationships between users and POIs and between POIs themselves, enabling them to model spatiotemporal dependencies comprehensively. Compared to traditional graph models, HGCN handles high-order neighborhood relationships more effectively, alleviating data sparsity and over-smoothing problems. Another approach, GAT, introduces attention mechanisms to dynamically adjust relationships between POIs, enhancing its ability to handle sparse data and improving recommendation quality.

Despite the progress made in next POI recommendation, two critical challenges remain. (1) Existing research often ignores the diversity and dynamic changes in user preferences across contexts, resulting in limited and overly entangled user representations. For instance, user behaviors are influenced by spatial and temporal factors as well as other dimensions, yet current graph and hypergraph methods often entangle these preferences and fail to capture multi-dimensional and multi-level user behaviors accurately. (2) Existing methods lack in-depth modeling of collaborative relationships across dimensions, limiting their ability to integrate multi-dimensional representations effectively. Different perspectives should complement and enhance each other to improve the overall recommendation performance.

In this paper, we propose an innovative model called Multi-View Contrastive Fusion Hypergraph Learning (MVHGAT) to address these challenges in the next POI recommendation. The model decouples the complex relationships between users and POIs—such as interaction, trajectory, and geographical relationships—to construct multi-view representations. These relationships are crucial for next POI recommendation. Utilizing hypergraphs, which can effectively represent complex relationships and high-order dependencies that traditional graph methods struggle with, we adopt three hypergraphs, interaction, trajectory, and geographical hypergraphs, to capture global dependencies between nodes from different perspectives.

We design specific hypergraph convolutional networks for each view to encode POIs and learn latent factors for interaction, trajectory, and location views. To further integrate multi-view information, we employ an adaptive fusion strategy to combine user representations dynamically. Additionally, multi-view contrastive learning is used to capture high-order relationships across views, leveraging a self-enhancing mechanism to deepen the complementary recommendation effects among views.

Extensive experiments on three publicly available datasets demonstrate the significant advantages of MVHGAT in next POI recommendation tasks. In summary, our main contributions are as follows:

We address two challenging yet practical issues in next POI recommendation and propose an innovative framework, MVHGAT, to enhance the recommendation performance.
We design three distinct hypergraph convolutional structures—interaction, trajectory, and location hypergraphs—and tailor the hypergraph convolutions to suit different learning needs.
We employ multi-view weighted contrastive learning with self-enhancement to collaboratively supervise across views, addressing the difficulty of capturing complementary recommendation effects during learning.
Extensive experiments on three real-world datasets validate the effectiveness of MVHGAT compared to various state-of-the-art methods.

The structure of the paper is as follows. In Section 2, we introduce the related work. In Section 3, we provide a preliminary introduction to the task formulation and the concept of hypergraphs. Section 4 outlines the methodology, describing the proposed MVHGAT model in detail. Section 5 presents the experimental setup, including the datasets, evaluation metrics, and performance comparisons with the baseline methods. Finally, Section 6 concludes the paper and discusses potential future work.

2. Related Work

The next POI recommendation aims to recommend the most suitable next location for users based on their recent visit behaviors. Most existing methods adopt sequential models to address this task, ranging from Markov Chains [5] to Recurrent Neural Networks (RNNs) and their variants [6,8], and more recently, self-attention mechanisms [9,10]. For example, the Markov Chain method predicts the next POI by modeling the probability distribution of user visit sequences, but its limitation is that it can only capture short-term dependencies. To overcome this limitation, Recurrent Neural Networks (RNNs) are introduced to model the temporal nature of user behaviors. However, RNNs are prone to the vanishing gradient problem when handling long sequences, so their variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), are widely adopted. Additionally, Spatiotemporal Graph Neural Networks (STGNs) further incorporate the spatiotemporal information, enhancing the accuracy of recommendations. Despite these advancements, these sequential methods primarily focus on modeling individual user trajectories and often neglect non-continuous POI relationships within a trajectory or across different users.

With the rapid development of graph convolutional models, many researchers have begun applying graph learning methods to POI recommendation, ranging from traditional graph learning approaches [12] to hypergraph learning methods [22], as well as the recent surge in Graph Neural Networks (GNNs) and Hypergraph Neural Networks (HGNNs) [16,19,20,21]. For example, Graph-Flashback [23] empowers POI representations using spatiotemporal knowledge graphs and combines them with RNN-based methods to capture sequential transition patterns in user behaviors. Graph-Flashback constructs a spatiotemporal knowledge graph to transform users’ spatiotemporal behaviors into graph-structured data and utilizes Graph Neural Networks for modeling, thereby better capturing users’ spatiotemporal preferences. This method has achieved significant results across multiple datasets, demonstrating particularly strong performance on the Gowalla and Foursquare datasets. Lai et al. proposed a multi-view spatiotemporal-enhanced hypergraph network to integrate spatiotemporal information and high-order collaborative signals, demonstrating the strong performance of HGNNs in POI recommendation. Hypergraph Neural Networks (HGNNs) introduce hypergraph structures, enabling the more effective modeling of complex relationships and high-order interactions, thereby enhancing the accuracy and personalization of recommendation systems.

However, most graph- or hypergraph-based methods fail to account for differences in user preferences across multiple dimensions, leading to suboptimal user representations and potential confounding. Although a few studies [24] have attempted to use multi-view learning to model different preferences separately, they often only learn preferences straightforwardly without effectively distinguishing the latent features of different views.

We propose a multi-view hypergraph contrastive learning method to address this issue. This method employs distinct hypergraph convolutional models to learn from different hypergraphs. It aims to decouple user representations across interaction, trajectory, and location views and leverage multi-view contrastive learning to capture the diverse preferences of different users more accurately. Thus, it improves the accuracy and personalization of the next POI recommendations.

3. Preliminary

This section begins by defining the problem of the next POI recommendation, followed by an introduction to the concept of a hypergraph.

3.1. Task Formulation

Let $U = \{u_{1}, u_{2}, \dots, u_{|U|}\}$ and $L = \{l_{1}, l_{2}, \dots, l_{|L|}\}$ represent the sets of users and Points of Interest, respectively. Each POI $l \in L$ is defined by a unique geographic coordinate, denoted by (longitude, latitude). For each user $u \in U$, their trajectory is represented as $s_{u} = \{(l_{u,i}, t_{l_{u,i}}) \mid i = 1, 2, \dots\}$, where each pair $(l_{u,i}, t_{l_{u,i}})$ indicates that user $u$ visited POI $l_{u,i}$ at time $t_{l_{u,i}}$. Given a target user $u$ and their trajectory sequence $s_{u}$, the task of next POI recommendation is to predict the top-K POIs that the user is most likely to visit at the subsequent timestamp.

3.2. Hypergraph

A hypergraph [22,25] generalizes a graph so that it can capture more complex relationships. In a standard graph, each edge connects exactly two vertices, whereas in a hypergraph, each hyperedge can connect two or more vertices. Formally, a hypergraph is defined as $G = \{V, E\}$, where $V$ is the set of vertices and $E$ is the set of hyperedges. To describe the topology of the hypergraph, an incidence matrix $H \in \mathbb{R}^{|V| \times |E|}$ is introduced. Specifically, if a node $v \in V$ belongs to a hyperedge $e \in E$, then $H_{v,e} = 1$; otherwise, $H_{v,e} = 0$.
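As a small illustration of the incidence matrix, the following Python sketch builds it from a list of hyperedges. The function name `incidence_matrix` and the toy hyperedges are hypothetical and are not taken from the paper; rows index nodes and columns index hyperedges, matching the $|V| \times |E|$ shape given above.

```python
def incidence_matrix(num_nodes, hyperedges):
    """Return H as a list of rows, with H[v][e] = 1 if node v lies in hyperedge e."""
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


# Toy hypergraph (illustrative only): 4 nodes, 2 hyperedges.
hyperedges = [{0, 1, 2}, {2, 3}]
H = incidence_matrix(4, hyperedges)

# Node degree = number of hyperedges containing the node (row sums).
node_degree = [sum(row) for row in H]
# Hyperedge degree = number of nodes in the hyperedge (column sums).
edge_degree = [sum(H[v][e] for v in range(4)) for e in range(len(hyperedges))]
```

Node 2 belongs to both hyperedges, so its row is `[1, 1]`; the row and column sums give the node and hyperedge degrees that hypergraph convolutions commonly use for normalization.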

We construct three hypergraphs to capture the complex relationships between users and POIs from different perspectives. First, the Interactive Hypergraph models user–POI interactions and user collaboration by treating POIs as nodes and user trajectories as hyperedges. An incidence matrix $H_{I}$ records user visits to POIs, helping to identify users with similar visiting patterns and enriching preference information for recommendations. Second, the Trajectory Hypergraph focuses on transitions between POIs in user trajectories. Directed hyperedges represent user movements, and an incidence matrix $H_{T}$ describes these transitions, aiding in predicting the next POI by analyzing user dynamics. Lastly, the Location Hypergraph captures geographical relationships between POIs by connecting those within a distance threshold $Δd$ using hyperedges. An incidence matrix $H_{G}$ indicates proximity, enhancing recommendations by incorporating users’ geographical preferences.
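The text does not spell out how the location hyperedges are formed from the distance threshold, so the following is only one plausible reading, sketched under stated assumptions: each POI contributes one hyperedge containing itself and every POI within the threshold, and distance is the great-circle (haversine) distance with an Earth radius of 6371 km. The names `haversine_km` and `location_hyperedges`, the coordinates, and the 2.5 km threshold are all hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) pairs in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def location_hyperedges(coords, delta_d_km):
    """Assumed construction: one hyperedge per POI, holding all POIs within delta_d_km."""
    edges = []
    for p in coords:
        edges.append({j for j, q in enumerate(coords)
                      if haversine_km(p, q) <= delta_d_km})
    return edges


# Three made-up POIs: two close together, one far away.
coords = [(106.55, 29.56), (106.56, 29.57), (106.90, 29.90)]
edges = location_hyperedges(coords, delta_d_km=2.5)
```

With these toy coordinates the first two POIs share a hyperedge while the third is isolated. The pairwise loop is quadratic in the number of POIs; a spatial index would be the usual choice for large datasets.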

4. Methodology

This section provides a detailed explanation of our proposed Multi-View Contrastive Fusion Hypergraph Learning method. As illustrated in Figure 1, we first design tailored hypergraph convolutional networks with adjusted aggregation and propagation strategies for different hypergraph structures. This enables effective hypergraph learning to uncover the latent complex relationships between users and POIs. Subsequently, we employ multi-view contrastive learning to capture complementary recommendation effects across different views, enhancing recommendation performance. Finally, we present our prediction and optimization approach to achieve more accurate POI recommendations.

In the next POI recommendation, complex relationships exist between users and POIs, such as user–POI interactions, POI–POI trajectory relationships, and POI–POI location relationships. The previous methods typically used graphs to represent these relationships, where users and POIs are treated as nodes, and their relationships are modeled as edges [26,27]. However, traditional graph structures are limited to pairwise relationships and cannot effectively connect higher-order neighbors within the same semantic context. Inspired by the highly flexible structure of hypergraphs, we innovatively designed three distinct types of hypergraphs: the interaction view hypergraph, the trajectory view hypergraph, and the geographical view hypergraph.

4.1. Anchor Attention Interaction Hypergraph Convolutional Network

In this study, we propose an anchor attention interaction hypergraph convolutional network. Its key idea is to weight node embeddings dynamically using contextual information, thereby enhancing the model’s ability to capture high-order relationships among POIs. We first outline the basic framework of the interaction hypergraph convolutional network and then introduce the anchor attention mechanism, which emphasizes critical features and addresses the limitations of traditional Graph Neural Networks (GNNs) in modeling high-order relationships. The interaction hypergraph convolutional network captures high-order relationships between nodes through a two-step information propagation process in which hyperedges act as intermediaries: node information is first aggregated into hyperedges and then propagated back from hyperedges to nodes. To mitigate the over-smoothing that traditional graph convolution methods often suffer from when modeling high-order relationships, and to improve the expressiveness of node features, we introduce a contextual anchor attention mechanism. This mechanism generates dynamic attention factors from the global contextual information of the input features and uses them to weight the node embeddings, sharpening the model’s focus on critical features. As an initial step, global average pooling is applied to the input features to extract global contextual information:

avg\_features = \frac{1}{batch\_size} \sum_{i = 1}^{batch\_size} x_{i}

(1)

Then, dimensionality reduction and activation are performed: the features are reduced to $\frac{channels}{reduction}$ dimensions through a fully connected layer, followed by a nonlinear transformation using the SiLU activation function:

act (x) = SiLU (FC (x))

(2)

SiLU is the activation function and FC is the fully connected layer. Next, dimensionality expansion and attention generation are performed: the reduced features are expanded back to the original dimensions through another fully connected layer, followed by the Sigmoid activation function to generate attention factors:

attn = σ (FC (act (avg_features)))

(3)

Here, σ denotes the Sigmoid function. Finally, the generated attention factors are multiplied with the node embeddings to perform weighting:

output = x \cdot attn

(4)
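Equations (1)–(4) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the helper names, the weight matrices `W1` and `W2`, and the toy input are hypothetical, and the two fully connected layers are assumed to be separate layers with shapes (channels/reduction × channels) and (channels × channels/reduction).

```python
import math


def linear(x, W, b):
    """Fully connected layer y = W x + b for one vector x; W is (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def silu(v):
    return [t / (1.0 + math.exp(-t)) for t in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def anchor_attention(X, W1, b1, W2, b2):
    """X is batch_size x channels. Returns (re-weighted X, attention factors)."""
    n, c = len(X), len(X[0])
    avg = [sum(row[k] for row in X) / n for k in range(c)]   # Eq. (1): global average pooling
    hidden = silu(linear(avg, W1, b1))                       # Eq. (2): reduce, then SiLU
    attn = sigmoid(linear(hidden, W2, b2))                   # Eq. (3): expand, then Sigmoid
    out = [[row[k] * attn[k] for k in range(c)] for row in X]  # Eq. (4): channel-wise weighting
    return out, attn


# Toy input: 2 nodes, 4 channels, reduction factor 2 (all values illustrative).
X = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
W1 = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.2, -0.1, 0.4]]
W2 = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.3], [0.0, 0.4]]
out, attn = anchor_attention(X, W1, [0.0, 0.0], W2, [0.0] * 4)
```

Because the attention factors come from the pooled features, every node in the batch is scaled by the same per-channel factor, each lying strictly between 0 and 1.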

This allows the model to dynamically adjust the weight of each node’s embedding based on its contextual information, thereby enhancing the expressiveness of key features. We combine the contextual anchor attention mechanism with the interaction hypergraph convolutional network to learn node embeddings. Initially, the POI embeddings are propagated through the graph convolutional network. In each layer, the representation of POI nodes is updated through the node-to-hyperedge aggregation and hyperedge-to-node propagation steps, capturing high-order relationships between users and POIs. Next, the anchor attention mechanism is applied to the outputs of each layer to dynamically adjust the weights of node embeddings using global contextual information, thereby enhancing the model’s focus on key features. Specifically, attention factors are generated through global average pooling, dimensionality reduction activation, and dimensionality expansion and then used to adjust the node feature representations by weighted multiplication with the node embeddings. To avoid the over-smoothing problem, residual connections are applied in each layer, and the embeddings from all layers are fused through mean pooling to generate the final POI node representations. Through this series of operations, the anchor attention interaction hypergraph convolutional network can efficiently and accurately learn the complex relationships within the multi-view interaction hypergraph, thereby improving the performance of the recommendation system.
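The two-step propagation, residual connections, and layer-wise mean pooling described above can be sketched as follows. The paper does not state the normalization used in each step, so this sketch assumes plain mean aggregation in both directions; it also omits the anchor attention step to stay short. The function names and the toy data are hypothetical.

```python
def hypergraph_conv_layer(H, X):
    """One node -> hyperedge -> node pass with (assumed) mean aggregation.

    H: incidence matrix as rows (num_nodes x num_edges).
    X: node embeddings as rows (num_nodes x d).
    """
    n, m, d = len(H), len(H[0]), len(X[0])
    # Step 1: each hyperedge averages the embeddings of its member nodes.
    edge_msgs = []
    for e in range(m):
        members = [v for v in range(n) if H[v][e]]
        size = max(len(members), 1)
        edge_msgs.append([sum(X[v][k] for v in members) / size for k in range(d)])
    # Step 2: each node averages the messages of the hyperedges it belongs to.
    out = []
    for v in range(n):
        edges = [e for e in range(m) if H[v][e]]
        size = max(len(edges), 1)
        out.append([sum(edge_msgs[e][k] for e in edges) / size for k in range(d)])
    return out


def encode(H, X0, num_layers=2):
    """Stack layers with residual connections, then mean-pool all layer outputs."""
    layers = [X0]
    for _ in range(num_layers):
        prev = layers[-1]
        conv = hypergraph_conv_layer(H, prev)
        layers.append([[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(conv, prev)])
    n, d = len(X0), len(X0[0])
    return [[sum(layer[v][k] for layer in layers) / len(layers) for k in range(d)]
            for v in range(n)]


# Toy interaction hypergraph: 4 POIs, 2 user trajectories as hyperedges.
H = [[1, 0], [1, 0], [1, 1], [0, 1]]
X0 = [[1.0], [2.0], [3.0], [5.0]]
final = encode(H, X0, num_layers=1)
```

POI 2 sits in both hyperedges, so after one layer it mixes information from both trajectories, which is how nodes that never co-occur directly can still influence each other.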

4.2. Compressed Activation Attention Trajectory Hypergraph Convolutional Network

To learn POI representations from trajectory hypergraphs, we propose a directed hypergraph convolutional network that integrates compressed activation attention. This method combines the trajectory hypergraph structure with the compressed activation attention mechanism, capturing the complex relationships between nodes while enhancing the focus on essential features. Traditional hypergraph convolutional networks are mainly designed for undirected hypergraphs and cannot effectively model directed relationships between nodes. However, in practical scenarios such as recommendation systems, interactions between nodes are often directional (for example, the order in which a user visits POIs along a trajectory). To address this, we adopt a trajectory hypergraph convolutional layer, which explicitly models the directed relationships between nodes within the hypergraph structure. We further introduce the compressed activation attention mechanism into this layer. The compressed activation module explicitly models an importance coefficient for each feature channel and dynamically adjusts the channel weights to improve the model’s ability to express key features. First, dimensionality reduction and activation are applied to the global information of each channel:

z = ReLU(FC_{1}(x))

(5)

Here, $x \in \mathbb{R}^{batch\_size \times channels}$ is the input feature, and $FC_{1}$ is the dimensionality-reduction fully connected layer. Then, the original dimensionality of the channels is restored, and the attention factors are generated:

a = σ ({FC}_{2} (z))

(6)

Here, $σ$ is the Sigmoid activation function, and $a \in \mathbb{R}^{channels}$ represents the importance coefficient for each channel. By multiplying the attention factors element-wise with the input features, the weight of each channel is dynamically adjusted:

x^{'} = x \cdot a

(7)

Here, $x'$ represents the weighted node embeddings. In the model, we first process the POI node embeddings through the trajectory hypergraph convolutional layer to capture the directed relationships between nodes. Specifically, the input node embeddings are propagated twice through the directed hypergraph structure: first from the source node to the target node ($HG_{poi\_src}$), and then from the target node back to the source node ($HG_{poi\_tar}$). Initially, the target node’s embeddings are aggregated to the source node. Then, the source node’s embeddings are propagated back to the target node, thus effectively modeling the directed interactions between nodes. We introduce the compressed activation attention mechanism to enhance the model’s focus on essential features. By generating attention weights for different features through a fully connected layer, the model can adaptively adjust the feature channels of node embeddings, focusing on more critical channel features. Finally, the weighted node embeddings are output. The entire model effectively captures the directed relationships between nodes and improves the model’s ability to express key features by dynamically adjusting channel weights.

4.3. Depth-Separable Geographical Convolution

In this work, we propose Depth-Separable Geographical Convolution, a network architecture that combines geographical information and graph convolution to efficiently model the update of POI embeddings. The network employs a two-stage convolution mechanism: first, information is propagated via geographical graph convolution, and then features are further refined through depth-separable convolution to optimize the POI embeddings. The input to the depth-separable geographical convolution is the embedding representation of POI nodes, $e^{(0)} \in \mathbb{R}^{L \times d}$, where $L$ is the number of POI nodes, and $d$ is the embedding dimension. Additionally, the adjacency matrix of the geographical graph, $G \in \mathbb{R}^{L \times L}$, is required, where each element $G_{ij}$ represents the geographical relationship between node $i$ and node $j$.

POI embeddings propagate information through the product with the geographical graph in each layer of the graph convolution process. Let the POI embedding at the ℓ-th layer be $e^{(ℓ)}$; then the information propagation process can be expressed as

e^{(ℓ)} = G e^{(ℓ - 1)}

(8)

Here, $e^{(0)}$ represents the initial POI embedding, and $e^{(ℓ)}$ denotes the embedding at the ℓ-th layer. In this process, the information of the POI nodes is propagated to neighboring nodes through the connectivity of the geographical graph. To alleviate the over-smoothing problem and enhance the expressive power of the representations, we apply residual connections after each convolution layer. Specifically, the output at the ℓ-th layer is obtained by adding the output from the previous layer:

e^{(ℓ)} = e^{(ℓ)} + e^{(ℓ - 1)}

(9)
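Equations (8) and (9) together amount to computing G e + e at each layer. A minimal sketch in plain Python follows; the names `matmul` and `geo_propagate` are hypothetical, and the toy matrix is assumed to be row-normalized, which the text does not specify.

```python
def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def geo_propagate(G, e0, num_layers):
    """Return [e^(0), ..., e^(num_layers)], each layer being G e_prev + e_prev."""
    layers = [e0]
    for _ in range(num_layers):
        prev = layers[-1]
        msg = matmul(G, prev)  # Eq. (8): propagate over the geographical graph
        # Eq. (9): residual connection with the previous layer's output
        layers.append([[m + p for m, p in zip(mr, pr)] for mr, pr in zip(msg, prev)])
    return layers


# Toy geographical graph: POIs 0 and 1 are neighbours, POI 2 is isolated.
G = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
e0 = [[2.0], [4.0], [6.0]]
layers = geo_propagate(G, e0, num_layers=1)
```

After one layer the two neighbouring POIs have exchanged information, while the residual term keeps each POI's own previous embedding in the result.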

This residual connection helps to avoid the excessive smoothing of information while retaining the independent representational capability of each layer. In the final stage of the depth-separable geographical convolution, we incorporate the Depthwise Separable Convolution method to further refine the POI node embedding features and reduce the computational overhead. Depthwise Separable Convolution consists of two stages: Depthwise Convolution and Pointwise Convolution.

Depthwise Convolution: Each input channel is convolved independently, and thus depthwise convolution processes the input features channel by channel. The output after depthwise convolution undergoes a nonlinear transformation through Batch Normalization and the LeakyReLU activation function.

Pointwise Convolution: A $1 \times 1$ convolution is used to adjust the number of channels, followed by Batch Normalization and the LeakyReLU activation function to enhance the feature representation further. The main advantage of depthwise separable convolution is its low computational cost, significantly reducing the number of parameters while retaining complex feature representations.
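The parameter saving can be made concrete with a short count. The sketch below assumes a one-dimensional convolution with kernel size k and ignores biases and Batch Normalization parameters; the channel counts and kernel size are illustrative choices, since the paper does not report them here.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard 1-D convolution: every output channel sees every input channel."""
    return k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise stage (one k-tap filter per input channel) plus 1x1 pointwise stage."""
    return k * c_in + c_in * c_out


standard = conv_params(64, 64, 3)                    # 3 * 64 * 64 = 12288
separable = depthwise_separable_params(64, 64, 3)    # 192 + 4096 = 4288
ratio = separable / standard                         # equals 1/c_out + 1/k
```

For these illustrative sizes the separable version needs roughly a third of the weights, and the ratio 1/c_out + 1/k shows that the saving grows with the kernel size.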

Let the input to the depthwise separable convolution module be $e_{final} \in \mathbb{R}^{L \times d}$; the output is

e_{out} = DepthwiseSeparableConv(e_{final})

(10)

Here, DepthwiseSeparableConv represents the depthwise separable convolution operation, and its output, $e_{out}$, is the updated POI embedding. After multiple layers of graph convolution and depthwise separable convolution, the embeddings are integrated through average pooling to obtain the final embedding representation:

e_{final} = \frac{1}{L} \sum_{ℓ = 0}^{L - 1} e^{(ℓ)}

(11)

where $L$ in the sum denotes the number of graph convolution layers, each representing information propagation at a different level. The embedding $e_{final} \in \mathbb{R}^{L \times d}$, whose first dimension is the number of POI nodes, is the final output of the network and can be used for downstream tasks, such as POI recommendation.

4.4. Multi-View Weighted Contrastive Learning

In this section, we propose a multi-view contrastive learning approach to enhance the specific representations of users and POIs (Points of Interest) in each view by exploring the key collaborative effects between the interaction, trajectory, and geographical perspectives. Specifically, the method optimizes the interactions between different views by introducing self-supervised signals, allowing them to complement each other during the learning process. For instance, in the interaction view, the model can use users’ historical behavior data to generate user representations; in the trajectory view, the users’ movement path information can be incorporated to enrich the representations further; and in the geographical view, the model can take into account users’ geographic preferences and location information.

Through this multi-view contrastive learning, the model captures the shared features of users and POIs across different views and effectively distinguishes between various entities, improving the accuracy and robustness of the recommendation system. For example, the model can better understand users’ diverse needs and preferences in recommendation tasks, providing more personalized and precise recommendations. Additionally, applying self-supervised signals enables the model to adapt flexibly to unsupervised or semi-supervised scenarios.

The core of this method is to enhance the consistency of user and POI representations across different views through contrastive learning. Specifically, we treat the representations of the same user or POI in different views as positive pairs, while those of different users or POIs are treated as negative pairs. In this way, the model learns to generate similar representations across different views, thereby enhancing the consistency of the embeddings. This approach also helps the model distinguish between different entities, improving its accuracy in recognition and classification tasks. Through contrastive learning, the model can better capture the shared features of users and POIs across different views, leading to higher performance and robustness in tasks such as recommendation and analysis. Specifically, we achieve this goal through the following steps.

Firstly, we design specific encoders for each view to extract the feature representations within that view. In each view, we aim to ensure that the representations of the same user or POI are consistent across different views. To this end, we employ a contrastive learning approach to maximize the similarity of representations of the same entity across different views while minimizing the similarity of representations between different entities. Specifically, for each user, we define the following contrastive loss function: we first define the user contrastive loss between the collaborative and trajectory views. For each user u, we calculate the similarity between their embeddings in the collaborative view

e_{C, u}

and the trajectory view

e_{T, u}

, and define the loss through the following formula:

L_{U}^{C T} = - \frac{1}{| U |} \sum_{u \in U} \log \frac{\exp (sim (e_{C, u}, e_{T, u}) / τ)}{\sum_{u^{'} \in U} \exp (sim (e_{C, u}, e_{T, u^{'}}) / τ)}

(12)

Here,

sim (\cdot)

denotes the cosine similarity,

τ

is the temperature coefficient, and U represents the set of all users. Similarly, we define the user contrastive loss between the collaborative view and the geographical view:

L_{U}^{C G} = - \frac{1}{| U |} \sum_{u \in U} \log \frac{\exp (sim (e_{C, u}, e_{G, u}) / τ)}{\sum_{u^{'} \in U} \exp (sim (e_{C, u}, e_{G, u^{'}}) / τ)}

(13)

Similarly, the user contrastive loss between the trajectory view and the geographical view can be expressed as

L_{U}^{T G} = - \frac{1}{| U |} \sum_{u \in U} \log \frac{\exp (sim (e_{T, u}, e_{G, u}) / τ)}{\sum_{u^{'} \in U} \exp (sim (e_{T, u}, e_{G, u^{'}}) / τ)}

(14)
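
The pairwise losses in Equations (12)–(14) share one form, so a single function can illustrate all three. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the toy embeddings, and the default temperature of 0.2 are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_view_loss(view_a, view_b, tau=0.2):
    """Contrastive loss between two views, following the form of Eqs. (12)-(14).

    view_a[i] and view_b[i] hold the embeddings of the same entity i.
    tau is the temperature; 0.2 is an assumed value, not taken from the paper.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / tau for other in view_b]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the positive pair (same entity, other view).
        total += -(logits[i] - log_denominator)
    return total / len(view_a)


# Toy embeddings for three users (hypothetical values).
collab = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]  # same users, similar directions
shuffled = [aligned[1], aligned[2], aligned[0]]   # users mismatched across views

loss_aligned = cross_view_loss(collab, aligned)
loss_shuffled = cross_view_loss(collab, shuffled)
```

In this toy setting the loss is small when each user's embeddings point in similar directions in both views and grows when the views disagree, which is the behavior the consistency objective relies on.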

By introducing the weighting coefficients

w_{C T}, w_{C G},

and

w_{T G}

, the final user contrastive loss can be expressed as a weighted sum. Specifically, the final user contrastive loss is

L_{U, S S L} = w_{C T} \cdot L_{U}^{C T} + w_{C G} \cdot L_{U}^{C G} + w_{T G} \cdot L_{U}^{T G}

(15)

A similar approach can be used for the POI embeddings to calculate the weighted contrastive loss. Let the contrastive losses between the collaborative view and the trajectory view, the collaborative view and the geographical view, and the trajectory view and the geographical view for POIs be denoted as

L_{L}^{C T}, L_{L}^{C G}, L_{L}^{T G}

, respectively. Then, the final POI contrastive loss is

L_{L, S S L} = w_{C T} \cdot L_{L}^{C T} + w_{C G} \cdot L_{L}^{C G} + w_{T G} \cdot L_{L}^{T G}

(16)

Finally, by combining the user and POI contrastive losses, we obtain the overall contrastive learning loss:

L_{S S L} = L_{U, S S L} + L_{L, S S L}

(17)

During the training process, the above contrastive loss is combined with the loss of the main task (such as the recommendation task) for optimization. By minimizing the total loss, the model not only learns the specific representations within each view but also enhances the consistency of representations across different views through contrastive learning. This approach enables the model to capture the shared features of users and POIs across different views, thereby improving the accuracy and robustness of the recommendation system.

In summary, our multi-view contrastive learning method enhances the model’s ability to understand multi-source data by performing contrastive learning on users and POIs across different views, thereby improving the performance of the recommendation system.

4.5. Prediction and Optimization

In recommendation systems, we learn user and POI (Point of Interest) representations by combining information from different views (e.g., interaction view, trajectory view, and geographical view). We design a new loss function, optimized by PolyLoss, to improve recommendation tasks and accelerate the convergence process. Specifically, we fuse the embeddings of user u and POI l into the final embeddings

e_{F, u}

and

e_{F, l}

, and compute their interaction score

{\hat{y}}_{u, l}

as follows:

{\hat{y}}_{u, l} = softmax (e_{F, u}^{T} e_{F, l})

(18)

Here,

e_{F, u}

and

e_{F, l}

represent the final embeddings of user u and POI l, respectively. We use PolyLoss, which builds on the standard cross-entropy loss, to measure the discrepancy between the predicted interaction scores and the observed labels. Specifically, PolyLoss modifies the traditional cross-entropy loss by introducing a polynomial weighting factor p:

L_{Poly} = - \sum_{u \in U} \sum_{l \in L} \left[ y_{u, l} \log ({\hat{y}}_{u, l}^{p}) + (1 - y_{u, l}) \log ({(1 - {\hat{y}}_{u, l})}^{p}) \right]

(19)

where p is the degree of the polynomial, typically set to a positive integer greater than 1. This loss function becomes the standard cross-entropy loss when

p = 1

. Based on this polynomial loss formula, the recommendation loss function becomes

L_{Rec} = - \sum_{u \in U} \sum_{l \in L} \left[ y_{u, l} \log ({\hat{y}}_{u, l}^{p}) + (1 - y_{u, l}) \log ({(1 - {\hat{y}}_{u, l})}^{p}) \right]

(20)
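
To make the role of p concrete, the sketch below evaluates the loss exactly as printed in Equations (19) and (20) on a few hypothetical label and score pairs. Because \log (x^{p}) = p \log x, the printed form equals p times the binary cross-entropy, which is consistent with the statement that p = 1 recovers the standard loss. The function names, the clipping constant, and the sample values are assumptions for illustration only.

```python
import math


def poly_loss(y_true, y_pred, p=2, eps=1e-12):
    """Loss as printed in Eq. (19): -sum[y*log(y_hat^p) + (1-y)*log((1-y_hat)^p)]."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # clip so that log() stays finite
        total += y * math.log(y_hat ** p) + (1 - y) * math.log((1.0 - y_hat) ** p)
    return -total


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Standard (summed) binary cross-entropy, written out independently."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        y_hat = min(max(y_hat, eps), 1.0 - eps)
        total += y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return -total


# Hypothetical labels and predicted interaction scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4]

bce = binary_cross_entropy(labels, scores)
loss_p1 = poly_loss(labels, scores, p=1)  # coincides with bce
loss_p3 = poly_loss(labels, scores, p=3)  # equals 3 * bce up to rounding
```

Under this reading, p rescales the recommendation loss and therefore its weight relative to the contrastive and regularization terms of the overall objective.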

We introduce a self-supervised contrastive learning loss to enhance the collaborative information between different views. The contrastive loss improves the model’s robustness by maximizing the similarity between the representations of the same entity in different views. Specifically, we compute the contrastive losses between pairs of views, such as the contrastive loss between the interaction view and the trajectory view

L_{C, T}

, the contrastive loss between the interaction view and the geographical view

L_{C, G}

, and the contrastive loss between the trajectory view and the geographical view

L_{T, G}

, which are expressed as follows:

L_{C, T} = \frac{1}{| U |} \sum_{u \in U} - \log \frac{\exp (s (c_{C, u}, c_{T, u}) / τ)}{\sum_{u^{'} \in U} \exp (s (c_{C, u}, c_{T, u^{'}}) / τ)}

(21)

where

s (\cdot)

denotes cosine similarity, and

τ

is the temperature hyperparameter. Similarly, we can define

L_{C, G}

and

L_{T, G}

to compute the contrastive losses between other views. Finally, the weighted sum of all the contrastive losses forms the total loss for self-supervised learning:

L_{S S L} = λ_{1} L_{C, T} + λ_{2} L_{C, G} + λ_{3} L_{T, G}

(22)

where

λ_{1}, λ_{2}, λ_{3}

are hyperparameters that control the weights of the contrastive losses between different views. Finally, we combine the recommendation loss, contrastive loss, and regularization term into a multi-task learning objective function:

L = L_{Rec} + λ_{4} L_{S S L} + λ_{5} {\| Θ \|}_{2}

(23)
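
The way Equations (22) and (23) combine the individual terms can be sketched as follows. This is an illustrative sketch only: the function names, the default weights, and the numeric inputs are assumptions, and the regularizer is implemented as the L2 norm as printed (many implementations use the squared norm instead).

```python
def ssl_loss(l_ct, l_cg, l_tg, lambdas=(1.0, 1.0, 1.0)):
    """Eq. (22): weighted sum of the three pairwise contrastive losses."""
    lam1, lam2, lam3 = lambdas
    return lam1 * l_ct + lam2 * l_cg + lam3 * l_tg


def l2_norm(parameters):
    """L2 norm of all model parameters, treated as one flattened vector."""
    return sum(w * w for group in parameters for w in group) ** 0.5


def total_loss(l_rec, l_ssl, parameters, lam4=0.1, lam5=1e-4):
    """Eq. (23): recommendation loss + weighted SSL loss + weighted regularizer."""
    return l_rec + lam4 * l_ssl + lam5 * l2_norm(parameters)


# Hypothetical loss values and a tiny parameter set.
l_ssl = ssl_loss(0.5, 0.3, 0.2, lambdas=(1.0, 0.5, 0.5))  # 0.5 + 0.15 + 0.1
total = total_loss(1.2, l_ssl, [[3.0, 4.0]], lam4=0.1, lam5=1e-4)
```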

5. Experiments

5.1. Experimental Setting

(1)

Datasets. To validate the effectiveness of our proposed method, we conducted extensive experiments on three publicly available location-based social network (LBSN) datasets, including Foursquare-NYC (referred to as NYC), Foursquare-TKY (referred to as TKY) [28], and Gowalla [29]. We preprocessed the datasets by first filtering out less popular POIs to ensure that the POIs in the dataset were representative enough for the user recommendation task. Subsequently, we segmented each user’s trajectory data into multiple daily sessions and removed sessions with excessively short durations, which helped to reduce the impact of noisy data on model training. Finally, we adopted a common train–test split strategy, using the first 80% of each user’s sessions for training and the remaining 20% for testing. This approach ensures that the model can learn users’ behavioral patterns during training while being effectively evaluated on the test set. The statistical details and characteristics of each dataset used in our experiments are summarized in Table 1.
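
The session construction and chronological split described above can be sketched as follows. This is a simplified illustration rather than the authors' preprocessing script: the POI popularity filter is omitted, "short" sessions are assumed to mean sessions with fewer than `min_len` check-ins, and all names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_sessions(checkins, min_len=2):
    """Group (user, poi, timestamp) check-ins into per-user daily sessions.

    Sessions with fewer than min_len check-ins are dropped (assumed threshold).
    """
    by_user_day = defaultdict(list)
    for user, poi, ts in sorted(checkins, key=lambda c: c[2]):
        by_user_day[(user, ts.date())].append(poi)
    sessions = defaultdict(list)
    for (user, _day), pois in sorted(by_user_day.items()):
        if len(pois) >= min_len:
            sessions[user].append(pois)
    return sessions


def split_sessions(sessions, train_ratio=0.8):
    """Use the first train_ratio of each user's sessions for training."""
    train, test = {}, {}
    for user, user_sessions in sessions.items():
        cut = int(len(user_sessions) * train_ratio)
        train[user] = user_sessions[:cut]
        test[user] = user_sessions[cut:]
    return train, test


# Hypothetical check-ins: five two-visit days and one single-visit day.
checkins = []
for day in range(1, 6):
    checkins.append(("u1", f"poi{day}a", datetime(2024, 1, day, 9, 0)))
    checkins.append(("u1", f"poi{day}b", datetime(2024, 1, day, 18, 0)))
checkins.append(("u1", "poi_single", datetime(2024, 1, 6, 12, 0)))

sessions = build_sessions(checkins)
train, test = split_sessions(sessions)
```

Splitting by position within each user's ordered sessions keeps the test sessions strictly later than the training sessions for that user.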

(2)

Evaluation Metrics. To ensure consistency with most existing next POI recommendation methods, we adopted two widely used evaluation metrics: Recall@K and Normalized Discounted Cumulative Gain (NDCG@K). Recall@K evaluates the label coverage in the top K recommended items, while NDCG@K measures the ranking quality of the recommendation list. To ensure the fairness of the experimental results, we conducted 10 runs for each metric and reported the average values of Recall@K and NDCG@K for

K \in \{5, 10\}

.
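
For reference, both metrics can be computed per test case as in the sketch below, which assumes that each test case has exactly one ground-truth next POI (so the ideal DCG is 1). The function names and the sample ranking are illustrative assumptions.

```python
import math


def recall_at_k(ranked, target, k):
    """1.0 if the ground-truth POI appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """With a single relevant item, NDCG@K reduces to 1 / log2(rank + 1)."""
    top_k = ranked[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position in the list
    return 1.0 / math.log2(rank + 1)


# Hypothetical ranked recommendation list for one test case.
ranked = ["p3", "p7", "p1", "p9", "p4", "p2"]

hit_recall = recall_at_k(ranked, "p1", 5)   # "p1" is at rank 3
hit_ndcg = ndcg_at_k(ranked, "p1", 5)       # 1 / log2(4)
miss_recall = recall_at_k(ranked, "p2", 5)  # "p2" is at rank 6, outside top 5
late_ndcg = ndcg_at_k(ranked, "p2", 10)     # 1 / log2(7)
```

The reported scores are then averages of these per-case values over all test cases.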

(3)

Baselines. We compared our proposed method with several typical next POI recommendation approaches: (1) a statistical method, UserPop; (2) an RNN-based method, STGN; (3) a self-attention-based method, STAN; (4) methods based on Graph Neural Networks (GNNs) or hypergraphs, including LightGCN, GETNext, and MSTHN; and (5) graph or hypergraph contrastive learning-based methods, such as HCCF and ASTHL.

UserPop: A statistical method recommending the most popular POIs based on users’ historical behaviors. It is simple but has limited effectiveness.
STGN [8]: A sequential model based on RNN, designed to handle temporal dependencies in users’ trajectories by leveraging recurrent networks to capture the sequential nature of user behaviors.
STAN [9]: A self-attention-based model that captures long-term dependencies in user behaviors using self-attention, effectively addressing global dependency relationships.
LightGCN [30]: A Graph Neural Network (GNN)-based method that simplifies graph convolution by removing feature transformation and nonlinear activation and keeping only neighborhood aggregation over the graph structure, thereby improving computational efficiency.
GETNext [11]: A model that combines graph and temporal information, leveraging Graph Convolutional Networks (GCNs) to process the temporal and structural relationships in user behaviors.
MSTHN [16]: A multi-scale hypergraph network designed for complex sequential data, capturing short-term and long-term user behavior patterns through a multi-scale learning mechanism.
HCCF [24]: A hypergraph and contrastive learning-based model that improves the recommendation accuracy by optimizing user and POI embeddings through contrastive learning.
ASTHL [19]: A model that integrates spatiotemporal attention mechanisms and hypergraph contrastive learning to enhance recommendation system accuracy and robustness, making it suitable for large-scale sparse data.

(4)

Parameter Settings. We conducted our experiments using PyTorch 1.12.1 on a 24 GB Nvidia RTX 3090 GPU. For the baseline methods, we first followed the settings in the original papers and fine-tuned the hyperparameters of each model on the three datasets. For the MVHGAT model, we used the Adam optimizer [31] with a learning rate of

1 \times 10^{- 3}

, a weight decay of

5 \times 10^{- 4}

, and selected the hyperedge dropout rate from

\{0.25, 0.5, 0.75, 1\}

. The dimensions of the user and POI embeddings were both set to 128. Empirically, we chose 1.5 km as the distance threshold for the datasets. The number of layers in the hypergraph convolutional network was selected from 1 to 4. The hyperparameters

λ_{1}

and

λ_{2}

for the regularization terms were chosen from

\{1 \times 10^{- 5}, 1 \times 10^{- 4}, 1 \times 10^{- 3}, 1 \times 10^{- 2}, 1 \times 10^{- 1}\}

to balance the loss.
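
The settings above can be collected into a configuration such as the one sketched below. The text does not state how the candidate values were searched, so the exhaustive grid, the key names, and the helper function are assumptions made purely for illustration.

```python
from itertools import product

# Values stated in the parameter settings.
fixed = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "weight_decay": 5e-4,
    "embedding_dim": 128,
    "distance_threshold_km": 1.5,
}

# Candidate values listed in the text; the key names are hypothetical.
search_space = {
    "hyperedge_dropout": [0.25, 0.5, 0.75, 1],
    "num_layers": [1, 2, 3, 4],
    "lambda_1": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "lambda_2": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
}


def iter_configs(fixed, search_space):
    """Yield one full configuration per point of the (assumed) grid."""
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}


configs = list(iter_configs(fixed, search_space))
```

An exhaustive grid over these candidates contains 4 × 4 × 5 × 5 = 400 configurations per dataset.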

5.2. Performance Comparison

The experimental results of all methods are reported in Table 2. Based on these results, we have the following observations:

Our proposed MVHGAT achieves the best performance on all datasets. Across the three datasets, MVHGAT consistently outperforms other baseline methods on all evaluation metrics. To ensure a more comprehensive comparison, we incorporated additional recent works in our evaluation, providing a more thorough analysis of how MVHGAT compares with state-of-the-art models. We attribute these improvements to the following factors.

Firstly, by learning from the interaction, trajectory, and geographical views, MVHGAT effectively captures users’ multi-view personalized preferences, improving its ability to address the data sparsity problem. By introducing tailored hypergraph convolutional networks for different hypergraphs, MVHGAT captures the latent information within each hypergraph. Compared to recent multi-view POI recommendation models, such as MSTHG [32] and MSHTN [33], MVHGAT demonstrates superior adaptability in modeling complex user–POI relationships, particularly in sparse data scenarios.

Secondly, MVHGAT employs multi-view contrastive learning to enhance each view’s user and POI representations. This strengthens the signals during self-supervised learning, allowing complementary learning across different views and producing fused recommendations. Among various POI recommendation methods, incorporating spatiotemporal information significantly improves the recommendation performance. For example, the GNN-based GETNext, which leverages spatiotemporal information, performs significantly better than LightGCN, which does not use such information, with Recall@10 improving by roughly 11% to 27% depending on the dataset (Table 2). Moreover, contrastive learning-based methods, such as HCCF, have demonstrated improved generalization capability in recommendation tasks. Our results indicate that MVHGAT further enhances these advantages by integrating hypergraph structures with contrastive learning, leading to NDCG@10 improvements of roughly 19% to 27% over HCCF across the three datasets (Table 2).

Among the Hypergraph Neural Network-based methods, MSTHN achieves better results than HCCF on the NYC and TKY datasets. However, MVHGAT further learns the latent high-order relationships between users and POIs, resulting in more personalized representations and outperforming existing hypergraph models, including MSTHN, on all reported metrics. To validate the significance of these improvements, we conducted paired t-tests on our results. The statistical tests confirm that the improvements of MVHGAT over MSTHN and other baselines are statistically significant (p < 0.05), reinforcing the robustness of our findings. These results highlight the importance of incorporating multi-view representation modeling in recommendations.
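
As an illustration of the paired t-test mentioned above, the sketch below computes the test statistic for two models evaluated over the same 10 runs. The per-run scores are invented for the example and are not the paper's results; the function name is likewise hypothetical. With 10 runs there are 9 degrees of freedom, for which the two-sided critical value at the 0.05 level is about 2.262.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the per-run differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


# Made-up Recall@10 values over 10 runs for two models.
model_a = [0.476, 0.474, 0.478, 0.475, 0.477, 0.476, 0.473, 0.479, 0.476, 0.475]
model_b = [0.448, 0.447, 0.449, 0.446, 0.450, 0.447, 0.448, 0.449, 0.446, 0.448]

t_stat, dof = paired_t_statistic(model_a, model_b)
T_CRITICAL_DF9 = 2.262  # two-sided critical value, alpha = 0.05, 9 degrees of freedom
significant = abs(t_stat) > T_CRITICAL_DF9
```

Pairing the runs (for example, by shared random seed or data split) removes run-to-run variance that affects both models equally, which is why a paired test is preferred over an unpaired one here.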

Furthermore, the results show that methods using non-sequential information outperform sequence-based methods. For example, the RNN-based STGN performs worse in recommendation tasks than STAN, which can handle non-continuous POI sequences. Our MVHGAT better captures high-order collaborative signals between POIs, mitigating data sparsity and over-smoothing issues and surpassing GETNext. This aligns with recent findings in sequential POI recommendation research, where non-sequential models such as STAN+ [34] have demonstrated superior performance in capturing long-range dependencies.

Contrastive learning methods based on graphs or hypergraphs generally achieve better results than traditional GNN methods, such as LightGCN. These methods can capture diverse data features from multiple views, such as user behavior patterns across different times and locations or relationships between different POIs. By integrating this information, these methods provide a more comprehensive understanding of user interests and needs [35]. Compared to single hypergraph convolutional models, MVHGAT adopts tailored hypergraph networks to learn from different types of hypergraphs, effectively capturing the spatiotemporal and contextual information of POIs. As a result, it outperforms ASTHL in recommendation effectiveness.

Although MVHGAT shows slightly weaker performance on datasets with frequent interactions (e.g., TKY), it improves significantly on sparse datasets, particularly in addressing data sparsity and over-smoothing issues. By combining multi-view contrastive learning and various types of hypergraph convolutional networks, MVHGAT validates the effectiveness of multi-view representation modeling in recommendation systems, significantly enhancing the recommendation performance.

5.3. Ablation Study

In the effectiveness analysis of the MVHGAT model, we conducted ablation experiments to explore the contribution of each component. In the experiments, we removed the following four key components individually:

w/o I: Removing the Anchor Attention Interactive Hypergraph Convolution (Interactive View).
w/o T: Removing the Compressed Activation Attention Trajectory Hypergraph Convolution (Trajectory View).
w/o P: Removing the Depth-Separable Geographical Convolution (Location View).
w/o MCL: Removing Multi-view Weighted Contrastive Learning.

Table 3 presents the experimental results, from which we draw the following conclusions:

First, the model performance drops after removing the interactive view convolution. This demonstrates that this component is crucial for capturing high-order interactive relationships between users and POIs, enhancing the recommendation effectiveness. Second, removing the trajectory view convolution results in a slight performance decline, but its impact is minor compared to the removal of the interactive view convolution. Since the Gowalla dataset is sparser than the NYC and TKY datasets, capturing global trajectory relationships helps alleviate the data sparsity problem. This result aligns with that of GETNext, which also considers global trajectory influences. This indicates that the trajectory view convolution helps capture variations in user behavior paths, though its contribution is relatively weaker than that of the interactive view.

Removing the location view convolution significantly decreases performance on the NYC and TKY datasets because the model can then no longer learn the latent high-order relationships in the location view effectively. However, its impact is minimal on the Gowalla dataset, likely because Gowalla is a social location check-in application in which check-ins are more spontaneous and users are less sensitive to location preferences. This highlights the importance of the location view convolution when considering the influence of location preferences on user behavior.

Lastly, removing multi-view weighted contrastive learning also leads to performance degradation, indicating that this component effectively enhances the complementary effects between different views and improves the model’s understanding of user and POI preferences. Through these ablation experiments, we observe the importance of each view and component in MVHGAT. They complement each other and collectively contribute to the model’s performance improvement.

5.4. Hyperparameter Analysis

We analyzed the impact of the number of hypergraph convolutional layers and the learning rate on MVHGAT from a qualitative perspective.

Impact of the number of layers: To evaluate the effect of stacking hypergraph convolutional layers, we conducted experiments with the number of layers ranging from 1 to 5. As shown in Figure 2, when using 3 layers on the NYC and TKY datasets, MVHGAT achieves a good balance in Recall@10 and NDCG@10, demonstrating its ability to capture high-order collaborative signals effectively. On the Gowalla dataset, the model performs best with 1 layer. The performance degradation with more layers may be attributed to the introduction of excessive noise.

Impact of the learning rate: To analyze the impact of the learning rate on the MVHGAT model, we conducted experiments with different learning rates in the range

\{0.001, 0.005, 0.01, 0.05, 0.1\}

. By observing the model’s performance on the NYC, TKY, and Gowalla datasets under different learning rates, we gained insights into the role of the learning rate during the training process.

The experimental results show that when the learning rate is set to 0.01, MVHGAT performs best in Recall@10 and NDCG@10 on the NYC and TKY datasets. This indicates that a moderate learning rate helps the model converge better during training and effectively capture high-order collaborative signals. A smaller learning rate (e.g., 0.001) results in slow convergence during training, preventing the model from reaching optimal performance. Conversely, a larger learning rate (e.g., 0.1) causes unstable training, with significant fluctuations during the convergence process, ultimately leading to performance degradation.

5.5. In-Depth Analysis of MVHGAT

To explore the effectiveness of our adjusted hypergraph convolutional network aggregation and propagation methods, we kept other parts of the MVHGAT model unchanged and replaced the hypergraph convolutional network in each view with LightGCN [30]. According to the experimental results in Table 4, replacing the specific hypergraph convolutional network in each view with LightGCN results in performance degradation to varying degrees.

When replacing the hypergraph convolutional network in the interaction view with LightGCN, the performance in terms of Recall@10 and NDCG@10 significantly declines. This is likely because LightGCN fails to capture high-order collaborative signals and struggles with the cold start problem. The feature aggregation method in LightGCN uses simple weighted propagation through the adjacency matrix, which prevents it from effectively capturing the higher-order dependencies between users and POIs in complex hypergraph structures, thereby impairing recommendation performance.
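
For context, the propagation rule that LightGCN applies on a user–POI bipartite graph can be sketched as below: each node sums its neighbors' embeddings scaled by a symmetric degree normalization, with no weight matrix or activation. This is a single propagation step written in plain Python for illustration (the full model also combines the embeddings from several layers); the function name, node identifiers, and embedding values are assumptions.

```python
import math


def lightgcn_layer(user_emb, item_emb, interactions):
    """One LightGCN propagation step on a user-item bipartite graph.

    Each node's new embedding is the sum of its neighbors' embeddings scaled by
    1 / sqrt(deg(node) * deg(neighbor)).
    """
    user_deg = {u: 0 for u in user_emb}
    item_deg = {i: 0 for i in item_emb}
    for u, i in interactions:
        user_deg[u] += 1
        item_deg[i] += 1

    dim = len(next(iter(user_emb.values())))
    new_user = {u: [0.0] * dim for u in user_emb}
    new_item = {i: [0.0] * dim for i in item_emb}
    for u, i in interactions:
        w = 1.0 / math.sqrt(user_deg[u] * item_deg[i])
        for d in range(dim):
            new_user[u][d] += w * item_emb[i][d]
            new_item[i][d] += w * user_emb[u][d]
    return new_user, new_item


# Hypothetical toy graph: two users, two POIs, three check-in edges.
user_emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_emb = {"p1": [1.0, 1.0], "p2": [2.0, 0.0]}
interactions = [("u1", "p1"), ("u1", "p2"), ("u2", "p2")]

new_user, new_item = lightgcn_layer(user_emb, item_emb, interactions)
```

Each step mixes information only along pairwise user–POI edges, whereas a hyperedge can connect many nodes at once; this is the structural difference the comparison above turns on.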

Similarly, replacing the hypergraph convolutional network in the trajectory view with LightGCN also leads to a performance drop. The reason is that LightGCN, based on undirected message passing for feature aggregation, cannot thoroughly learn directed trajectory relationships. As a result, using such models instead of hypergraph convolutional networks causes the model to fail in effectively capturing the directed information in the trajectory view, thereby impacting overall performance.

Our designed Compressed Activation Attention Trajectory Hypergraph Convolution can effectively handle directed hypergraph structures by leveraging the high-order connectivity information in hypergraphs to capture complex relationships between nodes. In contrast, undirected models often cannot fully exploit these advantages in such tasks.

6. Conclusions

The Multi-View Contrastive Fusion Hypergraph Learning Model (MVHGAT) proposed in this study effectively addresses the challenges in the next POI recommendation task, particularly in terms of data sparsity and over-smoothing. By constructing and integrating hypergraphs from three perspectives (interaction, trajectory, and geographical), the model captures the complex relationships and high-order dependencies between users and POIs. Compared to existing recommendation methods, MVHGAT enhances the consistency and discriminative power of user and POI representations through multi-view contrastive learning, significantly improving the accuracy and robustness of the recommendation system.

Experimental results demonstrate that MVHGAT outperforms other traditional and Graph Neural Network-based methods across three public datasets. This outcome validates the importance of multi-view representation modeling in recommendation systems, especially in handling complex recommendation tasks with multidimensional user preferences, leading to more personalized and accurate recommendations. By optimizing the collaborative effects between different views, MVHGAT overcomes the limitations of traditional methods that fail to integrate multi-dimensional information, achieving superior recommendation performance.

In future work, the authors plan to explore complementary feature learning methods to better model the latent intentions underlying user–POI interactions, enhance the interpretability of recommendations, and further improve the model’s performance in complex application scenarios.

Author Contributions

Conceptualization, Z.Y.; Methodology, L.H.; Software, Y.L.; Validation, G.H., J.W. and Z.Y.; Formal analysis, G.H. and S.L.; Investigation, L.H., S.L. and X.W.; Resources, L.H., Y.R. and J.W.; Data curation, Y.R. and Z.Y.; Writing—original draft, X.W.; Writing—review & editing, Z.Y.; Visualization, Y.L.; Supervision, L.H.; Project administration, L.H. and J.W.; Funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Innovation Key R&D Program of Chongqing, China (CSTB2022TIAD-STX0006).

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

Authors Luyao Hu, Guangpu Han, Shichang Liu, Yuqing Ren, Xu Wang, and Ya Liu were employed by the PetroChina Southwest Oil & Gasfield Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Wang, Z.; Zhu, Y.; Zhang, Q.; Liu, H.; Wang, C.; Liu, T. Graph-Enhanced Spatial-Temporal Network for Next POI Recommendation. ACM Trans. Knowl. Discov. Data 2022, 16, 1–21.
2. Zhao, S.; Zhao, T.; Yang, H.; Lyu, M.; King, I. STELLAR: Spatial-Temporal Latent Ranking for Successive Point-of-Interest Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
3. Wang, Z. Learning Graph-Based Disentangled Representations for Next POI Recommendation, n.d. Available online: https://api.semanticscholar.org/CorpusID:250340157 (accessed on 5 February 2025).
4. Yin, S.; Xia, Y.; Liu, Y.; Han, S.; Ouyang, Z. Fusing User Preferences and Spatiotemporal Information for Sequential Recommendation. IEEE Access 2022, 10, 89545–89554.
5. Aljunid, M.F.; Dh, M. An efficient deep learning approach for collaborative filtering recommender system. Procedia Comput. Sci. 2020, 171, 829–836.
6. Yin, P.; Wang, J.; Zhao, J.; Wang, H.; Gan, H. Deep collaborative filtering: A recommendation method for crowdfunding project based on the integration of deep neural network and collaborative filtering. Math. Probl. Eng. 2022, 2022, 1–15.
7. Liu, S.; Li, Z.; Li, J.; Wang, S. Recency-based spatio-temporal similarity exploration for POI recommendation in location-based social networks. Front. Sustain. Cities 2024, 7, 1331642.
8. Zhao, P.; Luo, A.; Liu, Y.; Zhuang, F.; Xu, J.; Li, Z.; Sheng, V.S.; Zhou, X. Where to go next: A spatio-temporal gated network for next poi recommendation. IEEE Trans. Knowl. Data Eng. 2020, 34, 2512–2524.
9. Luo, Y.; Liu, Q.; Liu, Z. Stan: Spatio-temporal attention network for next location recommendation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2177–2185.
10. Feng, J.; Li, Y.; Zhang, C.; Sun, F.; Meng, F.; Guo, A.; Jin, D. DeepMove: Predicting Human Mobility with Attentional Recurrent Networks. In Proceedings of the 2018 World Wide Web Conference on World Wide Web (WWW’18), Lyon, France, 23–27 April 2018.
11. Yang, S.; Liu, J.; Zhao, K. GETNext: Trajectory flow map enhanced transformer for next POI recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1144–1153.
12. Han, H.; Zhang, M.; Hou, M.; Zhang, F.; Wang, Z.; Chen, E.; Wang, H.; Ma, J.; Liu, Q. STGCN: A spatial-temporal aware graph learning method for POI recommendation. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1052–1057.
13. Wang, Z.; Zhu, Y.; Wang, C.; Ma, W.; Li, B.; Yu, J. Adaptive Graph Representation Learning for Next POI Recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 393–402.
14. Qin, Y.; Wang, Y.; Sun, F.; Ju, W.; Hou, X.; Wang, Z.; Cheng, J.; Lei, J.; Zhang, M. DisenPOI: Disentangling sequential and geographical influence for point-of-interest recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 508–516.
15. Wang, X.; Liu, X.; Li, L.; Chen, X.; Liu, J.; Wu, H. Time-aware user modeling with check-in time prediction for next POI recommendation. In Proceedings of the 2021 IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, 5–10 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 125–134.
16. Lai, Y.; Su, Y.; Wei, L.; Chen, G.; Wang, T.; Zha, D. Multi-view Spatial-Temporal Enhanced Hypergraph Network for Next POI Recommendation. In Proceedings of the International Conference on Database Systems for Advanced Applications, Tianjin, China, 17–20 April 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 237–252.
17. Lim, N.; Hooi, B.; Ng, S.K.; Wang, X.; Goh, Y.L.; Weng, R.; Varadarajan, J. STP-UDGAT: Spatial-temporal-preference user dimensional graph attention network for next POI recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 845–854.
18. Liu, Q.; Wu, S.; Wang, L.; Tan, T. Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
19. Pan, W.; Yang, K. Enhanced Multi-Head Self-Attention Graph Neural Networks for Session-based Recommendation. Eng. Lett. 2021, 30. Available online: https://www.engineeringletters.com/issues_v30/issue_1/EL_30_1_05.pdf (accessed on 10 March 2025).
20. Wang, C.; Yuan, M.; Zhang, R.; Peng, K.; Liu, L. Efficient point-of-interest recommendation services with heterogenous hypergraph embedding. IEEE Trans. Serv. Comput. 2022, 16, 1132–1143.
21. Wang, X.; Fukumoto, F.; Cui, J.; Suzuki, Y.; Li, J.; Yu, D. Eedn: Enhanced encoder-decoder network with local and global context learning for poi recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 383–392.
22. Benson, A.R.; Gleich, D.F.; Leskovec, J. Higher-order organization of complex networks. Science 2016, 353, 6163–6166.
23. Rao, X.; Chen, L.; Liu, Y.; Shang, S.; Yao, B.; Han, P. Graph-flashback network for next location recommendation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 1463–1471.
24. Xia, L.; Huang, C.; Xu, Y.; Zhao, J.; Yin, D.; Huang, J. Hypergraph contrastive collaborative filtering. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 70–79.
25. Bai, S.; Zhang, F.; Torr, P.H.S. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637.
26. Ahmadian Yazdi, H.; Seyyed Mahdavi, S.J.; Ahmadian Yazdi, H. Dynamic educational recommender system based on improved LSTM neural network. Sci. Rep. 2024, 14, 4381.
27. Lai, Y.; Su, Y.; Wei, L.; He, T.; Wang, H.; Chen, G.; Zha, D.; Liu, Q.; Wang, X. Disentangled Contrastive Hypergraph Learning for Next POI Recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’24), Washington, DC, USA, 14–18 July 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1452–1462.
28. Cheng, Z.; Caverlee, J.; Lee, K.; Sui, D. Exploring Millions of Footprints in Location Sharing Services. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2022; Volume 5, pp. 81–88.
29. Liu, S.; Li, Z.; Li, J.; Li, X.; Wang, S. A Sequential Recommendation Model for Balancing Long- and Short-Term Benefits. Front. Inf. Technol. Electron. Eng. 2024, 25, 1509–1527.
30. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 639–648.
31. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
32. Zhao, P.; Zhang, X.; Zhang, Z. Multi-Scale Temporal Hypergraph Modeling for POI Recommendation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2345–2353.
33. Wang, Y.; Liu, Y.; Li, J. Modeling Multi-Scale Hypergraph Structures for Next POI Recommendation. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management, Atlanta, GA, USA, 17–22 October 2022; pp. 1021–1030.
34. Gao, M.; Zhang, J.; Chen, L. STAN+: Spatio-Temporal Attention Network for Next POI Recommendation. IEEE Trans. Knowl. Data Eng. 2021, 33, 1481–1494.
35. Chen, M.; Huang, C.; Xia, L.; Wei, W.; Xu, Y.; Luo, R. Heterogeneous graph contrastive learning for recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 544–552.

Figure 1. The overall framework of our proposed MVHGAT.

Figure 2. Hyperparameter study of MVHGAT. (a) Performance comparison in terms of Recall@10 with different numbers of layers. (b) Performance comparison in terms of NDCG@10 with different numbers of layers.

Table 1. Dataset statistics.

Dataset	#Users	#POIs	#Check-Ins	#Sessions	Sparsity
NYC	834	3835	44,686	8841	98.61%
TKY	2173	7038	308,566	41,307	97.82%
Gowalla	5802	40,868	301,080	75,733	99.87%

Table 2. Performance comparison on three datasets regarding Recall (R@K) and NDCG (N@K). The relative improvements are calculated between the best and the second-best scores.

Method	NYC R@5	NYC R@10	NYC N@5	NYC N@10	TKY R@5	TKY R@10	TKY N@5	TKY N@10	Gowalla R@5	Gowalla R@10	Gowalla N@5	Gowalla N@10
UserPop	0.2866	0.3297	0.2283	0.2423	0.2229	0.2668	0.1718	0.1861	0.0982	0.1489	0.0907	0.1336
STGN	0.2371	0.2594	0.2261	0.2307	0.2112	0.2587	0.1482	0.1589	0.1600	0.2041	0.1191	0.1333
STAN	0.3523	0.3827	0.3025	0.3137	0.2621	0.3317	0.2074	0.2189	0.2449	0.2878	0.1837	0.1942
LightGCN	0.3221	0.3488	0.2958	0.3042	0.2213	0.2594	0.1977	0.2098	0.2356	0.2590	0.1801	0.1915
GETNext	0.3572	0.3866	0.3079	0.3094	0.2686	0.3282	0.2212	0.2242	0.2425	0.2882	0.1986	0.2003
MSTHN	0.4076	0.4398	0.3612	0.3702	0.3378	0.3927	0.2567	0.2721	0.2331	0.2853	0.1825	0.2120
HCCF	0.3534	0.3745	0.3025	0.3134	0.2689	0.3253	0.2325	0.2424	0.2451	0.2933	0.1936	0.2005
ASTHL	0.4119	0.4477	0.3597	0.3707	0.2967	0.3509	0.2358	0.2506	0.2827	0.3271	0.2285	0.2402
MVHGAT	0.4283	0.4761	0.3759	0.3917	0.3462	0.4103	0.2751	0.2878	0.2957	0.3386	0.2373	0.2550
%Improv.	4.03%	6.34%	4.07%	5.81%	2.49%	4.53%	7.16%	5.77%	4.60%	3.51%	3.85%	6.17%
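
The %Improv. row follows from the rule stated in the caption of Table 2. The following is a minimal sketch of that arithmetic for the NYC R@10 column, assuming the improvement is the gap between MVHGAT and the strongest baseline divided by the baseline score. The function name `relative_improvement` and the dictionary layout are illustrative choices, not part of the original work. Because the scores in the table are already rounded to four decimals, recomputing other columns this way may differ from the reported percentages in the last digit or so.

```python
def relative_improvement(best: float, second_best: float) -> float:
    """Relative gain of `best` over `second_best`, in percent (assumed formula)."""
    if second_best <= 0:
        raise ValueError("second_best must be positive")
    return (best - second_best) / second_best * 100.0


# NYC R@10 scores of the baselines, copied from Table 2.
baselines = {
    "UserPop": 0.3297,
    "STGN": 0.2594,
    "STAN": 0.3827,
    "LightGCN": 0.3488,
    "GETNext": 0.3866,
    "MSTHN": 0.4398,
    "HCCF": 0.3745,
    "ASTHL": 0.4477,
}
mvhgat = 0.4761  # NYC R@10 of MVHGAT, copied from Table 2

# The second-best score is the strongest baseline in this column.
runner_up = max(baselines, key=baselines.get)
gain = relative_improvement(mvhgat, baselines[runner_up])
print(f"{runner_up}: {gain:.2f}%")  # ASTHL: 6.34%
```

For this column the sketch reproduces the 6.34% reported in the table, with ASTHL as the second-best method.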

Table 3. Ablation study on key components of MVHGAT.

Method	NYC R@10	NYC N@10	TKY R@10	TKY N@10	Gowalla R@10	Gowalla N@10
w/o I	0.4345	0.3552	0.3368	0.2464	0.3189	0.2315
w/o T	0.4456	0.3637	0.3389	0.2476	0.3196	0.2327
w/o P	0.4324	0.3547	0.3341	0.2451	0.3171	0.2303
w/o MCL	0.4401	0.3643	0.3402	0.2459	0.3219	0.2336
MVHGAT	0.4761	0.3917	0.4103	0.2878	0.3386	0.2550

Table 4. Performance comparison of different hypergraph convolutional methods with respect to Recall@10 and NDCG@10.

Method	NYC R@10	NYC N@10	TKY R@10	TKY N@10
C-LightGCN	0.4663	0.3868	0.3665	0.2658
T-LightGCN	0.4754	0.3897	0.3969	0.2694
G-LightGCN	0.4732	0.3889	0.3966	0.2797
MVHGAT	0.4761	0.3917	0.4103	0.2878

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
