1. Introduction
As the Internet continues to expand, the volume of information generated and consumed online has grown enormously, and the amount accessible to an individual on any given day is vast. Much as a shopper in a large mall can be overwhelmed by the sheer number of categories and struggle to locate a desired item, people confronted with too much information are swamped by irrelevant content and cannot find what they need. Recommender systems were developed to filter out redundant information and surface appropriate content, and they have become an important growth engine for a large number of Internet companies [1,2,3]. Google (Mountain View, CA, USA), one of the largest Internet companies in the world, has profited enormously from efficient recommendation: in its search-engine and news businesses, accurate recommendation has attracted huge user traffic, and advertising placed through recommendation algorithms has driven rapid growth. It is fair to say that the recommendation system is a necessary tool for the profitability and development of the contemporary Internet.
With the development of various service websites and applications, new users and new content are constantly being added, which poses a great challenge to traditional recommender systems. Traditional recommender systems [4] rely on a large volume of user–content interaction history to learn user preferences, understand content, and match content to the right users. When new users or new content are added, however, they have little or no interaction history, so traditional methods cannot model their preferences or characteristics, which lowers the accuracy of the recommendation results [5,6]. Consider the classical collaborative filtering algorithm [1]. It constructs a co-occurrence matrix from the interaction history between users and content, containing each user’s ratings of the content. To produce a recommendation, the algorithm extracts co-occurrence signals to find other users whose interests are similar to those of the current user, estimates the current user’s likely rating of each piece of content from the inter-user similarities and those users’ ratings, and ranks the content collection by the estimated ratings so as to recommend the content the user is most likely to enjoy. When new users and new content enter the system, their near-total lack of interaction history makes the co-occurrence matrix extremely sparse; the model cannot extract effective co-occurrence signals to find similar users or similar content, let alone produce accurate rating predictions, and therefore cannot complete the subsequent steps of finding content the user is likely to enjoy. Related improvements such as matrix factorisation also fail to produce accurate rating predictions, because the data are too sparse to compute accurate vector representations.
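The user-based collaborative filtering pipeline described above can be sketched in a few lines. This is a minimal illustration on a hypothetical toy rating matrix (all values invented; 0 marks a missing rating), not the exact formulation used in any cited work:

```python
import math

# Toy co-occurrence (rating) matrix: rows = users, columns = items; 0 = unrated.
ratings = [
    [5, 3, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 5, 5],
    [1, 0, 4, 4],
]

def cosine(u, v):
    """Cosine similarity between two users over co-rated items only."""
    pairs = [(a, b) for a, b in zip(u, v) if a > 0 and b > 0]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    nu = math.sqrt(sum(a * a for a, _ in pairs))
    nv = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for the item."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[item] == 0:
            continue
        s = cosine(ratings[user], ratings[other])
        num += s * row[item]
        den += abs(s)
    return num / den if den else 0.0

# Rank user 0's unrated items by predicted score.
scores = {i: predict(0, i) for i, r in enumerate(ratings[0]) if r == 0}
```

With sparse new users, `cosine` finds no (or too few) co-rated items, the similarity signal vanishes, and `predict` degenerates, which is exactly the failure mode described above.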
With the development of deep learning, recommender systems have also ushered in a new era [7]. The introduction of deep learning algorithms significantly improved the efficiency and accuracy of recommendation, but the cold-start problem (that is, the problem of recommending for new users and new content) remains a key challenge. Deep learning-based algorithms mainly use deep networks to learn vector representations of users and content from their interaction data [8]; a user’s predicted rating of a piece of content is then obtained by feeding the two vector representations into a trained deep network. However, when new users or new items are added to the system, such algorithms often struggle to provide accurate recommendations due to the lack of historical interaction data, a problem known as the cold-start problem [9].
The interplay between the cold-start problem, large user bases, and related derived issues can be examined from the following perspectives:
The amplification effect of data sparsity: when the user base is very large (for instance, TikTok (Beijing, China) has roughly 1.5 billion global monthly active users), a 1% share of new users corresponds to 15 million individuals encountering recommendation system failures. Observational studies indicate that new users on e-commerce platforms achieve only 23% of the first-order conversion rate of returning users (as reported in Alibaba (Hangzhou, China)’s 2022 annual report). This “data scarcity trap” escalates as the scale of the platform grows.
The privacy paradox of cross-domain migration: The cross-platform data integration adopted to address the cold-start problem triggers privacy compliance risks when the user scale exceeds a critical point (approximately 500 million). Meta’s cross-domain tracking tool led to a GDPR fine of 746 million euros in 2023, demonstrating that the scale effect amplifies legal risks.
Research on user loyalty indicates that when users are aware that their data are being utilised for cross-domain recommendations, the risk of attrition among high-value users (ARPU > $20) increases by 2.4 times (Gartner (Stamford, CT, USA) 2023).
For service software and websites, when new users do not receive a good enough experience on first contact, i.e., they do not receive the recommended content they want, those users are easily lost, which seriously hinders user growth and future development and can even cause serious economic losses. For new content, the cold-start problem can prevent the content from getting enough exposure, creating a severe long-tail effect. Effectively solving the cold-start problem [10] therefore helps recommendation-based platforms attract new users, make better use of new content, achieve better user growth, and bring more economic benefits to the companies involved.
In recent years, meta-learning, a technique capable of quickly adapting to new tasks by training across multiple learning tasks, has been applied to the cold-start problem of recommender systems with promising results. Most existing meta-learning algorithms assume that prior knowledge can be shared globally among all users. However, information sharing among users with vastly different interests is ineffective or even harmful, which drives the model towards suboptimal solutions. Improving meta-learning models is therefore an important and promising way to effectively mitigate the cold-start problem in recommender systems. The contributions of our work are summarized below:
In this paper, we introduce a graph community detection clustering algorithm to classify users into clusters, thereby improving the effectiveness of meta-learning, and we experimentally verify that this idea improves the performance of cold-start recommendations.
This is the first time that meta-learning algorithms have been combined with graph community detection clustering algorithms; specifically, graph community detection is used as a preparatory step for the meta-learning method.
2. Analysis of the Current Status of the Art and Its Shortcomings
Due to the significant role of cold-start recommendations in user growth, cold-start-related algorithms are receiving increasing attention. The primary cause of the cold-start problem is insufficient information, which leaves the recommender system without knowledge of the user or the content. Early solutions primarily relied on side information, such as gender, region, age, behaviour on other social platforms, and social relationships. These approaches did not yield satisfactory results, because mining side information is difficult, consumes substantial resources, and raises privacy concerns. As small-data learning methods gained traction in computer vision with promising outcomes, they were gradually applied to the cold-start recommendation problem. Meta-learning, contrastive learning, and generative adversarial networks are widely employed algorithms that have demonstrated success across various cold-start scenarios. These approaches do not require large quantities of labelled data; instead, they achieve desirable recommendation performance by leveraging limited user interaction data, effectively mitigating the challenges posed by cold-start issues. Existing research methods can be broadly classified into two categories: data-based methods and small-data learning model methods.
Among the data-based approaches, e.g., cross-domain migration techniques, heterogeneous graphs were introduced to mitigate the cold-start recommendation problem [11]. Although these approaches mitigate the cold-start problem to some extent, the results are still not good enough. For cross-domain methods, obtaining information about another domain is very difficult in practice; for methods based on similar users and content, the very small amount of information available about both users and content in a cold-start situation also makes it very difficult to classify users accurately. Thanks to its multi-modal fusion capability and semantics-aware relational reasoning, the graph-based approach can exploit easily obtained and abundant implicit information, a great advantage over previous methods. However, the lack of sufficient supervised signals to guide the model in cold-start scenarios means that graph-structure-based methods cannot accurately exploit the mined user and content information, and the generated vector representations are still not accurate enough.
Among the methods based on learning models from small data, the GAN (Generative Adversarial Network) is a relatively early proposal; it addresses insufficient training data by generating suitable virtual data close to the real data [12] and has therefore also been used to alleviate the cold-start problem in the recommendation domain. Contrastive learning is another common method for learning representations from small data; it can generate better vector representations with few supervised signals, or even none at all, by mining the features of the data itself. Meta-learning, also known as learning to learn, aims to learn common prior knowledge across all similar tasks [13] so that a new task can be adapted to quickly from a small number of samples. Owing to its prominence in small-data image recognition, attempts have been made to use it to solve the cold-start recommendation problem [14].
Meta-learning has demonstrated promising potential in mitigating the cold-start problem by leveraging user content information, and it effectively circumvents overfitting by using a finite number of gradient descent steps [15]. However, conventional meta-learning algorithms rely only on the similarities among users, without considering the differences among them [16]. Information sharing among users with large interest gaps causes the effective information of the current user to be overwhelmed by a large amount of invalid information, ultimately degrading performance [17].
3. Related Works
3.1. Cold-Start Recommendations
The cold-start recommendation problem is a recognised difficulty in the field of recommender systems [18]. It refers to the difficulties faced by recommendation algorithms when there is limited information about user preferences or when new users join the system. In such cases, it is difficult for the algorithms to provide accurate recommendations due to the lack of user data [19].
One of the primary causes of the cold-start problem is the paucity of user data [20]. When a new user engages with a recommender system, limited information exists regarding their preferences, which makes it challenging for the algorithm to accurately predict items that would pique the user’s interest; consequently, the recommendations provided are prone to be ineffective. Another contributing factor is the scarcity of item data [21]. In numerous recommender systems, there tend to be far more items than users, resulting in an abundance of unknown preferences [22]. Even if the algorithm possesses some insight into user preferences, it may not suffice for making precise predictions about future behaviour [23]. The dearth of item data further exacerbates this predicament.
In recent years, machine learning techniques have been used to address the cold-start problem. In particular, deep learning models have shown promising results in dealing with the sparsity of user and item data. These models can learn hidden patterns and representations from the available data to make more accurate predictions. However, they require significant computational resources and training time and may not be feasible for real-time recommender systems.
3.2. Meta-Learning
3.2.1. Meta-Learning
The fundamental distinction between meta-learning and deep learning resides in their respective objectives [24]: whereas the latter directly learns a mathematical model for making predictions, the former essentially learns a meta-process of “learning how to learn”. In traditional machine learning, an algorithm is chosen, a set of parameters is obtained by fitting the training data, and the final result is obtained by applying the model to test data; in meta-learning, the goal is to learn the algorithm itself. In a single task [25], the meta-learning algorithm learns the parameter φ after obtaining the training data and uses it to select the most applicable learning function F; applying F to the training data again determines the final model, into which the test data are substituted to obtain the final prediction. When multiple tasks are encountered, however, the chosen F must be effective for all tasks, which places a higher requirement on the parameter φ. Meta-learning derives, from the content of the training data, the algorithm that is most appropriate for the training data across all classification tasks [26], thereby generating the corresponding models. The learning process is detailed in Figure 1.
So how do we obtain a reasonable parameter φ? As in traditional machine learning, the core process of meta-learning also includes model construction, data feedback, and gradient optimisation, but its essential difference is that the task, rather than the individual data sample, is taken as the learning unit, and all loss computations are performed on test-environment samples. The training data are divided into a support set and a query set: the support set is used to optimise the model parameters, while the query set is used specifically to optimise the algorithm parameters. During training, the meta-learning algorithm is trained on the support set to obtain a generic model, which is evaluated on the query set to obtain the algorithmic parameter φ; at test time, the model is updated on the support set of the test set to obtain a new model, which then makes the final prediction on the query set.
From a methodological point of view, meta-learning is divided into three categories [27]: optimisation-based, model-based, and metric-based. Among them, we introduce a typical optimisation-based method: MAML.
3.2.2. MAML
MAML stands for Model-Agnostic Meta-Learning; being model-agnostic, it is compatible with any existing model and helps it achieve better performance in small-data situations. After learning the optimal initialisation parameters, the MAML algorithm [28] can complete fast convergence and fine-tuning with a small quantity of data on a specific task, helping small-data tasks achieve good performance with only a small quantity of labelled data.
MAML mainly addresses multiple small-data tasks of the same type, where the important quantities are the number of tasks and the quantity of data within a single task [29]. The essence of MAML is to compensate for the insufficient information of small-data tasks by using the training data to learn a priori knowledge that can be shared among all tasks; this shareable a priori knowledge is embodied in the initialisation parameter weights learned at the end. As shown in Figure 2, an optimal model initialisation parameter θ is first learned; then, for each specific subtask, the optimal initialisation parameters θ1, θ2, and θ3 can be obtained by fine-tuning through a finite number of gradient descent steps on a small quantity of known data. Because the initialisation parameters are close to the optimal solution of the specific task, the model parameters can be obtained within a limited number of gradient descent steps, which effectively avoids the serious overfitting caused by running many gradient descent steps on a small quantity of data. For ease of illustration, let the task be N-way and T-shot, that is, N similar subtasks, each with T pieces of known data. Let the training and test sets of the whole task be D_train and D_test, each subdivided into a support set and a query set: the support set and query set in the training set are known data; in the test set, the support set is the known small-data part of the task and the query set is the part to be predicted; and the support sets of both contain only a small quantity of known data. The parameter-updating process of meta-learning during training is divided into local and global updating [30].
In the meta-learning framework, the predictive model’s parameters undergo rapid task-specific adaptation via a localised updating procedure. This adaptation mechanism is essentially a limited-step gradient descent optimisation performed on the support set of an individual subtask:

θ_i = θ − α ∇_θ L(θ; S_i)    (1)

where θ_i denotes the optimal parameter for the specific i-th subtask, θ is the global parameter for model initialisation, L(θ; S_i) denotes the loss of the global parameter θ on the support set S_i of subtask i, and α denotes the learning rate, through which the global parameter is transformed into the optimal model parameter for subtask i. This is followed by the global update process, whose goal is to learn, through multiple known subtasks, prior knowledge that can be shared among all subtasks, using the data from those subtasks to optimise the global parameter and help the model find the optimal initialisation parameters:
θ ← θ − β ∇_θ Σ_{i=1}^{N} L(θ_i; Q_i)    (2)

where β denotes the global learning rate, N denotes the total number of subtasks, and L(θ_i; Q_i) denotes the loss of the fine-tuned model parameter θ_i on the query set Q_i of subtask i; minimising the loss over the query sets of all subtasks of the same type helps the model find the optimal initialisation parameter.
At test time, only the local updating process is performed: after obtaining the optimal subtask model parameters by gradient descent on the support set of the test subtasks, more accurate inference can be carried out on the query-set data to be predicted.
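The local and global updates described above can be sketched with a deliberately tiny example. This is a first-order approximation (the second-derivative term of the full MAML meta-gradient is dropped, as in FOMAML), on a hypothetical family of tasks where task i fits y = w_i·x with a one-parameter model f(x) = θ·x; all task slopes, data, and learning rates are invented for illustration:

```python
# First-order sketch of MAML's local (inner) and global (outer) updates on a
# toy task family: each task fits y = w * x with model f(x) = theta * x and
# squared loss. The same inputs stand in for both support and query sets.

xs = [1.0, 2.0]                     # shared inputs (toy support/query data)

def grad(theta, w):
    # d/dtheta of mean((theta*x - w*x)^2) over xs
    return sum(2 * x * (theta * x - w * x) for x in xs) / len(xs)

tasks = [1.0, 3.0]                  # per-task ground-truth slopes w_i
alpha, beta = 0.05, 0.1             # local and global learning rates
theta = 0.0                         # global initialisation parameter

for _ in range(50):                 # global (outer) updates
    meta_grad = 0.0
    for w in tasks:
        theta_i = theta - alpha * grad(theta, w)   # local update on support set
        meta_grad += grad(theta_i, w)              # first-order query-set gradient
    theta -= beta * meta_grad                      # global update of theta

# theta settles between the task optima: a good starting point for both tasks,
# from which one local gradient step moves markedly closer to either optimum.
```

With the two symmetric tasks, θ converges to the midpoint of the task optima, illustrating why a single global initialisation is a compromise across all tasks.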
3.3. Meta-Learning Recommendation Algorithm Applying Clustering
Recently, meta-learning methods have achieved remarkable success in the field of few-shot learning [31]. As a result, many meta-learning-based recommendation models have been proposed to alleviate the cold-start problem. They treat users or items as tasks and interaction records as samples, and they learn an optimal initialisation parameter by sharing the knowledge of all users, which can be quickly adapted to new users with a finite number of gradient descent steps. However, when solving the cold-start problem, meta-learning usually relies on extracting transferable patterns from a small quantity of data, and clustering over rough data trends (cluster-based meta-learning) can become a key bottleneck for the following reasons:
Data sparsity: the small quantity of data in the cold-start phase leads to unstable clustering (such as K-means, DBSCAN) results, which affects the construction of meta-tasks.
Noise sensitivity: meta-learning (such as MAML) relies on high-quality task distribution, but outliers (such as order-brushing behaviour, sensor error data) distort the cluster structure and cause the model to learn the wrong pattern.
Gradient contamination: the two-layer optimisation of meta-learning (inner-loop adaptation and outer-loop meta-update) is sensitive to noise, and abnormal samples may cause model parameter drift and reduce generalization ability.
Due to the destructive impact of outliers on meta-learning and the cold-start phase’s high dependence on data quality, data cleaning is not optional but mandatory when outliers appear in the cold-start phase; otherwise, meta-learning may learn the wrong transfer patterns. From existing research [32], the key data-cleaning methods can be roughly divided into three categories: anomaly detection to remove obvious noise, robust meta-learning to deal with residual noise, and data augmentation to compensate for data loss. Anomaly detection includes clustering methods such as DBSCAN, a density-based clustering algorithm that automatically identifies outliers (samples that cannot be assigned to any cluster), and Isolation Forest, which is specialised for anomaly detection and suitable for high-dimensional sparse data. Despite their degree of success in mitigating the cold-start problem, most existing meta-learning algorithms assume that transferable knowledge can be shared globally among all users. In real recommender systems, however, user preferences differ enormously, and information sharing among users with opposite interests is useless or even harmful to performance. At the same time, the gradient descent directions for users with different preferences vary greatly or are even opposite, so the initialisation parameters learned by existing meta-learning methods are not optimal for most users. Solving these problems requires classifying users under cold-start conditions. Most existing cold-start user classification methods look for highly discriminative content and classify users based on their ratings of that content; these methods perform poorly in most cases, because the scarcity of new-user rating data makes such content difficult to find. Therefore, building on meta-learning, this paper proposes a new idea for user classification.
Unlike existing meta-learning methods that directly learn global initialisation parameters for all users, this paper uses a clustering algorithm for graph community detection to classify users and customises unique optimal initialisation parameters for users with similar interests, which are more suitable for these users than directly learned global initialisation parameters and can be adapted to this interest cluster more quickly. In this way, the model can find users with similar interests and share knowledge only among them at a fine-grained level to improve recommendation results.
3.3.1. Graph Community Detection
Graph community detection algorithms are generally used when the samples can be represented as a network or graph, in order to find the more closely connected samples among them. A graph community is usually defined as a subset of vertices, where the vertices in each subset are more tightly connected to one another than to the other vertices in the network. The criterion used to measure the quality of a partition into communities is called modularity; a larger value means a better partition, and it is calculated as in Equation (3):

M = (1/2L) Σ_{i,j} [ A_ij − (k_i k_j)/(2L) ] δ(c_i, c_j)    (3)

where L denotes the number of edges in the graph, N denotes the number of vertices, A_ij denotes the number of edges between vertex i and vertex j, (k_i k_j)/(2L) denotes the expected number of edges between vertices i and j if the edges were placed at random, k_i and k_j denote the degrees of the two vertices (that is, how many edges are incident to each), and δ(c_i, c_j) is Kronecker’s delta function, which returns 1 if its two arguments are equal and 0 otherwise. In Equation (3), δ(c_i, c_j) therefore returns 1 if the two vertices belong to the same community and 0 otherwise.
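Modularity as defined in Equation (3) can be computed directly from an adjacency matrix and a community assignment. A minimal sketch on a hypothetical toy graph (two triangles joined by a single bridge edge, chosen because its two communities are obvious):

```python
def modularity(adj, labels):
    """Modularity M of a partition per Equation (3); adj is a symmetric 0/1
    adjacency matrix and labels[i] is the community of vertex i."""
    n = len(adj)
    two_l = sum(sum(row) for row in adj)          # 2L: each edge counted twice
    deg = [sum(row) for row in adj]               # vertex degrees k_i
    m = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:            # Kronecker delta term
                m += adj[i][j] - deg[i] * deg[j] / two_l
    return m / two_l

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1
```

On this graph, the two-triangle partition scores M = 5/14 ≈ 0.357, putting everything in one community scores 0, and a partition that mixes the triangles scores lower, matching the intuition that higher M means a better community structure.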
The overall execution process of the graph community detection algorithm is as follows:
Each vertex is first formed into a cluster independently, and the modularity M of the whole network is computed.
Select two clusters to fuse and recompute the modularity of the whole network to derive the change ΔM.
Perform the fusion of the two clusters that yields the largest increase ΔM and compute the new modularity M.
Repeat steps 2 and 3 until all vertices have been merged into a single cluster, and return the partition that achieved the highest value of M.
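The four steps above can be sketched as a greedy agglomerative loop. For clarity this illustrative version recomputes modularity from scratch for every candidate fusion (real implementations update ΔM incrementally); the helper and the two-triangle toy graph are invented for the example:

```python
from itertools import combinations

def modularity(adj, labels):
    """Modularity of a partition (Equation (3)); adj is 0/1 and symmetric."""
    two_l = sum(map(sum, adj))
    deg = [sum(row) for row in adj]
    return sum(adj[i][j] - deg[i] * deg[j] / two_l
               for i in range(len(adj)) for j in range(len(adj))
               if labels[i] == labels[j]) / two_l

def greedy_communities(adj):
    """Steps 1-4: start with singleton clusters, repeatedly perform the fusion
    with the largest modularity increase, and return the best partition seen."""
    labels = list(range(len(adj)))                # step 1: one cluster per vertex
    best_m, best_labels = modularity(adj, labels), labels[:]
    while len(set(labels)) > 1:
        candidates = []
        for a, b in combinations(sorted(set(labels)), 2):   # step 2: try fusions
            trial = [a if l == b else l for l in labels]
            candidates.append((modularity(adj, trial), trial))
        m, labels = max(candidates, key=lambda t: t[0])     # step 3: best fusion
        if m > best_m:                                      # step 4: track max M
            best_m, best_labels = m, labels
    return best_m, best_labels

# Two triangles joined by one bridge edge form two obvious communities.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1
```

On this toy graph the greedy loop recovers the two triangles as the highest-modularity partition before the final (all-in-one) merge drives M back to zero.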
3.3.2. Louvain Algorithm
The Louvain algorithm is a modularity-based community detection algorithm. Its core idea is to traverse the community labels of each node’s neighbours and select the label yielding the largest modularity gain. The algorithm consists of two phases: first, the modularity optimisation phase, in which each node is initially assigned to its own community and nodes are then moved sequentially between communities to optimise modularity; second, the network aggregation phase, in which each community produced by the first phase is treated as a single node and the weights of the connections between these nodes are computed, enabling a second round of clustering that stabilises the result.
Since the Louvain algorithm is suitable for large-scale networks and has a relatively low time complexity, it is commonly used for community detection in graphs. In this paper, the Louvain algorithm was employed to achieve user clustering.
3.4. Meta-Learning Applied to User Clusters After Graph Community Detection
The idea proposed in this paper is to cluster users with a graph community detection algorithm before the meta-learning algorithm is trained in the recommender system, and then to apply the meta-learning algorithm to the resulting user clusters.
3.4.1. Graph Community Detection Algorithm Usage
Before training the meta-learning algorithm, we use the graph community detection algorithm described above to cluster the users, representing the relationships between users as a graph; such a relationship can be following each other, liking each other, liking the same content, and so on, in order to find users who are closely related. A comparison before and after applying the graph community detection algorithm is shown in Figure 3.
3.4.2. Application of User Cluster Meta-Learning Based on Graph Community Detection
After the user cluster is completed, the system realizes the classification of existing users. For different user groups, we used the Melu framework for meta-learning training. The core idea of Melu is to treat each user as an independent learning task and share knowledge among all users through meta-learning, so as to learn the optimal model initialisation parameters. This design is particularly suitable for solving the cold-start problem in recommendation systems: when a new user joins, the model can quickly adapt to the characteristics of the new user through only a few gradient descent steps to achieve accurate recommendation.
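The effect of per-cluster initialisations can be sketched by reusing the toy one-parameter model from Section 3.2.2's illustration: each user task fits y = w·x, and instead of one global meta-initialisation we train one per interest cluster. All clusters, slopes, and rates below are invented; this is a first-order sketch of the idea, not the Melu implementation:

```python
# Cluster-specific meta-initialisation: each interest cluster learns its own
# starting parameter instead of sharing one compromise global value.

xs = [1.0, 2.0]                     # toy support/query inputs

def grad(theta, w):
    # d/dtheta of mean((theta*x - w*x)^2) over xs
    return sum(2 * x * (theta * x - w * x) for x in xs) / len(xs)

def meta_init(user_slopes, alpha=0.05, beta=0.1, steps=50):
    """First-order meta-training over one group of user tasks."""
    theta = 0.0
    for _ in range(steps):
        g = sum(grad(theta - alpha * grad(theta, w), w) for w in user_slopes)
        theta -= beta * g / len(user_slopes)
    return theta

# Two hypothetical interest clusters with similar within-cluster preferences.
clusters = {"A": [0.9, 1.0, 1.1], "B": [4.9, 5.0, 5.1]}
inits = {c: meta_init(ws) for c, ws in clusters.items()}
global_init = meta_init([w for ws in clusters.values() for w in ws])

# A new user assigned to cluster "A" starts from inits["A"], which lies far
# closer to their optimum than the compromise global initialisation does.
```

The global initialisation lands midway between the two clusters, whereas each cluster-specific initialisation sits on top of its own users' optima, which is exactly the fine-grained knowledge sharing argued for above.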
For the processing of new users, the system does not need to recluster all users. A graph-based recommendation system automatically associates new users with existing user groups. This approach circumvents the computational inefficiency inherent in conventional clustering techniques while simultaneously enabling real-time preference inference for new users through similarity group profiling, thereby providing an effective solution to the cold-start problem.
4. Experimental Set-Up and Assessment Indicators
4.1. Experimental Datasets
In this paper, experiments were conducted on two real-world datasets, Last.fm and MovieLens, from publicly accessible repositories.
For each dataset, in this paper, we divided the data into two groups, i.e., existing users and content and new users and content, based on the time when users joined (or the time of the user’s first operation) and the time of the content release. Then, we split each dataset into a training set and a test set. The remaining data were divided into four recommendation scenarios according to users and content, namely, the traditional scenario, the user cold-start scenario, the content cold-start scenario, and the user and content cold-start scenario, to evaluate the final performance of the model.
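The partition into the four scenarios can be sketched as a simple bucketing by whether the user and the content are "new" relative to a cutoff time. The records, field names, and cutoff below are hypothetical, purely to illustrate the split:

```python
# Bucket interactions into the four evaluation scenarios by whether the user
# joined, and the item was released, after a chosen cutoff time (all toy data).

CUTOFF = 2020

interactions = [
    {"user_joined": 2018, "item_released": 2017, "rating": 4},
    {"user_joined": 2021, "item_released": 2017, "rating": 5},
    {"user_joined": 2018, "item_released": 2022, "rating": 3},
    {"user_joined": 2021, "item_released": 2022, "rating": 2},
]

def scenario(rec):
    new_user = rec["user_joined"] > CUTOFF
    new_item = rec["item_released"] > CUTOFF
    if not new_user and not new_item:
        return "traditional"
    if new_user and not new_item:
        return "user_cold_start"
    if not new_user and new_item:
        return "item_cold_start"
    return "user_item_cold_start"

buckets = {}
for rec in interactions:
    buckets.setdefault(scenario(rec), []).append(rec)
```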
4.2. Evaluation Metrics for the Experiments in This Paper
In this paper, the recommendation results were evaluated using three widely used metrics: Mean Absolute Error (MAE), Root-Mean-Square Error (RMSE), and Normalised Discounted Cumulative Gain at level K (nDCG@K). We used K = 5; the first two metrics measure the accuracy of the model’s rating predictions, and the last measures the quality of the ranking produced for the final top-n recommendation.
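The three metrics are standard and can be stated compactly; a minimal reference implementation (the example relevance and prediction lists are invented):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error over rating predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-Mean-Square Error over rating predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ndcg_at_k(y_true, y_pred, k=5):
    """nDCG@K: rank items by predicted score, accumulate discounted true
    relevance over the top K, and normalise by the ideal ranking's DCG."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)[:k]
    dcg = sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sorted(y_true, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```

MAE and RMSE are minimised (lower is better), while nDCG@5 equals 1.0 only when the predicted ranking orders items exactly as their true relevance would.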
4.3. Experimental Process
We validated the proposed method in a variety of settings, covering score prediction in the traditional scenario, the user cold-start scenario, the content cold-start scenario, and the combined user and content cold-start scenario, and compared results with and without front-loading the graph community detection algorithm. Experiments were conducted on the two real-world datasets described above, Last.fm and MovieLens. For each dataset, the data were divided into existing users and content versus new users and content, based on when users joined (or the time of the first user action) and when the content was released, so as to evaluate the model’s metrics in both the cold-start conditions and the traditional scenario. Each dataset was then split into a training set and a test set. The training set contained a small quantity of user rating data, of which 15% were randomly sampled as a validation set for tuning the model structure and the corresponding hyper-parameters; the rest was divided into the four recommendation scenarios above for evaluating the final performance of the model. The comparative experiments focused on the influence of graph community detection (using the classic Louvain algorithm) as a pre-module for meta-learning-based user clustering. The meta-learning algorithm used was Melu [33]; we ran it with and without graph community detection and compared the results of the two experiments.
4.4. Experimental Results
4.4.1. Results of Data After Clustering by Louvain’s Algorithm
First, we analysed the Last.fm dataset by constructing an undirected graph linking users to artists, plotted in
Figure 4. We then performed community detection on the generated subgraph using Louvain’s algorithm and plotted the detected communities in
Figure 5.
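As a minimal sketch of this step, the following builds a small undirected user–artist graph and runs Louvain community detection via NetworkX (≥ 2.8); the toy edge list stands in for the real Last.fm data:

```python
import networkx as nx

# Toy user–artist edges (illustrative, not the Last.fm data):
# two dense groups joined by a single bridge edge (u2, a3).
edges = [("u1", "a1"), ("u1", "a2"), ("u2", "a1"), ("u2", "a2"),
         ("u3", "a3"), ("u3", "a4"), ("u4", "a3"), ("u4", "a4"),
         ("u2", "a3")]

G = nx.Graph()
G.add_edges_from(edges)

# Louvain community detection; the seed makes the greedy passes repeatable.
communities = nx.community.louvain_communities(G, seed=42)
```

Each element of `communities` is a set of nodes; colouring nodes by community membership produces plots analogous to Figure 5.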
Similarly, we performed the same experiment on MovieLens and obtained its graph and community detection results as shown in
Figure 6 and
Figure 7.
4.4.2. Comparison of Recommendation Effect After Applying Meta-Learning to Different Models
In the meta-learning recommendation experiments, we compared recommendations in the different scenarios with and without the front-loaded clustering algorithm and evaluated them using three widely used metrics, namely Mean Absolute Error (MAE), Root-Mean-Square Error (RMSE), and Normalised Discounted Cumulative Gain at level K (nDCG@K); the results are tabulated in
Table 1.
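For reference, the three metrics can be computed as below; the small rating vectors are made-up examples, not values from the experiments:

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error over paired true/predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-Mean-Square Error over paired true/predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def ndcg_at_k(y_true, y_pred, k):
    """nDCG@K: rank items by predicted score; gain = true rating."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)[:k]
    dcg = sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sorted(y_true, reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

y_true = [5, 3, 4, 1, 2]
y_pred = [4.8, 2.9, 4.1, 1.5, 2.2]
```

Lower MAE/RMSE and higher nDCG@K indicate better rating prediction and ranking quality, respectively.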
For recommendation in the traditional scenario, we likewise compared performance with and without the front-loaded clustering algorithm and obtained the results shown in
Table 2.
4.5. Analysis and Discussion of Experimental Results
Table 1 shows the performance of the model with the graph community detection clustering algorithm in the cold-start scenarios. On the whole, the model using Melu as the meta-learning algorithm performed well in the cold-start scenarios and effectively alleviated the cold-start problem. Its core advantages were as follows:
Parameter initialisation optimisation: by sharing global user information through meta-learning, the initial parameters of the model were closer to the optimal parameter distribution of users in vector space.
Fast adaptation ability: only a small amount of the user’s support-set data was required for fine-tuning, allowing the model parameters to converge quickly to the user’s personalised optimum.
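To illustrate the two points above, here is a deliberately tiny first-order MAML-style sketch on a one-parameter model; the quadratic per-user losses and all hyper-parameter values are illustrative assumptions, not the actual Melu configuration:

```python
# Each hypothetical user u has loss L_u(theta) = (theta - opt_u)^2;
# meta-training learns a shared initialisation close to all users' optima.
user_optima = [2.0, 2.5, 3.0, 3.5]
theta0 = 0.0                               # shared meta-initialisation
meta_lr, inner_lr, inner_steps = 0.1, 0.4, 1

def adapt(theta, opt, steps=inner_steps, lr=inner_lr):
    """Inner loop: a few gradient steps on one user's support set."""
    for _ in range(steps):
        theta -= lr * 2 * (theta - opt)    # d/dtheta of (theta - opt)^2
    return theta

# Outer loop (first-order approximation): nudge the initialisation using
# each user's gradient evaluated at the adapted parameters.
for _ in range(200):
    grad = sum(2 * (adapt(theta0, opt) - opt) for opt in user_optima)
    theta0 -= meta_lr * grad / len(user_optima)
```

After meta-training, `theta0` sits near the mean of the training users' optima, so fine-tuning for a new user starts much closer to that user's optimum than a random initialisation would.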
However, using Melu alone still has shortcomings. Globally sharing user knowledge helps learn initialisation parameters with strong generalisation, but for users with different or even opposite interests, the shared information may introduce negative interference. Specifically, during backpropagation, conflicting gradient directions can cause the initial parameters to deviate from the target user’s optimal parameter space, significantly degrading cold-start performance. Hence, the algorithm proposed in this paper, which adds graph community detection clustering before Melu, achieved better results.
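The benefit of pre-clustering can be sketched numerically. The user optima, community assignment, and the closed-form `meta_init` (exact only for these quadratic toy losses, where the best shared initialisation is the mean optimum) are all illustrative assumptions:

```python
# One meta-initialisation per user community instead of a single global one:
# users with conflicting optima no longer pull the shared initialisation
# in opposite directions.
user_optima = {"u1": 1.0, "u2": 1.2, "u3": 9.0, "u4": 8.8}
communities = {"c1": ["u1", "u2"], "c2": ["u3", "u4"]}  # e.g. from Louvain

def meta_init(users):
    """Toy 'meta-learning': for quadratic losses, the initialisation that
    minimises average post-adaptation loss is the users' mean optimum."""
    return sum(user_optima[u] for u in users) / len(users)

global_init = meta_init([u for us in communities.values() for u in us])
cluster_inits = {c: meta_init(us) for c, us in communities.items()}

# A new u1-like user (optimum 1.0) starts 4.0 away from the global
# initialisation but only 0.1 away from the c1 cluster initialisation.
```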
Specifically, the data in the table show that, compared with MovieLens, the improvement of our model over the baseline on DBook was generally smaller. DBook lacks the precise item category labels available in MovieLens, which degraded the model’s ability to classify users; as a result, some ineffective information was mixed in when knowledge was shared, weakening the model’s performance to some extent.
5. Conclusions
This paper introduced a solution that alleviates the cold-start recommendation problem from a model perspective: a clustering-based meta-learning model. Firstly, the basic principles and current development status of meta-learning-based cold-start algorithms were introduced, the sub-optimal solutions that arise when a meta-learning model shares information among all users without accounting for their differences were described, and a meta-learning recommendation model based on graph community detection clustering was proposed. Afterwards, fine-grained knowledge sharing, restricted to similar users, was achieved based on user cluster information. Finally, the effectiveness of the model was validated by a comparative experiment. Our study provides valuable insights into the importance of considering user cluster information in meta-learning-based cold-start recommendation models. However, more experiments are still needed to assess its generalisability across different domains and user groups. In addition, we need to investigate the scalability and computational efficiency of the model to ensure its usefulness in the real world.
Nevertheless, we expect this optimisation to significantly improve the efficiency of cold-start solutions. Specifically, the task distribution becomes more reasonable: similar tasks share similar distributions, so the meta-learner is more likely to capture common patterns (such as the rapid adaptation of a “fashion lovers” group). Clustering can also filter outliers (such as abnormal users) and thereby reduce noise interference, so that meta-learning is trained on high-quality subsets with improved generalisation.
At the same time, this experiment depends on the data and on the clustering structure. The input features used for clustering must be discriminative (e.g., user profiles and historical interactions that are sparse but uniformly distributed), and the outcome relies on the stability of the clustering results (e.g., spectral clustering is sensitive to sparse matrices). Both factors have some impact on our results.
We also acknowledge weaknesses in the above experimental process, for example the propagation of clustering bias: an initial clustering error causes the meta-learner to inherit that bias. A possible remedy, and a direction for future work, is dynamic clustering: updating clusters online and introducing confidence feedback from the meta-learner.
6. Suggestions for Future Work
Developing a dynamic user representation model: Our study focused on designing a model that incorporated user differences but did not consider the dynamic nature of user preferences over time. Future research could explore the development of dynamic user representation models to capture changing user preferences.
Incorporating contextual information: Although our study took into account user differences, it did not consider contextual information that may influence user preferences. Future research could explore the integration of contextual information such as location, time, and device usage to improve the accuracy of cold-start recommendations.
Enhancing model scalability: Our study demonstrated the effectiveness of incorporating clusters of users into a cold-start recommendation model but did not address the issue of scalability. Future research could focus on developing scalable models to handle large datasets and large numbers of users.
Evaluating long-term effects: Our study focused primarily on the short-term effects of incorporating user cluster information into cold-start recommendation models. Future research could assess the long-term effects of these models on user engagement, satisfaction, and retention. This might require conducting longitudinal research or long-term tracking of user behaviours while observing the correlation between cluster dynamics and recommendation effectiveness in stages. Short-term assessment should focus on click-through rate and cold-start retention rate. Mid-term monitoring should cover interest drift. Long-term analysis should examine user lifetime value and churn rate to verify the adaptability and sustainability of the recommendations.
Author Contributions
Conceptualization, H.W. and W.W.; Data curation, Y.D.; Writing—original draft, H.W., Y.D. and W.W.; Writing—review & editing, Y.D.; Funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China (Grant Nos. 61772249, 61702241), the Basic Research Projects of the Liaoning Provincial Department of Education (Grant No. LJKZ0362).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Dataset available on request from the authors. The raw data supporting the conclusions of this article will be made available by the authors on request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Chen, W.; Cai, F.; Chen, H.; de Rijke, M. Joint Neural Collaborative Filtering for Recommender Systems. Acm Trans. Inf. Syst. 2019, 37, 39:1–39:30. [Google Scholar] [CrossRef]
- Li, J.; Jing, M.; Lu, K.; Zhu, L.; Yang, Y.; Huang, Z. From Zero-shot Learning to Cold-start Recommendation. Proc. Aaai Conf. Artif. Intell. 2019, 33, 4189–4196. [Google Scholar] [CrossRef]
- Zhou, K.; Wang, H.; Zhao, W.X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; Wen, J.-R. S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization. In Proceedings of the CIKM ’20 the 29th ACM International Conference on Information and Knowledge Management, Virtual Event, 19–23 October 2020. [Google Scholar]
- Kang, W.-C.; McAuley, J.J. Self-Attentive Sequential Recommendation. In Proceedings of the IEEE International Conference on Data Mining, ICDM 2018, Singapore, 17–20 November 2018; IEEE Computer Society: Washington, DC, USA, 2018; pp. 197–206. [Google Scholar]
- Ashkan, A.; Kveton, B.; Berkovsky, S.; Wen, Z. Optimal Greedy Diversity for Recommendation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, 25–31 July 2015; Yang, Q., Wooldridge, M.J., Eds.; AAAI Press: Menlo Park, CA, USA, 2015; pp. 1742–1748. [Google Scholar]
- Chen, L.; Zhang, G.; Zhou, E. Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity. In Advances in Neural Information Processing Systems 31, Proceedings of the Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, Canada, 3–8 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 5627–5638. [Google Scholar]
- Berbague, C.E.; Karabadji, N.E.I.; Seridi, H.; Symeonidis, P.; Manolopoulos, Y.; Dhili, W. An Overlapping Clustering Approach for Precision, Diversity and Novelty-aware recommendations. Expert Syst. Appl. 2021, 177, 114917. [Google Scholar] [CrossRef]
- Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing Personalized Markov Chains for Next-basket Recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, NC, USA, 26–30 April 2010; Rappa, M., Jones, P., Freire, J., Chakrabarti, S., Eds.; ACM: New York, NY, USA, 2010; pp. 811–820. [Google Scholar]
- Geng, X.; Zhang, H.; Bian, J.; Chua, T.-S. Learning Image and User Features for Recommendation in Social Networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4274–4282. [Google Scholar]
- Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.-S. Neural Graph Collaborative Filtering. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174. [Google Scholar]
- Krichene, W.; Rendle, S. On Sampled Metrics for Item Recommendation. In KDD ’20, Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, 23–27 August 2020; ACM: New York, NY, USA, 2020; pp. 1748–1757. [Google Scholar]
- Jiang, H.; Wang, W.; Wei, Y.; Gao, Z.; Wang, Y.; Nie, L. What Aspect Do You Like: Multi-scale Time-aware User Interest Modeling for Micro-video Recommendation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 3487–3495. [Google Scholar]
- Volkovs, M.; Yu, G.; Poutanen, T. Dropoutnet: Addressing Cold-start in Recommender Systems. Adv. Neural Inf. Process. Syst. 2017, 30, 4957–4966. [Google Scholar]
- Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised Contrastive Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 18661–18673. [Google Scholar]
- Deldjoo, Y.; Dacrema, M.F.; Constantin, M.G.; Eghbal-Zadeh, H.; Cereda, S.; Schedl, M.; Ionescu, B.; Cremonesi, P. Movie Genome: Alleviating New Item Cold-start in Movie Recommendation. User Model -User-Adapt. Interact. 2019, 29, 291–343. [Google Scholar] [CrossRef]
- Glorot, X.; Bengio, Y. Understanding The Difficulty of Training Deep Feedforward Neural Networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.; Azar, M.G.; et al. Bootstrap Your Own Latent: A New Approach To Self-supervised Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284. [Google Scholar]
- Barjasteh, I.; Forsati, R.; Ross, D.; Esfahanian, A.H.; Radha, H. Cold-start Recommendation with Provable Guarantees: A Decoupled Approach. IEEE Trans. Knowl. Data Eng. 2016, 28, 1462–1474. [Google Scholar] [CrossRef]
- Sun, W.; Ma, M.; Ren, P.; Lin, Y.; Chen, Z.; Ren, Z.; Ma, J.; De Rijke, M. Parallel Split-Join Networks for Shared Account Cross-Domain Sequential Recommendations. IEEE Trans. Knowl. Data Eng. 2023, 35, 4106–4123. [Google Scholar] [CrossRef]
- Ji, W.; Li, X.; Wei, L.; Wu, F.; Zhuang, Y. Context-aware Graph Label Propagation Network for Saliency Detection. IEEE Trans. Image Process. 2020, 29, 8177–8186. [Google Scholar] [CrossRef] [PubMed]
- Ji, W.; Li, X.; Wu, F.; Pan, Z.; Zhuang, Y. Human-centric Clothing Segmentation Via Deformable Semantic Locality-preserving Network. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 4837–4848. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–16. [Google Scholar]
- Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based Recommendations with Recurrent Neural Networks. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016. Conference Track Proceedings. [Google Scholar]
- He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural Collaborative Filtering. In Proceedings of the International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
- Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning Deep Representations by Mutual Information Estimation and Maximisation. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Kunaver, M.; Pozrl, T. Diversity in Recommender Systems—A Survey. Knowl. Based Syst. 2017, 123, 154–162. [Google Scholar] [CrossRef]
- Li, J.; Lu, K.; Huang, Z.; Shen, H.T. On Both Cold-Start and Long-Tail Recommendation with Social Data. IEEE Trans. Knowl. Data Eng. 2021, 33, 194–208. [Google Scholar] [CrossRef]
- Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, 6–10 November 2017; ACM: New York, NY, USA, 2017; pp. 1419–1428. [Google Scholar]
- Li, S.; Zhou, Y.; Zhang, D.; Zhang, Y.; Lan, X. Learning to Diversify Recommendations Based on Matrix Factorization. In Proceedings of the 15th IEEE International Conference on Dependable, Autonomic and Secure Computing, 15th International Conference on Pervasive Intelligence and Computing, 3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress, DASC/PiCom/DataCom/CyberSciTech 2017, Orlando, FL, USA, 6–10 November 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 68–74. [Google Scholar]
- Pan, F.; Li, S.; Ao, X.; Tang, P.; He, Q. Warm Up Cold-start Advertisements: Improving Ctr Predictions Via Learning to Learn Id Embeddings. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 695–704. [Google Scholar]
- Lee, H.; Im, J.; Jang, S.; Cho, H.; Chung, S. Melu: Meta-learned User Preference Estimator for Cold-start Recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1073–1082. [Google Scholar]
- Du, X.; Wang, X.; He, X.; Li, Z.; Tang, J.; Chua, T.-S. How to Learn Item Representation for Cold-Start Multimedia Recommendation? In Proceedings of the ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 3469–3477. [Google Scholar]
- Lu, Y.; Fang, Y.; Shi, C. Meta-learning on Heterogeneous Information Networks for Cold-start Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 23–27 August 2020; pp. 1563–1573. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).