Article

Diverse but Relevant Recommendations with Continuous Ant Colony Optimization

by Hakan Yılmazer 1,* and Selma Ayşe Özel 2
1 IT Office, Çukurova University, 01250 Adana, Türkiye
2 Department of Computer Engineering, Çukurova University, 01250 Adana, Türkiye
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2497; https://doi.org/10.3390/math12162497
Submission received: 5 July 2024 / Revised: 7 August 2024 / Accepted: 9 August 2024 / Published: 13 August 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract:
This paper introduces a novel method called AcoRec, which employs an enhanced version of Continuous Ant Colony Optimization for hyper-parameter adjustment and integrates a non-deterministic model to generate diverse recommendation lists. AcoRec is designed for cold-start users and long-tail item recommendations by leveraging implicit data from collaborative filtering techniques. Continuous Ant Colony Optimization is revisited with the convenience and flexibility of modern deep learning methods and extended within the AcoRec model. The approach computes stochastic variations of item probability values based on the initial predictions derived from a selected item-similarity model. The structure of the AcoRec model enables efficient handling of high-dimensional data while maintaining an effective balance between diversity and high recall, leading to recommendation lists that are both varied and highly relevant to user tastes. Our results demonstrate that AcoRec outperforms existing state-of-the-art methods, including two random-walk models, a graph-based approach, a well-known vanilla autoencoder model, an ACO-based model, and baseline models with related similarity measures, across various evaluation scenarios. These evaluations employ well-known metrics to assess the quality of top-N recommendation lists, using popular datasets including MovieLens, Pinterest, and Netflix.

1. Introduction

Recently, visual media platforms such as YouTube, Spotify, Netflix, Twitch, and others have become increasingly popular, especially during the COVID-19 lockdown periods. These platforms typically provide recommendation lists to their users on mobile devices, tablets, or television screens, based on their item preferences. These recommendations, presented in horizontal or vertical form on the main screens of many media platforms, are usually based on the user’s past likes, trending items, or related demographic information. With the development of and competition among recommender system technologies, users expect personalized or session-based recommendations on these platforms [1]. However, generating online recommendations in live recommendation systems is challenging due to the absence of initial or complete data: such recommendations require evaluating ongoing, noisy data streams rather than training on data from scratch. Although recommender systems have widely used traditional deterministic models such as Collaborative Filtering (CF) and Content-Based Filtering (CBF) to solve this problem, these models tend to offer the same recommendations to all users and require continuous updating and diversification of home-screen recommendations, due to the changing tastes of users [2]. To address these limitations, researchers in recommendation systems have recently turned to heuristic and deep learning methods to offer continuous and variable recommendations [3]. The vital processes of a recommender system are to increase the connected nodes of the user–item graph and to produce more accurate predictions between users and new items. While doing this, the system must find user-specific relations that are considered to be of high quality. One of the challenges to ensuring quality is the presence of cold-start users. While most recommender systems address the problem of cold-start users in offline settings, it is crucial to consider their evolving preferences within the system itself, because all users can be considered cold start, due to their ever-changing and unpredictable tastes. However, recommender systems mainly offer recommendation sets for each user based on their past clicks, which might turn out to be similar, uncompelling, and poor-quality recommendations for the users [2,4,5]. This challenge drives us to tackle another issue related to cold-start users: the over-specialization problem, where recommendations become too narrowly focused, potentially limiting the diversity and discovery of unexplored content.
In this paper, we deal with the problems related to recommendations for cold-start users, personalized recommendations, over-specialization issues, and reducing time complexity in recommender systems. AcoRec, the method proposed in this paper, is a promising alternative that can provide diverse recommendations for the issues mentioned above. We introduce the AcoRec framework, which we developed using the Continuous Ant Colony Optimization method, ACO_R, as described in [6], to enhance the variety of user–item relationships and diversify recommendations for users in the system. Based on ACO_R, AcoRec employs various item-similarity or proximity models as input to generate user-specific, probabilistic, and highly diverse recommendations based on users’ past clicks. As a meta-heuristic and hybrid framework, AcoRec seeks diverse recommendations, addressing the challenges associated with relevant recommendations for cold-start users and long-tail items. The primary approach of this study is to generate an initial prediction based on the user’s click vector using the selected item-similarity model. Subsequently, these initial predictions are updated based on the user’s clicks and the scale vector obtained by adjusting the diagonal elements of the selected item-similarity matrix to modulate the influence over the matrix. AcoRec uses the initial predictions as the preliminary pheromone values τ and utilizes the item-similarity model as the heuristic information η of the model. Through this prior process, we establish new item connections for cold-start users based on their recent preferences within the context of the selected similarity model. The initial pheromone values encode the user’s recent preferences for the items within the scope of the selected similarity model. AcoRec optimizes the likelihood of user–item interactions within the system to infer how the similarity model responds to user knowledge. It achieves this by maximizing the importance of items for the specific user through hyper-parameter tuning in the continuous domain, which allows for more precise optimization of the model’s parameters. Unlike deterministic approaches, ACO_R incorporates probabilistic elements and some degree of randomness to address the challenges mentioned above. Although Ant Colony Optimization (and, by extension, ACO_R) has been used for decades, we revisited it with the help of advanced coding libraries, GPU capabilities, and techniques that have gained prominence with the rise of deep learning. This allows us to reassess its potential by leveraging modern computational advancements to better understand and possibly enhance its efficacy. However, we avoided employing deep learning models themselves in this study, to prevent potential complications in the backpropagation process that could arise from introducing randomness into the weights. Consequently, we identified ACO_R as an effective solution for the specific needs of this study; other optimization methods could be explored in future work. During training, AcoRec identifies the valuable items for the relevant user based on the expected probabilities of those items. Subsequently, our model generates a top-N recommendation list that ranks the user’s estimated probabilities of interest in items. These predictions can vary and differ across sessions, which is the core concept of our novel model.
AcoRec enables parallelization and execution on multiple processors through its row-based, per-user recommendation structure. This further reduces estimation time, making it feasible to handle large item catalogs and very large user populations in recommendation systems, as detailed in Section 3.3 (see Algorithm 1).
Algorithm 1 AcoRec
Inputs: item-similarity model S ∈ ℝ^{m×m}; click vector of user r_u ∈ ℝ^{1×m}; Sc, the Frobenius norms of the columns of S; μ ← 0; σ ← 1; ant_size ← ant size; archive_size ← archive size; T ← epoch count; a weight template NDCG@100
Output: Predictions_u ← predictions of user u for the items
compute τ(u) according to Equation (8)
construct SolutionArchive(1...archive_size) ← {}
for epoch ← 1 to T do
  for k ← 1 to ant_size do
    // sample the β variable from the Gaussian distribution with mean μ and deviation σ
    β* ← N(μ, σ)
    // estimate probabilities for ant k according to Equation (10)
    p_k ← τ(u) × S × Sc^{β*}
    // compute the fitness value for each ant
    fitness ← likelihood(r_u, p_k) · NDCG@100(u, p_k)
    SolutionArchive.insert(β*, fitness)
  end for
  // sort the solutions and trim to keep the best ones
  sort(SolutionArchive, by fitness descending)
  trim(SolutionArchive, archive_size)
  update μ and σ from the SolutionArchive via Adam
end for
β ← μ
return τ(u) × S × Sc^β
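For concreteness, the following is a minimal NumPy sketch of the loop in Algorithm 1. It is illustrative rather than the authors’ implementation: the fitness helpers are our own simplified choices, fitness is measured here against the user’s known clicks rather than a held-out validation set, and the Adam update on (μ, σ) is replaced by a plain moment-matching step.

```python
import numpy as np

def bernoulli_likelihood(r, p):
    # Mean Bernoulli log-likelihood of clicks r under predicted probabilities p,
    # mapped through exp so that higher is better and the value lies in (0, 1].
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return np.exp(np.mean(r * np.log(p) + (1 - r) * np.log(1 - p)))

def ndcg_at_100(r, p, n=100):
    # NDCG@100 with the paper's log2(i + 2) discount, i being the 1-based rank.
    top = np.argsort(-p)[:n]
    discounts = np.log2(np.arange(1, len(top) + 1) + 2)
    dcg = np.sum(r[top] / discounts)
    ideal = np.sum(np.sort(r)[::-1][:len(top)] / discounts)
    return dcg / ideal if ideal > 0 else 0.0

def aco_rec(S, r_u, tau_u, ant_size=200, archive_size=50, epochs=300, lr=0.05):
    Sc = np.linalg.norm(S, axis=0) + 1e-12   # Frobenius (L2) norm of each column
    base = tau_u @ S                         # precomputed once; constant per user
    mu, sigma = 0.0, 1.0
    archive = []                             # rows of (beta, fitness)
    rng = np.random.default_rng()

    for _ in range(epochs):
        for _ in range(ant_size):
            beta = rng.normal(mu, sigma)       # each ant samples beta ~ N(mu, sigma)
            p_k = base * Sc ** beta            # Equation (10), vectorized
            span = p_k.max() - p_k.min()
            p_k = (p_k - p_k.min()) / (span + 1e-12)   # squash into [0, 1]
            fitness = bernoulli_likelihood(r_u, p_k) * ndcg_at_100(r_u, p_k)
            archive.append((beta, fitness))
        archive.sort(key=lambda row: row[1], reverse=True)
        archive = archive[:archive_size]       # keep only the best solutions
        betas = np.array([b for b, _ in archive])
        mu += lr * (betas.mean() - mu)         # simplified stand-in for Adam
        sigma += lr * (betas.std() + 1e-3 - sigma)

    return base * Sc ** mu                     # final predictions with beta = mu
```

Here, τ(u) comes from Equation (8) and S from one of the similarity models in Section 3.4; both are assumed to be precomputed.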
In various scenarios, we evaluated our models on popular datasets such as MovieLens, Pinterest, and Netflix. We utilized state-of-the-art item-based similarity models (Gram, Cosine, and Jaccard) as inputs and initially compared our model with these simple baseline estimators. While our model offers recommendations that change during sessions, we aim to maintain the relevance and satisfaction of these items with the user’s preferences. We also noticed an increase in the diversity of recommended items.
The rest of the paper is organized as follows. In Section 2 we review related works that have employed similar approaches in the literature. Section 3 explains ACO and our proposed method. In Section 4 the datasets, metrics, and methods used to evaluate our model are described. In Section 5 we compare our proposed method with the state-of-the-art methods and present evaluation results. Section 6 includes discussions of the results. Finally, Section 7 concludes this paper.

2. Related Work

In the existing literature, to the best of our knowledge, there is no study that employs ACO_R in recommender systems to provide recommendations for cold-start users or long-tail items using the best available data. Most applications of Ant Colony Optimization (ACO) in recommender systems focus on the discrete version of ACO for solving combinatorial problems such as item ranking, user clustering, and collaborative filtering. These studies have employed ACO as a core implementation to address these types of problems. For example, Sobecki et al. used actual data to recommend student courses based on ACO [7]. In addition, T-BAR, which is considered one of the efficient probabilistic models, is also implemented using ACO [8]. Although T-BAR is effective in offering diverse user predictions, the problem of offering effective predictions to cold-start users persisted, and the authors proposed an updated DT-BAR (Dynamic T-BAR) to overcome the cold-start problem [9]. In another study, Massa proposed MoleTrust, a basic collaborative filtering model that incorporates Pearson similarity and trust in recommender systems [10]. Bedi and Sharma introduced the Trust-based Ant Recommender System (TARS), which produces recommendations by combining user trust assumptions with similarity based on Ant Colony Optimization (ACO). During training, TARS establishes new user relationships and generates predictions using updated, trusted users [11]. In contrast, the Semantic-enhanced Trust-based Ant Recommender System (STARS) represents a more advanced model that addresses some of TARS’s limitations. STARS enhances the original approach by incorporating semantic user similarity and clustering, offering a more nuanced and progressive solution [12]. TCFACO investigated user trust statements and developed an ACO-based collaborative filtering method aimed at predicting user effectiveness [13]. In a different approach, Tengkiattrakul et al. combined SVD-based user factors with trustworthiness to enhance user similarity in ACO-based recommendations [14,15]. While TCFACO focuses on leveraging user trust for effectiveness predictions, Tengkiattrakul et al.’s work integrates matrix factorization techniques with trust metrics to improve similarity measures in the ACO framework. Bellaachia et al. introduced ALT-BAR, a progressive approach that employs an averaged localized trust-based ant recommender system specifically designed to tackle the cold-start problem in recommendations [16]. Expanding on the TARS framework, Kaleroun et al. further refined the model by integrating item deviation distance into the prediction formula. Their enhanced model was rigorously tested against several challenges, including shilling attacks, cold-start users, sparse-matrix issues, and grey-sheep users [17]. In contrast, Liao et al. focused on improving ranking accuracy through a different mechanism: they computed user and item pheromones separately and then combined them in the rating prediction process, highlighting the role of pheromone dynamics in ranking [18,19]. This approach diverges from trust-based models by emphasizing pheromone-based ranking strategies. Meanwhile, Nadi et al. explored a fuzzy-based Ant Colony system for website recommendations. Their model utilized Jaccard-based user similarity and applied fuzzification to the user–item interaction matrix, presenting an alternative method for integrating user similarity and interaction into the recommendation process [20].
The typical approach in these ACO-based recommendation system studies is as follows:
  • Computing user similarities using metrics such as Cosine, Jaccard, Pearson, and trust measures.
  • Treating users as nodes and selecting similar users via Ant Colony Optimization steps.
  • Predicting the new recommendations from similar neighbors (users) based on Resnick’s prediction formula [21].
Conventional ACO applications for recommendation systems usually involve computations based on users, as outlined above. Given that the number of users typically exceeds the number of items, this leads to significant computational challenges: when a new user is added to the system, similarities with other users need to be recalculated. In contrast, our approach relies on lower-dimensional item-similarity matrices rather than user similarities. Additionally, the optimization process for the ants in our method requires minimal traversal paths rather than extensive graph-based exploration. Although, when modeling our work according to the ACO algorithm, the nodes in the graph structure would represent the items and the edge values would reflect the probability of the user’s interest in a neighboring item, we opted to use ACO_R for the system’s parameter optimization. This choice was due to the inherent limitations of traditional ACO algorithms, such as their discrete nature and potential for premature convergence. ACO_R, a more advanced variant, allows for continuous optimization of parameters, thus providing a more flexible and robust approach to fine-tuning the system’s performance. The specific details and advantages of using ACO_R for parameter optimization are discussed in the next section.

3. Proposed Method

Deterministic recommendation models are robust algorithms, despite their simple structures. For instance, neighborhood models or regression models can compete with many more elaborate models [22,23]. In deterministic recommendation models, users are given a set of recommendations {S} at time t1, and this set {S} remains the same as long as there is no change in the model between times t1 and t2. Nevertheless, we might acknowledge these results as adequate or sufficient, based on the evaluation metrics [2]. Many researchers obtain evaluation results for algorithms by averaging the results of multiple experiments. Yet these results can vary depending on the selection of the dataset, sampling methods, chosen metrics, and hyper-parameter evaluations [23,24].
In heuristic-based systems, outcomes can vary from run to run without updating the parameters or data, due to the randomness at their core, which can be an attractive feature for users. However, a challenge in providing diverse recommendation lists for a given user is that randomly recommended items may be difficult to match with the user’s taste. The space of possible recommendations for a user is vast, but we work with tractable approximations, such as a top-N recommendation list. These lists can be updated over time, but inadequate feedback may prevent them from changing. When the number of items m is significantly larger than the number of items to recommend n, the number of possible recommendation sets is $C(m,n) = \binom{m}{n}$. Exhaustively evaluating all possible item sets is computationally intractable. Therefore, generating a top-N recommendation list can be considered a combinatorial optimization problem, and heuristic methods such as Ant Colony Optimization can be seen as an effective solution.
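To make the scale of this search space concrete, even a hypothetical catalog of 3,000 items yields an astronomically large number of candidate top-10 sets:

```python
import math

# Number of distinct 10-item recommendation sets from a hypothetical
# 3,000-item catalog (order ignored): about 1.6e28 candidate sets.
print(math.comb(3000, 10))
```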

3.1. Ant Colony Optimization

Ant Colony Optimization models are derived from the behavior of real ants and are used to solve many optimization problems. Ants can discover the shortest path from a food source to the nest. While traveling, each ant deposits a chemical substance called pheromone on the ground and tends to follow the pheromone trails deposited by other ants. This makes ACO a suitable model for mimicking the behavior of users in recommendation systems, where nodes represent items and a set of nodes visited by ants can be recommended to the users. Initially, ants are randomly distributed over the nodes in the graph. An ant k at time t, located at node i, chooses the next node j with a probability given by the random proportional rule defined in Equation (1):
$$\mathrm{probability}_t^k(i,j) = \frac{\tau_t(i,j) \cdot \eta(i,j)^{\beta}}{\sum_{k \in u} \tau_t(i,k) \cdot \eta(i,k)^{\beta}} \quad (1)$$
where u is a set of nodes in the neighborhood of i, τ is the pheromone value of the edge, and η is the desirability of the edge. After evaluating all the ant’s tour costs in the current iteration, the pheromone values of each edge (i, j) are updated. The evaporation of pheromones is calculated, and better solutions are indicated by a higher amount of pheromones deposited by the ants.
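As an illustration, a minimal sketch of this transition rule follows; the function and variable names are ours, not from the paper:

```python
import numpy as np

def choose_next(i, neighbors, tau, eta, beta=2.0, rng=np.random.default_rng()):
    # Random proportional rule of Equation (1): from node i, pick node j among
    # the candidate neighbors (an integer index array) with probability
    # proportional to tau(i, j) * eta(i, j)^beta.
    weights = tau[i, neighbors] * eta[i, neighbors] ** beta
    return rng.choice(neighbors, p=weights / weights.sum())
```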

3.2. Ant Colony Optimization in the Continuous Domain

Combinatorial optimization, as in classic ACO, deals with finding optimal combinations or permutations of available problem components, as in the Travelling Salesman Problem (TSP). However, casting a problem as combinatorial optimization is not always convenient, especially if the bounds are wide and the sensitivity of the parameters is high. In such cases, algorithms that optimize continuous variables yield better results. Blum [25] attempted to extend ACO algorithms to tackle discrete- and continuous-optimization problems. Two approaches have been presented for integrating ACO into the continuous domain. The first method uses a familiar approach to ant behavior, and the second method carries the fundamental ACO graph structure into the continuous domain. This extension can be achieved through proper discretization or probabilistic sampling of the search space [26]. Following the second method, Socha and Dorigo introduced the continuous Ant Colony Optimization algorithm ACO_R [27], used a Gaussian kernel probability density function (pdf) as the distribution model, and presented ACO_R as a meta-heuristic framework. In ACO_R, given a problem with n decision variables, a vector xj = {xj,1, xj,2, xj,3, ..., xj,n} sampled from a probability density function represents a candidate solution constructed by an ant j, and f(xj) represents the objective function value of the solution. In ACO_R, each ant’s solution corresponds to a row of the Solution Archive. During the iterations, the candidate solutions in the Solution Archive are ordered according to their objective function values. Each solution has an associated weight, ωj, which reflects the proportion of its solution quality relative to the whole. The weight of the jth solution is defined in Equation (2):
$$\omega_j = \frac{1}{q \sigma \sqrt{2\pi}}\, e^{-\frac{(G(j) - \mu)^2}{2 q^2 \sigma^2}} \quad (2)$$
where G(j) is the value of the Gaussian function with argument j, μ is the distribution mean, σ is the standard deviation, and q is the parameter controlling the deviation distance of the algorithm. When q is small, the highest-fit solutions are strongly favored, and the search diversifies as q increases. Staying true to the original ACO pheromone model, the algorithm updates the μ and σ values after each iteration to optimize the probability distribution. Once the initial Solution Archive is constructed, each ant selects a distribution from the Solution Archive by means of a fitness-proportionate selection function, such as the roulette wheel selection algorithm, where the selection probability of each row is obtained by normalizing its weight over the sum of all weights:
$$p(j) = \frac{\omega_j}{\sum_{r=1}^{k} \omega_r} \quad (3)$$
In Equation (3), p(j) is the probability of selecting the jth row in the Solution Archive. The quality of each solution is calculated based on the objective function and merged into the Solution Archive. After sorting, the first k best solutions are retained, and the others are discarded for forthcoming iterations. For example, for a maximization problem, the Solution Archive constructed by k ants is ordered in descending order, where f(x1) ≥ f(x2) ≥ ⋯ ≥ f(xk) and ω1 ≥ ω2 ≥ ⋯ ≥ ωk. The sample Solution Archive structure is given in Figure 1.
In the search process, the iterations aim to find the best solution and converge the model. After each iteration, the pheromone-update strategy (as in ACO) is performed by adding the k newly generated solutions to the Solution Archive. After sorting the solutions, the worst k solutions are eliminated, so the total number of solutions in the archive remains equal to k. This maintains the better solutions in the Solution Archive, providing practical guidance to the ants in the search for better quality.
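A small sketch of the archive weighting and selection steps follows, using the standard rank-based form of the Gaussian kernel; this is our reading of Equation (2), with names of our own choosing:

```python
import numpy as np

def archive_weights(k, q=0.1):
    # Rank-based Gaussian-kernel weights for a Solution Archive of size k
    # (standard ACO_R reading of Equation (2)); rank 1 holds the best solution,
    # and a smaller q concentrates probability mass on the top-ranked rows.
    ranks = np.arange(1, k + 1)
    return np.exp(-((ranks - 1) ** 2) / (2 * q**2 * k**2)) / (q * k * np.sqrt(2 * np.pi))

def roulette_select(weights, rng=np.random.default_rng()):
    # Fitness-proportionate selection of an archive row (Equation (3)).
    return rng.choice(len(weights), p=weights / weights.sum())
```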
In this paper, we investigated the issues associated with recommender systems (RSs), as noted in Section 1, and utilized ACO_R to overcome these challenges. Additionally, we introduced novel enhancements to this method to address the challenges posed by RS problems, as detailed in the following section.

3.3. Stochastic Approach of AcoRec

This paper introduces AcoRec, a novel method that aims to leverage Bayesian inference and users’ past click history to predict their interest in items. The approach involves utilizing a vector pheromone model and adjusting user-specific hyper-parameters to optimize expected outcomes, allowing for seamless adaptation to session-based or real-time systems tailored to individual users. In AcoRec, the probabilistic transition rule for the users, selected by ant k who mimics user u at time t, is given in Equation (4),
$$\mathrm{probability}_t^k(u) = \tau(u)_t^{\alpha} \cdot \eta^{\beta} \quad (4)$$
where τ(u)t represents the pheromone values for user u on items at time t, η denotes the selected input model, and α and β represent the pheromone regularization and heuristic model adjustment parameters, respectively. Notably, unlike Equation (1), normalization is not applied in a denominator. These parameters maximize the posterior information of the items for users, similar to the prediction process of item-based models, where user scores for items are predicted using the base equation in (5):
$$\mathrm{predictions}(u) = r_u \cdot S \quad (5)$$
where S is an m × m item-similarity matrix and r_u is an item vector of size m, written as r_u = [r_{u1}, ..., r_{um}], where r_{ui} equals 1 if user u clicked item i and 0 otherwise, as given in Equation (6). If we treat the items a user has clicked as pheromone-traced items, AcoRec uses the r_u vector as the pheromone vector and S as the heuristic information between items for further optimization.
$$r_{ui} = \begin{cases} 1, & \text{if user } u \text{ clicked item } i, \\ 0, & \text{otherwise.} \end{cases} \quad (6)$$
In this context, we estimate posterior probabilities by selecting the rows corresponding to items previously clicked by the user from the item–item similarity model (assuming a symmetric matrix structure, where column values mirror row values, as in real Hermitian matrices). These selected rows are then assembled into a low-rank subset matrix, from whose columns an Lp-norm vector is derived. The norms of the user-clicked items represent the user’s actions as a pheromone vector (prior probabilities), analogous to social network behavior. This serves as the initial pheromone interpolation, aligning with the foundational principles of ACO_R.
Let xu = [xu1, ..., xuq] be a subset vector of ru containing all clicked items belonging to user u, where q is the count of clicked items. The formula for the Lp-norm of these clicked items is shown below:
$$L_p(u) = \left\| S_u^{\,q \times m} \right\|_p = \left[ \Big( \sum_{j=1}^{q} S(j,i)^{p} \Big)^{1/p} \right]_{i=1,\dots,m} \quad (7)$$
where Su is the subset matrix of S that keeps only the rows of xu, the clicked items of user u; S is the item-similarity model; and i is the column index in the item-similarity model. In Equation (7), a p-value of 1 gives the L1-norm, and a p-value of 2 gives the L2-norm, also known as the Euclidean norm. We analyzed how the similarity model responds to user knowledge by examining the probability of user–item interactions within the system. Additionally, we established a relationship between the resulting Lp-norm vectors and user clicks. Equation (8) is used to differentiate between positive (clicked) and negative (not clicked) interactions for a user. The goal is to predict the likelihood of a user clicking on an item and to evaluate this prediction against the actual interaction. We used the Bernoulli transformation with Binary Cross-Entropy (BCE) for this conversion. The formula for τ(u) is given by
$$\tau(u) = r_u \cdot \log\left(L_p(u)\right) + \left(1 - r_u\right) \cdot \log\left(1 - L_p(u)\right) \quad (8)$$
where τ(u) represents the pheromone values of the items for user u at time t = 0, and Lp(u) represents the likelihood of user u interacting with (clicking) each item at time t. Based on the dataset characteristics, an appropriate transformation, either tanh or sigmoid, must be chosen for Lp(u). Since our datasets are binary, we used these transformations to ensure that Lp(u) lies within the [0, 1] range, suitable for representing probabilities.
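A sketch of this pheromone initialization, combining Equations (7) and (8), is shown below; the tanh squashing and the epsilon clipping are our illustrative choices:

```python
import numpy as np

def initial_pheromone(S, r_u, p=2):
    # Equations (7)-(8): column-wise Lp-norms over the rows of S that user u
    # clicked, squashed into (0, 1), then converted into a BCE-style pheromone
    # vector tau(u). Assumes binary clicks r_u in {0, 1}^m.
    clicked = np.flatnonzero(r_u)                        # indices of clicked items
    S_u = S[clicked, :]                                  # q x m subset matrix
    Lp = (np.abs(S_u) ** p).sum(axis=0) ** (1.0 / p)     # Equation (7), per column
    Lp = np.clip(np.tanh(Lp), 1e-9, 1 - 1e-9)            # keep strictly inside (0, 1)
    return r_u * np.log(Lp) + (1 - r_u) * np.log(1 - Lp) # Equation (8)
```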
In ACO_R, the pheromone model is pivotal in introducing randomness into the search space [27]. The initial estimation of pheromone values for user-clicked items is conducted through Equations (4), (7), and (8). Specifically, Equation (4) governs the tendency of pheromone values in ACO models via the α parameter, while the β parameter, controlling the model’s heuristic knowledge, is deemed crucial for performance [28]. After initializing the pheromone values with the user’s prior clicks, we set α = 1 to regulate the pheromone bias in our model, thereby directing our focus solely to adjusting the β parameter. Our observations indicate that controlling the heuristic knowledge with β enables us to either enhance the pheromone effect while mitigating bias or diminish the pheromone effect while reinforcing the tendency towards heuristic knowledge. The parametric scaling of heuristic knowledge, which gauges the resistance of Euclidean-norm data to popularity, has been integrated into numerous models with favorable outcomes [29,30,31]. This scaling ensures a coherent development path and facilitates seamless integration with the user’s selected item model. The scaling applied to the heuristic matrix S is defined by Equation (9). This approach aligns with the principles of the ACO_R model and provides a clear and consistent development trajectory.
$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mm} \end{bmatrix} \cdot \begin{bmatrix} \|s_1\|_F & 0 & \cdots & 0 \\ 0 & \|s_2\|_F & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \|s_m\|_F \end{bmatrix}^{\beta} \quad (9)$$
where β is the scaling parameter and $\{\|s_1\|_F, \dots, \|s_m\|_F\}$ are the Frobenius norms (L2-norms) of the columns of S. The β parameter is used to adjust the impact of high-norm values for both popular and long-tail items, dynamically adapting to the specific scenario to either emphasize or reduce their influence. Substituting the pheromone values from Equation (8) and the scaled heuristic model from Equation (9) into Equation (4) yields the following formula, denoted as Equation (10):
$$\mathrm{probability}_t^k(u) = \tau(u)_{1 \times m} \cdot S_{m \times m} \cdot \mathrm{Diag}\left(\{\|s_1\|_F, \dots, \|s_m\|_F\}\right)^{\beta} \quad (10)$$
Our scaling method differs from others in an important aspect. While the referenced models [29,30,31] apply this scaling using a uniform parameter across all users and select the best parameter via grid search, our algorithm performs scaling-parameter tuning as an internal hyper-parameter optimization. This means that the scaling parameter is computed separately for each user, enabling the generation of personalized recommendations tailored to individual preferences. In Equation (10), by fixing the α value to 1, the term $\tau(u)_{1 \times m} \cdot S_{m \times m}$ becomes constant and pre-computable for all users. As a result, only the term involving β needs to be recomputed, which significantly reduces the computational cost: instead of an m × m matrix multiplication, each candidate β requires only an O(m) vector multiplication. The optimized β parameter reflects the user’s preferences and behavior [31]. This parameter is highly user-specific and can vary significantly among users, based on their unique tastes. When β takes a negative value, it can highlight rare items for the user, offering a personalized touch to the recommendations. The optimal value of β can vary for each user, and users may have multiple candidate values that maximize their posterior. These candidates may form either a tight distribution or a wide range, and discrete probabilities may not provide certainty when searching for a hyper-parameter. Consequently, the optimization challenges of continuous domains have spurred new directions in Ant Colony Optimization research. Instead of relying on a discrete probability distribution, a pdf is employed to sample the hyper-parameters in the continuous domain. Conceptually, a node in a conventional ACO problem can be likened to a local parameter of a Gaussian distribution.
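The precomputation argument can be seen directly in code. Under the assumption that τ(u) and S are fixed for a given user, only an O(m) elementwise product remains per candidate β (variable names follow the sketches above):

```python
import numpy as np

# tau_u (1 x m), S (m x m), and Sc (per-column Frobenius norms of S) are assumed
# given. base = tau_u @ S costs O(m^2) but is computed once per user; each
# candidate beta afterwards needs only the O(m) elementwise product below.
def predict(base, Sc, beta):
    return base * Sc ** beta   # Equation (10) with alpha fixed to 1
```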
In our model, each ant samples candidate β parameters from a Gaussian distribution G(x) = N(μ, σ), with μ = 0 and σ = 1 initially. Points close to each other in the continuous domain produce similar results, facilitating stochastic exploration of the optimal β parameter. We search for the maximizing β value in the ACO_R domain, as detailed in Algorithm 1.
During each iteration, we estimate each ant’s probability values using Equation (10), followed by a non-linear transformation to adjust for varying β values and ensure comparable fitness measurements. Additionally, we found dropout beneficial in this process. To optimize the computation of p_k, we precompute the product τ(u) × S before entering the loop, since τ(u) and S remain constant. The model estimates the fitness value for the selected β* parameter from the count of validation items appearing in the current top-N recommendation list.
The fitness process uses a likelihood evaluation function (e.g., Bernoulli, Gaussian) to assess the consistency between each ant’s recommended list and the user’s preferences; we used the Bernoulli likelihood in our experiments. We incorporated NDCG@100 (the formula for this metric is detailed in Section 4) as a weight, allowing us to evaluate the fitness value of each ant’s list. We used this process to update the Solution Archive of ACO_R. The row count of the solution space equals the archive_size parameter in our approach, with each row containing a sampled β value and its fitness value. Our approach diverges from ACO_R in that the central Solution Archive is initially empty. At each iteration, all solutions are sorted, and the best archive_size solutions, determined by their fitness values, are kept for the next iteration. At the end of each iteration, μ and σ are optimized with Adam [32] over the Solution Archive. This training process shifts the μ of the distribution to concentrate simultaneously on the best quality and the best β. After each iteration, we applied evaporation to the Solution Archive, ensuring the model continuously improved throughout the iterations.
Algorithm 1 presents the methodology for generating user-specific recommendations. This row-based approach significantly improves computational efficiency by processing each user’s data independently, allowing for parallel execution and reducing overall computation time. Additionally, the input matrices involved in the process are precomputed, which further streamlines the workflow. Given that τ(u) × S is precomputed, we can simplify the time-complexity analysis of Algorithm 1 by ignoring this operation in the computational steps. Initializing an empty Solution Archive has negligible time complexity, O(1). Epochs run T times, so the total complexity is multiplied by T. Ants walk ant_size times, so the complexity of the inner operations is multiplied by ant_size. Sampling the β variable from a Gaussian distribution is O(1). Since τ(u) × S is precomputed, the probability step reduces to a vector–scalar multiplication with Sc^{β*}, whose time complexity is O(m), where m is the item-vector size. Given that both likelihood(r_u, p_k) and NDCG@100(u, p_k) have a complexity of O(m), due to the item size, the overall complexity of this step remains O(m). Inserting into an archive of size archive_size is O(1). Sorting the archive is O(archive_size log(archive_size)), and trimming is O(archive_size). Updating the parameters with Adam involves simple arithmetic over archive_size items, which is O(archive_size). Since τ(u) × S is precomputed, returning the final predictions is O(m). Assuming m > archive_size, the total time complexity for a single user is O(T × ant_size × m + T × (m + archive_size log(archive_size))). This approach also enhances scalability by enabling the distribution of computational tasks across multiple processors, thereby optimizing the performance of large-scale recommendation systems.

3.4. Heuristic Base of AcoRec and Item Model Selection

In ACO-based recommender systems, the distance between nodes is determined by the similarity or proximity between users or items. We prefer to measure this similarity using distance metrics in inter-nodal Euclidean space. Our model is designed to be low-dimensional and focuses on gauging a user’s interest in items rather than the distance between nodes. The relationship between items is managed through various forms, such as similarity, proximity, dissimilarity, or correlation, utilizing specific methods. Collaborative Filtering (CF) models consider the collaborative benefits of items, while Content-Based Filtering (CBF) models focus on items’ metadata (e.g., demography, mood, etc.). Graph Similarity Models are based on the relationships in the user–item network structure. Time-based models track the temporal sequences of item purchases. Latent Factor-Based Models extract hidden components from low-rank computations. Demographic Models consider collaborative behaviors in the same geographical areas.
This study evaluated three well-known item-based similarity measures for computational simplicity and popularity. Let Sm×m be the similarity matrix, i and j be the two items, Sij represent the similarity between two items, and vi and vj be the column vectors of these items.

3.4.1. Gram Matrix (Gram)

The dot-product similarity of two items equals the inner product of these item vectors, as given by the formula in Equation (11).
$$S_{\mathrm{gram}}(i,j) = \langle v_i, v_j \rangle = v_i \cdot v_j \quad (11)$$

3.4.2. Cosine Similarity (Cosine)

The cosine similarity between two items is the cosine of the angle between their rating vectors. It is estimated as the inner product of the item vectors divided by the product of their norms, as shown in Equation (12).
$$S_{\mathrm{cosine}}(i,j) = \frac{\langle v_i, v_j \rangle}{\|v_i\| \, \|v_j\|} = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|} \quad (12)$$

3.4.3. Jaccard Similarity (Jaccard)

The Jaccard similarity between two items is defined as the ratio of the number of users who co-rated both items to the number of users who rated at least one of items i or j, as described in Equation (13).
$$S_{\mathrm{jaccard}}(i,j) = \frac{|v_i \cap v_j|}{|v_i \cup v_j|} = \frac{v_i \cdot v_j}{\|v_i\| + \|v_j\| - v_i \cdot v_j} \quad (13)$$
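All three measures can be computed jointly from a binary user–item matrix. A dense, illustrative sketch (not the authors’ code) follows:

```python
import numpy as np

def similarity_matrices(R):
    # Item-item similarities of Equations (11)-(13) for a binary user-item
    # matrix R (users x items). For binary columns, the Gram entry G[i, j] is
    # the co-click count and G[i, i] is the number of users who clicked item i.
    G = R.T @ R                                    # Equation (11): Gram matrix
    norms = np.sqrt(np.diag(G)) + 1e-12
    cosine = G / np.outer(norms, norms)            # Equation (12)
    counts = np.diag(G).astype(float)
    union = counts[:, None] + counts[None, :] - G  # |v_i ∪ v_j| for binary data
    jaccard = G / np.maximum(union, 1e-12)         # Equation (13)
    return G, cosine, jaccard
```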

4. Evaluation

4.1. Dataset

We utilized three widely recognized datasets from different domains: MovieLens 1M (ML-1M) [33] and Netflix [34] for movie recommendations, and the Pinterest [35] dataset, which covers interactions of users who pinned images to their boards. Due to the large size of the Netflix and Pinterest datasets, we created subsets from the originals, a common practice among researchers, to facilitate faster benchmarks and parameter tuning. For the ML-1M and Netflix datasets, ratings of 4 and 5 stars were converted to binary ones, while all other ratings were converted to zeros. Subsequently, in the ML-1M dataset, we filtered for users who rated at least one item and movies rated by at least one user, resulting in a sparser dataset than the original. For Pinterest, we selected users who had pinned at least 20 images to their boards and boards pinned by 5 to 200 users. In the Netflix dataset, we chose users who had watched between 20 and 500 movies and movies that had been watched by 20 to 500 users. The counts of users, items, and ratings, along with their sparsity and density values, are summarized in Table 1. The sparsity percentage is calculated as (1 − density) × 100, where density = #ratings/(#users × #items). As indicated in Table 1, the sparsity values of the sampled subsets are higher than those of the original datasets.
We utilized the k-fold cross-validation method to split the raw datasets to evaluate the models. We randomly shuffled all datasets and divided them into k = 5 sampled subsets. Each unique sampled group was used as a probe set held out from the raw dataset. After removing the probe set, the remaining portion of the raw dataset was referred to as the ‘training set’. We then selected users and their ratings from these probe sets based on the criteria defined in the experiments. These selected users and their ratings in the probe set were considered the ‘test set’. This process allowed us to obtain an average estimate of the results for different users and items in each experiment.
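A minimal sketch of this splitting procedure, assuming the ratings are stored as an indexable array of (user, item) pairs, could look as follows:

```python
import numpy as np

def five_fold_probe_splits(n_ratings, k=5, seed=42):
    # Shuffle all rating indices and partition them into k probe sets; for each
    # fold, the remaining indices form the training set (as described above).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_ratings)
    folds = np.array_split(idx, k)
    for probe_idx in folds:
        train_idx = np.setdiff1d(idx, probe_idx)
        yield train_idx, probe_idx
```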

4.2. Evaluation Metrics

Our models do not consider the similarity between estimated and actual ratings; instead, we assess the quality of the recommended items for the users. To evaluate the quality of the top-N recommendations, we used Cremonesi’s method for benchmarking models [36]. However, instead of selecting 1000 items, we modified the approach by calculating the top-N lists by sorting all the items the user did not click on. Because our method is a probabilistic model, sampled items may produce different results in each experiment. Evaluating all items at once is challenging, due to the growing number of candidate items, but it yielded more consistent results than the sampled metrics described in [37]. After sorting all the items that were not clicked, we used the prediction models to estimate their rating scores. We then selected the N items with the highest predicted rating scores from the sorted list. This final list represents the top-N item recommendation list for the ‘test user’. In our experiments, we tested with N values of 10 and 20 for the length of the recommendation lists.
We employed two utility-based metrics to evaluate the quality of the recommendation lists regarding relevant items to the user: normalized Discounted Cumulative Gain [38] and Recall [39]. By considering the relevance and position of items, nDCG evaluates the quality of rankings and rewards systems that prioritize highly relevant items, offering a precise measure of ranking effectiveness. Conversely, Recall evaluates the system’s coverage of relevant items within the top-N list, focusing on the proportion of relevant items included in the recommendations. Additionally, we used the Coverage [24] metric to gauge the proportion of unique items recommended to users in the lists, focusing on the diversity and breadth of recommendations rather than the relevance or interest of the recommended items to individual users.
In these metric formulas, we denoted T as the number of users in the test set, N as the length of the recommendation list, and i as the position of the recommended item in the list. If the item ranked at position i in the list belongs to a user in the test set, we consider this item a ‘relevant item’ for the user and set rel(i) = 1; if it does not belong to the test user, we set rel(i) = 0.

4.2.1. Recall

To evaluate the model’s retrieval score for specific datasets in different list lengths, we divide the sum of all ‘relevant items’ by the number of users in the test set. The Recall formula is given in Equation (14).
$$\mathrm{Recall}(@N) = \frac{1}{T} \sum_{i=1}^{N} \mathrm{rel}(i) \quad (14)$$

4.2.2. Normalized Discounted Cumulative Gain (NDCG)

The Recall formula ignores the position of the ‘relevant item’ within the recommendation list, yet recommendations at the top of the list are more valuable than the others. We therefore measure the importance of an item’s position by discounting each ‘relevant item’ according to its position; NDCG weighs the gain of each position logarithmically while assessing list quality. This metric first estimates the test set’s Discounted Cumulative Gain (DCG) in Equation (15). Then, the Ideal Discounted Cumulative Gain (IDCG) is estimated in Equation (16) for the best possible ranking, in which every test item belonging to the selected user appears at the top of the top-N list. Finally, we normalize these gain values with Equation (17) to obtain the NDCG value for a benchmark test.
$$\mathrm{DCG}(@N) = \frac{1}{T} \sum_{i=1}^{N} \frac{\mathrm{rel}(i)}{\log_2(i+2)} \quad (15)$$
$$\mathrm{IDCG}(@N) = \frac{1}{T} \sum_{i=1}^{N} \frac{1}{\log_2(i+2)} \quad (16)$$
$$\mathrm{NDCG}(@N) = \frac{\mathrm{DCG}(@N)}{\mathrm{IDCG}(@N)} \quad (17)$$

4.2.3. Coverage

The Coverage metric measures the breadth of a recommender system as the percentage of distinct items, out of all items in the system, that appear across the whole set of recommendation lists. We define the Coverage of the system over all users in Equation (18):
$$\mathrm{Coverage}(@N) = \frac{\left| \bigcup_{u \in U} \mathrm{rec}_u \right|}{|I|} \quad (18)$$
where $\bigcup_{u \in U} \mathrm{rec}_u$ is the set of unique items appearing in the recommendation lists of all users, and |I| is the total number of items in the system.
We report the Coverage percentage obtained with the parameters that give the best NDCG@10 for each model, which we consider a fairer way of evaluating the diversity of the resulting lists.
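Putting the three metrics together, a small evaluation sketch follows; the dictionary-based inputs and the function name are our own convention:

```python
import numpy as np

def evaluate(rec_lists, test_items, n_items, N=10):
    # Equations (14)-(18). rec_lists maps each test user to a ranked item list;
    # test_items maps each user to the set of their held-out relevant items.
    T = len(rec_lists)
    discounts = np.log2(np.arange(1, N + 1) + 2)   # paper's log2(i + 2) discount
    idcg = np.sum(1.0 / discounts)                 # Equation (16), per user
    recall_sum = dcg_sum = 0.0
    seen = set()
    for u, ranked in rec_lists.items():
        rel = np.array([1.0 if i in test_items[u] else 0.0 for i in ranked[:N]])
        recall_sum += rel.sum()                    # Equation (14) numerator
        dcg_sum += np.sum(rel / discounts[:len(rel)])  # Equation (15) numerator
        seen.update(ranked[:N])
    recall = recall_sum / T
    ndcg = (dcg_sum / T) / idcg                    # Equation (17)
    coverage = 100.0 * len(seen) / n_items         # Equation (18), as a percentage
    return recall, ndcg, coverage
```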

4.3. Baselines

To validate the effectiveness of AcoRec, we compared it with item-based, user-based, random-walk-based, graph-based, and ACO-based models for different scenarios. The models used for the benchmark tests are summarized below.
BaseGram, BaseJaccard, and BaseCosine refer to three item-based similarity models employed in the study. These models include the Gram Similarity, Cosine Similarity, and Jaccard Similarity models. They are estimated using Equations (11), (12), and (13), respectively. These baseline models serve as the foundation for evaluating the performance of more complex recommendation models.
TARS [11] is a state-of-the-art ACO model in recommender systems. It introduces a user-based approach that builds a trust-based user-relationship graph, identifies similar users using Pearson Correlation, and estimates ratings. This model leverages ACO to enhance recommendation accuracy by considering user trust and relationships in the recommendation process.
RP3β [31] is a random-walk model based on the user–item graph, aimed at extending diversification to reduce the bias toward popular items in recommendation systems. This approach utilizes random walks on the user–item graph to explore less-popular items, thus improving the overall diversity of recommendations.
RecWalkPR and RecWalkK [40] are frameworks that capture new, rich network interactions for generating top-N recommendation lists. These methods leverage the concept of random walks on the network structure to uncover previously unnoticed connections between items or users, enhancing the diversity and quality of recommendations.
EASER [41] is a robust linear model that presents the closed-form solution of Ridge Regression in a manner akin to vanilla auto-encoders. This model offers a novel approach to linear regression, leveraging auto-encoder principles to enhance its performance and robustness.
UserKNN [21] employs Resnick’s user-based CF approach. We used Pearson Correlation to obtain user similarities.
Random is a baseline model that involves benchmarking by filling the empty cells in the user–item matrix with random values ranging between 0 and 1.
Popular is a baseline model that evaluates items according to their usage frequency.

4.4. Parameter Tuning and Experimental Setup

We performed a grid search to find the best parameters for each baseline model in the cold-start user and long-tail item recommendation scenarios, allowing us to compare their performance against each other. For the TARS model, the user neighbor size k was tested with values ranging from 10 to 250 in steps of 10, and confidence values ranging from 0 to 1 in steps of 0.1. The RP3β model was tested with β values ranging from −1 to 1 and α values ranging from 0 to 1, both in steps of 0.2. The EASER model was evaluated with λ values ranging from 5 to 20,000. For the RecWalkPR and RecWalkK models, as in their original paper, the trained SLIM model (W) was used as input. The parameters were set as follows: C ∈ {0.1|I|}, l1 ∈ {1, 3, 5, 10}, l2 ∈ {0.1, 0.5, 1, 3, 5, 7, 9, 11, 15, 20}, with a fixed α value of 0.005. RecWalkPR was tested with η values ranging from 0 to 1 in steps of 0.2, and RecWalkK with k values ranging from 2 to 30 in steps of 2.
Our AcoRec models were evaluated using archive sizes of 20, 50, and 100 and ant sizes of 50, 100, and 200. We employed non-linear functions such as ‘tanh’, ‘sigmoid’, or ‘softmax’ to convert the likelihood of user interactions. Dropout rates of 0, 0.2, and 0.5 were applied. We set the iteration count to 300. Additionally, for the initial σ value in the long-tail item scenarios, we used values of 2 and 3. For the Adam Optimizer learning rate value, we used 0.01 in the cold-start scenario and 0.05 in the long-tail scenario. Each experiment for AcoRec was repeated five times, due to random choices, and the results were averaged. Section 5 presents the best results achieved by using the optimal parameters for each model.

5. Results

To assess the performance of our models, we conducted experiments in two scenarios. The first scenario was designed to evaluate the accuracy of our model in providing recommendations to cold-start users, who had fewer ratings in the system, making it challenging to offer high-quality recommendations [42,43].
For the first scenario, we selected warm users as candidate users from the probe set who were also present in the training set. We randomly selected 100 users, each with at least one rating in the probe set and at least twenty ratings in the training set. We then transformed these warm users into cold-start users by reducing their rating counts in the training set. To do this, we considered examples from studies in the existing literature. Whereas some studies defined cold-start users by keeping only three items in the training set [43], others used 5% of the user’s ratings [29]; still others experimented with counts ranging from 1 to 20 or with percentage rates [44]. For a more challenging setting, we kept between 5 and 10 random ratings for each selected user in the training set and removed the rest. This process turned the candidate users into cold-start users, each represented by a minimum of 5 and a maximum of 10 random ratings in the training set.
The second scenario, focused on long-tail item recommendations, was designed to test how effectively recommendations accommodate a variety of less-popular items. Popular items are familiar to users and can become monotonous over time [45]. Therefore, recommending less-popular items can be more engaging. Traditional CF methods often concentrate on popular items or users, overshadowing diverse relationships. Since the quality of models depends on the diversity of recommendations they offer, these CF methods may struggle to generate diverse suggestions, especially with inadequate data [46].
To create an experimental environment suitable for the long-tail item scenario, we adopted the method described in [36]. As noted by the authors, the most prevalent 1.7% of items, accounting for 33% of the ratings in the Netflix dataset, were referred to as short-head items, whereas the remaining items were called long-tail items. Following this method, we sorted the items in all datasets by popularity, determined by rating frequency, in descending order. We marked items as short-head from top to bottom until the sum of their frequencies equaled or exceeded 33% of the total ratings, and marked the remaining as long-tail items. We kept long-tail items in the probe set and removed the others. We then created a test set from the probe set, by randomly selecting 250 users who had rated at least one long-tail item. This process allowed us to randomly choose users with less-common tastes for each repeated holdout evaluation.
Experimental results for both scenarios based on the Recall, NDCG, and Coverage metrics are summarized in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. The best results for each column are highlighted in bold, while the second-best results are underlined. In the Coverage columns, since the random model performed well, as expected, we highlighted the second-best result in bold and the third-best result with an underline. We used three item-based models for AcoRec as input: Co-occurrence (AcoRecGram), Cosine-Similarity (AcoRecCosine), and Jaccard-Similarity (AcoRecJaccard).

5.1. Cold-Start User Scenario

Table 2 presents the results of cold-start experiments conducted on the ML-1M dataset, which is notably less sparse than the other datasets analyzed in our study. All three of our models outperformed their respective base models, with the AcoRecGram model consistently delivering superior results across most metrics. This indicates that the AcoRec algorithm, when integrated with Gram similarity, provides highly effective recommendations. Notably, after AcoRecGram, the AcoRecCosine model emerged as the second-best performer, significantly enhancing the results of the BaseCosine model while also outperforming other models across all metrics. The AcoRecJaccard model, meanwhile, surpassed its base model by generating more diverse recommendation lists than all other models. Jaccard similarity is particularly effective at identifying less-obvious connections between items, making it a powerful tool for enhancing diversity in recommendation systems. However, it is important to note that while Jaccard’s ability to find diverse connections can improve coverage, it may negatively impact relevance compared to other models. The specific improvements in our models based on their input models are discussed in detail in the Discussion section.
When comparing AcoRecGram to the closest competing models, including TARS, it demonstrates significant performance differences in several metrics. Specifically, AcoRecGram outperforms RecWalkK by 8.7%, 5.2%, 13.0%, and 5.6% in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, respectively. Compared to RecWalkPR, AcoRecGram shows improvements of 9.8%, 5.2%, 16.0%, and 6.2% across these same metrics. When compared to RP3β, AcoRecGram offers performance gains of 11.2%, 6.3%, 9.6%, and 3.2%. Against the TARS model, AcoRecGram exhibits a significant performance improvement of 18.6%, 12.5%, 23.0%, and 12.1%. In terms of coverage, AcoRecGram provides 20.8% higher coverage than RecWalkK, 9.4% higher than RecWalkPR, 14.7% higher than RP3β, and 175.6% higher than TARS. These results indicate that AcoRecGram offers distinct advantages over these models in both relevance and diversity.
In this experiment, RecWalkPR and RecWalkK demonstrate better performance than the other models, aside from our AcoRecGram and AcoRecCosine models. However, RecWalkPR achieves better coverage than RecWalkK, indicating that while RecWalkK excels in list quality, RecWalkPR is more effective at covering a broader range of items. As sparsity decreases, the performance of state-of-the-art models designed for sparse datasets, such as EASER, declines. Consequently, while both EASER and RP3β perform less effectively than the RecWalk models in NDCG, they achieve better results in Recall@10 and Recall@20. The UserKNN and TARS base models, while previously effective, lag behind in performance. Despite both models utilizing Pearson correlation for user similarity, TARS, which is based on ACO techniques, did not achieve better results than UserKNN. Among the evaluated models, RecWalkPR demonstrated the highest coverage outside of our models, which may be attributed to the influence of the PageRank algorithm it utilizes.
For AcoRecGram in this experiment, tanh was used for the likelihood in Equation (8). Dropout was not applied, the ant size was 200, and the archive size was 50. These parameter settings were essential for achieving the model’s best performance, and the results of the experiment are shown in Table 2.
Table 2. Comparison of Cold-Start User Scenario on ML-1M.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0021 | 0.0028 | 0.0020 | 0.0036 | 53.01 |
| Popular | 0.0395 | 0.0570 | 0.0528 | 0.0951 | 0.97 |
| BaseGram | 0.0527 | 0.0714 | 0.0696 | 0.1130 | 2.54 |
| BaseCosine | 0.0653 | 0.0890 | 0.0835 | 0.1401 | 10.22 |
| BaseJaccard | 0.0583 | 0.0809 | 0.0752 | 0.1295 | 15.02 |
| UserKNN | 0.0682 | 0.0923 | 0.0858 | 0.1436 | 8.82 |
| TARS | 0.0662 | 0.0898 | 0.0820 | 0.1384 | 5.99 |
| RecWalkPR | 0.0715 | 0.0960 | 0.0870 | 0.1460 | 15.09 |
| RecWalkK | 0.0722 | 0.0960 | 0.0893 | 0.1469 | 13.67 |
| EASER | 0.0666 | 0.0924 | 0.0878 | 0.1509 | 13.38 |
| RP3β | 0.0706 | 0.0950 | 0.0921 | 0.1503 | 14.39 |
| AcoRecGram | 0.0785 | 0.1010 | 0.1009 | 0.1551 | 16.51 |
| AcoRecCosine | 0.0731 | 0.0990 | 0.0926 | 0.1548 | 18.81 |
| AcoRecJaccard | 0.0662 | 0.0901 | 0.0861 | 0.1423 | 23.38 |
Table 3 presents the results of cold-start experiments for the Netflix dataset. All three of our models outperformed their respective base models, although AcoRecCosine and AcoRecJaccard did not show significant improvement. The AcoRecGram model maintained its strong performance, surpassing all other methods and achieving superior results across all metrics. In this dataset, the runner-up models varied, depending on the metric: RecWalkPR for NDCG@10, RP3β for NDCG@20 and Recall@20, AcoRecJaccard for Recall@10, and AcoRecCosine for Coverage. This variation reflects the diverse strengths and design focus of each model, as well as their interaction with the specific characteristics of the dataset and metrics. This observation highlights the AcoRecGram model’s ability to adapt effectively to different data and evaluation metrics.
Table 3. Comparison of Cold-Start User Scenario on Netflix.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0013 | 0.0015 | 0.0016 | 0.0020 | 33.60 |
| Popular | 0.0010 | 0.0027 | 0.0017 | 0.0060 | 0.46 |
| BaseGram | 0.0631 | 0.0824 | 0.0802 | 0.1262 | 20.17 |
| BaseCosine | 0.0722 | 0.0925 | 0.0923 | 0.1424 | 27.13 |
| BaseJaccard | 0.0720 | 0.0938 | 0.0915 | 0.1443 | 27.15 |
| UserKNN | 0.0705 | 0.0901 | 0.0870 | 0.1340 | 24.77 |
| TARS | 0.0665 | 0.0849 | 0.0788 | 0.1238 | 25.39 |
| RecWalkPR | 0.0760 | 0.0960 | 0.0934 | 0.1429 | 28.12 |
| RecWalkK | 0.0756 | 0.0960 | 0.0925 | 0.1426 | 27.30 |
| EASER | 0.0755 | 0.0954 | 0.0941 | 0.1418 | 27.64 |
| RP3β | 0.0754 | 0.1001 | 0.0920 | 0.1528 | 25.30 |
| AcoRecGram | 0.0784 | 0.1007 | 0.0981 | 0.1538 | 29.40 |
| AcoRecCosine | 0.0742 | 0.0953 | 0.0926 | 0.1430 | 29.32 |
| AcoRecJaccard | 0.0724 | 0.0950 | 0.0946 | 0.1443 | 28.66 |
When comparing AcoRecGram to other models, including TARS, significant performance differences are observed across several metrics. Specifically, AcoRecGram outperforms RecWalkK by 3.7%, 4.9%, 6.1%, and 4.9% in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, respectively. Compared to RecWalkPR, AcoRecGram shows improvements of 3.2%, 4.9%, 5.0%, and 7.6% across the same metrics. The AcoRecGram model also outperforms RP3β by 4.0%, 0.6%, 6.6%, and 0.7%, and EASER by 3.8%, 5.6%, 4.3%, and 8.5%. When compared to the TARS model, AcoRecGram exhibits a significant performance gain of 17.9%, 18.6%, 24.5%, and 24.2% across these metrics. In terms of Coverage, AcoRecGram provides 7.7% higher coverage than RecWalkK, 4.6% higher than RecWalkPR, 16.2% higher than RP3β, 6.4% higher than EASER, and 15.8% higher than TARS. An important observation is that while RP3β produces results comparable to AcoRecGram on longer lists (NDCG@20, Recall@20), it does so at the expense of lower coverage.
Unlike on the previous dataset, the results of the different models are generally closer to one another on the Netflix dataset, with no single model clearly outperforming the others. It is also notable that while the RecWalk and EASER models perform well with shorter lists, their effectiveness diminishes with longer lists.
For AcoRecGram in this experiment, tanh was used for the likelihood conversion in Equation (8). Dropout was set to 0.2, the ant size was 200, and the archive size was 20.
Table 4 presents the results of cold-start experiments conducted on the Pinterest dataset. All three of our models outperformed their respective base models; however, the AcoRecGram model did not show improvement in the NDCG@20 and Recall@20 metrics. On this dataset, AcoRecGram and AcoRecCosine demonstrated superiority over the other models across all metrics except Coverage. In terms of Coverage, the RecWalkPR model offered more diverse recommendations, though it did not achieve the same level of success in NDCG and Recall. While EASER and RP3β trailed our models, they performed better than the remaining models, albeit with lower Coverage.
Table 4. Comparison of Cold-Start User Scenario on Pinterest.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0024 | 0.0031 | 0.0033 | 0.0050 | 42.03 |
| Popular | 0.0062 | 0.0098 | 0.0088 | 0.0180 | 0.71 |
| BaseGram | 0.0679 | 0.1000 | 0.0940 | 0.1767 | 25.64 |
| BaseCosine | 0.0685 | 0.0991 | 0.0928 | 0.1720 | 33.57 |
| BaseJaccard | 0.0665 | 0.0947 | 0.0937 | 0.1652 | 33.99 |
| UserKNN | 0.0678 | 0.0982 | 0.0964 | 0.1746 | 26.20 |
| TARS | 0.0675 | 0.0936 | 0.0963 | 0.1624 | 30.97 |
| RecWalkPR | 0.0677 | 0.0935 | 0.0932 | 0.1608 | 36.96 |
| RecWalkK | 0.0688 | 0.0936 | 0.0964 | 0.1608 | 36.28 |
| EASER | 0.0701 | 0.0995 | 0.0990 | 0.1748 | 27.55 |
| RP3β | 0.0705 | 0.0998 | 0.0986 | 0.1734 | 28.37 |
| AcoRecGram | 0.0711 | 0.1007 | 0.0996 | 0.1767 | 36.66 |
| AcoRecCosine | 0.0711 | 0.0998 | 0.1009 | 0.1760 | 35.01 |
| AcoRecJaccard | 0.0700 | 0.1000 | 0.0944 | 0.1718 | 35.06 |
When comparing our models with the others, AcoRecGram showed only slight differences relative to RP3β and EASER but outperformed TARS across all metrics. Specifically, in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, AcoRecGram outperformed RP3β by 0.9%, 0.9%, 1.0%, and 1.9%, respectively. Compared to EASER, AcoRecGram showed improvements of 1.4%, 1.2%, 0.6%, and 1.1%, respectively. Against the TARS model, AcoRecGram exhibited performance gains of 5.3%, 7.6%, 3.4%, and 8.8%, respectively. In terms of Coverage, AcoRecGram provided 29.2% higher coverage than RP3β and 33.1% higher coverage than EASER.
In this experiment, RecWalkPR and RecWalkK exhibited poorer performance than on the other datasets, except in Coverage. The RecWalk models' reliance on the SLIM model as input likely influenced their overall success. The UserKNN and TARS models, once again, did not demonstrate notable success.
For AcoRecGram in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.5, the ant size was 50, and the archive size was 50. For AcoRecCosine, the sigmoid function was used for the likelihood conversion; dropout was not applied, the ant size was 250, and the archive size was 50.

5.2. Long-Tail Item Scenario

Table 5 presents the results of long-tail item experiments conducted on the ML-1M dataset. Our models outperformed all others, demonstrating their effectiveness even in scenarios where input models typically favor popular items. Remarkably, all three of our models ranked in the top three across all metric measurements. The balanced relationship between high Coverage and Recall highlights the superiority of our models. Notably, while the base models struggled in the long-tail item scenario, our models that utilized the base models as inputs achieved significant success.
When comparing our models with the closest competitors, including TARS, the AcoRecJaccard model displayed significant performance advantages over RecWalkK, RecWalkPR, RP3β, EASER, and TARS across several metrics. Specifically, in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, AcoRecJaccard outperformed RecWalkK by 71.6%, 60.9%, 59.4%, and 48.9%, respectively. Compared to RecWalkPR, AcoRecJaccard showed performance improvements of 79.9%, 66.4%, 66.0%, and 52.5%, respectively. Against the RP3β model, AcoRecJaccard demonstrated improvements of 45.8%, 38.7%, 39.7%, and 30.8%, respectively. Compared to the EASER model, AcoRecJaccard achieved performance gains of 98.6%, 79.2%, 77.8%, and 57.5%, respectively. Against the TARS model, AcoRecJaccard exhibited a remarkable performance increase of 636.5%, 418.7%, 472.7%, and 275.3%, respectively. In terms of Coverage, AcoRecJaccard provided 55.5% higher coverage than RecWalkK, 57.6% higher than RecWalkPR, 22.1% higher than RP3β, 64.2% higher than EASER, and 142.9% higher than TARS.
For AcoRecJaccard in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 50, and the archive size was 50.
Table 5. Comparison of long-tail item scenario on ML-1M.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0019 | 0.0028 | 0.0035 | 0.0058 | 80.62 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 2.94 |
| BaseGram | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 3.20 |
| BaseCosine | 0.0071 | 0.0099 | 0.0106 | 0.0180 | 10.38 |
| BaseJaccard | 0.0125 | 0.0165 | 0.0177 | 0.0278 | 14.10 |
| UserKNN | 0.0370 | 0.0543 | 0.0566 | 0.1026 | 27.64 |
| TARS | 0.0137 | 0.0246 | 0.0231 | 0.0535 | 17.11 |
| RecWalkPR | 0.0561 | 0.0767 | 0.0797 | 0.1317 | 26.37 |
| RecWalkK | 0.0588 | 0.0793 | 0.0830 | 0.1349 | 26.73 |
| EASER | 0.0508 | 0.0712 | 0.0744 | 0.1275 | 25.31 |
| RP3β | 0.0692 | 0.0920 | 0.0947 | 0.1535 | 34.05 |
| AcoRecGram | 0.0991 | 0.1262 | 0.1307 | 0.2003 | 49.35 |
| AcoRecCosine | 0.0946 | 0.1215 | 0.1272 | 0.1960 | 42.04 |
| AcoRecJaccard | 0.1009 | 0.1276 | 0.1323 | 0.2008 | 41.56 |
Table 6 presents the results of long-tail item experiments conducted on the Netflix dataset. In this experiment, our models demonstrated clear superiority over all others except the RecWalk models, which exhibited slightly better performance on shorter lists. Specifically, in terms of NDCG@10, NDCG@20, and Recall@10, our AcoRecGram model trailed RecWalkK by 3.2%, 2.3%, and 1.2%, respectively, and RecWalkPR by 2.2%, 1.4%, and 0.8%, respectively. However, for Recall@20, AcoRecGram outperformed RecWalkK by 0.2% and RecWalkPR by 0.6%.
In the Coverage metric, all three of our models outperformed every other model. It is important to note that the Coverage value for each model is based on its best NDCG@10 result. A key strength of our models is their ability to simultaneously enhance both recommendation accuracy and diversity; achieving a Coverage result close to that of the Random model on the Netflix dataset underscores a highly successful outcome in terms of list diversity.
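Coverage here can be read as catalog coverage: the share of all items that appear in at least one user's top-N list. The snippet below is our own minimal illustration of that standard definition (the helper name is hypothetical, and the paper's exact implementation may differ).

```python
def catalog_coverage(top_n_lists, num_items):
    """Percentage of catalog items recommended to at least one user.

    top_n_lists: iterable of per-user top-N item-id lists.
    num_items:   total number of items in the catalog.
    """
    recommended = set()
    for items in top_n_lists:
        recommended.update(items)
    return 100.0 * len(recommended) / num_items

# e.g., catalog_coverage([[1, 2], [2, 3]], num_items=10) -> 30.0
```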
When comparing our models with the others, the AcoRecGram model demonstrated significant performance improvements over the RP3β, EASER, and TARS models across all metrics, except when compared to RecWalk. Specifically, AcoRecGram outperformed RP3β in NDCG@10, NDCG@20, Recall@10, and Recall@20 by 12.7%, 10.9%, 16.0%, and 11.0%, respectively. Compared to EASER, AcoRecGram showed improvements of 24.2%, 21.5%, 21.0%, and 16.9%, respectively. Against the TARS model, AcoRecGram achieved a remarkable performance improvement of 165.5%, 141.6%, 143.5%, and 111.7%, respectively. In terms of Coverage, AcoRecGram provided 15.6% higher coverage than RP3β, 23.2% higher than EASER, and 63.9% higher than TARS.
For AcoRecGram in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 100, and the archive size was 20.
Table 6. Comparison of long-tail item scenario on Netflix.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0005 | 0.0012 | 0.0010 | 0.0028 | 61.22 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.60 |
| BaseGram | 0.0311 | 0.0441 | 0.0422 | 0.0766 | 26.28 |
| BaseCosine | 0.0854 | 0.1040 | 0.1054 | 0.1537 | 40.03 |
| BaseJaccard | 0.0913 | 0.1093 | 0.1100 | 0.1563 | 40.59 |
| UserKNN | 0.0730 | 0.0890 | 0.0937 | 0.1358 | 42.83 |
| TARS | 0.0530 | 0.0678 | 0.0688 | 0.1076 | 36.23 |
| RecWalkPR | 0.1439 | 0.1661 | 0.1688 | 0.2267 | 52.73 |
| RecWalkK | 0.1454 | 0.1676 | 0.1696 | 0.2274 | 52.14 |
| EASER | 0.1133 | 0.1348 | 0.1384 | 0.1949 | 48.18 |
| RP3β | 0.1249 | 0.1477 | 0.1444 | 0.2053 | 51.36 |
| AcoRecGram | 0.1407 | 0.1638 | 0.1675 | 0.2278 | 59.37 |
| AcoRecCosine | 0.1371 | 0.1605 | 0.1620 | 0.2259 | 58.36 |
| AcoRecJaccard | 0.1315 | 0.1557 | 0.1523 | 0.2147 | 53.39 |
Table 7 presents the results of long-tail item experiments conducted on the Pinterest dataset. Consistent with the ML-1M experiment, all three of our models ranked in the top three across all metrics. Among our models, AcoRecCosine was the most successful, outperforming all other models on the accuracy metrics, though not on Coverage. Similar to the results from the Netflix dataset, AcoRecGram achieved a Coverage score close to that of the Random model, indicating a successful outcome in terms of list diversity.
When comparing AcoRecCosine to the closest competing models, including RecWalkK, RecWalkPR, RP3β, EASER, and TARS, it demonstrated substantial performance improvements across several metrics. Specifically, AcoRecCosine outperformed RecWalkK in NDCG@10, NDCG@20, Recall@10, and Recall@20 by 33.2%, 26.7%, 34.8%, and 26.0%, respectively. Compared to RecWalkPR, AcoRecCosine showed improvements of 26.6%, 25.5%, 27.7%, and 27.3%, respectively. Against RP3β, AcoRecCosine improved by 22.9%, 18.5%, 21.7%, and 16.9%, respectively. In comparison with the EASER model, it demonstrated enhancements of 58.6%, 46.8%, 54.6%, and 40.8%, respectively. Compared to TARS, AcoRecCosine exhibited the most significant gains, with improvements of 141.3%, 107.6%, 136.0%, and 94.2%, respectively. In terms of Coverage, AcoRecCosine achieved 20.2% higher coverage than RecWalkK, 20.9% higher than RecWalkPR, 9.1% higher than RP3β, 20.1% higher than EASER, and 46.9% higher than TARS.
Setting our models aside, when RecWalkK, RecWalkPR, RP3β, and EASER are evaluated across the entire dataset, the EASER model lags behind the others. This may be attributed to its parametric nature, which may not effectively highlight niche items. Graph-based models such as RecWalkK, RecWalkPR, and RP3β appear more successful at capturing new relationships. The base models generally assess items by co-occurrence frequency, which limits their effectiveness in long-tail item scenarios. Similarly, the UserKNN and TARS models underperformed the base models across all three datasets.
For AcoRecCosine in this experiment, we used the sigmoid function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 200, and the archive size was 50.
Table 7. Comparison of long-tail item scenario on Pinterest.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0015 | 0.0020 | 0.0026 | 0.0041 | 70.87 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.91 |
| BaseGram | 0.0206 | 0.0352 | 0.0299 | 0.0711 | 33.94 |
| BaseCosine | 0.0345 | 0.0529 | 0.0493 | 0.1006 | 48.68 |
| BaseJaccard | 0.0371 | 0.0536 | 0.0541 | 0.0995 | 48.73 |
| UserKNN | 0.0218 | 0.0362 | 0.0319 | 0.0723 | 35.31 |
| TARS | 0.0276 | 0.0432 | 0.0414 | 0.0855 | 47.07 |
| RecWalkPR | 0.0526 | 0.0715 | 0.0765 | 0.1304 | 57.18 |
| RecWalkK | 0.0500 | 0.0708 | 0.0725 | 0.1317 | 57.51 |
| EASER | 0.0420 | 0.0611 | 0.0632 | 0.1179 | 57.59 |
| RP3β | 0.0542 | 0.0757 | 0.0803 | 0.1420 | 63.38 |
| AcoRecGram | 0.0666 | 0.0897 | 0.0977 | 0.1660 | 69.14 |
| AcoRecCosine | 0.0685 | 0.0948 | 0.1033 | 0.1799 | 65.22 |
| AcoRecJaccard | 0.0648 | 0.0882 | 0.0986 | 0.1652 | 59.16 |

5.3. Effect of Parameters

Figure 2 and Figure 3 illustrate the relationship between NDCG@10 and the number of iterations across different input models and scenarios. The x-axis represents the number of iterations, and the y-axis represents NDCG@10. Our models are depicted with continuous lines. Dashed vertical lines mark the iteration at which each of our models reaches its best NDCG@10, while dashed horizontal lines show the best NDCG@10 value of each base model (BaseGram, BaseCosine, BaseJaccard), drawn in the same color as the corresponding model. We tested our models on each dataset with their best parameter combinations, evaluating ten points between 1 and 300 iterations. In both the cold-start and long-tail item scenarios, our models began producing consistent results after a certain number of iterations. Figure 2 shows the training process in the cold-start scenario: across all datasets, the results stabilized after a specific number of iterations. For example, on the ML-1M dataset, the AcoRecGram model began producing similar results around the 80th iteration and reached its peak performance at the 120th iteration. The peak levels of the models are indicated by the dashed vertical lines in the graphs.
A notable feature of our study is the model's rapid convergence and minimal stagnation, which we attribute to its structure. Regarding the effect of the Gaussian distribution during training, we observed that the sampled solutions remained evenly distributed across all localities.
As the iterations progressed and the focus space tightened, the ants converged toward the same position of the distribution. At that point, once the variance fell below a specified threshold, the model completed its training. During our experiments, we observed that the models quickly achieved high success and that further iterations beyond this point did not affect their performance.
Figure 4 and Figure 5 examine the relationship between our models' 'Ant Size' and 'Archive Size' parameters; the sketch after this paragraph illustrates how the two interact. 'Ant Size' determines how many points are sampled from the Gaussian distribution, while 'Archive Size' is the number of points used as input for the Gaussian negative log-likelihood loss. In the figures, we set the Ant Size values to {200, 100, 50} and the Archive Size values to {50, 20, 10}. For each dataset, we selected the most successful <ant size, archive size> pair. To better understand the differences between the results, we normalized the values using min–max normalization and displayed them in Figure 4 and Figure 5. We found that the best parameter values vary depending on the dataset. For example, the AcoRecCosine model only produced meaningful results with the <50, 50> pair (i.e., ant size = 50 and archive size = 50) in the cold-start scenario on the Netflix dataset, the AcoRecGram model performed better with the <200, 20> pair, and the AcoRecJaccard model required a low 'Ant Size' value.
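The interplay of these two parameters follows the continuous ACO (ACO_R) scheme of Socha and Dorigo [27]: each ant samples a candidate from a Gaussian built around the archive, the best candidates refill the archive, and training stops once the archive variance collapses, as described above. The sketch below is a simplified, generic illustration of that loop over a single hyper-parameter, not AcoRec's full update rule; the function and threshold are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def aco_r_step(archive, scores, ant_size, evaluate, var_threshold=1e-4):
    """One simplified ACO_R iteration over a 1-D hyper-parameter.

    archive:  current solution archive ('Archive Size' entries)
    scores:   fitness of each archive entry (higher is better)
    evaluate: callback scoring a candidate value
    Returns the updated (archive, scores) and a convergence flag.
    """
    mu, sigma = archive.mean(), archive.std() + 1e-12
    ants = rng.normal(mu, sigma, size=ant_size)      # 'Ant Size' samples
    ant_scores = np.array([evaluate(a) for a in ants])

    # keep the best |archive| candidates from old archive + new ants
    pool = np.concatenate([archive, ants])
    pool_scores = np.concatenate([scores, ant_scores])
    best = np.argsort(pool_scores)[-len(archive):]
    converged = pool[best].var() < var_threshold     # variance-based stop
    return pool[best], pool_scores[best], converged

# Usage sketch: maximize a toy objective over one hyper-parameter.
evaluate = lambda b: -(b - 0.3) ** 2
archive = rng.uniform(0.0, 1.0, size=50)             # 'Archive Size' = 50
scores = np.array([evaluate(a) for a in archive])
for _ in range(300):
    archive, scores, done = aco_r_step(archive, scores, 200, evaluate)
    if done:
        break
```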

5.4. Experimental Environment and Tools

The experiments presented in this paper were conducted in a hyper-threading test environment provided by the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources) [47]. We evaluated all benchmarks with the Python-based open-source recommendation toolkit Cornac (https://cornac.preferred.ai, accessed on 5 July 2024) [48,49], and we published our code as a Code Ocean capsule (https://codeocean.com/capsule/4724589/tree/v2, accessed on 5 July 2024) and a GitHub repository (https://github.com/yilmazerhakan/acorec, accessed on 5 July 2024).
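As an illustration of how such a benchmark can be assembled in Cornac, the sketch below wires two of the baselines into an experiment. The dataset variant, split ratio, and model settings shown here are illustrative assumptions, not our exact experimental protocol.

```python
import cornac
from cornac.eval_methods import RatioSplit
from cornac.metrics import NDCG, Recall
from cornac.datasets import movielens

# Load implicit feedback and build a train/test split.
feedback = movielens.load_feedback(variant="1M")
split = RatioSplit(data=feedback, test_size=0.2, seed=42, verbose=False)

models = [
    cornac.models.MostPop(),                                  # 'Popular' baseline
    cornac.models.UserKNN(k=50, similarity="pearson"),        # UserKNN baseline
]
metrics = [NDCG(k=10), NDCG(k=20), Recall(k=10), Recall(k=20)]

cornac.Experiment(eval_method=split, models=models, metrics=metrics).run()
```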

6. Discussion

While the similarity matrices (BaseGram, BaseCosine, and BaseJaccard) are not particularly effective when employed alone as recommendation models in both scenarios (recommendations for cold-start users and recommendations of long-tail items), they exhibit strong performance when combined with AcoRec (i.e., AcoRecGram, AcoRecCosine, and AcoRecJaccard). We estimated the percentage improvements for each metric by comparing our three AcoRec models to their corresponding item-based similarity models in Table 2, Table 3 and Table 4. Table 8 illustrates the percentage enhancement of each AcoRec model over its base item-similarity model in a cold-start scenario. The results demonstrate that our AcoRec models significantly enhance the performance of their base models. Notably, the improvements in the Gram model surpass those in the Jaccard and Cosine similarity models.
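As a worked example of how these percentages are obtained (the helper name is ours; because the snippet uses the rounded values reported in Table 2, its output matches Table 8 only up to rounding):

```python
def improvement_pct(acorec: float, base: float) -> float:
    """Relative improvement of an AcoRec variant over its base model."""
    return 100.0 * (acorec - base) / base

# ML-1M, NDCG@10: AcoRecGram = 0.0785 vs. BaseGram = 0.0527 (Table 2)
print(f"{improvement_pct(0.0785, 0.0527):.2f}%")  # ~48.96%, vs. 48.77% in Table 8
```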
On the other hand, the baseline models perform poorly in the long-tail item scenario. This scenario requires highlighting less-prominent items, which the baseline models struggle to do because of their inherent focus on popular items.
Table 9 shows each AcoRec model’s improvement percentage on its base item-similarity models (i.e., BaseGram, BaseCosine, BaseJaccard) in the long-tail item scenario. The results indicate that our AcoRec models significantly enhance the performance of their base models, especially in the long-tail scenario. These improvements surpass those observed for cold-start users, demonstrating the model’s efficacy in highlighting diverse items.
The comparisons have shown that AcoRec models also provide further improvements on the Gram matrix. The Gram matrix, used as input without normalization, retains more inherent information about data relationships, proving beneficial during iterations. Notably, our models exclusively utilized implicit data, avoiding ethical concerns related to demographic, personal, or tracking data.
One of our observations from all the experiments is that, while a model may excel in one dataset, it can fail in another. However, the results of the experiments conducted in this study demonstrated that our models consistently delivered successful and stable results across all datasets. The fundamental reason for our study’s success across different scenarios is its parametric structure, which allows for flexibility in addressing diverse contexts. The cold-start and long-tail item scenarios require evaluating items under completely different conditions. In the cold-start scenario, models generally achieve success by highlighting popular items. This is evident from the success of the baseline models (Popular, Gram, Cosine, and Jaccard), which emphasize high-frequency relationships among items, predominantly found among popular items. In contrast, the long-tail item scenario focuses on the ability to highlight less-popular items. The β parameter in our algorithm is automatically tuned, allowing it to adapt to the specific requirements of each scenario and exhibit the desired behavior. Despite the failure of base algorithms in this scenario, our models have shown quite successful results using these inputs.
Another observation from the experiments is an inverse correlation between the NDCG and Recall metrics and Coverage: increases in NDCG and Recall values typically come at the cost of list diversity. One of the most crucial strengths of our models is their ability to improve both accuracy and diversity simultaneously.
We established that data sparsity contributes to the cold-start issue and that addressing the cold-start problem effectively requires incorporating a popularity bias; our heuristic AcoRec model showed promising results in mitigating data unavailability in the cold-start and long-tail item scenarios.
A drawback of our model is that computing the input model (i.e., the similarity matrix used in our model) can incur substantial computational costs on high-dimensional datasets; for example, computing Gram, Cosine, or Jaccard similarities is challenging in high-dimensional spaces. However, such computations can be carried out as a pre-processing step, and no operations are performed on these inputs during our model's training. Additionally, as mentioned in [40], the Gram matrix can be computed more efficiently using the Coppersmith–Winograd algorithm.
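For reference, the three input similarity matrices can be pre-computed from a sparse binary interaction matrix as sketched below. This is a generic illustration of the standard definitions (function name and epsilon guards are ours), not our optimized implementation.

```python
import numpy as np
import scipy.sparse as sp

def item_similarities(X: sp.csr_matrix):
    """X: binary user-item matrix (users x items). Returns the Gram,
    cosine, and Jaccard item-item similarity matrices as dense arrays."""
    gram = np.asarray((X.T @ X).todense(), dtype=float)  # co-occurrence counts
    counts = np.diag(gram)                               # per-item frequencies

    # cosine(i, j) = |i ∩ j| / sqrt(|i| * |j|)
    norms = np.sqrt(np.maximum(np.outer(counts, counts), 1e-12))
    cosine = gram / norms

    # jaccard(i, j) = |i ∩ j| / |i ∪ j|
    union = np.maximum(counts[:, None] + counts[None, :] - gram, 1e-12)
    jaccard = gram / union
    return gram, cosine, jaccard
```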

7. Conclusions

Our paper introduced AcoRec, a novel heuristic-based model that enhances item-based models by incorporating continuous Ant Colony Optimization for hyperparameter tuning. With this model, we aimed to generate diverse recommendations, addressing challenges related to cold-start users and long-tail items. Unlike traditional ACO models, AcoRec can be customized for different similarity models and domains. AcoRec, through ACO, performs personalized hyperparameter searches to enhance recommendation quality and diversity.
We compared our three models (AcoRecGram, AcoRecCosine, and AcoRecJaccard) against state-of-the-art models on three datasets from different domains, using five metrics. AcoRecGram ranked first in sixteen out of thirty experiments and second in nine, while AcoRecCosine ranked first in six and second in ten. AcoRecJaccard secured first place in four experiments and second in four. The results indicated that our three AcoRec-based models successfully maintained recommendation quality while offering diverse recommendation lists.
Future research could contribute by developing a metric that balances relevance and diversity, facilitating the generation of recommendations that excel in both aspects. In addition, AcoRec’s continuous-domain parameter search is versatile; thus, future research might consider adapting it to other similarity or proximity methods to allow users to fine-tune hyperparameters.

Author Contributions

Conceptualization, H.Y. and S.A.Ö.; Methodology, H.Y. and S.A.Ö.; Software, H.Y.; Validation, H.Y.; Formal analysis, H.Y.; Investigation, H.Y. and S.A.Ö.; Resources, H.Y.; Data curation, H.Y.; Writing—original draft, H.Y.; Writing—review & editing, H.Y. and S.A.Ö.; Visualization, H.Y.; Supervision, H.Y. and S.A.Ö. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

This research employed publicly available datasets for its experimental studies. The original data presented in the study are openly available at https://codeocean.com/capsule/4724589/tree/v2 (accessed on 5 July 2024), https://doi.org/10.24433/CO.7483457.v2 (accessed on 5 July 2024).

Acknowledgments

The numerical calculations reported in this paper were performed entirely in the TUBITAK ULAKBIM High Performance and Grid Computing Center.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-Based Recommendations with Recurrent Neural Networks. arXiv 2016, arXiv:1511.06939. [Google Scholar]
  2. Olaleke, O.; Oseledets, I.; Frolov, E. Dynamic Modeling of User Preferences for Stable Recommendations. In Proceedings of the 29th ACM Conference on User Modeling, Adaptation, and Personalization, Utrecht, The Netherlands, 21–25 June 2021; ACM: Utrecht, The Netherlands, 2021; pp. 262–266. [Google Scholar]
  3. Vargas, S. Novelty and Diversity Enhancement and Evaluation in Recommender Systems and Information Retrieval. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, Gold Coast, QLD, Australia, 6–11 July 2014; ACM: Gold Coast, QLD, Australia, 2014; p. 1281. [Google Scholar]
  4. Balabanović, M.; Shoham, Y. Fab: Content-Based, Collaborative Recommendation. Commun. ACM 1997, 40, 66–72. [Google Scholar] [CrossRef]
  5. Ar, Y.; Bostanci, E. A Genetic Algorithm Solution to the Collaborative Filtering Problem. Expert Syst. Appl. 2016, 61, 122–128. [Google Scholar] [CrossRef]
  6. Dorigo, M.; Gambardella, L.M. Ant Colonies for the Travelling Salesman Problem. Biosystems 1997, 43, 73–81. [Google Scholar] [CrossRef] [PubMed]
  7. Sobecki, J.; Tomczak, J.M. Student Courses Recommendation Using Ant Colony Optimization. In Intelligent Information and Database Systems; Nguyen, N.T., Le, M.T., Świątek, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 5991, pp. 124–133. ISBN 9783642121005/9783642121012. [Google Scholar]
  8. Bellaachia, A.; Alathel, D. Trust-Based Ant Recommender (T-BAR). In Proceedings of the 2012 6th IEEE International Conference Intelligent Systems, Sofia, Bulgaria, 6–8 September 2012; pp. 130–135. [Google Scholar]
  9. Bellaachia, A.; Alathel, D. DT-BAR: A Dynamic ANT Recommender to Balance the Overall Prediction Accuracy for All Users. In Computer Science & Information Technology (CS & IT), Proceedings of the Second International Conference on Computational Science and Engineering (CSE-2014), Dubai, United Arab Emirates, 4–5 April 2014; Academy & Industry Research Collaboration Center (AIRCC): Dubai, United Arab Emirates, 2014; pp. 141–151. [Google Scholar]
  10. Massa, P.; Avesani, P. Trust Metrics in Recommender Systems. In Computing with Social Trust; Golbeck, J., Ed.; Springer: London, UK, 2009; pp. 259–285. ISBN 9781848003552/9781848003569. [Google Scholar]
  11. Bedi, P.; Sharma, R. Trust Based Recommender System Using Ant Colony for Trust Computation. Expert Syst. Appl. 2012, 39, 1183–1190. [Google Scholar] [CrossRef]
  12. Gohari, F.S.; Haghighi, H.; Aliee, F.S. A Semantic-Enhanced Trust Based Recommender System Using Ant Colony Optimization. Appl. Intell. 2017, 46, 328–364. [Google Scholar] [CrossRef]
  13. Parvin, H.; Moradi, P.; Esmaeili, S. TCFACO: Trust-Aware Collaborative Filtering Method Based on Ant Colony Optimization. Expert Syst. Appl. 2019, 118, 152–168. [Google Scholar] [CrossRef]
  14. Tengkiattrakul, P.; Maneeroj, S.; Takasu, A. Applying Ant-Colony Concepts to Trust-Based Recommender Systems. In Proceedings of the 18th International Conference on Information Integration and Web-Based Applications and Services, Singapore, 28–30 November 2016; ACM: Singapore, 2016; pp. 34–41. [Google Scholar]
  15. Tengkiattrakul, P.; Maneeroj, S.; Takasu, A. Integrating the Importance Levels of Friends into Trust-Based Ant-Colony Recommender Systems. Int. J. Web Inf. Syst. 2019, 15, 28–46. [Google Scholar] [CrossRef]
  16. Bellaachia, A.; Alathel, D. Improving the Recommendation Accuracy for Cold Start Users in Trust-Based Recommender Systems. Int. J. Comput. Commun. Eng. 2016, 5, 206–214. [Google Scholar] [CrossRef]
  17. Kaleroun, A.; Batra, S. Collaborating Trust and Item-Prediction with Ant Colony for Recommendation. In Proceedings of the 2014 Seventh International Conference on Contemporary Computing (IC3), Noida, India, 7–9 August 2014; IEEE: Noida, India, 2014; pp. 334–339. [Google Scholar]
  18. Liao, X.; Wu, H.; Wang, Y. Ant Collaborative Filtering Addressing Sparsity and Temporal Effects. IEEE Access 2020, 8, 32783–32791. [Google Scholar] [CrossRef]
  19. Liao, X.; Li, X.; Xu, Q.; Wu, H.; Wang, Y. Improving Ant Collaborative Filtering on Sparsity via Dimension Reduction. Appl. Sci. 2020, 10, 7245. [Google Scholar] [CrossRef]
  20. Nadi, S.; Saraee, M.H.; Bagheri, A.; Davarpanh Jazi, M. FARS: Fuzzy Ant Based Recommender System for Web Users. Int. J. Comput. Sci. Issues 2011, 8, 203–209. [Google Scholar]
  21. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work—CSCW’94, Chapel Hill, NC, USA, 22–26 October 1994; ACM Press: Chapel Hill, NC, USA, 1994; pp. 175–186. [Google Scholar]
  22. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; ACM: Hong Kong, China, 2001; pp. 285–295. [Google Scholar]
  23. Ferrari Dacrema, M.; Cremonesi, P.; Jannach, D. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; ACM: Copenhagen, Denmark, 2019; pp. 101–109. [Google Scholar]
  24. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst. 2004, 22, 5–53. [Google Scholar] [CrossRef]
  25. Blum, C. Ant Colony Optimization: Introduction and Recent Trends. Phys. Life Rev. 2005, 2, 353–373. [Google Scholar] [CrossRef]
  26. Riadi, I.C.J. Cognitive Ant Colony Optimization: A New Framework in Swarm Intelligence. Ph.D. Thesis, University of Salford, Salford, UK, 2014. [Google Scholar]
  27. Socha, K.; Dorigo, M. Ant Colony Optimization for Continuous Domains. Eur. J. Oper. Res. 2008, 185, 1155–1173. [Google Scholar] [CrossRef]
  28. Stützle, T.; López-Ibáñez, M.; Pellegrini, P.; Maur, M.; Montes De Oca, M.; Birattari, M.; Dorigo, M. Parameter Adaptation in Ant Colony Optimization. In Autonomous Search; Hamadi, Y., Monfroy, E., Saubion, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 191–215. ISBN 9783642214332/9783642214349. [Google Scholar]
  29. Nikolakopoulos, A.N.; Kalantzis, V.; Gallopoulos, E.; Garofalakis, J.D. EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations. Knowl. Inf. Syst. 2019, 58, 59–81. [Google Scholar] [CrossRef]
  30. Frolov, E.; Oseledets, I. HybridSVD: When Collaborative Information Is Not Enough. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; ACM: Copenhagen, Denmark, 2019; pp. 331–339. [Google Scholar]
  31. Paudel, B.; Christoffel, F.; Newell, C.; Bernstein, A. Updatable, Accurate, Diverse, and Scalable Recommendations for Interactive Applications. ACM Trans. Interact. Intell. Syst. 2017, 7, 1–34. [Google Scholar] [CrossRef]
  32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  33. Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2016, 5, 1–19. [Google Scholar] [CrossRef]
  34. Netflix Prize Data. Available online: https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data (accessed on 21 May 2024).
  35. He, X.; Liao, L.; Zhang, H. Neural Collaborative Filtering. In Proceedings of the International World Wide Web Conference, Perth, Australia, 3–7 April 2017; ACM: New York, NY, USA, 2017. [Google Scholar] [CrossRef]
  36. Cremonesi, P.; Koren, Y.; Turrin, R. Performance of Recommender Algorithms on Top-n Recommendation Tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; ACM: Barcelona, Spain, 2010; pp. 39–46. [Google Scholar]
  37. Krichene, W.; Rendle, S. On sampled metrics for item recommendation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1748–1757. [Google Scholar]
  38. Basilico, J.; Hofmann, T. A Joint Framework for Collaborative and Content Filtering. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, 25–29 July 2004; ACM: Sheffield, UK, 2004; pp. 550–551. [Google Scholar]
  39. Deshpande, M.; Karypis, G. Item-Based Top- N Recommendation Algorithms. ACM Trans. Inf. Syst. 2004, 22, 143–177. [Google Scholar] [CrossRef]
  40. Nikolakopoulos, A.N.; Karypis, G. RecWalk: Nearly Uncoupled Random Walks for Top-N Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019; ACM: Melbourne, VIC, Australia, 2019; pp. 150–158. [Google Scholar]
  41. Steck, H. Embarrassingly Shallow Autoencoders for Sparse Data. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–19 May 2019; ACM: San Francisco, CA, USA, 2019; pp. 3251–3257. [Google Scholar]
  42. Bobadilla, J.; Ortega, F.; Hernando, A.; Gutiérrez, A. Recommender Systems Survey. Knowl.-Based Syst. 2013, 46, 109–132. [Google Scholar] [CrossRef]
  43. Son, L.H. Dealing with the New User Cold-Start Problem in Recommender Systems: A Comparative Review. Inf. Syst. 2016, 58, 87–104. [Google Scholar] [CrossRef]
  44. Ahn, H.J. A New Similarity Measure for Collaborative Filtering to Alleviate the New User Cold-Starting Problem. Inf. Sci. 2008, 178, 37–51. [Google Scholar] [CrossRef]
  45. Anderson, C. The Long Tail: Why the Future of Business Is Selling Less of More; Hachette Books: New York, NY, USA, 2016; ISBN 9781401384630. [Google Scholar]
  46. Yin, H.; Cui, B.; Li, J.; Yao, J.; Chen, C. Challenging the Long Tail Recommendation. arXiv 2012. [Google Scholar] [CrossRef]
  47. Türk Ulusal Bilim E-Altyapısı—TRUBA. Available online: https://www.truba.gov.tr (accessed on 21 May 2024).
  48. Salah, A.; Truong, Q.-T.; Lauw, H.W. Cornac: A Comparative Framework for Multimodal Recommender Systems. J. Mach. Learn. Res. 2020, 21, 1–5. [Google Scholar]
  49. Truong, Q.-T.; Salah, A.; Tran, T.-B.; Guo, J.; Lauw, H.W. Exploring Cross-Modality Utilization in Recommender Systems. IEEE Internet Comput. 2021, 25, 50–57. [Google Scholar] [CrossRef]
Figure 1. The archive of solutions kept by ants.
Figure 2. In the cold-start user scenario, the effect of the iteration and comparison with each baseline was evaluated using the NDCG metric.
Figure 3. In the long-tail item scenario, the effect of the iteration and comparison with each baseline was evaluated using the NDCG metric.
Figure 4. The effects of ant and archive size are evaluated using the NDCG metric in the cold-start user scenario.
Figure 5. The effects of ant and archive size are evaluated using the NDCG metric in the long-tail item scenario.
Table 1. Benchmark datasets.

| Set | Dataset | Domain | #Users | #Items | #Ratings | Sparsity | Density |
|---|---|---|---|---|---|---|---|
| Samples | ML-1M | Movie | 6038 | 3487 | 575,281 | 98.073 | 1.927 |
| Samples | Netflix | Movie | 11,585 | 6897 | 491,595 | 99.691 | 0.309 |
| Samples | Pinterest | Music | 7224 | 5005 | 170,340 | 99.385 | 0.615 |
| Originals | ML-1M | Movie | 6040 | 3952 | 1M | 95.809 | 1.927 |
| Originals | Netflix | Movie | 480K | 17K | 100M | 98.822 | 0.148 |
| Originals | Pinterest | Music | 55,187 | 9916 | 1.5M | 99.722 | 0.278 |
Table 8. Comparisons of AcoRec with Base Models for Cold-Start User Scenario.

| Dataset | Model | NDCG@10 | Recall@10 | Coverage |
|---|---|---|---|---|
| ML-1M | BaseGram | 48.77% | 43.25% | 515.80% |
| ML-1M | BaseCosine | 12.56% | 12.34% | 93.46% |
| ML-1M | BaseJaccard | 11.32% | 11.70% | 36.56% |
| Netflix | BaseGram | 24.25% | 22.32% | 45.76% |
| Netflix | BaseCosine | 2.77% | 0.33% | 8.07% |
| Netflix | BaseJaccard | 0.56% | 3.39% | 5.56% |
| Pinterest | BaseGram | 4.71% | 5.96% | 42.98% |
| Pinterest | BaseCosine | 3.80% | 8.73% | 4.29% |
| Pinterest | BaseJaccard | 5.26% | 0.75% | 3.15% |
Table 9. Comparisons of AcoRec with Base Models for Long-tail Item Scenario.

| Dataset | Model | NDCG@10 | Recall@10 | Coverage |
|---|---|---|---|---|
| ML-1M | BaseGram | 9810.00% | 12,970.00% | 1442.19% |
| ML-1M | BaseCosine | 1232.39% | 1100.00% | 305.01% |
| ML-1M | BaseJaccard | 707.20% | 647.46% | 194.75% |
| Netflix | BaseGram | 352.41% | 296.92% | 125.91% |
| Netflix | BaseCosine | 60.54% | 53.70% | 45.79% |
| Netflix | BaseJaccard | 44.03% | 38.45% | 31.53% |
| Pinterest | BaseGram | 223.30% | 226.76% | 103.71% |
| Pinterest | BaseCosine | 98.55% | 109.53% | 33.98% |
| Pinterest | BaseJaccard | 74.66% | 82.26% | 21.40% |