Article

Diverse but Relevant Recommendations with Continuous Ant Colony Optimization

by Hakan Yılmazer 1,* and Selma Ayşe Özel 2
1 IT Office, Çukurova University, 01250 Adana, Türkiye
2 Department of Computer Engineering, Çukurova University, 01250 Adana, Türkiye
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2497; https://doi.org/10.3390/math12162497
Submission received: 5 July 2024 / Revised: 7 August 2024 / Accepted: 9 August 2024 / Published: 13 August 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract:
This paper introduces a novel method called AcoRec, which employs an enhanced version of Continuous Ant Colony Optimization for hyper-parameter adjustment and integrates a non-deterministic model to generate diverse recommendation lists. AcoRec is designed for cold-start users and long-tail item recommendations by leveraging implicit data from collaborative filtering techniques. Continuous Ant Colony Optimization is revisited with the convenience and flexibility of modern deep learning methods and extended within the AcoRec model. The approach computes stochastic variations of item probability values based on the initial predictions derived from a selected item-similarity model. The structure of the AcoRec model enables efficient handling of high-dimensional data while maintaining an effective balance between diversity and high recall, leading to recommendation lists that are both varied and highly relevant to user tastes. Our results demonstrate that AcoRec outperforms existing state-of-the-art methods, including two random-walk models, a graph-based approach, a well-known vanilla autoencoder model, an ACO-based model, and baseline models with related similarity measures, across various evaluation scenarios. These evaluations employ well-known metrics to assess the quality of top-N recommendation lists, using popular datasets including MovieLens, Pinterest, and Netflix.

1. Introduction

Recently, visual media platforms such as YouTube, Spotify, Netflix, Twitch, and others have become increasingly popular, especially during the COVID-19 lockdown periods. These platforms typically provide recommendation lists to their users on mobile devices, tablets, or television screens, based on their item preferences. These recommendations, presented in horizontal or vertical form on the main screens of many media platforms, are usually based on the user’s past likes, trending items, or related demographic information. With the development of and competition among recommender system technologies, users expect personalized or session-based recommendations on these platforms [1]. However, generating online recommendations in live recommendation systems is challenging due to the absence of initial or complete data: such recommendations require evaluating ongoing, noisy data streams rather than training on data from scratch. Although recommender systems have widely used traditional deterministic models such as Collaborative Filtering (CF) and Content-Based Filtering (CBF) to solve this problem, these models tend to offer the same recommendations to all users and require continuous updating and diversification of home-screen recommendations, due to the changing tastes of users [2]. To address these limitations, researchers in recommendation systems have recently turned to heuristic and deep learning methods to offer continuous and variable recommendations [3]. The vital processes of a recommender system are to increase the connected nodes of the user–item graph and to produce more accurate predictions between users and new items. While doing this, the system must find user-specific relations that are considered to be of high quality. One of the challenges to ensuring quality is the presence of cold-start users. While most recommender systems address the problem of cold-start users in offline settings, it is crucial to consider their evolving preferences within the system itself, because all users can be considered cold start, due to their ever-changing and unpredictable tastes. However, recommender systems mainly offer recommendation sets for each user based on their past clicks, which might turn out to be similar, uncompelling, and poor-quality recommendations for the users [2,4,5]. This challenge drives us to tackle another issue related to cold-start users: the over-specialization problem, where recommendations become too narrowly focused, potentially limiting the diversity and discovery of unexplored content.
In this paper, we deal with the problems related to recommendations for cold-start users, personalized recommendations, over-specialization issues, and reducing time complexity in recommender systems. AcoRec, the method proposed in this paper, is a promising alternative that can provide diverse recommendations for the issues mentioned above. We introduce the AcoRec framework, which we developed using the Continuous Ant Colony Optimization method, ACO_R, as described in [6], to enhance the variety of user–item relationships and diversify recommendations for users in the system. Based on ACO_R, AcoRec employs various item-similarity or proximity models as input to generate user-specific, probabilistic, and highly diverse recommendations based on users’ past clicks. As a meta-heuristic and hybrid framework, AcoRec seeks diverse recommendations, addressing the challenges associated with relevant recommendations for cold-start users and long-tail items. The primary approach of this study is to generate an initial prediction based on the user’s click vector using the selected item-similarity model. Subsequently, these initial predictions are updated based on the user’s clicks and the scale vector obtained by adjusting the diagonal elements of the selected item-similarity matrix to modulate the influence over the matrix. AcoRec uses the initial predictions as the preliminary pheromone values τ and utilizes the item-similarity model as the heuristic information η of the model. Through this prior process, we establish new item connections for cold-start users based on their recent preferences within the context of the selected similarity model. The initial pheromone values encode the user’s recent preferences for the items within the scope of the selected similarity model. AcoRec optimizes the likelihood of user–item interactions within the system to infer how the similarity model responds to user knowledge. It achieves this by maximizing the importance of items for the specific user through hyper-parameter tuning in the continuous domain, which allows for more precise optimization of the model’s parameters. Unlike deterministic approaches, ACO_R incorporates probabilistic elements and some degree of randomness to address the challenges mentioned above. Although Ant Colony Optimization (and, by extension, ACO_R) has been used for decades, we revisited it with the help of advanced coding libraries, GPU capabilities, and techniques that have gained prominence with the rise of deep learning. This allows us to reassess its potential by leveraging modern computational advancements to better understand and possibly enhance its efficacy. However, we avoided employing deep learning models themselves in this study, to prevent potential complications in the backpropagation process that could arise from introducing randomness into the weights. Consequently, we identified ACO_R as an effective solution for the specific needs of this study; other optimization methods could be explored in future work. During training, AcoRec identifies the valuable items for the relevant user based on the expected probabilities of those items. Subsequently, our model generates a top-N recommendation list that ranks the user’s estimated probabilities of interest in items. These predictions can vary and differ across sessions, which is the core concept of our novel model.
AcoRec enables parallelization and execution on multiple processors through its row-based, per-user recommendation structure. This further reduces estimation time, making it feasible to handle large item catalogs and very large user populations in recommendation systems, as detailed in Section 3.3 (see Algorithm 1).
Algorithm 1 AcoRec
Inputs: item-similarity model S ∈ ℝ^{m×m}; click vector of user r_u ∈ ℝ^{1×m}; Sc, the Frobenius norms of the columns of S; μ ← 0; σ ← 1; ant_size ← ant size; archive_size ← archive size; T ← epoch count; a weight template NDCG@100
Output: Predictions_u ← predictions of user u for the items
compute τ(u) according to Equation (8)
construct SolutionArchive(1...archive_size) ← {}
for epoch ← 1 to T do
  for k ← 1 to ant_size do
    // sample the β variable from the Gaussian distribution with mean μ and deviation σ
    β* ← N(μ, σ)
    // estimate probabilities for ant k according to Equation (10)
    p_k ← τ(u) × S × Sc^{β*}
    // compute the fitness value for each ant
    fitness ← likelihood(r_u, p_k) · NDCG@100(u, p_k)
    SolutionArchive.insert(β*, fitness)
  end for
  // sort the solutions and trim to keep the best ones
  sort(SolutionArchive, by fitness descending)
  trim(SolutionArchive, archive_size)
  update μ and σ from the SolutionArchive via Adam
end for
β ← μ
return τ(u) × S × Sc^β
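For concreteness, the following is a minimal NumPy sketch of the loop in Algorithm 1. It is illustrative rather than the authors’ implementation: the fitness helpers are our own simplified choices, fitness is measured here against the user’s known clicks rather than a held-out validation set, and the Adam update on (μ, σ) is replaced by a plain moment-matching step.

```python
import numpy as np

def bernoulli_likelihood(r, p):
    # Mean Bernoulli log-likelihood of clicks r under predicted probabilities p,
    # mapped through exp so that higher is better and the value lies in (0, 1].
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return np.exp(np.mean(r * np.log(p) + (1 - r) * np.log(1 - p)))

def ndcg_at_100(r, p, n=100):
    # NDCG@100 with the paper's log2(i + 2) discount, i being the 1-based rank.
    top = np.argsort(-p)[:n]
    discounts = np.log2(np.arange(1, len(top) + 1) + 2)
    dcg = np.sum(r[top] / discounts)
    ideal = np.sum(np.sort(r)[::-1][:len(top)] / discounts)
    return dcg / ideal if ideal > 0 else 0.0

def aco_rec(S, r_u, tau_u, ant_size=200, archive_size=50, epochs=300, lr=0.05):
    Sc = np.linalg.norm(S, axis=0) + 1e-12   # Frobenius (L2) norm of each column
    base = tau_u @ S                         # precomputed once; constant per user
    mu, sigma = 0.0, 1.0
    archive = []                             # rows of (beta, fitness)
    rng = np.random.default_rng()

    for _ in range(epochs):
        for _ in range(ant_size):
            beta = rng.normal(mu, sigma)       # each ant samples beta ~ N(mu, sigma)
            p_k = base * Sc ** beta            # Equation (10), vectorized
            span = p_k.max() - p_k.min()
            p_k = (p_k - p_k.min()) / (span + 1e-12)   # squash into [0, 1]
            fitness = bernoulli_likelihood(r_u, p_k) * ndcg_at_100(r_u, p_k)
            archive.append((beta, fitness))
        archive.sort(key=lambda row: row[1], reverse=True)
        archive = archive[:archive_size]       # keep only the best solutions
        betas = np.array([b for b, _ in archive])
        mu += lr * (betas.mean() - mu)         # simplified stand-in for Adam
        sigma += lr * (betas.std() + 1e-3 - sigma)

    return base * Sc ** mu                     # final predictions with beta = mu
```

Here, τ(u) comes from Equation (8) and S from one of the similarity models in Section 3.4; both are assumed to be precomputed.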
In various scenarios, we evaluated our models on popular datasets such as MovieLens, Pinterest, and Netflix. We utilized state-of-the-art item-based similarity models (Gram, Cosine, and Jaccard) as inputs and initially compared our model with these simple baseline estimators. While our model offers recommendations that change during sessions, we aim to maintain the relevance and satisfaction of these items with the user’s preferences. We also noticed an increase in the diversity of recommended items.
The rest of the paper is organized as follows. In Section 2 we review related works that have employed similar approaches in the literature. Section 3 explains ACO and our proposed method. In Section 4 the datasets, metrics, and methods used to evaluate our model are described. In Section 5 we compare our proposed method with the state-of-the-art methods and present evaluation results. Section 6 includes discussions of the results. Finally, Section 7 concludes this paper.

2. Related Work

In the existing literature, to the best of our knowledge, there is no study that employs ACO_R in recommender systems to provide recommendations for cold-start users or long-tail items using the best available data. Most applications of Ant Colony Optimization (ACO) in recommender systems focus on the discrete version of ACO for solving combinatorial problems such as item ranking, user clustering, and collaborative filtering. These studies have employed ACO as a core implementation to address these types of problems. For example, Sobecki et al. used actual data to recommend student courses based on ACO [7]. In addition, T-BAR, which is considered one of the efficient probabilistic models, is also implemented using ACO [8]. Although T-BAR is effective in offering diverse user predictions, the problem of offering effective predictions to cold-start users persisted, and the authors proposed an updated DT-BAR (Dynamic T-BAR) to overcome the cold-start problem [9]. In another study, Massa proposed MoleTrust, a basic collaborative filtering model that incorporates Pearson similarity and trust in recommender systems [10]. Bedi and Sharma introduced the Trust-based Ant Recommender System (TARS), which produces recommendations by combining user trust assumptions with similarity based on Ant Colony Optimization (ACO). During training, TARS establishes new user relationships and generates predictions using updated, trusted users [11]. In contrast, the Semantic-enhanced Trust-based Ant Recommender System (STARS) represents a more advanced model that addresses some of TARS’s limitations. STARS enhances the original approach by incorporating semantic user similarity and clustering, offering a more nuanced and progressive solution [12]. TCFACO investigated user trust statements and developed an ACO-based collaborative filtering method aimed at predicting user effectiveness [13]. In a different approach, Tengkiattrakul et al. combined SVD-based user factors with trustworthiness to enhance user similarity in ACO-based recommendations [14,15]. While TCFACO focuses on leveraging user trust for effectiveness predictions, Tengkiattrakul et al.’s work integrates matrix factorization techniques with trust metrics to improve similarity measures in the ACO framework. Bellaachia et al. introduced ALT-BAR, a progressive approach that employs an averaged localized trust-based ant recommender system specifically designed to tackle the cold-start problem in recommendations [16]. Expanding on the TARS framework, Kaleroun et al. further refined the model by integrating item deviation distance into the prediction formula. Their enhanced model was rigorously tested against several challenges, including shilling attacks, cold-start users, sparse-matrix issues, and grey-sheep users [17]. In contrast, Liao et al. focused on improving ranking accuracy through a different mechanism: they computed user and item pheromones separately and then combined them in the rating prediction process, highlighting the role of pheromone dynamics in ranking [18,19]. This approach diverges from trust-based models by emphasizing pheromone-based ranking strategies. Meanwhile, Nadi et al. explored a fuzzy-based Ant Colony system for website recommendations. Their model utilized Jaccard-based user similarity and applied fuzzification to the user–item interaction matrix, presenting an alternative method for integrating user similarity and interaction into the recommendation process [20].
The typical approach in these ACO-based recommendation system studies is as follows:
  • Computing user similarities using metrics such as Cosine, Jaccard, Pearson, and trust measures.
  • Treating users as nodes and selecting similar users via Ant Colony Optimization steps.
  • Predicting the new recommendations from similar neighbors (users) based on Resnick’s prediction formula [21].
Conventional ACO applications for recommendation systems usually involve computations based on users, as outlined above. Given that the number of users typically exceeds the number of items, this leads to significant computational challenges: when a new user is added to the system, similarities with other users need to be recalculated. In contrast, our approach relies on lower-dimensional item-similarity matrices rather than user similarities. Additionally, the optimization process for the ants in our method requires minimal traversal paths rather than extensive graph-based exploration. Although, when modeling our work according to the ACO algorithm, the nodes in the graph structure would represent the items and the edge values would reflect the probability of the user’s interest in a neighboring item, we opted to use ACO_R for the system’s parameter optimization. This choice was due to the inherent limitations of traditional ACO algorithms, such as their discrete nature and potential for premature convergence. ACO_R, a more advanced variant, allows for continuous optimization of parameters, thus providing a more flexible and robust approach to fine-tuning the system’s performance. The specific details and advantages of using ACO_R for parameter optimization are discussed in the next section.

3. Proposed Method

Deterministic recommendation models are robust algorithms, despite their simple structures. For instance, neighborhood models or regression models can compete with many more elaborate models [22,23]. In deterministic recommendation models, users are given a set of recommendations {S} at time t1, and this set {S} remains the same as long as there is no change in the model between times t1 and t2. Nevertheless, we might acknowledge these results as adequate or sufficient, based on the evaluation metrics [2]. Many researchers obtain evaluation results for algorithms by averaging the results of multiple experiments. Yet these results can vary depending on the selection of the dataset, sampling methods, chosen metrics, and hyper-parameter evaluations [23,24].
In heuristic-based systems, outcomes can vary from run to run without updating the parameters or data, due to the randomness at their core, which can be an attractive feature for users. However, a challenge in providing diverse recommendation lists for a given user is that randomly recommended items may be difficult to match with the user’s taste. The space of possible recommendations for a user is vast, but we work with tractable approximations, such as a top-N recommendation list. These lists can be updated over time, but inadequate feedback may prevent them from changing. When the number of items m is significantly larger than the number of items to recommend n, the number of possible recommendation sets is $C(m,n) = \binom{m}{n}$. Exhaustively evaluating all possible item sets is computationally intractable. Therefore, generating a top-N recommendation list can be considered a combinatorial optimization problem, and heuristic methods such as Ant Colony Optimization can be seen as an effective solution.
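To make the scale of this search space concrete, even a hypothetical catalog of 3,000 items yields an astronomically large number of candidate top-10 sets:

```python
import math

# Number of distinct 10-item recommendation sets from a hypothetical
# 3,000-item catalog (order ignored): about 1.6e28 candidate sets.
print(math.comb(3000, 10))
```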

3.1. Ant Colony Optimization

Ant Colony Optimization models are derived from the behavior of real ants and are used to solve many optimization problems. Ants can discover the shortest path from a food source to the nest. While traveling, each ant deposits a chemical substance called pheromone on the ground and tends to follow the pheromone trails deposited by other ants. This makes ACO a suitable model for mimicking the behavior of users in recommendation systems, where nodes represent items and a set of nodes visited by ants can be recommended to the users. Initially, ants are randomly distributed over the nodes in the graph. An ant k at time t, located at node i, chooses the next node j with a probability given by the random proportional rule defined in Equation (1):
$$\mathrm{probability}_t^k(i,j) = \frac{\tau_t(i,j) \cdot \eta(i,j)^{\beta}}{\sum_{k \in u} \tau_t(i,k) \cdot \eta(i,k)^{\beta}} \quad (1)$$
where u is a set of nodes in the neighborhood of i, τ is the pheromone value of the edge, and η is the desirability of the edge. After evaluating all the ant’s tour costs in the current iteration, the pheromone values of each edge (i, j) are updated. The evaporation of pheromones is calculated, and better solutions are indicated by a higher amount of pheromones deposited by the ants.
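As an illustration, a minimal sketch of this transition rule follows; the function and variable names are ours, not from the paper:

```python
import numpy as np

def choose_next(i, neighbors, tau, eta, beta=2.0, rng=np.random.default_rng()):
    # Random proportional rule of Equation (1): from node i, pick node j among
    # the candidate neighbors (an integer index array) with probability
    # proportional to tau(i, j) * eta(i, j)^beta.
    weights = tau[i, neighbors] * eta[i, neighbors] ** beta
    return rng.choice(neighbors, p=weights / weights.sum())
```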

3.2. Ant Colony Optimization in the Continuous Domain

Combinatorial optimization, as in classic ACO, deals with finding optimal combinations or permutations of available problem components, as in the Travelling Salesman Problem (TSP). However, casting a problem as combinatorial optimization is not always convenient, especially if the bounds are wide and the sensitivity of the parameters is high. In such cases, algorithms that optimize continuous variables yield better results. Blum [25] attempted to extend ACO algorithms to tackle discrete- and continuous-optimization problems. Two approaches have been presented for integrating ACO into the continuous domain. The first method uses a familiar approach to ant behavior, and the second method carries the fundamental ACO graph structure into the continuous domain. This extension can be achieved through proper discretization or probabilistic sampling of the search space [26]. Following the second method, Socha and Dorigo introduced the continuous Ant Colony Optimization algorithm ACO_R [27], used a Gaussian kernel probability density function (pdf) as the distribution model, and presented ACO_R as a meta-heuristic framework. In ACO_R, given a problem with n decision variables, a vector xj = {xj,1, xj,2, xj,3, ..., xj,n} sampled from a probability density function represents a candidate solution constructed by an ant j, and f(xj) represents the objective function value of the solution. In ACO_R, each ant’s solution corresponds to a row of the Solution Archive. During the iterations, the candidate solutions in the Solution Archive are ordered according to their objective function values. Each solution has an associated weight, ωj, which reflects the proportion of its solution quality relative to the whole. The weight of the jth solution is defined in Equation (2):
$$\omega_j = \frac{1}{q \sigma \sqrt{2\pi}}\, e^{-\frac{(G(j) - \mu)^2}{2 q^2 \sigma^2}} \quad (2)$$
where G(j) is the value of the Gaussian function with argument j, μ is the distribution mean, σ is the standard deviation, and q is the parameter controlling the deviation distance of the algorithm. When q is small, the highest-fit solutions are strongly favored, and the search diversifies as q increases. Staying true to the original ACO pheromone model, the algorithm updates the μ and σ values after each iteration to optimize the probability distribution. Once the initial Solution Archive is constructed, each ant selects a distribution from the Solution Archive by means of a fitness-proportionate selection function, such as the roulette wheel selection algorithm, where the selection probability of each row is obtained by normalizing its weight over the sum of all weights:
$$p(j) = \frac{\omega_j}{\sum_{r=1}^{k} \omega_r} \quad (3)$$
In Equation (3), p(j) is the probability of selecting the jth row in the Solution Archive. The quality of each solution is calculated based on the objective function and merged into the Solution Archive. After sorting, the first k best solutions are retained, and the others are discarded for forthcoming iterations. For example, for a maximization problem, the Solution Archive constructed by k ants is ordered in descending order, where f(x1) ≥ f(x2) ≥ ⋯ ≥ f(xk) and ω1 ≥ ω2 ≥ ⋯ ≥ ωk. The sample Solution Archive structure is given in Figure 1.
In the search process, the iterations aim to find the best solution and converge the model. After each iteration, the pheromone-update strategy (as in ACO) is performed by adding the k newly generated solutions to the Solution Archive. After sorting the solutions, the worst k solutions are eliminated, so the total number of solutions in the archive remains equal to k. This maintains the better solutions in the Solution Archive, providing practical guidance to the ants in the search for better quality.
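A small sketch of the archive weighting and selection steps follows, using the standard rank-based form of the Gaussian kernel; this is our reading of Equation (2), with names of our own choosing:

```python
import numpy as np

def archive_weights(k, q=0.1):
    # Rank-based Gaussian-kernel weights for a Solution Archive of size k
    # (standard ACO_R reading of Equation (2)); rank 1 holds the best solution,
    # and a smaller q concentrates probability mass on the top-ranked rows.
    ranks = np.arange(1, k + 1)
    return np.exp(-((ranks - 1) ** 2) / (2 * q**2 * k**2)) / (q * k * np.sqrt(2 * np.pi))

def roulette_select(weights, rng=np.random.default_rng()):
    # Fitness-proportionate selection of an archive row (Equation (3)).
    return rng.choice(len(weights), p=weights / weights.sum())
```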
In this paper, we investigated the issues associated with recommender systems (RSs), as noted in Section 1, and utilized ACO_R to overcome these challenges. Additionally, we introduced novel enhancements to this method to address the challenges posed by RS problems, as detailed in the following section.

3.3. Stochastic Approach of AcoRec

This paper introduces AcoRec, a novel method that aims to leverage Bayesian inference and users’ past click history to predict their interest in items. The approach involves utilizing a vector pheromone model and adjusting user-specific hyper-parameters to optimize expected outcomes, allowing for seamless adaptation to session-based or real-time systems tailored to individual users. In AcoRec, the probabilistic transition rule for the users, selected by ant k who mimics user u at time t, is given in Equation (4),
$$\mathrm{probability}_t^k(u) = \tau(u)_t^{\alpha} \cdot \eta^{\beta} \quad (4)$$
where τ(u)t represents the pheromone values for user u on items at time t, η denotes the selected input model, and α and β represent the pheromone regularization and heuristic model adjustment parameters, respectively. Notably, unlike Equation (1), normalization is not applied in a denominator. These parameters maximize the posterior information of the items for users, similar to the prediction process of item-based models, where user scores for items are predicted using the base equation in (5):
$$\mathrm{predictions}(u) = r_u \cdot S \quad (5)$$
where S is an m × m item-similarity matrix and r_u is an item vector of size m, written as r_u = [r_{u1}, ..., r_{um}], where r_{ui} equals 1 if user u clicked item i and 0 otherwise, as given in Equation (6). If we treat the items a user has clicked as pheromone-traced items, AcoRec uses the r_u vector as the pheromone vector and S as the heuristic information between items for further optimization.
$$r_{ui} = \begin{cases} 1, & \text{if user } u \text{ clicked item } i, \\ 0, & \text{otherwise.} \end{cases} \quad (6)$$
In this context, we estimate posterior probabilities by selecting the rows corresponding to items previously clicked by the user from the item–item similarity model (assuming a symmetric matrix structure, where column values mirror row values, as in real Hermitian matrices). These selected rows are then assembled into a low-rank subset matrix, from whose columns an Lp-norm vector is derived. The norms of the user-clicked items represent the user’s actions as a pheromone vector (prior probabilities), analogous to social network behavior. This serves as the initial pheromone interpolation, aligning with the foundational principles of ACO_R.
Let xu = [xu1, ..., xuq] be a subset vector of ru containing all clicked items belonging to user u, where q is the count of clicked items. The formula for the Lp-norm of these clicked items is shown below:
$$L_p(u) = \left\| S_u^{\,q \times m} \right\|_p = \left[ \Big( \sum_{j=1}^{q} S(j,i)^{p} \Big)^{1/p} \right]_{i=1,\dots,m} \quad (7)$$
where Su is the subset matrix of S that keeps only the rows of xu, the clicked items of user u; S is the item-similarity model; and i is the column index in the item-similarity model. In Equation (7), a p-value of 1 gives the L1-norm, and a p-value of 2 gives the L2-norm, also known as the Euclidean norm. We analyzed how the similarity model responds to user knowledge by examining the probability of user–item interactions within the system. Additionally, we established a relationship between the resulting Lp-norm vectors and user clicks. Equation (8) is used to differentiate between positive (clicked) and negative (not clicked) interactions for a user. The goal is to predict the likelihood of a user clicking on an item and to evaluate this prediction against the actual interaction. We used the Bernoulli transformation with Binary Cross-Entropy (BCE) for this conversion. The formula for τ(u) is given by
$$\tau(u) = r_u \cdot \log\left(L_p(u)\right) + \left(1 - r_u\right) \cdot \log\left(1 - L_p(u)\right) \quad (8)$$
where τ(u) represents the pheromone values of the items for user u at time t = 0, and Lp(u) represents the likelihood of user u interacting with (clicking) each item at time t. Based on the dataset characteristics, an appropriate transformation, either tanh or sigmoid, must be chosen for Lp(u). Since our datasets are binary, we used these transformations to ensure that Lp(u) lies within the [0, 1] range, suitable for representing probabilities.
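A sketch of this pheromone initialization, combining Equations (7) and (8), is shown below; the tanh squashing and the epsilon clipping are our illustrative choices:

```python
import numpy as np

def initial_pheromone(S, r_u, p=2):
    # Equations (7)-(8): column-wise Lp-norms over the rows of S that user u
    # clicked, squashed into (0, 1), then converted into a BCE-style pheromone
    # vector tau(u). Assumes binary clicks r_u in {0, 1}^m.
    clicked = np.flatnonzero(r_u)                        # indices of clicked items
    S_u = S[clicked, :]                                  # q x m subset matrix
    Lp = (np.abs(S_u) ** p).sum(axis=0) ** (1.0 / p)     # Equation (7), per column
    Lp = np.clip(np.tanh(Lp), 1e-9, 1 - 1e-9)            # keep strictly inside (0, 1)
    return r_u * np.log(Lp) + (1 - r_u) * np.log(1 - Lp) # Equation (8)
```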
In ACO_R, the pheromone model is pivotal in introducing randomness into the search space [27]. The initial estimation of pheromone values for user-clicked items is conducted through Equations (4), (7), and (8). Specifically, Equation (4) governs the tendency of pheromone values in ACO models via the α parameter, while the β parameter, controlling the model’s heuristic knowledge, is deemed crucial for performance [28]. After initializing the pheromone values with the user’s prior clicks, we set α = 1 to regulate the pheromone bias in our model, thereby directing our focus solely to adjusting the β parameter. Our observations indicate that controlling the heuristic knowledge with β enables us to either enhance the pheromone effect while mitigating bias or diminish the pheromone effect while reinforcing the tendency towards heuristic knowledge. The parametric scaling of heuristic knowledge, which gauges the resistance of Euclidean-norm data to popularity, has been integrated into numerous models with favorable outcomes [29,30,31]. This scaling ensures a coherent development path and facilitates seamless integration with the user’s selected item model. The scaling applied to the heuristic matrix S is defined by Equation (9). This approach aligns with the principles of the ACO_R model and provides a clear and consistent development trajectory.
$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mm} \end{bmatrix} \cdot \begin{bmatrix} \|s_1\|_F & 0 & \cdots & 0 \\ 0 & \|s_2\|_F & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \|s_m\|_F \end{bmatrix}^{\beta} \quad (9)$$
where β is the scaling parameter and $\{\|s_1\|_F, \dots, \|s_m\|_F\}$ are the Frobenius norms (L2-norms) of the columns of S. The β parameter is used to adjust the impact of high-norm values for both popular and long-tail items, dynamically adapting to the specific scenario to either emphasize or reduce their influence. Substituting the pheromone values from Equation (8) and the scaled heuristic model from Equation (9) into Equation (4) yields the following formula, denoted as Equation (10):
$$\mathrm{probability}_t^k(u) = \tau(u)_{1 \times m} \cdot S_{m \times m} \cdot \mathrm{Diag}\left(\{\|s_1\|_F, \dots, \|s_m\|_F\}\right)^{\beta} \quad (10)$$
Our scaling method differs from others in an important aspect. While the referenced models [29,30,31] apply this scaling using a uniform parameter across all users and select the best parameter via grid search, our algorithm performs scaling-parameter tuning as an internal hyper-parameter optimization. This means that the scaling parameter is computed separately for each user, enabling the generation of personalized recommendations tailored to individual preferences. In Equation (10), by fixing the α value to 1, the term $\tau(u)_{1 \times m} \cdot S_{m \times m}$ becomes constant and pre-computable for all users. As a result, only the term involving β needs to be recomputed, which significantly reduces the computational cost: instead of an m × m matrix multiplication, each candidate β requires only an O(m) vector multiplication. The optimized β parameter reflects the user’s preferences and behavior [31]. This parameter is highly user-specific and can vary significantly among users, based on their unique tastes. When β takes a negative value, it can highlight rare items for the user, offering a personalized touch to the recommendations. The optimal value of β can vary for each user, and users may have multiple candidate values that maximize their posterior. These candidates may form either a tight distribution or a wide range, and discrete probabilities may not provide certainty when searching for a hyper-parameter. Consequently, the optimization challenges of continuous domains have spurred new directions in Ant Colony Optimization research. Instead of relying on a discrete probability distribution, a pdf is employed to sample the hyper-parameters in the continuous domain. Conceptually, a node in a conventional ACO problem can be likened to a local parameter of a Gaussian distribution.
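The precomputation argument can be seen directly in code. Under the assumption that τ(u) and S are fixed for a given user, only an O(m) elementwise product remains per candidate β (variable names follow the sketches above):

```python
import numpy as np

# tau_u (1 x m), S (m x m), and Sc (per-column Frobenius norms of S) are assumed
# given. base = tau_u @ S costs O(m^2) but is computed once per user; each
# candidate beta afterwards needs only the O(m) elementwise product below.
def predict(base, Sc, beta):
    return base * Sc ** beta   # Equation (10) with alpha fixed to 1
```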
In our model, each ant samples candidate β parameters from a Gaussian distribution G(x) = N(μ, σ), with μ = 0 and σ = 1 initially. Points close to each other in the continuous domain produce similar results, facilitating stochastic exploration of the optimal β parameter. We search for the maximizing β value in the ACO_R domain, as detailed in Algorithm 1.
During each iteration, we estimate each ant’s probability values using Equation (10), followed by a non-linear transformation to adjust for varying β values and ensure comparable fitness measurements. Additionally, we found dropout beneficial in this process. To optimize the computation of p_k, we precompute the product τ(u) × S before entering the loop, since τ(u) and S remain constant. The model estimates the fitness value for the selected β* parameter from the count of validation items appearing in the current top-N recommendation list.
The fitness process uses a likelihood evaluation function (e.g., Bernoulli, Gaussian) to assess the consistency between each ant’s recommended list and the user’s preferences; we used the Bernoulli likelihood in our experiments. We incorporated NDCG@100 (the formula for this metric is detailed in Section 4) as a weight, allowing us to evaluate the fitness value of each ant’s list. We used this process to update the Solution Archive of ACO_R. The row count of the solution space equals the archive_size parameter in our approach, with each row containing a sampled β value and its fitness value. Our approach diverges from ACO_R in that the central Solution Archive is initially empty. At each iteration, all solutions are sorted, and the best archive_size solutions, determined by their fitness values, are kept for the next iteration. At the end of each iteration, μ and σ are optimized with Adam [32] over the Solution Archive. This training process shifts the μ of the distribution to concentrate simultaneously on the best quality and the best β. After each iteration, we applied evaporation to the Solution Archive, ensuring the model continuously improved throughout the iterations.
Algorithm 1 presents the methodology for generating user-specific recommendations. This row-based approach significantly improves computational efficiency by processing each user’s data independently, allowing for parallel execution and reducing overall computation time. Additionally, the input matrices involved in the process are precomputed, which further streamlines the workflow. Given that τ(u) × S is precomputed, we can simplify the time-complexity analysis of Algorithm 1 by ignoring this operation in the computational steps. Initializing an empty Solution Archive has negligible time complexity, O(1). Epochs run T times, so the total complexity is multiplied by T. Ants walk ant_size times, so the complexity of the inner operations is multiplied by ant_size. Sampling the β variable from a Gaussian distribution is O(1). Since τ(u) × S is precomputed, the probability step reduces to a vector–scalar multiplication with Sc^{β*}, whose time complexity is O(m), where m is the item-vector size. Given that both likelihood(r_u, p_k) and NDCG@100(u, p_k) have a complexity of O(m), due to the item size, the overall complexity of this step remains O(m). Inserting into an archive of size archive_size is O(1). Sorting the archive is O(archive_size log(archive_size)), and trimming is O(archive_size). Updating the parameters with Adam involves simple arithmetic over archive_size items, which is O(archive_size). Since τ(u) × S is precomputed, returning the final predictions is O(m). Assuming m > archive_size, the total time complexity for a single user is O(T × ant_size × m + T × (m + archive_size log(archive_size))). This approach also enhances scalability by enabling the distribution of computational tasks across multiple processors, thereby optimizing the performance of large-scale recommendation systems.

3.4. Heuristic Base of AcoRec and Item Model Selection

In ACO-based recommender systems, the distance between nodes is determined by the similarity or proximity between users or items. We prefer to measure this similarity using distance metrics in inter-nodal Euclidean space. Our model is designed to be low-dimensional and focuses on gauging a user’s interest in items rather than the distance between nodes. The relationship between items is managed through various forms, such as similarity, proximity, dissimilarity, or correlation, utilizing specific methods. Collaborative Filtering (CF) models consider the collaborative benefits of items, while Content-Based Filtering (CBF) models focus on items’ metadata (e.g., demography, mood, etc.). Graph Similarity Models are based on the relationships in the user–item network structure. Time-based models track the temporal sequences of item purchases. Latent Factor-Based Models extract hidden components from low-rank computations. Demographic Models consider collaborative behaviors in the same geographical areas.
This study evaluated three well-known item-based similarity measures for computational simplicity and popularity. Let Sm×m be the similarity matrix, i and j be the two items, Sij represent the similarity between two items, and vi and vj be the column vectors of these items.

3.4.1. Gram Matrix (Gram)

The dot-product similarity of two items equals the inner product of these item vectors, as given by the formula in Equation (11).
$$S_{\mathrm{gram}}(i,j) = \langle v_i, v_j \rangle = v_i \cdot v_j \quad (11)$$

3.4.2. Cosine Similarity (Cosine)

The cosine similarity between two items is the cosine of the angle between their rating vectors. It is estimated as the inner product of the item vectors divided by the product of their norms, as shown in Equation (12).
$$S_{\mathrm{cosine}}(i,j) = \frac{\langle v_i, v_j \rangle}{\|v_i\| \, \|v_j\|} = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|} \quad (12)$$

3.4.3. Jaccard Similarity (Jaccard)

The Jaccard similarity between two items is defined as the ratio of the number of users who co-rated both items to the number of users who rated at least one of items i or j, as described in Equation (13).
$$S_{\mathrm{jaccard}}(i,j) = \frac{|v_i \cap v_j|}{|v_i \cup v_j|} = \frac{v_i \cdot v_j}{\|v_i\| + \|v_j\| - v_i \cdot v_j} \quad (13)$$
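All three measures can be computed jointly from a binary user–item matrix. A dense, illustrative sketch (not the authors’ code) follows:

```python
import numpy as np

def similarity_matrices(R):
    # Item-item similarities of Equations (11)-(13) for a binary user-item
    # matrix R (users x items). For binary columns, the Gram entry G[i, j] is
    # the co-click count and G[i, i] is the number of users who clicked item i.
    G = R.T @ R                                    # Equation (11): Gram matrix
    norms = np.sqrt(np.diag(G)) + 1e-12
    cosine = G / np.outer(norms, norms)            # Equation (12)
    counts = np.diag(G).astype(float)
    union = counts[:, None] + counts[None, :] - G  # |v_i ∪ v_j| for binary data
    jaccard = G / np.maximum(union, 1e-12)         # Equation (13)
    return G, cosine, jaccard
```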

4. Evaluation

4.1. Dataset

We utilized three widely recognized datasets from different domains: MovieLens 1M (ML-1M) [33] and Netflix [34] for movie recommendations, and the Pinterest [35] dataset, which covers interactions of users who pinned images to their boards. Due to the large size of the Netflix and Pinterest datasets, we created subsets from the originals, a common practice among researchers, to facilitate faster benchmarks and parameter tuning. For the ML-1M and Netflix datasets, ratings of 4 and 5 stars were converted to binary ones, while all other ratings were converted to zeros. Subsequently, in the ML-1M dataset, we filtered for users who rated at least one item and movies rated by at least one user, resulting in a sparser dataset than the original. For Pinterest, we selected users who had pinned at least 20 images to their boards and boards pinned by 5 to 200 users. In the Netflix dataset, we chose users who had watched between 20 and 500 movies and movies that had been watched by 20 to 500 users. The counts of users, items, and ratings, along with their sparsity and density values, are summarized in Table 1. The sparsity percentage is calculated as (1 − density) × 100, where density = #ratings/(#users × #items). As indicated in Table 1, the sparsity values of the sampled subsets are higher than those of the original datasets.
We utilized the k-fold cross-validation method to split the raw datasets to evaluate the models. We randomly shuffled all datasets and divided them into k = 5 sampled subsets. Each unique sampled group was used as a probe set held out from the raw dataset. After removing the probe set, the remaining portion of the raw dataset was referred to as the ‘training set’. We then selected users and their ratings from these probe sets based on the criteria defined in the experiments. These selected users and their ratings in the probe set were considered the ‘test set’. This process allowed us to obtain an average estimate of the results for different users and items in each experiment.
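A minimal sketch of this splitting procedure, assuming the ratings are stored as an indexable array of (user, item) pairs, could look as follows:

```python
import numpy as np

def five_fold_probe_splits(n_ratings, k=5, seed=42):
    # Shuffle all rating indices and partition them into k probe sets; for each
    # fold, the remaining indices form the training set (as described above).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_ratings)
    folds = np.array_split(idx, k)
    for probe_idx in folds:
        train_idx = np.setdiff1d(idx, probe_idx)
        yield train_idx, probe_idx
```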

4.2. Evaluation Metrics

Our models do not consider the similarity between estimated and actual ratings; instead, we assess the quality of the recommended items for the users. To evaluate the quality of the top-N recommendations, we used Cremonesi’s method for benchmarking models [36]. However, instead of selecting 1000 items, we modified the approach by calculating the top-N lists by sorting all the items the user did not click on. Because our method is a probabilistic model, sampled items may produce different results in each experiment. Evaluating all items at once is challenging, due to the growing number of candidate items, but it yielded more consistent results than the sampled metrics described in [37]. After sorting all the items that were not clicked, we used the prediction models to estimate their rating scores. We then selected the N items with the highest predicted rating scores from the sorted list. This final list represents the top-N item recommendation list for the ‘test user’. In our experiments, we tested with N values of 10 and 20 for the length of the recommendation lists.
We employed two utility-based metrics to evaluate the quality of the recommendation lists regarding relevant items to the user: normalized Discounted Cumulative Gain [38] and Recall [39]. By considering the relevance and position of items, nDCG evaluates the quality of rankings and rewards systems that prioritize highly relevant items, offering a precise measure of ranking effectiveness. Conversely, Recall evaluates the system’s coverage of relevant items within the top-N list, focusing on the proportion of relevant items included in the recommendations. Additionally, we used the Coverage [24] metric to gauge the proportion of unique items recommended to users in the lists, focusing on the diversity and breadth of recommendations rather than the relevance or interest of the recommended items to individual users.
In these metric formulas, we denoted T as the number of users in the test set, N as the length of the recommendation list, and i as the position of the recommended item in the list. If the item ranked at position i in the list belongs to a user in the test set, we consider this item a ‘relevant item’ for the user and set rel(i) = 1; if it does not belong to the test user, we set rel(i) = 0.

4.2.1. Recall

To evaluate the model’s retrieval score for specific datasets in different list lengths, we divide the sum of all ‘relevant items’ by the number of users in the test set. The Recall formula is given in Equation (14).
$$\mathrm{Recall}(@N) = \frac{1}{T} \sum_{i=1}^{N} \mathrm{rel}(i) \quad (14)$$

4.2.2. Normalized Discounted Cumulative Gain (NDCG)

The Recall formula ignores the position of the ‘relevant item’ within the recommendation list, yet recommendations at the top of the list are more valuable than the others. We therefore measure the importance of an item’s position by discounting each ‘relevant item’ according to its position; NDCG weighs the gain of each position logarithmically while assessing list quality. This metric first estimates the test set’s Discounted Cumulative Gain (DCG) in Equation (15). Then, the Ideal Discounted Cumulative Gain (IDCG) is estimated in Equation (16) for the best possible ranking, in which every test item belonging to the selected user appears at the top of the top-N list. Finally, we normalize these gain values with Equation (17) to obtain the NDCG value for a benchmark test.
$$\mathrm{DCG}(@N) = \frac{1}{T} \sum_{i=1}^{N} \frac{\mathrm{rel}(i)}{\log_2(i+2)} \quad (15)$$
$$\mathrm{IDCG}(@N) = \frac{1}{T} \sum_{i=1}^{N} \frac{1}{\log_2(i+2)} \quad (16)$$
$$\mathrm{NDCG}(@N) = \frac{\mathrm{DCG}(@N)}{\mathrm{IDCG}(@N)} \quad (17)$$

4.2.3. Coverage

The Coverage metric measures the breadth of a recommender system as the percentage of distinct items, out of all items in the system, that appear across the whole set of recommendation lists. We define the Coverage of the system over all users in Equation (18):
$$\mathrm{Coverage}(@N) = \frac{\left| \bigcup_{u \in U} \mathrm{rec}_u \right|}{|I|} \quad (18)$$
where $\bigcup_{u \in U} \mathrm{rec}_u$ is the set of unique items appearing in the recommendation lists of all users, and |I| is the total number of items in the system.
We report the Coverage percentage obtained with the parameters that give the best NDCG@10 for each model, which we consider a fairer way of evaluating the diversity of the resulting lists.
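Putting the three metrics together, a small evaluation sketch follows; the dictionary-based inputs and the function name are our own convention:

```python
import numpy as np

def evaluate(rec_lists, test_items, n_items, N=10):
    # Equations (14)-(18). rec_lists maps each test user to a ranked item list;
    # test_items maps each user to the set of their held-out relevant items.
    T = len(rec_lists)
    discounts = np.log2(np.arange(1, N + 1) + 2)   # paper's log2(i + 2) discount
    idcg = np.sum(1.0 / discounts)                 # Equation (16), per user
    recall_sum = dcg_sum = 0.0
    seen = set()
    for u, ranked in rec_lists.items():
        rel = np.array([1.0 if i in test_items[u] else 0.0 for i in ranked[:N]])
        recall_sum += rel.sum()                    # Equation (14) numerator
        dcg_sum += np.sum(rel / discounts[:len(rel)])  # Equation (15) numerator
        seen.update(ranked[:N])
    recall = recall_sum / T
    ndcg = (dcg_sum / T) / idcg                    # Equation (17)
    coverage = 100.0 * len(seen) / n_items         # Equation (18), as a percentage
    return recall, ndcg, coverage
```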

4.3. Baselines

To validate the effectiveness of AcoRec, we compared it with item-based, user-based, random-walk-based, graph-based, and ACO-based models for different scenarios. The models used for the benchmark tests are summarized below.
BaseGram, BaseJaccard, and BaseCosine refer to three item-based similarity models employed in the study. These models include the Gram Similarity, Cosine Similarity, and Jaccard Similarity models. They are estimated using Equations (11), (12), and (13), respectively. These baseline models serve as the foundation for evaluating the performance of more complex recommendation models.
TARS [11] is a state-of-the-art ACO model in recommender systems. It introduces a user-based approach that builds a trust-based user-relationship graph, identifies similar users using Pearson Correlation, and estimates ratings. This model leverages ACO to enhance recommendation accuracy by considering user trust and relationships in the recommendation process.
RP3β [31] is a random-walk model based on the user–item graph, aimed at extending diversification to reduce the bias toward popular items in recommendation systems. This approach utilizes random walks on the user–item graph to explore less-popular items, thus improving the overall diversity of recommendations.
RecWalkPR and RecWalkK [40] are frameworks that capture new, rich network interactions for generating top-N recommendation lists. These methods leverage the concept of random walks on the network structure to uncover previously unnoticed connections between items or users, enhancing the diversity and quality of recommendations.
EASER [41] is a robust linear model that presents the closed-form solution of Ridge Regression in a manner akin to vanilla auto-encoders. This model offers a novel approach to linear regression, leveraging auto-encoder principles to enhance its performance and robustness.
UserKNN [21] employs Resnick’s user-based CF approach. We used Pearson Correlation to obtain user similarities.
Random is a baseline model that involves benchmarking by filling the empty cells in the user–item matrix with random values ranging between 0 and 1.
Popular is a baseline model that evaluates items according to their usage frequency.

4.4. Parameter Tuning and Experimental Setup

We performed a grid search to find the best parameters for each baseline model in the cold-start user and long-tail item recommendation scenarios, allowing us to compare their performance against each other. For the TARS model, the user neighbor size k was tested with values ranging from 10 to 250 in steps of 10, and confidence values ranging from 0 to 1 in steps of 0.1. The RP3β model was tested with β values ranging from −1 to 1 and α values ranging from 0 to 1, both in steps of 0.2. The EASER model was evaluated with λ values ranging from 5 to 20,000. For the RecWalkPR and RecWalkK models, as in their original paper, the trained SLIM model (W) was used as input. The parameters were set as follows: C ∈ {0.1|I|}, l1 ∈ {1, 3, 5, 10}, l2 ∈ {0.1, 0.5, 1, 3, 5, 7, 9, 11, 15, 20}, with a fixed α value of 0.005. RecWalkPR was tested with η values ranging from 0 to 1 in steps of 0.2, and RecWalkK with k values ranging from 2 to 30 in steps of 2.
Our AcoRec models were evaluated using archive sizes of 20, 50, and 100 and ant sizes of 50, 100, and 200. We employed non-linear functions such as ‘tanh’, ‘sigmoid’, or ‘softmax’ to convert the likelihood of user interactions. Dropout rates of 0, 0.2, and 0.5 were applied. We set the iteration count to 300. Additionally, for the initial σ value in the long-tail item scenarios, we used values of 2 and 3. For the Adam Optimizer learning rate value, we used 0.01 in the cold-start scenario and 0.05 in the long-tail scenario. Each experiment for AcoRec was repeated five times, due to random choices, and the results were averaged. Section 5 presents the best results achieved by using the optimal parameters for each model.

5. Results

To assess the performance of our models, we conducted experiments in two scenarios. The first scenario was designed to evaluate the accuracy of our model in providing recommendations to cold-start users, who had fewer ratings in the system, making it challenging to offer high-quality recommendations [42,43].
For the first scenario, we selected warm users as candidate users from the probe set who were also present in the training set. We randomly selected 100 users, each with at least one rating in the probe set and at least twenty ratings in the training set. We then transformed these warm users into cold-start users by reducing their rating counts in the training set. To do this, we considered examples from studies in the existing literature. Whereas some studies defined cold-start users by keeping only three items in the training set [43], others used 5% of the user’s ratings [29]; still others experimented with counts ranging from 1 to 20 or with percentage rates [44]. For a more challenging setting, we kept between 5 and 10 random ratings for each selected user in the training set and removed the rest. This process turned the candidate users into cold-start users, each represented by a minimum of 5 and a maximum of 10 random ratings in the training set.
The second scenario, focused on long-tail item recommendations, was designed to test how effectively recommendations accommodate a variety of less-popular items. Popular items are familiar to users and can become monotonous over time [45]. Therefore, recommending less-popular items can be more engaging. Traditional CF methods often concentrate on popular items or users, overshadowing diverse relationships. Since the quality of models depends on the diversity of recommendations they offer, these CF methods may struggle to generate diverse suggestions, especially with inadequate data [46].
To create an experimental environment suitable for the long-tail item scenario, we adopted the method described in [36]. As noted by the authors, the most prevalent 1.7% of items, accounting for 33% of the ratings in the Netflix dataset, were referred to as short-head items, whereas the remaining items were called long-tail items. Following this method, we sorted the items in all datasets by popularity, determined by rating frequency, in descending order. We marked items as short-head from top to bottom until the sum of their frequencies equaled or exceeded 33% of the total ratings, and marked the remaining as long-tail items. We kept long-tail items in the probe set and removed the others. We then created a test set from the probe set, by randomly selecting 250 users who had rated at least one long-tail item. This process allowed us to randomly choose users with less-common tastes for each repeated holdout evaluation.
Experimental results for both scenarios based on the Recall, NDCG, and Coverage metrics are summarized in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. The best results for each column are highlighted in bold, while the second-best results are underlined. In the Coverage columns, since the random model performed well, as expected, we highlighted the second-best result in bold and the third-best result with an underline. We used three item-based models for AcoRec as input: Co-occurrence (AcoRecGram), Cosine-Similarity (AcoRecCosine), and Jaccard-Similarity (AcoRecJaccard).

5.1. Cold-Start User Scenario

Table 2 presents the results of cold-start experiments conducted on the ML-1M dataset, which is notably less sparse than the other datasets analyzed in our study. All three of our models outperformed their respective base models, with the AcoRecGram model consistently delivering superior results across most metrics. This indicates that the AcoRec algorithm, when integrated with Gram similarity, provides highly effective recommendations. Notably, after AcoRecGram, the AcoRecCosine model emerged as the second-best performer, significantly enhancing the results of the BaseCosine model while also outperforming other models across all metrics. The AcoRecJaccard model, meanwhile, surpassed its base model by generating more diverse recommendation lists than all other models. Jaccard similarity is particularly effective at identifying less-obvious connections between items, making it a powerful tool for enhancing diversity in recommendation systems. However, it is important to note that while Jaccard’s ability to find diverse connections can improve coverage, it may negatively impact relevance compared to other models. The specific improvements in our models based on their input models are discussed in detail in the Discussion section.
When comparing AcoRecGram to the closest competing models, including TARS, it demonstrates significant performance differences in several metrics. Specifically, AcoRecGram outperforms RecWalkK by 8.7%, 5.2%, 13.0%, and 5.6% in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, respectively. Compared to RecWalkPR, AcoRecGram shows improvements of 9.8%, 5.2%, 16.0%, and 6.2% across these same metrics. When compared to RP3β, AcoRecGram offers performance gains of 11.2%, 6.3%, 9.6%, and 3.2%. Against the TARS model, AcoRecGram exhibits a significant performance improvement of 18.6%, 12.5%, 23.0%, and 12.1%. In terms of coverage, AcoRecGram provides 20.8% higher coverage than RecWalkK, 9.4% higher than RecWalkPR, 14.7% higher than RP3β, and 175.6% higher than TARS. These results indicate that AcoRecGram offers distinct advantages over these models in both relevance and diversity.
In this experiment, RecWalkPR and RecWalkK demonstrate better performance than the other models, aside from our AcoRecGram and AcoRecCosine models. However, RecWalkPR achieves better coverage than RecWalkK, indicating that while RecWalkK excels in list quality, RecWalkPR is more effective at covering a broader range of items. As sparsity decreases, the performance of state-of-the-art models designed for sparse datasets, such as EASER, declines. Consequently, while both EASER and RP3β perform less effectively than the RecWalk models in NDCG, they achieve better results in Recall@10 and Recall@20. The UserKNN and TARS base models, while previously effective, lag behind in performance. Despite both models utilizing Pearson correlation for user similarity, TARS, which is based on ACO techniques, did not achieve better results than UserKNN. Among the evaluated models, RecWalkPR demonstrated the highest coverage outside of our models, which may be attributed to the influence of the PageRank algorithm it utilizes.
For AcoRecGram in this experiment, tanh was used for the likelihood in Equation (8). Dropout was not applied, the ant size was 200, and the archive size was 50. These parameter settings were essential for achieving the model’s best performance, and the results of the experiment are shown in Table 2.
Table 2. Comparison of Cold-Start User Scenario on ML-1M.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0021 | 0.0028 | 0.0020 | 0.0036 | 53.01 |
| Popular | 0.0395 | 0.0570 | 0.0528 | 0.0951 | 0.97 |
| BaseGram | 0.0527 | 0.0714 | 0.0696 | 0.1130 | 2.54 |
| BaseCosine | 0.0653 | 0.0890 | 0.0835 | 0.1401 | 10.22 |
| BaseJaccard | 0.0583 | 0.0809 | 0.0752 | 0.1295 | 15.02 |
| UserKNN | 0.0682 | 0.0923 | 0.0858 | 0.1436 | 8.82 |
| TARS | 0.0662 | 0.0898 | 0.0820 | 0.1384 | 5.99 |
| RecWalkPR | 0.0715 | 0.0960 | 0.0870 | 0.1460 | 15.09 |
| RecWalkK | 0.0722 | 0.0960 | 0.0893 | 0.1469 | 13.67 |
| EASER | 0.0666 | 0.0924 | 0.0878 | 0.1509 | 13.38 |
| RP3β | 0.0706 | 0.0950 | 0.0921 | 0.1503 | 14.39 |
| AcoRecGram | 0.0785 | 0.1010 | 0.1009 | 0.1551 | 16.51 |
| AcoRecCosine | 0.0731 | 0.0990 | 0.0926 | 0.1548 | 18.81 |
| AcoRecJaccard | 0.0662 | 0.0901 | 0.0861 | 0.1423 | 23.38 |
Table 3 presents the results of cold-start experiments for the Netflix dataset. All three of our models outperformed their respective base models, although AcoRecCosine and AcoRecJaccard did not show significant improvement. The AcoRecGram model maintained its strong performance, surpassing all other methods and achieving superior results across all metrics. In this dataset, the runner-up models varied, depending on the metric: RecWalkPR for NDCG@10, RP3β for NDCG@20 and Recall@20, AcoRecJaccard for Recall@10, and AcoRecCosine for Coverage. This variation reflects the diverse strengths and design focus of each model, as well as their interaction with the specific characteristics of the dataset and metrics. This observation highlights the AcoRecGram model’s ability to adapt effectively to different data and evaluation metrics.
Table 3. Comparison of Cold-Start User Scenario on Netflix.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0013 | 0.0015 | 0.0016 | 0.0020 | 33.60 |
| Popular | 0.0010 | 0.0027 | 0.0017 | 0.0060 | 0.46 |
| BaseGram | 0.0631 | 0.0824 | 0.0802 | 0.1262 | 20.17 |
| BaseCosine | 0.0722 | 0.0925 | 0.0923 | 0.1424 | 27.13 |
| BaseJaccard | 0.0720 | 0.0938 | 0.0915 | 0.1443 | 27.15 |
| UserKNN | 0.0705 | 0.0901 | 0.0870 | 0.1340 | 24.77 |
| TARS | 0.0665 | 0.0849 | 0.0788 | 0.1238 | 25.39 |
| RecWalkPR | 0.0760 | 0.0960 | 0.0934 | 0.1429 | 28.12 |
| RecWalkK | 0.0756 | 0.0960 | 0.0925 | 0.1426 | 27.30 |
| EASER | 0.0755 | 0.0954 | 0.0941 | 0.1418 | 27.64 |
| RP3β | 0.0754 | 0.1001 | 0.0920 | 0.1528 | 25.30 |
| AcoRecGram | 0.0784 | 0.1007 | 0.0981 | 0.1538 | 29.40 |
| AcoRecCosine | 0.0742 | 0.0953 | 0.0926 | 0.1430 | 29.32 |
| AcoRecJaccard | 0.0724 | 0.0950 | 0.0946 | 0.1443 | 28.66 |
When comparing AcoRecGram to other models, including TARS, significant performance differences are observed across several metrics. Specifically, AcoRecGram outperforms RecWalkK by 3.7%, 4.9%, 6.1%, and 4.9% in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, respectively. Compared to RecWalkPR, AcoRecGram shows improvements of 3.2%, 4.9%, 5.0%, and 7.6% across the same metrics. The AcoRecGram model also outperforms RP3β by 4.0%, 0.6%, 6.6%, and 0.7%, and EASER by 3.8%, 5.6%, 4.3%, and 8.5%. When compared to the TARS model, AcoRecGram exhibits a significant performance gain of 17.9%, 18.6%, 24.5%, and 24.2% across these metrics. In terms of Coverage, AcoRecGram provides 7.7% higher coverage than RecWalkK, 4.6% higher than RecWalkPR, 16.2% higher than RP3β, 6.4% higher than EASER, and 15.8% higher than TARS. An important observation is that while RP3β produces results comparable to AcoRecGram on longer lists (NDCG@20, Recall@20), it does so at the expense of lower coverage.
Unlike on the previous dataset, the results of the different models are generally closer to one another on the Netflix dataset, with no single model clearly outperforming the others. It is also notable that while the RecWalk and EASER models perform well with shorter lists, their effectiveness diminishes with longer lists.
For AcoRecGram in this experiment, tanh was used for the likelihood conversion in Equation (8). Dropout was set to 0.2, the ant size was 200, and the archive size was 20.
Table 4 presents the results of cold-start experiments conducted on the Pinterest dataset. All three of our models outperformed their respective base models; however, the AcoRecGram model did not show improvement in the NDCG@20 and Recall@20 metrics. On this dataset, AcoRecGram and AcoRecCosine demonstrated superiority over the other models across all metrics except Coverage. In terms of Coverage, the RecWalkPR model offered more diverse recommendations, though it did not achieve the same level of success in NDCG and Recall. While EASER and RP3β trailed our models, they performed better than the remaining models, albeit with lower Coverage.
Table 4. Comparison of Cold-Start User Scenario on Pinterest.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0024 | 0.0031 | 0.0033 | 0.0050 | 42.03 |
| Popular | 0.0062 | 0.0098 | 0.0088 | 0.0180 | 0.71 |
| BaseGram | 0.0679 | 0.1000 | 0.0940 | 0.1767 | 25.64 |
| BaseCosine | 0.0685 | 0.0991 | 0.0928 | 0.1720 | 33.57 |
| BaseJaccard | 0.0665 | 0.0947 | 0.0937 | 0.1652 | 33.99 |
| UserKNN | 0.0678 | 0.0982 | 0.0964 | 0.1746 | 26.20 |
| TARS | 0.0675 | 0.0936 | 0.0963 | 0.1624 | 30.97 |
| RecWalkPR | 0.0677 | 0.0935 | 0.0932 | 0.1608 | 36.96 |
| RecWalkK | 0.0688 | 0.0936 | 0.0964 | 0.1608 | 36.28 |
| EASER | 0.0701 | 0.0995 | 0.0990 | 0.1748 | 27.55 |
| RP3β | 0.0705 | 0.0998 | 0.0986 | 0.1734 | 28.37 |
| AcoRecGram | 0.0711 | 0.1007 | 0.0996 | 0.1767 | 36.66 |
| AcoRecCosine | 0.0711 | 0.0998 | 0.1009 | 0.1760 | 35.01 |
| AcoRecJaccard | 0.0700 | 0.1000 | 0.0944 | 0.1718 | 35.06 |
When comparing our models with the others, AcoRecGram showed only slight differences relative to RP3β and EASER but outperformed TARS across all metrics. Specifically, in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, AcoRecGram outperformed RP3β by 0.9%, 0.9%, 1.0%, and 1.9%, respectively. Compared to EASER, AcoRecGram showed improvements of 1.4%, 1.2%, 0.6%, and 1.1%, respectively. Against the TARS model, AcoRecGram exhibited performance gains of 5.3%, 7.6%, 3.4%, and 8.8%, respectively. In terms of Coverage, AcoRecGram provided 29.2% higher coverage than RP3β and 33.1% higher coverage than EASER.
In this experiment, RecWalkPR and RecWalkK exhibited poorer performance than on the other datasets, except in Coverage. The RecWalk models' reliance on the SLIM model as input likely influenced their overall success. The UserKNN and TARS models, once again, did not demonstrate notable success.
For AcoRecGram in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.5, the ant size was 50, and the archive size was 50. For AcoRecCosine, the sigmoid function was used for the likelihood conversion; dropout was not applied, the ant size was 250, and the archive size was 50.

5.2. Long-Tail Item Scenario

Table 5 presents the results of long-tail item experiments conducted on the ML-1M dataset. Our models outperformed all others, demonstrating their effectiveness even in scenarios where input models typically favor popular items. Remarkably, all three of our models ranked in the top three across all metric measurements. The balanced relationship between high Coverage and Recall highlights the superiority of our models. Notably, while the base models struggled in the long-tail item scenario, our models that utilized the base models as inputs achieved significant success.
When comparing our models with the closest competitors, including TARS, the AcoRecJaccard model displayed significant performance advantages over RecWalkK, RecWalkPR, RP3β, EASER, and TARS across several metrics. Specifically, in terms of NDCG@10, NDCG@20, Recall@10, and Recall@20, AcoRecJaccard outperformed RecWalkK by 71.6%, 60.9%, 59.4%, and 48.9%, respectively. Compared to RecWalkPR, AcoRecJaccard showed performance improvements of 79.9%, 66.4%, 66.0%, and 52.5%, respectively. Against the RP3β model, AcoRecJaccard demonstrated improvements of 45.8%, 38.7%, 39.7%, and 30.8%, respectively. Compared to the EASER model, AcoRecJaccard achieved performance gains of 98.6%, 79.2%, 77.8%, and 57.5%, respectively. Against the TARS model, AcoRecJaccard exhibited a remarkable performance increase of 636.5%, 418.7%, 472.7%, and 275.3%, respectively. In terms of Coverage, AcoRecJaccard provided 55.5% higher coverage than RecWalkK, 57.6% higher than RecWalkPR, 22.1% higher than RP3β, 64.2% higher than EASER, and 142.9% higher than TARS.
For AcoRecJaccard in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 50, and the archive size was 50.
Table 5. Comparison of long-tail item scenario on ML-1M.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0019 | 0.0028 | 0.0035 | 0.0058 | 80.62 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 2.94 |
| BaseGram | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 3.20 |
| BaseCosine | 0.0071 | 0.0099 | 0.0106 | 0.0180 | 10.38 |
| BaseJaccard | 0.0125 | 0.0165 | 0.0177 | 0.0278 | 14.10 |
| UserKNN | 0.0370 | 0.0543 | 0.0566 | 0.1026 | 27.64 |
| TARS | 0.0137 | 0.0246 | 0.0231 | 0.0535 | 17.11 |
| RecWalkPR | 0.0561 | 0.0767 | 0.0797 | 0.1317 | 26.37 |
| RecWalkK | 0.0588 | 0.0793 | 0.0830 | 0.1349 | 26.73 |
| EASER | 0.0508 | 0.0712 | 0.0744 | 0.1275 | 25.31 |
| RP3β | 0.0692 | 0.0920 | 0.0947 | 0.1535 | 34.05 |
| AcoRecGram | 0.0991 | 0.1262 | 0.1307 | 0.2003 | 49.35 |
| AcoRecCosine | 0.0946 | 0.1215 | 0.1272 | 0.1960 | 42.04 |
| AcoRecJaccard | 0.1009 | 0.1276 | 0.1323 | 0.2008 | 41.56 |
Table 6 presents the results of long-tail item experiments conducted on the Netflix dataset. In this experiment, our models demonstrated clear superiority over all others except the RecWalk models, which exhibited slightly better performance on shorter lists. Specifically, in terms of NDCG@10, NDCG@20, and Recall@10, our AcoRecGram model trailed RecWalkK by 3.2%, 2.3%, and 1.2%, respectively, and RecWalkPR by 2.2%, 1.4%, and 0.8%, respectively. However, for Recall@20, AcoRecGram outperformed RecWalkK by 0.2% and RecWalkPR by 0.6%.
In the Coverage metric, all three of our models outperformed every other model. It is important to note that the Coverage value for each model is based on its best NDCG@10 result. A key strength of our models is their ability to simultaneously enhance both recommendation accuracy and diversity; achieving a Coverage result close to that of the Random model on the Netflix dataset underscores a highly successful outcome in terms of list diversity.
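Coverage here can be read as catalog coverage: the share of all items that appear in at least one user's top-N list. The snippet below is our own minimal illustration of that standard definition (the helper name is hypothetical, and the paper's exact implementation may differ).

```python
def catalog_coverage(top_n_lists, num_items):
    """Percentage of catalog items recommended to at least one user.

    top_n_lists: iterable of per-user top-N item-id lists.
    num_items:   total number of items in the catalog.
    """
    recommended = set()
    for items in top_n_lists:
        recommended.update(items)
    return 100.0 * len(recommended) / num_items

# e.g., catalog_coverage([[1, 2], [2, 3]], num_items=10) -> 30.0
```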
When comparing our models with the others, the AcoRecGram model demonstrated significant performance improvements over the RP3β, EASER, and TARS models across all metrics, except when compared to RecWalk. Specifically, AcoRecGram outperformed RP3β in NDCG@10, NDCG@20, Recall@10, and Recall@20 by 12.7%, 10.9%, 16.0%, and 11.0%, respectively. Compared to EASER, AcoRecGram showed improvements of 24.2%, 21.5%, 21.0%, and 16.9%, respectively. Against the TARS model, AcoRecGram achieved a remarkable performance improvement of 165.5%, 141.6%, 143.5%, and 111.7%, respectively. In terms of Coverage, AcoRecGram provided 15.6% higher coverage than RP3β, 23.2% higher than EASER, and 63.9% higher than TARS.
For AcoRecGram in this experiment, we used the tanh function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 100, and the archive size was 20.
Table 6. Comparison of long-tail item scenario on Netflix.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0005 | 0.0012 | 0.0010 | 0.0028 | 61.22 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.60 |
| BaseGram | 0.0311 | 0.0441 | 0.0422 | 0.0766 | 26.28 |
| BaseCosine | 0.0854 | 0.1040 | 0.1054 | 0.1537 | 40.03 |
| BaseJaccard | 0.0913 | 0.1093 | 0.1100 | 0.1563 | 40.59 |
| UserKNN | 0.0730 | 0.0890 | 0.0937 | 0.1358 | 42.83 |
| TARS | 0.0530 | 0.0678 | 0.0688 | 0.1076 | 36.23 |
| RecWalkPR | 0.1439 | 0.1661 | 0.1688 | 0.2267 | 52.73 |
| RecWalkK | 0.1454 | 0.1676 | 0.1696 | 0.2274 | 52.14 |
| EASER | 0.1133 | 0.1348 | 0.1384 | 0.1949 | 48.18 |
| RP3β | 0.1249 | 0.1477 | 0.1444 | 0.2053 | 51.36 |
| AcoRecGram | 0.1407 | 0.1638 | 0.1675 | 0.2278 | 59.37 |
| AcoRecCosine | 0.1371 | 0.1605 | 0.1620 | 0.2259 | 58.36 |
| AcoRecJaccard | 0.1315 | 0.1557 | 0.1523 | 0.2147 | 53.39 |
Table 7 presents the results of long-tail item experiments conducted on the Pinterest dataset. Consistent with the ML-1M experiment, all three of our models ranked in the top three across all metrics. Among our models, AcoRecCosine was the most successful, outperforming all other models on the accuracy metrics, though not on Coverage. Similar to the results from the Netflix dataset, AcoRecGram achieved a Coverage score close to that of the Random model, indicating a successful outcome in terms of list diversity.
When comparing AcoRecCosine to the closest competing models, including RecWalkK, RecWalkPR, RP3β, EASER, and TARS, it demonstrated substantial performance improvements across several metrics. Specifically, AcoRecCosine outperformed RecWalkK in NDCG@10, NDCG@20, Recall@10, and Recall@20 by 33.2%, 26.7%, 34.8%, and 26.0%, respectively. Compared to RecWalkPR, AcoRecCosine showed improvements of 26.6%, 25.5%, 27.7%, and 27.3%, respectively. Against RP3β, AcoRecCosine improved by 22.9%, 18.5%, 21.7%, and 16.9%, respectively. In comparison with the EASER model, it demonstrated enhancements of 58.6%, 46.8%, 54.6%, and 40.8%, respectively. Compared to TARS, AcoRecCosine exhibited the most significant gains, with improvements of 141.3%, 107.6%, 136.0%, and 94.2%, respectively. In terms of Coverage, AcoRecCosine achieved 20.2% higher coverage than RecWalkK, 20.9% higher than RecWalkPR, 9.1% higher than RP3β, 20.1% higher than EASER, and 46.9% higher than TARS.
Setting our models aside, when RecWalkK, RecWalkPR, RP3β, and EASER are evaluated across the entire dataset, the EASER model lags behind the others. This may be attributed to its parametric nature, which may not effectively highlight niche items. Graph-based models such as RecWalkK, RecWalkPR, and RP3β appear more successful at capturing new relationships. The base models generally assess items by co-occurrence frequency, which limits their effectiveness in long-tail item scenarios. Similarly, the UserKNN and TARS models underperformed the base models across all three datasets.
For AcoRecCosine in this experiment, we used the sigmoid function for the likelihood conversion. The dropout rate was set to 0.2, the ant size was 200, and the archive size was 50.
Table 7. Comparison of long-tail item scenario on Pinterest.

| Model | NDCG@10 | NDCG@20 | Recall@10 | Recall@20 | Coverage |
|---|---|---|---|---|---|
| Random | 0.0015 | 0.0020 | 0.0026 | 0.0041 | 70.87 |
| Popular | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.91 |
| BaseGram | 0.0206 | 0.0352 | 0.0299 | 0.0711 | 33.94 |
| BaseCosine | 0.0345 | 0.0529 | 0.0493 | 0.1006 | 48.68 |
| BaseJaccard | 0.0371 | 0.0536 | 0.0541 | 0.0995 | 48.73 |
| UserKNN | 0.0218 | 0.0362 | 0.0319 | 0.0723 | 35.31 |
| TARS | 0.0276 | 0.0432 | 0.0414 | 0.0855 | 47.07 |
| RecWalkPR | 0.0526 | 0.0715 | 0.0765 | 0.1304 | 57.18 |
| RecWalkK | 0.0500 | 0.0708 | 0.0725 | 0.1317 | 57.51 |
| EASER | 0.0420 | 0.0611 | 0.0632 | 0.1179 | 57.59 |
| RP3β | 0.0542 | 0.0757 | 0.0803 | 0.1420 | 63.38 |
| AcoRecGram | 0.0666 | 0.0897 | 0.0977 | 0.1660 | 69.14 |
| AcoRecCosine | 0.0685 | 0.0948 | 0.1033 | 0.1799 | 65.22 |
| AcoRecJaccard | 0.0648 | 0.0882 | 0.0986 | 0.1652 | 59.16 |

5.3. Effect of Parameters

Figure 2 and Figure 3 illustrate the relationship between NDCG@10 and the number of iterations across different input models and scenarios. The x-axis represents the number of iterations, and the y-axis represents NDCG@10. Our models are depicted with continuous lines. Dashed vertical lines mark the iteration at which each of our models reaches its best NDCG@10, while dashed horizontal lines show the best NDCG@10 value of each base model (BaseGram, BaseCosine, BaseJaccard), drawn in the same color as the corresponding model. We tested our models on each dataset with their best parameter combinations, evaluating ten points between 1 and 300 iterations. In both the cold-start and long-tail item scenarios, our models began producing consistent results after a certain number of iterations. Figure 2 shows the training process in the cold-start scenario: across all datasets, the results stabilized after a specific number of iterations. For example, on the ML-1M dataset, the AcoRecGram model began producing similar results around the 80th iteration and reached its peak performance at the 120th iteration. The peak levels of the models are indicated by the dashed vertical lines in the graphs.
A notable feature of our study is the model's rapid convergence and minimal stagnation, which we attribute to its structure. Regarding the effect of the Gaussian distribution during training, we observed that the sampled solutions remained evenly distributed across all localities.
As the iterations progressed and the focus space tightened, the ants converged toward the same position of the distribution. At that point, once the variance fell below a specified threshold, the model completed its training. During our experiments, we observed that the models quickly achieved high success and that further iterations beyond this point did not affect their performance.
Figure 4 and Figure 5 examine the relationship between our models' 'Ant Size' and 'Archive Size' parameters; the sketch after this paragraph illustrates how the two interact. 'Ant Size' determines how many points are sampled from the Gaussian distribution, while 'Archive Size' is the number of points used as input for the Gaussian negative log-likelihood loss. In the figures, we set the Ant Size values to {200, 100, 50} and the Archive Size values to {50, 20, 10}. For each dataset, we selected the most successful <ant size, archive size> pair. To better understand the differences between the results, we normalized the values using min–max normalization and displayed them in Figure 4 and Figure 5. We found that the best parameter values vary depending on the dataset. For example, the AcoRecCosine model only produced meaningful results with the <50, 50> pair (i.e., ant size = 50 and archive size = 50) in the cold-start scenario on the Netflix dataset, the AcoRecGram model performed better with the <200, 20> pair, and the AcoRecJaccard model required a low 'Ant Size' value.
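The interplay of these two parameters follows the continuous ACO (ACO_R) scheme of Socha and Dorigo [27]: each ant samples a candidate from a Gaussian built around the archive, the best candidates refill the archive, and training stops once the archive variance collapses, as described above. The sketch below is a simplified, generic illustration of that loop over a single hyper-parameter, not AcoRec's full update rule; the function and threshold are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def aco_r_step(archive, scores, ant_size, evaluate, var_threshold=1e-4):
    """One simplified ACO_R iteration over a 1-D hyper-parameter.

    archive:  current solution archive ('Archive Size' entries)
    scores:   fitness of each archive entry (higher is better)
    evaluate: callback scoring a candidate value
    Returns the updated (archive, scores) and a convergence flag.
    """
    mu, sigma = archive.mean(), archive.std() + 1e-12
    ants = rng.normal(mu, sigma, size=ant_size)      # 'Ant Size' samples
    ant_scores = np.array([evaluate(a) for a in ants])

    # keep the best |archive| candidates from old archive + new ants
    pool = np.concatenate([archive, ants])
    pool_scores = np.concatenate([scores, ant_scores])
    best = np.argsort(pool_scores)[-len(archive):]
    converged = pool[best].var() < var_threshold     # variance-based stop
    return pool[best], pool_scores[best], converged

# Usage sketch: maximize a toy objective over one hyper-parameter.
evaluate = lambda b: -(b - 0.3) ** 2
archive = rng.uniform(0.0, 1.0, size=50)             # 'Archive Size' = 50
scores = np.array([evaluate(a) for a in archive])
for _ in range(300):
    archive, scores, done = aco_r_step(archive, scores, 200, evaluate)
    if done:
        break
```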

5.4. Experimental Environment and Tools

The experiments presented in this paper were conducted in a hyper-threading test environment provided by the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources) [47]. We evaluated all benchmarks with the Python-based open-source recommendation toolkit Cornac (https://cornac.preferred.ai, accessed on 5 July 2024) [48,49], and we published our code as a Code Ocean capsule (https://codeocean.com/capsule/4724589/tree/v2, accessed on 5 July 2024) and a GitHub repository (https://github.com/yilmazerhakan/acorec, accessed on 5 July 2024).
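As an illustration of how such a benchmark can be assembled in Cornac, the sketch below wires two of the baselines into an experiment. The dataset variant, split ratio, and model settings shown here are illustrative assumptions, not our exact experimental protocol.

```python
import cornac
from cornac.eval_methods import RatioSplit
from cornac.metrics import NDCG, Recall
from cornac.datasets import movielens

# Load implicit feedback and build a train/test split.
feedback = movielens.load_feedback(variant="1M")
split = RatioSplit(data=feedback, test_size=0.2, seed=42, verbose=False)

models = [
    cornac.models.MostPop(),                                  # 'Popular' baseline
    cornac.models.UserKNN(k=50, similarity="pearson"),        # UserKNN baseline
]
metrics = [NDCG(k=10), NDCG(k=20), Recall(k=10), Recall(k=20)]

cornac.Experiment(eval_method=split, models=models, metrics=metrics).run()
```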

6. Discussion

While the similarity matrices (BaseGram, BaseCosine, and BaseJaccard) are not particularly effective when employed alone as recommendation models in both scenarios (recommendations for cold-start users and recommendations of long-tail items), they exhibit strong performance when combined with AcoRec (i.e., AcoRecGram, AcoRecCosine, and AcoRecJaccard). We estimated the percentage improvements for each metric by comparing our three AcoRec models to their corresponding item-based similarity models in Table 2, Table 3 and Table 4. Table 8 illustrates the percentage enhancement of each AcoRec model over its base item-similarity model in a cold-start scenario. The results demonstrate that our AcoRec models significantly enhance the performance of their base models. Notably, the improvements in the Gram model surpass those in the Jaccard and Cosine similarity models.
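As a worked example of how these percentages are obtained (the helper name is ours; because the snippet uses the rounded values reported in Table 2, its output matches Table 8 only up to rounding):

```python
def improvement_pct(acorec: float, base: float) -> float:
    """Relative improvement of an AcoRec variant over its base model."""
    return 100.0 * (acorec - base) / base

# ML-1M, NDCG@10: AcoRecGram = 0.0785 vs. BaseGram = 0.0527 (Table 2)
print(f"{improvement_pct(0.0785, 0.0527):.2f}%")  # ~48.96%, vs. 48.77% in Table 8
```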
On the other hand, the baseline models perform poorly in the long-tail item scenario. This scenario requires highlighting less-prominent items, which the baseline models struggle to do because of their inherent focus on popular items.
Table 9 shows each AcoRec model’s improvement percentage on its base item-similarity models (i.e., BaseGram, BaseCosine, BaseJaccard) in the long-tail item scenario. The results indicate that our AcoRec models significantly enhance the performance of their base models, especially in the long-tail scenario. These improvements surpass those observed for cold-start users, demonstrating the model’s efficacy in highlighting diverse items.
The comparisons have shown that AcoRec models also provide further improvements on the Gram matrix. The Gram matrix, used as input without normalization, retains more inherent information about data relationships, proving beneficial during iterations. Notably, our models exclusively utilized implicit data, avoiding ethical concerns related to demographic, personal, or tracking data.
One of our observations from all the experiments is that, while a model may excel in one dataset, it can fail in another. However, the results of the experiments conducted in this study demonstrated that our models consistently delivered successful and stable results across all datasets. The fundamental reason for our study’s success across different scenarios is its parametric structure, which allows for flexibility in addressing diverse contexts. The cold-start and long-tail item scenarios require evaluating items under completely different conditions. In the cold-start scenario, models generally achieve success by highlighting popular items. This is evident from the success of the baseline models (Popular, Gram, Cosine, and Jaccard), which emphasize high-frequency relationships among items, predominantly found among popular items. In contrast, the long-tail item scenario focuses on the ability to highlight less-popular items. The β parameter in our algorithm is automatically tuned, allowing it to adapt to the specific requirements of each scenario and exhibit the desired behavior. Despite the failure of base algorithms in this scenario, our models have shown quite successful results using these inputs.
Another observation from the experiments is an inverse correlation between the NDCG and Recall metrics and Coverage: increases in NDCG and Recall values typically come at the cost of list diversity. One of the most crucial strengths of our models is their ability to improve both accuracy and diversity simultaneously.
We established that data sparsity contributes to the cold-start issue and that addressing the cold-start problem effectively requires incorporating a popularity bias; our heuristic AcoRec model showed promising results in mitigating data unavailability in the cold-start and long-tail item scenarios.
A drawback of our model is that computing the input model (i.e., the similarity matrix used in our model) can incur substantial computational costs on high-dimensional datasets; for example, computing Gram, Cosine, or Jaccard similarities is challenging in high-dimensional spaces. However, such computations can be carried out as a pre-processing step, and no operations are performed on these inputs during our model's training. Additionally, as mentioned in [40], the Gram matrix can be computed more efficiently using the Coppersmith–Winograd algorithm.
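For reference, the three input similarity matrices can be pre-computed from a sparse binary interaction matrix as sketched below. This is a generic illustration of the standard definitions (function name and epsilon guards are ours), not our optimized implementation.

```python
import numpy as np
import scipy.sparse as sp

def item_similarities(X: sp.csr_matrix):
    """X: binary user-item matrix (users x items). Returns the Gram,
    cosine, and Jaccard item-item similarity matrices as dense arrays."""
    gram = np.asarray((X.T @ X).todense(), dtype=float)  # co-occurrence counts
    counts = np.diag(gram)                               # per-item frequencies

    # cosine(i, j) = |i ∩ j| / sqrt(|i| * |j|)
    norms = np.sqrt(np.maximum(np.outer(counts, counts), 1e-12))
    cosine = gram / norms

    # jaccard(i, j) = |i ∩ j| / |i ∪ j|
    union = np.maximum(counts[:, None] + counts[None, :] - gram, 1e-12)
    jaccard = gram / union
    return gram, cosine, jaccard
```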

7. Conclusions

Our paper introduced AcoRec, a novel heuristic-based model that enhances item-based models by incorporating continuous Ant Colony Optimization for hyperparameter tuning. With this model, we aimed to generate diverse recommendations, addressing challenges related to cold-start users and long-tail items. Unlike traditional ACO models, AcoRec can be customized for different similarity models and domains. AcoRec, through ACO, performs personalized hyperparameter searches to enhance recommendation quality and diversity.
We compared our three models (AcoRecGram, AcoRecCosine, and AcoRecJaccard) against state-of-the-art models on three datasets from different domains, using five metrics. AcoRecGram ranked first in sixteen out of thirty experiments and second in nine, while AcoRecCosine ranked first in six and second in ten. AcoRecJaccard secured first place in four experiments and second in four. The results indicated that our three AcoRec-based models successfully maintained recommendation quality while offering diverse recommendation lists.
Future research could contribute by developing a metric that balances relevance and diversity, facilitating the generation of recommendations that excel in both aspects. In addition, AcoRec’s continuous-domain parameter search is versatile; thus, future research might consider adapting it to other similarity or proximity methods to allow users to fine-tune hyperparameters.

Author Contributions

Conceptualization, H.Y. and S.A.Ö.; Methodology, H.Y. and S.A.Ö.; Software, H.Y.; Validation, H.Y.; Formal analysis, H.Y.; Investigation, H.Y. and S.A.Ö.; Resources, H.Y.; Data curation, H.Y.; Writing—original draft, H.Y.; Writing—review & editing, H.Y. and S.A.Ö.; Visualization, H.Y.; Supervision, H.Y. and S.A.Ö. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

This research employed publicly available datasets for its experimental studies. The original data presented in the study are openly available at https://codeocean.com/capsule/4724589/tree/v2 (accessed on 5 July 2024), https://doi.org/10.24433/CO.7483457.v2 (accessed on 5 July 2024).

Acknowledgments

The numerical calculations reported in this paper were performed entirely in the TUBITAK ULAKBIM High Performance and Grid Computing Center.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-Based Recommendations with Recurrent Neural Networks. arXiv 2016, arXiv:1511.06939. [Google Scholar]
  2. Olaleke, O.; Oseledets, I.; Frolov, E. Dynamic Modeling of User Preferences for Stable Recommendations. In Proceedings of the 29th ACM Conference on User Modeling, Adaptation, and Personalization, Utrecht, The Netherlands, 21–25 June 2021; ACM: Utrecht, The Netherlands, 2021; pp. 262–266. [Google Scholar]
  3. Vargas, S. Novelty and Diversity Enhancement and Evaluation in Recommender Systems and Information Retrieval. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, Gold Coast, QLD, Australia, 6–11 July 2014; ACM: Gold Coast, QLD, Australia, 2014; p. 1281. [Google Scholar]
  4. Balabanović, M.; Shoham, Y. Fab: Content-Based, Collaborative Recommendation. Commun. ACM 1997, 40, 66–72. [Google Scholar] [CrossRef]
  5. Ar, Y.; Bostanci, E. A Genetic Algorithm Solution to the Collaborative Filtering Problem. Expert Syst. Appl. 2016, 61, 122–128. [Google Scholar] [CrossRef]
  6. Dorigo, M.; Gambardella, L.M. Ant Colonies for the Travelling Salesman Problem. Biosystems 1997, 43, 73–81. [Google Scholar] [CrossRef] [PubMed]
  7. Sobecki, J.; Tomczak, J.M. Student Courses Recommendation Using Ant Colony Optimization. In Intelligent Information and Database Systems; Nguyen, N.T., Le, M.T., Świątek, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; Volume 5991, pp. 124–133. ISBN 9783642121005/9783642121012. [Google Scholar]
  8. Bellaachia, A.; Alathel, D. Trust-Based Ant Recommender (T-BAR). In Proceedings of the 2012 6th IEEE International Conference Intelligent Systems, Sofia, Bulgaria, 6–8 September 2012; pp. 130–135. [Google Scholar]
  9. Bellaachia, A.; Alathel, D. DT-BAR: A Dynamic ANT Recommender to Balance the Overall Prediction Accuracy for All Users. In Computer Science & Information Technology (CS & IT), Proceedings of the Second International Conference on Computational Science and Engineering (CSE-2014), Dubai, United Arab Emirates, 4–5 April 2014; Academy & Industry Research Collaboration Center (AIRCC): Dubai, United Arab Emirates, 2014; pp. 141–151. [Google Scholar]
  10. Massa, P.; Avesani, P. Trust Metrics in Recommender Systems. In Computing with Social Trust; Golbeck, J., Ed.; Springer: London, UK, 2009; pp. 259–285. ISBN 9781848003552/9781848003569. [Google Scholar]
  11. Bedi, P.; Sharma, R. Trust Based Recommender System Using Ant Colony for Trust Computation. Expert Syst. Appl. 2012, 39, 1183–1190. [Google Scholar] [CrossRef]
  12. Gohari, F.S.; Haghighi, H.; Aliee, F.S. A Semantic-Enhanced Trust Based Recommender System Using Ant Colony Optimization. Appl. Intell. 2017, 46, 328–364. [Google Scholar] [CrossRef]
  13. Parvin, H.; Moradi, P.; Esmaeili, S. TCFACO: Trust-Aware Collaborative Filtering Method Based on Ant Colony Optimization. Expert Syst. Appl. 2019, 118, 152–168. [Google Scholar] [CrossRef]
  14. Tengkiattrakul, P.; Maneeroj, S.; Takasu, A. Applying Ant-Colony Concepts to Trust-Based Recommender Systems. In Proceedings of the 18th International Conference on Information Integration and Web-Based Applications and Services, Singapore, 28–30 November 2016; ACM: Singapore, 2016; pp. 34–41. [Google Scholar]
  15. Tengkiattrakul, P.; Maneeroj, S.; Takasu, A. Integrating the Importance Levels of Friends into Trust-Based Ant-Colony Recommender Systems. Int. J. Web Inf. Syst. 2019, 15, 28–46. [Google Scholar] [CrossRef]
  16. Bellaachia, A.; Alathel, D. Improving the Recommendation Accuracy for Cold Start Users in Trust-Based Recommender Systems. Int. J. Comput. Commun. Eng. 2016, 5, 206–214. [Google Scholar] [CrossRef]
  17. Kaleroun, A.; Batra, S. Collaborating Trust and Item-Prediction with Ant Colony for Recommendation. In Proceedings of the 2014 Seventh International Conference on Contemporary Computing (IC3), Noida, India, 7–9 August 2014; IEEE: Noida, India, 2014; pp. 334–339. [Google Scholar]
  18. Liao, X.; Wu, H.; Wang, Y. Ant Collaborative Filtering Addressing Sparsity and Temporal Effects. IEEE Access 2020, 8, 32783–32791. [Google Scholar] [CrossRef]
  19. Liao, X.; Li, X.; Xu, Q.; Wu, H.; Wang, Y. Improving Ant Collaborative Filtering on Sparsity via Dimension Reduction. Appl. Sci. 2020, 10, 7245. [Google Scholar] [CrossRef]
  20. Nadi, S.; Saraee, M.H.; Bagheri, A.; Davarpanh Jazi, M. FARS: Fuzzy Ant Based Recommender System for Web Users. Int. J. Comput. Sci. Issues 2011, 8, 203–209. [Google Scholar]
  21. Resnick, P.; Iacovou, N.; Suchak, M.; Bergstrom, P.; Riedl, J. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work—CSCW’94, Chapel Hill, NC, USA, 22–26 October 1994; ACM Press: Chapel Hill, NC, USA, 1994; pp. 175–186. [Google Scholar]
  22. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; ACM: Hong Kong, China, 2001; pp. 285–295. [Google Scholar]
  23. Ferrari Dacrema, M.; Cremonesi, P.; Jannach, D. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; ACM: Copenhagen, Denmark, 2019; pp. 101–109. [Google Scholar]
  24. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst. 2004, 22, 5–53. [Google Scholar] [CrossRef]
  25. Blum, C. Ant Colony Optimization: Introduction and Recent Trends. Phys. Life Rev. 2005, 2, 353–373. [Google Scholar] [CrossRef]
  26. Riadi, I.C.J. Cognitive Ant Colony Optimization: A New Framework in Swarm Intelligence. Ph.D. Thesis, University of Salford, Salford, UK, 2014. [Google Scholar]
  27. Socha, K.; Dorigo, M. Ant Colony Optimization for Continuous Domains. Eur. J. Oper. Res. 2008, 185, 1155–1173. [Google Scholar] [CrossRef]
  28. Stützle, T.; López-Ibáñez, M.; Pellegrini, P.; Maur, M.; Montes De Oca, M.; Birattari, M.; Dorigo, M. Parameter Adaptation in Ant Colony Optimization. In Autonomous Search; Hamadi, Y., Monfroy, E., Saubion, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 191–215. ISBN 9783642214332/9783642214349. [Google Scholar]
  29. Nikolakopoulos, A.N.; Kalantzis, V.; Gallopoulos, E.; Garofalakis, J.D. EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations. Knowl. Inf. Syst. 2019, 58, 59–81. [Google Scholar] [CrossRef]
  30. Frolov, E.; Oseledets, I. HybridSVD: When Collaborative Information Is Not Enough. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; ACM: Copenhagen, Denmark, 2019; pp. 331–339. [Google Scholar]
  31. Paudel, B.; Christoffel, F.; Newell, C.; Bernstein, A. Updatable, Accurate, Diverse, and Scalable Recommendations for Interactive Applications. ACM Trans. Interact. Intell. Syst. 2017, 7, 1–34. [Google Scholar] [CrossRef]
  32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  33. Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2016, 5, 1–19. [Google Scholar] [CrossRef]
  34. Netflix Prize Data. Available online: https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data (accessed on 21 May 2024).
  35. He, X.; Liao, L.; Zhang, H. Neural Collaborative Filtering. In Proceedings of the International World Wide Web Conference, Perth, Australia, 3–7 April 2017; ACM: New York, NY, USA, 2017. [Google Scholar] [CrossRef]
  36. Cremonesi, P.; Koren, Y.; Turrin, R. Performance of Recommender Algorithms on Top-n Recommendation Tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; ACM: Barcelona, Spain, 2010; pp. 39–46. [Google Scholar]
  37. Krichene, W.; Rendle, S. On sampled metrics for item recommendation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1748–1757. [Google Scholar]
  38. Basilico, J.; Hofmann, T. A Joint Framework for Collaborative and Content Filtering. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, 25–29 July 2004; ACM: Sheffield, UK, 2004; pp. 550–551. [Google Scholar]
  39. Deshpande, M.; Karypis, G. Item-Based Top- N Recommendation Algorithms. ACM Trans. Inf. Syst. 2004, 22, 143–177. [Google Scholar] [CrossRef]
  40. Nikolakopoulos, A.N.; Karypis, G. RecWalk: Nearly Uncoupled Random Walks for Top-N Recommendation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, VIC, Australia, 11–15 February 2019; ACM: Melbourne, VIC, Australia, 2019; pp. 150–158. [Google Scholar]
  41. Steck, H. Embarrassingly Shallow Autoencoders for Sparse Data. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–19 May 2019; ACM: San Francisco, CA, USA, 2019; pp. 3251–3257. [Google Scholar]
  42. Bobadilla, J.; Ortega, F.; Hernando, A.; Gutiérrez, A. Recommender Systems Survey. Knowl.-Based Syst. 2013, 46, 109–132. [Google Scholar] [CrossRef]
  43. Son, L.H. Dealing with the New User Cold-Start Problem in Recommender Systems: A Comparative Review. Inf. Syst. 2016, 58, 87–104. [Google Scholar] [CrossRef]
  44. Ahn, H.J. A New Similarity Measure for Collaborative Filtering to Alleviate the New User Cold-Starting Problem. Inf. Sci. 2008, 178, 37–51. [Google Scholar] [CrossRef]
  45. Anderson, C. The Long Tail: Why the Future of Business Is Selling Less of More; Hachette Books: New York, NY, USA, 2016; ISBN 9781401384630. [Google Scholar]
  46. Yin, H.; Cui, B.; Li, J.; Yao, J.; Chen, C. Challenging the Long Tail Recommendation. arXiv 2012. [Google Scholar] [CrossRef]
  47. Türk Ulusal Bilim E-Altyapısı—TRUBA. Available online: https://www.truba.gov.tr (accessed on 21 May 2024).
  48. Salah, A.; Truong, Q.-T.; Lauw, H.W. Cornac: A Comparative Framework for Multimodal Recommender Systems. J. Mach. Learn. Res. 2020, 21, 1–5. [Google Scholar]
  49. Truong, Q.-T.; Salah, A.; Tran, T.-B.; Guo, J.; Lauw, H.W. Exploring Cross-Modality Utilization in Recommender Systems. IEEE Internet Comput. 2021, 25, 50–57. [Google Scholar] [CrossRef]
Figure 1. The archive of solutions kept by ants.
Figure 2. In the cold-start user scenario, the effect of the iteration and comparison with each baseline was evaluated using the NDCG metric.
Figure 3. In the long-tail item scenario, the effect of the iteration and comparison with each baseline was evaluated using the NDCG metric.
Figure 4. The effects of ant and archive size are evaluated using the NDCG metric in the cold-start user scenario.
Figure 5. The effects of ant and archive size are evaluated using the NDCG metric in the long-tail item scenario.
Table 1. Benchmark datasets.

| Set | Dataset | Domain | #Users | #Items | #Ratings | Sparsity | Density |
|---|---|---|---|---|---|---|---|
| Samples | ML-1M | Movie | 6038 | 3487 | 575,281 | 98.073 | 1.927 |
| Samples | Netflix | Movie | 11,585 | 6897 | 491,595 | 99.691 | 0.309 |
| Samples | Pinterest | Music | 7224 | 5005 | 170,340 | 99.385 | 0.615 |
| Originals | ML-1M | Movie | 6040 | 3952 | 1M | 95.809 | 1.927 |
| Originals | Netflix | Movie | 480K | 17K | 100M | 98.822 | 0.148 |
| Originals | Pinterest | Music | 55,187 | 9916 | 1.5M | 99.722 | 0.278 |
Table 8. Comparisons of AcoRec with Base Models for Cold-Start User Scenario.

| Dataset | Model | NDCG@10 | Recall@10 | Coverage |
|---|---|---|---|---|
| ML-1M | BaseGram | 48.77% | 43.25% | 515.80% |
| ML-1M | BaseCosine | 12.56% | 12.34% | 93.46% |
| ML-1M | BaseJaccard | 11.32% | 11.70% | 36.56% |
| Netflix | BaseGram | 24.25% | 22.32% | 45.76% |
| Netflix | BaseCosine | 2.77% | 0.33% | 8.07% |
| Netflix | BaseJaccard | 0.56% | 3.39% | 5.56% |
| Pinterest | BaseGram | 4.71% | 5.96% | 42.98% |
| Pinterest | BaseCosine | 3.80% | 8.73% | 4.29% |
| Pinterest | BaseJaccard | 5.26% | 0.75% | 3.15% |
Table 9. Comparisons of AcoRec with Base Models for Long-tail Item Scenario.

| Dataset | Model | NDCG@10 | Recall@10 | Coverage |
|---|---|---|---|---|
| ML-1M | BaseGram | 9810.00% | 12,970.00% | 1442.19% |
| ML-1M | BaseCosine | 1232.39% | 1100.00% | 305.01% |
| ML-1M | BaseJaccard | 707.20% | 647.46% | 194.75% |
| Netflix | BaseGram | 352.41% | 296.92% | 125.91% |
| Netflix | BaseCosine | 60.54% | 53.70% | 45.79% |
| Netflix | BaseJaccard | 44.03% | 38.45% | 31.53% |
| Pinterest | BaseGram | 223.30% | 226.76% | 103.71% |
| Pinterest | BaseCosine | 98.55% | 109.53% | 33.98% |
| Pinterest | BaseJaccard | 74.66% | 82.26% | 21.40% |