Article

A New Method Combining Pattern Prediction and Preference Prediction for Next Basket Recommendation

1 College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, China
* Author to whom correspondence should be addressed.
Entropy 2021, 23(11), 1430; https://doi.org/10.3390/e23111430
Submission received: 16 September 2021 / Revised: 18 October 2021 / Accepted: 25 October 2021 / Published: 29 October 2021

Abstract:
Market basket prediction, which is the basis of product recommendation systems, is the concept of predicting what customers will buy in the next shopping basket based on analysis of their historical shopping records. Although product recommendation systems develop rapidly and have good performance in practice, state-of-the-art algorithms still have plenty of room for improvement. In this paper, we propose a new algorithm combining pattern prediction and preference prediction. In pattern prediction, sequential rules, periodic patterns and association rules are mined and probability models are established based on their statistical characteristics, e.g., the distribution of periods of a periodic pattern, to make a more precise prediction. Products that have a higher probability will have priority to be recommended. If the quantity of recommended products is insufficient, then we make a preference prediction to select more products. Preference prediction is based on the frequency and tendency of products that appear in customers’ individual shopping records, where tendency is a new concept to reflect the evolution of customers’ shopping preferences. Experiments show that our algorithm outperforms those of the baseline methods and state-of-the-art methods on three of four real-world transaction sequence datasets.

1. Introduction

Data mining technology is an efficient tool for business. As early as 1993, Agrawal et al. [1] proposed association rule mining for transaction databases to discover the intrinsic connections between different products and the shopping habits of customers. This technology makes sales prediction easier: by predicting what customers will buy in the next shopping basket and then recommending those products to them, retailers can improve their services and promote sales. We call such a technique market basket prediction, which is the basis of product recommendation systems.
Since Agrawal et al. proposed association rule mining, both data mining and recommendation systems have developed rapidly. On the one hand, sequential patterns [2], sequential rules [3], coverage patterns [4], temporal patterns [5], subgraph patterns [6] and periodic patterns [7] have been proposed. Data mining, as an increasingly sophisticated technology, has been used in many domains, such as time series analysis [8], medicine [9] and image processing [10]. On the other hand, recommendation systems rely on several kinds of implementation methods, including pattern-based models, collaborative filtering [11] and Markov chains [12]. Advances in data mining technology make pattern-based models promising: there are many efficient, ready-made algorithms for pattern mining [13], and they can easily be used to implement pattern-based recommendation systems.
A pattern reveals the relation between different products, which makes pattern-based models comprehensible. Among them, a sequential rule reveals the relation between the products in two consecutive transactions: a customer bought a product at some time and will buy another product at a future time. For example, if a customer buys a computer, he or she will possibly need a USB drive or a printer when working on the computer. The higher the confidence of a sequential rule, the higher the possibility. A number of sophisticated and efficient sequential rule mining algorithms have been proposed, including RuleGen [13], ERMiner [14] and RuleGrowth [3,15]. Because all products have a limited service life, when a product is used up, we buy it again; therefore, some products appear periodically in our market baskets. If we know the period of a periodic pattern, then we can predict when it will appear again. Several periodic pattern mining algorithms have been proposed, including SPP [16], MPFPS [17] and LPPM [18]. An association rule reveals the relation between the products within a basket: if a customer buys a product, he or she will buy another product at the same time. Efficient algorithms such as TopKRules [19] and TNR [20] focus on association rule mining.
As described above, pattern-based models have the advantages of popularity and comprehensibility. However, existing pattern-based recommendation algorithms are insufficient to capture customers' shopping habits. For example, Ref. [21] focused on association rules only, and periodicity was neglected. Furthermore, the statistical characteristics inside a pattern, e.g., the distribution of periods of a periodic pattern, are also neglected. For example, under the current definition of periodic patterns [22], when a periodic pattern is used to make a prediction, we can only predict that the pattern will reoccur within some time interval, but not the probability of it reoccurring at an exact time.
To address these disadvantages of existing methods, our method leverages not only association rules but also sequential rules and periodic patterns for prediction at the same time. We call this strategy pattern prediction. Furthermore, the frequency and tendency of a product are considered to reflect the preference a customer has for this product and the evolution of that preference, respectively; predictions made from them we call preference prediction. Combining pattern prediction and preference prediction, we propose a new algorithm for market basket prediction, which we call SPAP (Sequential rule, Periodic pattern, Association rule, and Preference).
In this paper, first, we present a new definition of periodic patterns and tendencies. Generally, if a product is bought periodically by a customer, then the period will be nearly equal to the service life of the product. However, the service lives of products of the same kind may differ from one another, leading to fluctuations in the period. What kind of pattern, then, should count as periodic when it is to reveal the periodicity of product purchases that recur without a fixed period? Obviously, if the fluctuation of a pattern's periods is too large compared to the average period, the pattern should not be considered periodic. Taking both the average period and the standard deviation into account, the coefficient of variation, which is the ratio of the standard deviation to the mean, is used to measure the periodicity of patterns in our definition. The concept of tendency is based on the following consideration: if a product is bought more frequently in a customer's recent baskets than in early ones, then the customer tends to be increasingly inclined toward the product; conversely, if a product is bought more frequently in early baskets than in recent ones, the customer tends to be increasingly estranged from it. We use the new concept of tendency to reflect this fact.
Second, we propose probability models for pattern prediction. A sequential rule reveals the relation of two patterns belonging to two consecutive transactions; the former pattern is called the antecedent, and the latter the consequent. When a sequential rule is used for predicting the next basket, the time interval between antecedent and consequent is usually neglected. That is, the consequent will follow the antecedent with a given confidence; however, we do not know for sure at what time it will occur. In this paper, we learn a statistical model for the time interval of every sequential rule in the training data. In prediction, we use the statistical model to compute the probability of the occurrence of consequents at an exact timestamp. For periodic patterns, the statistical model is determined by the average period and the standard deviation. After training, we obtain all sequential rules, periodic patterns and association rules, along with their statistical characteristics. Consequently, we can calculate the probability of all products in a customer's next basket. Products that have a higher probability will have priority to be recommended. If the quantity of recommended products is insufficient, then we make a preference prediction to select more products.
Preference prediction is based on this observation: if a product is more frequently bought by a customer, then we draw the conclusion that the customer has a preference for this product, and this product will have priority to be recommended. If some products have the same frequency, then the product with a higher tendency will be selected first in such a case.
Our contributions in this paper are summarized as follows:
We present a new definition for periodic patterns and the tendency of patterns.
We propose probability models for pattern prediction to predict the next basket.
We design a new algorithm combining pattern prediction and preference prediction for next basket recommendation.
Empirically, we show that our algorithm outperforms the baseline methods and state-of-the-art methods on three of four real-world transaction sequence datasets under the evaluation metrics of F1-Score and Hit-Ratio.
The remainder of this paper is organized as follows: Section 2 reviews existing approaches. Section 3 presents the preliminaries. We introduce our prediction method in Section 4. The implementation of our algorithm is described in Section 5, and the experimental analysis is reported in Section 6. Finally, we draw conclusions in Section 7.

2. Related Work

Implementation methods of recommendation systems can be categorized into sequential, general, pattern-based, and hybrid models. Sequential models [23,24], mostly relying on Markov chains, explore sequential transaction data by predicting the next purchase based on the last actions to capture sequential behavior. A major advantage of this model is its ability to capture sequential behavior to provide good recommendations. The general model [25], in contrast, does not consider sequential behavior but makes recommendations based on customers' whole purchase history; the key idea is collaborative filtering. The pattern-based model bases predictions on the frequent patterns that are extracted from the shopping records of all customers [26]. Finally, the hybrid model combines the models mentioned above or other ideas, such as graph-based models [27] and recurrent neural network models [28,29]. Since so many works are devoted to recommendation systems, it is impossible to list them all here, so we only briefly review pattern-based approaches in the next paragraph.
Fu et al. [30] first used association rules for recommendation systems; candidate items are listed for the user in order of their support. Wang et al. [31] proposed an association rule mining algorithm with maximal nonblank for recommendation. A weighted association rule mining algorithm based on FP-tree and its application to personalized recommendation were given by Wang et al. [32]. Ding et al. [33,34] proposed a method for personalized recommendation that decreases the number of association rules by merging different rules. Li et al. [21,35] proposed the notion of strongest association rules (SARs) and developed a matrix-based algorithm for mining SAR sets. As a subset of the entire association rule set, the SAR set includes far fewer rules, in a form especially suitable for personalized recommendation, without information loss. Lazcorreta et al. [26] applied a modified version of the well-known Apriori data mining algorithm to personalized recommendation. Najafabadi et al. [36] applied users' implicit interaction records with items to efficiently process massive data by employing association rule mining; their method captures the multiple purchases per transaction in association rules, rather than just counting total purchases made. Chen et al. [37] mined simple association rules with a single item in the consequent to avoid exponential pattern growth. The method proposed by Zhou et al. [38] implemented genetic network programming and ant colony optimization to solve the sequential rule mining problem for commercial recommendations in time-related transaction sequence databases. Maske et al. [39] proposed a method for predicting customer behavior from purchased items using the Apriori association rule mining algorithm.
Among the other methods, Cumby et al. [40] proposed a predictor that embraces a user-centric vision by reformulating basket prediction as a classification problem. They build a distinct classifier for every customer and perform predictions by relying only on that customer's personal data. Unfortunately, this approach assumes the independence of items purchased together. Wang et al. [41] employed a two-layer structure to construct a hybrid representation over customers and items from the purchase history: the first layer represents the transactions by aggregating item vectors from the last transactions, while the second layer realizes the hybrid representation by aggregating the customer's vectors and the transaction representations. Guidotti et al. [42,43,44] defined a new pattern named the Temporal Annotated Recurring Sequence (TARS), which seeks to simultaneously and adaptively capture the co-occurrence, sequentiality, periodicity and recurrence of the items in the transaction sequence. Jain et al. [45] designed a business strategy prediction system for market basket analysis. Kraus et al. [46] proposed similarity matching based on subsequential dynamic time warping as a novel predictor of market baskets, leveraging the Wasserstein distance to measure the similarity among embedded purchase histories. Hu et al. [47] presented a k-nearest neighbors (kNN) based method to directly capture two useful patterns associated with personalized item frequency: the repeated purchase pattern and the collaborative purchase pattern. Faggioli et al. [48] proposed an efficient solution for next basket recommendation under a more general top-n recommendation framework, exploiting a set of collaborative filtering techniques to capture customers' shopping patterns and intentions.

3. Preliminary

Retailers usually preserve their customers' shopping histories in a database that we call a transaction sequence database. A customer's shopping history contains many transactions. A transaction, also called a basket, usually contains an ID, a date, a product list and quantities. All transactions of a customer, sorted by date, form a transaction sequence, as Table 1 shows. A transaction sequence database contains all customers' transaction sequences, as Table 2 shows.
Let $C = \{c_1, \ldots, c_n\}$ be a set of $n$ customers and $I = \{i_1, \ldots, i_m\}$ be a set of $m$ items or products in the market. The transaction sequence of customer $c$ is denoted as $B_c = \langle b_1, \ldots, b_{r_c} \rangle$, where $b_i \subseteq I$, $i \in \{1, \ldots, r_c\}$, denotes a transaction or basket. The terms transaction, basket and itemset will be used interchangeably, since all refer to an unordered set of items (or products). The size of sequence $B_c$ is denoted as $|B_c|$, and $|B_c| = r_c$. $b_{r_c+1}$ denotes the next basket that will be purchased by customer $c$. We use the index set $\{1, \ldots, r_c\}$ of baskets in the transaction sequence rather than formal dates as timestamps to simplify the problem. The interval of two transactions is denoted as $Gap(b_i, b_j)$ and defined as $Gap(b_i, b_j) = j - i$, where $1 \le i < j \le r_c$. The transaction sequence dataset $D = \{B_{c_1}, \ldots, B_{c_n}\}$ consists of the transaction sequences of the $n$ customers.
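To make the notation concrete, the following is a minimal Python sketch of this data model (our implementation is in Java; the language, helper names and toy data here are illustrative assumptions, not part of the paper):

```python
from typing import List, Set

Basket = Set[str]        # a basket is an unordered set of items
Sequence = List[Basket]  # B_c = <b_1, ..., b_{r_c}>, with 1-based basket indexes

def gap(i: int, j: int) -> int:
    """Gap(b_i, b_j) = j - i, for 1 <= i < j <= r_c."""
    assert 1 <= i < j, "requires 1 <= i < j"
    return j - i

# A toy dataset D with two customers (items are arbitrary labels).
D = {
    "c1": [{"a", "b"}, {"c"}, {"a", "c"}],
    "c2": [{"d"}, {"a", "d"}],
}
print(gap(1, 3))  # -> 2
```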
Problem 1.
Given a transaction sequence dataset, our aim is to predict the next basket for each customer according to their transaction sequence and then select $k$ products to recommend to him or her. Formally, given dataset $D$, for every transaction sequence $B_c \in D$ with $B_c = \langle b_1, \ldots, b_{r_c} \rangle$, we predict $b_{r_c+1}$, which contains a set of candidate items for recommendation. Let $b_c^*$ denote the selected item set; then $b_c^*$ contains $k$ items selected from $b_{r_c+1}$ to recommend to customer $c$.
Definition 1.
(Frequent Itemset Pattern) Given an itemset $p \subseteq I$, a frequency threshold $\theta$, and a transaction sequence $B = \langle b_1, \ldots, b_r \rangle$: if $b_i \in B$ and $p \subseteq b_i$, then we call $b_i$ a support of $p$. Let $Sup(p)$ denote all supports of $p$; then $Sup(p) = \{b_i \mid b_i \in B, p \subseteq b_i\}$. The absolute frequency of $p$ is defined as $Freq(p) = |Sup(p)|$, and the relative frequency as $Freq(p) = |Sup(p)|/|B|$. If $Freq(p) \ge \theta$, then we call $p$ a frequent itemset pattern, or itemset pattern for short. If $p$ contains only a single item, that is, $|p| = 1$, we call it a single item pattern.
Definition 2.
(Association Rule) Given two itemset patterns $p_1$ and $p_2$ of a transaction sequence and a confidence threshold $\eta$, if
(1) $p_1 \ne \emptyset$, $p_2 \ne \emptyset$ and $p_1 \cap p_2 = \emptyset$,
(2) $p = p_1 \cup p_2$ is an itemset pattern, and
(3) $Freq(p)/Freq(p_1) \ge \eta$,
then $p_1 \Rightarrow p_2$ is an association rule. Its frequency is denoted as $Freq(p_1 \Rightarrow p_2)$ and defined as $Freq(p_1 \Rightarrow p_2) = Freq(p)$. Its confidence is denoted as $Conf(p_1 \Rightarrow p_2)$ and defined as $Conf(p_1 \Rightarrow p_2) = Freq(p)/Freq(p_1)$. We call $p_1$ the antecedent and $p_2$ the consequent.
Definition 3.
(Frequent Sequential Pattern) A sequence $s = \langle p_1, \ldots, p_k \rangle$ is a subsequence of transaction sequence $B = \langle b_1, \ldots, b_r \rangle$, denoted as $s \sqsubseteq B$, if and only if there exist $k$ integers $\{e_1, \ldots, e_k\}$ such that $k \le r$, $e_1 < \cdots < e_k$ and for all $i \in \{1, \ldots, k\}$ it holds that $p_i \subseteq b_{e_i}$. We call this integer set an embedding of $s$ in $B$, denoted as $Emb(s) = \{e_1, \ldots, e_k\}$. $Embs(s)$ denotes the set of all embeddings of $s$ in $B$. In a transaction sequence dataset $D$, if $s \sqsubseteq B_c$, we call $B_c$ a support of $s$. The set of all supports of $s$ is denoted as $Sup(s) = \{B_c \mid B_c \in D, s \sqsubseteq B_c\}$. The absolute frequency of $s$ is defined as $Freq(s) = |Sup(s)|$ and the relative frequency as $Freq(s) = |Sup(s)|/|D|$. Given a frequency threshold $\theta$, if $Freq(s) \ge \theta$, then $s$ is a frequent sequential pattern, or sequential pattern for short.
In this paper, the length of $s$ is denoted as $Len(s)$ and defined as $Len(s) = \sum_{v \in \{1, \ldots, k\}} |p_v|$. If a sequence $s$ contains only one itemset $p$, that is, $s = \langle p \rangle$ and $|s| = 1$, then it can be mapped onto the itemset $p$, and we write $s = p$. If $Len(s) = 1$, i.e., $s$ has only a single itemset and this itemset contains only a single item, we call it a single item pattern.
Definition 4.
(Sequential Rule) Given two sequential patterns $s_1$ and $s_2$ of a transaction sequence dataset and a confidence threshold $\eta$, if
(1) $s_1 \ne \emptyset$, $s_2 \ne \emptyset$,
(2) $s = \langle s_1, s_2 \rangle$ ($s_1$ concatenated with $s_2$) is a sequential pattern, and
(3) $Freq(s)/Freq(s_1) \ge \eta$,
then we call $s_1 \Rightarrow s_2$ a sequential rule. Its frequency is $Freq(s_1 \Rightarrow s_2) = Freq(s)$ and its confidence is $Conf(s_1 \Rightarrow s_2) = Freq(s)/Freq(s_1)$. We call $s_1$ the antecedent and $s_2$ the consequent.
Property 1.
Given a sequential rule $s_1 \Rightarrow s_2 : Conf$, where $s_2 = \langle p_1, \ldots, p_k \rangle$, we have $s_1 \Rightarrow p_1 : Conf_1$, ..., $s_1 \Rightarrow p_k : Conf_k$, with $Conf \le Conf_1$, ..., $Conf \le Conf_k$.
Proof. 
The proof comes from the anti-monotonicity of sequential patterns’ frequency.    □
Definition 5.
(Periodic Pattern) Let $p$ be an itemset pattern of transaction sequence $B = \langle b_1, \ldots, b_r \rangle$ and $\lambda = Freq(p)$. The occurrence list of $p$ is denoted as $Occur(p)$ and defined as $Occur(p) = \{i \mid i \in \{1, \ldots, r\} \text{ s.t. } b_i \in B \text{ and } p \subseteq b_i\}$. The period list of $p$ in $B$ is denoted as $Per(p)$ and defined as $Per(p) = \{per_a = w_{a+1} - w_a \mid a \in \{1, \ldots, \lambda - 1\}, w_a \in Occur(p)\}$. The coefficient of variation of $p$ is denoted as $Coefva(p)$, with $Coefva(p) = std(Per(p))/mean(Per(p))$, where $std(\cdot)$ and $mean(\cdot)$ are the standard deviation and mean, respectively. Given a threshold $\delta$ for the coefficient of variation, if $Coefva(p) \le \delta$, then $p$ is a periodic itemset pattern, or periodic pattern for short.
Definition 6.
(Gap of Two Itemsets) Given two itemsets $p_1$ and $p_2$ of transaction sequence $B$: if there exist $o_1 \in Occur(p_1)$ and $o_2 \in Occur(p_2)$ with $o_1 < o_2$, then the gap of $p_1$ and $p_2$ in $B$ is defined as $Gap(p_1, p_2) = \min\{g \mid o_1 \in Occur(p_1), o_2 \in Occur(p_2), o_1 < o_2, g = o_2 - o_1\}$; otherwise, $Gap(p_1, p_2) = \infty$.
Definition 7.
(Gap of Two Subsequences) Given two subsequences $s_1$ and $s_2$ of transaction sequence $B$: if there exist $E_1 \in Embs(s_1)$ and $E_2 \in Embs(s_2)$ with $\max(E_1) < \min(E_2)$, then the gap of $s_1$ and $s_2$ in $B$ is defined as $Gap(s_1, s_2) = \min\{g \mid E_1 \in Embs(s_1), E_2 \in Embs(s_2), \max(E_1) < \min(E_2), g = \min(E_2) - \max(E_1)\}$; otherwise, $Gap(s_1, s_2) = \infty$.
Definition 8.
(Tendency) Given an itemset $p$ of transaction sequence $B$, we call $Ten(p) = mean(Occur(p))$ the tendency of itemset $p$ in transaction sequence $B$.
Example 1.
For the transaction sequence given in Table 1, the support of itemset $p = \{a, h\}$ is $Sup(p) = \{b_3, b_9, b_{11}\}$, with frequency $Freq(p) = 3$ and $Occur(p) = \{3, 9, 11\}$; $per_1 = 9 - 3 = 6$, $per_2 = 11 - 9 = 2$, so the period list is $Per(p) = \{6, 2\}$ and $Coefva(p) = 0.71$. In Table 2, the frequency of sequence $s_1 = \langle\{a, c\}\rangle$ is $Freq(s_1) = 7$, the frequency of sequence $s_2 = \langle\{f, g\}\rangle$ is $Freq(s_2) = 6$, and the frequency of sequence $s = \langle s_1, s_2 \rangle$ is $Freq(s) = 6$. We have $Conf(s_1 \Rightarrow s_2) = 6/7$. $Embs(s)_{c_1} = \{\{1, 3\}\}$ and $Embs(s)_{c_6} = \{\{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}\}$. The frequencies of itemsets $p_1 = \{a, c\}$ and $p_2 = \{f, g\}$ in transaction sequence $B_{c_6}$ are $Freq(p_1)_{c_6} = 2$ and $Freq(p_2)_{c_6} = 2$, respectively. The occurrence list of itemset $\{d\}$ in transaction sequence $B_{c_2}$ is $Occur(\{d\})_{c_2} = \{1, 4\}$, with tendency $Ten(\{d\})_{c_2} = 2.5$.
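The quantities in Example 1 can be reproduced directly from the definitions above; here is a minimal Python sketch (the helper names are ours, and the use of the sample standard deviation is an assumption, chosen because it reproduces the value 0.71):

```python
from statistics import mean, stdev
from typing import List, Set

def occur(p: Set[str], B: List[Set[str]]) -> List[int]:
    """Occurrence list of itemset p in sequence B (1-based timestamps)."""
    return [i for i, b in enumerate(B, start=1) if p <= b]

def periods(occ: List[int]) -> List[int]:
    """Per(p): differences between consecutive occurrences."""
    return [b - a for a, b in zip(occ, occ[1:])]

def coef_va(per: List[int]) -> float:
    """Coefficient of variation: std(Per(p)) / mean(Per(p))."""
    return stdev(per) / mean(per)

def tendency(occ: List[int]) -> float:
    """Ten(p): mean of the occurrence list."""
    return mean(occ)

# Occurrences of p = {a, h} in Table 1, as given in Example 1.
occ = [3, 9, 11]
per = periods(occ)             # [6, 2]
print(round(coef_va(per), 2))  # -> 0.71 (with the sample standard deviation)
print(tendency([1, 4]))        # -> 2.5, the tendency of {d} in B_c2
```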

4. Framework

Our method includes pattern prediction and preference prediction. In pattern prediction, first, all sequential rules, periodic patterns and association rules are found together with their statistical characteristics. Then, probability models are built based on those statistical characteristics. Afterward, we use the probability models to calculate the probability of every product appearing in the next basket of a customer. The products that have a higher probability are selected preferentially to recommend to him or her. If $k$ products have been selected, we continue with the prediction for the next customer; otherwise, we make a preference prediction. In preference prediction, the product that is more frequent in the individual shopping records is selected first; if some products have the same frequency, the product with the higher tendency is selected. Once all $k$ products are selected, we continue with the next customer. See Algorithm 1. We introduce the probability list $pl$ to preserve the probability of all items in $b_{r+1}$, viz. $pl := \{(item_1 : value_1), (item_2 : value_2), \ldots\}$.
Algorithm 1: SPAP($D, \theta_r, \eta, \theta_p, \delta, k$).
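The pseudocode of Algorithm 1 appears only as an image in the published version. As a rough, hedged reconstruction of its selection logic from the description above (the function name and toy inputs are our assumptions, not the authors' code):

```python
def select_top_k(pl, freq, ten, k):
    """Two-stage selection: items scored by pattern prediction (probability
    list pl) come first, ranked by probability; preference prediction
    (frequency, ties broken by tendency) fills the remaining slots."""
    chosen = sorted(pl, key=pl.get, reverse=True)[:k]
    if len(chosen) < k:
        rest = [i for i in freq if i not in chosen]
        rest.sort(key=lambda i: (freq[i], ten.get(i, 0.0)), reverse=True)
        chosen += rest[:k - len(chosen)]
    return chosen

# Toy inputs: pattern prediction scored only two items, so preference
# prediction supplies the third recommendation.
pl = {"f": 0.29, "g": 0.10}
freq = {"a": 9, "b": 9, "e": 5}
ten = {"a": 8.0, "b": 7.78}
print(select_top_k(pl, freq, ten, k=3))  # -> ['f', 'g', 'a']
```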

4.1. Sequential Rule Prediction

4.1.1. Probability Model of Sequential Rule

Sequential rules reveal the relation between products in two consecutive transactions: a customer bought a product at some time, and he or she will buy another product at a future time. However, the sequential rule as defined in the previous section has a limitation when used for next basket prediction. For example, consider a sequential rule $E_1 \Rightarrow E_2 : Conf = 0.8$. If event $E_1$ occurs, it will lead to event $E_2$ occurring with a confidence of 0.8, and the confidence is considered the probability here, that is, $P(E_2 \mid E_1) = 0.8$. However, we do not know for sure at what time event $E_2$ occurs, nor the probability of event $E_2$ occurring at an exact time after event $E_1$ occurred. To address this limitation, we build a probability model for the time intervals of a sequential rule. The time interval of a sequential rule, i.e., the time interval between event $E_1$ and event $E_2$, is a random variable represented by $X$ here, $X = Gap(E_1, E_2)$. Generally, the larger the time interval, the lower the relevance of $E_1$ and $E_2$, and vice versa. We suppose that the probability model approximately follows an exponential distribution with parameter $mean(Gap(s_1, s_2))$.
Example 2.
For the transaction sequence dataset shown in Table 2, let $s_1 = \langle\{a, c\}\rangle$, $s_2 = \langle\{f, g\}\rangle$ and $s = \langle s_1, s_2 \rangle = \langle\{a, c\}, \{f, g\}\rangle$; we have $Sup(s_1) = \{c_1, c_2, c_3, c_4, c_5, c_6, c_7\}$ and $Sup(s_2) = \{c_1, c_2, c_3, c_4, c_5, c_6\}$. If we set $\theta = 5$ and $\eta = 0.8$, then $Conf(s_1 \Rightarrow s_2) = 6/7 > \eta$, so $s_1 \Rightarrow s_2$ is a sequential rule. The time interval between $s_1$ and $s_2$ in $c_1$ is $Gap(s_1, s_2)_{c_1} = 3 - 1 = 2$, where $Occur(s_1)_{c_1} = \{1\}$ and $Occur(s_2)_{c_1} = \{3\}$. In a similar way, $Gap(s_1, s_2)_{c_2} = 1$, $Gap(s_1, s_2)_{c_3} = 1$, $Gap(s_1, s_2)_{c_4} = 3$, $Gap(s_1, s_2)_{c_5} = 2$ and $Gap(s_1, s_2)_{c_6} = 1$. Note that in sequence $c_6$, we have $Occur(s_1)_{c_6} = \{1, 2\}$ and $Occur(s_2)_{c_6} = \{3, 4\}$, making $Gap(s_1, s_2)_{c_6}$ multiple-valued; according to Definition 6, $Gap(s_1, s_2)_{c_6} = \min\{(3-1), (3-2), (4-1), (4-2)\} = 1$. Consequently, we obtain a probability model for $Gap(s_1, s_2)$ with $P\{Gap(s_1, s_2) = 1\} = 1/2$, $P\{Gap(s_1, s_2) = 2\} = 1/3$ and $P\{Gap(s_1, s_2) = 3\} = 1/6$.
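A short sketch of how the minimal gaps (Definition 6) and the empirical gap distribution of this example can be computed (the gap values are those listed above):

```python
from collections import Counter

def min_gap(occ1, occ2):
    """Gap(s1, s2): minimal positive difference o2 - o1 (Definition 6);
    infinity if s2 never occurs after s1."""
    diffs = [o2 - o1 for o1 in occ1 for o2 in occ2 if o1 < o2]
    return min(diffs) if diffs else float("inf")

print(min_gap([1, 2], [3, 4]))  # -> 1, the multiple-valued case of c6

# Minimal gaps of {a,c} -> {f,g} in the six supporting sequences of Example 2.
gaps = [2, 1, 1, 3, 2, 1]
dist = {g: c / len(gaps) for g, c in Counter(gaps).items()}
print(dist)  # -> {2: 1/3, 1: 1/2, 3: 1/6}
```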

4.1.2. Principle of Sequential Rule Prediction

Given a transaction sequence $B = \langle b_1, \ldots, b_r \rangle$, a sequential rule $s_1 \Rightarrow s_2 : Conf$, and its probability distribution $P\{X = Gap(s_1, s_2)\}$ of time intervals between $s_1$ and $s_2$, suppose the consequent contains only a single itemset, that is, $|s_2| = 1$ (if $|s_2| > 1$, we can break the rule down into several sequential rules whose consequents each contain a single itemset, according to Property 1). If $s_1 \not\sqsubseteq B$, then $P(s_1 \sqsubseteq B) = 0$. Otherwise, $P(s_1 \sqsubseteq \langle b_1, \ldots, b_\lambda \rangle) = 1$, where $\lambda = \max\{e \mid E \in Embs(s_1), e \in E\}$. Since the sequential rule $s_1 \Rightarrow s_2 : Conf$ means that $P(s_2 \sqsubseteq \langle b_{\lambda+1}, b_{\lambda+2}, \ldots \rangle \mid s_1 \sqsubseteq \langle b_1, \ldots, b_\lambda \rangle) = Conf$, we have $P(s_2 \sqsubseteq \langle b_{\lambda+1}, b_{\lambda+2}, \ldots \rangle) = Conf \times P(s_1 \sqsubseteq \langle b_1, \ldots, b_\lambda \rangle) = Conf$. Let $A_1$ denote the event $s_2 \subseteq b_{r+1}$ and, similarly, $A_u$ the event $s_2 \subseteq b_{r+u}$. If event $A_1$ occurs, then $Gap(s_1, s_2) = r + 1 - \lambda$ and $P(A_1 \mid s_2 \sqsubseteq \langle b_{\lambda+1}, b_{\lambda+2}, \ldots \rangle) = P\{X = r + 1 - \lambda\}$. Likewise, if event $A_u$ occurs, then $Gap(s_1, s_2) = r + u - \lambda$ and $P(A_u \mid s_2 \sqsubseteq \langle b_{\lambda+1}, b_{\lambda+2}, \ldots \rangle) = P\{X = r + u - \lambda\}$. Since $P(s_2 \sqsubseteq \langle b_{\lambda+1}, b_{\lambda+2}, \ldots \rangle) = Conf$, we obtain the probability of $s_2 \subseteq b_{r+u}$ as $P(A_u) = P\{X = r + u - \lambda\} \times Conf$.
First, the time variable is continuous in general; however, since we use the index of baskets as the timestamp, the time variable is discretized, and the probability of an exact value of the time variable is the probability within a unit interval over this value. Second, if an item is contained in the consequents of several sequential rules, it will be predicted several times; in such a case, we update its probability to the maximal value. See Algorithm 2.
Algorithm 2: SequentialRulePrediction($RuleSet, B, b_{r+1}, pl$).
Continuing with Example 2, let $B = \langle\{a, b, c\}, \{b, e, g\}, \{a, c, d, e\}, \{c, f\}\rangle$; we predict the probability of all products in $b_5$. In Example 2, we obtained a sequential rule $s_1 \Rightarrow s_2 : Conf$ and its probability model. $Embs(s_1) = \{\{1\}, \{3\}\}$, so $\lambda = 3$. If $s_2 \subseteq b_5$, then $Gap(s_1, s_2) = 2$, and $P\{Gap(s_1, s_2) = 2\} = 1/3$ from the probability distribution. Finally, we have $P(\{f, g\} \subseteq b_5) = 6/7 \times 1/3 = 6/21$, as Figure 1 shows.
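Algorithm 2 is likewise available only as an image; the following runnable sketch follows the principle of Section 4.1.2 and reproduces the numbers of this example (the function signature and the single-itemset-antecedent simplification are our assumptions):

```python
def sequential_rule_prediction(rules, B, u, pl):
    """For each rule (antecedent, consequent, conf, gap_dist), add to pl the
    probability that the consequent's items appear in basket b_{r+u}; if an
    item is predicted by several rules, keep the maximal probability.
    Antecedents are assumed to be single-itemset sequences."""
    r = len(B)
    for antecedent, consequent, conf, gap_dist in rules:
        occ = [i for i, b in enumerate(B, start=1) if antecedent <= b]
        if not occ:
            continue                   # antecedent not in B: probability 0
        lam = max(occ)                 # last timestamp of the antecedent
        p = gap_dist.get(r + u - lam, 0.0) * conf
        for item in consequent:
            pl[item] = max(pl.get(item, 0.0), p)

B = [{"a", "b", "c"}, {"b", "e", "g"}, {"a", "c", "d", "e"}, {"c", "f"}]
rule = ({"a", "c"}, {"f", "g"}, 6 / 7, {1: 1 / 2, 2: 1 / 3, 3: 1 / 6})
pl = {}
sequential_rule_prediction([rule], B, u=1, pl=pl)
print(pl)  # {'f': 0.2857..., 'g': 0.2857...}, i.e., 6/21 each
```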

4.2. Periodic Pattern Prediction

4.2.1. Probability Model of Periodic Pattern

Periodic events intrinsically reoccur with a fixed period. In the real world, however, the period is influenced by different factors, leading to fluctuations, and we only have an average value for the periods. The smaller the fluctuation, the better, so we compare the fluctuation with the average period: if the fluctuation is too large compared to the average period, we do not regard the event as periodic. The coefficient of variation, which is the ratio of the standard deviation to the mean, is suitable as a periodicity measure, since the standard deviation is a good measure of fluctuation. The service life of a product is no exception: if a product is bought periodically by a customer, the period will be nearly equal to the service life of the product, but the service lives of products of the same kind may differ from one another, leading to fluctuations in the period.
According to Definition 5, we define periodic patterns based on the coefficient of variation. If a pattern has a high coefficient of variation, meaning that the standard deviation is too large compared to the mean value, the pattern is not periodic; otherwise, we classify it as a periodic pattern. Given a periodic pattern, if it occurs at some time, it is most likely to reoccur after one average period: the probability is higher at times close to one average period after the last occurrence and lower farther away. We suppose that the probability model of periods follows a normal distribution with two parameters, $mean(Per(p))$ and $std(Per(p))$, as Figure 2 shows.
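A sketch of the discretized normal model follows. As noted in Section 4.1.2, basket indexes discretize the time axis, so the probability of an exact period value is taken over a unit interval; that the authors use exactly this discretization is our assumption, though it reproduces the value 0.0046 used in the comprehensive example of Section 4.5:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def period_probability(per, mu, sigma):
    """P{period = per} as the normal mass over the unit interval
    (per - 0.5, per + 0.5], since basket indexes discretize time."""
    return norm_cdf(per + 0.5, mu, sigma) - norm_cdf(per - 0.5, mu, sigma)

# Mean period 3.75 and standard deviation 0.866, the values used in the
# comprehensive example of Section 4.5.
print(round(period_probability(1, 3.75, 0.866), 4))  # -> 0.0046
```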

4.2.2. Principle of Periodic Pattern Prediction

Given a transaction sequence $B = \langle b_1, \ldots, b_r \rangle$ and a periodic pattern $p$ of $B$, the prediction of the probability $P(p \subseteq b_{r+1})$ is analogous to the sequential rule prediction.

4.3. Association Rule Prediction

Given an association rule $p_1 \Rightarrow p_2 : Conf$, if itemset pattern $p_1 \subseteq b$, then itemset pattern $p_2 \subseteq b$ with a probability of $P(p_2 \subseteq b \mid p_1 \subseteq b) = Conf$. If $p_1 \not\subseteq b$, then $P(p_1 \subseteq b) = 0$ and $P(p_2 \subseteq b \mid p_1 \subseteq b) = 0$.
After sequential rule prediction and periodic pattern prediction, we obtain a set of candidate products with their probabilities in $b_{r+1}$. We define the probability of $p_1 \subseteq b_{r+1}$ as $P(p_1 \subseteq b_{r+1}) = \min\{P(i) \mid i \in p_1, (i : P(i)) \in pl\}$. Therefore, the probability of $p_2 \subseteq b_{r+1}$ is $P(p_2 \subseteq b_{r+1}) = P(p_1 \subseteq b_{r+1}) \times Conf$. See Algorithm 3.
After pattern prediction, any product that has not been predicted keeps a default probability of zero and is not recommended in the pattern prediction stage. We thus obtain a set of candidate products along with their probabilities in the next basket $b_{r+1}$, in which a product with a higher probability has priority to be recommended. If $|b_{r+1}| < k$, that is, the number of candidate items is less than $k$, then we make a preference prediction to continue selecting products until we have all $k$ products.
Algorithm 3: AssociationRulePrediction($RuleSet, b_{r+1}, pl$).
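Algorithm 3 is also rendered as an image; here is a minimal sketch of the association rule step just described (the function signature is our assumption; the toy values come from the comprehensive example of Section 4.5):

```python
def association_rule_prediction(rules, pl):
    """For each rule (antecedent, consequent, conf): the antecedent's
    probability is the minimum over its items already present in pl, and each
    consequent item receives that probability times the confidence (keeping
    the maximum if the item was already predicted)."""
    for antecedent, consequent, conf in rules:
        if not all(i in pl for i in antecedent):
            continue                       # P(p1 in b_{r+1}) = 0
        p = min(pl[i] for i in antecedent) * conf
        for item in consequent:
            pl[item] = max(pl.get(item, 0.0), p)

# Values from the comprehensive example in Section 4.5.
pl = {"c": 0.0046, "f": 0.2857, "g": 0.2857}
association_rule_prediction([({"c", "f", "g"}, {"d"}, 0.9)], pl)
print(round(pl["d"], 4))  # -> 0.0041
```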

4.4. Preference Prediction

In pattern prediction, we select a set of products to recommend to a customer. If the number of selected products is less than $k$, we continue selecting based on preference prediction.
In preference prediction, if a product is bought more frequently by a customer, we conclude that the customer has a preference for this product. However, preference evolves over time, and the purchase distribution of a product over a shopping history indicates this change. If a product is bought more frequently in a customer's recent baskets than in earlier ones, the customer is increasingly inclined toward the product; conversely, if a product is bought more frequently in earlier baskets than in recent ones, the customer tends to be increasingly estranged from it. According to Definition 8, tendency reflects exactly this fact. Preference prediction is thus based on the frequency and tendency of a product in a customer's shopping history: we first select the products that are more frequent; if some products have the same frequency, the product with the higher tendency is prioritized, as the sketch below illustrates.
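A minimal sketch of this ranking (the helper name and toy sequence are ours):

```python
from statistics import mean

def preference_rank(B, exclude=frozenset()):
    """Rank single items by frequency, breaking ties by tendency (the mean
    occurrence index): a higher tendency means more recent interest."""
    items = set().union(*B) - set(exclude)
    occ = {i: [t for t, b in enumerate(B, start=1) if i in b] for i in items}
    return sorted(items, key=lambda i: (len(occ[i]), mean(occ[i])),
                  reverse=True)

# a and b both occur 3 times, but a's occurrences are later on average.
B = [{"a", "b"}, {"b"}, {"a"}, {"a", "b"}]
print(preference_rank(B))  # -> ['a', 'b']
```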

4.5. A Comprehensive Example

For the transaction sequence given in Table 1, itemset $p = \{c, f\}$ has an occurrence list $Occur(p) = \{1, 4, 8, 12, 16\}$ and a period list $Per(p) = \{3, 4, 4, 4\}$. By calculation, $mean(Per(p)) = 3.75$, $std(Per(p)) = 0.866$ and $Coefva(p) = std(Per(p))/mean(Per(p)) = 0.23$. If we set $\theta = 5$ and $\delta = 0.5$, then $p$ is a periodic pattern and its period follows $per \sim N(3.75, 0.866)$. Now let us predict $b_{17}$ and select 5 products to recommend, viz. $k = 5$. Suppose we have the sequential rule of Example 2 and, at the same time, an association rule $\{c, f, g\} \Rightarrow \{d\} : 0.9$. First, in sequential rule prediction, we have $P(\{f, g\} \subseteq b_{17}) = 6/21 = 0.2857$, so $pl = \{(f : 0.2857), (g : 0.2857)\}$. Second, we use the periodic pattern $p = \{c, f\}$ to predict. As $Occur(p) = \{1, 4, 8, 12, 16\}$, $\lambda = 16$. If $p \subseteq b_{17}$, then $per_5 = 17 - 16 = 1$, and $P\{per = 1\} = 0.0046$ from the probability model. Merging $\{(c : 0.0046), (f : 0.0046)\}$ into $pl$, we get $pl = \{(c : 0.0046), (f : 0.2857), (g : 0.2857)\}$. Third, the association rule is used to predict. By definition, $P(\{c, f, g\} \subseteq b_{17}) = \min\{0.0046, 0.2857, 0.2857\} = 0.0046$, so $P(\{d\} \subseteq b_{17}) = 0.0046 \times 0.9 = 0.0041$. Finally, the products are selected in the order $f, g, c, d$. However, there are not yet enough products to recommend, so we proceed to preference prediction.
In preference prediction, all products in the individual shopping history, excluding already selected products, are sorted by frequency. We have $Freq(\{a\}) = 9$, $Freq(\{b\}) = 9$, $Freq(\{h\}) = 6$ and $Freq(\{e\}) = 5$. Because the frequencies of items $a$ and $b$ are equal, their tendencies are calculated. Item $a$ has the occurrence list $Occur(\{a\}) = \{1, 3, 5, 7, 8, 9, 11, 13, 15\}$, and item $b$ has $Occur(\{b\}) = \{1, 2, 3, 7, 9, 10, 11, 13, 14\}$. We get $Ten(\{a\}) = 8$ and $Ten(\{b\}) = 7.78$, and select $a$. Finally, all 5 products are selected.

4.6. Relation of Two Prediction Strategies

In pattern prediction, we must set values for the frequency threshold, the confidence threshold and the threshold for the coefficient of variation in pattern mining. These parameters determine the weights of pattern prediction and preference prediction.
The confidence threshold is set for sequential rule mining and association rule mining: the higher the confidence threshold, the fewer sequential rules and association rules are found. The threshold for the coefficient of variation is set for periodic pattern mining: the lower its value, the fewer periodic patterns are found. The frequency threshold applies to all three types of patterns: the higher its value, the fewer patterns are found. There is no parameter for preference prediction.
In general, higher values of the frequency and confidence thresholds together with a lower value of the threshold for the coefficient of variation mean fewer products are selected in pattern prediction, so preference prediction has a higher weight on the selected result; conversely, preference prediction has a lower weight.

5. Implementation

5.1. Optimizations

In pattern mining, the number of results grows exponentially as patterns expand in size; efficiently mining large patterns in dense datasets is a severe problem for traditional pattern mining approaches. In our method, growth in the number of patterns is not matched by a proportional improvement in prediction performance, but it does lead to a heavier workload. Moreover, in the pattern prediction of our method, if an item is contained in several patterns, it will probably be predicted repeatedly, leading to redundant work; the larger the number of patterns or the size of the patterns, the more redundancy in the workload.
For the two reasons mentioned above, we use simple association rules, simple sequential rules and simple periodic patterns to implement pattern prediction.
Definition 9.
(Simple Association Rule) Given an association rule p 1 p 2 : C o n f , if both p 1 and p 2 are single item patterns, then we call it a simple association rule.
Definition 10.
(Simple Sequential Rule) Given a sequential rule s 1 s 2 : C o n f , if both s 1 and s 2 are single item patterns, then we call it a simple sequence rule.
Definition 11.
(Simple Periodic Pattern) Given a periodic pattern p, if p is a single item pattern, then we call it a simple periodic pattern.
This strategy dramatically reduces the number of patterns and is easier to implement; as a result, our algorithm is incomplete. We use two ready-made algorithms introduced by Fournier-Viger et al. [13], namely FPGrowth and RuleGen, to mine simple association rules and simple sequential rules, respectively. The implementation of simple periodic pattern mining is discussed in the next subsection.

5.2. Data Structure

Inspired by Fumarola et al. [49], we propose a new data structure named the vertical bit-list to represent the dataset. Each item $i \in I$ is assigned a vertical bit-list, which is made up of a bit vector and several integer arrays, as Figure 3 shows. Each bit vector has a size equal to the number of sequences in the dataset; the $j$th bit, $j \in \{1, \ldots, |D|\}$, corresponds to the $j$th sequence in the dataset $D$. If the $j$th bit of an item's bit vector is set, then some itemset in the $j$th sequence contains that item, and the set bit is associated with an integer array preserving the occurrence list of the item in the corresponding sequence. If the $j$th sequence contains item $i$, then the frequency and tendency of $i$ in the $j$th sequence can be calculated from its integer array, and we can determine from the same array whether item $i$ is a simple periodic pattern in the $j$th sequence according to Definition 5, since simple periodic pattern mining is free from pattern growth.
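A compact sketch of the vertical bit-list idea, using Python integers as bit vectors (the exact field layout in Figure 3 is not visible here, so this structure is an assumption based on the description above):

```python
class VerticalBitList:
    """Per-item structure: a bit vector over sequences plus, for each set
    bit, the item's occurrence list in that sequence."""
    def __init__(self):
        self.bits = 0           # bit j set <=> sequence j contains the item
        self.occurrences = {}   # sequence index -> occurrence (timestamp) list

    def add(self, seq_idx, timestamp):
        self.bits |= 1 << seq_idx
        self.occurrences.setdefault(seq_idx, []).append(timestamp)

def build_bitlists(D):
    """Build one vertical bit-list per item from a list of sequences."""
    bitlists = {}
    for j, B in enumerate(D):
        for t, basket in enumerate(B, start=1):
            for item in basket:
                bitlists.setdefault(item, VerticalBitList()).add(j, t)
    return bitlists

D = [[{"a"}, {"a", "b"}], [{"b"}]]
bl = build_bitlists(D)
print(bin(bl["a"].bits), bl["a"].occurrences)  # -> 0b1 {0: [1, 2]}
```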

5.3. Complexity

In the pattern mining stage, we dramatically reduce the complexity of our algorithm by mining only simple sequential rules, simple association rules and simple periodic patterns. We mine simple sequential rules and simple association rules in $O(m^2)$, where $m$ is the number of distinct items in $I$, since both types of rule contain only two items. Simple periodic pattern mining and tendency calculation have a complexity of $O(n \times v)$, where $n$ is the number of customers and $v$ is the average count of products the customers have bought, that is, one scan over the dataset; the complexity of the prediction stage is the same.

6. Experiment

To evaluate the performance of our SPAP algorithm, experiments were conducted on four real-world datasets. First, we assessed the influences of the parameters on the weights of pattern prediction and preference prediction. Second, we compared our algorithm with the baseline methods and state-of-the-art methods under the evaluation metrics of F1-Score and Hit-Ratio. The experiments were conducted on a computer with an Intel Core i7-8550U 1.8 GHz processor and 8 GB of RAM, running Windows 10 (64-bit). SPAP is implemented in Java.

6.1. Datasets

We performed our experiments on four real-world transaction sequence datasets. Three of the datasets, Ta-Feng (https://www.kaggle.com/chiranjivdas09/ta-feng-grocery-dataset (accessed on 19 October 2021)), Dunnhumby (https://www.kaggle.com/frtgnn/dunnhumby-the-complete-journey (accessed on 19 October 2021)) and X5-Retail-Hero (https://www.kaggle.com/mvyurchenko/x5-retail-hero (accessed on 19 October 2021)), are based on physical markets; one dataset, T-Mall (provided by Guidotti et al. [50] at https://github.com/riccotti/CustomerTemporalRegularities/tree/master/datasets (accessed on 19 October 2021)), is based on an online market.
Ta-Feng is a dataset of a physical market in China, covering food, stationery and furniture, with a total of 23,812 different items. It contains 817,741 transactions made by 32,266 customers over 4 months.
The Dunnhumby (Dunnh for short) dataset contains household-level transactions over two years from a group of 2500 households who are frequent shoppers at a retailer.
The X5-Retail-Hero (X5RH for short) dataset contains 183,258 transactions made by 9305 customers over 4 months and a total of 27,766 distinct items.
The T-Mall dataset records four months of transactions from an online e-commerce website. It contains 4298 transactions belonging to 884 users and 9531 distinct brands considered as items.
In preprocessing these datasets, we remove customers who have fewer than 10 baskets for Ta-Feng, Dunnh and T-Mall, and remove customers who have fewer than 21 baskets for X5RH. For simplicity, we adopt the index of the basket in sequence as the time unit rather than the real date. Table 3 shows the details of these datasets used in our experiment.

6.2. Evaluation Metrics F1-Score and Hit-Ratio

Following Guidotti et al. [44], we first sort the transactions by timestamp for each customer. Then, we split the dataset into a training set and a testing set: the testing set contains the latest transaction of every customer for model evaluation, and the training set contains the remainder of the transactions for model training. This is known as the leave-one-out strategy. The product set that customer $c$ actually buys is denoted as $b_c$, and the product set recommended to customer $c$ as $b_c^*$. The metrics we use for evaluation, F1-Score and Hit-Ratio, are defined as
$$precision(b_c, b_c^*) = \frac{|b_c \cap b_c^*|}{|b_c^*|}$$
$$recall(b_c, b_c^*) = \frac{|b_c \cap b_c^*|}{|b_c|}$$
$$F1\text{-}Score = \frac{2 \times precision(b_c, b_c^*) \times recall(b_c, b_c^*)}{precision(b_c, b_c^*) + recall(b_c, b_c^*)}$$
$$Hit\text{-}Ratio = \frac{\sum_{c \in C} I(b_c \cap b_c^* \neq \emptyset)}{|C|}$$
where $I(\cdot)$ is an indicator function. The F1-Score is reported as the average value over all customers.
Furthermore, to evaluate the contribution of the two prediction strategies, we introduce a new measure: Weight. Let $b_{pa}$ denote all items selected in pattern prediction for all customers, and $b_{pr}$ all items selected in preference prediction for all customers. The number of all items selected to recommend to all customers is $\sum_{c \in C} |b_c^*|$, and we have $|b_{pa}| + |b_{pr}| = \sum_{c \in C} |b_c^*|$. The weights of pattern prediction and preference prediction, denoted as $Weight_{pa}$ and $Weight_{pr}$, are defined as
$$Weight_{pa} = \frac{|b_{pa}|}{\sum_{c \in C} |b_c^*|}$$
$$Weight_{pr} = \frac{|b_{pr}|}{\sum_{c \in C} |b_c^*|} = 1 - Weight_{pa}$$
Owing to the complementarity of $Weight_{pa}$ and $Weight_{pr}$, we report $Weight_{pa}$ only. A higher value of $Weight_{pa}$ means that more items are selected in pattern prediction, and vice versa; in the remainder of this paper, we use Weight to denote $Weight_{pa}$. Note that if the number of distinct products customer $c$ has bought is less than $k$, then we cannot select $k$ or more products to recommend to him or her, that is, $|b_c^*| \le k$, leading to $\sum_{c \in C} |b_c^*| \le k \times n$, where $|C| = n$.
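A sketch of the two metrics as defined above (per-customer F1 averaged over customers; a hit is counted when the recommendation intersects the true basket; the function names and toy sets are ours):

```python
def f1(actual, recommended):
    """Per-customer F1-Score from precision and recall (0 if no overlap)."""
    inter = len(actual & recommended)
    if inter == 0:
        return 0.0
    precision = inter / len(recommended)
    recall = inter / len(actual)
    return 2 * precision * recall / (precision + recall)

def evaluate(actuals, recommendeds):
    """Average F1-Score over customers and the Hit-Ratio."""
    pairs = list(zip(actuals, recommendeds))
    avg_f1 = sum(f1(a, r) for a, r in pairs) / len(pairs)
    hit_ratio = sum(1 for a, r in pairs if a & r) / len(pairs)
    return avg_f1, hit_ratio

# Toy evaluation: one hit out of two customers.
print(evaluate([{"a", "b"}, {"c"}], [{"a", "x"}, {"y"}]))  # -> (0.25, 0.5)
```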

6.3. Influence of Parameters

Our algorithm is composed of two prediction strategies, and we obtain the best performance only with the right proportion of weight between pattern prediction and preference prediction. There are four parameters in pattern prediction: the relative frequency threshold $\theta_r$ and confidence threshold $\eta$ for sequential rule mining and association rule mining, and the threshold for the coefficient of variation $\delta$ and absolute frequency threshold $\theta_p$ for periodic pattern mining. All of these parameters influence the weights of the two prediction strategies, and we evaluate their influences in this subsection. $k$ is set to the average basket size of each dataset. The remaining parameters are preset to $\theta_r = 0.16$, $\eta = 0.2$, $\delta = 0.2$ and $\theta_p = 5$ for Ta-Feng; $\theta_r = 0.35$, $\eta = 0.7$, $\delta = 0.4$ and $\theta_p = 12$ for Dunnh; $\theta_r = 0.15$, $\eta = 0.4$, $\delta = 0.8$ and $\theta_p = 10$ for X5RH; and $\theta_r = 0.1$, $\eta = 0.4$, $\delta = 0.8$ and $\theta_p = 5$ for T-Mall.
First, we test the relative frequency threshold $\theta_r$; the results are shown in Figure 4. As $\theta_r$ rises, the total number of distinct rules, including sequential rules and association rules, and Weight decrease on all four datasets. Weight is always lower than 0.5, which means that preference prediction plays the dominant role. For the Ta-Feng dataset, when $\theta_r$ is less than 0.16, both F1-Score and Hit-Ratio remain unchanged and achieve their optimal values; the situation is the same on the Dunnh dataset when $\theta_r$ is greater than 0.75. Both F1-Score and Hit-Ratio are constant on the X5RH dataset when $\theta_r$ is greater than 0.63. For the T-Mall dataset, both F1-Score and Hit-Ratio achieve their optimal values when $\theta_r$ is near 0.11. Figure 5 shows the influence of different values of the threshold $\eta$. As $\eta$ rises, the number of rules and Weight decrease on all four datasets. Both F1-Score and Hit-Ratio achieve their optimal values on the Ta-Feng dataset when $\eta$ is less than 0.38, on Dunnh when $\eta$ is greater than 0.9 and on T-Mall when $\eta$ is equal to 0.5. When $\eta$ is greater than 0.8, F1-Score or Hit-Ratio achieves its optimal value on the X5RH dataset.
The threshold for the coefficient of variation $\delta$ is set for periodic pattern mining. A higher value of $\delta$ leads to a larger number of periodic patterns, as Figure 6 shows, and a larger number of periodic patterns results in a higher Weight on all four datasets. When $\delta$ is less than 0.4, F1-Score or Hit-Ratio achieves its optimal value on the Ta-Feng dataset. Both F1-Score and Hit-Ratio remain unchanged and achieve their optimal values on the Dunnh dataset when $\delta$ is greater than 2.8; the situation is the same on the X5RH and T-Mall datasets when $\delta$ is greater than 2.0.
Finally, we test the absolute frequency threshold $\theta_p$ for periodic pattern mining. As shown in Figure 7, when $\theta_p$ rises, the number of periodic patterns and Weight decrease on all four datasets. Both F1-Score and Hit-Ratio remain unchanged and achieve their optimal values on the Ta-Feng dataset when $\theta_p$ is greater than 11; the situation is the same on the Dunnh dataset when $\theta_p$ is greater than 27. On X5RH, both F1-Score and Hit-Ratio achieve their optimal values when $\theta_p$ is equal to 7. For the T-Mall dataset, F1-Score and Hit-Ratio achieve their optimal values when $\theta_p$ is equal to 10 and 5, respectively.

6.4. Comparison with Baseline Methods and State-of-the-Art Methods

In this subsection, we report the comparisons of our method with baseline methods, including TOP, MC, CLF and NMF, and state-of-the-art methods, including HRM, TBP, TIFU and UPCF.
TOP predicts the top-$k$ most frequent items with respect to their appearance, i.e., the number of times they are purchased, in a customer's purchasing history $B_c$.
MC [40] makes the prediction based on the last purchase $b_{r_c}$ and on a Markov chain calculated on $B_c$.
CLF [40]: due to space limitations, we do not discuss it here; see [40] for more details.
NMF (Non-negative Matrix Factorization) [51] is a collaborative filtering method that applies non-negative matrix factorization to the customer-item matrix, which is constructed from the purchase history of all customers.
HRM (Hierarchical Representation Model) [41] employs a two-layer structure to construct a hybrid representation over customers and items from the purchase history $B$: the first layer represents the transactions by aggregating item vectors from the last transactions, while the second layer realizes the hybrid representation by aggregating the user's vectors and the transaction representations.
TBP [44] is a pattern-based method proposed by Guidotti et al. [44] that seeks to simultaneously capture the co-occurrence, sequentiality, periodicity and recurrence of the items in basket sequences.
TIFU [47] (https://github.com/HaojiHu/TIFUKNN (accessed on 19 October 2021)) is a k-nearest neighbors (kNN) based method.
UPCF [48] (https://github.com/MayloIFERR/RACF (accessed on 19 October 2021)) denotes User Popularity-based Collaborative Filtering. The model considers a user-based collaborative approach that relies on similar users to find new items that may interest the target user.
Source code for MC, CLF, NMF, HRM and TBP is provided by [44] (https://github.com/GiulioRossetti/tbp-next-basket (accessed on 19 October 2021)), and all the algorithms are run with the recommended parameter values. Results are shown in Figure 8 and Figure 9. TBP exceeds the time limit on the Dunnh dataset.
Figure 8 shows the results for F1-Score: our SPAP algorithm has the best F1-Score on the Ta-Feng dataset when $k$ is less than 14 and is close to TOP on the Dunnh and X5RH datasets. Figure 9 shows the results for Hit-Ratio: SPAP outperforms the other algorithms on the Ta-Feng dataset and is close to TOP on the Dunnh and X5RH datasets.
We noticed that all algorithms except TBP exhibit poor F1-Score performance when $k$ is much higher or lower than the average basket size; the effect is most obvious on the T-Mall dataset, which has the smallest average basket size. This is because F1-Score depends on the size of the recommended basket $b_c^*$: a higher value of $k$ means we select more products to recommend, leading to a large $b_c^*$ and a lower $precision(b_c, b_c^*)$; otherwise, we get a lower $recall(b_c, b_c^*)$. However, as $k$ increases, the possibility of a hit grows, and Hit-Ratio rises naturally. Only when $k$ is near the average basket size do we have a fair comparison.
As described above, almost all algorithms exhibit their best performance when $k$ is set to the average basket size. Table 4 lists the performance results where $k$ has the value of the average basket size: $k = 6$, $k = 9$, $k = 5$ and $k = 3$ for Ta-Feng, Dunnh, X5RH and T-Mall, respectively. TIFU performs best on the T-Mall dataset; however, our SPAP algorithm outperforms the other algorithms on the Ta-Feng, Dunnh and X5RH datasets.
Patterns naturally reflect customers' shopping habits. Because people visit physical stores more regularly than online stores, our pattern-based model SPAP achieves its best performance on the physical store datasets Ta-Feng, Dunnh and X5RH. Table 5 lists the improvements of SPAP compared with TOP on the Ta-Feng, Dunnh and X5RH datasets, and compared with TIFU on the T-Mall dataset.

6.5. Running Time

All of these algorithms have running times ranging from several seconds to several hours, and TOP is always the fastest [44]. In this subsection, we compare the running time (training time and prediction time) of our algorithm with TOP; both are implemented in Java. The results are shown in Table 6. The source code for MC, CLF, NMF, HRM, TBP, TIFU and UPCF is implemented in Python, so we do not report their running times here. We terminated the TBP process on the Dunnh dataset after it had run for over two days.

7. Conclusions

In this paper, we propose a pattern-based model for next basket prediction. The method includes pattern prediction and preference prediction. In pattern prediction, first, all sequential rules, periodic patterns and association rules are found together with their statistical characteristics. Then, probability models are built based on those statistical characteristics. Afterward, we use the probability models to calculate the probability of every product appearing in a customer's next basket. The products with higher probability are selected to recommend to him or her. If $k$ products have been selected, we continue with the prediction for the next customer; otherwise, we make a preference prediction, in which the product that is more frequent in the individual shopping records is selected first and, among products with the same frequency, the one with the higher tendency is selected, until all $k$ products are chosen. Experiments show that our algorithm outperforms the baseline methods and state-of-the-art methods on three of four real-world transaction sequence datasets.

Author Contributions

G.C. implemented the experiments and wrote the first draft of the paper; Z.L. provided funding for the paper and revised it. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61672261 and Grant 61802056, in part by the Natural Science Foundation of Jilin Province under Grant 20180101043JC, in part by the Industrial Technology Research and Development Project of Jilin Development and Reform Commission under Grant 2019C053-9, and in part by the Open Research Fund of Key Laboratory of Space Utilization, Chinese Academy of Sciences, under Grant LSU-KFJJ-2019-08.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are available at the following links: https://www.kaggle.com/chiranjivdas09/ta-feng-grocery-dataset (accessed on 19 October 2021), https://www.kaggle.com/frtgnn/dunnhumby-the-complete-journey (accessed on 19 October 2021), https://www.kaggle.com/mvyurchenko/x5-retail-hero (accessed on 19 October 2021), https://github.com/riccotti/CustomerTemporalRegularities/tree/master/datasets (accessed on 19 October 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Agrawal, R.; Imielinski, T.; Swami, A. Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 1993, 22, 207–216. [Google Scholar] [CrossRef]
  2. Jamshed, A.; Mallick, B.; Kumar, P. Deep learning-based sequential pattern mining for progressive database. Soft Comput. 2020, 24, 17233–17246. [Google Scholar] [CrossRef]
  3. Fournier-Viger, P.; Wu, C.; Tseng, V.S.; Cao, L.; Nkambou, R. Mining partially-ordered sequential rules common to multiple sequences. IEEE Trans. Knowl. Data Eng. 2015, 27, 2203–2216. [Google Scholar] [CrossRef] [Green Version]
  4. Srinivas, P.G.; Reddy, P.K.; Trinath, A.V.; Bhargav, S.; Kiran, R.U. Mining coverage patterns from transactional databases. J. Intell. Inf. Syst. 2015, 45, 423–439. [Google Scholar] [CrossRef]
  5. Chen, Y.; Peng, W.; Lee, S. Mining temporal patterns in time interval-based data. IEEE Trans. Knowl. Data Eng. 2015, 27, 3318–3331. [Google Scholar] [CrossRef]
  6. Ray, A.; Holder, L.B.; Bifet, A. Efficient frequent subgraph mining on large streaming graphs. Intell. Data Anal. 2019, 23, 103–132. [Google Scholar] [CrossRef]
  7. Fournier-Viger, P.; Wang, Y.; Yang, P.; Lin, C.W.; Kiran, R.U. Tspin: Mining top-k stable periodic patterns. Appl. Intell. 2021. [Google Scholar] [CrossRef]
  8. Esling, P.; Agon, C. Time-series data mining. ACM Comput. Surv. 2012, 45, 12:1–12:34. [Google Scholar] [CrossRef] [Green Version]
  9. Khaleel, M.A.; Dash, G.N.; Choudhury, K.S.; Khan, M.A. Medical data mining for discovering periodically frequent diseases from transactional databases. In Computational Intelligence in Data Mining; Springer: New Delhi, India, 2015; Volume 1, pp. 87–96. [Google Scholar]
  10. Chakraborty, S.; Karâa, W.B.A.; Dey, N.; Banerjee, S.; Azar, A.T. Image mining framework and techniques: A review. Ann. Dermatol. Vénéréol. 2015, 237–244, in press. [Google Scholar]
  11. Guo, L.; Liang, J.; Zhu, Y.; Luo, Y.; Sun, L.; Zheng, X. Collaborative filtering recommendation based on trust and emotion. J. Intell. Inf. Syst. 2019, 53, 113–135. [Google Scholar] [CrossRef]
  12. Yza, B.; Zsab, D.; Wza, B.; Lin, Y.; Sla, B.; Xue, L. Joint personalized markov chains with social network embedding for cold-start recommendation. Neurocomputing 2020, 386, 208–220. [Google Scholar]
  13. Fournier-Viger, P.; Gomariz, A.; Gueniche, T.; Soltani, A.; Wu, C.; Tseng, V.S. SPMF: A java open-source pattern mining library. J. Mach. Learn. Res. 2014, 15, 3389–3393. [Google Scholar]
  14. Fournier-Viger, P.; Gueniche, T.; Zida, S.; Tseng, V.S. ERMiner: Sequential rule mining using equivalence classes. In Advances in Intelligent Data Analysis XIII; Springer: Cham, Switzerland, 2014; pp. 108–119. [Google Scholar]
  15. Fournier-Viger, P.; Faghihi, U.; Nkambou, R.; Nguifo, E.M. CMRules: Mining sequential rules common to several sequences. Knowl. Based Syst. 2012, 25, 63–76. [Google Scholar] [CrossRef] [Green Version]
  16. Fournier-Viger, P.; Yang, P.; Lin, J.C.; Kiran, R.U. Discovering stable periodic-frequent patterns in transactional data. In Advances and Trends in Artificial Intelligence. From Theory to Practice; Springer: Cham, Switzerland, 2019; pp. 230–244. [Google Scholar]
  17. Fournier-Viger, P.; Li, Z.; Lin, J.C.; Kiran, R.U.; Fujita, H. Efficient algorithms to identify periodic patterns in multiple sequences. Inf. Sci. 2019, 489, 205–226. [Google Scholar] [CrossRef]
  18. Fournier-Viger, P.; Yang, P.; Kiran, R.U.; Ventura, S.; Luna, J.M. Mining local periodic patterns in a discrete sequence. Inf. Sci. 2021, 544, 519–548. [Google Scholar] [CrossRef]
  19. Fournier-Viger, P.; Wu, C.; Tseng, V.S. Mining top-k association rules. In Advances in Artificial Intelligence—25th Canadian Conference on Artificial Intelligence, Canadian AI 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 61–73. [Google Scholar]
  20. Fournier-Viger, P.; Tseng, V.S. Mining top-k non-redundant association rules. In Foundations of Intelligent Systems—20th International Symposium, ISMIS 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 31–40. [Google Scholar]
  21. Li, J.; Xu, Y.; Wang, Y.; Chu, C. Strongest association rules mining for personalized recommendation. Syst. Eng. Theory Pract. 2009, 29, 144–152. [Google Scholar] [CrossRef]
  22. Fournier-Viger, P.; Lin, C.W.; Duong, Q.H.; Dam, T.L.; Voznak, M. PFPM: Discovering periodic frequent patterns with novel periodicity measures. In Proceedings of the 2nd Czech-China Scientific Conference, Ostrava, Czech Republic, 7 June 2016; IntechOpen: London, UK, 2017. [Google Scholar]
  23. He, R.; Kang, W.; McAuley, J.J. Translation-based recommendation: A scalable method for modeling sequential behavior. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; pp. 5264–5268. [Google Scholar] [CrossRef] [Green Version]
  24. Zhao, C.; You, J.; Wen, X.; Li, X. Deep bi-lstm networks for sequential recommendation. Entropy 2020, 22, 870. [Google Scholar] [CrossRef] [PubMed]
  25. Huang, T.; Zhang, D.; Bi, L. Neural embedding collaborative filtering for recommender systems. Neural Comput. Appl. 2020, 32, 17043–17057. [Google Scholar] [CrossRef]
  26. Lazcorreta, E.; Botella, F.; Fernandez-Caballero, A. Towards personalized recommendation by two-step modified apriori data mining algorithm. Expert Syst. Appl. 2008, 35, 1422–1429. [Google Scholar] [CrossRef] [Green Version]
  27. Najafabadi, M.K.; Mohamed, A.H.; Onn, C.W. An impact of time and item in fluencer in collaborative filtering recommendations using graph-based model. Inf. Process. Manag. 2019, 56, 526–540. [Google Scholar] [CrossRef]
  28. Cui, Q.; Wu, S.; Liu, Q.; Zhong, W.; Wang, L. MV-RNN: A multi-view recurrent neural network for sequential recommendation. IEEE Trans. Knowl. Data Eng. 2020, 32, 317–331. [Google Scholar] [CrossRef] [Green Version]
  29. Zhang, L.; Wang, P.; Li, J.; Xiao, Z.; Shi, H. Attentive hybrid recurrent neural networks for sequential recommendation. Neural Comput. Appl. 2021, 33, 11091–11105. [Google Scholar] [CrossRef]
  30. Fu, X.; Budzik, J.; Hammond, K.J. Mining navigation history for recommendation. In Proceedings of the IUI 2000—International Conference on Intelligent User Interfaces, New Orleans, LA, USA, 9–12 January 2000; pp. 106–112. [Google Scholar] [CrossRef]
  31. Wang, D.; Yu, G.; Bao, Y. An approach of association rules mining with maximal nonblank for recommendation. J. Softw. 2004, 15, 1182–1188. [Google Scholar]
  32. Wang, T.; Yang, A. Research of weighted association rule and its application in personalization recommendation system. J. Zhengzhou Univ. Nat. Sci. 2007, 39, 65–69. [Google Scholar]
  33. Ding, Z.; Chen, J. Individuation recommendation system based on association rule. Comput. Integr. Manuf. Syst. 2003, 9, 891–893. [Google Scholar]
  34. Ding, Z.; Wang, J.; Wang, D.; Bao, Y.; Yu, G. A web personalized recommendation method based on uncertain consequent association rules. Comput. Sci. 2003, 30, 69–72, 88. [Google Scholar]
  35. Li, J.; Wang, Y.; Xu, Y. Personalized recommendation based on strong association rule mining for mass customization. In Proceedings of the 3rd Interdisciplinary of World Congress of Mass Customization and Personalization MCPC2005, Hong Kong, China, 18–21 June 2005. [Google Scholar]
  36. Najafabadi, M.K.; Mahrin, M.N.; Chuprat, S.; Sarkan, H.M. Improving the accuracy of collaborative filtering recommendations using clustering and association rules mining on implicit data. Comput. Hum. Behav. 2017, 67, 113–128. [Google Scholar] [CrossRef]
  37. Chen, G.; Wei, Q.; Liu, D.; Wets, G. Simple association rules (SAR) and the SAR-based rule discovery. Comput. Ind. Eng. 2002, 43, 721–733. [Google Scholar] [CrossRef]
  38. Zhou, H.; Hirasawa, K. Evolving temporal association rules in recommender system. Neural Comput. Appl. 2019, 31, 2605–2619. [Google Scholar] [CrossRef] [Green Version]
  39. Maske, A.R.; Joglekar, B. An Algorithmic Approach for Mining Customer Behavior Prediction in Market Basket Analysis. In Innovations in Computer Science and Engineering; Lecture Notes in Networks and Systems; Saini, H., Sayal, R., Govardhan, A., Buyya, R., Eds.; Springer: Singapore, 2019; Volume 74. [Google Scholar] [CrossRef]
  40. Cumby, C.M.; Fano, A.E.; Ghani, R.; Krema, M. Predicting customer shopping lists from point-of-sale purchase data. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 402–409. [Google Scholar] [CrossRef]
  41. Wang, P.; Guo, J.; Lan, Y.; Xu, J.; Wan, S.; Cheng, X. Learning hierarchical representation model for next basket recommendation. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 403–412. [Google Scholar] [CrossRef] [Green Version]
  42. Guidotti, R.; Rossetti, G.; Pappalardo, L.; Giannotti, F.; Pedreschi, D. Next Basket Prediction using Recurring Sequential Patterns. arXiv 2017, arXiv:1702.07158. [Google Scholar]
  43. Guidotti, R.; Rossetti, G.; Pappalardo, L.; Giannotti, F.; Pedreschi, D. Market basket prediction using user-centric temporal annotated recurring sequences. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), ICDM 2017, New Orleans, LA, USA, 18–21 November 2017; IEEE Computer Society: Washington, DC, USA, 2017; pp. 895–900. [Google Scholar] [CrossRef] [Green Version]
  44. Guidotti, R.; Rossetti, G.; Pappalardo, L.; Giannotti, F.; Pedreschi, D. Personalized market basket prediction with temporal annotated recurring sequences. IEEE Trans. Knowl. Data Eng. 2019, 31, 2151–2163. [Google Scholar] [CrossRef]
  45. Jain, S.; Sharma, N.K.; Gupta, S.; Doohan, N. Business Strategy Prediction System for Market Basket Analysis. In Quality, IT and Business Operations; Springer Proceedings in Business and Economics; Kapur, P., Kumar, U., Verma, A., Eds.; Springer: Singapore, 2018. [Google Scholar] [CrossRef]
  46. Kraus, M.; Feuerriegel, S. Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2643–2652. [Google Scholar]
  47. Hu, H.J.; He, X.N.; Gao, J.Y.; Zhang, Z.L. Modeling Personalized Item Frequency Information for Next-basket Recommendation. In Proceedings of the SIGIR 2020: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event China, 25–30 July 2020; pp. 1071–1080. [Google Scholar]
  48. Faggioli, G.; Polato, M.; Aiolli, F. Recency aware collaborative filtering for next basket recommendation. In Proceedings of the UMAP’20: 28th ACM Conference on User Modeling, Adaptation and Personalization, Genoa, Italy, 14–17 July 2020; pp. 80–87. [Google Scholar] [CrossRef]
  49. Fumarola, F.; Lanotte, P.F.; Ceci, M.; Malerba, D. CloFAST: Closed sequential pattern mining using sparse and vertical id-lists. Knowl. Inf. Syst. 2016, 48, 429–463. [Google Scholar] [CrossRef]
  50. Guidotti, R.; Gabrielli, L.; Monreale, A.; Pedreschi, D.; Giannotti, F. Discovering temporal regularities in retail customers’ shopping behavior. Epj Data Sci. 2018, 7, 6. [Google Scholar] [CrossRef] [Green Version]
  51. Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13; Papers from Neural Information Processing Systems (NIPS) 2000; MIT Press: Cambridge, MA, USA, 2000; pp. 556–562. [Google Scholar]
Figure 1. Probability model of a sequential rule.
Figure 2. Probability model of the periodic pattern {a, c}.
Figure 3. (a) The vertical bit-list of an item i ∈ I, where q = |{B_c | B_c ∈ D, ∃ b ∈ B_c, i ∈ b}|. (b–d) The vertical bit-lists of items b, d and g, respectively, in the dataset of Table 2.
Figure 4. Influences of different values of θ_r on the number of rules, Weight, F1-Score and Hit-Ratio.
Figure 5. Influences of different values of η on the number of rules, Weight, F1-Score and Hit-Ratio.
Figure 6. Influences of different values of δ on the number of periodic patterns, Weight, F1-Score and Hit-Ratio.
Figure 7. Influences of different values of θ_p on the number of periodic patterns, Weight, F1-Score and Hit-Ratio.
Figure 8. Comparison of F1-Score with those of the baseline methods and state-of-the-art methods under different values of k.
Figure 9. Comparison of Hit-Ratio with those of the baseline methods and state-of-the-art methods under different values of k.
Table 1. Transaction sequence of a customer.

| Transa. ID | Timestamp  | Basket          | Transa. ID | Timestamp  | Basket          |
| b1         | 2010-01-01 | {a, b, c, f}    | b9         | 2010-02-06 | {a, b, g, h}    |
| b2         | 2010-01-05 | {b, c, d}       | b10        | 2010-02-14 | {b, c, d}       |
| b3         | 2010-01-07 | {a, b, e, f, h} | b11        | 2010-02-19 | {a, b, f, h}    |
| b4         | 2010-01-11 | {c, d, f, h}    | b12        | 2010-02-25 | {c, d, f, g, h} |
| b5         | 2010-01-16 | {a, d, e, f, g} | b13        | 2010-03-02 | {a, b, c}       |
| b6         | 2010-01-19 | {c, e, g, h}    | b14        | 2010-03-09 | {b, e, g}       |
| b7         | 2010-01-28 | {a, b, c, g}    | b15        | 2010-03-17 | {a, c, d, e}    |
| b8         | 2010-01-31 | {a, c, f, g}    | b16        | 2010-03-28 | {c, f}          |
Table 2. Illustrative transaction sequence database containing seven customers.

| Customer ID | Transaction Sequence                 |
| c1          | {a, c}, {b, e}, {f, g}               |
| c2          | {b, c, d}, {a, c}, {f, g}, {d, e}    |
| c3          | {d, e, g}, {a, c}, {f, g}            |
| c4          | {a, c}, {d, e}, {c}, {f, g}          |
| c5          | {a, c}, {e}, {f, g}                  |
| c6          | {a, c, e}, {a, c}, {f, g}, {b, f, g} |
| c7          | {d}, {a, b, c}                       |
Table 3. Characteristics of running datasets.

| Dataset | # Customers | # Items | # Baskets | Average Basket Size |
| Ta-Feng | 2374        | 18,138  | 39,533    | 5.58                |
| Dunnh   | 2402        | 91,779  | 273,474   | 9.36                |
| X5RH    | 2924        | 22,921  | 99,357    | 5.40                |
| T-Mall  | 7489        | 436     | 31,035    | 2.52                |
Table 4. Comparison with the baseline methods and state-of-the-art methods when k is set to the average basket size. In the original layout, bold marks the maximum over all algorithms and underline marks the maximum among the competing algorithms.

| Metric    | Dataset         | SPAP   | TOP    | MC     | CLF    | NMF    | HRM    | TBP    | TIFU   | UPCF   |
| F1-Score  | Ta-Feng (k = 6) | 0.0922 | 0.0796 | 0.0415 | 0.0765 | 0.0617 | 0.0570 | 0.0742 | 0.0731 | 0.0672 |
|           | Dunnh (k = 9)   | 0.0935 | 0.0926 | 0.0514 | 0.0675 | 0.0685 | 0.0545 | -      | 0.0864 | 0.0839 |
|           | X5RH (k = 5)    | 0.1604 | 0.1596 | 0.1100 | 0.1316 | 0.1057 | 0.0879 | 0.1502 | 0.1574 | 0.1417 |
|           | T-Mall (k = 3)  | 0.0783 | 0.0691 | 0.0379 | 0.0735 | 0.0626 | 0.0440 | 0.0757 | 0.0944 | 0.0847 |
| Hit-Ratio | Ta-Feng (k = 6) | 0.3934 | 0.3517 | 0.1963 | 0.3096 | 0.2894 | 0.2768 | 0.2850 | 0.3158 | 0.2991 |
|           | Dunnh (k = 9)   | 0.5329 | 0.5287 | 0.3370 | 0.4165 | 0.4554 | 0.4049 | -      | 0.4917 | 0.4833 |
|           | X5RH (k = 5)    | 0.5771 | 0.5728 | 0.4454 | 0.5056 | 0.4457 | 0.4008 | 0.5075 | 0.5486 | 0.5266 |
|           | T-Mall (k = 3)  | 0.1778 | 0.1631 | 0.0869 | 0.1725 | 0.1457 | 0.1072 | 0.1602 | 0.2200 | 0.1979 |
Table 5. Improvements vs. the maximum of the competing algorithms (vs. TOP on Ta-Feng, Dunnh and X5RH; vs. TIFU on T-Mall).

| Metric    | Ta-Feng | Dunnh | X5RH | T-Mall |
| F1-Score  | 15.8%   | 1%    | 0.5% | −17.1% |
| Hit-Ratio | 11.9%   | 0.8%  | 0.8% | −19.2% |
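Each entry in Table 5 can be reproduced from Table 4 as the relative change of SPAP over the strongest competing algorithm on the corresponding dataset. For example, the F1-Score improvement over TOP on Ta-Feng is (0.0922 − 0.0796) / 0.0796 ≈ 15.8%, and the negative T-Mall entries follow from TIFU scoring higher than SPAP there, e.g., (0.0783 − 0.0944) / 0.0944 ≈ −17.1%.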
Table 6. Running times.

|      | Ta-Feng  | Dunnh    | X5RH    | T-Mall  |
| TOP  | 0.0476 s | 0.444 s  | 0.192 s | 0.078 s |
| SPAP | 0.749 s  | 12.838 s | 2.824 s | 0.234 s |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
