Article

Deep Interest Context Network for Click-Through Rate

School of Mechanical and Information Engineering, Shandong University, Weihai 264209, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(19), 9531; https://doi.org/10.3390/app12199531
Submission received: 26 August 2022 / Revised: 18 September 2022 / Accepted: 20 September 2022 / Published: 22 September 2022

Abstract

In recent years, the Deep Interest Network (DIN), Deep Interest Evolution Network (DIEN) and Deep Session Interest Network (DSIN) have advanced click-through rate (CTR) prediction models. These three models focus mainly on the evolution of the user's historical behavior sequence and, to a certain extent, ignore the influence of environmental factors on the user's choice of the advertised item to be recommended. As a result, click-through rates cannot be predicted accurately when items have strong environmental attributes. To solve this problem, we propose a new model based on DIN, called the Deep Interest Context Network (DICN). DICN combines two local activation units and adaptively learns the user's interest representation from the user's historical behavior data with respect to both the advertisement and the context in which the advertisement is located (i.e., environmental factors). Experimental results show that DICN significantly improves prediction performance and the model's expressive ability for advertisements with strong environmental attributes.

1. Introduction

With the continuous development of network applications, the e-commerce industry continues to grow, network resources grow exponentially, and information overload becomes increasingly serious. The sheer volume of information makes it harder for customers to find what they need: they must browse a great deal of irrelevant content before locating the desired item, which significantly increases their time cost. How to efficiently obtain resources that meet one's needs has become a perplexing problem [1,2]. As a powerful tool for e-commerce to promote products and process network information resources, a recommendation system (RS) [3,4] can effectively filter information, help users retrieve resources that meet their needs in a personalized way, and alleviate information overload [5]. Through continuous development and updating, recommendation techniques such as collaborative filtering, content-based recommendation and knowledge-based recommendation [1] have been widely adopted in daily life. In this area, the Click-Through Rate (CTR) prediction model is popular. Initially, CTR prediction started with a Logistic Regression (LR) model and manually constructed cross-features [6], and then continued to evolve. A few years ago, the success of deep learning in computer vision and natural language processing [7] further stimulated the development of CTR prediction models. Song et al. [8] pointed out that the integration of deep learning and recommendation systems should be a focus of future research on recommendation systems. Since then, a series of deep learning-based CTR prediction methods, collectively called DeepCTR, have been proposed. For example, building on the Factorization Machine (FM), neural networks were introduced to realize the NFM and DeepFM models for deep feature extraction [9,10]. Combining a deep learning part with a linear part organically yields the Wide and Deep model, and adding a CIN on this basis gives the widely used xDeepFM model [11,12]. Most of these methods compress and embed high-dimensional user features into a fixed-length representation vector and then feed it into an MLP (Multi-Layer Perceptron) to learn the nonlinear relationships between features; the PNN model, for instance, adds a product layer to the base model to achieve feature crossover [13]. However, this fixed-length representation prevents the Embedding and MLP paradigm [14] from accurately and effectively capturing the nonlinear relationships among the historical behaviors in a user's behavior sequence, thereby limiting the expressive and predictive capabilities of the model.
In the past few years, Alibaba Group has successively proposed three models named Deep Interest Network (DIN) [14], Deep Interest Evolution Network (DIEN) [15] and Deep Session Interest Network (DSIN) [16]. With these models, the CTR framework is no longer limited to the Embedding and MLP paradigm: the local activation unit they apply to the user's historical behavior data greatly improves the models' expressive ability, helping them accurately and effectively capture the nonlinear relationships in the user's historical behavior sequence and promoting the further development of CTR technology. Nowadays, models built around local activation units are widely used. However, some items in real life are strongly tied to environmental factors, such as clothes, shoes and jewelry. In different seasons, people are more inclined to choose products corresponding to the season. For example, in summer, people are more inclined to click on summer clothing such as short sleeves, shorts and sandals while largely ignoring winter clothing such as down jackets and cotton coats; conversely, in winter, people pay more attention to winter clothing and click less on summer clothing advertisements. We refer to such items as items with strong environmental attributes. The three models proposed by Alibaba Group ignore, to a certain extent, the impact of objective environmental factors on user historical behavior and click-through rate. For advertisements of items with strong environmental attributes, the context features are simply treated as ordinary sparse features, concatenated with other features and then fed into the MLP layer for learning. This method neither considers the relationship between the environmental factors in the user's historical behavior data and the item advertisement, nor the environmental attributes of the item to be recommended, thereby limiting the learning and expressive ability of the model.
Based on such disadvantages, this paper makes the following contributions:
  • We point out the intrinsic connection between an item and its objective environment, and exploit the environmental attributes of the item itself. We make full use of contextual features and divide the timestamps in the user's historical behavior sequence by season and by weekday type. The variable attributes are selected dynamically through the dataset labels to construct the model's input features.
  • We propose a new algorithm model: the Deep Interest Context Network (DICN). On the basis of the original DIN model, we introduce an additional local activation unit that fully considers the relationship between the environmental characteristics of the items in the historical behavior data and the context of the target advertisement. The model is thus not limited to the historical behavior items themselves but also exploits the connection between the temporal environmental factors of the historical behavior sequence and the contextual features of the item advertisement. The correlation between contextual features and the user's historical behavior is learned adaptively, which captures the diversity of user interests more accurately and further improves the model's expressive ability.
  • We conducted experiments on open datasets and real datasets of Taobao users’ historical behavior. The experimental results verify the effectiveness of the DICN model algorithm.
The rest of the paper is organized as follows. Section 2 discusses related work, and Section 3 presents the structure of the DICN model. Section 4 introduces the related experimental setup and results analysis, and Section 5 draws conclusions.

2. Related Works

The local activation unit is mainly based on the idea of the attention mechanism, which originated from neuroscience research [17]. When people visually scan an image or other information, they identify the target area that needs to be focused on, that is, the focus of attention, and invest more resources in this area while reducing investment in non-focal areas. In deep learning, attention networks are introduced to distill originally huge and complicated data, so as to analyze the user's historical behavior characteristics more efficiently and accurately and to improve the expressive ability of the model. This mechanism has accordingly been adopted in DeepCTR model structures: a neural network automatically learns a set of weight coefficients and, through dynamic weighting, strengthens important information and suppresses unimportant information, so that the model can focus on the relevant part of the input at each step of the task [18,19,20]. The attention mechanism helps a DeepCTR model extract the most informative features and recommend the items that best meet the user's needs. Xiao et al. [21] proposed the Attentional Factorization Machine (AFM) based on the FM [22]; it improves the representation ability and interpretability of the FM by introducing an attention mechanism that assigns weights to the second-order cross-features before sum pooling. Luo et al. [23] used a GRU (Gated Recurrent Unit) [24] and an adaptive attention mechanism to capture users' diverse interests and improve sequence recommendation, proposing a new sequence recommendation model named 3AGRU. Zhang et al. [25] proposed an attentive hybrid recurrent neural network called AHRNN for sequence recommendation based on GRU and LSTM [26]. Recently, new applications of attention mechanisms have emerged. Zhou et al. adopted the idea of an attention network, added local activation units to the model and proposed the Deep Interest Network (DIN). The local activation unit computes and analyzes a user's different historical behaviors and assigns them different weights; it can thus capture each user's preferences and model the user accordingly. Subsequently, the Deep Interest Evolution Network (DIEN) proposed by Zhou et al. and the Deep Session Interest Network (DSIN) proposed by Feng et al. both use local activation units, adding recurrent neural networks and timestamp divisions on the basis of DIN to study the evolution of users' interests and their changes over different time periods. Attention networks are also applied in other recommender system areas such as sequential recommendation and review-based recommendation.
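To make the dynamic-weighting idea above concrete, the following minimal Python sketch scores a few historical behavior embeddings against a target embedding and normalizes the scores with softmax. All names, shapes and values are ours for illustration; note that DIN's own activation weights are produced by a small learned network rather than by softmax normalization.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
history = rng.normal(size=(4, 8))   # one embedding per past behavior (toy shapes)
target = rng.normal(size=8)         # candidate advertisement embedding

scores = history @ target           # one relevance score per behavior
weights = softmax(scores)           # dynamic weighting: weights sum to 1
interest = weights @ history        # weighted sum -> user interest vector
print(weights.round(3), interest.shape)
```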

3. The Structure of DICN

The recommendation algorithm model process based on the attention mechanism and environmental factors (DICN) is as follows:
  • Using the original labels in the dataset, convert the user id, item id, scene and behavior record information into the corresponding sparse vectors, dense vectors or variable-length sparse vectors, which are fed dynamically into the input layer.
  • The user id, item id and timestamps are converted into embedding vectors through the embedding layer and then concatenated directly. Meanwhile, the user's historical behavior record vectors, after passing through the embedding layer, enter the activation layer.
  • In the activation layer, the historical record embedding vectors pass through local activation units together with the embedding vector of the item to be recommended and the contextual feature embedding vector. Inside the activation unit, the historical record embedding vector is combined with the item embedding vector and the scene embedding vector via the outer product, concatenated with the original item and scene embedding vectors, and fed into the MLP (DNN) inside the local activation unit. After passing through the DNN [27], the item attention weight matrix and the scene attention weight matrix are obtained. Finally, the Hadamard product of the two weight matrices gives the final weight matrix coefficients.
  • The final weight matrix is multiplied with the historical record embedding vectors, and a sum pooling operation is performed. The result is then concatenated and flattened with the other concatenated vectors and fed into the MLP layer of the whole model to generate the recommendation result.
The overall structure block diagram of DICN is shown in Figure 1.

3.1. Input Layer

This layer mainly re-encodes the original input data according to a given encoding scheme, obtaining sparse or variable-length sparse vectors that are convenient for the embedding operation and lead to a better learning effect.
Usually, we perform one-hot encoding on the features other than the user’s historical behavior features, and the formula is as follows:
$$\begin{cases} t_i \in V^{K_i} \\ t_i[j] \in \{0, 1\} \\ \sum_{j=1}^{K_i} t_i[j] = 1 \end{cases}$$
where $t_i$ is the $i$-th feature group in the dataset $V$, $K_i$ is the dimension of feature group $i$ and $t_i[j]$ is the $j$-th element of feature group $i$, encoded as 0 or 1.
We perform multi-hot encoding on the user’s historical behavior features, and the formula is as follows:
$$\begin{cases} t_i \in V^{K_i} \\ t_i[j] \in \{0, 1\} \\ \sum_{j=1}^{K_i} t_i[j] > 1 \end{cases}$$
In a multi-hot-encoded feature group, several elements are 1 and the rest are 0.
In addition, we re-divide the time features in the user's historical behavior by converting the timestamps in the behavior sequence into month and weekday form. To prevent the model from overfitting, months are further grouped into seasons and weekdays into working days and public holidays, after which the encoding above is applied.
Through this layer, features with no encoding or with heterogeneous encodings are expressed in the same (or nearly the same) scheme, helping the model read the data and study the nonlinear relationships between the features.
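As an illustration of the timestamp division described above, the following sketch maps Unix timestamps to season and weekday-type ids and one-hot encodes them. The helper names are hypothetical; Northern-hemisphere meteorological seasons are assumed, and real public holidays beyond weekends would require a holiday calendar.

```python
from datetime import datetime, timezone

def season_of(ts: int) -> int:
    """Season id from a Unix timestamp: 0=spring, 1=summer, 2=autumn, 3=winter."""
    month = datetime.fromtimestamp(ts, tz=timezone.utc).month
    return 3 if month in (12, 1, 2) else (month - 3) // 3

def day_type_of(ts: int) -> int:
    """0 = working day (Mon-Fri), 1 = weekend (standing in for public holidays)."""
    return 1 if datetime.fromtimestamp(ts, tz=timezone.utc).weekday() >= 5 else 0

def one_hot(index: int, dim: int) -> list:
    """One-hot encode a single categorical value."""
    v = [0] * dim
    v[index] = 1
    return v

ts = 1_511_544_070                    # a timestamp from late November 2017
print(one_hot(season_of(ts), 4))      # -> [0, 0, 1, 0] (autumn)
print(one_hot(day_type_of(ts), 2))
```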

3.2. Embedding Layer

Since the input layer encodes the original features, the input vectors are high-dimensional and sparse, which is not conducive to deep learning. The embedding layer is therefore introduced: it maps the sparse vectors into a corresponding low-dimensional vector space, converting them into fixed-length embedding vectors that are convenient for the MLP to learn the nonlinear relationships between features. The formula is as follows:
$$U_i = [u_1^i, u_2^i, \ldots, u_j^i, \ldots, u_{K_i}^i] \in \mathbb{R}^{K_i \times D}$$
In the formula, $U_i$ is the embedding matrix of the $i$-th feature group, and $u_j^i$ is the embedding vector of the $j$-th element $t_i[j]$ of feature group $i$, with
$$u_j^i \in \mathbb{R}^D$$
that is, each embedding vector $u_j^i$ lies in the $D$-dimensional real space.
If feature group $i$ adopts one-hot encoding, then for the single index $j$ with $t_i[j] = 1$, the embedding of feature group $i$ is represented as a single embedding vector:
$$e_i = u_j^i$$
If feature group $i$ adopts multi-hot encoding, then for the indices $j$ with $t_i[j] = 1$, the embedding of feature group $i$ is expressed as a list of embedding vectors:
$$\begin{cases} \{e_{i_1}, e_{i_2}, \ldots, e_{i_m}\} = \{u_{i_1}^i, u_{i_2}^i, \ldots, u_{i_m}^i\} \\ j \in \{i_1, i_2, \ldots, i_m\} \end{cases}$$
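A minimal numpy sketch of the lookup these equations describe is given below; the random matrix stands in for a trained embedding layer (e.g., tf.keras.layers.Embedding), and all shapes are toy values of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 6, 4                    # feature-group dimension K_i and embedding size D (toy)
U = rng.normal(size=(K, D))    # embedding matrix U_i; row j plays the role of u_j^i

# One-hot feature group: exactly one active element -> a single vector e_i.
t_one = np.array([0, 0, 1, 0, 0, 0])
e_i = U[t_one.argmax()]                  # shape (D,)

# Multi-hot feature group (e.g., a behavior sequence): several active
# elements -> a list of embedding vectors {e_i1, ..., e_im}.
t_multi = np.array([1, 0, 1, 0, 0, 1])
e_list = U[np.flatnonzero(t_multi)]      # shape (3, D)
print(e_i.shape, e_list.shape)
```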

3.3. Activation Layer

This layer is the core of the entire model and is mainly composed of three parts: (1) the local activation unit, (2) the product unit and (3) the sum pooling unit.

3.3.1. Local Activation Unit

After the embedding layer, the user historical feature embedding vectors, the candidate item advertisement embedding vector and the contextual feature embedding vector are sent to the local activation unit to learn the attention weight matrix:
$$\begin{cases} h = \{e_{g_1}, \ldots, e_{g_{m_1}}, e_{p_1}, \ldots, e_{p_{m_2}}, e_{c_1}, \ldots, e_{c_{m_3}}, e_{s_{m_4}}, \ldots, e_{w_{m_5}}\} \\ c = \{e_g, e_p, e_c\} \\ t = \{e_s, e_w\} \end{cases}$$
where $h$ is the set of user historical feature embedding vectors, $c$ is the candidate item advertisement embedding vector and $t$ is the contextual feature embedding vector.
Taking the user historical features and the candidate item advertisement as an example: after the historical behavior embedding vector $h$ and the candidate item advertisement embedding vector $c$ enter the local activation unit, their Hadamard product and element-wise difference are computed to measure the similarity of the two vectors:
$$h * c = \begin{bmatrix} h_{11}c_{11} & \cdots & h_{1j}c_{1j} \\ \vdots & \ddots & \vdots \\ h_{i1}c_{i1} & \cdots & h_{ij}c_{ij} \end{bmatrix}$$
$$h - c = \{h_1 - c_1, \ldots, h_i - c_i\}, \qquad i = 1, 2, \ldots, n$$
After the Hadamard product and subtraction results are obtained, they are concatenated with the original vectors and sent to the DNN inside the activation unit, which uses the Dice activation function together with a linear part, to obtain the weight matrix $\omega_c$:
$$w(A) = \mathrm{DNN}(h, c, h * c, h - c) = \omega_c$$
Similarly, the weight matrix $\omega_t$ between the user's historical behavior features and the contextual features can be obtained. The local activation unit framework is shown in Figure 2.
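The following hedged sketch illustrates one local activation unit: per-behavior interaction features $[h, c, h * c, h - c]$ are built and passed through a small stand-in DNN using the 80/40 hidden sizes reported in Section 4.4. The weights are random and untrained, and ReLU substitutes for Dice for brevity; shapes and names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D = 4, 8                          # number of behaviors and embedding size (toy)
h = rng.normal(size=(T, D))          # historical behavior embeddings
c = rng.normal(size=D)               # candidate-ad embedding

# Per-behavior interaction features [h_t, c, h_t * c, h_t - c] -> width 4*D.
C = np.broadcast_to(c, (T, D))
feats = np.concatenate([h, C, h * C, h - C], axis=1)

# Stand-in for the unit's DNN; random, untrained weights, ReLU in place of Dice.
relu = lambda x: np.maximum(x, 0.0)
W1, W2, W3 = (rng.normal(size=s) * 0.1 for s in ((4 * D, 80), (80, 40), (40, 1)))
omega_c = (relu(relu(feats @ W1) @ W2) @ W3).ravel()  # one weight per behavior
print(omega_c.shape)                 # omega_t is obtained analogously from t
```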

3.3.2. Product Unit

This unit distributes weights over the original vectors in two steps, a Hadamard product part and an outer product part. The weight matrices $\omega_c$ and $\omega_t$ obtained from the two local activation units are combined via the Hadamard product to obtain the final weight matrix $\Omega_A$:
$$\Omega_A = \omega_c \circ \omega_t = \begin{bmatrix} a_{11}b_{11} & \cdots & a_{1j}b_{1j} \\ \vdots & \ddots & \vdots \\ a_{i1}b_{i1} & \cdots & a_{ij}b_{ij} \end{bmatrix}$$
where $a_{ij}$ and $b_{ij}$ denote the elements of $\omega_c$ and $\omega_t$, respectively. The weight matrix $\Omega_A$ is then combined with the user historical behavior embedding vector $h$ via the outer product to obtain the weighted user behavior sequence vector $H_A$:
$$H_A = h \times \Omega_A$$
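A short sketch of the product unit under the same toy shapes: the two per-behavior weight vectors (random stand-ins here for the outputs of the two activation units) are combined by the Hadamard product and used to scale the behavior embeddings.

```python
import numpy as np

rng = np.random.default_rng(3)
T, D = 4, 8
h = rng.normal(size=(T, D))      # historical behavior embeddings
omega_c = rng.random(T)          # stand-in weights from the item activation unit
omega_t = rng.random(T)          # stand-in weights from the context activation unit

Omega_A = omega_c * omega_t      # Hadamard product -> final weight coefficients
H_A = h * Omega_A[:, None]       # each behavior embedding scaled by its weight
print(H_A.shape)                 # (T, D): the weighted behavior sequence
```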

3.3.3. Sum Pooling Unit

All weighted user historical behavior feature vectors are added together in a sum pooling operation. This retains the strength of user interest while solving the problem that variable-length user interests could not be learned efficiently by the previous Embedding and MLP paradigm. Additionally, the newly added second local activation unit makes fuller use of the environmental factors in the user's historical behavior features, assigning higher weights to historical records that match both the candidate advertisement and the contextual features, helping the model extract a more accurate focus of attention.
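Sum pooling itself is a one-line reduction over the weighted sequence, as the following sketch shows; the fixed output length is what frees the model from the variable sequence length.

```python
import numpy as np

H_A = np.random.default_rng(4).normal(size=(4, 8))  # weighted behavior embeddings
user_interest = H_A.sum(axis=0)                     # sum pooling over the sequence
print(user_interest.shape)   # (8,): fixed length regardless of sequence length
```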

3.4. MLP Layer

This layer uses a fully connected neural network (DNN) as the core of the fully connected stage. Before the vectors enter the DNN, the embedded and weighted-embedded vectors are concatenated and flattened so that the multi-dimensional input becomes one-dimensional, which is convenient for the fully connected layer to learn deeply. The concatenated and flattened vector is sent to the DNN, and the ReLU function is used as the activation function to learn the nonlinear relationships between the vectors. At first, we tried Dice [14] and PReLU [28] as replacements for ReLU, but the results were not satisfactory: the Dice activation function increased the model's loss. We therefore chose ReLU as the activation function of the DNN.
$$\lambda_n^m = \mathrm{ReLU}(\alpha_n^m) = \mathrm{ReLU}\left( \sum_{j=1}^{i} w_{nj}^m x_j^{m-1} + b_n^m \right)$$
In this formula, $\lambda_n^m$ is the output of the $n$-th neuron in the $m$-th layer, and $\alpha_n^m$ is its input. $i$ is the total number of neurons in layer $m-1$, so the input of the $n$-th neuron in the $m$-th layer is a weighted sum over the outputs of all neurons in the previous layer.
Finally, the Softmax function is used for normalization, and the result $\xi$ is expressed as a probability:
$$\xi = \mathrm{Softmax}(\lambda)$$
After the above operations, the nonlinear relationship between the features can be fully extracted, and deep learning can be performed to provide click-through rate prediction results.
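A hedged Keras sketch of this final stage is shown below, using the 256/128/64 hidden sizes from Section 4.4 and a two-unit softmax output. The input width of 64 is an assumption standing in for the width of the concatenated, flattened vectors.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64,))   # assumed width of the flattened input
x = tf.keras.layers.Dense(256, activation="relu")(inputs)
x = tf.keras.layers.Dense(128, activation="relu")(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)  # xi = Softmax(lambda)
mlp = tf.keras.Model(inputs, outputs)
mlp.compile(optimizer="adam", loss="categorical_crossentropy")
mlp.summary()
```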

4. Experiments and Analysis

In order to verify the performance of the proposed model, we compared it with other mainstream models using the TensorFlow framework in an environment consisting of an NVIDIA GeForce MX330 GPU, NVIDIA CUDA 10.1, the Windows 11 operating system, the PyCharm 2022 development platform and Python 3.7.

4.1. Dataset

In order to efficiently and accurately evaluate this model, a dataset containing timestamps and items with strong environmental attributes is required. We therefore selected the Alibaba Taobao user behavior dataset and the Clothing_Shoes_and_Jewelry subset of the Amazon dataset (Amazon Dataset 2014) as experimental datasets. The Taobao user behavior dataset includes user id, item id, item category, historical behavior type and behavior timestamps [29,30,31]. The Amazon dataset contains user id, item id, user ratings and timestamps [32,33,34]. The details of the datasets are shown in Table 1.
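For reference, the Taobao UserBehavior file is distributed as a headerless CSV whose five columns match the features listed above; a minimal loading sketch follows, with the local file path being an assumption.

```python
import pandas as pd

# Headerless CSV; column order follows the dataset's documentation.
cols = ["user_id", "item_id", "category_id", "behavior_type", "timestamp"]
taobao = pd.read_csv("UserBehavior.csv", names=cols)   # assumed local path
print(taobao["behavior_type"].value_counts())          # e.g. pv / buy / cart / fav
```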

4.2. Evaluation Indicators

To evaluate the performance of our model, we randomly divided the users' historical behavior in each dataset into a training set and a test set. In this paper, AUC (area under the ROC curve), the Logloss loss function and the RelaImpr metric [35] are used as evaluation indicators. The loss function follows that of the DIN model:
$$L = -\frac{1}{N} \sum_{(i, b) \in T} \left( b \log p(i) + (1 - b) \log\left(1 - p(i)\right) \right)$$
In the formula, $T$ is the training set of size $N$, $i$ is the input of the network and $b \in \{0, 1\}$ is the label. $p(i)$ is the output of the network after Softmax normalization, representing the probability that sample $i$ is clicked.
In addition, we slightly modified the RelaImpr indicator by replacing the AUC of the original base model (the basic Embedding and MLP model) with the AUC of DIN, so as to compare our model with the DIN model more clearly and intuitively:
$$\mathrm{RelaImpr} = \left( \frac{\mathrm{AUC}(\text{measured model}) - 0.5}{\mathrm{AUC}(\text{DIN model}) - 0.5} - 1 \right) \times 100\%$$
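The three indicators can be computed as in the following sketch; the helper name rela_impr is ours, and the toy labels and probabilities are there only to exercise the metrics.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

def rela_impr(auc_model: float, auc_din: float) -> float:
    """RelaImpr relative to DIN, in percent, per the formula above."""
    return ((auc_model - 0.5) / (auc_din - 0.5) - 1.0) * 100.0

# Toy labels and click probabilities just to exercise the metrics.
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.4])
print(roc_auc_score(y, p), log_loss(y, p))
print(rela_impr(0.7832, 0.7710))   # Table 2 values -> roughly 4.5
```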

4.3. Comparing Models

In order to verify the performance of the model in this paper, the following classic and widely used models are used for comparison.
  • Base Model [14]: The base model proposed by Zhou et al. in the DIN paper, the Embedding and MLP methods, is the basic framework of deep networks for most CTR models.
  • PNN [13]: An improved version of the base model. The PNN model introduces a product layer after the embedding layer, through which the interaction of higher-order features is captured.
  • DeepFM [10]: The prototype of the DeepFM model is the Wide and Deep [11] model. It is mainly composed of two parts: (1) the wide part, an FM that handles the cross-product features, and (2) the deep part, which performs deep learning and automatically extracts the nonlinear relationships between features, i.e., the fully connected part mentioned above.
  • xDeepFM [12]: Following the same idea as the DCN [36] model, xDeepFM is an improved version of the Wide and Deep model. The difference is that DCN replaces the logistic regression part of the Wide and Deep model with a cross layer, which effectively alleviates the vanishing gradient problem, whereas xDeepFM adds a Compressed Interaction Network (CIN) module to the Wide and Deep model, compressing the three-dimensional matrix obtained from feature interaction through the CIN to perform feature interaction.
  • DIN [14]: Based on the base model, a local activation unit is added to set the weight of the user’s historical behavior, so as to explore the similarity between the user’s historical behavior and the items to be recommended.
  • DICN: This is the new model proposed in this paper, as introduced in Section 3. Two local activation units explore the similarity of user historical behavior, items to be recommended and contextual features.

4.4. Experimental Setup

In order to prove the effectiveness of the proposed algorithm, all models were given the same parameter settings on the same dataset for comparison. Experimental verification showed that the parameters selected in this paper bring each model to its optimal state. For the Taobao dataset, we set the batch size to 256; the number of epochs to 10; the train/validation/test split ratio to 14:6:5; the MLP hidden layer sizes to 256, 128 and 64; and the hidden layer sizes of the fully connected layers in the activation unit to 80 and 40. For the Amazon dataset, the settings are basically the same, except that the number of epochs is varied among 1, 10 and 15: because the Amazon dataset has a large number of samples, the epoch count is increased to observe whether the models overfit. We use Adam [37] as the optimizer to train all models.
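The reported settings can be summarized as a configuration sketch; the dictionary layout is ours and is not taken from the paper's code.

```python
# Hedged restatement of the hyperparameters reported above.
CONFIG = {
    "batch_size": 256,
    "epochs": {"taobao": 10, "amazon": (1, 10, 15)},  # Amazon varied to check overfitting
    "split_ratio": (14, 6, 5),            # train : validation : test
    "mlp_hidden": (256, 128, 64),         # MLP hidden layer sizes
    "activation_unit_hidden": (80, 40),   # DNN inside each local activation unit
    "optimizer": "adam",                  # Adam [37]
}
print(CONFIG)
```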

4.5. Results and Analysis

This section presents the experimental results in the form of tables and figures and verifies the effectiveness of the proposed model. All experiments were carried out with the same datasets and experimental environment.

4.5.1. AUC and RelaImpr

This subsection compares the AUC and RelaImpr indicators of the six models, including the DICN model, and demonstrates the superiority of the DICN model through the indicator data values. The results are shown in Table 2.
From Table 2, we can see that the comprehensive performance of the proposed DICN model is the best in the CTR prediction scenario. On the Taobao user historical behavior dataset, the AUC of DICN is 1.22 percentage points higher than that of the original DIN model; on the Amazon dataset, it is 0.12 percentage points higher. Moreover, relative to the DIN model, the RelaImpr of DICN is 4.5% and 0.79% on the two datasets, which demonstrates the superiority of DICN over DIN. The effective performance of DICN is attributed to the newly added local activation unit: on the basis of DIN, it again focuses on locally relevant user interests through a soft search over the user's historical behaviors related to contextual features, and it enhances the adaptive variation of the user representation, improving the model's expressive ability. In addition, for a clearer visual representation of the experimental results, we plot the data of Table 2 as bar charts in Figure 3. From Figure 3a,b, we can clearly see that the AUC of DICN is higher than that of DIN and the other classical models.

4.5.2. Test Logloss

In this subsection, we explore two ablated variants of the DICN model: (1) one that does not consider the seasonal features in the user's historical behavior features, and (2) one that does not consider the weekday features. Four models, DIN, DICN (complete model), DICN (without seasonal features) and DICN (without weekday features), are trained together and compared. After training, the loss on the test set is measured for each of the four models; the loss values show that the expressive ability is strongest for the complete DICN model. The test loss of each model is shown in Figure 4.
It can be seen from Figure 4 that, on both datasets, the loss of the two ablated DICN variants increases after the second epoch: their model ability degrades rapidly and they overfit severely. In contrast, DIN and the complete DICN remain stable as the number of epochs increases, without overfitting, which shows that DICN has high stability. Furthermore, the loss of the complete DICN is much lower than that of the two ablated variants and lower than that of the DIN model: on the two datasets, the minimum loss of DICN is 0.0093 and 0.0059 lower than that of DIN, respectively, improving the expressive ability of the model.
From the above table and figures, it can be concluded that the proposed DICN model shows effective recommendation performance in recommendation scenarios involving items with strong environmental attributes.

5. Conclusions

In the era of big data, personalized recommendation has become mainstream and constitutes an important area of industry development [38]. In this paper, set in the context of e-commerce display advertising, the proposed DICN recommendation model makes full use of the contextual features in the user's historical behavior and, through two local activation units, relates the time and season of previously purchased items to the item and context to be predicted. These correlated features are used to assign weights, yielding a more accurate focus of attention, more accurate and reasonable predictions, and fully personalized user recommendations. Comparing against the classical mainstream algorithms on two datasets, we conclude that the recommendation accuracy of DICN is better than that of DIN and the traditional DeepCTR models, and that the loss of DICN is lower than that of DIN and the traditional models. In practice, this means that for items with strong environmental attributes, the recommendation effect is even better.
However, the system can still be improved in the future, for example by using contextual features other than season and weekday. We hope to make further progress so that the system can become even more relevant to users.

Author Contributions

Conceptualization, M.Y.; methodology, M.Y.; software, M.Y.; validation, M.Y. and P.C.; formal analysis, M.Y.; data curation, M.Y.; writing—original draft preparation, M.Y.; writing—review and editing, M.Y., T.L., J.Y. and P.C.; visualization, M.Y. and T.L.; project administration, J.Y.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant 61971268.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the National Natural Science Foundation of China for funding our work, grant number 61971268.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yu, M.; He, W.T.; Zhou, X.C.; Cui, M.T.; Wu, K.Q.; Zhou, W.J. Review of recommendation system. J. Netw. Comput. Appl. 2022, 42, 1898–1899. [Google Scholar]
  2. Zhao, D.W. A study of the current state of e-commerce recommendation systems. Bus. Cult. 2011, 11, 206. [Google Scholar]
  3. Zhou, H.H.; Liu, Y.J.; Zhang, W.Q.; Xie, J.Y. A survey of recommender system applied in E-commerce. Appl. Res. Comput. 2004, 21, 8–12. [Google Scholar]
  4. Li, J.J. Researcher on E-commerce intelligent recommendation method based on collaborative filtering. Microcomput. Appl. 2022, 38, 70–72. [Google Scholar]
  5. Liu, J.L.; Li, X.G. Techniques for recommendation system: A survey. Comput. Sci. 2020, 47, 47–55. [Google Scholar] [CrossRef]
  6. Mcmahan, H.B.; Holt, G.; Sculley, D.; Young, M.; Kubica, J. Ad click prediction: A view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 1222–1230. [Google Scholar]
  7. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  8. Song, Y.; Elkahky, A.M.; He, X.D. Multi-rate deep learning for temporal recommendation. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 17–21 July 2016; pp. 909–912. [Google Scholar]
  9. He, X.N.; Chua, T.S. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 355–364. [Google Scholar]
  10. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. Deepfm: A factorization-machine based neural network for ctr prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 1725–1731. [Google Scholar]
  11. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10. [Google Scholar]
  12. Lian, J.X.; Zhou, X.H.; Zhang, F.Z.; Chen, Z.X.; Xie, X.; Sun, G.Z. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference, London, UK, 19–23 August 2018; pp. 1754–1764. [Google Scholar]
  13. Qu, Y.; Han, C.; Kan, R.; Zhang, W.; Wang, J. Product-based neural networks for user response prediction. In Proceedings of the Data Mining (ICDM), 2016 IEEE 16th International Conference on, Barcelona, Spain, 12–15 December 2016; pp. 1149–1154. [Google Scholar]
  14. Zhou, G.R.; Song, C.R.; Zhu, X.Q.; Fan, Y.; Zhu, H.; Ma, X.; Yan, Y.H.; Jin, J.Q.; Li, H.; Gai, K. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1059–1068. [Google Scholar]
  15. Zhou, G.R.; Mou, N.; Fan, Y.; Pi, Q.; Bian, W.J.; Zhou, C.; Zhu, X.Q.; Gai, K. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5941–5948. [Google Scholar]
  16. Feng, Y.F.; Lv, F.Y.; Shen, W.C.; Wang, M.H.; Sun, F.; Zhu, Y.; Yang, K.P. Deep session interest network for click-through rate prediction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 2301–2307. [Google Scholar]
  17. Yang, L.; Wang, S.H.; Zhu, B. Point-of-interest recommendation algorithm combing dynamic and static preferences. J. Comput. Appl. 2021, 41, 398–406. [Google Scholar]
  18. Gao, G.S. Survey on attention mechanisms in deep learning recommendation models. Comput. Eng. Appl. 2022, 58, 9–18. [Google Scholar]
  19. Jhamb, Y.; Ebesu, T.; Fang, Y. Attentive contextual denoising autoencoder for recommendation. In Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval, Tianjin, China, 14–17 September 2018; pp. 27–34. [Google Scholar]
  20. Tay, Y.; Luu, A.T.; Hui, S.C. Multi-pointer co-attention networks for recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2309–2318. [Google Scholar]
  21. Xiao, J.; Ye, H.; He, X.N.; Zhang, H.W.; Wu, F.; Chua, T.S. Attentional factorization machines learning the weight of feature interactions via attention networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Menlo Park, CA, USA, 19–25 August 2017; pp. 3119–3125. [Google Scholar]
  22. Rendle, S. Factorization machines. In Proceedings of the 10th International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; pp. 995–1000. [Google Scholar]
  23. Luo, A.J.; Zhao, P.P.; Liu, Y.C.; Xu, J.J.; Li, Z.X.; Zhao, L.; Sheng, V.; Cui, Z.M. Adaptive attention-aware gated recurrent unit for sequential recommendation. Springer Cham 2019, 11447, 317–332. [Google Scholar]
  24. Cho, K.; Merrienboer, B.V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1724–1734. [Google Scholar]
  25. Zhang, L.X.; Wang, P.S.; Li, J.C.; Xiao, Z.W.; Shi, H.B. Attentive hybrid recurrent neural networks for sequential recommendation. Neural Comput. Appl. 2021, 33, 11091–11105. [Google Scholar] [CrossRef]
  26. Lai, Y.J. Research on deep learning recommendation model based on attention mechanism. Wirel. Internet Technol. 2022, 19, 153–154. [Google Scholar]
  27. Sze, V.; Chen, Y.H.; Yang, T.J.; Emer, J.S. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE Inst. Electr. Electron Eng. 2017, 105, 2295–2329. [Google Scholar] [CrossRef] [Green Version]
  28. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
  29. Zhu, H.; Li, X.; Zhang, P.Y.; Li, G.Z.; He, J.; Li, H.; Gai, K. Learning tree-based deep model for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1079–1089. [Google Scholar]
  30. Zhu, H.; Chang, D.Q.; Xu, Z.R.; Zhang, P.Y.; Li, X.; He, J.; Li, H.; Xu, J.; Gai, K. Joint optimization of tree-based index and deep model for recommender systems. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 3971–3980. [Google Scholar]
  31. Zhuo, J.W.; Xu, Z.R.; Dai, W.; Zhu, H.; Li, H.; Xu, J.; Gai, K. Learning optimal tree models under beam search. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 26–28 August 2020. [Google Scholar]
  32. Mcauley, J.; Targett, C.; Shi, Q.F.; Hengel, A.V.D. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52. [Google Scholar]
  33. He, R.; McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, Montréal Québec, QC, Canada, 11–15 April 2016; pp. 507–517. [Google Scholar]
  34. Veit, A.; Kovacs, B.; Bell, S.; McAuley, J.; Bala, K.; Belongie, S. Learning visual clothing style with heterogeneous dyadic co-occurrences. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 4642–4650. [Google Scholar]
  35. Yan, L.; Li, W.J.; Xue, G.R.; Han, D.Y. Coupled group lasso for web-scale ctr prediction in display advertising. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; pp. 802–810. [Google Scholar]
  36. Wang, R.X.; Fu, B.; Fu, G.; Wang, M.L. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17, Halifax, NS, Canada, 13–17 August 2017; pp. 1–7. [Google Scholar]
  37. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  38. Meng, X.W.; Ji, W.Y.; Zhang, Y.J. A survey of recommendation systems in big data. J. Beijing Univ. Posts Telecommun. 2015, 38, 1–15. [Google Scholar]
Figure 1. Framework of DICN model. The black line in the figure represents the embedding vector fed into the Concat and Flatten functions and the outer product; the blue line represents the embedding vector fed into the local activation unit and the Hadamard product.
Figure 2. Local activation unit.
Figure 3. Comparison of AUC of each model under two datasets. (a) AUC of the Taobao dataset in the test set of the six models. (b) AUC of the Amazon dataset in the test set of the six models.
Figure 4. Comparison of model loss under the two datasets. (a) Logloss of the Taobao dataset in the test set of the four models. (b) Logloss of the Amazon dataset in the test set of the four models.
Table 1. Basic statistics for datasets.
Dataset   Feature          Number    Total Samples
Taobao    Users            376       11,198
          Items            9066
          Categories       1248
          Behavior Types   4
          Timestamps       11,198
Amazon    Users            88,462    91,206
          Items            8510
          Scores           5
          Timestamps       91,206
Table 2. AUC and RelaImpr predicted by CTR.
Model        Taobao AUC   Taobao RelaImpr   Amazon AUC   Amazon RelaImpr
Base Model   0.5326       −87.97%           0.6133       −25.61%
PNN          0.5433       −84.02%           0.5166       −89.10%
DeepFM       0.5465       −82.84%           0.5232       −84.77%
xDeepFM      0.5542       −80.00%           0.5252       −83.44%
DIN          0.7710       0.00%             0.6523       0.00%
DICN         0.7832       4.50%             0.6535       0.79%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
