Article

Se-xDeepFEFM: Combining Low-Order Feature Refinement and Interaction Intensity Evaluation for Click-Through Rate Prediction

1 School of Information Engineering, East China Jiaotong University, Nanchang 330000, China
2 School of Software, East China Jiaotong University, Nanchang 330000, China
3 Cyber Science and Engineering School, Wuhan University, Wuhan 430071, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(10), 2123; https://doi.org/10.3390/sym14102123
Submission received: 22 August 2022 / Revised: 5 October 2022 / Accepted: 9 October 2022 / Published: 12 October 2022
(This article belongs to the Section Computer)

Abstract

Click-through rate (CTR) prediction can provide considerable economic and social benefits. Few studies have considered the importance of low-order features, usually employing a simple feature interaction method. To address these issues, we propose a novel model called Senet and extreme deep field-embedded factorization machine (Se-xDFEFM) for more effective CTR prediction. We first embed the squeeze-excitation network (Senet) module into Se-xDFEFM to complete low-order feature refinement, which can better filter noisy information. Then, we implement our field-embedded factorization machine (FEFM) to learn the symmetric matrix embeddings for each field pair, along with the single-vector embeddings for each feature, which builds a firm foundation for the subsequent feature interaction. Finally, we design a compressed interaction network (CIN) to realize feature construction with definite order through a vector-wise interaction. We use a deep neural network (DNN) with the CIN to simultaneously implement effective but complementary explicit and implicit feature interactions. Experimental results demonstrate that the Se-xDFEFM model outperforms other state-of-the-art baselines. Our model is effective and robust for CTR prediction. Importantly, our model variants also achieve competitive recommendation performance, demonstrating their scalability.

1. Introduction

Click-through rate (CTR) prediction is a core task in recommendation systems [1,2] that attempts to make accurate recommendations by mining user preferences. It has very high practicality. Specifically, CTR prediction is usually used to estimate the probability of clicks on advertisements or products. It plays an important role in several applications, including e-commerce, social networking, and video and movie websites. As we know, online advertising provides considerable economic benefits. With the gradual conversion from traditional digital advertising to mobile digital advertising, digital advertising has played a vital role in promoting economic development. Accurate recommendation or ranking of digital advertising can improve the user experience to a certain degree. More importantly, it can provide considerable traffic and revenue benefits to online companies, especially short video applications. There is a certain symmetry between the browsing records and click records of different users on short video apps, e-commerce websites, and social networking sites. Websites recommend the same ads to users with similar behaviors according to these records, providing satisfactory economic benefits. Hence, CTR prediction has a wide application scope and has become a research hotspot in both academia and industry in recent years.
Logistic regression (LR) [3] achieved satisfactory results in early research on CTR prediction by virtue of its simplicity and strong interpretability. However, the LR model requires manual construction of feature interactions. He et al. proposed a prediction method based on the gradient boosting decision tree (GBDT) [4], which avoids the need to manually identify features. However, this method is not suitable for very large and sparse datasets; in addition, its training time is long, and its accuracy is difficult to guarantee. To resolve the feature interaction problem, the well-known factorization machine (FM) [5] model was proposed. The FM model sets an implicit vector for each feature to handle the feature interaction problem. Based on the FM model, the field factorization machine (FFM) [6] was proposed, which introduces interactions between different feature fields to learn diverse hidden information for CTR prediction. The above-mentioned traditional models only perform low-order rather than high-order feature interactions, which may limit the prediction ability of CTR models. Recently, deep learning (DL) technology has made breakthroughs in natural language processing [7], computer vision [8], and other related fields. Both industrial and academic researchers have proposed several DL-based CTR models and achieved remarkable progress. He et al. proposed the neural factorization machine (NFM) [9], which combines second-order features with deep neural network (DNN) features to learn implicit higher-order features. Qu et al. proposed the product-based neural network (PNN) [10], which uses a product layer to learn feature interactions. The product layer adopts the inner product, the outer product, or their combination, fully learning high-order, non-linear feature interactions. Yang et al. proposed operation-aware neural networks (ONNs) [11], which generate multiple embedding vectors for each feature so that different feature representations can be chosen for different operations. Google proposed the well-known wide & deep learning (WDL) model [12], which combines a wide component using LR with a deep component using a feed-forward network. The wide part endows the prediction with a certain memorization ability, whereas the deep part endows it with a certain generalization ability. Similarly, Huawei proposed DeepFM [13], which replaces the wide part of the WDL model with FM to obtain better feature interactions.
The above-mentioned models usually use the DNN to learn combined low-order and high-order features without considering the importance of each feature. Hence, Xiao et al. proposed the attentional factorization machine (AFM) [14], which uses the well-known attention mechanism [15] to learn the importance of each feature combination, helping to improve the final CTR prediction performance. Huang et al. proposed the feature importance and bilinear feature interaction network (FiBiNET) [16], which uses the well-known squeeze-excitation network (Senet) [17] to learn the significance of different features. Its second-order feature interactions are learned in a bilinear interaction layer through a combination of inner and Hadamard products, and implicit higher-order features are then obtained using a DNN. AFM and FiBiNET consider the significance of various features. However, AFM only extends the FM model and does not input the second-order cross features into a DNN to learn more valuable high-order feature interactions [18]. Although FiBiNET achieves high-order feature interactions, its feature interactions lack a certain interpretability owing to the inherent characteristics of the DNN.
Hence, some researchers made more substantial modifications to the CTR model. Wang et al. proposed the deep & cross network (DCN) [19], which replaces the wide part of WDL with a cross network. The cross network makes full use of the residual mechanism [20] to automatically construct finite high-order feature interactions. Lian et al. proposed the extreme deep factorization machine (xDeepFM) [21], which uses feature-level (vector-wise) interactions to construct a compressed interaction network (CIN) for explicit high-order feature learning. Liu et al. proposed feature generation by convolutional neural network (FGCNN) [22], which can generate local patterns and recombine them into new features. Recently, Song et al. designed an automatic feature interaction method based on self-attentive neural networks (AutoInt) [23]. AutoInt uses the multi-head self-attention mechanism to explicitly construct high-order features, improving the interpretability of the CTR model and reducing the number of parameters. Pan et al. proposed the field-weighted factorization machine (FWFM) [24], which introduces field-pair weights to more efficiently model features from different fields; however, its number of parameters is not significantly reduced. Pande proposed the field-embedded factorization machine (FEFM) [25], which learns a symmetric matrix embedding for each field pair, along with single-vector embeddings for each feature. Compared with the traditional FFM model, the FEFM model considerably reduces the number of parameters, which also promotes its practicality. Yu et al. proposed the feature structure-oriented model xCrossNet [26], in which dense and sparse features are processed by cross and product layers, respectively, and then spliced and fed into a DNN to learn higher-order features. (The abbreviations and full names of all models can be found in Appendix A, Table A1; their advantages and disadvantages are listed in Appendix A, Table A2.)
The above-mentioned CTR models have achieved satisfactory recommendation results. However, few of them consider the importance of low-order features, which may contain noisy information. The combination of explicit high-order feature construction and implicit feature learning has not been fully explored, especially with respect to the complementary information between them. Moreover, the corresponding feature interactions are relatively simple and do not make full use of the information in the feature fields. To alleviate these issues, we propose a novel CTR model, namely the Senet and extreme deep field-embedded factorization machine (Se-xDFEFM), which combines low-order feature refinement and interaction intensity evaluation to implement CTR prediction. Hence, the two key technologies used in our study, namely low-order feature refinement and interaction intensity evaluation, are symmetrical.
In Se-xDFEFM, we take low-order feature refinement, explicit high-order feature construction, and implicit high-order feature learning into account. We achieve feature refinement of the low-order features using the Senet module and implement a field-embedded factorization machine (FEFM) to improve the quality of feature interactions. Finally, we achieve predictions with a certain generalization ability using a DNN and predictions with a certain memorization ability using a CIN. Conceptually and empirically, the main contributions of this paper can be summarized as follows:
(1)
We propose a novel CTR model called Se-xDFEFM, which can simultaneously explicitly construct high-order features and implicitly learn high-order feature combinations.
(2)
We propose an improved feature interaction method called Se-FEFM, which first performs effective feature refinement before feature interactions and uses the field pair symmetric matrix to more accurately evaluate the interaction intensity between different feature fields. All these help to improve the final recommendation performance.
(3)
We propose several model variants of Se-xDFEFM, which also achieve competitive recommendation performance, firmly demonstrating the scalability of our model.
(4)
We reproduce a group of well-known CTR baselines. Extensive experiments on two public datasets demonstrate that the proposed Se-xDFEFM model outperforms these mainstream baselines. The code for our method and all the reproduced models are available at https://github.com/vancci-xgx/se-xdfefm (accessed on 11 October 2022).

2. Model Basis

2.1. FM

As described above, FM is an effective method for constructing second-order feature combinations. It uses the inner products of the implicit vectors of different features to calculate the coefficients of the interaction terms between features. FM regards its feature interactions as a high-dimensional sparse matrix decomposition problem. Therefore, FM can extract many new cross features and hidden vectors to improve the final CTR performance. The mathematical formula of FM is shown in Equation (1), which is composed of three components:
$$ y(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \qquad (1) $$
where $n$ is the total number of features; $w_0 \in \mathbb{R}$ is the global bias; $w_i \in \mathbb{R}$ is the parameter of the first-order feature; $\langle \cdot, \cdot \rangle$ denotes the dot product of two vectors; $v_i$ and $v_j$ are the embedding vectors of the $i$-th and $j$-th features, respectively; and $x_i$ and $x_j$ are the values of the $i$-th and $j$-th features, respectively. According to these definitions, the first component, $w_0$, is the global bias. The second component, $\sum_{i=1}^{n} w_i x_i$, is the sum of the first-order feature terms. The third component, $\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$, represents the second-order interaction between each pair of features, where $\langle v_i, v_j \rangle$ is the dot product of the two feature vectors $v_i$ and $v_j$ and serves as the weight of the feature combination $x_i x_j$, the product of $x_i$ and $x_j$. "+" indicates the addition of the three components. Evidently, the traditional FM model only constructs second-order feature interactions without considering higher-order interactions.
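As a concrete illustration, the following minimal NumPy sketch evaluates Equation (1); the function and variable names (fm_score, w0, w, V, x) are illustrative rather than taken from any released implementation, and the pairwise term is computed with the standard O(nD) reformulation of the double sum.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """FM score of Equation (1).
    x: (n,) feature values, w0: scalar bias, w: (n,) first-order weights,
    V: (n, D) matrix whose i-th row is the embedding vector v_i."""
    linear = w0 + np.dot(w, x)            # bias + first-order term
    xv = V * x[:, None]                   # rows x_i * v_i
    # sum_{i<j} <v_i, v_j> x_i x_j via the identity
    # 0.5 * sum_d [(sum_i x_i v_{i,d})^2 - sum_i (x_i v_{i,d})^2]
    pair = 0.5 * np.sum(np.sum(xv, axis=0) ** 2 - np.sum(xv ** 2, axis=0))
    return linear + pair

# Tiny example: 5 features with embedding dimension 4.
rng = np.random.default_rng(0)
print(fm_score(rng.random(5), 0.1, rng.random(5), rng.random((5, 4))))
```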

2.2. DeepFM

As introduced above, DeepFM is derived from FM. It combines DNN with FM, in which FM focuses on learning the low-order features, whereas DNN attempts to learn the high-order feature combinations. The structure of DeepFM is shown in Figure 1.
As shown in Figure 1, FM and DNN jointly use the output of the embedded layer [27] as their inputs. Then, the DeepFM model concatenates the results of FM and DNN to generate the final output. The final output of DeepFM is expressed as:
$$ y_{DeepFM} = \mathrm{sigmoid}(y_{FM} + y_{DNN}) \qquad (2) $$
where $y_{DeepFM} \in (0, 1)$ is the final CTR prediction result, $\mathrm{sigmoid}$ is the sigmoid activation function, $y_{FM}$ is the output of the FM component, $y_{DNN}$ is the output of the DNN component, and "+" represents the sum of tensors of the same dimension. This "+" operation makes full use of the complementarity between the FM and DNN components. However, the feature interactions of DeepFM lack a certain interpretability because the DNN component only implicitly completes high-order feature interactions; it tends to recommend less relevant items when the user–item interactions are very sparse or high-rank. Hence, DeepFM relies on the FM component introduced above to improve the memorization ability and interpretability of CTR prediction. Another disadvantage of DeepFM is that it does not refine the low-order features before the feature interactions, and the corresponding interaction procedure is too simple. Although the DeepFM model has these apparent disadvantages, it is a firm foundation for the proposed Se-xDFEFM model.

3. The Proposed Se-xDFEFM Model

The Se-xDFEFM model consists of an input layer, embedding layer, Senet layer, linear interaction layer, combination layer, compressed interaction network layer, hidden layer, and output layer. Figure 2 illustrates the structure of the Se-xDFEFM model:
Unlike DeepFM and other related models, the proposed Se-xDFEFM model adopts an improved but effective feature interaction method named Se-FEFM. First, it performs feature refinement to suppress noisy information and select important low-order features through a Senet module, which acts as a kind of attention mechanism. Second, it implements FEFM in the linear interaction layer, which helps learn the low-order feature interactions. Unlike the above-mentioned CTR models, owing to the proposed feature refinement and low-order feature interaction strategies, Se-xDFEFM refines the low-order features to a certain degree and offers much more valuable low-order information for the subsequent high-order feature interactions. Therefore, the feature levels used in the proposed model are symmetrical: we use both low-order and high-order features to complete CTR prediction. Third, a CIN is used to explicitly learn high-order feature interactions, while other valuable higher-order feature interactions are implicitly learned by a DNN. Therefore, unlike the above-mentioned CTR models, the Se-xDFEFM model constructs more complex but effective high-order feature interactions from complementary explicit and implicit perspectives to generate new features with better discriminative ability for the final prediction. Hence, the CIN layer and DNN layer in our model are symmetrical. The final output of the proposed Se-xDFEFM model is expressed as follows:
$$ y_{Se\text{-}xDFEFM} = \mathrm{sigmoid}(y_{Se\text{-}DFEFM} + y_{CIN} + y_{Linear}) \qquad (3) $$
where $y_{Se\text{-}xDFEFM} \in (0, 1)$ is the final CTR prediction result; $\mathrm{sigmoid}$ is the sigmoid activation function; $y_{Se\text{-}DFEFM}$ is the output that passes through the Senet module, the linear interaction layer, the combination layer, and the DNN layer; $y_{CIN}$ is the output of the CIN layer; $y_{Linear}$ is the linear output computed from the raw features; and "+" represents the sum of tensors of the same dimension. This "+" operation makes full use of the complementarity among the CIN, linear, and Se-DFEFM components. Each layer of the proposed model is introduced in detail below. Based on the definition of each layer, we redefine the final output of the Se-xDFEFM model in Section 3.5.

3.1. Embedding Layer

The input features usually have very high but sparse dimensions in a large-scale recommendation system. Additionally, no obvious temporal and spatial correlations can be observed in these features.
Therefore, the corresponding feature vectors must first be compressed into a low-dimensional space [28,29]. Additionally, the dense features in the numerical fields need to be mapped into the same low-dimensional space to ensure the uniformity of the embedding output. Therefore, the embedding layer integrates diverse features and compresses their dimensions. The output vector ($E$) of the embedding layer is defined as follows:
$$ E = [e_1, e_2, \ldots, e_i, \ldots, e_f] \qquad (4) $$
where $f$ is the total number of feature fields, $e_i \in \mathbb{R}^D$ is the $i$-th field-embedding vector ($1 \le i \le f$), and $D$ is the dimension of the embedding vector.
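As an illustration, the sketch below builds such an embedding layer in TensorFlow 2 (the framework used in Section 4.4), assuming one tf.keras.layers.Embedding per sparse categorical field and one Dense projection per numeric field; the vocabulary sizes, field counts, and variable names are hypothetical.

```python
import tensorflow as tf

D = 10                                # embedding dimension (see Section 4.5.5)
vocab_sizes = [1000, 500, 200]        # hypothetical vocabulary sizes of the sparse fields
num_dense_fields = 2                  # hypothetical number of numeric fields

sparse_emb = [tf.keras.layers.Embedding(v, D) for v in vocab_sizes]
dense_proj = [tf.keras.layers.Dense(D, use_bias=False) for _ in range(num_dense_fields)]

def embed(sparse_ids, dense_values):
    """sparse_ids: (batch, num_sparse) integer ids; dense_values: (batch, num_dense).
    Returns E as a (batch, f, D) tensor, one row per field, as in Equation (4)."""
    e_sparse = [emb(sparse_ids[:, i]) for i, emb in enumerate(sparse_emb)]
    e_dense = [proj(dense_values[:, i:i + 1]) for i, proj in enumerate(dense_proj)]
    return tf.stack(e_sparse + e_dense, axis=1)
```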

3.2. Senet Layer

The output features of the embedding layer have different importance. Hence, we use the Senet module to dynamically tune the weight of each feature, which helps to capture the most important low-order features for the subsequent CTR prediction. The Senet module first generates a weight vector $W = [w_1, w_2, \ldots, w_i, \ldots, w_f]$ for the embedding vector and then rescales the original embedding vector using $W$, yielding a new embedding vector $V = [v_1, v_2, \ldots, v_i, \ldots, v_f]$. Here, $w_i$ represents the weight of $v_i$, $1 \le i \le f$.
The Senet module includes three steps: squeeze, excitation, and reweight.
(1) Squeeze calculates the "summary statistics" of each feature field embedding. The original embedding vector $E = [e_1, e_2, \ldots, e_i, \ldots, e_f]$ is compressed into a statistical vector $H = [h_1, h_2, \ldots, h_i, \ldots, h_f]$ through average pooling, where $1 \le i \le f$ and $h_i$ is a scalar representing the "summary statistics" of the $i$-th feature field embedding after averaging over all dimensions. $h_i$ is calculated through the following average pooling strategy:
$$ h_i = f_{sq}(e_i) = \frac{1}{D} \sum_{t=1}^{D} e_i(t) \qquad (5) $$
where $D$ denotes the dimension of each original embedding vector, $e_i$ is the $i$-th original embedding vector, and $f_{sq}(e_i)$ sums the $i$-th embedding vector over all dimensions and averages the result. Hence, $h_i$ represents the global information about the $i$-th field feature, and $H = [h_1, h_2, \ldots, h_i, \ldots, h_f]$ collects the global information of all field features, where $1 \le i \le f$ and $f$ is the total number of feature fields.
(2) Excitation calculates the weight vector of the feature group, namely $G$, based on the "summary statistics" vector $H$, using two fully connected layers. The first layer reduces the dimension with parameters $W_1$, whereas the second layer restores the dimension with parameters $W_2$. The whole "excitation" operation is calculated as follows:
$$ G = f_{ex}(H) = \sigma_2 (W_2 \, \sigma_1 (W_1 H)) \qquad (6) $$
where $W_1 \in \mathbb{R}^{f \times \frac{f}{r}}$ and $W_2 \in \mathbb{R}^{\frac{f}{r} \times f}$ denote the parameters of the first and second layers, respectively; $\sigma_1$ and $\sigma_2$ are the activation functions of the first and second layers, respectively; $r$ is a reduction ratio; $f$ is the total number of feature fields; and $f_{ex}(H)$ calculates the weight vector from the "summary statistics" vector $H$.
(3) Reweight: each weight is used to weigh the original embedding and generate a new embedding. The specific calculation procedure of the “Reweight” operation is shown as follows:
$$ E_{new} = F_{reweight}(G, E) = [g_1 \cdot e_1, g_2 \cdot e_2, \ldots, g_i \cdot e_i, \ldots, g_f \cdot e_f] = [v_1, v_2, \ldots, v_i, \ldots, v_f] \qquad (7) $$
where $g_i \in \mathbb{R}$, $e_i \in \mathbb{R}^D$, $v_i \in \mathbb{R}^D$, and $1 \le i \le f$. $D$ denotes the dimension of the embedding vector, and $f$ is the total number of feature fields. $F_{reweight}(G, E)$ reweights the original embedding $E$ with the weight vector $G$. Therefore, we obtain $E_{new}$, which is composed of the vectors $v_i$.
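To make the three steps concrete, the following Keras layer is a minimal sketch of the Senet module under the assumption that the field embeddings arrive as a (batch, f, D) tensor; the ReLU activations and the reduction ratio of 2 are illustrative choices, since $\sigma_1$, $\sigma_2$, and $r$ are hyperparameters in the formulation above.

```python
import tensorflow as tf

class SenetLayer(tf.keras.layers.Layer):
    def __init__(self, num_fields, reduction_ratio=2):
        super().__init__()
        reduced = max(1, num_fields // reduction_ratio)
        self.w1 = tf.keras.layers.Dense(reduced, activation="relu")      # dimension reduction (W1)
        self.w2 = tf.keras.layers.Dense(num_fields, activation="relu")   # dimension restoration (W2)

    def call(self, E):
        # Squeeze: average pooling over the embedding dimension, (batch, f, D) -> (batch, f).
        H = tf.reduce_mean(E, axis=2)
        # Excitation: two fully connected layers produce one weight per field.
        G = self.w2(self.w1(H))
        # Reweight: rescale each field embedding by its weight, giving E_new.
        return E * tf.expand_dims(G, axis=2)
```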

3.3. Linear Interaction Layer and Combination Layer

The linear interaction layer is designed to calculate the second-order feature interactions. Traditional feature interaction methods include the inner product and the Hadamard product [30,31]. However, these methods do not consider the broader concept of feature fields. To address this problem, the proposed Se-xDFEFM model adopts a field-pair symmetric matrix to better capture the importance of feature interactions. The eigenvalues of the field-pair symmetric matrix represent the interaction intensity of the field pair; this mechanism is called FEFM in this study. The corresponding calculation, consisting of three components, is as follows:
$$ FEFM((w, v, W), x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle W_{F(i),F(j)} v_i, v_j \rangle x_i x_j \qquad (8) $$
where $n$ is the total number of features, $w_0 \in \mathbb{R}$ is the global bias, $w_i \in \mathbb{R}$ is the parameter of the first-order feature, $W_{F(i),F(j)}$ is a $D \times D$ symmetric matrix, and $D$ denotes the dimension of the embedding vector. $\langle \cdot, \cdot \rangle$ represents the dot product of two vectors. $v_i$ and $v_j$ are the embedding vectors of the $i$-th and $j$-th features, respectively, and $x_i$ and $x_j$ are the values of the $i$-th and $j$-th features, respectively. FEFM does not learn field-specific feature embeddings; instead, the symmetric matrix $W_{F(i),F(j)}$ is the embedding of the field pair $F(i)$ and $F(j)$, and the interaction between the $i$-th and $j$-th features is obtained indirectly through this matrix, so $W_F$ is defined over feature field pairs. As in Equation (1), the first component, $w_0$, is the global bias, and the second component, $\sum_{i=1}^{n} w_i x_i$, is the sum of the first-order feature terms. The third component, $\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle W_{F(i),F(j)} v_i, v_j \rangle x_i x_j$, represents the second-order interaction between each pair of features, where $x_i x_j$ is the product of $x_i$ and $x_j$. The difference between Equation (1) and Equation (8) is that Equation (8) uses $W_{F(i),F(j)}$ to learn the intensity of the interaction between feature fields. Here, "+" indicates the addition of the three components. Se-xDFEFM uses the Hadamard or inner product to calculate the corresponding feature interactions.
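As a rough sketch of the field-pair term in Equation (8), the code below assumes one feature per field (so $F(i) = i$ and $x_i = 1$ for the active categorical values) and enforces the symmetry of each $D \times D$ matrix by explicit symmetrization; this parameterization is only one possible choice and is not necessarily the one used in our released code.

```python
import tensorflow as tf

def fefm_pairwise(V, U):
    """V: (batch, f, D) field embeddings; U: (f, f, D, D) raw field-pair parameters.
    Returns the summed field-pair interaction score per batch element."""
    f = V.shape[1]
    # Symmetrize each D x D block so that W_{F(i),F(j)} is a symmetric matrix.
    W = 0.5 * (U + tf.transpose(U, perm=[0, 1, 3, 2]))
    score = tf.zeros_like(V[:, 0, 0])
    for i in range(f):
        for j in range(i + 1, f):
            Wv_i = tf.linalg.matvec(W[i, j], V[:, i, :])        # (batch, D)
            score += tf.reduce_sum(Wv_i * V[:, j, :], axis=1)   # <W v_i, v_j>
    return score
```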
As shown in Figure 2, an interaction vector $P = [p_1, \ldots, p_m]$ is generated from the original embedding $E$, and an interaction vector $Q = [q_1, \ldots, q_m]$ is obtained from the new embedding $E_{new}$ generated by the FEFM module, where $m$ is the number of embedding vector interactions. The input of the combination layer is therefore the two interaction vectors $P$ and $Q$, which are concatenated to generate the combination $S$, as shown below:
$$ S = F_{combine}(P, Q) = [p_1, \ldots, p_m, q_1, \ldots, q_m] = [c_1, \ldots, c_i, \ldots, c_{2m}] \qquad (9) $$
where $F_{combine}(P, Q)$ is the concatenation of the two interaction vectors $P$ and $Q$, and $c_i$ is an element of $P$ or $Q$.

3.4. CIN and DNN

CIN is adopted to complete explicit high-order feature interactions, and its network complexity does not increase exponentially with the interaction degree. We formulate the embedding matrix as $X^0 \in \mathbb{R}^{f \times D}$, where the $i$-th row of $X^0$ is the embedding vector of the $i$-th field: $X^0_{i,*} = e_i$. The output of the $k$-th layer of the CIN is also a matrix, $X^k \in \mathbb{R}^{L_k \times D}$, where $L_k$ denotes the number of embedding vectors in the $k$-th layer. $X^k$ is calculated as follows:
$$ X^{k}_{l,*} = \sum_{i=1}^{L_{k-1}} \sum_{j=1}^{L_0} W^{k,l}_{i,j} \left( X^{k-1}_{i,*} \circ X^{0}_{j,*} \right) \qquad (10) $$
where $X^{k}_{l,*}$ denotes the $l$-th embedding vector of the $k$-th layer output matrix, $X^{k-1}_{i,*}$ denotes the $i$-th embedding vector of the $(k-1)$-th layer output matrix, $W^{k,l} \in \mathbb{R}^{L_{k-1} \times L_0}$ is the parameter matrix for the $l$-th feature vector of the $k$-th layer, $W^{k,l}_{i,j}$ denotes the value in the $i$-th row and $j$-th column of the parameter matrix, and $X^{0}_{j,*}$ denotes $e_j$. $X^{k}_{l,*}$ is obtained through the interactions between $X^{k-1}$ and $X^0$, and "$\circ$" represents the Hadamard product. Equation (10) takes the $L_{k-1}$ vectors of the previous layer and the $L_0$ vectors of the input layer $X^0$ and calculates their pairwise Hadamard products, obtaining $L_{k-1} L_0$ vectors. These vectors are weighted and summed according to the parameters $W^{k,l}$ to generate $X^{k}_{l,*}$, the $l$-th embedding vector of the $k$-th layer. The output $X^k$ of the $k$-th layer is obtained by using $L_k$ different parameter matrices.
Hence, the CIN can explicitly complete feature interactions, and the interaction degree increases with the layer depth. Each layer depends on the output of the previous layer and on the input layer. Let $T$ denote the network depth. Each hidden layer $X^k$, $k \in [1, T]$, has a connection with the output units. Therefore, sum pooling is performed on the feature map of each layer to obtain $p^{k}_{i}$ as follows:
$$ p^{k}_{i} = \sum_{j=1}^{D} X^{k}_{i,j} \qquad (11) $$
where $D$ denotes the dimension of the embedding vector, and $X^{k}_{i,j}$ denotes the $j$-th dimension of the $i$-th embedding vector of the $k$-th layer output matrix.
Each layer produces a pooling vector $P^k = [p^{k}_{1}, p^{k}_{2}, \ldots, p^{k}_{L_k}]$ of length $L_k$. The pooling vectors of all layers are concatenated before being connected to the output units:
$$ P^{+} = [P^{1}, P^{2}, \ldots, P^{T}] \qquad (12) $$
where T denotes the network depth.
Se-xDFEFM uses a DNN to learn implicit higher-order features. The CIN and DNN complement each other and make full use of each kind of feature interaction, increasing the strength of the proposed model. The DNN is composed of several fully connected layers that implicitly capture high-order feature interactions. As shown in Figure 2, the input of the DNN is the output of the combination layer.
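The following Keras layer is a compact sketch of Equations (10)–(12): each entry of layer_sizes plays the role of $L_k$, the einsum-based formulation is an illustrative implementation choice, and the layer assumes the embeddings $X^0$ arrive as a (batch, f, D) tensor.

```python
import tensorflow as tf

class CINLayer(tf.keras.layers.Layer):
    """Explicit interactions (Equation (10)), sum pooling (Equation (11)),
    and concatenation of all pooled vectors (Equation (12))."""
    def __init__(self, layer_sizes=(200, 200)):
        super().__init__()
        self.layer_sizes = layer_sizes

    def build(self, input_shape):
        f = int(input_shape[1])                  # L_0: number of fields
        prev = f
        self.kernels = []
        for k, L_k in enumerate(self.layer_sizes):
            # One (L_k, L_{k-1}, L_0) weight tensor per CIN layer.
            self.kernels.append(self.add_weight(
                name=f"cin_w_{k}", shape=(L_k, prev, f), initializer="glorot_uniform"))
            prev = L_k

    def call(self, X0):
        X_prev, pooled = X0, []
        for W in self.kernels:
            # Pairwise Hadamard products between rows of X^{k-1} and X^0.
            Z = tf.einsum("bid,bjd->bijd", X_prev, X0)       # (batch, L_{k-1}, L_0, D)
            X_prev = tf.einsum("bijd,lij->bld", Z, W)        # (batch, L_k, D)
            pooled.append(tf.reduce_sum(X_prev, axis=2))     # sum pooling over D
        return tf.concat(pooled, axis=1)                     # p^+ = [P^1, ..., P^T]
```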

3.5. Output Layer

Finally, the Se-xDFEFM model combines the outputs of DNN and CIN with the proposed linear interaction layer, which forms a kind of powerful complementarity and completes high-quality CTR prediction. The final output of Se-xDFEFM is expressed as:
$$ y_{Se\text{-}xDFEFM} = \mathrm{sigmoid}\left( w_{linear}^{T} a + w_{dnn}^{T} x_{dnn}^{k} + w_{cin}^{T} p^{+} + b \right) \qquad (13) $$
where $a$ is the raw feature vector; $x_{dnn}^{k}$ and $p^{+}$ are the corresponding outputs of the DNN and CIN, respectively; $w_{linear}$, $w_{dnn}$, $w_{cin}$, and $b$ are learnable parameters of our model, with $b$ being the bias; and "+" represents the sum of tensors of the same dimension. This "+" operation takes full advantage of the complementary information among the different components. $y_{Se\text{-}xDFEFM} \in (0, 1)$ is the final prediction of the proposed CTR model, and $\mathrm{sigmoid}$ is the sigmoid activation function. Equation (13) is a redefinition of Equation (3), as DeepFM forms the basis of our model.
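For completeness, a minimal sketch of Equation (13) is given below; the tensors and weight vectors are assumed to be produced and learned elsewhere in the model, so the function only shows how the three outputs and the bias are combined.

```python
import tensorflow as tf

def final_output(a, x_dnn, p_plus, w_linear, w_dnn, w_cin, b):
    """a: (batch, n) raw features; x_dnn: (batch, h) last DNN hidden output;
    p_plus: (batch, m) pooled CIN vector; weight vectors have matching lengths."""
    logit = (tf.reduce_sum(a * w_linear, axis=1)       # w_linear^T a
             + tf.reduce_sum(x_dnn * w_dnn, axis=1)    # w_dnn^T x_dnn
             + tf.reduce_sum(p_plus * w_cin, axis=1)   # w_cin^T p^+
             + b)
    return tf.sigmoid(logit)                           # y in (0, 1)
```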

4. Experimental Results and Analysis

4.1. Datasets

We conducted detailed experiments on two public CTR datasets, Criteo [32] and Avazu [33]. The Criteo dataset is widely used to evaluate the performance of CTR models; it is composed of approximately 45 million user advertising behaviors provided by the Criteo advertising company, including 26 categorical features and 13 numerical features. The Avazu dataset consists of ad click logs for several consecutive days sorted in chronological order; it contains 40 million actual click records, and each record contains 24 features. Owing to limited computing resources, only the first 10 million records of each dataset were used to train the proposed model. We randomly split all the instances in a ratio of 8:1:1 for training, validation, and testing, respectively, following [16,23,34]. To ensure fair performance comparisons, we reproduced a group of well-known CTR baselines under the same experimental settings.
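For reproducibility, the random 8:1:1 split described above can be sketched as follows; the file name and random seed are placeholders rather than the exact preprocessing used in our experiments.

```python
import pandas as pd

df = pd.read_csv("criteo_first_10m.csv")        # hypothetical preprocessed file
df = df.sample(frac=1.0, random_state=2022)      # shuffle all instances
n = len(df)
train = df.iloc[: int(0.8 * n)]                  # 80% training
valid = df.iloc[int(0.8 * n): int(0.9 * n)]      # 10% validation
test = df.iloc[int(0.9 * n):]                    # 10% test
```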

4.2. Evaluation Metrics

The area under the ROC curve (AUC) [35] and cross-entropy (LogLoss) [36] metrics were used to evaluate the CTR models. AUC is the area enclosed by the receiver operating characteristic (ROC) curve and the coordinate axes; the closer the curve is to the upper-left corner and the closer the AUC value is to 1, the better the CTR prediction performance. This metric is not sensitive to whether the sample classes are balanced. The LogLoss metric measures the difference between the predicted values and the true values; the smaller the LogLoss value, the better the CTR prediction performance.
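Both metrics can be computed directly with scikit-learn, as in the small sketch below, where y_true are binary click labels and y_pred are the predicted probabilities (both names are illustrative).

```python
from sklearn.metrics import roc_auc_score, log_loss

y_true = [0, 1, 1, 0, 1]
y_pred = [0.2, 0.8, 0.6, 0.3, 0.9]

print("AUC:", roc_auc_score(y_true, y_pred))      # closer to 1 is better
print("LogLoss:", log_loss(y_true, y_pred))       # smaller is better
```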

4.3. Baselines

The baselines are recent works on click-through rate prediction. Three kinds of mainstream models are chosen as the baselines of this study, described as follows:
(1)
The explicit interaction-based CTR prediction models FM [5] and AFM [14];
(2)
The implicit interaction-based CTR prediction models NFM [9], FiBiNET [16], and DeepFEFM [25]; and
(3)
CTR models that perform both explicit and implicit interactions: DCN [19], DeepFM [13], and xDeepFM [21].

4.4. Experimental Environment and Parameter Settings

We conducted experiments on a server with two 1080Ti GPUs and an Intel Xeon E5-2620 v4 CPU. All experiments were implemented on Ubuntu 16.04 with the TensorFlow 2 framework. For the DNN, the number of hidden layers is set to 3, and the number of neurons in each layer is set to 256 through validation experiments. The dropout ratio is 0.5, and the activation function is ReLU. We chose the sigmoid activation function to output the final prediction of the CTR models. All the CTR models are optimized using Adam [37], and the learning rate is uniformly set to 0.0001.
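A minimal sketch of this configuration for a generic Keras CTR model is shown below; build_dnn only reflects the DNN hyperparameters listed above, and the compile step uses binary cross-entropy, which corresponds to the LogLoss objective.

```python
import tensorflow as tf

def build_dnn(input_dim):
    """Three hidden layers of 256 ReLU units with dropout 0.5, as stated above."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(input_dim,)))
    for _ in range(3):
        model.add(tf.keras.layers.Dense(256, activation="relu"))
        model.add(tf.keras.layers.Dropout(0.5))
    return model

def compile_ctr_model(model):
    """Adam with learning rate 1e-4, LogLoss objective, and AUC as a metric."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model
```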

4.5. Experimental Results and Discussion

4.5.1. Performance Comparisons with the Baseline Models

First, the proposed Se-xDFEFM model is compared with the above-mentioned state-of-the-art baseline models (we reproduced all these baselines on our server and completed the relevant experiments using the same experimental settings for fair comparisons). Two metrics, AUC and LogLoss, are used to evaluate each CTR model. The experimental results are shown in Table 1. The indicator "Improve1" represents the performance improvement of the Se-xDFEFM model relative to the most competitive baseline. Similarly, the indicator "Improve2" represents the performance improvement of the Se-xDFEFM model relative to the second most competitive baseline. For the CTR prediction task, owing to the large amount of data, these two indicators are measured in ‰; a performance improvement of more than 1‰ is considered effective and evident [38,39]. Such an improvement is essential and helps achieve satisfactory benefits.
As shown in Table 1, the prediction performance of DNN-based CTR models, such as DeepFM [13], DCN [19], etc., is significantly better than that of traditional models, including FM [5], AFM [14], and other models using low-order feature interactions, which further demonstrates that the DNN model is effective for deep feature learning in the CTR prediction task. The DNN model can capture much more non-linear high-order information to complete CTR prediction more effectively. This is also a significant reason why we also employ DNN in our Se-xDFEFM model. Moreover, if both the explicit and implicit feature interactions are considered simultaneously, the corresponding prediction ability of the FiBiNET [16] and xDeepFM [21] models is further improved. The explicit and implicit feature interactions complement each other, which contributes to boosting the final prediction performance. Hence, we need to ensure diverse feature interactions to promote the final CTR prediction performance. This builds a firm foundation for our model.
Notably, the proposed Se-xDFEFM model obtains the best recommendation performance on both datasets. For example, on the Criteo dataset, compared with the state-of-the-art baseline models xDeepFM [21], FiBiNET [16], and DeepFEFM [25], the AUC values of our model increase by 2.9‰, 2.5‰, and 2.5‰, respectively, whereas the corresponding LogLoss values decrease by 2.6‰, 2.1‰, and 2.0‰, respectively. These two groups of values correspond well: a larger loss decrease corresponds to a larger performance improvement. Similarly, on the Avazu dataset, the corresponding AUC values of the Se-xDFEFM model increase by 2.3‰, 1.6‰, and 1.3‰, respectively, and the corresponding LogLoss values decrease by 1.3‰, 1.0‰, and 0.8‰, respectively. The proposed Se-xDFEFM model is therefore effective and robust for CTR prediction. We performed the following detailed analysis to understand the underlying reasons behind these experimental observations.
First, Se-xDFEFM refines the embedding information with the embedded Senet module, which lays a firm foundation for the subsequent diverse feature interactions. Then, the FEFM mechanism is implemented in our linear interaction layer: the field-pair symmetric matrix with improved discriminative ability is learned to capture the low-order feature interactions, which is a necessary but powerful basis for the subsequent high-order feature interactions. Hence, in contrast to most advanced DL-based CTR models, such as xDeepFM [21] and FiBiNET [16], the Se-xDFEFM model not only uses the Senet module to refine the features before using the CIN to explicitly construct higher-order features but also completes low-order feature interactions through the proposed FEFM layer before feeding them into the DNN. This strategy has two evident advantages: it helps filter the noisy information in the original features, and it offers much more valuable low-order information for the subsequent high-order feature interactions. Both advantages are beneficial for promoting the final prediction performance. Third, unlike DeepFEFM [25] and FiBiNET [16], which only use implicit interactions, we combine explicit and implicit feature interactions to obtain a more effective final feature: we use a CIN to explicitly learn high-order feature interactions, and we use an elaborate DNN to implicitly learn higher-order feature interactions. The two kinds of high-order feature interactions complement each other, which builds an important foundation for CTR prediction. More importantly, owing to the mining of much more valuable low-order information, the DNN and CIN can better capture significant information for the CTR prediction, and we make full use of the complementarity between these interaction modes to improve the final prediction performance. Notably, our model also outperforms DeepFM [13] and xDeepFM [21], which employ both implicit and explicit higher-order feature interactions, because Se-xDFEFM not only uses Senet to refine the features but also learns more valuable low-level information through FEFM. Therefore, Se-xDFEFM is an effective CTR model that seamlessly integrates feature refinement, low-order features, high-order features, and implicit and explicit feature interactions into an organic whole. Certainly, there is still some room to improve the recommendation performance on the Avazu dataset; in the future, we plan to use the well-known multi-head self-attention mechanism from the Transformer to achieve more effective explicit feature interactions.
In conclusion, the Se-xDFEFM model is effective and robust for the CTR prediction task. The low-order important information and high-order feature interactions complement each other positively and form a kind of joint force to promote the final CTR performance of the Se-xDFEFM model.

4.5.2. Impact of Different Linear Interaction Methods

As analyzed above, different linear interaction methods, including the inner product and the Hadamard product, as well as the use of the field feature, affect the final performance. Hence, in this section we evaluate modified Se-xDFEFM models that use the inner product or the Hadamard product, with or without the field feature. The inner product differs from the Hadamard product in both theoretical analysis and actual code: the inner product produces a scalar value, whereas the Hadamard product generates an element-wise product with the same shape as its inputs. The experimental results for the four resulting variants of our model are shown in Table 2.
In Table 2, Se-xDi indicates that only the inner product, without the field feature, is used to complete the linear interaction, and Se-xDh indicates that only the Hadamard product, without the field feature, is used. Se-xDFEFMh means that both the field feature and the Hadamard product are employed, whereas Se-xDFEFMi means that both the field feature and the inner product are utilized. Here, we also define two indicators, Improve3 and Improve4, measured in ‰, to observe the actual performance improvement: Improve3 represents the performance improvement of Se-xDFEFMi relative to Se-xDFEFMh, whereas Improve4 represents the performance improvement of Se-xDFEFMi relative to Se-xDi.
As shown in Table 2, the inner product method outperforms the Hadamard product on each dataset. This phenomenon is more evident when the field feature is absorbed into our model, which further improves the prediction performance of the modified Se-xDFEFM model. As analyzed above, the field feature effectively establishes a powerful foundation for the subsequent high-order feature interactions. Our model can make full use of the information in the feature field. However, how to reduce the parameter complexity in the process of extracting field features is still a considerable challenge.

4.5.3. Impact of the CIN Layers

Owing to the deep-level structure, the number of CIN layers could affect the order of explicit feature interactions and impact the final CTR prediction performance of the proposed Se-xDFEFM model. Hence, in this section, we want to decide the best number of CIN layers through validation experiments. The experimental results are shown in Figure 3. We used LogLoss and AUC to draw the corresponding experimental graphs, in which LogLoss follows the left vertical axis, whereas AUC follows the right vertical axis; this represents a concise mode to efficiently present our results.
As shown in Figure 3, when the number of CIN layers is equal to two, the highest AUC and lowest LogLoss are observed on both datasets. The two metrics validate each other, and Figure 3 demonstrates this phenomenon well, indicating that the best prediction performance is obtained when the CIN contains two layers. Hence, suitable feature interactions result in performance improvement; in contrast, too many high-order feature interactions can introduce unexpected noise or model complexity, which decreases the final CTR prediction performance. At the other extreme, setting the number of CIN layers to zero is equivalent to removing the CIN module from our model, and the corresponding performance decreases dramatically. This demonstrates the importance of the proposed explicit feature interactions. It also further validates that the CIN and DNN complement each other, meaning that the implicit and explicit feature interactions form a joint force to improve the final CTR prediction performance of the Se-xDFEFM model. In summary, we set the number of CIN layers to two in all experiments.

4.5.4. Impact of the Number of CIN Neurons

Like the number of CIN layers, the number of CIN neurons also affects the explicit feature interactions procedure and final CTR performance of the proposed Se-xDFEFM model. Hence, in this section, we want to decide the best number of CIN neurons through validation experiments. The experimental results are shown in Figure 4. We used LogLoss and AUC to draw the corresponding experimental graphs, in which LogLoss follows the left vertical axis, whereas AUC follows the right vertical axis.
As shown in Figure 4a, when the number of neurons in each layer reaches 200, the best AUC and lowest LogLoss are observed on the Avazu dataset. As shown in Figure 4b, when the number of neurons in each layer reaches 250, the best AUC and lowest LogLoss are observed on the Criteo dataset. The two metrics validate each other, and Figure 4 demonstrates this phenomenon well. The possible cause is that the Criteo dataset has more fields than the Avazu dataset, so it needs a relatively more complex CIN to complete effective high-order feature interactions. In the future, the corresponding recommendation performance should be further improved from the perspective of model structure optimization. In conclusion, it is necessary to tune the number of neurons in each layer of the CIN to obtain the best CTR prediction performance. In all experiments, we set the number of neurons in each layer to 200 on the Avazu dataset and to 250 on the Criteo dataset.

4.5.5. Impact of Embedding Dimension

As illustrated in Figure 2, the embedding layer contains the most important low-order feature information for the subsequent high-order feature interaction. Hence, we need to set the dimension of the embedding layer elaborately to obtain the best CTR prediction performance. The validation experimental results are shown in Figure 5. We also used LogLoss and AUC to draw the corresponding experimental graphs, in which LogLoss follows the left vertical axis, whereas AUC follows the right vertical axis.
As shown in Figure 5, when the embedding dimension is equal to 10, the best AUC and LogLoss values of the Se-xDFEFM model can be observed on the two datasets. Lower loss makes the proposed model more effective for CTR prediction. The two metrics can validate each other; Figure 5 presents this phenomenon well. However, as the embedding dimension continues to increase, worse AUC and LogLoss values are observed, especially for the Avazu dataset, which indicates that an excessively high dimension of the embedding layer results in a certain amount of noise. This, in turn, negatively affects the final CTR prediction performance. In summary, the embedding dimension of the Se-xDFEFM model needs to be tuned carefully to obtain the best CTR prediction performance.

4.5.6. Ablation Experiment

As mentioned above, the Se-xDFEFM model includes the embedded Senet module, the embedded FEFM module, a DNN, and a CIN. Each module plays a role in our CTR prediction model. In this section, we want to validate the actual contribution of these modules through detailed ablation experiments. This helps to highlight our further research direction. In this experiment, the Se-xDFEFM model without CIN, the Se-xDFEFM model without the FEFM module, the Se-xDFEFM model without the Senet module, and the Se-xDFEFM model without DNN are denoted as Se-xDFEFM-CIN, Se-xDFEFM-FEFM, Se-xDFEFM-Senet, and Se-xDFEFM-DNN, respectively, resulting in another set of model variants (Table 2). The experimental results are shown in Table 3. Moreover, we compared our model variants with many state-of-the-art baselines using AUC (Figure 6).
As shown in Table 3, (1) on the two datasets, removing any module leads to performance degradation of CTR prediction. Hence, all four modules can effectively improve the final recommendation performance. (2) The proposed Se-xDFEFM model combined with the CIN, the embedded Senet module, DNN, and the FEFM modules can obtain the best prediction performance among all the variants. (3) Removing the FEFM module leads to the largest performance degradation on both datasets. This indicates that the proposed embedded FEFM module contributes the most to the final CTR prediction performance among all investigated modules. On the one hand, it directly processes the embedding layer, which can obtain the most primitive and valuable low-order features for CTR prediction. Furthermore, the linear interaction layer embedded in the FEFM module can generate more valuable information for prediction. Hence, unlike the state-of-the-art DL-based CTR models, the FEFM module offers much more valuable low-order information for the successive high-order feature interactions. (4) On the Criteo dataset, CIN contributes slightly more than the Senet module. This means that explicit high-order feature interactions are more effective for the final CTR prediction. The Senet module is a kind of attention mechanism, which can effectively filter noisy information and reduce negative impacts. Therefore, it can improve the final recommendation performance. Conversely, on the Avazu dataset, the Senet module is relatively more important. (5) Using both CIN and DNN models results in better performance than using a single model, which indicates that the implicit and explicit feature interactions complement each other and can form a joint force to improve the final CTR prediction performance of the Se-xDFEFM model. (6) Our model variants, such as Se-xDFEFM-CIN and Se-xDFEFM-SENET, can obtain very competitive prediction performance compared with state-of-the-art baselines (Figure 6). This also validates the robustness, effectiveness, and scalability of our Se-xDFEFM model from another significant perspective.
In summary, the following valuable conclusions can be drawn. The descending order of importance of the modules in our model is "FEFM > DNN > CIN > Senet" on the Criteo dataset and "FEFM > DNN > Senet > CIN" on the Avazu dataset. Therefore, for the more complex Criteo dataset, we should focus more on the low-order feature refinement method, whereas for the Avazu dataset we should further modify the structure of the CIN. These results suggest future research directions. Moreover, our model variants achieve very competitive recommendation performance, demonstrating their scalability.

5. Conclusions and Future Work

We propose a novel CTR prediction model called Se-xDFEFM, which fits the "symmetry" concept well, including symmetric core technologies, symmetric feature levels, symmetric CIN and DNN layers, and the symmetry between browsing records and click records.
Se-xDFEFM seamlessly integrates feature refinement through the embedded Senet module, low-order features extracted by the FEFM module, and implicit and explicit feature interactions into an organic whole. Extensive experimental results on two public datasets demonstrate that the Se-xDFEFM model is effective and robust for CTR prediction, outperforming other state-of-the-art baselines, including DeepFM [13], FiBiNET [16], AFM [14], and xDeepFM [21]. Unlike other models, the FEFM module offers much more valuable low-order information for the subsequent high-order feature interactions. All the modules form a kind of joint force to promote the final CTR performance. Notably, our model variants also achieve competitive performance compared with the state-of-the-art baselines, validating their scalability.
However, our model is subject to three limitations. First, the complexity of the CIN is an important issue, which may increase the complexity of the whole model; we intend to use the state-of-the-art multi-head self-attention network to reduce the complexity of explicit high-order feature interaction. Second, our model does not model sequential user behavior; hence, we plan to use the deep interest network [40] to address this problem. Third, our model does not apply different processing methods to dense and sparse embeddings; how to effectively promote interaction between dense and sparse embeddings is an important challenge for future work. We hope these strategies can form the basis of future research and further improve CTR prediction performance.

Author Contributions

Software, G.L., G.X., G.W., G.L. and H.Z.; Methodology, G.L., G.X., G.W., Y.Y. and H.Z.; Validation, G.L., G.X., G.W., Y.Y. and H.Z.; Writing—Original Draft, G.L., G.X. and H.Z.; Writing—Review and Editing, G.L., G.X. and H.Z.; Validation, G.W., Y.Y., G.L. and D.J.; Validation, G.W., Y.Y., C.L. and D.J.; Data Curation, G.L., G.X., G.W., Y.Y., C.L. and H.Z.; Formal Analysis, G.X., G.W., Y.Y. and C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partly funded by the National Natural Science Foundation of China (Grant Nos. 62161011 and 61861016), the Natural Science Foundation of Jiangxi Provincial Department of Science and Technology (Grant Nos. 20212BAB202006 and 20202BABL202044), the Key Research and Development Plan of Jiangxi Provincial Science and Technology Department (Grant No. 20192BBE50071), the Humanity and Social Science Foundation of the Jiangxi Province (Grant No. 22TQ01), the Science and Technology Projects of Jiangxi Provincial Department of Education (Grant Nos. GJJ190323 and GJJ200644), and the Humanity and Social Science Foundation of Jiangxi University (Grant Nos. TQ20108 and TQ21203).

Data Availability Statement

The datasets used in this study are publicly available and can be accessed at http://labs.criteo.com/downloads/download-terabyte-click-logs/ (accessed on 11 October 2022) and https://www.kaggle.com/c/avazu-ctr-prediction/ (accessed on 11 October 2022) for the Criteo and Avazu datasets, respectively.

Acknowledgments

We thank the authors of [32,33] for collecting and organizing the datasets used in this study. The authors would also like to thank the editor and the reviewers for their helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Abbreviations and full names.
Abbreviation | Full Name | Abbreviation | Full Name
CTR | click-through rate | WDL | wide & deep learning
LR | logistic regression | AFM | attentional factorization machine
GBDT | gradient boosting decision tree | FiBiNET | feature importance and bilinear feature interaction network
FM | factorization machine | DCN | deep & cross network
DeepFM | factorization machine-based neural network | xDeepFM | extreme deep factorization machine
NFM | neural factorization machine | FGCNN | feature generation by convolutional neural network
PNN | product-based neural network | CNN | convolutional neural network
ONN | operation-aware neural network | FWFM | field-weighted factorization machine
Se-xDFEFM | Senet and extreme deep field-embedded factorization machine | FEFM | field-embedded factorization machine
Senet | squeeze-excitation network | xCrossNet | feature structure-oriented model
CIN | compressed interaction network | AutoInt | self-attentive neural networks
DNN | deep neural network | AUC | area under the ROC curve
FFM | field factorization machine | LogLoss | cross entropy
DL | deep learning | ROC | receiver operating characteristic
Table A2. Advantages and disadvantages of the models mentioned in the paper.
Model | Advantages | Disadvantages
LR | Simple and highly interpretable | Manual construction of feature interactions
GBDT | Manual identification of features is avoided | Not suitable for large sparse datasets
FM | Automatic construction of feature interactions | Only low-order interactions can be learned
FFM | Learns the interactions between different feature fields | Large parameter scale
NFM | Learns implicit high-order features through DNN | Implicit construction of high-order features
PNN | Employs inner product combined with outer product | Implicit construction of high-order features
ONN | Generates multiple embedding vectors for each feature | Large parameter scale
WDL | Good generalization and memorization ability | Feature interactions must be built manually
DeepFM | Automatic construction of feature interactions | FM can only learn low-order features
AFM | Learns the weight of feature interactions through an attention mechanism | Deep neural network is not used
FiBiNET | Learns feature weights through Senet and feature interactions through a bilinear interaction layer | Implicit construction of high-order features
DCN | Automatically builds high-order features | Feature combination in a bit-wise mode
xDeepFM | High-order feature combination in a vector-wise mode | The weight of each feature is not considered
FGCNN | Generates new features by CNN | Large parameter scale
FWFM | Efficiently models different features in different fields | Large parameter scale
FEFM | Learns the interactions between different feature fields with lower complexity than FFM | The weight of each feature is not considered
AutoInt | Explicit construction of high-order features using a multi-head self-attention mechanism | Implicit construction of high-order features
xCrossNet | Dense and sparse features are calculated by the cross and product layers, respectively | Implicit construction of high-order features

References

  1. Zhang, X.; Qin, J.; Zheng, J. A Social Recommendation based on metric learning and Users’ Co-occurrence Pattern. Symmetry 2021, 13, 2158. [Google Scholar] [CrossRef]
  2. Sharma, B.; Hashmi, A.; Gupta, C.; Khalaf, O.I.; Abdulsahib, G.M.; Itani, M.M. Hybrid Sparrow Clustered (HSC) Algorithm for Top-N Recommendation System. Symmetry 2022, 14, 793. [Google Scholar] [CrossRef]
  3. Richardson, M.; Dominowska, E.; Ragno, R. Predicting clicks: Estimating the click-through rate for new ads. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; pp. 521–530. [Google Scholar]
  4. He, X.; Pan, J.; Jin, O.; Xu, T.; Liu, B.; Xu, T.; Shi, Y.; Atallah, A.; Bowers, S.; Candela, J.Q. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop On Data Mining for Online Advertising, New York, NY, USA, 24 August 2014; pp. 1–9. [Google Scholar]
  5. Rendle, S. Factorization machines. In 2010 IEEE International Conference on Data Mining; IEEE: Sydney, Australia, 2010; pp. 995–1000. [Google Scholar]
  6. Juan, Y.; Zhuang, Y.; Chin, W.S.; Chih-Jen, L. Field-aware factorization machines for CTR prediction. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 43–50. [Google Scholar]
  7. Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E. A survey of data augmentation approaches for nlp. arXiv 2021, arXiv:2105.03075. [Google Scholar]
  8. Buhrmester, V.; Münch, D.; Arens, M. Analysis of explainers of black box deep neural networks for computer vision: A survey. Mach. Learn. Knowl. Extr. 2021, 3, 966–989. [Google Scholar] [CrossRef]
  9. He, X.; Chua, T.S. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 355–364. [Google Scholar]
  10. Qu, Y.; Cai, H.; Ren, K.; Zhang, W.; Yu, Y.; Wen, Y.; Wang, J. Product-based neural networks for user response prediction. In 2016 IEEE 16th International Conference on Data Mining (ICDM); IEEE: Barcelona, Spain, 2016; pp. 1149–1154. [Google Scholar]
  11. Yang, Y.; Xu, B.; Shen, F.; Zhao, J. Operation-aware neural networks for user response prediction. Neural Netw. 2020, 1210, 161–168. [Google Scholar] [CrossRef]
  12. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10. [Google Scholar]
  13. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247. [Google Scholar]
  14. Xiao, J.; Ye, H.; He, X.; Zhang, H.; Wu, F.; Chua, T. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv 2017, arXiv:1708.04617. [Google Scholar]
  15. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  16. Huang, T.; Zhang, Z.; Zhang, J. FiBiNET: Combining feature importance and bilinear feature interaction for click-through rate prediction. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 169–177. [Google Scholar]
  17. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  18. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar]
  19. Wang, R.; Fu, B.; Fu, G.; Wang, M. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17, Halifax, NS, Canada, 14 August 2017; pp. 1–7. [Google Scholar]
  20. Shan, Y.; Hoens, T.R.; Jiao, J.; Wang, H.; Yu, D.; Mao, J. Deep Crossing: Web-scale modeling without manually crafted combinatorial features. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 255–262. [Google Scholar]
  21. Lian, J.; Zhou, X.; Zhang, F.; Chen, Z.; Xie, X.; Sun, G. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1754–1763. [Google Scholar]
  22. Liu, B.; Tang, R.; Chen, Y.; Yu, J.; Guo, H.; Zhang, Y. Feature generation by convolutional neural network for click-through rate prediction. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 1119–1129. [Google Scholar]
  23. Song, W.; Shi, C.; Xiao, Z.; Duan, Z.; Xu, Y.; Zhang, M. Autoint: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1161–1170. [Google Scholar]
  24. Pan, J.; Xu, J.; Ruiz, A.L.; Zhao, W.; Pan, S.; Sun, Y.; Lu, Q. Field-weighted factorization machines for click-through rate prediction in display advertising. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1349–1357. [Google Scholar]
  25. Pande, H. Field-Embedded Factorization Machines for Click-through rate prediction. arXiv 2020, arXiv:2009.09931. [Google Scholar]
  26. Yu, R.; Ye, Y.; Liu, Q.; Wang, Z.; Yang, C.; Hu, Y.; Chen, E. Xcrossnet: Feature structure-oriented learning for click-through rate predictions. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2021; pp. 436–447. [Google Scholar]
  27. Wang, J.; Huang, P.; Zhao, H.; Zhang, Z.; Zhao, B.; Lee, D.L. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 839–848. [Google Scholar]
  28. Okada, S.; Ohzeki, M.; Taguchi, S. Efficient partition of integer optimization problems with one-hot encoding. Sci. Rep. 2019, 9, 1–12. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  29. Sun, Z.; Guo, Q.; Yang, J.; Fang, H.; Guo, G.; Zhang, J.; Burke, R. Research commentary on recommendations with side information: A survey and research directions. Electron. Commer. Res. Appl. 2019, 37, 100879. [Google Scholar] [CrossRef] [Green Version]
  30. Zhu, J.; Liu, J.; Yang, S.; Zhang, Q.; He, X. FuxiCTR: An Open Benchmark for Click-Through Rate Prediction. arXiv 2020, arXiv:2009.05794. [Google Scholar]
  31. Zhang, W.; Qin, J.; Guo, W.; Tang, R.; He, X. Deep Learning for Click-Through Rate Estimation. arXiv 2021, arXiv:2104.10584. [Google Scholar]
  32. Criteo. Available online: http://labs.criteo.com/downloads/download-terabyte-click-logs/ (accessed on 11 October 2022).
  33. Avazu. Available online: http://www.kaggle.com/c/avazu-ctr-prediction/ (accessed on 11 October 2022).
  34. Wang, R.; Shivanna, R.; Cheng, D.; Jain, S.; Lin, D.; Hong, L.; Chi, E.H. DCN V2: Improved deep & cross network and practical lessons for web-scale learning to rank systems. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 1785–1797. [Google Scholar]
  35. Graepel, T.; Candela, J.Q.; Borchert, T.; Herbrich, R. Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine. In Proceedings of the 27th International Conference on International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; Omnipress: Madison, WI, USA, 2010. [Google Scholar]
  36. Vovk, V. The fundamental nature of the log loss function. In Fields of Logic and Computation II; Springer: Cham, Switzerland, 2015; pp. 307–318. [Google Scholar]
  37. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  38. Jiang, Z. Research on ctr prediction for contextual advertising based on deep architecture model. J. Control Eng. Appl. Inform. 2016, 18, 11–19. [Google Scholar]
  39. Wang, Q.; Liu, F.; Xing, S.; Zhao, X. Research on CTR prediction based on stacked autoencoder. Appl. Intell. 2019, 49, 2970–2981. [Google Scholar] [CrossRef]
  40. Zhou, G.; Zhu, X.; Song, C.; Fan, Y.; Zhu, H.; Ma, X.; Yan, Y.; Jin, J.; Li, H.; Gai, K. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1059–1068. [Google Scholar]
Figure 1. The architecture of DeepFM.
Figure 2. The architecture of Se-xDFEFM.
Figure 3. (a) Performance impact of the number of CIN layers on Avazu; (b) performance impact of the number of CIN layers on Criteo.
Figure 4. (a) Performance impact of the number of CIN neurons on Avazu; (b) performance impact of the number of CIN neurons on Criteo.
Figure 5. (a) Performance impact of embedding dimension on Avazu; (b) performance impact of embedding dimension on Criteo.
Figure 6. Comparisons between our model variants and the mainstream models on Criteo.
Table 1. Performance comparisons with recent CTR models.
Model | Criteo AUC | Criteo LogLoss | Avazu AUC | Avazu LogLoss
FM [5] | 0.7727 | 0.4697 | 0.7626 | 0.3852
AFM [14] | 0.7756 | 0.4680 | 0.7631 | 0.3849
NFM [9] | 0.7919 | 0.4539 | 0.7649 | 0.3840
DeepFM [13] | 0.7942 | 0.4521 | 0.7691 | 0.3815
DCN [19] | 0.7944 | 0.4518 | 0.7692 | 0.3815
xDeepFM [21] | 0.7947 | 0.4517 | 0.7696 | 0.3813
FiBiNET [16] | 0.7951 | 0.4512 | 0.7703 | 0.3810
DeepFEFM [25] | 0.7951 | 0.4511 | 0.7706 | 0.3808
Se-xDFEFM | 0.7976 | 0.4491 | 0.7719 | 0.3800
Improve1 | 2.5‰ | 2.0‰ | 1.3‰ | 0.8‰
Improve2 | 2.5‰ | 2.1‰ | 1.6‰ | 1.0‰
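As a quick sanity check, the snippet below recomputes the per-mille improvements from the tabulated AUC and LogLoss values. The pairing of Improve1 with DeepFEFM and Improve2 with FiBiNET is inferred from the reported deltas rather than stated in the table, so it should be read as an assumption.

```python
# Recompute the Improve1/Improve2 rows of Table 1 from the tabulated values.
# Assumption: Improve1 compares Se-xDFEFM with DeepFEFM, Improve2 with FiBiNET.
rows = {
    "DeepFEFM":  (0.7951, 0.4511, 0.7706, 0.3808),
    "FiBiNET":   (0.7951, 0.4512, 0.7703, 0.3810),
    "Se-xDFEFM": (0.7976, 0.4491, 0.7719, 0.3800),
}

def per_mille_delta(ours, baseline):
    # AUC improves when it rises, LogLoss when it falls; both are reported as
    # absolute differences expressed in per mille (1/1000).
    return [round(abs(o - b) * 1000, 1) for o, b in zip(ours, baseline)]

print("Improve1:", per_mille_delta(rows["Se-xDFEFM"], rows["DeepFEFM"]))  # [2.5, 2.0, 1.3, 0.8]
print("Improve2:", per_mille_delta(rows["Se-xDFEFM"], rows["FiBiNET"]))   # [2.5, 2.1, 1.6, 1.0]
```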
Table 2. Performance comparisons of Se-xDFEFM model variants.
Model | Criteo AUC | Criteo LogLoss | Avazu AUC | Avazu LogLoss
Se-xDh | 0.7947 | 0.4517 | 0.7701 | 0.3810
Se-xDi | 0.7951 | 0.4513 | 0.7699 | 0.3811
Se-xDFEFMh | 0.7963 | 0.4502 | 0.7705 | 0.3807
Se-xDFEFMi | 0.7976 | 0.4491 | 0.7719 | 0.3800
Improve3 | 1.3‰ | 1.1‰ | 1.4‰ | 0.7‰
Improve4 | 2.5‰ | 2.2‰ | 2.0‰ | 1.1‰
Table 3. Detailed results of ablation analysis experiments.
Model | CIN | FEFM | SENET | DNN | Criteo AUC | Criteo LogLoss | Avazu AUC | Avazu LogLoss
Se-xDFEFM-CIN | – | ✓ | ✓ | ✓ | 0.7966 | 0.4504 | 0.7711 | 0.3804
Se-xDFEFM-FEFM | ✓ | – | ✓ | ✓ | 0.7946 | 0.4516 | 0.7692 | 0.3813
Se-xDFEFM-SENET | ✓ | ✓ | – | ✓ | 0.7969 | 0.4496 | 0.7701 | 0.3808
Se-xDFEFM-DNN | ✓ | ✓ | ✓ | – | 0.7949 | 0.4514 | 0.7694 | 0.3815
Se-xDFEFM | ✓ | ✓ | ✓ | ✓ | 0.7976 | 0.4491 | 0.7719 | 0.3800
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
