Debiasing the Conversion Rate Prediction Model in the Presence of Delayed Implicit Feedback

Hu, Taojun; Zhou, Xiao-Hua

doi:10.3390/e26090792


by Taojun Hu ¹ and Xiao-Hua Zhou ¹,²,*

¹ Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China

² Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China

* Author to whom correspondence should be addressed.

Entropy 2024, 26(9), 792; https://doi.org/10.3390/e26090792

Submission received: 31 July 2024 / Revised: 11 September 2024 / Accepted: 12 September 2024 / Published: 15 September 2024

(This article belongs to the Special Issue Causal Inference in Recommender Systems)


Abstract

The recommender system (RS) has been widely adopted in many applications, including online advertisements. Predicting the conversion rate (CVR) can help in evaluating the effects of advertisements on users and capturing users’ features, playing an important role in RS. In real-world scenarios, implicit rather than explicit feedback data are more abundant. Thus, directly training the RS with collected data may lead to suboptimal performance due to selection bias inherited from the nature of implicit feedback. Methods such as reweighting have been proposed to tackle selection bias; however, these methods omit delayed feedback, which often occurs due to limited observation times. We propose a novel likelihood approach combining the assumed parametric model for delayed feedback and the reweighting method to address selection bias. Specifically, the proposed methods minimize the likelihood-based loss using the multi-task learning method. The proposed methods are evaluated on the real-world Coat and Yahoo datasets. The proposed methods improve the AUC by 5.7% on Coat and 3.7% on Yahoo compared with the best baseline models. The proposed methods successfully debias the CVR prediction model in the presence of delayed implicit feedback.

Keywords:

recommender system; implicit feedback; delayed feedback; conversion prediction; selection bias

1. Introduction

The recommender system (RS) is widely used to address the information overload problem in multiple fields, including online advertising [1], drug adverse event prediction [2], and personalized treatments [3]. Predicting the conversion rate (CVR) plays a central role in RS. Taking online advertising as an example, advertisers are interested in evaluating the purchase potential of each item for the users; in this example, the CVR of a user–item pair refers to the probability of the user purchasing the item. By accurately predicting the CVR, advertisers can grasp users’ preferences and provide personalized advertisements to the users. On top of this, advertisers can raise their advertisement earnings with limited advertisement investment by enhancing the overall CVR. In general, a good CVR prediction model presents advantages in inferring user preference and offering personalized recommendations, improving the usefulness of the RS.

In practice, high-quality explicit ratings or feedback may be lacking. Instead, there is a large amount of implicit feedback, which indirectly reflects opinions by observing user behaviors [4]. One of the most severe problems of implicit feedback is the selection bias due to the nature of implicit feedback [5]. By passively tracing users’ past behaviors, the RS cannot collect user behaviors if the item is not exposed to the user. The collection of observed user–item pairs with implicit feedback is often a set of missing-not-at-random samples for the population [5]. Thus, directly inferring user preference from the observed behavior may lead to suboptimal performance of the CVR prediction model.

The CVR prediction problem with implicit feedback has been studied for a long time [6,7,8,9]. To address the selection bias of implicit feedback, content-based approaches collect and utilize the features of users and items along with contextual information. This information helps to mine the users’ actual preferences by creating a profile of each user and item, then matching the user–item pairs by associating their profiles. In addition, methods incorporating propensity scores can help to alleviate the selection bias, including the inverse propensity score (IPS) weighting and doubly robust (DR) methods. These methods have been applied to several popular models dealing with implicit feedback, including matrix factorization (MF) [10] and neural collaborative filtering (NCF) [11], resulting in a large number of approaches for tackling selection bias and predicting CVR using implicit feedback [12,13,14,15,16,17].

However, previous methods omit the delayed feedback problem, which often occurs in implicit feedback scenarios due to limited observation times. For example, if a user likes an item, the user will convert at a certain post-click timestamp. If the timestamp is outside the observation period, then negative feedback will be observed, resulting in a false negative sample. As pointed out by [18], delayed feedback is very common in practice, where most conversions occur after a certain period; 50% of conversions occur after one day post-click, and 13% of those occur after two weeks. Ignoring delayed feedback in predicting CVR leads to underestimation of the eventual CVR.

A number of methods adjust the naïve approach to account for delayed feedback. Most of these assume that a fixed proportion of the negative labels will eventually convert and that every user–item pair without a conversion has the same probability of being mislabeled. However, this assumption is unreliable, as the delay has been shown not to follow a uniform distribution, with most conversions occurring within the first day after the click. To this end, Chapelle et al. [18] first introduced survival analysis to deal with delayed feedback in CVR prediction. Their method assumes that the delay from click to conversion follows an exponential distribution with an unknown expectation, and stresses that the delayed feedback model and the CVR prediction model cannot be treated separately. The CVR prediction model is trained by maximizing the joint likelihood of both models with either the EM algorithm or direct optimization. Following this work, several authors have extended the exponential delayed feedback model to a more flexible Weibull model, a mixture of Weibull models, or even a nonparametric model based on kernel density estimation [19,20,21].

The above methods deal either with the selection bias brought about by implicit feedback or with delayed feedback under explicit feedback; no method has tackled delayed feedback when only implicit feedback is available for predicting the CVR. Treating these two problems separately, or ignoring either of them, leads to biased CVR prediction. In this paper, we propose methods for unbiased CVR prediction in the presence of delayed implicit feedback. We apply two methods that have been validated as useful in debiasing RS in the context of implicit feedback, namely, the IPS and DR methods, and consider how to obtain unbiased CVR predictions with delayed feedback in these methods. We propose the joint likelihood of the observed data in conversion logs, with parameters in both the CVR prediction model and the delayed feedback model. For parameter estimation, we minimize the joint loss function, that is, the negative log-likelihood with regularization. Most existing methods either address the selection bias using basic IPS or DR techniques or their variants, or capture the delayed feedback while assuming no selection bias. In contrast, our proposed joint likelihood loss and multi-task learning approach simultaneously optimizes the CVR prediction model and the delayed feedback model; thus, the proposed models should be less biased and more robust. We carried out experiments with the proposed method on two real-world datasets. The results show higher CVR prediction accuracy on the unbiased samples of the test datasets. The proposed approach outperforms competing models, including those proposed to address selection bias with implicit feedback and those dealing with delayed feedback, on multiple evaluation metrics.

2. Related Work

2.1. Conversion Rate (CVR) Prediction

Conversion rate prediction refers to estimating the probability of a user completing a predefined action (a conversion), such as reviewing or rating a movie, making a purchase, or signing up for a service in e-commerce. Fundamental models such as logistic regression and the multi-layer perceptron can be used for prediction by leveraging features of the users and items. More sophisticated methods such as ESMM [22], ESCM2 [23], and ECMA [24] have been proposed and applied in different industries. However, these methods ignore the existence of delayed feedback in practice. ESCD [25] and ESDF [26] tackle the issue of delayed feedback in their CVR prediction models. Nonetheless, these approaches need more theoretical assurance. In [23], the authors proved that the results of ESMM [22] were biased. In this paper, we propose a likelihood-based loss function and train the proposed models by maximizing the likelihood. The theoretical properties of maximum likelihood estimators strengthen the theoretical support for the proposed method.

2.2. Selection Bias in Implicit Feedback

Implicit feedback refers to indirect indicators of users’ preferences, as opposed to explicit ratings; examples include browsing data and purchase history. Explicit feedback such as ratings directly reflects user preferences. Unlike explicit feedback, which requires a user response, implicit feedback is often readily available. However, implicit feedback has several drawbacks that make it difficult to use fully, such as inherent noise and a lack of negative feedback [27]. Several techniques have been proposed to handle these challenges, including negative sampling and methods for denoising [28,29,30]. However, the above methods do not address the issue of selection bias in implicit feedback. Thus, we apply the IPS approach to adjust for the selection bias. The DR approach can further improve models for CVR prediction by introducing an imputation model. The DR approach avoids the out-of-bounds problem and reduces the variance compared with the IPS approach. A number of methods for debiasing RS have been proposed based on IPS and DR [12,14,31,32,33,34].

2.3. Delayed Feedback

Delayed feedback refers to situations where there is time between the click and the conversion. Ignoring the possibility of delayed feedback and directly labeling such samples as negative leads to underestimation of the CVR. As the delay time can be treated as a random variable following specific distributions of a survival time, it links CVR prediction with survival analysis in statistics. Delayed feedback models (DFM) [18] address this problem by assuming the distribution of the delayed time as an exponential distribution with an expectation determined by a generalized linear model having some predefined features. In addition, other distributions might be effective for modeling the delayed feedback. In [19,20], the authors proposed modeling the delayed feedback using a single Weibull distribution and a mixture of Weibull distributions, respectively, while [21] proposed an efficient approach called NoDeF for modeling the distribution of the delayed feedback nonparametrically using kernel density estimation techniques. However, the above methods may fail to reflect the true distribution of the delayed feedback. To resolve this problem, FSIW [35], FNW/FNC [36], ESDFM [37], DEFER [38], and DEFUSE [39] utilize resampling. However, all of the above methods rely on explicit feedback. Our proposal aims to provide more unbiased and accurate CVR prediction models in the presence of delayed implicit feedback.

3. Problem Setup and Preliminaries

3.1. Problem Setup

We first describe the basic setup of a CVR prediction problem in the presence of implicit feedback, without considering the time delay from click to conversion. Let $\mathcal{U}$ and $\mathcal{I}$ be the sets of users and items in the RS, respectively. The user–item pairs are expressed by $\mathcal{D} = \{(u, i) \mid u \in \mathcal{U}, i \in \mathcal{I}\}$. We denote the conversion of user–item pair $(u, i)$ by $r_{u,i}$. Conversions in the presence of implicit feedback often denote a variable that measures the user’s preference, such as whether the user views or purchases the item, rather than a direct rating or feedback declaring their like or dislike of the item(s). Without loss of generality, the conversion is assumed to be a binary variable, where $r_{u,i} = 1$ means conversion and $r_{u,i} = 0$ indicates a failure of conversion. In addition to the conversion outcomes, we denote the user–item features and the observation indicator of $r_{u,i}$ by $x_{u,i} \in \mathbb{R}^{K}$ and $o_{u,i} \in \{0, 1\}$ for each user–item pair in the dataset, where $o_{u,i} = 1$ indicates that $r_{u,i}$ is observed and documented in the database. We denote the set of observed user–item pairs in the dataset by $\mathcal{O} = \{(u, i) \mid (u, i) \in \mathcal{D}, o_{u,i} = 1\}$. The complete observations can be represented by $\{x_{u,i}, r_{u,i} \mid (u, i) \in \mathcal{O}\} \cup \{x_{u,i} \mid (u, i) \in \mathcal{D} \setminus \mathcal{O}\}$.

We assume a CVR prediction model with parameters $\theta$ that predicts $r_{u,i}$ from the user–item features $x_{u,i}$, which we denote by $\hat{r}_{u,i} = f(x_{u,i}, \theta)$. The aim is to achieve the least biased CVR prediction for all user–item pairs in the population $\mathcal{D}$. To this end, given a prediction error $\mathrm{err}_{u,i} = L(f(x_{u,i}, \theta), r_{u,i})$, e.g., the mean squared error or cross-entropy error, the problem translates to minimizing the population loss, that is,

$$L_{p}(\theta) = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \mathrm{err}_{u,i} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} L(f(x_{u,i}, \theta), r_{u,i}).$$

However, there are also unobserved user–item pairs $(u, i) \in \mathcal{D} \setminus \mathcal{O}$. The naïve method optimizes the loss function on the observed samples only, that is, it minimizes the sample loss:

$$L_{s}(\theta) = \frac{1}{|\mathcal{O}|} \sum_{(u,i) \in \mathcal{O}} \mathrm{err}_{u,i} = \frac{1}{|\mathcal{O}|} \sum_{(u,i) \in \mathcal{O}} L(f(x_{u,i}, \theta), r_{u,i}). \tag{1}$$

If the set of observed samples $\mathcal{O}$ is a random selection from the population $\mathcal{D}$ without selection bias, in other words, if the selection indicator $o_{u,i}$ is independent of $(x_{u,i}, r_{u,i})$, then we have $E(L_{s}(\theta)) = E(L_{p}(\theta))$; thus, minimizing the sample loss asymptotically achieves the optimal estimate based on the population loss, leading to an unbiased CVR prediction model. However, due to the properties of implicit feedback, there may be a selection bias that renders $o_{u,i}$ dependent on $(x_{u,i}, r_{u,i})$ [14]. A number of methods address this issue. The IPS method uses the propensity score, that is, the probability of each user–item pair being observed, and reweights the observed user–item pairs by the inverse propensity scores. The loss function of the IPS method is

$$L_{IPS}(\theta) = \frac{1}{|\mathcal{O}|} \sum_{(u,i) \in \mathcal{O}} \frac{o_{u,i}\, \mathrm{err}_{u,i}}{\hat{p}_{u,i}}, \tag{2}$$

where $\hat{p}_{u,i}$ is the estimated propensity score of user–item pair $(u, i)$. It has been verified that the IPS loss is an unbiased estimate of the population loss if the propensity score model is correctly specified, that is, if $\hat{p}_{u,i} = \Pr(o_{u,i} = 1 \mid x_{u,i})$ for all $(u, i) \in \mathcal{D}$. The DR method aims to improve the IPS method by incorporating an imputation model $g(x_{u,i}, \phi)$ [14]. The loss function for the DR method is

$$L_{DR}(\theta, \phi) = \frac{1}{|\mathcal{O}|} \sum_{(u,i) \in \mathcal{O}} \left[ \frac{o_{u,i}\, \mathrm{err}_{u,i}}{\hat{p}_{u,i}} + \left(1 - \frac{o_{u,i}}{\hat{p}_{u,i}}\right) \widehat{\mathrm{err}}_{u,i} \right], \tag{3}$$

where $\hat{p}_{u,i}$ is the estimated propensity score and $\widehat{\mathrm{err}}_{u,i} = L(g(x_{u,i}, \phi), r_{u,i})$ is the loss of the imputation model. It has been proven that, with the help of the auxiliary imputation model, the DR method can avoid out-of-bounds predictions for the CVR and reduce the variance compared with the IPS method.
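The three losses (1) to (3) can be compared on a handful of numbers. The following is a minimal sketch in plain Python that evaluates them as printed, on observed pairs only (so $o_{u,i} = 1$ throughout). The function names and the toy errors, imputed errors, and propensities are illustrative assumptions, not values from the experiments.

```python
def naive_loss(errors):
    """Sample loss (1): plain average of the errors of the observed pairs."""
    return sum(errors) / len(errors)


def ips_loss(errors, propensities):
    """IPS loss (2): each observed error is weighted by 1 / p_hat."""
    weighted = [e / p for e, p in zip(errors, propensities)]
    return sum(weighted) / len(errors)


def dr_loss(errors, imputed_errors, propensities):
    """DR loss (3) as printed, with o = 1 for every observed pair."""
    total = 0.0
    for e, e_hat, p in zip(errors, imputed_errors, propensities):
        total += e / p + (1.0 - 1.0 / p) * e_hat
    return total / len(errors)


# Toy values (assumed for illustration only).
errors = [0.2, 0.9, 0.4]          # err_{u,i} of the prediction model
imputed = [0.25, 0.8, 0.4]        # imputed errors from g(x, phi)
propensities = [0.5, 0.1, 0.8]    # estimated propensity scores p_hat

print("naive:", naive_loss(errors))
print("IPS:  ", ips_loss(errors, propensities))
print("DR:   ", dr_loss(errors, imputed, propensities))
```

The pair with propensity 0.1 dominates the IPS value, which illustrates the variance problem of inverse weighting. In the DR loss, the imputed error offsets most of that inflated term; when the imputed errors equal the actual errors, the DR value collapses to the naïve average.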

3.2. Delayed Feedback and Survival Analysis

In the presence of delayed feedback, the naïve method mistakenly assigns negative labels to those samples whose observation times are too short to capture the eventual conversion status. Previous studies [18] have pointed out that this issue is closely related to survival analysis with censored times. In survival analysis, researchers are interested in the time until a predefined event occurs, such as the resurgence of a disease after a treatment or the onset of symptoms after an infection. To this end, estimating the period between the starting time (time of treatment or infection) and the event time, called the survival time, is of great importance. The survival time is always positive, so an ordinary normal distribution is not appropriate; moreover, some survival times are not fully observed (censored) because the actual survival time is longer than the observation time. These issues make survival analysis different from general regression or classification problems.

In survival analysis, assuming that the survival time $T$ ($T > 0$) follows a distribution with probability density function $f(t)$, the cumulative distribution function is denoted by $F(t) = \int_{0}^{t} f(s)\,ds$. The survival function is the probability of the survival time being longer than time $t$, in other words, the probability that the event does not occur before time $t$. The survival function is denoted by $S(t) = \Pr(T > t) = \int_{t}^{+\infty} f(s)\,ds = 1 - F(t)$. We denote the observation time (or censoring time in traditional survival analysis) by $C$. The probability that the event does not occur before the observation time is $S(C)$. The hazard function $\lambda(t)$, which is the instantaneous rate of the event occurring at time $t$ conditional on it not having occurred before time $t$, plays an important role in survival analysis. It can be computed as $\lambda(t) = f(t) / S(t)$. The functions mentioned above can all be expressed in terms of $\lambda(t)$, that is,

$$\begin{aligned} S(t) &= \exp\left(-\int_{0}^{t} \lambda(s)\,ds\right), \\ F(t) &= 1 - S(t) = 1 - \exp\left(-\int_{0}^{t} \lambda(s)\,ds\right), \\ f(t) &= \lambda(t) S(t) = \lambda(t) \exp\left(-\int_{0}^{t} \lambda(s)\,ds\right). \end{aligned} \tag{4}$$

By designing $\lambda(t)$, it is possible to consider multiple types of survival time distributions, including exponential, gamma, Weibull, log-normal, etc. The simplest is the exponential distribution, where $\lambda(t) = \lambda$ is a constant. Though simple, the exponential distribution fits real-world survival times well. Here, we apply the exponential distribution for delayed feedback modeling and show its advantage in CVR prediction with implicit and delayed feedback without the need for a more sophisticated delayed feedback model.
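For the exponential case, the relations in (4) reduce to closed forms that are easy to check numerically. The sketch below is an illustration only: the function names and the values of `lam` and `t` are assumptions chosen for the example.

```python
import math


def hazard(t, lam):
    """Constant hazard of the exponential distribution (independent of t)."""
    return lam


def survival(t, lam):
    """S(t) = exp(-integral_0^t lam ds) = exp(-lam * t)."""
    return math.exp(-lam * t)


def cdf(t, lam):
    """F(t) = 1 - S(t)."""
    return 1.0 - survival(t, lam)


def density(t, lam):
    """f(t) = hazard(t) * S(t) = lam * exp(-lam * t)."""
    return hazard(t, lam) * survival(t, lam)


lam, t = 0.5, 2.0

# Midpoint-rule integral of the density over [0, t]; it should match F(t).
steps = 10_000
width = t / steps
numeric_cdf = sum(density((k + 0.5) * width, lam) for k in range(steps)) * width

print("S(t) =", survival(t, lam))
print("F(t) =", cdf(t, lam), "numeric:", numeric_cdf)
print("f(t) / S(t) =", density(t, lam) / survival(t, lam))  # recovers lam
```

In the delayed feedback setting, `survival(e, lam)` is the probability that a pair which will eventually convert has not yet converted after an elapsed time `e`.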

4. Materials and Methods

The goal of the proposed method is to predict the CVR of a user–item pair given its features. We denote the prediction model by $f(x_{u,i}, \theta)$. Overall, we realize this goal by assuming a parametric model for the delayed feedback and maximizing a newly proposed likelihood function given the parametric model and all the observed data. Directly optimizing the sample loss (1) can lead to biased parameter estimates and a biased CVR prediction model due to selection bias and delayed feedback. We revise the loss (1) to tackle these two problems simultaneously. First, we assume an exponential model for the delayed feedback and use it to infer the probability of a conversion being observed or unobserved given its real status. In comparison, the naïve approach assumes that the observed status of the conversions equals the actual status for all user–item pairs, which does not hold in the presence of delayed feedback. Second, we incorporate an IPS/DR module to tackle the selection bias, which makes the estimated parameters robust against selection bias. We optimize the modified likelihood with multi-task learning to obtain a less biased CVR prediction model.

In the following, we use uppercase letters for random variables and lowercase letters for observed data. The variables in the model are provided below (note that the subscripts $(u, i)$ are omitted in some cases for simplicity):

$X$: Set of features, including the features of the user and those of the item.
$O \in \{0, 1\}$: Random variable indicating whether the user–item pair is observed in the dataset.
$C \in \{0, 1\}$: Random variable indicating whether a conversion will finally happen.
$D \geq 0$: Delay between click and conversion.
$E \geq 0$: Observation time (elapsed time) after the click.
$R \in \{0, 1\}$: Variable indicating whether the conversion occurs during the observation time.

As an example, $O(u, i)$ represents the random variable indicating whether user–item pair $(u, i)$ is observed, and $o_{u,i}$ represents the realization of $O(u, i)$ in the dataset.

In addition, there is a period between the item being displayed and the click. In most cases, the click occurs just a few seconds after the display and well within the observation time; thus, we regard it as a momentary event. The connection between these variables plays a key role in describing the model. If a conversion is observed, then the conversion does eventually happen and the delay between the click and the conversion does not exceed the observation time, that is,

$$R = 1 \iff C = 1,\ D \leq E. \tag{5}$$

If a conversion has not been observed, this may be attributed to two possible cases: either a conversion will never happen, or the observation time is too short to capture the delayed feedback, that is,

$$R = 0 \iff C = 0 \ \text{or} \ (C = 1,\ D > E). \tag{6}$$

We can observe $R$ if and only if $O = 1$; thus, some $r_{u,i} = R(u, i)$ are missing. We assume independence between $O$ and $(C, D, E)$ given the features $X$, that is,

$$\Pr(C, D, E \mid X, O = 1) = \Pr(C, D, E \mid X). \tag{7}$$

This assumption corresponds to the ignorability assumption in causal inference [40], and is often made to guarantee identifiability. We also assume independence of the pair $(C, D)$ and the observation time $E$ given the set of features $X$, i.e.,

$$\Pr(C, D \mid X, E) = \Pr(C, D \mid X). \tag{8}$$

This assumption holds, as the analysis system decides the observation time irrespective of the nature of the users or items; in contrast, the random variables $(C, D)$ are determined by the intrinsic nature of the user–item interactions.

From the datasets, the conversions, features, and observation times for the observed user–item pairs are given. Following the previous section, we denote the whole set of user–item pairs by $\mathcal{D}$ and the set of observed user–item pairs by $\mathcal{O}$. The observed datasets consist of triplets $(x_{u,i}, r_{u,i}, e_{u,i})$ for $(u, i) \in \mathcal{O}$. For those with $r_{u,i} = 1$, the delay time from click to conversion $d_{u,i}$ is given, forming an alternative triplet $(x_{u,i}, r_{u,i}, d_{u,i})$. To fit the data, we first derive the probabilities of $R(u, i) = r_{u,i} = 1, D(u, i) = d_{u,i}$ and of $R(u, i) = r_{u,i} = 0, E(u, i) = e_{u,i}$ given $X(u, i) = x_{u,i}$ and the parameters of the models, which are the foundation of the overall likelihood function. First, according to (5), we have

$$\begin{aligned} & \Pr(r_{u,i} = 1, D(u, i) = d_{u,i} \mid X(u, i) = x_{u,i}) \\ ={} & \Pr(D(u, i) = d_{u,i}, C(u, i) = 1 \mid X(u, i) = x_{u,i}) \\ ={} & \Pr(D(u, i) = d_{u,i} \mid X(u, i) = x_{u,i}, C(u, i) = 1) \Pr(C(u, i) = 1 \mid X(u, i) = x_{u,i}), \end{aligned} \tag{9}$$

and according to (6),

$$\begin{aligned} & \Pr(r_{u,i} = 0 \mid X(u, i) = x_{u,i}, E(u, i) = e_{u,i}) \\ ={} & \Pr(D(u, i) > e_{u,i}, C(u, i) = 1 \mid X(u, i) = x_{u,i}, E(u, i) = e_{u,i}) \\ & + \Pr(C(u, i) = 0 \mid X(u, i) = x_{u,i}, E(u, i) = e_{u,i}). \end{aligned} \tag{10}$$

We use two models to fit the observed data: a model for $\Pr(D(u, i) = d_{u,i} \mid X(u, i) = x_{u,i}, C(u, i) = 1)$ and a model for $\Pr(C(u, i) = 1 \mid X(u, i) = x_{u,i})$. The first is the distribution of the delay time and the second is the CVR prediction model, which is the main focus of our problem. We denote the hazard function by $\lambda(x, t)$ ($t \geq 0$). By changing $\lambda(x, t)$, it is possible to obtain multiple parametric models for the delayed feedback, e.g., the Weibull, gamma, log-normal, and log-logistic distributions. The most common is the exponential distribution, for which $\lambda(x, t) = \lambda(x)$ is time-invariant. Without loss of generality, we assume that the delayed feedback follows this simplest exponential distribution. Thus, according to (4),

$$\Pr(D(u, i) = d_{u,i} \mid X(u, i) = x_{u,i}, C(u, i) = 1) = \lambda(x_{u,i}) \exp(-\lambda(x_{u,i})\, d_{u,i}) \tag{11}$$

and

$$\Pr(D(u, i) > e_{u,i} \mid X(u, i) = x_{u,i}, C(u, i) = 1) = S(x_{u,i}, e_{u,i}) = \exp(-\lambda(x_{u,i})\, e_{u,i}). \tag{12}$$

We model the hazard function $\lambda(x_{u,i})$ with a modified linear regression, that is, $\lambda(x_{u,i}) = \max(0, \beta \cdot x_{u,i})$, where $\beta$ is an unknown parameter to be estimated. To construct the CVR prediction model $\Pr(C(u, i) = 1 \mid X(u, i) = x_{u,i})$, we use matrix factorization, which assumes latent embeddings denoted by $H_{u}$ for user $u \in \mathcal{U}$ and $W_{i}$ for item $i$. The probability of conversion for user–item pair $(u, i)$ is denoted by $\hat{r}_{u,i} = 1 / [1 + \exp(-H_{u} \cdot W_{i})]$. Then, we have

$$\begin{aligned} \Pr(C(u, i) = 1 \mid X(u, i) = x_{u,i}) &= \hat{r}_{u,i}, \\ \Pr(C(u, i) = 0 \mid X(u, i) = x_{u,i}) &= 1 - \hat{r}_{u,i}. \end{aligned} \tag{13}$$
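The two model components above can be written down directly. The following is a minimal sketch in plain Python; the helper names, embedding values, and feature values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_cvr(h_u, w_i):
    """MF-based conversion probability: sigmoid of the embedding inner product."""
    return 1.0 / (1.0 + math.exp(-dot(h_u, w_i)))


def hazard_rate(beta, x):
    """Hazard of the exponential delay model: max(0, beta . x)."""
    return max(0.0, dot(beta, x))


h_u = [0.5, -1.0]     # toy user embedding (assumed)
w_i = [2.0, 1.0]      # toy item embedding (assumed)
beta = [0.2, 0.1]     # toy hazard coefficients (assumed)
x_ui = [1.0, 3.0]     # toy user-item features (assumed)

print(predict_cvr(h_u, w_i))             # inner product is 0, so the CVR is 0.5
print(hazard_rate(beta, x_ui))           # positive linear score is kept
print(hazard_rate([-1.0, 0.0], x_ui))    # negative linear score is clipped to 0
```

One practical point this makes visible: when the linear score is clipped to exactly zero, the logarithm of the hazard that appears in a log-likelihood is undefined. An implementation would presumably need a small positive floor or a smooth alternative; that detail is an assumption here and is not stated in the text.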

According to Equations (9) to (13), the loss function of a single user–item pair $(u, i)$ is provided by the modified cross-entropy loss, which is also the negative log-likelihood function:

$$\begin{aligned} \mathrm{err}_{u,i} ={} & -r_{u,i} \log[\Pr(r_{u,i} = 1, D(u, i) = d_{u,i} \mid X(u, i) = x_{u,i})] \\ & - (1 - r_{u,i}) \log[\Pr(r_{u,i} = 0 \mid X(u, i) = x_{u,i}, E(u, i) = e_{u,i})] \\ ={} & -r_{u,i} \log[\hat{r}_{u,i}\, \lambda(x_{u,i}) \exp(-\lambda(x_{u,i})\, d_{u,i})] \\ & - (1 - r_{u,i}) \log[\hat{r}_{u,i} \exp(-\lambda(x_{u,i})\, e_{u,i}) + 1 - \hat{r}_{u,i}]. \end{aligned} \tag{14}$$
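The per-pair loss can be sketched as a small function. The version below takes the loss to be the negative log-likelihood, as the text describes, and plugs in the exponential expressions (11) and (12); the function name `pair_loss` and the numeric inputs are assumptions made for illustration.

```python
import math


def pair_loss(r, r_hat, lam, d=None, e=None):
    """Negative log-likelihood of one observed user-item pair.

    r     : observed label (1 = converted within the observation time)
    r_hat : predicted eventual conversion probability
    lam   : hazard of the exponential delay model (must be > 0)
    d     : delay from click to conversion, used when r == 1
    e     : elapsed observation time, used when r == 0
    """
    if r == 1:
        # Converted: log[r_hat * lam * exp(-lam * d)].
        return -(math.log(r_hat) + math.log(lam) - lam * d)
    # Not converted yet: either it never will, or the delay exceeds e.
    return -math.log(r_hat * math.exp(-lam * e) + 1.0 - r_hat)


# A negative seen right after the click carries almost no information ...
print(pair_loss(0, r_hat=0.3, lam=1.0, e=0.0))
# ... while an old negative is treated almost like a true negative.
print(pair_loss(0, r_hat=0.3, lam=1.0, e=1000.0), -math.log(0.7))
# A positive contributes both the CVR term and the delay density.
print(pair_loss(1, r_hat=0.5, lam=2.0, d=0.5))
```

The first two calls show how the elapsed time modulates the penalty on an unconverted pair, which is the mechanism that keeps recent clicks from being counted as confident negatives.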

We propose two loss functions based on the IPS and DR approaches. For the one incorporating the IPS approach, we obtain the joint loss function

$$L_{IPS}(\theta, \beta) = \frac{1}{|\mathcal{O}|} \sum_{(u,i) \in \mathcal{O}} \frac{o_{u,i}\, \mathrm{err}_{u,i}}{\hat{p}_{u,i}},$$

where $\mathrm{err}_{u,i}$ refers to (14). The parameters are optimized by minimizing the regularized loss function

$$\arg\min_{\theta, \beta}\ L_{IPS}(\theta, \beta) + \frac{\gamma}{2} \|\theta\|_{2}^{2} + \frac{\mu}{2} \|\beta\|_{2}^{2}, \tag{15}$$

where $(\gamma, \mu)$ are regularization parameters. We call this method MF-IPS+DeF.
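To see what the delay term in the loss buys, the following toy sketch fits a deliberately simplified version of the joint objective: a single scalar conversion probability and a single scalar hazard shared by all pairs, all propensities set to 1, no regularization, and a grid search in place of gradient-based training. All of these simplifications, the simulated data, and the variable names are assumptions for illustration; this is not the training procedure used in the experiments.

```python
import math
import random

random.seed(0)
TRUE_CVR, TRUE_LAM, WINDOW, N = 0.4, 1.0, 4.0, 2000

# Simulate delayed labels: a pair is a positive only if it converts
# and the delay does not exceed the elapsed observation time.
delays_pos = []    # delays d of pairs with r = 1
elapsed_neg = []   # elapsed times e of pairs with r = 0
for _ in range(N):
    e = random.uniform(0.0, WINDOW)
    converts = random.random() < TRUE_CVR
    d = random.expovariate(TRUE_LAM)
    if converts and d <= e:
        delays_pos.append(d)
    else:
        elapsed_neg.append(e)

# Naive estimate: treat every unconverted pair as a true negative.
naive_cvr = len(delays_pos) / N

N_POS = len(delays_pos)
SUM_DELAYS = sum(delays_pos)


def total_loss(r_hat, lam):
    """Average negative log-likelihood over all simulated pairs."""
    loss = -(N_POS * (math.log(r_hat) + math.log(lam)) - lam * SUM_DELAYS)
    for e in elapsed_neg:
        loss -= math.log(r_hat * math.exp(-lam * e) + 1.0 - r_hat)
    return loss / N


# Coarse grid search over the two scalar parameters.
grid_r = [0.05 + 0.025 * k for k in range(37)]   # 0.05 .. 0.95
grid_lam = [0.2 + 0.1 * k for k in range(29)]    # 0.2 .. 3.0
_, mle_cvr, mle_lam = min(
    (total_loss(r, lam), r, lam) for r in grid_r for lam in grid_lam
)

print("true CVR:", TRUE_CVR)
print("naive estimate:", naive_cvr)
print("delay-aware estimate:", mle_cvr, "with hazard", mle_lam)
```

In this setup the naïve estimate is pulled below the true rate because recent clicks have not had time to convert, while the delay-aware fit is expected to land closer to it. Adding the inverse propensity weights and the regularization terms of (15) would change the per-pair weights and the objective value but not this basic mechanism.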

For the loss function incorporating the DR approach, we assume an additional imputation model, which is also an MF model, denoted by $\widehat{rr}_{u,i} = g(x_{u,i}, \phi)$. Thus, $\widehat{\mathrm{err}}_{u,i}$ is provided by the binary cross-entropy loss, which is also the negative log-likelihood function:

$$\widehat{\mathrm{err}}_{u,i} = -\widehat{rr}_{u,i} \log(\hat{r}_{u,i}) - (1 - \widehat{rr}_{u,i}) \log(1 - \hat{r}_{u,i}). \tag{16}$$

We then obtain the joint loss

$$L_{DR}(\theta, \beta, \phi) = \frac{1}{|\mathcal{O}|} \sum_{(u,i) \in \mathcal{O}} \left[ \frac{o_{u,i}\, \mathrm{err}_{u,i}}{\hat{p}_{u,i}} + \left(1 - \frac{o_{u,i}}{\hat{p}_{u,i}}\right) \widehat{\mathrm{err}}_{u,i} \right],$$

where $\mathrm{err}_{u,i}$ refers to (14) and $\widehat{\mathrm{err}}_{u,i}$ refers to (16). The parameters are obtained by minimizing the loss with regularization:

$$\arg\min_{\theta, \beta, \phi}\ L_{DR}(\theta, \beta, \phi) + \frac{\gamma}{2} \|\theta\|_{2}^{2} + \frac{\mu}{2} \|\beta\|_{2}^{2} + \frac{\tau}{2} \|\phi\|_{2}^{2}, \tag{17}$$

where $(\gamma, \mu, \tau)$ are regularization parameters. We jointly learn the parameters in the CVR prediction model and the imputation model using the multi-task learning technique. We call this method MF-DR-JL+DeF.

5. Results

This section presents the results of our experiments comparing the proposed methods and competing methods for predicting the CVR on two real-world datasets.

5.1. Dataset Preparation

Following previous studies [14,33,34], we conducted real-world experiments on the widely used Coat [12] and Yahoo [41] benchmark datasets. Both datasets consist of a training set and a test set, with the two sets in each dataset sharing the same users and items. The training sets suffer from selection bias, while the test sets are unbiased. The Coat dataset consists of 6960 user–item pairs in the training set and 4640 user–item pairs in the test set, with 290 users and 300 items; each user–item pair is represented by a rating from 1 to 5. We binarized the ratings using a threshold of 3, assigning 0 to ratings lower than 3 and 1 otherwise. The Yahoo dataset consists of 311,704 user–item pairs in the training set and 54,000 in the test set. Its ratings are also on a five-point scale, and we binarized them in the same way.

We simulated the delayed feedback as follows. First, we extracted the features of users and items through a basic MF method applied to the training set. We concatenated the features of users and items to represent the features of user–item pairs $x_{u,i}$, then fixed the overall training period $L$ as a constant. The user may click an item at a random timestamp $t_{u,i}^{click}$ within the training period. We sampled $t_{u,i}^{click}$ from a uniform distribution $\mathrm{Unif}(0, L)$, so that the observation time (or elapsed time) was $E_{u,i} = L - t_{u,i}^{click}$. We assumed that the delay time followed an exponential distribution $D_{u,i} \sim \mathrm{Exp}(\lambda(x_{u,i}))$ with hazard function $\lambda(x_{u,i}) = \exp(W_{d} \cdot x_{u,i})$, where $W_{d}$ is randomly sampled from a prior multivariate normal distribution $N(0_{p}, \sigma_{H}^{2} I_{p})$ and $\sigma_{H}$ is a hyperparameter. In the presence of delayed feedback, the positive user–item pairs with $D_{u,i} > E_{u,i}$ are mistakenly labeled as negative samples. We set the labels of these samples to 0 in the training set, and set $L = 5$ and $\sigma_{H} = 0.1$ in the basic setup, so that around 20% of the positive samples in the training set were mislabeled as 0 due to delayed feedback. We trained the proposed method and competing methods on the modified training set, then tested them on the unbiased test set.
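The simulation steps above can be sketched as follows. This is an illustration under stated assumptions: the features here are random Gaussian vectors rather than MF embeddings learned from the training set, the true labels are synthetic, and the function name `simulate_delayed_labels` is hypothetical.

```python
import math
import random


def simulate_delayed_labels(true_labels, features, w_d, train_period, rng):
    """Return (observed_label, elapsed_time, delay_or_None) for each pair."""
    records = []
    for c, x in zip(true_labels, features):
        click_time = rng.uniform(0.0, train_period)      # t_click ~ Unif(0, L)
        elapsed = train_period - click_time              # E = L - t_click
        lam = math.exp(sum(w * v for w, v in zip(w_d, x)))  # exp(W_d . x)
        delay = rng.expovariate(lam)                     # D ~ Exp(lam)
        # A true positive stays positive only if it converts in time.
        observed = 1 if (c == 1 and delay <= elapsed) else 0
        records.append((observed, elapsed, delay if observed else None))
    return records


rng = random.Random(0)
n_pairs, n_features = 10_000, 4          # toy sizes (assumed)
train_period, sigma_h = 5.0, 0.1         # L = 5 and sigma_H = 0.1, as in the text

features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_pairs)]
true_labels = [1 if rng.random() < 0.5 else 0 for _ in range(n_pairs)]
w_d = [rng.gauss(0.0, sigma_h) for _ in range(n_features)]

records = simulate_delayed_labels(true_labels, features, w_d, train_period, rng)

n_pos = sum(true_labels)
n_flipped = sum(1 for c, rec in zip(true_labels, records) if c == 1 and rec[0] == 0)
print("share of positives mislabeled as 0:", n_flipped / n_pos)
```

With hazards close to 1 and elapsed times uniform on $(0, 5)$, the expected share of flipped positives is roughly $(1 - e^{-5})/5 \approx 0.2$, which is consistent with the figure of around 20% quoted in the text.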

5.2. Training Details

For the proposed MF-IPS+DeF and MF-DR-JL+DeF methods, we minimized the loss function with the Adam optimizer. We tuned the learning rate in $\{0.01, 0.015, 0.02, 0.03, 0.05, 0.1\}$ and the weight decay in $[1 \times 10^{-6}, 5 \times 10^{-2}]$. Training was stopped if there were five epochs with a relative loss reduction lower than $1 \times 10^{-5}$. For the proposed method with DR, we used separate weight decay rates for the CVR prediction model/imputation model and for the delayed feedback model, and tuned them separately. The batch size was set to 128 for the Coat dataset and 256 for the Yahoo dataset. All results and figures were generated with Python code. We trained the models with Python 3.11.9 and PyTorch 2.3.0. The figures were generated with matplotlib 3.8.4.
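The stopping rule can be sketched as a small helper. The text does not say whether the five low-improvement epochs must be consecutive; the version below assumes they are, and the function name `should_stop` and its arguments are hypothetical.

```python
def should_stop(loss_history, patience=5, tol=1e-5):
    """Return True if the last `patience` epochs each reduced the loss
    by less than `tol` in relative terms (assumed to be consecutive)."""
    if len(loss_history) < patience + 1:
        return False
    recent = loss_history[-(patience + 1):]
    for prev, curr in zip(recent, recent[1:]):
        rel_reduction = (prev - curr) / max(abs(prev), 1e-12)
        if rel_reduction >= tol:
            return False   # at least one recent epoch still improved enough
    return True


print(should_stop([1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]))   # five flat epochs
print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5]))        # still improving
```

In a training loop, the per-epoch loss would be appended to `loss_history` and the loop exited once the helper returns `True`.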

5.3. Baselines

We want to show the performance of the proposed methods in the presence of both implicit and delayed feedback. The baseline methods for comparison therefore include methods that address selection bias through the IPS or DR approach while ignoring the delayed feedback, and methods that capture the delayed feedback with parametric models while neglecting the selection bias. As our proposed CVR prediction model uses MF as the base model, we compare it with other MF-based methods, namely MF [10], MF-IPS [12], MF-ASIPS [31], MF-SNIPS [12], MF-DR [31], MF-DR-JL [14], and MF-MRDR-JL [32]. MF-IPS, MF-ASIPS, and MF-SNIPS are variants of the base MF method with basic or advanced IPS approaches, while MF-DR, MF-DR-JL, and MF-MRDR-JL are variants of the base MF method that incorporate DR techniques; all of these methods overlook the issue of delayed feedback. We also compared the proposed approach with delayed feedback models that assume different distributions for the delay time. We used the method in [18], called the DFM, as the base model. In [18], the authors proposed modeling the delay time with an exponential distribution and used a multi-layer perceptron (MLP) to learn the hazard function. The Weibull and log-normal distributions, which are popular in survival analysis, were also used to model the delay time. To distinguish them, these variants are referred to as DFM-EXP, DFM-Weibull, and DFM-LN. All three of these methods are designed to capture the delayed feedback, but neglect the impact of selection bias in the datasets.

5.4. Evaluation Protocols

We adopted three widely used metrics for evaluation: AUC, NDCG@K, and Recall@K. AUC stands for the area under the receiver operating characteristic curve, a metric widely used in diagnostic studies to evaluate the accuracy of a diagnostic test by synthesizing the pairs of sensitivity and specificity across all cutoff points of the test's continuous biomarker. NDCG@K stands for the normalized discounted cumulative gain at K, which compares the top K predicted items for each user with the actual preference of that user. Recall@K aggregates and averages the proportion of correctly specified items among the top K predicted items for each user. Both NDCG@K and Recall@K are useful for comparing the ranking performance of different algorithms. We set $K = 10$ for Coat and $K = 5$ for Yahoo. Each model was trained for ten rounds, and we report the mean and standard error as the final results.
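
The three metrics can be sketched for a single user as follows. This is an illustrative pure-Python version under common textbook definitions, not the evaluation code used for the reported results: the function names are assumptions, labels are taken to be binary, and Recall@K is normalized here by the number of relevant items for the user, which may differ from the exact normalization used in the experiments.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _top_k(scores, k):
    """Indices of the k highest-scored items."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def ndcg_at_k(labels, scores, k):
    """Discounted gain of the top-k ranking divided by the best achievable gain."""
    dcg = sum(labels[i] / math.log2(rank + 2) for rank, i in enumerate(_top_k(scores, k)))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum(y / math.log2(rank + 2) for rank, y in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(labels, scores, k):
    """Share of the user's relevant items that appear in the top k."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    return sum(labels[i] for i in _top_k(scores, k)) / n_pos


# Hypothetical single-user example with two relevant items out of four.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
user_auc = auc(labels, scores)
user_ndcg = ndcg_at_k(labels, scores, 2)
user_recall = recall_at_k(labels, scores, 2)
```

Per-user NDCG@K and Recall@K values would then be averaged over users to obtain dataset-level scores.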

5.5. Performance Comparisons

The prediction performance of the proposed methods and the baselines on the two datasets is shown in Table 1. Among the baselines, the DR-based methods perform better than the IPS-based methods and the delayed feedback models. For the Coat dataset, the MF-DR model obtains the highest AUC. For the Yahoo dataset, MF-MRDR-JL, which jointly learns both the imputation model and the prediction model, achieves the best AUC. Both use the DR approach to eliminate the selection bias, indicating the effectiveness of this approach. The DFM model, which incorporates the delayed feedback into CVR prediction, improves the AUC of MLP by around 0.02 and largely enhances NDCG@K and Recall@K, suggesting the importance of training a delayed feedback model and including it in the loss function. In contrast, the DFM-based methods, which consider the delayed feedback but ignore the selection bias, perform significantly worse than the state-of-the-art (SOTA) MF-based baseline methods and the proposed methods, indicating that it is particularly important to adjust for the selection bias when addressing CVR prediction with implicit feedback. The proposed approach significantly outperforms the SOTA baselines on all evaluation metrics. The proposed MF-DR-JL+DeF method improves upon the AUC of the MF-DR method by 5.7% on the Coat dataset, and the proposed MF-IPS+DeF method improves upon the AUC of the MF-MRDR-JL method by 3.7% on the Yahoo dataset. While the MF-IPS method performs significantly worse than the SOTA baselines, the proposed MF-IPS+DeF method greatly improves upon its performance. From this perspective, the delayed feedback module plays an important role in accurate CVR prediction; indeed, both the delayed feedback module and the adjustment for selection bias are indispensable.
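
The quoted AUC improvements are relative gains over the best baseline, which can be checked directly against the mean AUC values in Table 1. The helper name `relative_gain` is illustrative.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old


# Mean AUC values taken from Table 1.
coat_gain = relative_gain(0.724, 0.685)   # MF-DR-JL+DeF vs. MF-DR on Coat
yahoo_gain = relative_gain(0.652, 0.629)  # MF-IPS+DeF vs. MF-MRDR-JL on Yahoo

print(f"Coat: {coat_gain:.1%}, Yahoo: {yahoo_gain:.1%}")
# prints "Coat: 5.7%, Yahoo: 3.7%"
```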

5.6. Sensitivity Analysis

To comprehensively evaluate the proposed method, we conducted a sensitivity analysis with the proposed MF-IPS+DeF and MF-DR-JL+DeF methods on the Coat dataset and compared them with two baselines: the optimal DFM model (DFM-Weibull) and the optimal MF-based method (MF-DR). First, we varied the mislabeling ratio (MR), that is, the proportion of positive samples that are mislabeled as negative because their observation times are shorter than their delay times, from 10% to 30%. We set different MRs by changing the total observation time $L$; $L = 3.2, 3.92, 5, 6.7, 9.7$ correspond to $\mathrm{MR} = 10.02\%, 15.05\%, 19.66\%, 24.99\%, 30.12\%$, respectively, or approximately 10%, 15%, 20%, 25%, and 30%. The results are shown in Figure 1. The proposed methods consistently outperform the baselines in terms of AUC, NDCG@K, and Recall@K. Compared with the other two methods, MF-DR-JL+DeF shows a comparatively small decline on all metrics as the MR increases, suggesting its robustness. The DFM-Weibull method consistently performs the worst, and there is no significant variation in its performance across different MRs. This showcases the importance of adjusting for selection bias in CVR prediction with implicit feedback.

We additionally explored the effect of different distributions of the delayed feedback on the performance of the proposed methods, as shown in Figure 2. In addition to the exponential distribution, we set two other distributions (the Weibull and log-normal distributions) for the delay time. The MR was fixed to around 20%. Note that even for the other distributions, the proposed MF-IPS+DeF and MF-DR-JL+DeF methods only adopted the exponential distribution when constructing the loss function. The performance of the proposed methods under the different distributions shows no significant differences, with both methods consistently outperforming the baselines. This suggests the robustness of the proposals under misspecification of the delay time distribution.
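
One way to picture this setup is to draw delay times from each distribution and choose the observation window so that roughly 20% of true conversions fall outside it. The sketch below is a hypothetical simulation, not the procedure used to build the experimental datasets: the distribution parameters, the sample size, and the helper names are all assumptions, and the window is calibrated from an empirical quantile rather than from the observation time $L$ reported above.

```python
import random


def mislabeling_ratio(delays, window):
    """Share of true conversions whose delay exceeds the observation window."""
    return sum(d > window for d in delays) / len(delays)


def window_for_target_mr(delays, target_mr):
    """Pick the window as the empirical (1 - target_mr) quantile of the delays."""
    ordered = sorted(delays)
    idx = min(len(ordered) - 1, int((1.0 - target_mr) * len(ordered)))
    return ordered[idx]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000

# Arbitrary illustrative parameters for the three delay distributions.
samplers = {
    "exponential": lambda: rng.expovariate(1.0),
    "weibull": lambda: rng.weibullvariate(1.0, 1.5),  # scale, shape
    "log-normal": lambda: rng.lognormvariate(0.0, 1.0),
}

results = {}
for name, draw in samplers.items():
    delays = [draw() for _ in range(n)]
    window = window_for_target_mr(delays, 0.20)
    results[name] = mislabeling_ratio(delays, window)
```

Holding the MR fixed in this way isolates the effect of the distribution's shape from the effect of how many positives are mislabeled.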

5.7. Comments

From the results of the performance comparisons under the original setups and the sensitivity analysis, different parametric distributions of the delayed feedback have little impact on the predicted CVR. Thus, we recommend using the simplest model, the exponential distribution, for the delayed feedback. The proposed MF-DR-JL+DeF method is robust against an increasing mislabeling ratio caused by delayed feedback, while the other methods may not be as robust. In practice, we recommend the MF-DR-JL+DeF method for an unbiased and robust CVR prediction model in the presence of delayed and implicit feedback.

6. Discussion

We experimented with the proposed methods on two real-world datasets and compared the results with extensive baselines. Among the baseline methods, those with the best AUCs were all DR-based. Compared with the IPS approach, DR methods incorporate an imputation model to reduce the variance. Among the three baseline delayed feedback models, there were no significant differences in prediction performance. A possible reason for this result is that the actual delay times were generated from the simplest exponential distribution. Thus, delayed feedback models with more complex Weibull and log-normal distributions also capture the actual exponential distribution of the delay time. Therefore, we conducted a sensitivity analysis, varying the actual underlying distribution of the delayed feedback. The results show that the performance of these methods is consistent across different actual delayed feedback distributions. Thus, we recommend using the simplest exponential distribution to model the delayed feedback.
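
The difference between the IPS and DR approaches can be illustrated with the standard estimators of the average prediction error from the literature [12,14]. This is a toy sketch with assumed names and made-up numbers; it does not include the delayed feedback likelihood that the proposed loss adds on top of these estimators.

```python
def ips_loss(errors, observed, propensities):
    """IPS estimate of the average error over all user-item pairs."""
    total = sum(o * e / p for e, o, p in zip(errors, observed, propensities))
    return total / len(errors)


def dr_loss(errors, imputed, observed, propensities):
    """DR estimate: imputed error everywhere, plus an IPS-weighted
    correction on the observed pairs."""
    total = sum(
        e_hat + o * (e - e_hat) / p
        for e, e_hat, o, p in zip(errors, imputed, observed, propensities)
    )
    return total / len(errors)


# Toy data. Errors of unobserved pairs (observed == 0) are never used,
# because they are multiplied by zero in both estimators.
errors = [1.0, 2.0, 3.0, 4.0]
observed = [1, 0, 1, 0]
propensities = [0.5, 0.5, 0.5, 0.5]

ips = ips_loss(errors, observed, propensities)
dr_exact = dr_loss(errors, errors, observed, propensities)     # perfect imputation
dr_zero = dr_loss(errors, [0.0] * 4, observed, propensities)   # uninformative imputation
```

With an exact imputation, the DR estimate recovers the true mean error of 2.5 on this toy sample, whereas with an all-zero imputation it reduces to the IPS estimate of 2.0.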

We only used MF as the base model in the proposed method. In addition to the basic MF model, other methods such as BPR and NCF [11,42] can be applied to implicit feedback problems. These base models can be easily combined with the proposed method by adjusting the loss functions, and may further improve the CVR prediction accuracy. In addition, we only tried two approaches for tackling selection bias with implicit feedback: the basic IPS method, and the DR method with joint learning. Several more recent methods build on the basic IPS and DR methods to eliminate the selection bias, for example by balancing the covariates or proposing better propensity score estimators [34,43,44]. The proposed method can be extended to incorporate these methods for further improvement.

There are several limitations in the proposed method that must be addressed. First, the proposed method relies on the joint loss, which requires an explicit form of the distribution for the delayed feedback. The proposed models only use the exponential distribution, and show good performance even when the actual distribution is a Weibull or log-normal distribution. Nevertheless, the distribution of the delayed feedback may be complex and have multiple peaks or periodicity [21]. For example, users may view the purchase web pages and complete conversions in a certain period during the day, called the peak user hours. To this end, several methods have been proposed to capture the complex distribution of delayed feedback [35,36,37,38,39]. We intend to extend our proposed method in future work to more accurately model the delayed feedback. Second, we only considered static CVR prediction in the problem setups. Other topics that have attracted attention in recent times include multi-touch attribution settings and streaming CVR prediction [18,19,37,45,46]. In future work, it would be appealing to test the usefulness of the proposed method or its extensions for addressing these problems.

7. Conclusions

In this paper, we have proposed a novel method for debiasing CVR prediction in RS in the presence of delayed and implicit feedback. We tackle selection bias and mislabeling due to delayed feedback by devising a new likelihood-based loss function that jointly incorporates a model for the delay time and debiases the CVR prediction through the IPS technique and the DR method. We conducted extensive experiments on two real-world datasets, showing that the proposed methods can obtain state-of-the-art performance in the presence of implicit and delayed feedback. The results of our sensitivity analysis validate the robustness of the proposed MF-DR-JL+DeF model, which we recommend for practical use.

Author Contributions

Conceptualization, T.H. and X.-H.Z.; methodology, T.H.; software, T.H.; validation, T.H. and X.-H.Z.; data curation, T.H.; writing—original draft preparation, T.H.; writing—review and editing, T.H. and X.-H.Z.; supervision, X.-H.Z.; funding acquisition, X.-H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (No. 2021YFF0901400).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data and code presented in the study are openly available at https://github.com/Taojun-Hu/MF-DeF, accessed on 30 July 2024.

Acknowledgments

The authors would like to thank the two reviewers for their constructive comments, which greatly improved this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhao, X.; Gu, C.; Zhang, H.; Yang, X.; Liu, X.; Tang, J.; Liu, H. Dear: Deep reinforcement learning for online advertising impression in recommender systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 750–758.
2. Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 24.
3. Galeano, D.; Paccanaro, A. A recommender system approach for predicting drug side effects. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; IEEE: New York, NY, USA, 2018; pp. 1–8.
4. Oard, D.W.; Kim, J. Implicit feedback for recommender systems. In Proceedings of the AAAI Workshop on Recommendation Systems; AAAI Press: Madison, WI, USA, 1998; Volume 83, pp. 81–83.
5. Lee, J.w.; Park, S.; Lee, J. Dual unbiased recommender learning for implicit feedback. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 1647–1651.
6. Ding, X.; Tang, J.; Liu, T.; Xu, C.; Zhang, Y.; Shi, F.; Jiang, Q.; Shen, D. Infer implicit contexts in real-time online-to-offline recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2336–2346.
7. Dai, X.; Hou, J.; Liu, Q.; Xi, Y.; Tang, R.; Zhang, W.; He, X.; Wang, J.; Yu, Y. U-rank: Utility-oriented learning to rank with implicit feedback. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 2373–2380.
8. Tarun Kumar, I.; Karwa, L.; Shana, J. Designing a Recommender System for Articles Using Implicit Feedback. In Computer Vision and Robotics: Proceedings of CVR 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 15–23.
9. Liu, Q.; Li, H.; Ao, X.; Guo, Y.; Dong, Z.; Zhang, R.; Chen, Q.; Tong, J.; He, Q. Online Conversion Rate Prediction via Neural Satellite Networks in Delayed Feedback Advertising. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 1406–1415.
10. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
11. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
12. Schnabel, T.; Swaminathan, A.; Singh, A.; Chandak, N.; Joachims, T. Recommendations as treatments: Debiasing learning and evaluation. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1670–1679.
13. Saito, Y.; Yaginuma, S.; Nishino, Y.; Sakata, H.; Nakata, K. Unbiased recommender learning from missing-not-at-random implicit feedback. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 501–509.
14. Wang, X.; Zhang, R.; Sun, Y.; Qi, J. Doubly robust joint learning for recommendation on data missing not at random. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6638–6647.
15. Wang, X.; Zhang, R.; Sun, Y.; Qi, J. Combating selection biases in recommender systems with a few unbiased ratings. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 427–435.
16. Li, H.; Lyu, Y.; Zheng, C.; Wu, P. TDR-CL: Targeted doubly robust collaborative learning for debiased recommendations. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
17. Li, H.; Zheng, C.; Wu, P. StableDR: Stabilized doubly robust learning for recommendation on data missing not at random. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
18. Chapelle, O. Modeling delayed feedback in display advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 1097–1105.
19. Ji, W.; Wang, X.; Zhu, F. Time-aware conversion prediction. Front. Comput. Sci. 2017, 11, 702–716.
20. Safari, A.; Altman, R.M.; Loughin, T.M. Display advertising: Estimating conversion probability efficiently. arXiv 2017, arXiv:1710.08583.
21. Yoshikawa, Y.; Imai, Y. A Nonparametric Delayed Feedback Model for Conversion Rate Prediction. arXiv 2018, arXiv:1802.00255.
22. Ma, X.; Zhao, L.; Huang, G.; Wang, Z.; Hu, Z.; Zhu, X.; Gai, K. Entire space multi-task model: An effective approach for estimating post-click conversion rate. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 1137–1140.
23. Wang, H.; Chang, T.W.; Liu, T.; Huang, J.; Chen, Z.; Yu, C.; Li, R.; Chu, W. ESCM2: Entire space counterfactual multi-task model for post-click conversion rate estimation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 363–372.
24. Liu, Y.; Dai, H.; Li, B.; Li, J.; Yang, G.; Wang, J. ECMA: An efficient convoy mining algorithm for moving objects. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual, 1–5 November 2021; pp. 1089–1098.
25. Zhao, Y.; Yan, X.; Gui, X.; Han, S.; Sheng, X.R.; Yu, G.; Chen, J.; Xu, Z.; Zheng, B. Entire Space Cascade Delayed Feedback Modeling for Effective Conversion Rate Prediction. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 4981–4987.
26. Wang, Y.; Zhang, J.; Da, Q.; Zeng, A. Delayed feedback modeling for the entire space conversion rate prediction. arXiv 2020, arXiv:2011.11826.
27. Hu, Y.; Koren, Y.; Volinsky, C. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; IEEE: New York, NY, USA, 2008; pp. 263–272.
28. Wang, W.; Feng, F.; He, X.; Nie, L.; Chua, T.S. Denoising implicit feedback for recommendation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 373–381.
29. Zhao, J.; Wenjie, W.; Xu, Y.; Sun, T.; Feng, F.; Chua, T.S. Denoising diffusion recommender model. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 1370–1379.
30. Dai, Q.; Lv, Y.; Zhu, J.; Ye, J.; Dong, Z.; Zhang, R.; Xia, S.T.; Tang, R. LCD: Adaptive label correction for denoising music recommendation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 3903–3907.
31. Saito, Y. Asymmetric tri-training for debiasing missing-not-at-random explicit feedback. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 309–318.
32. Guo, S.; Zou, L.; Liu, Y.; Ye, W.; Cheng, S.; Wang, S.; Chen, H.; Yin, D.; Chang, Y. Enhanced doubly robust learning for debiasing post-click conversion rate estimation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 275–284.
33. Chen, J.; Dong, H.; Qiu, Y.; He, X.; Xin, X.; Chen, L.; Lin, G.; Yang, K. AutoDebias: Learning to debias for recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 21–30.
34. Li, H.; Zheng, C.; Xiao, Y.; Wu, P.; Geng, Z.; Chen, X.; Cui, P. Debiased collaborative filtering with kernel-based causal balancing. In Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
35. Yasui, S.; Morishita, G.; Komei, F.; Shibata, M. A feedback shift correction in predicting conversion rates under delayed feedback. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2740–2746.
36. Ktena, S.I.; Tejani, A.; Theis, L.; Myana, P.K.; Dilipkumar, D.; Huszár, F.; Yoo, S.; Shi, W. Addressing delayed feedback for continuous training with neural networks in CTR prediction. In Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, 16–20 September 2019; pp. 187–195.
37. Yang, J.Q.; Li, X.; Han, S.; Zhuang, T.; Zhan, D.C.; Zeng, X.; Tong, B. Capturing delayed feedback in conversion rate prediction via elapsed-time sampling. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 19–21 May 2021; Volume 35, pp. 4582–4589.
38. Gu, S.; Sheng, X.R.; Fan, Y.; Zhou, G.; Zhu, X. Real negatives matter: Continuous training with real negatives for delayed feedback modeling. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual, 14–18 August 2021; pp. 2890–2898.
39. Chen, Y.; Jin, J.; Zhao, H.; Wang, P.; Liu, G.; Xu, J.; Zheng, B. Asymptotically unbiased estimation for delayed feedback modeling via label correction. In Proceedings of the ACM Web Conference 2022, Virtual, 25–29 April 2022; pp. 369–379.
40. Imbens, G.W.; Rubin, D.B. Causal Inference in Statistics, Social, and Biomedical Sciences; Cambridge University Press: Cambridge, UK, 2015.
41. Marlin, B.M.; Zemel, R.S. Collaborative prediction and ranking with non-random missing data. In Proceedings of the Third ACM Conference on Recommender Systems, New York, NY, USA, 23–25 October 2009; pp. 5–12.
42. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461.
43. Sant’Anna, P.H.; Song, X.; Xu, Q. Covariate distribution balance via propensity scores. J. Appl. Econom. 2022, 37, 1093–1120.
44. Li, H.; Xiao, Y.; Zheng, C.; Wu, P.; Cui, P. Propensity matters: Measuring and enhancing balancing for recommendation. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 20182–20194.
45. Zhang, Y.; Wei, Y.; Ren, J. Multi-touch attribution in online advertising with survival theory. In Proceedings of the 2014 IEEE International Conference on Data Mining, Washington, DC, USA, 14–17 December 2014; IEEE: New York, NY, USA, 2014; pp. 687–696.
46. Ji, W.; Wang, X.; Zhang, D. A probabilistic multi-touch attribution model for online advertising. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 1373–1382.

Figure 1. Effects of the mislabeling ratio (%) on AUC, NDCG@10, and Recall@10 on the Coat dataset.

Figure 2. Effects of different distributions of the delay time in terms of AUC, NDCG@10, and Recall@10 on the Coat dataset.

Table 1. Performance in terms of AUC, NDCG@K, and Recall@K on Coat and Yahoo. The best two results are bolded; the best baseline results for DFM methods with delay feedback and IPS/DR-based methods are underlined.

	Coat			Yahoo
Method	AUC	NDCG@10	RECALL@10	AUC	NDCG@5	RECALL@5
MLP	0.617 $\pm 0.001$	0.505 $\pm 0.000$	0.551 $\pm 0.001$	0.573 $\pm 0.006$	0.569 $\pm 0.009$	0.336 $\pm 0.009$
DFM-EXP	0.632 $\pm 0.000$	0.605 $\pm 0.000$	0.645 $\pm 0.000$	0.598 $\pm 0.001$	0.645 $\pm 0.001$	0.413 $\pm 0.001$
DFM-Weibull	0.633 $\pm 0.000$	0.605 $\pm 0.000$	0.645 $\pm 0.000$	0.598 $\pm 0.001$	0.645 $\pm 0.001$	0.413 $\pm 0.002$
DFM-LN	0.631 $\pm 0.000$	0.605 $\pm 0.000$	0.644 $\pm 0.000$	0.598 $\pm 0.001$	0.645 $\pm 0.001$	0.413 $\pm 0.002$
MF	0.652 $\pm 0.009$	0.650 $\pm 0.007$	0.688 $\pm 0.007$	0.500 $\pm 0.002$	0.598 $\pm 0.002$	0.366 $\pm 0.002$
+ IPS	0.642 $\pm 0.010$	0.640 $\pm 0.005$	0.684 $\pm 0.010$	0.609 $\pm 0.002$	0.601 $\pm 0.003$	0.382 $\pm 0.004$
+ ASIPS	0.656 $\pm 0.008$	0.646 $\pm 0.010$	0.685 $\pm 0.011$	0.606 $\pm 0.003$	0.618 $\pm 0.004$	0.400 $\pm 0.004$
+ SNIPS	0.650 $\pm 0.006$	0.652 $\pm 0.008$	0.691 $\pm 0.006$	0.563 $\pm 0.004$	0.588 $\pm 0.004$	0.355 $\pm 0.005$
+ DR	0.685 $\pm 0.007$	0.652 $\pm 0.008$	0.689 $\pm 0.007$	0.604 $\pm 0.004$	0.626 $\pm 0.003$	0.399 $\pm 0.003$
+ DR-JL	0.658 $\pm 0.009$	0.656 $\pm 0.008$	0.690 $\pm 0.007$	0.627 $\pm 0.002$	0.601 $\pm 0.003$	0.381 $\pm 0.004$
+ MRDR-JL	0.678 $\pm 0.007$	0.666 $\pm 0.008$	0.692 $\pm 0.008$	0.629 $\pm 0.003$	0.611 $\pm 0.004$	0.398 $\pm 0.005$
+ IPS+DeF	0.698 $\pm 0.004$	0.674 $\pm 0.006$	0.695 $\pm 0.008$	0.652 $\pm 0.002$	0.600 $\pm 0.002$	0.373 $\pm 0.004$
+ DR-JL+DeF	0.724 $\pm 0.010$	0.695 $\pm 0.021$	0.721 $\pm 0.015$	0.644 $\pm 0.013$	0.665 $\pm 0.020$	0.432 $\pm 0.018$

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
