Article

Model Averaging for Accelerated Failure Time Models with Missing Censoring Indicators

Department of Statistics and Data Science, School of Economics, Jinan University, Guangzhou 510632, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(5), 641; https://doi.org/10.3390/math12050641
Submission received: 23 January 2024 / Revised: 16 February 2024 / Accepted: 20 February 2024 / Published: 22 February 2024
(This article belongs to the Section Probability and Statistics)

Abstract

Model averaging has become a crucial statistical methodology, especially in situations where numerous models vie to elucidate a phenomenon. Over the past two decades, there has been substantial advancement in the theory of model averaging. However, a gap remains in the field regarding model averaging in the presence of missing censoring indicators. Therefore, in this paper, we present a new model-averaging method for accelerated failure time models with right censored data when censoring indicators are missing. The model-averaging weights are determined by minimizing the Mallows criterion. Under mild conditions, the calculated weights exhibit asymptotic optimality, leading to the model-averaging estimator achieving the lowest squared error asymptotically. Monte Carlo simulations demonstrate that the method proposed in this paper has lower mean squared errors than other model-selection and model-averaging methods. Finally, we conduct an empirical analysis using the real-world Acute Myeloid Leukemia (AML) dataset. The results of the empirical analysis demonstrate that the method proposed in this paper outperforms existing approaches in terms of predictive performance.

1. Introduction

In some practical scenarios, we often need to select useful models from a candidate model set. A popular approach to address this issue is model selection. Methods such as the Akaike Information Criterion (AIC) [1], Mallows’ Cp [2] and Bayesian Information Criterion (BIC) [3] are designed to identify the best model. However, in cases where a single model does not receive strong support from the data, these model-selection methods may overlook valuable information from other candidate models, leading to issues of model-selection uncertainty and bias [4].
To tackle these challenges and enhance prediction accuracy, several model-averaging techniques have been developed to leverage all information from the candidate models. Taking inspiration from AIC and BIC, Buckland et al. [5] proposed the smoothed AIC (SAIC) and smoothed BIC (SBIC) methods, respectively. Hansen [6] introduced the Mallows model-averaging (MMA) estimator, obtaining weights through minimization of Mallows' Cp criterion. The MMA estimator asymptotically attains the minimum squared error among the model-averaging estimators in its class. Subsequently, Wan et al. [7] relaxed the constraints of Hansen [6], allowing for non-nested candidate models and continuous weights. In practical applications, many datasets exhibit heteroscedasticity, so it is essential to explore model-averaging methods tailored to heteroscedastic settings. First, Hansen and Racine [8] proposed jackknife model averaging (JMA), which determines weights by minimizing a cross-validation criterion; JMA significantly reduces the mean squared error (MSE) compared to MMA when errors are heteroscedastic. Second, Liu and Okui [9] modified the MMA method of Hansen [6] to make it suitable for heteroscedastic scenarios. Furthermore, Zhao et al. [10] extended [6]'s work by estimating the covariance matrix based on a weighted average of the squared residuals from all candidate models, which improves the model-averaging estimator under heteroscedasticity.
In survival analysis, the accelerated failure time (AFT) model provides a straightforward description of how covariates directly impact survival time and has consequently garnered widespread attention. Several parameter-estimation methods exist for the AFT model, including Miller's estimator [11], the Buckley–James estimator [12], the Koul–Susarla–Van Ryzin (KSV) estimator [13] and the weighted least squares (WLS) estimator [14]. However, all these methods assume that the censoring indicator is observable. Therefore, Wang and Dinse [15] improved the KSV estimator to adapt it to situations where the censoring indicator is missing.
Under practical conditions, it is common to encounter situations where only the observed time is available and it is uncertain whether the event of interest has occurred. In such cases, the data suffer from missingness in the censoring indicator. For example, in a clinical trial for lung cancer, a patient may die of unknown causes: while the survival time is observed, it is uncertain whether the patient died specifically of lung cancer. This situation leads to missingness in the censoring indicator. Previous studies have mainly addressed missingness in the censoring indicator under a specific model, while research on model averaging for right-censored data typically assumes that the censoring indicator is observable. Therefore, this paper adopts the inverse probability weighting method proposed by [15] to construct the response variable and, through appropriate weight-selection criteria, chooses weights to build a model-averaging estimator for the accelerated failure time model. This approach significantly enhances the predictive performance of the model and mitigates the bias introduced by selecting a single model. Compared to previous research, this paper makes two main contributions: First, it introduces a novel model-averaging method for the case of missing censoring indicators. Second, the paper allows for heteroscedasticity and employs model-averaging techniques to estimate the variance.
The remaining sections of this paper are organized as follows. In Section 2, we introduce the notation and progressively delineate the methodology and associated theoretical properties of the proposed model-averaging approach. In Section 3, we report the Monte Carlo simulation results. In Section 4, we assess the predictive performance of the proposed model-averaging method against other approaches using the real-world Acute Myeloid Leukemia (AML) dataset. In Section 5, we provide a comprehensive summary of the entire paper and suggest future research directions in this area. All theorem proofs are presented in Appendix A.

2. Methodology and Theoretical Property

We denote $Y = \log(T) = (Y_1, \ldots, Y_n)'$ and $C = \log(V) = (C_1, \ldots, C_n)'$, where $T$ represents the survival time and $V$ denotes the censoring time. $X = (X_1, X_2, \ldots, X_n)'$ denotes the covariate matrix for $n$ independent observations, where $X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$. The accelerated failure time model can be expressed as follows:
$$Y_i = \mu_i + e_i = \sum_{j=1}^{p}\beta_j x_{ij} + e_i, \qquad i = 1, \ldots, n,$$
where $e_i$ is the random error with $E(e_i \mid X_i) = 0$ and $E(e_i^2 \mid X_i) = \sigma_i^2$.
We assume that there are $M$ candidate models in the candidate model set, where the $m$th candidate model contains $p_m$ covariates. Following [7], the candidate models are allowed to be non-nested. The $m$th candidate model is
$$Y_{mi} = \sum_{j=1}^{p_m}\beta_j x_{ij} + e_{mi}, \qquad i = 1, \ldots, n, \tag{2}$$
for $m = 1, \ldots, M$. The matrix form of (2) is
$$Y_m = X_m\beta_m + e_m,$$
where $X_m$ is an $n \times p_m$ full column-rank matrix, $Y_m = (Y_{m1}, \ldots, Y_{mn})'$, $\beta_m = (\beta_1, \ldots, \beta_{p_m})'$ and $e_m = (e_{m1}, \ldots, e_{mn})'$.
In the case of right censored data, the response variable $Y_i$ might be censored, making it unobservable. We only observe $(Z_i, X_i, \delta_i)$, where $Z_i = \min(Y_i, C_i)$ and the censoring indicator $\delta_i = I(Y_i \le C_i)$. Define a missingness indicator $\xi_i$, which is 1 if $\delta_i$ is observed and 0 otherwise. When the censoring indicators are missing, the observed data are $\{Z_i, X_i, \xi_i, \xi_i\delta_i\}$. For simplicity, we set $U_i = (Z_i, X_i)$. In this paper, similar to [15], we assume the missing mechanism for $\delta$ to be:
$$P(\xi = 1 \mid Z, X, \delta) = P(\xi = 1 \mid Z).$$
This assumption is more stringent than the missing at random (MAR) condition yet less restrictive than the assumption of missing completely at random (MCAR).
Koul et al. [13] introduced a method that involves synthetic data for constructing linear regression models. Wang and Dinse [15] extended [13]’s method to address the situation where censoring indicators are missing. In our work, we follow the approach proposed by [15] to construct a response in the form of inverse probability weighting, specifically:
$$Y_{Wi} = \frac{\frac{\xi_i\delta_i}{\pi(Z_i)} + \left(1 - \frac{\xi_i}{\pi(Z_i)}\right) m(U_i)}{1 - G_n(Z_i)}\, Z_i,$$
where $\pi(z) = E(\xi \mid Z = z)$, $m(u) = E(\delta \mid U = u)$ and $G_n(\cdot)$ represents the cumulative distribution function of $C$. It is easy to observe that, under the missing data mechanism in this paper:
$$E(Y_{Wi} \mid X_i) = \mu_i = X_i'\beta.$$
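A quick check of this unbiasedness (a sketch, treating $\pi$, $m$ and $G_n$ as known): under the missing mechanism above, $E(\xi_i \mid Z_i, X_i, \delta_i) = \pi(Z_i)$, so
$$E(Y_{Wi} \mid Z_i, X_i, \delta_i) = \frac{\delta_i + \left(1 - \frac{\pi(Z_i)}{\pi(Z_i)}\right) m(U_i)}{1 - G_n(Z_i)}\, Z_i = \frac{\delta_i Z_i}{1 - G_n(Z_i)},$$
which is exactly the synthetic response of [13]; taking a further expectation conditional on $X_i$ then yields $\mu_i$ by the standard synthetic-data argument, which assumes the censoring time is independent of the survival time and covariates.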
Similar to Equation (2), we have:
$$Y_{Wi} = \sum_{j=1}^{p_m}\beta_j x_{ij} + e_{Wi}, \qquad i = 1, \ldots, n,$$
where $E(e_{Wi} \mid X_i) = 0$ and $\sigma_{Wi}^2 = \mathrm{var}(e_{Wi} \mid X_i)$. This is expressed in matrix form as:
$$Y_W = X_m\beta_m + e_W,$$
where $Y_W = (Y_{W1}, \ldots, Y_{Wn})'$ and $e_W = (e_{W1}, \ldots, e_{Wn})'$. The weighted least squares estimator of $\beta_m$ is then:
$$\hat\beta_m = (X_m' D X_m)^{-1} X_m' D Y_W,$$
where $D = \mathrm{diag}\{1/\sigma_{W1}^2, \ldots, 1/\sigma_{Wn}^2\}$.
Let $\mu_{mi} = E(Y_{Wi} \mid X_i)$; subsequently, the estimator of the $m$th candidate model's $\mu_m = (\mu_{m1}, \ldots, \mu_{mn})'$ is given by:
$$\hat\mu_m = X_m\hat\beta_m = X_m(X_m' D X_m)^{-1} X_m' D Y_W = P_m Y_W,$$
where $P_m = X_m(X_m' D X_m)^{-1} X_m' D$. Denote the weight vector $w = (w_1, \ldots, w_M)^T$, belonging to the set $H_M = \{w \in [0, 1]^M : \sum_{m=1}^{M} w_m = 1\}$. The model-averaging estimator of $\mu$ is defined as follows:
$$\hat\mu_{G_n}(w) = \sum_{m=1}^{M} w_m X_m\hat\beta_m = \sum_{m=1}^{M} w_m X_m(X_m' D X_m)^{-1} X_m' D Y_W = P(w) Y_W,$$
for any $w \in H_M$, where $P(w) = \sum_{m=1}^{M} w_m X_m(X_m' D X_m)^{-1} X_m' D$.
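To make the construction concrete, here is a minimal NumPy sketch of the candidate weighted least squares fits and the averaged estimator; the function names (`wls_fit`, `model_average`) and the convention of passing the diagonal of $D$ as a vector are our own assumptions, not the paper's:

```python
import numpy as np

def wls_fit(X_m, y_w, d):
    """Weighted least squares fit of one candidate model.

    X_m : (n, p_m) design matrix of the mth candidate model
    y_w : (n,) inverse-probability-weighted responses Y_W
    d   : (n,) diagonal of D, i.e., 1 / sigma_Wi^2
    Returns the fitted values mu_hat_m = P_m y_w.
    """
    Xd = X_m * d[:, None]                          # D X_m (row-scaled)
    beta_m = np.linalg.solve(X_m.T @ Xd, Xd.T @ y_w)
    return X_m @ beta_m

def model_average(candidate_fits, w):
    """Combine candidate fitted values with simplex weights w."""
    return np.column_stack(candidate_fits) @ w     # sum_m w_m * mu_hat_m
```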
Define the squared loss function $L_{G_n}(w) = \|\mu - \hat\mu(w)\|^2$, where $\|\cdot\|$ denotes the Euclidean norm. Then the risk function is defined as:
$$R_{G_n}(w) = E(L_{G_n}(w)) = \|P(w)\mu - \mu\|^2 + \mathrm{tr}\{P(w)\,\Omega\, P'(w)\}, \tag{10}$$
where $\Omega = \mathrm{diag}\{\sigma_{W1}^2, \ldots, \sigma_{Wn}^2\}$. The derivation of (10) is as follows:
$$
\begin{aligned}
R_{G_n}(w) = E[L_{G_n}(w)] &= E[(\mu - \hat\mu(w))'(\mu - \hat\mu(w))] \\
&= E[\mu'\mu - 2\mu' P(w) Y_W + Y_W' P'(w) P(w) Y_W] \\
&= \mu'\mu - 2\mu' P(w)\mu + \mu' P'(w) P(w)\mu + \mathrm{tr}(P(w)\,\Omega\, P'(w)) \\
&= (P(w)\mu - \mu)'(P(w)\mu - \mu) + \mathrm{tr}(P(w)\,\Omega\, P'(w)). \tag{11}
\end{aligned}
$$
Regarding the choice of weights, a natural approach is to minimize the risk function. However, as shown in Equation (11), the risk function involves the unknown $\mu$, which makes it infeasible to minimize directly. Therefore, we replace $\mu$ with $Y_W$ and seek an unbiased estimator of the risk function as the criterion for weight selection.
Define the criterion for weight selection as
$$C_{G_n}(w) = \|Y_W - \hat\mu(w)\|^2 + 2\,\mathrm{tr}\{P(w)\Omega\}.$$
It is not difficult to observe that $E(C_{G_n}(w)) = R_{G_n}(w) + \sum_{i=1}^{n}\sigma_{Wi}^2$. Disregarding the term that does not depend on $w$, $C_{G_n}(w)$ serves as an unbiased estimator of the risk function.
In practice, $m(\cdot)$, $\pi(\cdot)$ and $G_n(\cdot)$ are usually unknown, so we need to estimate them. First, regarding the estimation of $m(u)$: it is usually estimated by a logit model. Suppose $m(u)$ is estimated by the parametric model $m_0(u; \theta)$, where $m_0(u; \theta) = \frac{e^{U'\theta}}{1 + e^{U'\theta}}$. By maximum likelihood estimation, we obtain the parameter estimate $\hat\theta_n$ of $\theta$. $\pi(z)$ can usually be estimated nonparametrically by
$$\hat\pi_n(z) = \sum_{i=1}^{n}\xi_i\, W\!\left(\frac{z - Z_i}{b_n}\right) \Big/ \sum_{i=1}^{n} W\!\left(\frac{z - Z_i}{b_n}\right),$$
where $W(\cdot)$ is a kernel function and $b_n$ is a bandwidth sequence. Next, we define $u(z) = E(\delta \mid Z = z)$, which is estimated nonparametrically by
$$\hat u_n(z) = \frac{\sum_{i=1}^{n}\frac{\delta_i\xi_i}{\hat\pi_n(Z_i)}\, K\!\left(\frac{z - Z_i}{h_n}\right)}{\sum_{i=1}^{n}\frac{\xi_i}{\hat\pi_n(Z_i)}\, K\!\left(\frac{z - Z_i}{h_n}\right)},$$
where $K(\cdot)$ is a kernel function and $h_n$ is a bandwidth sequence. We adopt the following estimator of $G_n(z)$:
$$\hat G_n(z) = 1 - \prod_{i:\, Z_i \le z}\left(\frac{n - R_i}{n - R_i + 1}\right)^{1 - \hat u_n(Z_i)},$$
where $R_i$ denotes the rank of $Z_i$.
Next, replacing $m(\cdot)$, $\pi(\cdot)$ and $G_n(\cdot)$ with $m_0(\cdot, \hat\theta_n)$, $\hat\pi_n(\cdot)$ and $\hat G_n(\cdot)$, we have:
$$\hat Y_{Wi} = \frac{\frac{\xi_i\delta_i}{\hat\pi_n(Z_i)} + \left(1 - \frac{\xi_i}{\hat\pi_n(Z_i)}\right) m_0(U_i, \hat\theta_n)}{1 - \hat G_n(Z_i)}\, Z_i.$$
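The plug-in quantities above translate directly into code. The sketch below uses our own helper names; the kernels follow the simulation section, and $\delta_i$ is taken as 0 wherever it is unobserved, which is harmless because those terms receive weight $\xi_i = 0$:

```python
import numpy as np

def pi_hat(z, Z, xi, b):
    """Kernel estimate of pi(z) = E(xi | Z = z), uniform kernel W."""
    w = 0.5 * (np.abs((z - Z) / b) <= 1)
    return np.sum(xi * w) / np.sum(w)

def u_hat(z, Z, delta, xi, pi_vals, h):
    """Kernel estimate of u(z) = E(delta | Z = z), biweight kernel K,
    using only cases with observed indicators, reweighted by 1/pi_hat."""
    t = (z - Z) / h
    K = (15.0 / 16.0) * (1 - 2 * t**2 + t**4) * (np.abs(t) <= 1)
    kw = K * xi / pi_vals
    return np.sum(delta * kw) / np.sum(kw)

def G_hat(z, Z, u_vals):
    """Product-limit-type estimate of the censoring distribution G_n."""
    n = len(Z)
    R = Z.argsort().argsort() + 1                  # ranks R_i of Z_i
    idx = Z <= z
    base = (n - R[idx]) / (n - R[idx] + 1.0)
    return 1.0 - np.prod(base ** (1.0 - u_vals[idx]))

def y_w_hat(Z, xi, delta, pi_vals, m_vals, G_vals):
    """Estimated IPW response hat Y_Wi from the plug-in quantities."""
    num = xi * delta / pi_vals + (1 - xi / pi_vals) * m_vals
    return num / (1 - G_vals) * Z
```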
The corresponding weight-selection criterion is then:
$$C_{\hat G_n}(w) = \|\hat Y_W - \hat\mu_{\hat G_n}(w)\|^2 + 2\,\mathrm{trace}\{P(w)\Omega\}, \tag{13}$$
where $\hat Y_W = (\hat Y_{W1}, \ldots, \hat Y_{Wn})'$. The weights minimizing $C_{\hat G_n}(w)$ are given by:
$$\tilde w = \arg\min_{w \in H_M} C_{\hat G_n}(w).$$
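Because $P(w) = \sum_m w_m P_m$ is linear in $w$, $\mathrm{trace}\{P(w)\Omega\} = \sum_m w_m\,\mathrm{trace}(P_m\Omega)$, so $C_{\hat G_n}(w)$ is a quadratic function of $w$ and minimizing it over the simplex $H_M$ is a small quadratic program. A sketch using `scipy.optimize` (our own formulation, not code from the paper):

```python
import numpy as np
from scipy.optimize import minimize

def mallows_weights(F, y_w, tr_pm_omega):
    """Minimize C(w) = ||y_w - F w||^2 + 2 sum_m w_m trace(P_m Omega)
    over the simplex {w >= 0, sum w = 1}.

    F           : (n, M) matrix whose columns are the candidate fits
    tr_pm_omega : (M,) vector of trace(P_m Omega), m = 1, ..., M
    """
    M = F.shape[1]

    def criterion(w):
        r = y_w - F @ w
        return r @ r + 2.0 * tr_pm_omega @ w

    res = minimize(criterion, np.full(M, 1.0 / M), method="SLSQP",
                   bounds=[(0.0, 1.0)] * M,
                   constraints=({"type": "eq",
                                 "fun": lambda w: np.sum(w) - 1.0},))
    return res.x
```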
We now enumerate the regularity conditions required for asymptotic optimality.
(C1)
Let $S(t) = 1 - (1 - F(t))(1 - G_n(t))$ and $\tau_H = \inf\{t : S(t) = 1\}$, where $F(t)$ is the cumulative distribution function of $Y_i$. Assume that $1 - G_n(\tau_H) > 0$.
(C2)
There exists a positive constant $k$ such that $\max_{1 \le i \le n}|\mu_i| \le k$.
(C3)
Denote $\xi_n = \inf_{w \in H_M} R_{G_n}(w)$ and let $w_m^0$ be an $M \times 1$ unit vector whose $m$th element is 1 and whose other elements are 0. For some integer $1 \le J < \infty$ and some positive constant $k$ such that $E(e_i^{4J}) \le k < \infty$, assume
$$M\xi_n^{-2J}\sum_{m=1}^{M}\big\{R_{G_n}(w_m^0)\big\}^{J} \to 0.$$
(C4)
There exists $\epsilon > 0$ such that $\inf_{1 \le i \le n}\sigma_{Wi}^2 \ge \epsilon$.
(C5)
$m(\cdot)$ and $\pi(\cdot)$ are bounded.
(C6)
$nh_n \to \infty$ and $nh_n^2 \to 0$.
(C7)
Let $\tilde p = \max_m p_m$ and let $\rho_{ii}^{(m)}$ denote the $i$th diagonal element of $P_m$. There exists a constant $c$ such that $\rho_{ii}^{(m)} \le c\, n^{-1} p_m$.
Condition (C1) is utilized in [16] and ensures that $1 - G_n(t)$ is not equal to 0. Condition (C2) mandates that the conditional mean $\mu_i$ remains bounded, in line with assumptions seen in prior research, including [7,17]. Condition (C3) is a requirement commonly found in the model-averaging literature (e.g., [7,18]). Condition (C4) mandates the non-degeneracy of the covariance matrix $\Omega$ as $n \to \infty$; similar assumptions can also be found in [9,10]. Similar to [15], Conditions (C5) and (C6) impose constraints on the bounds of $m(\cdot)$ and $\pi(\cdot)$ and on the bandwidths, respectively. Condition (C7) is frequently employed in the analysis of the asymptotic optimality of cross-validation methods, as seen in prior works like [8].
Theorem 1.
Under Conditions (C1) to (C6),
$$\frac{L_{\hat G_n}(\tilde w)}{\inf_{w \in H_M} L_{\hat G_n}(w)} \xrightarrow{\ p\ } 1.$$
Theorem 1 establishes the asymptotic optimality of the model-averaging procedure employing weights w ˜ , as its squared loss converges to that of the infeasible best possible model average estimator.
In most cases, $\Omega$ is unknown and needs to be estimated. We estimate $\Omega$ using the residuals derived from the model-averaging process: $\hat e(w) = \hat Y_W - \hat\mu(w) = \{\hat e_{W1}(w), \ldots, \hat e_{Wn}(w)\}'$. Specifically, the estimator of $\Omega$ is
$$\hat\Omega(w) = \mathrm{diag}\{\hat\sigma_{W1}^2(w), \ldots, \hat\sigma_{Wn}^2(w)\},$$
where $\hat\sigma_{Wi}^2(w) = \mathrm{var}(\hat e_{Wi}(w))$.
In the existing literature on model averaging, estimates of the variance are predominantly derived from the largest candidate model, as exemplified by works such as [6,16]. In contrast, our approach, following [10], leverages information from all candidate models rather than relying on a single model. Such an estimator is more robust. Replacing $\Omega$ with $\hat\Omega(w)$ in (13), the criterion becomes
$$\hat C_{\hat G_n}(w) = \|\hat Y_W - \hat\mu_{\hat G_n}(w)\|^2 + 2\,\mathrm{trace}\{P(w)\hat\Omega(w)\}. \tag{16}$$
The weights that minimize $\hat C_{\hat G_n}(w)$ are as follows:
$$\hat w = \arg\min_{w \in H_M}\hat C_{\hat G_n}(w).$$
Note that the weight-selection criterion $\hat C_{\hat G_n}(w)$ is a cubic function of $w$.
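The paper does not spell out how this cubic program is solved; one simple strategy, shown below purely as an assumption on our part, alternates between updating $\hat\Omega(w)$ from the current averaging residuals and re-solving the induced quadratic problem with `mallows_weights` from the sketch above (the squared-residual variance update is likewise our plug-in choice):

```python
import numpy as np

def hcima_weights(F, P_diags, y_w, n_iter=10):
    """Heuristic alternating minimization of the cubic criterion.

    F       : (n, M) candidate fits as columns
    P_diags : list of (n,) arrays, the diagonals of each P_m
    """
    M = F.shape[1]
    w = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        omega = (y_w - F @ w) ** 2                 # plug-in for sigma_Wi^2(w)
        tr_pm_omega = np.array([(d * omega).sum() for d in P_diags])
        w = mallows_weights(F, y_w, tr_pm_omega)   # quadratic sub-problem
    return w
```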
Theorem 2.
Under Conditions (C1) to (C7),
$$\frac{L_{\hat G_n}(\hat w)}{\inf_{w \in H_M} L_{\hat G_n}(w)} \xrightarrow{\ p\ } 1.$$

3. Simulation

In the simulation study, we generate data from the accelerated failure time (AFT) model $\log(T_i) = Y_i = \sum_{j=1}^{1000}\beta_j x_{ij} + e_i$, where $\beta_j = 1/j^2$; the observations $X_i = (x_{i1}, x_{i2}, \ldots, x_{i1000})'$ are generated from a multivariate normal distribution with zero mean and covariance matrix $\Sigma = (\sigma_{ij})$, where $\sigma_{ij} = 0.5^{|i-j|}$. The errors $e_i$ follow the normal distribution $N(0, \gamma^2(x_{i2}^4 + 0.01))$. By varying the value of $\gamma$, we allow $R^2$ to range from 0.1 to 0.9. This variance specification closely resembles that of [8]; however, we introduce a small constant, 0.01, to ensure that the variances remain strictly positive. The censoring time $C_i$ is generated from $N(C_0, 7)$; by varying the value of $C_0$, we achieve censoring rates (CRs) of approximately 20% and 40%. We set the sample sizes to $n = 150$ and $300$. Here, the candidate models are nested, meaning the $m$th model includes the first $m$ regressors. The number of candidate models $M$ is set to $\lceil 3n^{1/3}\rceil$, where $\lceil x\rceil$ denotes the smallest integer greater than $x$.
Based on the missing mechanism described in this paper, we assume that the probability that a censoring indicator is missing, $1 - \pi(z)$, is determined by a logistic model: $\log\{\pi(z)/(1 - \pi(z))\} = \theta_1 + \theta_2 z$. Following [15], we employ the uniform kernel function $W(x) = 1/2$ for $|x| \le 1$ and $W(x) = 0$ otherwise, and the biweight kernel function $K(x) = \frac{15}{16}(1 - 2x^2 + x^4)$ for $|x| \le 1$ and $K(x) = 0$ otherwise. The bandwidths are $b_n = h_n = n^{-1/3}\max(Z)$. We estimate $m(u)$ under the logistic model $\log\{m(u)/(1 - m(u))\} = \gamma_1 + \gamma_2 z + \gamma_3' x$. As highlighted by [19], when the data on $\delta$ are completely (or quasi-completely) separated, the maximum likelihood estimate of $\gamma = (\gamma_1, \gamma_2, \gamma_3')'$ does not exist. In our simulation setup, the number of covariates greatly exceeds the sample size; therefore, we employ the lasso method to estimate the parameters.
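A sketch of one replication of this data-generating process follows; the parameter values `gamma`, `c0` and `theta` are placeholders to be tuned to the targeted $R^2$, censoring rate and missing rate, and we read $N(C_0, 7)$ as having variance 7, which is an assumption:

```python
import numpy as np

def simulate(n, p=1000, gamma=1.0, c0=3.0, theta=(1.0, 0.5), seed=0):
    """One replication of the AFT simulation design."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # sigma_ij = 0.5^|i-j|
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = 1.0 / (idx + 1.0) ** 2                        # beta_j = 1/j^2
    mu = X @ beta
    e = rng.normal(0.0, np.sqrt(gamma**2 * (X[:, 1] ** 4 + 0.01)))
    Y = mu + e                                           # log survival time
    C = rng.normal(c0, np.sqrt(7.0), size=n)             # log censoring time
    Z = np.minimum(Y, C)
    delta = (Y <= C).astype(float)
    pi = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * Z)))  # P(xi = 1 | Z)
    xi = rng.binomial(1, pi)
    return X, Z, delta, xi, mu
```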
We compare the proposed Model-Averaging method for the Missing Censoring Indicators in the Heteroscedastic setting (HCIMA) with other classical model-selection and model-averaging methods in this article. Brief descriptions of these methods are provided below:
  • The model-selection methods rely on AIC and BIC, where the AIC and BIC criteria for the $m$th model are defined as
$$\mathrm{AIC}(m) = \log\hat\sigma^2_{\hat G_n, m} + 2n^{-1}\,\mathrm{tr}(P_m)$$
    and
$$\mathrm{BIC}(m) = \log\hat\sigma^2_{\hat G_n, m} + n^{-1}\,\mathrm{tr}(P_m)\log(n),$$
    where $\hat\sigma^2_{\hat G_n, m} = n^{-1}\|\hat Y_W - \hat\mu_{\hat G_n, m}\|^2$.
  • Model-averaging methods based on SAIC and SBIC: the weights for the $m$th candidate model are given by
$$w_{(m)}^{\mathrm{AIC}} = \exp(-\mathrm{AIC}_m/2)\Big/\sum_{m=1}^{M}\exp(-\mathrm{AIC}_m/2), \qquad
w_{(m)}^{\mathrm{BIC}} = \exp(-\mathrm{BIC}_m/2)\Big/\sum_{m=1}^{M}\exp(-\mathrm{BIC}_m/2),$$
    where $\mathrm{AIC}_m = \mathrm{AIC}(m) - \min(\mathrm{AIC})$ and $\mathrm{BIC}_m = \mathrm{BIC}(m) - \min(\mathrm{BIC})$ (see the sketch after this list).
  • Additionally, we compare our approach with the method that estimates the variance using the largest candidate model (MCIMA). The variance estimation and weight selection in that approach are as follows:
$$\hat\sigma_{\hat G_n} = (\hat\sigma_{\hat G_n 1}, \ldots, \hat\sigma_{\hat G_n n})^T = \sqrt{\frac{n}{n - p_M}}\,(I - P_M)\hat Y_W,$$
$$\hat C_n(w) = \|\hat Y_W - \hat\mu(w)\|^2 + 2\,\mathrm{trace}\{P(w)\hat\Omega\},$$
    where $\hat\Omega = \mathrm{diag}\{\hat\sigma^2_{\hat G_n 1}, \ldots, \hat\sigma^2_{\hat G_n n}\}$.
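For reference, the smoothed information-criterion weights are a one-liner (a sketch; `ic` is the vector of AIC or BIC values over the $M$ candidate models):

```python
import numpy as np

def smoothed_weights(ic):
    """SAIC/SBIC weights from a vector of information-criterion values."""
    d = np.asarray(ic) - np.min(ic)       # IC_m = IC(m) - min(IC)
    w = np.exp(-d / 2.0)
    return w / w.sum()
```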
In the simulation, we utilize the Mean Squared Error (MSE) to evaluate the performance of the various methods, where the MSE is defined as $\frac{1}{n}\|\hat\mu_{\hat G_n} - \mu\|^2$. We present the mean of the MSEs from 500 replications.
Figure 1 and Figure 2, respectively, show the Mean Squared Error (MSE) values of the various methods over 500 repetitions under different censoring rates and sample sizes, with missing rates (MRs) of 20% and 40%. In terms of MSE, our proposed HCIMA method outperforms the other approaches. Additionally, the MCIMA method performs better than the existing methods in all cases except when compared to HCIMA. Furthermore, it is evident that SAIC and SBIC outperform their respective AIC and BIC counterparts, further highlighting the advantages of model-averaging methods.
Comparing Figure 1 and Figure 2, we observe that the MSE at MR = 20% is slightly higher than at MR = 40%. The reason is that when $\xi_i = 1$ and $\delta_i = 0$, the signs of $Y_{Wi}$ and $Z_i$ are opposite, and as the MR increases, the $\xi_i = 1, \delta_i = 0$ case occurs less often. Although this result may seem counterintuitive, it does not affect the performance of the method proposed in this paper, which retains its advantages in this case.

4. Real Data Analysis

In this section, we assess the predictive performance of our proposed HCIMA method using the real Acute Myeloid Leukemia (AML) dataset. This dataset contains 672 samples, including 97 variables such as patient age, survival time, gender, race, mutation count, etc. For more specific information about this dataset, we refer the reader to https://www.cbioportal.org/study/clinicalData?id=aml_ohsu_2018 (accessed on 13 December 2023).
We selected ten variables for analysis: Cause Of Death, Age, Sex, Overall Survival Status, Overall Survival Months (Survival Time), Number of Cumulative Treatment Stages, Cumulative Treatment Regimen Count, Mutation Count, Platelet Count and WBC (White Blood Cell Count). After removing rows with missing values, we retained a total of 396 samples. We treat samples with unknown causes of death as having missing censoring indicators. Among these 396 samples, 76 have unknown causes of death and 167 samples were still alive when the clinical trial ended. Therefore, the missing rate is approximately 19% and the censoring rate is 42%. We focus on the impact on Survival Time of the seven variables remaining after excluding "Cause Of Death" and "Overall Survival Status". Therefore, we can construct $2^7 - 1 = 127$ non-nested candidate models.
We randomly select $n_0$ samples as the training dataset, while the remaining $n_1 = n - n_0$ samples are used as the testing dataset. We set the training dataset size to 50%, 60%, 70% and 80% of the total dataset size, respectively. Following [16,20], we employ the normalized mean squared prediction error (NMSPE) as the performance metric:
$$\mathrm{NMSPE} = \frac{\sum_{i=n_0+1}^{n}\big(\hat Y_{Wi} - \hat\mu_i\big)^2}{\min_{m = 1, 2, \ldots, M}\sum_{i=n_0+1}^{n}\big(\hat Y_{Wi} - \hat\mu_{mi}\big)^2},$$
where $\hat\mu_i$ represents the predicted value and $\hat\mu_{mi}$ denotes the value of $\hat\mu$ for the $m$th model.
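Computed directly from its definition (a sketch with our own argument names):

```python
import numpy as np

def nmspe(y_test, pred_avg, preds_by_model):
    """NMSPE: prediction error of the averaged model divided by the
    error of the best single candidate model on the test set."""
    num = np.sum((y_test - pred_avg) ** 2)
    den = min(np.sum((y_test - p) ** 2) for p in preds_by_model)
    return num / den
```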
We repeat this random splitting 1000 times and calculate the mean, the standard deviation and the optimal rate of each method over the 1000 repetitions. Specifically, the optimal rate refers to the frequency with which a method achieves the minimum NMSPE across the 1000 repetitions.
Table 1 displays the mean, optimal rates and standard deviations of NMSPE for each method over 1000 repetitions. Consistent with the simulation results, the HCIMA method exhibits the lowest average NMSPE and standard deviation and the highest optimal rate. The MCIMA method also performs well, ranking second after HCIMA. This indicates that the proposed model-averaging methods in this paper demonstrate superior predictive performance compared to other approaches.

5. Discussion

To address the uncertainty in model selection and enhance predictive accuracy, this paper proposes a novel model-averaging approach for the accelerated failure time model with missing censoring indicators. Moreover, we establish asymptotic optimality under certain mild conditions. In Monte Carlo simulations, the proposed method exhibits lower mean squared errors than other model-selection and model-averaging methods. Empirical results demonstrate that the proposed method has a lower NMSPE than other approaches, indicating its superior predictive performance. This further underscores the applicability of the proposed method to real-life data scenarios with missing censoring indicators.
In this paper, we introduce the inverse probability weighted form of response variable proposed in [15]. The primary advantage of this form of response variable lies in its double robustness, making it less susceptible to the impact of model misspecification (if π ( · ) or m ( · ) is misspecified). However, as mentioned in [15], its drawback, compared to synthetic response [13], regression calibration and imputation [15], is a larger variance. Yet, in practical scenarios, the harm caused by model misspecification often outweighs the harm of higher variance. Therefore, in our work, we follow the recommendation of [15] to use the inverse probability weighted form of the response variable. A future research direction is to further enhance this response variable for better applicability in the context of missing censoring indicators.
As far as we know, there is currently very limited research on model averaging with missing censoring indicators, so many questions deserve further investigation. In terms of data, our approach could be extended to high-dimensional settings; in terms of models, extensions to partially linear models, generalized linear models and other frameworks could be pursued.

Author Contributions

Conceptualization, L.L.; Methodology, L.L.; Writing—review and editing, L.L.; Software, J.L.; Data curation, J.L.; Writing—original draft, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to thank the reviewers and editors for their careful reading and constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we provide the proofs for Theorems 1 and 2. To facilitate the presentation, we begin with several lemmas.
Lemma A1.
Under Conditions (C1) to (C3), there exists a positive constant $c_1$ such that
$$\max_{1 \le i \le n} E\big(e_{Wi}^{4J} \mid X_i\big) \le c_1,$$
where J is given in Condition (C3).
Lemma A1 is consistent with Lemma 6.1 in [16] and, under our specific conditions, the proof technique for Lemma A1 is the same as that for Lemma 6.1 in [16].
Lemma A2.
Under Conditions (C1) to (C3),
$$\left|\frac{L_{G_n}(w)}{R_{G_n}(w)} - 1\right| = o_p(1).$$
Under our specific conditions, we can prove this lemma using the same techniques as the proof of (A.3) in [7]. Therefore, we omit the proof here.
Lemma A3.
Under Conditions (C1) to (C5), as $n \to \infty$, we have
$$\|\hat Y_W - Y_W\|^2 = o_p(1).$$
 Proof of Lemma A3. 
$$
\begin{aligned}
\|\hat Y_W - Y_W\|^2 &= \sum_{i=1}^{n}\left\{\frac{\frac{\xi_i\delta_i}{\pi(Z_i)} + \left(1 - \frac{\xi_i}{\pi(Z_i)}\right)m(U_i)}{1 - G_n(Z_i)} - \frac{\frac{\xi_i\delta_i}{\hat\pi_n(Z_i)} + \left(1 - \frac{\xi_i}{\hat\pi_n(Z_i)}\right)m_0(U_i, \hat\theta_n)}{1 - \hat G_n(Z_i)}\right\}^2 Z_i^2 \\
&\le K\sum_{i=1}^{n}\left\{\frac{1}{1 - G_n(Z_i)} - \frac{1}{1 - \hat G_n(Z_i)}\right\}^2 Z_i^2 \\
&\le C\, n^{1/2}\max_{1 \le i \le n}\left|\frac{1}{1 - \hat G_n(Z_i)} - \frac{1}{1 - G_n(Z_i)}\right|^2\left(\frac{1}{n}\mu^T\mu + \frac{1}{n}e_W^T e_W\right),
\end{aligned}
$$
where $K$ and $C$ are constants. By Condition (C2), we have $\frac{1}{n}\mu^T\mu = O_p(1)$ and $\frac{1}{n}e_W^T e_W = O_p(1)$. According to [15], $\hat G_n(Z_i) - G_n(Z_i) = o_p(1)$. Combined with Condition (C1), we have:
$$\frac{\hat G_n(z) - G_n(z)}{1 - G_n(z)} = o_p(1)
\qquad\text{and}\qquad
\frac{\hat G_n(z) - G_n(z)}{1 - \hat G_n(z)} = o_p(1).$$
Similar to the proof of Lemma 6.2 in [16], we have:
$$n^{1/2}\max_{1 \le i \le n}\left|\frac{1}{1 - \hat G_n(Z_i)} - \frac{1}{1 - G_n(Z_i)}\right| = o_p(1).$$
Furthermore, we can obtain
$$\|\hat Y_W - Y_W\|^2 = o_p(1). \qquad \square$$
With the three lemmas mentioned above, we can now proceed to prove Theorem 1.
Proof of Theorem 1. First, we note that
$$
\begin{aligned}
C_{\hat G_n}(w) &= \|\hat Y_W - \hat\mu_{\hat G_n}(w)\|^2 + 2\,\mathrm{trace}\{P(w)\Omega\} \\
&= \|\hat Y_W - \mu + \mu - \hat\mu_{\hat G_n}(w)\|^2 + 2\,\mathrm{trace}\{P(w)\Omega\} \\
&= \|\hat Y_W - \mu\|^2 + \|\mu - \hat\mu_{\hat G_n}(w)\|^2 + 2(\hat Y_W - \mu)'(\mu - \hat\mu_{\hat G_n}(w)) + 2\,\mathrm{trace}\{P(w)\Omega\} \\
&= \|e_W\|^2 + L_{\hat G_n}(w) + 2 e_W'\big(\mu - P(w)\mu + P(w)\mu - \hat\mu_{\hat G_n}(w)\big) + 2\,\mathrm{trace}\{P(w)\Omega\} \\
&= L_{\hat G_n}(w) + 2 e_W'(I - P(w))\mu + 2\,\mathrm{trace}(P(w)\Omega) - 2 e_W' P(w) e_W + \|e_W\|^2.
\end{aligned}
$$
Following [7], except for a term unrelated to w, to prove Theorem 1, we only need to verify
$$\sup_{w \in H_M} \frac{|e_W'(I - P(w))\mu|}{R_{G_n}(w)} = o_p(1), \tag{A1}$$
$$\sup_{w \in H_M} \frac{\big|\mathrm{trace}(P(w)\Omega) - e_W' P(w) e_W\big|}{R_{G_n}(w)} = o_p(1), \tag{A2}$$
$$\sup_{w \in H_M} \left|\frac{L_{\hat G_n}(w)}{R_{G_n}(w)} - 1\right| = o_p(1). \tag{A3}$$
We begin by proving Equation (A1). As per Equation (11), we can ascertain that:
$$R_{G_n}(w_m^0) \ge \|P(w_m^0)\mu - \mu\|^2, \tag{A4}$$
$$R_{G_n}(w_m^0) \ge \mathrm{trace}\big(P(w_m^0)\,\Omega\, P^T(w_m^0)\big). \tag{A5}$$
Furthermore, we denote the maximum eigenvalue of a matrix $A$ as $\lambda_{\max}(A)$; since $P_m$ is an idempotent matrix, we have:
$$\lambda_{\max}(P_m) = 1,
\qquad
\lambda_{\max}\{P(w)\} \le \sum_{m=1}^{M} w_m\,\lambda_{\max}(P_m) \le 1.$$
According to the proof of Theorem 1 in [21], we have:
$$\lim_{n \to \infty}\ \sup_{w \in H_M}\lambda_{\max}\big(P(w)\, P'(w)\big) < \infty.$$
Applying the triangle inequality, Bonferroni's inequality, Chebyshev's inequality and Theorem 2 of [22], we can conclude, for any $\tau > 0$, that
$$
\begin{aligned}
P\left\{\sup_{w \in H_M}\frac{|e_W'(I - P(w))\mu|}{R_{G_n}(w)} > \tau\right\}
&\le P\left\{\sup_{w \in H_M}\sum_{m=1}^{M} w_m\,|e_W'(I - P_m)\mu| > \tau\xi_n\right\} \\
&= P\left\{\max_{1 \le m \le M}|e_W'(I - P_m)\mu| > \tau\xi_n\right\} \\
&= P\Big\{|\langle e_W, A(w_1^0)\mu\rangle| > \tau\xi_n \ \cup\ \cdots\ \cup\ |\langle e_W, A(w_M^0)\mu\rangle| > \tau\xi_n\Big\} \\
&\le \sum_{m=1}^{M} P\big\{|\langle e_W, A(w_m^0)\mu\rangle| > \tau\xi_n\big\}
\le \sum_{m=1}^{M}\frac{E|\langle e_W, A(w_m^0)\mu\rangle|^{2J}}{\tau^{2J}\xi_n^{2J}} \\
&\le C_1\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\big\|\Omega^{1/2}(2J)\, A(w_m^0)\mu\big\|^{2J},
\end{aligned}
$$
where $\langle\cdot,\cdot\rangle$ represents an inner product, $A(w) = I - P(w)$, $C_1$ is a constant, $\Omega(2J) = \mathrm{diag}\{\gamma_1^2(2J), \ldots, \gamma_n^2(2J)\}$ and $\gamma_i^2(2J) = \{E(e_{Wi}^{2J} \mid X_i)\}^{1/(2J)}$. By Lemma A1, $\gamma_i^2(2J) < \infty$; thus, $\{\lambda_{\max}(\Omega(2J))\}^J = O(1)$. Hence, combining this with Equation (A4), we have:
$$
\begin{aligned}
P\left\{\sup_{w \in H_M}|\langle e_W, A(w)\mu\rangle| / R_{G_n}(w) > \tau\right\}
&\le C_1\tau^{-2J}\xi_n^{-2J}\{\lambda_{\max}(\Omega(2J))\}^J\sum_{m=1}^{M}\|A(w_m^0)\mu\|^{2J} \\
&\le C_1\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\|A(w_m^0)\mu\|^{2J}
\le C_1\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\big\{R_{G_n}(w_m^0)\big\}^{J}.
\end{aligned}
$$
Together with Condition (C3), this proves Equation (A1). Next, we prove (A2). Similar to the proof of Equation (A1), we have:
$$
\begin{aligned}
P\left\{\sup_{w \in H_M}\big|\mathrm{trace}[\Omega P(w)] - \langle e_W, P(w) e_W\rangle\big| / R_{G_n}(w) > \tau\right\}
&\le \sum_{m=1}^{M} P\Big\{\big|\mathrm{trace}\big(\Omega P(w_m^0)\big) - \langle e_W, P(w_m^0) e_W\rangle\big| > \tau\xi_n\Big\} \\
&\le \sum_{m=1}^{M}\frac{E\big|\mathrm{trace}\big(\Omega P(w_m^0)\big) - \langle e_W, P(w_m^0) e_W\rangle\big|^{2J}}{\tau^{2J}\xi_n^{2J}} \\
&\le C_2\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\Big\{\mathrm{tr}\big(P(w_m^0)\,\Omega(4J)\, P'(w_m^0)\big)\Big\}^{J},
\end{aligned}
$$
where $C_2$ is a constant, $\Omega(4J) = \mathrm{diag}\{\gamma_1^2(4J), \ldots, \gamma_n^2(4J)\}$ and $\gamma_i^2(4J) = \{E(e_{Wi}^{4J} \mid X_i)\}^{1/(4J)}$. By Lemma A1, $\gamma_i^2(4J) < \infty$; thus, $\{\lambda_{\max}(\Omega(4J))\}^J = O(1)$. Hence, combining Equation (A5) and Condition (C3), we have:
$$
\begin{aligned}
P\left\{\sup_{w \in H_M}\big|\mathrm{trace}[\Omega P(w)] - \langle e_W, P(w) e_W\rangle\big| / R_{G_n}(w) > \tau\right\}
&\le C_2\tau^{-2J}\xi_n^{-2J}\{\lambda_{\max}[\Omega(4J)]\}^{J}\sum_{m=1}^{M}\Big\{\mathrm{tr}\big(P(w_m^0)\, P'(w_m^0)\big)\Big\}^{J} \\
&\le C_2\tau^{-2J}\xi_n^{-2J}\Big(\inf_i\sigma_{Wi}^2\Big)^{-J}\sum_{m=1}^{M}\Big\{\inf_i\sigma_{Wi}^2\ \mathrm{trace}\big(P^2(w_m^0)\big)\Big\}^{J} \\
&\le C_3\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\big\{R_{G_n}(w_m^0)\big\}^{J} = o(1).
\end{aligned}
$$
Next, we will prove Equation (A3). Note that
$$
\left|\frac{L_{\hat G_n}(w)}{R_{G_n}(w)} - 1\right|
\le \left|\frac{L_{G_n}(w)}{R_{G_n}(w)} - 1\right| + \frac{|L_{\hat G_n}(w) - L_{G_n}(w)|}{R_{G_n}(w)}
\le \left|\frac{L_{G_n}(w)}{R_{G_n}(w)} - 1\right| + \frac{\Big|\,\|\mu - \hat\mu_{\hat G_n}(w)\|^2 - \|\mu - \hat\mu(w)\|^2\,\Big|}{R_{G_n}(w)}. \tag{A9}
$$
From Lemma A2, we know that $|L_{G_n}(w)/R_{G_n}(w) - 1| = o_p(1)$. Therefore, to prove (A3), it suffices to verify that the second term on the right-hand side of Equation (A9) converges to 0 in probability.
$$
\begin{aligned}
\frac{\Big|\,\|\mu - \hat\mu_{\hat G_n}(w)\|^2 - \|\mu - \hat\mu(w)\|^2\,\Big|}{R_{G_n}(w)}
&= \frac{\Big|\,2(\mu - \hat\mu(w))'\big(\hat\mu(w) - \hat\mu_{\hat G_n}(w)\big) + \|\hat\mu(w) - \hat\mu_{\hat G_n}(w)\|^2\,\Big|}{R_{G_n}(w)} \\
&\le \frac{2\{L_{G_n}(w)\}^{1/2}\,\|P(w)(\hat Y_W - Y_W)\|}{R_{G_n}(w)} + \frac{\|P(w)(\hat Y_W - Y_W)\|^2}{R_{G_n}(w)}.
\end{aligned}
$$
According to Lemma A3, we have:
$$\|P(w)(\hat Y_W - Y_W)\|^2 \le \lambda_{\max}\big(P(w)\, P'(w)\big)\,\|\hat Y_W - Y_W\|^2 = o_p(1).$$
Combining this with Lemma A3, we can conclude that (A9) is of o p ( 1 ) , which establishes the proof for (A3). □
Proof of Theorem 2. It is evident from Equations (13) and (16) that:
$$\hat C_{\hat G_n}(w) = C_{\hat G_n}(w) + 2\,\mathrm{trace}\{P(w)\hat\Omega(w)\} - 2\,\mathrm{trace}\{P(w)\Omega\}.$$
In conjunction with Theorem 1, it is evident that to prove Theorem 2, we only need to establish:
$$\sup_{w \in H_M}\big|\mathrm{trace}\{P(w)\hat\Omega(w)\} - \mathrm{trace}\{P(w)\Omega\}\big| \big/ R_{G_n}(w) = o_p(1).$$
We denote $Q_m = \mathrm{diag}\{\rho_{11}^{(m)}, \ldots, \rho_{nn}^{(m)}\}$ and $Q(w) = \sum_{m=1}^{M} w_m Q_m$. According to Lemma A1, we have:
$$\lambda_{\max}(\Omega) = O(1).$$
Considering the definition of Ω ^ ( w ) , and employing proof techniques similar to [10,23], we obtain:
$$
\begin{aligned}
&\sup_{w \in H_M}\big|\mathrm{trace}\{P(w)\hat\Omega(w)\} - \mathrm{trace}\{P(w)\Omega\}\big| / R_{G_n}(w) \\
&\quad= \sup_{w \in H_M}\big|\{\hat Y_W - P(w)\hat Y_W\}'\, Q(w)\,\{\hat Y_W - P(w)\hat Y_W\} - \mathrm{trace}\{Q(w)\Omega\}\big| / R_{G_n}(w) \\
&\quad= \sup_{w \in H_M}\big|\{e_W + \mu - P(w)\hat Y_W\}'\, Q(w)\,\{e_W + \mu - P(w)\hat Y_W\} - \mathrm{trace}\{Q(w)\Omega\}\big| / R_{G_n}(w) \\
&\quad\le \sup_{w \in H_M}\big|e_W' Q(w) e_W - \mathrm{trace}\{Q(w)\Omega\}\big| / R_{G_n}(w)
+ 2\sup_{w \in H_M}\big|e_W' Q(w)\{P(w)\hat Y_W - \mu\}\big| / R_{G_n}(w) \\
&\qquad+ \sup_{w \in H_M}\{P(w)\hat Y_W - \mu\}'\, Q(w)\,\{P(w)\hat Y_W - \mu\} / R_{G_n}(w) \\
&\quad\le \sup_{w \in H_M}\big|e_W' Q(w) e_W - \mathrm{trace}\{Q(w)\Omega\}\big| / R_{G_n}(w)
+ 2\sup_{w \in H_M}\big|e_W' Q(w)\{P(w)\mu - \mu\}\big| / R_{G_n}(w) \\
&\qquad+ 2\sup_{w \in H_M}\big|e_W' Q(w) P(w) e_W - \mathrm{trace}\{Q(w) P(w)\Omega\}\big| / R_{G_n}(w)
+ 2\sup_{w \in H_M}\big|\mathrm{trace}\{Q(w) P(w)\Omega\}\big| / R_{G_n}(w) \\
&\qquad+ \sup_{w \in H_M}\{P(w)\hat Y_W - \mu\}'\, Q(w)\,\{P(w)\hat Y_W - \mu\} / R_{G_n}(w) \\
&\quad\equiv T_1 + T_2 + T_3 + T_4 + T_5.
\end{aligned}
$$
Let $\bar\rho = \max_m\max_i\rho_{ii}^{(m)}$. According to Condition (C7), we have:
$$\bar\rho = O\big(n^{-1}\tilde p\big).$$
Given the definition of $R_{G_n}(w)$ and Condition (C4), the following holds:
$$R_{G_n}(w_m^0) \ge \mathrm{trace}\big(P_m\,\Omega\, P_m^T\big) \ge \epsilon\,\mathrm{trace}(P_m) = \epsilon\, p_m,$$
so $\xi_n \to \infty$ and $M\xi_n^{-2J} = o(1)$.
From (A6), (A11) and (A13), the Chebyshev inequality and Theorem 2 of [22], for any $\tau > 0$, there exist constants $c_1$ and $c_2$ such that:
$$
\begin{aligned}
P(T_1 > \tau) &\le \sum_{m=1}^{M} P\big\{\big|e_W' Q_m e_W - \mathrm{trace}(Q_m\Omega)\big| > \tau\xi_n\big\}
\le \tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M} E\big|e_W' Q_m e_W - \mathrm{trace}(Q_m\Omega)\big|^{2J} \\
&\le c_1\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\Big\{\mathrm{trace}\big(\Omega^{1/2}(4J)\, Q_m\,\Omega(4J)\, Q_m\,\Omega^{1/2}(4J)\big)\Big\}^{J} \\
&\le c_1\tau^{-2J}\xi_n^{-2J}\, M\,\lambda_{\max}^{2J}(\Omega(4J))\max_{1 \le m \le M}\{\mathrm{trace}(Q_m)\}^{J}
= \xi_n^{-2J} M\, O\big\{(n^{-1}\tilde p)^{2J}\big\} = o(1), \tag{A15}
\end{aligned}
$$
$$
\begin{aligned}
P(T_3/2 > \tau) &\le \sum_{m=1}^{M} P\big\{\big|e_W' Q_m P_m e_W - \mathrm{trace}(Q_m P_m\Omega)\big| > \tau\xi_n\big\}
\le \tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M} E\big|e_W' Q_m P_m e_W - \mathrm{trace}(Q_m P_m\Omega)\big|^{2J} \\
&\le c_2\tau^{-2J}\xi_n^{-2J}\sum_{m=1}^{M}\Big\{\mathrm{trace}\big(\Omega^{1/2}(4J)\, Q_m P_m\,\Omega(4J)\, P_m^T Q_m\,\Omega^{1/2}(4J)\big)\Big\}^{J} \\
&\le c_2\tau^{-2J}\xi_n^{-2J}\, M\,\lambda_{\max}^{2J}(\Omega(4J))\,\lambda_{\max}^{J}\big(P_m P_m^T\big)\max_{1 \le m \le M}\{\mathrm{trace}(Q_m^2)\}^{J}
= \xi_n^{-2J} M\, O\big\{(n^{-1}\tilde p)^{2J}\big\} = o(1), \tag{A16}
\end{aligned}
$$
$$T_2/2 \le \sup_{w \in H_M}\Big\{\|e_W\|^2\,\bar\rho^2\,\|P(w)\mu - \mu\|^2 / R_{G_n}^2(w)\Big\}^{1/2}
\le \|e_W\|\,\bar\rho\,\xi_n^{-1/2} = \xi_n^{-1/2}\, O\big(n^{-1/2}\tilde p\big) = o(1), \tag{A17}$$
$$
\begin{aligned}
T_4/2 &\le \xi_n^{-1}\bar\rho\,\lambda_{\max}(\Omega)\sup_{w \in H_M}\mathrm{trace}\{P(w)\}
\le \xi_n^{-1}\bar\rho\,\lambda_{\max}(\Omega)\max_m\mathrm{trace}(P_m) \\
&\le \xi_n^{-1}\bar\rho\,\lambda_{\max}(\Omega)\max_m\lambda_{\max}(P_m)\max_m\mathrm{rank}(P_m)
= \xi_n^{-1}\, O\big(n^{-1}\tilde p^2\big) = o(1), \tag{A18}
\end{aligned}
$$
$$T_5 \le \bar\rho\sup_{w \in H_M}\{P(w)\hat Y_W - \mu\}^T\{P(w)\hat Y_W - \mu\} / R_{G_n}(w)
= \bar\rho\sup_{w \in H_M} L_{\hat G_n}(w) / R_{G_n}(w) = O_p\big(n^{-1}\tilde p\big). \tag{A19}$$
Therefore, combining (A15)–(A19), along with Condition (C7), it is clear that Theorem 2 holds. □

References

  1. Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle. In Second International Symposium on Information Theory; Petrov, B., Csáki, F., Eds.; Akadémiai Kiadó: Budapest, Hungary, 1973; pp. 267–281. [Google Scholar]
  2. Mallows, C.L. Some Comments on Cp. Technometrics 1973, 15, 661–675. [Google Scholar]
  3. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 15–18. [Google Scholar] [CrossRef]
  4. Hjort, N.L.; Claeskens, G. Frequentist Model Average Estimators. J. Am. Stat. Assoc. 2003, 98, 879–899. [Google Scholar] [CrossRef]
  5. Buckland, S.T.; Burnham, K.P.; Augustin, N.H. Model selection: An integral part of inference. Biometrics 1997, 53, 603–618. [Google Scholar] [CrossRef]
  6. Hansen, B.E. Least squares model averaging. Econometrica 2007, 75, 1175–1189. [Google Scholar] [CrossRef]
  7. Wan, A.T.; Zhang, X.; Zou, G. Least squares model averaging by Mallows criterion. J. Econom. 2010, 156, 277–283. [Google Scholar] [CrossRef]
  8. Hansen, B.E.; Racine, J.S. Jackknife model averaging. J. Econom. 2012, 167, 38–46. [Google Scholar] [CrossRef]
  9. Liu, Q.; Okui, R. Heteroscedasticity-robust Cp model averaging. Econom. J. 2013, 16, 463–472. [Google Scholar] [CrossRef]
  10. Zhao, S.; Zhang, X.; Gao, Y. Model averaging with averaging covariance matrix. Econom. Lett. 2016, 145, 214–217. [Google Scholar] [CrossRef]
  11. Miller, R. Least square regression with censored data. Biometrika 1976, 63, 449–464. [Google Scholar] [CrossRef]
  12. Buckley, J.; James, I. Linear regression with censored data. Biometrika 1979, 66, 429–436. [Google Scholar] [CrossRef]
  13. Koul, H.; Susarla, V.; Van Ryzin, J. Regression analysis with randomly right-censored data. Ann. Stat. 1981, 9, 1276–1288. [Google Scholar] [CrossRef]
  14. He, S.; Huang, X. Central limit theorem of linear regression model under right censorship. Sci. China Ser. A-Math. 2003, 46, 600–610. [Google Scholar] [CrossRef]
  15. Wang, Q.; Dinse, G.E. Linear regression analysis of survival data with missing censoring indicators. Lifetime Data Anal. 2011, 17, 256–279. [Google Scholar] [CrossRef]
  16. Liang, Z.Q.; Chen, X.L.; Zhou, Y.Q. Mallows model averaging estimation for linear regression model with right censored data. Acta Math. Appl. Sin. Engl. Ser. 2022, 38, 5–23. [Google Scholar] [CrossRef]
  17. Wei, Y.; Wang, Q.; Liu, W. Model averaging for linear models with responses missing at random. Ann. Inst. Stat. Math. 2020, 73, 535–553. [Google Scholar] [CrossRef]
  18. Liu, Q.; Okui, R.; Yoshimura, A. Generalized least squares model averaging. Econom. Rev. 2016, 35, 1692–1752. [Google Scholar] [CrossRef]
  19. Albert, A.; Anderson, J.A. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984, 71, 1–10. [Google Scholar] [CrossRef]
  20. Zhu, R.; Wan, A.T.; Zhang, X.; Zou, G. A Mallows-type model averaging estimator for the varying-coefficient partially linear model. J. Am. Stat. Assoc. 2019, 114, 882–892. [Google Scholar] [CrossRef]
  21. Dong, Q.K.; Liu, B.X.; Zhao, H. Weighted least squares model averaging for accelerated failure time models. Comput. Stat. Data Anal. 2023, 184, 107743. [Google Scholar] [CrossRef]
  22. Whittle, P. Bounds for the moments of linear and quadratic forms in independent variables. Theory Probab. Appl. 1960, 5, 302–305. [Google Scholar] [CrossRef]
  23. Qiu, Y.; Wang, W.; Xie, T.; Yu, J.; Zhang, X. Boosting Store Sales Through Machine Learning-Informed Promotional Decisions. 2023. Available online: http://www.mysmu.edu/faculty/yujun/Research/Maml_sales.pdf (accessed on 10 February 2024).
Figure 1. Mean Squared Errors (MSEs) of various methods under different sample sizes and censoring rates at MR = 20%.
Figure 2. Mean Squared Errors (MSEs) of various methods under different sample sizes and censoring rates at MR = 40%.
Table 1. The mean, optimal rate and standard deviation of NMSPE.

Training size | Metric             | AIC    | SAIC   | BIC    | SBIC   | MCIMA  | HCIMA
50%           | Mean               | 1.3628 | 1.3370 | 1.3517 | 1.3345 | 1.2765 | 1.2165
              | Standard deviation | 0.5283 | 0.5060 | 0.5123 | 0.4970 | 0.4039 | 0.3500
              | Optimal rate       | 0.084  | 0.137  | 0.042  | 0.093  | 0.306  | 0.338
60%           | Mean               | 1.3663 | 1.3388 | 1.3556 | 1.3404 | 1.2651 | 1.1800
              | Standard deviation | 0.5504 | 0.5166 | 0.5151 | 0.5068 | 0.4343 | 0.3066
              | Optimal rate       | 0.094  | 0.119  | 0.049  | 0.091  | 0.288  | 0.359
70%           | Mean               | 1.3347 | 1.3213 | 1.3361 | 1.3259 | 1.2451 | 1.1766
              | Standard deviation | 0.5324 | 0.5232 | 0.5288 | 0.5257 | 0.3433 | 0.2966
              | Optimal rate       | 0.097  | 0.140  | 0.057  | 0.079  | 0.259  | 0.368
80%           | Mean               | 1.2794 | 1.2619 | 1.2828 | 1.2628 | 1.2034 | 1.1504
              | Standard deviation | 0.4865 | 0.4714 | 0.4861 | 0.4777 | 0.2941 | 0.2030
              | Optimal rate       | 0.083  | 0.165  | 0.063  | 0.129  | 0.240  | 0.320