Article

Efficient Post-Shrinkage Estimation Strategies in High-Dimensional Cox’s Proportional Hazards Models

by Syed Ejaz Ahmed 1, Reza Arabi Belaghi 2 and Abdulkhadir Ahmed Hussein 3,*
1 Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada
2 Department of Energy and Technology, Swedish University of Agricultural Sciences, P.O. Box 7032, 750 07 Uppsala, Sweden
3 Department of Mathematics & Statistics, University of Windsor, Windsor, ON N9B 3P4, Canada
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 254; https://doi.org/10.3390/e27030254
Submission received: 3 January 2025 / Revised: 17 February 2025 / Accepted: 19 February 2025 / Published: 28 February 2025

Abstract

Regularization methods such as LASSO, adaptive LASSO, Elastic-Net, and SCAD are widely employed for variable selection in statistical modeling. However, these methods primarily focus on variables with strong effects while often overlooking weaker signals, potentially leading to biased parameter estimates. To address this limitation, Gao, Ahmed, and Feng (2017) introduced a corrected shrinkage estimator that incorporates both weak and strong signals, though their results were confined to linear models. The applicability of such approaches to survival data remains unclear, despite the prevalence of survival regression involving both strong and weak effects in biomedical research. To bridge this gap, we propose a novel class of post-selection shrinkage estimators tailored to the Cox model framework. We establish the asymptotic properties of the proposed estimators and demonstrate their potential to enhance estimation and prediction accuracy through simulations that explicitly incorporate weak signals. Finally, we validate the practical utility of our approach by applying it to two real-world datasets, showcasing its advantages over existing methods.

1. Introduction

High-dimensional data analysis, where the number of covariates frequently exceeds the sample size, has become a central research focus in contemporary statistics (see [1]). The applications of these methods span a broad range of fields, including genomics, medical imaging, signal processing, social science, and financial economics. In particular, high-dimensional regularized Cox regression models have gained traction in survival analysis (e.g., [2,3,4]), where these techniques help construct parsimonious (sparse) models and can outperform classical selection criteria such as Akaike’s information criterion [5] or the Bayesian information criterion [6].
The least absolute shrinkage and selection operator (LASSO) proposed by [7] remains one of the most popular approaches to high-dimensional regression, due to its computational efficiency and its ability to perform variable selection and parameter shrinkage simultaneously. Numerous extensions of LASSO, such as adaptive LASSO [8], elastic net [9], and scaled LASSO [10], have been developed to further refine estimation and prediction performance. In the context of Cox proportional hazards models, analogous methods—including the LASSO [4,11], the adaptive LASSO [12,13], and smoothly clipped absolute deviation (SCAD; [14])—have been widely examined. Interested readers may also consult [15,16,17,18] for recent advancements in high-dimensional Cox regression.
When $p > n$, the focus is often on accurately recovering both the support (i.e., which covariates have nonzero effects) and the magnitudes of the nonzero regression coefficients. Although many penalized inference procedures excel at identifying "strong" signals (i.e., coefficients that are moderately large and thus easily detected), they may fail to adequately account for "weak" signals, whose effects may be small but nonzero. To formalize this, one can divide the index set $\{1, \ldots, p_n\}$ into three disjoint subsets as follows: $S_1$ for strong signals, $S_2$ for weak signals, and $S_{\mathrm{null}}$ for coefficients that are exactly zero. Standard estimation procedures that neglect weak signals risk introducing non-negligible bias, particularly when these weak signals are numerous.
In this paper, we tackle the bias induced by weak signals in high-dimensional Cox regression by adapting the post-selection shrinkage strategy proposed by [19]. Our key contribution is the development of a weighted ridge (WR) estimator, which effectively differentiates small, nonzero coefficients from those that are truly zero. We show that the resulting post-selection estimators dominate submodel estimators derived from standard regularization methods such as LASSO and elastic net. Moreover, under the condition $p_n = O(n^{\alpha})$ for some $\alpha > 0$, we establish the asymptotic normality of our post-selection WR estimator, thereby demonstrating its asymptotic efficiency. Through extensive simulations and real data applications, we illustrate that our method achieves substantial improvements in both estimation accuracy and prediction performance.
The remainder of this paper is organized as follows. Section 2 presents the model setup and the proposed post-selection shrinkage estimation procedure. In Section 3, we outline the asymptotic properties of our estimators. Section 4 provides a Monte Carlo simulation study, while Section 5 reports the results of applying our methodology to two real data sets. We conclude in Section 6 with a brief discussion of possible future research directions.

2. Methodology

2.1. Notation and Assumptions

In this section, we state some standard notations and assumptions used throughout the paper. We use bold upper-case letters for matrices and bold lower-case letters for vectors. The superscript $T$ denotes the matrix transpose, and $I_N$ denotes the $N \times N$ identity matrix. Design vectors, or columns of $\mathbf{X}$, are denoted by $\mathbf{X}_j$, $j = 1, \ldots, p_n$. The index set $M = \{1, 2, \ldots, p_n\}$ denotes the full model, which contains all the potential variables. For a subset $A \subseteq M$, we use $\boldsymbol{\beta}_A$ for the subvector of $\boldsymbol{\beta}_M$ indexed by $A$, and $\mathbf{X}_A$ for the submatrix of $\mathbf{X}$ whose columns are indexed by $A$. For a vector $\mathbf{v} = (v_1, \ldots, v_{p_n})^T$, we write $\|\mathbf{v}\|_2 = (\sum_{j=1}^{p_n} v_j^2)^{1/2}$ and $\|\mathbf{v}\|_1 = \sum_{j=1}^{p_n} |v_j|$. For any square matrix $\mathbf{A}$, we let $\Lambda_{\min}(\mathbf{A})$ and $\Lambda_{\max}(\mathbf{A})$ be the smallest and largest eigenvalues of $\mathbf{A}$, respectively. Given $a, b \in \mathbb{R}$, we let $a \vee b$ and $a \wedge b$ denote the maximum and minimum of $a$ and $b$. For two positive sequences $a_n$ and $b_n$, we write $a_n \asymp b_n$ if $a_n$ is of the same order as $b_n$. We use $I(\cdot)$ to denote the indicator function; $H_{\vartheta}(\cdot\,;\Delta)$ denotes the cumulative distribution function (cdf) of a non-central $\chi^2$-distribution with $\vartheta$ degrees of freedom and non-centrality parameter $\Delta$. We also use $\xrightarrow{D}$ to indicate convergence in distribution.
Let $S \subseteq \{1, \ldots, p_n\}$ be the set of indices of nonzero coefficients, with $s = |S|$ denoting the cardinality of $S$. We assume that the true coefficient vector $\boldsymbol{\beta}^* = (\beta_1^*, \ldots, \beta_{p_n}^*)^T$ is sparse, that is, $s < n$. Without loss of generality, we partition the $(n \times p_n)$-matrix $\mathbf{X}$ as $\mathbf{X} = (\mathbf{X}_{S_1}, \mathbf{X}_{S_2}, \mathbf{X}_{S_{\mathrm{null}}})$, where $S_1$, $S_2$, and $S_{\mathrm{null}}$ are disjoint, $S_1 \cup S_2 \cup S_{\mathrm{null}} = M$, and $S_{\mathrm{null}} = \{j : \beta_{0j} = 0\}$. For the two submatrices $\mathbf{X}_{S_1}$ and $\mathbf{X}_{S_2}$, we define the corresponding sample covariance matrices by
$$\Sigma_{S_1|S_2} = \Sigma_{S_1 S_1} - \Sigma_{S_1 S_2}\,\Sigma_{S_2 S_2}^{-1}\,\Sigma_{S_2 S_1}, \qquad \Sigma_{S_2|S_1} = \Sigma_{S_2 S_2} - \Sigma_{S_2 S_1}\,\Sigma_{S_1 S_1}^{-1}\,\Sigma_{S_1 S_2}.$$
Let $\mathbf{V} = (\mathbf{X}_{S_2}, \mathbf{X}_{S_{\mathrm{null}}})$ be the $n \times (p_n - s_1)$ submatrix of $\mathbf{X}$, so that another partition can be written as $\mathbf{X} = (\mathbf{X}_{S_1}, \mathbf{V})$. Let $M_1 = I_n - \mathbf{X}_{S_1}\hat{\Sigma}_{S_1 S_1}^{-1}\mathbf{X}_{S_1}^T$. Then, $\mathbf{V}^T M_1 \mathbf{V}$ is a $(p_n - s_1) \times (p_n - s_1)$-dimensional singular matrix with rank $k_1$. We denote by $\varrho_1 \geq \cdots \geq \varrho_{k_1}$ its $k_1$ positive eigenvalues.

2.2. Signal Strength Regularity Conditions

We consider three signal strength assumptions to define three sets of covariates according to their signal strength levels as follows [19]:
(A1) 
There exists a positive constant $c_1$ such that $|\beta_j| \geq c_1\sqrt{(\log p)/n}$ for $j \in S_1$;
(A2) 
The coefficient vector $\boldsymbol{\beta}$ satisfies $\|\boldsymbol{\beta}_{S_2}\|_2^2 = O(n^{\tau})$ for some $0 < \tau < 1$, where $\beta_j \neq 0$ for $j \in S_2$;
(A3) 
$\beta_j = 0$ for $j \in S_{\mathrm{null}}$.

2.3. Cox Proportional Hazards Model

The proportional hazards (PH) model introduced by [20] is one of the most commonly used approaches for analyzing survival data. In this model, the hazard function for an individual depends on covariates through a multiplicative effect, implying that the ratio of hazards for different individuals remains constant over time. We consider a survival model with a true hazard function $\lambda_0(t \mid \mathbf{X})$ for a failure time $T$, given a covariate vector $\mathbf{X} = (X_1, \ldots, X_p)^T$. We let $C$ denote the censoring time and define $Y = \min(T, C)$ and $\delta = I(T \leq C)$. Suppose we have $n$ i.i.d. observations $\{Y_i, \delta_i, \mathbf{X}_i\}_{i=1}^{n}$ from this true underlying model, and let $\mathbf{X}$ also denote the corresponding $n \times p$ design matrix.
The PH model posits that the hazard function for an individual with covariates X is
$$\lambda(t \mid \mathbf{X}) = \lambda_0(t)\exp(\mathbf{X}^T\boldsymbol{\beta}),$$
where β = β 1 , , β p T is the vector of regression coefficients, and λ 0 ( t ) is an unknown baseline hazard function. Because λ 0 ( t ) does not depend on X , one can estimate β by maximizing the partial log-likelihood
$$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\delta_i\,\mathbf{x}_i^T\boldsymbol{\beta} - \sum_{i=1}^{n}\delta_i \log\Big\{\sum_{j \in R(t_i)}\exp(\mathbf{x}_j^T\boldsymbol{\beta})\Big\}, \qquad (3)$$
where $\delta_i = I(T_i \leq C_i)$ and $R(t_i) = \{j : T_j \geq t_i\}$ is the risk set just prior to $t_i$. Maximizing $l(\boldsymbol{\beta})$ in (3) with respect to $\boldsymbol{\beta}$ yields the estimator $\hat{\boldsymbol{\beta}}$ of the regression parameters.
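As a computational illustration, the following R sketch fits the PH model above by maximizing the partial log-likelihood with the standard survival package. The simulated data, sample size, and coefficient values are our own illustrative choices, not values taken from the paper.
library(survival)

set.seed(1)
n <- 200; p <- 5
X     <- matrix(rnorm(n * p), n, p)           # covariate matrix
beta  <- c(1, -0.5, 0.8, 0, 0)                # illustrative true coefficients
Tfail <- rexp(n, rate = exp(X %*% beta))      # failure times with hazard exp(x'beta)
C     <- runif(n, 0, 3)                       # censoring times
y     <- pmin(Tfail, C)                       # observed time Y = min(T, C)
delta <- as.numeric(Tfail <= C)               # event indicator delta = I(T <= C)

fit <- coxph(Surv(y, delta) ~ X)              # maximizes the partial log-likelihood (3)
coef(fit)                                     # estimates of beta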

2.4. Variable Selection and Estimation

Variable selection can be carried out by minimizing the penalized negative log-partial likelihood as follows:
$$-\,l(\boldsymbol{\beta}) + \sum_{j=1}^{p_n} P_{\lambda}(\beta_j), \qquad (4)$$
where $P_{\lambda}(\beta_j)$ is a penalty function applied to each component of $\boldsymbol{\beta}$, and $\lambda$ is a tuning parameter that controls the magnitude of penalization. We consider the following two popular methods:
  • LASSO. The LASSO estimator follows (4) with an $L_1$-norm penalty,
    $$P_{\lambda}(\beta_j) = \lambda|\beta_j|.$$
    As $\lambda$ increases, this penalty continuously shrinks the coefficients toward zero, and some coefficients become exactly zero if $\lambda$ is sufficiently large. The theoretical properties of the LASSO are well studied; see [21] for an extensive review.
  • Elastic Net (ENet). The Elastic Net estimator implements (4) with the combined penalty
    $$P_{\lambda}(\beta_j) = \lambda\{\alpha|\beta_j| + (1-\alpha)\beta_j^2\},$$
    where $0 \leq \alpha \leq 1$. When $\alpha = 1$, this reduces to the LASSO, and when $\alpha = 0$, it becomes Ridge regression. Combining the $L_1$ and $L_2$ penalties leverages the benefits of Ridge while still producing sparse solutions. Unlike LASSO, which can select at most $n$ variables, ENet has no such limitation when $p_n > n$. An illustrative R sketch of both penalized fits is given after this list.
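Continuing the simulated data from the previous sketch, the R code below shows how both penalized Cox fits can be obtained with the glmnet package. The mixing value alpha = 0.5 for the ENet and the use of lambda.min are our illustrative choices rather than settings prescribed by the paper.
library(glmnet)

y.cox <- cbind(time = y, status = delta)                       # response format for glmnet's Cox family

cv.lasso <- cv.glmnet(X, y.cox, family = "cox", alpha = 1)     # LASSO: pure L1 penalty
cv.enet  <- cv.glmnet(X, y.cox, family = "cox", alpha = 0.5)   # ENet: mixed L1/L2 penalty

b.lasso <- as.numeric(coef(cv.lasso, s = "lambda.min"))        # coefficients at CV-chosen lambda
b.enet  <- as.numeric(coef(cv.enet,  s = "lambda.min"))

S1.hat <- which(b.lasso != 0)                                  # candidate strong-signal set S1-hat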

2.4.1. Variable Selection Procedure for S 1 and S 2

We summarize the variable selection procedure for detecting the strong signals S 1 and the weak signals S 2 .
  • Step 1 (detection of $S_1$). Obtain a candidate subset $\hat{S}_1$ of strong signals using a penalized likelihood estimator (PLE). Specifically, consider
    $$\hat{\boldsymbol{\beta}}^{\mathrm{PLE}} = \arg\min_{\boldsymbol{\beta}}\Big\{-\,l_n(\boldsymbol{\beta}) + \sum_{j=1}^{p_n} P_{\lambda}(\beta_j)\Big\},$$
    where $P_{\lambda}(\beta_j)$ penalizes each $\beta_j$, shrinking weak effects toward zero and selecting the strong signals. The tuning parameter $\lambda > 0$ governs the size of the subset $\hat{S}_1$.
  • Step 2 (detection of $S_2$). To identify $\hat{S}_2$, first solve a penalized regression problem with a ridge penalty only on the variables in $\hat{S}_1^c$. Formally,
    $$\hat{\boldsymbol{\beta}}^{r} = \arg\min_{\boldsymbol{\beta}}\Big\{-\,l(\boldsymbol{\beta}) + r_n\,\|\boldsymbol{\beta}_{\hat{S}_1^c}\|_2^2\Big\},$$
    where $r_n > 0$ is a tuning parameter controlling the overall strength of regularization for variables in $\hat{S}_1^c$. We then define a post-selection weighted ridge (WR) estimator $\hat{\boldsymbol{\beta}}^{\mathrm{WR}}$ by
    $$\hat{\beta}_j^{\mathrm{WR}} = \begin{cases} \hat{\beta}_j^{r}, & j \in \hat{S}_1, \\ \hat{\beta}_j^{r}\, I\big(|\hat{\beta}_j^{r}| > a_n\big), & j \in \hat{S}_1^c, \end{cases}$$
    where $a_n$ is a thresholding parameter. The set $\hat{S}_2$ is then
    $$\hat{S}_2 = \big\{ j \in \hat{S}_1^c : \hat{\beta}_j^{\mathrm{WR}} \neq 0,\ 1 \leq j \leq p \big\}. \qquad (9)$$
    We apply this post-selection procedure only if $|\hat{S}_2| > 2$. In particular, we set
    $$a_n = c\,n^{-\kappa}, \qquad 0 < \kappa \leq \tfrac{1}{2}. \qquad (10)$$
    A schematic R sketch of this two-step procedure is given after this list.
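The following R code sketches Step 2 on the simulated data from the earlier sketches, assuming $\hat{S}_1$ is non-empty. It is schematic only: the ridge fit restricted to $\hat{S}_1^c$ is approximated through glmnet's penalty.factor argument, and the values of r.n and a.n are illustrative constants rather than the theoretical rates of Section 3.
pf <- rep(1, p); pf[S1.hat] <- 0                 # ridge penalty applied only outside S1-hat

r.n   <- 0.1                                     # illustrative ridge tuning parameter r_n
fit.r <- glmnet(X, y.cox, family = "cox", alpha = 0,
                lambda = r.n, penalty.factor = pf)
b.r   <- as.numeric(coef(fit.r))                 # weighted-ridge-type estimate beta-hat^r

a.n  <- 1 / sqrt(n)                              # illustrative threshold a_n = c * n^(-kappa)
b.WR <- b.r
b.WR[-S1.hat] <- b.r[-S1.hat] * (abs(b.r[-S1.hat]) > a.n)   # threshold coefficients outside S1-hat

S2.hat <- setdiff(which(b.WR != 0), S1.hat)      # detected weak-signal set S2-hat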

2.4.2. Post-Selection Shrinkage Estimation

We now propose a shrinkage estimator that combines information from the two post-selection estimators $\hat{\boldsymbol{\beta}}^{\mathrm{RE}}$ and $\hat{\boldsymbol{\beta}}^{\mathrm{WR}}$. Recall that
$$\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} = \big(\hat{\beta}_j^{r},\ j \in \hat{S}_1\big)^T, \qquad \hat{\boldsymbol{\beta}}_{\hat{S}_2}^{\mathrm{WR}} = \big(\hat{\beta}_j^{r}\, I(|\hat{\beta}_j^{r}| > a_n),\ j \in \hat{S}_2\big)^T.$$
Define the post-selection shrinkage estimator for S ^ 1 as
$$\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{SE}} = \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \frac{\hat{s}_2 - 2}{\hat{T}_n}\Big(\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{RE}}\Big),$$
where $\hat{s}_2 = |\hat{S}_2|$, and $\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{RE}}$ is the restricted estimator obtained by maximizing the partial log-likelihood (3) over the set $\hat{S}_1$. The term $\hat{T}_n$ is given by
$$\hat{T}_n = \big(\hat{\boldsymbol{\beta}}_{\hat{S}_2}^{\mathrm{WR}}\big)^T \big(\mathbf{X}_{\hat{S}_2}^T M_{\hat{S}_1} \mathbf{X}_{\hat{S}_2}\big)^{-1} \hat{\boldsymbol{\beta}}_{\hat{S}_2}^{\mathrm{WR}}, \qquad M_{\hat{S}_1} = I_n - \mathbf{X}_{\hat{S}_1}\hat{\Sigma}_{\hat{S}_1}^{-1}\mathbf{X}_{\hat{S}_1}^T,$$
using a generalized inverse if $\hat{\Sigma}_{\hat{S}_1}$ is singular.
To avoid over-shrinking when β ^ S ^ 1 WR and β ^ S ^ 1 SE have different signs, we define a positive shrinkage estimator via the convex combination
$$\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{PSE}} = \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \Big(\frac{\hat{s}_2 - 2}{\hat{T}_n} \wedge 1\Big)\Big(\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{RE}}\Big).$$
This modification is essential to prevent an overly aggressive shrinkage that might reverse the sign of estimates in β ^ S ^ 1 WR .
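A minimal R sketch of this shrinkage step, continuing the earlier sketches, is given below. The restricted estimator is refit by coxph on $\hat{S}_1$, and the quantity T.n is a schematic stand-in for $\hat{T}_n$ as defined above; both are illustrative approximations, not the exact implementation used in the paper.
fit.RE <- coxph(Surv(y, delta) ~ X[, S1.hat])     # restricted estimator on S1-hat
b.RE   <- as.numeric(coef(fit.RE))
b.WR1  <- b.WR[S1.hat]                            # WR estimator restricted to S1-hat
s2     <- length(S2.hat)                          # the procedure is applied only if s2 > 2

# Schematic version of the statistic T_n based on the thresholded weak signals
M1  <- diag(n) - X[, S1.hat] %*% solve(crossprod(X[, S1.hat])) %*% t(X[, S1.hat])
T.n <- drop(t(b.WR[S2.hat]) %*%
            solve(t(X[, S2.hat]) %*% M1 %*% X[, S2.hat]) %*% b.WR[S2.hat])

b.SE  <- b.WR1 - ((s2 - 2) / T.n) * (b.WR1 - b.RE)           # shrinkage estimator
b.PSE <- b.WR1 - min((s2 - 2) / T.n, 1) * (b.WR1 - b.RE)     # positive shrinkage estimator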

3. Asymptotic Properties

In this section, we study the asymptotic properties of the post-selection shrinkage estimators for the Cox regression model. To investigate the asymptotic theory, we need the following regularity conditions to be met.
(B1) 
$p = \exp(O(n^{\alpha}))$ for some $0 < \alpha < 1$.
(B2) 
$\varrho_1 = O(n^{\eta})$, where $\tau < \eta \leq 1$ for $\tau$ in (A2).
(B3) 
There exists a positive definite matrix $\Sigma_n$ such that $\lim_{n \to \infty} \Sigma_n = \Sigma$, where the eigenvalues of $\Sigma$ satisfy $0 < \kappa_1 < \lambda_{\min}(\Sigma) \leq \lambda_{\max}(\Sigma) < \kappa_2 < \infty$.
(B4) 
Sparse Riesz condition: for the random design matrix $\mathbf{X}$, any $S \subseteq M$ with $|S| = q$, $q \leq p$, and any vector $\mathbf{v} \in \mathbb{R}^{q}$, there exist constants $0 < c_* < c^* < \infty$ such that $c_* \leq \|\mathbf{X}_S \mathbf{v}\|_2^2 / \|\mathbf{v}\|_2^2 \leq c^*$ holds with probability tending to 1.
The following theorems will make it easier to compute the asymptotic distributional bias (ADB) and asymptotic distributional risk (ADR) of the proposed estimators:
Theorem 1.
Suppose that assumptions (A1)–(A3) and (B1)–(B4) hold. If we choose $r_n = c_2\, a_n^{2} (\log\log n)^3 \log(np)$ for some constant $c_2 > 0$, and $a_n$ as defined in (10) with $\nu < (\eta - \alpha - \tau)/3$, then $\hat{S}_2$ in (9) satisfies
$$\lim_{n \to \infty} P\big(\hat{S}_2 = S_2 \mid \hat{S}_1 = S_1\big) = 1,$$
where $\tau$, $\eta$, and $\alpha$ are defined in (A2), (B2), and (B1), respectively.
Theorem 2.
Let $s_n^2 = \mathbf{d}_n^T \Sigma_n^{-1} \mathbf{d}_n$ for any $(p_1 + p_2) \times 1$ vector $\mathbf{d}_n$ satisfying $\|\mathbf{d}_n\|_2^2 \leq 1$. Suppose assumptions (B1)–(B4) hold. Consider a sparse Cox model with signal strength as in (A1)–(A3) and with $0 < \tau < 1/2$. Suppose a pre-selected model with $S_1 \subseteq \hat{S}_1 \subseteq S_1 \cup S_2$ is obtained with probability 1. If we choose $r_n$ as in Theorem 1 with $\nu < \min\{(\eta - \alpha - \tau)/3,\ 1/4 - \tau/2\}$, then we have the asymptotic normality
$$n^{1/2}\, s_n^{-1}\, \mathbf{d}_n^T\big(\hat{\boldsymbol{\beta}}_{S_{\mathrm{null}}^c}^{\mathrm{WR}} - \boldsymbol{\beta}_{S_{\mathrm{null}}^c}\big) \xrightarrow{D} N(0, 1).$$

Asymptotic Distributional Bias and Risk Analysis

In order to compare the estimators, we use the asymptotic distributional bias (ADB) and the asymptotic distributional risk (ADR) expressions of the proposed estimators.
Definition 1.
For any estimator $\boldsymbol{\beta}_{1n}$ and $p_1$-dimensional vector $\mathbf{d}_{1n}$ satisfying $\|\mathbf{d}_{1n}\|_2^2 \leq 1$, the ADB and ADR of $\mathbf{d}_{1n}^T\boldsymbol{\beta}_{1n}$ are defined, respectively, as
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\boldsymbol{\beta}_{1n}) = \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\, \mathbf{d}_{1n}^T(\boldsymbol{\beta}_{1n} - \boldsymbol{\beta}_1) \big],$$
$$\mathrm{ADR}(\mathbf{d}_{1n}^T\boldsymbol{\beta}_{1n}) = \lim_{n \to \infty} E\big[ \big\{ n^{1/2} s_{1n}^{-1}\, \mathbf{d}_{1n}^T(\boldsymbol{\beta}_{1n} - \boldsymbol{\beta}_1) \big\}^2 \big],$$
where $s_{1n}^2 = \mathbf{d}_{1n}^T \Sigma_{S_1|S_2}^{-1} \mathbf{d}_{1n}$. Let $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_{p_2})^T \in \mathbb{R}^{p_2}$ and
$$\Delta_{d_{1n}} = \frac{\mathbf{d}_{1n}^T\big(\Sigma_{S_1}^{-1}\Sigma_{S_1 S_2}\,\boldsymbol{\delta}\boldsymbol{\delta}^T\,\Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\big)\mathbf{d}_{1n}}{\mathbf{d}_{1n}^T\big(\Sigma_{S_1}^{-1}\Sigma_{S_1 S_2}\,\Sigma_{S_2|S_1}^{-1}\,\Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\big)\mathbf{d}_{1n}}.$$
We have the following theorems on the expression of ADBs and ADRs of the post-selection estimators.
Theorem 3.
Let $\mathbf{d}_{1n}$ be any $p_1$-dimensional vector satisfying $0 < \|\mathbf{d}_{1n}\|_2^2 \leq 1$ and $s_{1n}^2 = \mathbf{d}_{1n}^T \Sigma_{S_1|S_2}^{-1}\mathbf{d}_{1n}$. Under the assumptions (A1)–(A3), we have
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}}) = 0,$$
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) = -\,s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2,$$
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{SE}}) = -\,(p_2 - 2)\, s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2^*\, E\big[\chi_{p_2}^{-2}(\Delta_{d_2})\big],$$
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{PSE}}) = -\,s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2^*\Big[(p_2 - 2)\,E\big[\chi_{p_2}^{-2}(\Delta_{d_2})\big] + E\big[\chi_{p_2}^{-2}(\Delta_{d_2})\, I\big(\chi_{p_2}^{2}(\Delta_{d_2}) < (p_2 - 2)\big)\big] - H_{p_2}\big(p_2 - 2;\ \Delta_{d_2}\big)\Big],$$
where $\mathbf{d}_{2n} = \Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\mathbf{d}_{1n}$ and $E\big[\chi_{p_2}^{-2j}(\Delta_{d_2})\big] = \int_0^{\infty} x^{-2j}\, dH_{p_2}(x;\ \Delta_{d_2})$.
See Appendix A for a detailed proof.
Theorem 4.
Under the assumptions of Theorem 2, except that (A2) is replaced by $\beta_j = \delta_j/\sqrt{n}$ for $j \in S_2$, with $|\delta_j| < \delta_{\max}$ for some $\delta_{\max} > 0$, we have
$$\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}}) = 1,$$
$$\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) = 1 + (1-c)^{1/2}\big[2 + (1-c)^{1/2}(1 + 2\Delta_{d_2})\big],$$
A D R ( d 1 n T β ^ 1 n S E ) = 1 + ( 1 c ) 1 / 2 ( p 2 2 ) [ ( 1 c ) 1 / 2 ( p 2 2 ) { E [ χ p 2 + 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] } + 2 E [ χ p 2 + 2 2 ( Δ d 2 ) ] ] ,
A D R ( d 1 n T β ^ 1 n P S E ) = 1 + ( 1 c ) ( p 2 2 ) 2 { E [ χ p 2 + 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] + E [ χ p 2 + 2 4 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 n 2 ) ) ] } + 2 ( 1 c ) 1 / 2 ( p 2 2 ) { E [ χ p 2 + 2 2 ( Δ d 2 ) ] + E [ χ p 2 + 2 2 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ( p 2 2 ) E [ χ p 2 + 2 4 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ( 1 c ) 1 / 2 × [ E [ χ p 2 + 2 2 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] + ( s 2 1 d 2 T β 2 * ) 2 E [ χ p 2 2 ( Δ d 2 ) I ( χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ] } , + ( 1 c ) 1 / 2 [ ( 1 c ) 1 / 2 E [ χ p 2 + 2 2 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 * ) 2 H p 2 ( p 2 2 ; Δ d 2 ) + 2 H p 2 ( p 2 2 ; Δ d 2 ) ( p 2 2 ) E [ χ p 2 + 2 2 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ] ,
where $c = \lim_{n \to \infty} \mathbf{d}_{1n}^T\Sigma_{S_1}^{-1}\mathbf{d}_{1n} \big/ \big(\mathbf{d}_{1n}^T\Sigma_{S_1|S_2}^{-1}\mathbf{d}_{1n}\big) \leq 1$ and $s_{2n}^2 = \mathbf{d}_{2n}^T\Sigma_{S_2|S_1}^{-1}\mathbf{d}_{2n}$.
It can be observed that these theoretical results differ from Theorem 3 of [19], which considered the ADR of the PSE for the linear model. In contrast, our Theorems 3 and 4 cover the PSE under the Cox proportional hazards model and yield feasible estimators. From Theorem 4, we can compare the ADRs of the estimators.
Corollary 1.
Under the assumptions in Theorem 4, we have
  • If $\|\boldsymbol{\delta}\|_2^2 \geq 1$, then $\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{PSE}}) \leq \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{SE}}) \leq \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}})$;
  • If $\|\boldsymbol{\delta}\|_2^2 = o(1)$ and $p_2 > 2$, then $\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) < \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{PSE}}) \leq \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}})$ for $\boldsymbol{\delta} = \mathbf{0}$.
Corollary 1 shows that the performance of the post-selection PSE is closely related to that of the RE. On the one hand, if $\hat{S}_1 \setminus (S_1 \cup S_2)$ and $(S_1 \cup S_2) \cap \hat{S}_1^c$ are large, then the post-selection PSE tends to dominate the RE. Further, if a variable selection method generates the right submodel and $\|\boldsymbol{\delta}\|_2^2 = o(1)$, that is, $\lim_{n \to \infty} \hat{S}_1 = S_1 \cup S_2$, then the post-selection likelihood estimator $\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}$ is the most efficient among all the post-selection estimators.
Remark 1.
The simultaneous variable selection and parameter estimation may not lead to a good estimation strategy when weak signals co-exist with zero signals. Even though the selected candidate subset models can be provided by some existing variable selection techniques when p > n , the prediction performance can be improved by the post-selection shrinkage strategy, especially when an under-fitted subset model is selected by an aggressive variable selection procedure.

4. Simulation Study

In this section, we present a simulation study designed to compare the quadratic risk performance of the proposed estimators under the Cox regression model. Each row of the design matrix X is generated i.i.d. from a N ( 0 , Σ ) distribution, where Σ follows an autoregressive covariance structure, as follows:
$$\Sigma_{jj'} = 0.5^{|j - j'|}, \qquad 1 \leq j, j' \leq p.$$
In this setup, we consider the following true regression coefficients:
$$\boldsymbol{\beta} = \Big(\underbrace{8,\ 9,\ 10}_{S_1},\ \underbrace{1,\ 0.8,\ 0.5,\ 0.2,\ \ldots,\ 0.2}_{p_2 - p_1,\ S_2},\ \underbrace{0,\ 0,\ 0,\ \ldots,\ 0}_{p - p_1 - p_2}\Big)^T,$$
where the subsets $S_1$ and $S_2$ correspond to strong and weak signals, respectively. The true survival times $Y$ are generated from an exponential distribution with parameter $\mathbf{X}\boldsymbol{\beta}$. Censoring times are drawn from a Uniform$(0, c)$ distribution, where $c$ is chosen to achieve the desired censoring rate. We consider censoring rates of 15% and 25%, and we explore sample sizes $n = 100, 300, 400$.
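The following R sketch generates one data set under this design. The hazard form exp(x'beta), the block sizes p1 and p2, and the way the censoring bound is tuned are our illustrative choices for approximating the stated censoring levels; they are not taken from the paper.
library(MASS)

n <- 100; p <- 300
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))          # AR(1) covariance: Sigma_{jj'} = 0.5^|j-j'|
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)

p1 <- 3; p2 <- 7                                # illustrative sizes of S1 and S2
beta <- c(8, 9, 10,                             # strong signals (S1)
          1, 0.8, 0.5, rep(0.2, p2 - 3),        # weak signals (S2)
          rep(0, p - p1 - p2))                  # null coefficients

Tfail <- rexp(n, rate = exp(X %*% beta))        # survival times (assumed hazard exp(x'beta))
C     <- runif(n, 0, quantile(Tfail, 0.90))     # censoring bound; adjust to hit the target rate
y     <- pmin(Tfail, C)
delta <- as.numeric(Tfail <= C)
mean(delta == 0)                                # realized censoring proportion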
We compare the performance of our proposed estimators against two well-known penalized likelihood methods, namely, LASSO and Elastic Net (ENet). We employ the R package glmnet to fit these penalized methods and choose the tuning parameters via cross-validation. For each combination of n and p, we run 1000 Monte Carlo simulations. Let β 1 n denote either β ^ 1 n PSE or β ^ 1 n RE after variable selection. We assess the performance using the relative mean squared error (RMSE) with respect to β ^ 1 n WR as follows:
$$\mathrm{RMSE}(\boldsymbol{\beta}_{1n}) = \frac{E\big\|\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}\big\|_2^2}{E\big\|\boldsymbol{\beta}_{1n} - \boldsymbol{\beta}\big\|_2^2}.$$
An RMSE ( β 1 n ) > 1 indicates that β 1 n outperforms β ^ 1 n WR , and a larger RMSE signifies a stronger degree of superiority over β ^ 1 n WR .
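In R, this simulated RMSE can be computed from the Monte Carlo replications as in the sketch below. The matrices B.WR and B.PSE of replicated estimates (one row per run) and the true subvector beta.S1 are assumed to be available; these names are hypothetical.
# Monte Carlo mean squared error of a matrix of replicated estimates (rows = runs)
mse <- function(B.hat, beta.true) {
  diff <- sweep(B.hat, 2, beta.true)            # subtract the true coefficients column-wise
  mean(rowSums(diff^2))                         # average squared L2 error over the runs
}

RMSE.PSE <- mse(B.WR, beta.S1) / mse(B.PSE, beta.S1)   # values > 1 favour the PSE over WR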
Table 1 presents the relative mean squared error (RMSE) values for different regression methods—LASSO and Elastic Net (ENet)—under varying sample sizes (n), number of predictors (p), and censoring percentages (15% and 25%). The RMSE values are averaged over 1000 simulation runs. The table compares three estimators, β ^ S 1 P L E , β ^ S 1 R E , and β ^ S 1 P S E , providing insight into their performance under different settings.
Figure 1 and Figure 2 visualize the RMSE trends for different values of p when comparing LASSO (Figure 1) and ENet (Figure 2) against the proposed estimators (RE and PSE). The plots indicate how RMSE varies as p increases for different sample sizes (n) and censoring levels.

Key Observations and Insights

  • Superior performance of post-selection estimators: Across all combinations of n and p, the post-selection estimators ( β ^ S 1 R E and β ^ S 1 P S E ) consistently demonstrate lower RMSEs compared to LASSO and ENet. This suggests that these estimators provide better predictive accuracy and stability.
  • Impact of censoring percentage:
    • When the censoring percentage increases from 15% to 25%, the RMSE values tend to increase across all methods, indicating the expected loss of predictive power due to increased censoring.
    • However, the post-selection estimators maintain a more stable RMSE trend, demonstrating their robustness in handling censored data.
  • Effect of increasing predictors (p):
    • As p increases, the RMSE for LASSO and ENet tends to rise, particularly under higher censoring rates.
    • This trend suggests that LASSO and ENet struggle with larger feature spaces, likely due to their tendency to aggressively shrink weaker covariates.
    • In contrast, the post-selection estimators show relatively stable RMSE behavior, indicating their ability to retain relevant information even in high-dimensional settings.
  • Impact of sample size (n) on RMSE stability:
    • Larger sample sizes (n) generally lead to lower RMSE values across all methods.
    • However, the gap between LASSO/ENet and the post-selection estimators remains consistent, reinforcing the advantage of the proposed methods even with more data.
  • Comparing LASSO and ENet:
    • ENet generally has lower RMSE values than LASSO, particularly for small sample sizes, indicating its advantage in balancing feature selection and regularization.
    • However, ENet still underperforms compared to post-selection estimators, suggesting that the additional shrinkage adjustments help mitigate underfitting issues.
To further compare the sparsity of the coefficient estimators, we also measure the False Positive Rate (FPR), as follows:
$$\mathrm{FPR}(\hat{\boldsymbol{\beta}}) = \frac{\big|\{ j : \hat{\beta}_j \neq 0 \ \text{and}\ \beta_j = 0 \}\big|}{\big|\{ j : \beta_j = 0 \}\big|}.$$
A higher FPR indicates that more non-informative variables are incorrectly included in the model, thereby complicating interpretation [22]. When β does not contain any zero components, the FPR is undefined. Table 2 compares the performance of LASSO and Elastic Net (ENet) in selecting variables in a high-dimensional Cox model under 15 % and 25 % censoring. As sample size (n) increases, both methods select more variables, but false positive rates (FPR) also rise, especially for ENet. LASSO is more conservative, selecting fewer variables with a lower FPR, while ENet selects more but at the cost of higher false discoveries. Higher censoring ( 25 % ) slightly increases FPR, reducing selection accuracy. Overall, LASSO offers better false positive control, whereas ENet captures more variables but with increased risk of selecting irrelevant ones.
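A direct R implementation of this criterion for a single fitted coefficient vector is sketched below, applied here to the LASSO coefficients and the true beta from the earlier sketches.
fpr <- function(b.hat, beta.true) {
  null.set <- which(beta.true == 0)             # indices of truly zero coefficients
  if (length(null.set) == 0) return(NA)         # FPR is undefined without true zeros
  sum(b.hat[null.set] != 0) / length(null.set)  # share of null coefficients wrongly selected
}

fpr(b.lasso, beta)                              # e.g., FPR of the LASSO fit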

5. Real Data Example

In this section, we illustrate the practical utility of our proposed methodology on two different high-dimensional datasets.

5.1. Example 1

We first apply our method to a gene expression dataset comprising n = 614 breast cancer patients, each with p = 1490 genes. All patients received anthracycline-based chemotherapy. Among these 614 individuals, there were 134 ( 21 % ) censored observations, and the mean time to treatment response was approximately 2.98 years. Using biological pathways to identify important genes, Ref. [23] previously selected 29 genes and reported a maximum area under the receiver operating characteristic curve (AUC) of about 62 % . This relatively low AUC suggests limited predictive power when only using these 29 genes.
To improve upon these findings, we begin by performing an initial noise-reduction procedure on the data. This step helps remove potential outliers and irrelevant features, thereby enhancing the quality of the subsequent variable selection and estimation processes. We applied LASSO and Elastic Net (ENet) for gene selection. The results show that LASSO selected 14 genes, whereas ENet selected 12 genes. We then applied the proposed post-selection shrinkage estimators introduced in Section 2 to evaluate their performance compared to standard methods such as LASSO and Elastic Net. Table 3 shows the estimated coefficients from different estimators, along with the AUC at the bottom. It is evident that the PSE estimate has slightly improved the prediction performance.

5.2. Example 2

We now consider the diffuse large B-cell lymphoma (DLBCL) dataset of [24], which is also high-dimensional, and use it as a second example to illustrate the effectiveness of the proposed post-selection shrinkage method. It consists of measurements on 7399 genes obtained from 240 patients via customized cDNA microarrays (lymphochip). Each patient’s survival time was recorded, ranging from 0 to 21.8 years; 127 patients had died (uncensored) and 95 were alive (censored) at the end of the study. Additional details on the dataset can be found in [24].
To obtain the post-selection shrinkage estimators, we first selected candidate subsets using two variable selection approaches—LASSO and Elastic Net (ENet). All tuning parameters were chosen via 10-fold cross-validation. Table 4 shows the estimated coefficients from both LASSO and ENet for the setting p = 6800 . The AUC results indicate that β ^ S ^ 1 PSE generally outperforms β ^ S ^ 1 RE and β ^ S ^ 1 PLE for both LASSO and ENet procedures. Notably, the ENet-based estimators appear more robust than those obtained via LASSO, underscoring the value of combining L 1 and L 2 penalties in high-dimensional survival analysis.

6. Conclusions

In this paper, we proposed high-dimensional post-selection shrinkage estimators for Cox’s proportional hazards models based on the work of [19]. We investigated the asymptotic risk properties of these estimators in relation to the risks of the subset candidate model, as well as the LASSO and ENet estimators. Our results indicate that the new estimators perform particularly well when the true model contains weak signals. The proposed strategy is also conceptually intuitive and computationally straightforward to implement.
Our theoretical analysis and simulation studies demonstrate that the post-selection shrinkage estimator exhibits superior performance relative to LASSO and ENet, in part because it mitigates the loss of efficiency often associated with variable selection. As a powerful tool for producing interpretable models, sparse modeling via penalized regularization has become increasingly popular for high-dimensional data analysis. Our post-selection shrinkage estimator preserves model interpretability while enhancing predictive accuracy compared to existing penalized regression techniques. Furthermore, two real-data examples illustrate the practical advantages of our method, confirming that its performance is robust and potentially valuable for a range of high-dimensional applications.

Author Contributions

Conceptualization, R.A.B. and S.E.A.; methodology, R.A.B., S.E.A. and A.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and the Engineering Research Council of Canada (NSERC).

Data Availability Statement

All data used in this study are publicly available.

Acknowledgments

The research is supported by the Natural Sciences and the Engineering Research Council of Canada (NSERC).

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Symbol: Description
General Notation
n: Sample size (number of observations)
p: Number of covariates (predictor variables)
ℝ: Set of real numbers
P: Probability measure
E: Expectation operator
I(·): Indicator function
Regression and Estimators
β: Regression coefficient vector
β̂: Estimated regression coefficients
λ: Regularization parameter (for LASSO/ENet)
Ŝ₁: Selected subset of variables
d₁ₙ: p₁-dimensional vector in the selection model
β₁ₙ: Selected regression coefficient estimator
β̂^WR: Weighted ridge (WR) estimator
Survival Analysis Notation
L(β): Cox proportional hazards likelihood function
D: Dataset containing observations
X: Covariate matrix
Y: Response variable (time-to-event outcome)
h(t): Hazard function at time t
ĥ(t): Estimated hazard function
Λ(t): Cumulative hazard function
Evaluation Metrics
RMSE: Relative mean squared error
FPR: False positive rate
AUC: Area under the curve (for classification models)
Methods and Models
LASSO: Least absolute shrinkage and selection operator
ENet: Elastic Net
Cox-PH: Cox proportional hazards model
WR: Weighted ridge estimator
PSE: Post-selection shrinkage estimator
RE: Restricted estimator

Appendix A. Proofs

The technical proofs of Theorems 3 and 4 are included in this section.
(Proof of Theorem 3). 
Here, we provide the proof of the ADB expressions of the proposed estimators. Based on Theorem 2, it is clear that
$$\lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\big] = E\Big[\lim_{n \to \infty} n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\Big] = E[Z] = 0,$$
where $Z \sim N(0, 1)$. Then,
$$\begin{aligned}
\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) &= \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}} - \boldsymbol{\beta}_1\big)\big] \\
&= \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big\{\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big) - \big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}\big)\big\}\big] \\
&= \mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}}) - \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}\big)\big] \\
&= -\lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{2n}^T\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}}\big] = -\lim_{n \to \infty} (s_{2n}/s_{1n})\, E\big[ n^{1/2} s_{2n}^{-1}\mathbf{d}_{2n}^T\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}}\big] \\
&= -(s_2/s_1)\, s_2^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2 = -s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2,
\end{aligned}$$
where $\mathbf{d}_{2n} = \Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\mathbf{d}_{1n}$, $\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}\big) = \mathbf{d}_{1n}^T\Sigma_{S_1}^{-1}\Sigma_{S_1 S_2}\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}} = \mathbf{d}_{2n}^T\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}}$, and $s_{2n}^2 = \mathbf{d}_{2n}^T\Sigma_{S_2|S_1}^{-1}\mathbf{d}_{2n}$.
Now, we compute the ADB of β ^ 1 n S E as follows
A D B ( d 1 n T β ^ 1 n S E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R [ ( p 2 n 2 ) T ^ n 1 ] ( β ^ 1 n W R β ^ 1 n R E ) β 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 lim n ( p 2 n 2 ) E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 R E T ^ n 1 = E [ Z ] ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β ^ 1 n R E ) T n 1 = ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R T n 1 = ( p 2 2 ) ( s 2 / s 1 ) E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T n 1 = ( p 2 2 ) s 1 1 d 2 T β 2 E χ p 2 2 ( Δ d 2 ) .
Finally, we obtain the ADB of β ^ 1 n P S E ,
A D B ( d 1 n T β ^ 1 n P S E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n P S E β 1 = lim n E [ n 1 / 2 s 1 n 1 d 1 n T { β ^ 1 n S E + [ 1 ( p 2 n 2 ) T ^ n 1 ] ( β ^ 1 n W R β ^ 1 n R E ) × I ( T ^ n < ( p 2 n 2 ) ) β 1 } ] = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 + lim n E n 1 / 2 s 1 n 1 d 1 n T [ 1 ( p 2 n 2 ) T ^ n 1 ] β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) + E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n S E β 1 ) + E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β ^ 1 n R E ) I T ^ n < ( p 2 n 2 ) ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β ^ 1 n R E ) T n 1 I T ^ n < ( p 2 n 2 ) = A D B ( d 1 n T β ^ 1 n S E ) E lim n n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R I T ^ n < ( p 2 n 2 ) + ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) = A D B ( d 1 n T β ^ 1 n S E ) ( s 2 / s 1 ) E Z I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) s 1 1 d 2 T β 2 H p 2 ( p 2 2 ; Δ d 2 ) + ( p 2 2 ) ( s 2 / s 1 ) E Z χ p 2 2 ( Δ d 2 ) I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( p 2 2 ) s 1 1 d 2 T β 2 E χ p 2 2 ( Δ d 2 ) I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) = A D B ( d 1 n T β ^ 1 n S E ) s 1 1 d 2 T β 2 H p 2 ( p 2 2 ; Δ d 2 ) + ( p 2 2 ) s 1 1 d 2 T β 2 E χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) = s 1 1 d 2 T β 2 [ ( p 2 2 ) { E χ p 2 2 ( Δ d 2 ) ] + E χ p 2 2 ( Δ d 2 ) I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) H p 2 ( p 2 2 ; Δ d 2 ) ] .
Proof of Theorem 4. 
We provide the proof of the ADR expressions of the proposed estimators. It is clear that
$$\lim_{n \to \infty} E\Big[\big\{ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\big\}^2\Big] = E\Big[\lim_{n \to \infty}\big\{ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\big\}^2\Big] = E[Z^2] = 1,$$
where $Z \sim N(0, 1)$. Then,
A D R ( d 1 n T β ^ 1 n R E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n R E β 1 2 = lim n s 1 n 2 E n 1 / 2 d 1 n T β ^ 1 n W R β 1 β ^ 1 n W R β ^ 1 n R E 2 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 2 + lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E ) ( β ^ 1 n W R β 1 T d 1 n = I 1 + I 2 + I 3 .
From (23), we have I 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β 1 ) 2 = 1 . Furthermore,
I 2 = lim n s 1 n 2 E n 1 / 2 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 = lim n ( s 2 n 2 / s 1 n 2 ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R 2 .
Since s 2 n 2 / s 1 n 2 1 c , then,
I 2 = ( 1 c ) lim n E χ 1 2 ( Δ d 2 n ) = ( 1 c ) ( 1 + 2 Δ d 2 ) .
Furthermore,
I 3 = 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n W R β 1 T d 1 n = 2 lim n ( s 2 n / s 1 n ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 β ^ 1 n W R β 1 T d 1 n = 2 ( 1 c ) 1 / 2 .
Now, we investigate (25). By using Equation (17), we have
A D R ( d 1 n T β ^ 1 n S E ) = lim n E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 ) 2 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 [ ( p 2 n 2 ) / T ^ n ] β ^ 1 n W R β ^ 1 n R E 2 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 2 + lim n E n 1 / 2 s 1 n 1 ( p 2 n 2 ) T n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 2 lim n E n s 1 n 2 ( p 2 n 2 ) T ^ n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n W R β 1 T d 1 n = J 1 + J 2 + J 3 .
Again, J 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 2 = 1 . Then, we have
J 2 = lim n E n 1 / 2 s 1 n 1 ( p 2 n 2 ) T ^ n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 = lim n ( p 2 n 1 ) 2 E n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R T n 1 2 = ( s 2 2 / s 1 2 ) ( p 2 1 ) 2 E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 2 = ( s 2 2 / s 1 2 ) ( p 2 1 ) 2 E [ Z 2 χ p 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] = ( 1 c ) ( p 2 2 ) 2 E [ χ p 2 + 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] ,
and
J 3 = 2 lim n E n s 1 n 2 ( p 2 n 2 ) T ^ n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n W R β 1 T d 1 n = 2 lim n ( s 2 n / s 1 n ) ( p 2 n 2 ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R s 1 n 1 ( β ^ 1 n W R β 1 ) T T ^ n 1 = 2 ( 1 c ) 1 / 2 ( p 2 2 ) E Z 2 χ p 2 2 ( Δ d 2 ) + s 2 1 d 2 T β 2 E Z χ p 2 2 ( Δ d 2 ) = 2 ( 1 c ) 1 / 2 ( p 2 2 ) E χ p 2 + 2 2 ( Δ d 2 ) .
Finally, we compute the ADR of β ^ 1 n P S E as follows:
A D R ( d 1 n T β ^ 1 n P S E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n P S E β 1 2 = lim n E [ { n 1 / 2 s 1 n 1 d 1 n T [ β ^ 1 n S E β 1 + 1 ( p 2 n 2 ) T ^ n 1 β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) ] } 2 ] = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 2 + lim n E n 1 / 2 s 1 n 1 d 1 n T 1 ( p 2 n 2 ) T ^ n 1 β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) 2 + 2 lim n E [ n 1 / 2 s 1 n 2 d 1 n T 1 ( p 2 n 2 ) T ^ n 1 β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T × I T ^ n < ( p 2 n 2 ) d 1 n ] = A D R ( d 1 n T β ^ 1 n S E ) + lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) 2 + lim n ( p 2 n 2 ) 2 E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E ) T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 2 lim n ( p 2 n 2 ) E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n + 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T I T ^ n < ( p 2 n 2 ) d 1 n 2 lim n ( p 2 n 2 ) E [ { n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T × T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n } ] = A D R ( d 1 n T β ^ 1 n S E ) + K 1 + K 2 + K 3 + K 4 + K 5 ,
where
K 1 = lim n E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E ) I T ^ n < ( p 2 n 2 ) 2 = lim n E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R I T ^ n < ( p 2 n 2 ) 2 = lim n ( s 2 n / s 1 n ) 2 E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R I T ^ n < ( p 2 n 2 ) 2 = ( s 2 / s 1 ) 2 E Z 2 I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( s 2 1 d 2 T β 2 ) 2 H p 2 ( p 2 2 ; Δ d 2 ) = ( 1 c ) E χ p 2 + 2 2 ( Δ d 2 ) + ( s 2 1 d 2 T β 2 ) 2 H p 2 ( p 2 2 ; Δ d 2 ) ,
K 2 = lim n E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E ) ( p 2 n 2 ) T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 = lim n ( p 2 n 2 ) 2 ( s 1 n / s 2 n ) 2 E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 = ( p 2 2 ) 2 ( s 2 / s 1 ) 2 E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 = ( p 2 2 ) 2 ( 1 c ) E Z 2 χ p 2 4 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) = ( p 2 2 ) 2 ( 1 c ) E χ p 2 + 2 4 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ,
K 3 = 2 lim n ( p 2 n 2 ) E [ n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E ) 2 T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n = 2 ( p 2 2 ) ( s 2 / s 1 ) 2 E [ lim n n 1 / 2 s 2 n 1 d 1 n T β ^ 2 n W R 2 T ^ n 1 I T ^ n < ( p 2 n 2 ) ] = 2 ( p 2 2 ) ( 1 c ) { E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( s 2 1 d 2 T β 2 ) 2 E χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) } = 2 ( p 2 2 ) ( 1 c ) { E χ p 2 + 2 2 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( s 2 1 d 2 T β 2 ) 2 E χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) } ,
K 4 = 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T I T ^ n < ( p 2 n 2 ) d 1 n = 2 lim n ( s 2 n / s 1 n ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 T I T ^ n < ( p 2 n 2 ) = 2 ( s 2 / s 1 ) E [ n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R { n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 ( p 2 n 2 ) T ^ n 1 I T ^ n < ( p 2 n 2 ) ) T I T ^ n < ( p 2 n 2 ) ] = 2 ( 1 c ) 1 / 2 { E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 I T ^ n < ( p 2 n 2 ) T ( p 2 2 ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 T ^ n 1 I T ^ n < ( p 2 n 2 ) T } = 2 ( 1 c ) 1 / 2 E Z 2 I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) ( p 2 2 ) E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) = 2 ( 1 c ) 1 / 2 H p 2 + 2 ( p 2 2 ; Δ d 2 ) ( p 2 2 ) E χ p 2 + 2 2 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 )
and
K 5 = 2 lim n ( p 2 n 2 ) E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n = 2 ( p 2 2 ) ( s 2 / s 1 ) E [ lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) × n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 ( p 2 n 2 ) β ^ 1 n W R β ^ 1 n R E T ^ n 1 I T ^ n < ( p 2 n 2 ) T ] = 2 ( p 2 2 ) ( s 2 / s 1 ) { E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) ( p 2 2 ) E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) } = 2 ( p 2 2 ) ( 1 c ) 1 / 2 { E χ p 2 + 2 2 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ( p 2 2 ) E χ p 2 + 2 4 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) } .

References

  1. Ahmed, S.E. Big and Complex Data Analysis: Methodologies and Applications; Springer: Cham, Switzerland, 2017. [Google Scholar]
  2. Bradic, J.; Fan, J.; Jiang, J. Regularization for Cox’s proportional hazards model with np-dimensionality. Ann. Stat. 2011, 39, 3092–3120. [Google Scholar] [CrossRef] [PubMed]
  3. Bradic, J.; Song, R. Structured estimation for the nonparametric Cox model. Electron. J. Stat. 2015, 9, 492–534. [Google Scholar] [CrossRef]
  4. Gui, J.; Li, H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings with applications to microarray gene expression data. Bioinformatics 2005, 21, 3001–3008. [Google Scholar] [CrossRef]
  5. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [Google Scholar] [CrossRef]
  6. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  7. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  8. Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
  9. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320. [Google Scholar] [CrossRef]
  10. Sun, T.; Zhang, C.H. Scaled sparse linear regression. Biometrika 2012, 99, 879–898. [Google Scholar] [CrossRef]
  11. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 1997, 16, 385–395. [Google Scholar] [CrossRef]
  12. Zhang, H.; Lu, W. Adaptive lasso for Cox’s proportional hazards model. Biometrika 2007, 94, 691–703. [Google Scholar] [CrossRef]
  13. Zou, H. A note on path-based variable selection in the penalized proportional hazards model. Biometrika 2008, 95, 241–247. [Google Scholar] [CrossRef]
  14. Fan, J.; Li, R. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 2002, 6, 74–99. [Google Scholar] [CrossRef]
  15. Hong, H.; Li, Y. Feature selection of ultrahigh-dimensional covariates with survival outcomes: A selective review. Appl. Math. Ser. B 2017, 32, 379–396. [Google Scholar] [CrossRef] [PubMed]
  16. Hong, H.; Zheng, Q.; Li, Y. Forward regression for Cox models with high-dimensional covariates. J. Multivar. Anal. 2019, 173, 268–290. [Google Scholar] [CrossRef]
  17. Hong, H.; Chen, X.; Kang, J.; Li, Y. The Lq-norm learning for ultrahigh-dimensional survival data: An integrative framework. Stat. Sin. 2020, 30, 1213–1233. [Google Scholar] [CrossRef]
  18. Ahmed, S.E.; Ahmed, F.; Yüzbaşı, B. Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data; Chapman and Hall/CRC: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
  19. Gao, X.; Ahmed, S.E.; Feng, Y. Post selection shrinkage estimation for high-dimensional data analysis. Appl. Stoch. Models Bus. Ind. 2017, 33, 97–120. [Google Scholar] [CrossRef]
  20. Cox, D.R. Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B 1972, 34, 187–220. [Google Scholar] [CrossRef]
  21. Buhlmann, P.; van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  22. Kurnaz, S.F.; Hoffmann, I.; Filzmoser, P. Robust and sparse estimation methods for high-dimensional linear and logistic regression. J. Chemom. Intell. Lab. Syst. 2018, 172, 211–222. [Google Scholar] [CrossRef]
  23. Belhechmi, S.; Bin, R.D.; Rotolo, F.; Michiels, S. Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models. BMC Bioinform. 2020, 21, 277. [Google Scholar] [CrossRef] [PubMed]
  24. Rosenwald, A.; Wright, G.; Chan, W.C.; Connors, J.M.; Campo, E.; Fisher, R.I.; Gascoyne, R.D.; Muller-Hermelink, H.K.; Smeland, E.B.; Giltnane, J.M.; et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 2002, 25, 1937–1947. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Relative mean squared error (RMSE) of the proposed estimators compared to LASSO for different n and p.
Figure 2. Relative mean squared error (RMSE) of the proposed estimators compared to Elastic Net for different n and p.
Table 1. Simulated relative mean squared error (RMSE) across different values of p and n, averaged over N = 1000 simulation runs.
                              Censoring 15%                    Censoring 25%
n     p     Method     β̂^PLE    β̂^RE    β̂^PSE    |    β̂^PLE    β̂^RE    β̂^PSE
100   300   LASSO      1.04     1.66    1.19     |    1.08     1.96    1.23
            ENet       1.07     1.45    1.23     |    0.92     1.44    1.06
      400   LASSO      1.03     1.10    1.60     |    0.96     1.03    1.98
            ENet       0.90     1.00    1.45     |    0.89     0.98    1.36
      500   LASSO      1.08     1.13    1.66     |    0.98     1.05    1.37
            ENet       0.96     1.01    1.03     |    0.95     1.00    1.22
300   300   LASSO      0.85     1.60    0.98     |    0.92     1.64    1.06
            ENet       0.96     1.37    1.08     |    0.87     1.46    1.00
      350   LASSO      0.83     0.99    1.01     |    0.90     1.07    1.17
            ENet       0.85     0.99    1.52     |    0.87     1.02    1.56
      400   LASSO      0.90     1.03    1.25     |    0.81     0.95    1.73
            ENet       0.90     1.04    1.25     |    0.76     0.89    1.41
400   400   LASSO      0.99     1.52    1.12     |    0.82     1.50    0.94
            ENet       0.91     1.29    1.02     |    0.83     1.26    0.99
      450   LASSO      0.83     1.00    1.13     |    0.84     0.94    1.61
            ENet       0.92     1.05    1.46     |    0.85     0.99    1.79
      500   LASSO      0.89     0.93    1.83     |    0.81     0.93    1.90
            ENet       0.82     0.93    1.38     |    0.82     0.95    1.75
Table 2. Average number of selected predictors ( S ^ 1 ) and false positive rate (FPR) across different values of n and p, averaged over N = 1000 simulation runs.
                              Censoring 15%                 Censoring 25%
n     p     Method     Average |Ŝ1|    FPR      |    Average |Ŝ1|    FPR
100   300   LASSO      6.1             0.063    |    6.4             0.056
            ENet       6.2             0.063    |    6.6             0.052
      400   LASSO      4.9             0.072    |    5.2             0.085
            ENet       5.1             0.072    |    4.8             0.075
      500   LASSO      5.6             0.039    |    12.6            0.043
            ENet       4.9             0.039    |    4.0             0.033
300   300   LASSO      13.4            0.209    |    13.8            0.223
            ENet       12.9            0.209    |    16.3            0.282
      350   LASSO      15.6            0.202    |    15.8            0.208
            ENet       15.7            0.202    |    22.6            0.279
      400   LASSO      14.5            0.137    |    13.7            0.155
            ENet       13.5            0.137    |    14.2            0.173
400   400   LASSO      14.1            0.163    |    15.8            0.171
            ENet       14.2            0.163    |    20.4            0.212
      450   LASSO      18.4            0.217    |    23.5            0.24
            ENet       19.1            0.217    |    30.1            0.263
      500   LASSO      13.6            0.150    |    13.3            0.158
            ENet       13.3            0.150    |    13.6            0.158
Table 3. Estimated coefficients using the LASSO and ENet method for example 1.
Gene ID | LASSO: β̂^LASSO, β̂^RE, β̂^PSE | ENet: β̂^ENet, β̂^RE, β̂^PSE
18−0.020.260.21−0.030.070.20
970.010.270.000.010.260.01
1010.050.190.130.050.270.12
128−0.01
2320.04−0.42−0.280.040.20−0.25
3420.15−0.42−0.130.14−0.39−0.10
369−0.09−0.050.04−0.08−0.40−0.12
408−0.01−0.01−0.090.03
4100.03−0.26−0.150.03−0.06−0.14
445−0.00
4680.140.080.020.13−0.26−0.01
660−0.00−0.00
731−0.080.090.06−0.080.060.06
810−0.04−0.08−0.090.010.09−0.09
9070.01
934−0.00−0.00
952−0.01
961−0.05−0.05−0.080.20
1212−0.00
AUC0.620.630.650.630.640.66
Table 4. Estimated coefficients using the LASSO and ENet method for example 2.
Gene ID | LASSO: β̂^LASSO, β̂^RE, β̂^PSE | ENet: β̂^ENet, β̂^RE, β̂^PSE
950.02−0.34
1120.060.710.70−0.00−0.13−0.08
173−0.630.68
205
5511.601.691.57−0.11−0.28−0.20
1377−0.22−0.84−0.80−0.09−0.16−0.12
15260.410.670.560.02
1543−0.43−0.79−0.770.400.750.69
2003−0.111.10
20250.180.900.781.041.221.07
2439−0.01−0.14−0.12
2705−0.850.360.770.61
29730.591.230.99−0.63−1.12−0.81
32401.130.03
3598−0.22−0.59−0.540.290.550.49
38820.130.400.39−0.06−0.20−0.15
40150.340.810.76−0.08−0.13−0.12
4186−0.50−0.72−0.53
43570.09−0.59−0.70−0.65
46620.700.900.830.210.600.38
51310.540.800.710.010.010.01
5222−0.15−0.38−0.261.241.671.34
5541−0.52−0.72−0.68
55770.390.860.70−0.73−0.97−0.80
5778−0.62−0.09
58080.350.550.46
59511.292.121.70
6103−0.63−0.80−0.76
62540.250.560.48
64930.65
65100.861.090.99
AUC0.710.710.730.720.720.74
