Article

Efficient Post-Shrinkage Estimation Strategies in High-Dimensional Cox’s Proportional Hazards Models

by Syed Ejaz Ahmed 1, Reza Arabi Belaghi 2 and Abdulkhadir Ahmed Hussein 3,*
1 Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada
2 Department of Energy and Technology, Swedish University of Agricultural Sciences, P.O. Box 7032, 750 07 Uppsala, Sweden
3 Department of Mathematics & Statistics, University of Windsor, Windsor, ON N9B 3P4, Canada
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 254; https://doi.org/10.3390/e27030254
Submission received: 3 January 2025 / Revised: 17 February 2025 / Accepted: 19 February 2025 / Published: 28 February 2025

Abstract

Regularization methods such as LASSO, adaptive LASSO, Elastic-Net, and SCAD are widely employed for variable selection in statistical modeling. However, these methods primarily focus on variables with strong effects while often overlooking weaker signals, potentially leading to biased parameter estimates. To address this limitation, Gao, Ahmed, and Feng (2017) introduced a corrected shrinkage estimator that incorporates both weak and strong signals, though their results were confined to linear models. The applicability of such approaches to survival data remains unclear, despite the prevalence of survival regression involving both strong and weak effects in biomedical research. To bridge this gap, we propose a novel class of post-selection shrinkage estimators tailored to the Cox model framework. We establish the asymptotic properties of the proposed estimators and demonstrate their potential to enhance estimation and prediction accuracy through simulations that explicitly incorporate weak signals. Finally, we validate the practical utility of our approach by applying it to two real-world datasets, showcasing its advantages over existing methods.

1. Introduction

High-dimensional data analysis, where the number of covariates frequently exceeds the sample size, has become a central research focus in contemporary statistics (see [1]). The applications of these methods span a broad range of fields, including genomics, medical imaging, signal processing, social science, and financial economics. In particular, high-dimensional regularized Cox regression models have gained traction in survival analysis (e.g., [2,3,4]), where these techniques help construct parsimonious (sparse) models and can outperform classical selection criteria such as Akaike’s information criterion [5] or the Bayesian information criterion [6].
The least absolute shrinkage and selection operator (LASSO) proposed by [7] remains one of the most popular approaches to high-dimensional regression, due to its computational efficiency and its ability to perform variable selection and parameter shrinkage simultaneously. Numerous extensions of LASSO, such as adaptive LASSO [8], elastic net [9], and scaled LASSO [10], have been developed to further refine estimation and prediction performance. In the context of Cox proportional hazards models, analogous methods—including the LASSO [4,11], the adaptive LASSO [12,13], and smoothly clipped absolute deviation (SCAD; [14])—have been widely examined. Interested readers may also consult [15,16,17,18] for recent advancements in high-dimensional Cox regression.
When $p > n$, the focus is often on accurately recovering both the support (i.e., which covariates have nonzero effects) and the magnitudes of the nonzero regression coefficients. Although many penalized inference procedures excel at identifying "strong" signals (i.e., coefficients that are moderately large and thus easily detected), they may fail to adequately account for "weak" signals, whose effects may be small but nonzero. To formalize this, one can divide the index set $\{1, \ldots, p_n\}$ into three disjoint subsets as follows: $S_1$ for strong signals, $S_2$ for weak signals, and $S_{\mathrm{null}}$ for coefficients that are exactly zero. Standard estimation procedures that neglect weak signals risk introducing non-negligible bias, particularly when these weak signals are numerous.
In this paper, we tackle the bias induced by weak signals in high-dimensional Cox regression by adapting the post-selection shrinkage strategy proposed by [19]. Our key contribution is the development of a weighted ridge (WR) estimator, which effectively differentiates small, nonzero coefficients from those that are truly zero. We show that the resulting post-selection estimators dominate submodel estimators derived from standard regularization methods such as LASSO and elastic net. Moreover, under the condition $p_n = O(n^{\alpha})$ for some $\alpha > 0$, we establish the asymptotic normality of our post-selection WR estimator, thereby demonstrating its asymptotic efficiency. Through extensive simulations and real data applications, we illustrate that our method achieves substantial improvements in both estimation accuracy and prediction performance.
The remainder of this paper is organized as follows. Section 2 presents the model setup and the proposed post-selection shrinkage estimation procedure. In Section 3, we outline the asymptotic properties of our estimators. Section 4 provides a Monte Carlo simulation study, while Section 5 reports the results of applying our methodology to two real data sets. We conclude in Section 6 with a brief discussion of possible future research directions.

2. Methodology

2.1. Notation and Assumptions

In this section, we state some standard notations and assumptions used throughout the paper. We use bold upper-case letters for matrices and bold lower-case letters for vectors. The superscript $T$ denotes the matrix transpose, and $I_N$ denotes the $N \times N$ identity matrix. Design vectors, or columns of $\mathbf{X}$, are denoted by $\mathbf{X}_j$, $j = 1, \ldots, p_n$. The index set $M = \{1, 2, \ldots, p_n\}$ denotes the full model, which contains all the potential variables. For a subset $A \subseteq M$, we use $\boldsymbol{\beta}_A$ for the subvector of $\boldsymbol{\beta}_M$ indexed by $A$, and $\mathbf{X}_A$ for the submatrix of $\mathbf{X}$ whose columns are indexed by $A$. For a vector $\mathbf{v} = (v_1, \ldots, v_{p_n})^T$, we write $\|\mathbf{v}\|_2 = (\sum_{j=1}^{p_n} v_j^2)^{1/2}$ and $\|\mathbf{v}\|_1 = \sum_{j=1}^{p_n} |v_j|$. For any square matrix $\mathbf{A}$, we let $\Lambda_{\min}(\mathbf{A})$ and $\Lambda_{\max}(\mathbf{A})$ be the smallest and largest eigenvalues of $\mathbf{A}$, respectively. Given $a, b \in \mathbb{R}$, we let $a \vee b$ and $a \wedge b$ denote the maximum and minimum of $a$ and $b$. For two positive sequences $a_n$ and $b_n$, we write $a_n \asymp b_n$ if $a_n$ is of the same order as $b_n$. We use $I(\cdot)$ to denote the indicator function; $H_{\vartheta}(\cdot\,;\Delta)$ denotes the cumulative distribution function (cdf) of a non-central $\chi^2$-distribution with $\vartheta$ degrees of freedom and non-centrality parameter $\Delta$. We also use $\xrightarrow{D}$ to indicate convergence in distribution.
Let $S \subseteq \{1, \ldots, p_n\}$ be the set of indices of nonzero coefficients, with $s = |S|$ denoting the cardinality of $S$. We assume that the true coefficient vector $\boldsymbol{\beta}^* = (\beta_1^*, \ldots, \beta_{p_n}^*)^T$ is sparse, that is, $s < n$. Without loss of generality, we partition the $(n \times p_n)$-matrix $\mathbf{X}$ as $\mathbf{X} = (\mathbf{X}_{S_1}, \mathbf{X}_{S_2}, \mathbf{X}_{S_{\mathrm{null}}})$, where $S_1$, $S_2$, and $S_{\mathrm{null}}$ are disjoint, $S_1 \cup S_2 \cup S_{\mathrm{null}} = M$, and $S_{\mathrm{null}} = \{j : \beta_{0j} = 0\}$. For the two submatrices $\mathbf{X}_{S_1}$ and $\mathbf{X}_{S_2}$, we define the corresponding sample covariance matrices by
$$\Sigma_{S_1|S_2} = \Sigma_{S_1 S_1} - \Sigma_{S_1 S_2}\,\Sigma_{S_2 S_2}^{-1}\,\Sigma_{S_2 S_1}, \qquad \Sigma_{S_2|S_1} = \Sigma_{S_2 S_2} - \Sigma_{S_2 S_1}\,\Sigma_{S_1 S_1}^{-1}\,\Sigma_{S_1 S_2}.$$
Let $\mathbf{V} = (\mathbf{X}_{S_2}, \mathbf{X}_{S_{\mathrm{null}}})$ be the $n \times (p_n - s_1)$ submatrix of $\mathbf{X}$, so that another partition can be written as $\mathbf{X} = (\mathbf{X}_{S_1}, \mathbf{V})$. Let $M_1 = I_n - \mathbf{X}_{S_1}\hat{\Sigma}_{S_1 S_1}^{-1}\mathbf{X}_{S_1}^T$. Then, $\mathbf{V}^T M_1 \mathbf{V}$ is a $(p_n - s_1) \times (p_n - s_1)$-dimensional singular matrix with rank $k_1$. We denote by $\varrho_1 \geq \cdots \geq \varrho_{k_1}$ its $k_1$ positive eigenvalues.

2.2. Signal Strength Regularity Conditions

We consider three signal strength assumptions to define three sets of covariates according to their signal strength levels as follows [19]:
(A1) 
There exists a positive constant $c_1$ such that $|\beta_j| \geq c_1\sqrt{(\log p)/n}$ for $j \in S_1$;
(A2) 
The coefficient vector $\boldsymbol{\beta}$ satisfies $\|\boldsymbol{\beta}_{S_2}\|_2^2 = O(n^{\tau})$ for some $0 < \tau < 1$, where $\beta_j \neq 0$ for $j \in S_2$;
(A3) 
$\beta_j = 0$ for $j \in S_{\mathrm{null}}$.

2.3. Cox Proportional Hazards Model

The proportional hazards (PH) model introduced by [20] is one of the most commonly used approaches for analyzing survival data. In this model, the hazard function for an individual depends on covariates through a multiplicative effect, implying that the ratio of hazards for different individuals remains constant over time. We consider a survival model with a true hazard function $\lambda_0(t \mid \mathbf{X})$ for a failure time $T$, given a covariate vector $\mathbf{X} = (X_1, \ldots, X_p)^T$. We let $C$ denote the censoring time and define $Y = \min(T, C)$ and $\delta = I(T \leq C)$. Suppose we have $n$ i.i.d. observations $\{Y_i, \delta_i, \mathbf{X}_i\}_{i=1}^{n}$ from this true underlying model, and let $\mathbf{X}$ also denote the corresponding $n \times p$ design matrix.
The PH model posits that the hazard function for an individual with covariates X is
$$\lambda(t \mid \mathbf{X}) = \lambda_0(t)\exp(\mathbf{X}^T\boldsymbol{\beta}),$$
where β = β 1 , , β p T is the vector of regression coefficients, and λ 0 ( t ) is an unknown baseline hazard function. Because λ 0 ( t ) does not depend on X , one can estimate β by maximizing the partial log-likelihood
$$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\delta_i\,\mathbf{x}_i^T\boldsymbol{\beta} - \sum_{i=1}^{n}\delta_i \log\Big\{\sum_{j \in R(t_i)}\exp(\mathbf{x}_j^T\boldsymbol{\beta})\Big\}, \qquad (3)$$
where $\delta_i = I(T_i \leq C_i)$ and $R(t_i) = \{j : T_j \geq t_i\}$ is the risk set just prior to $t_i$. Maximizing $l(\boldsymbol{\beta})$ in (3) with respect to $\boldsymbol{\beta}$ yields the estimator $\hat{\boldsymbol{\beta}}$ of the regression parameters.
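As a computational illustration, the following R sketch fits the PH model above by maximizing the partial log-likelihood with the standard survival package. The simulated data, sample size, and coefficient values are our own illustrative choices, not values taken from the paper.
library(survival)

set.seed(1)
n <- 200; p <- 5
X     <- matrix(rnorm(n * p), n, p)           # covariate matrix
beta  <- c(1, -0.5, 0.8, 0, 0)                # illustrative true coefficients
Tfail <- rexp(n, rate = exp(X %*% beta))      # failure times with hazard exp(x'beta)
C     <- runif(n, 0, 3)                       # censoring times
y     <- pmin(Tfail, C)                       # observed time Y = min(T, C)
delta <- as.numeric(Tfail <= C)               # event indicator delta = I(T <= C)

fit <- coxph(Surv(y, delta) ~ X)              # maximizes the partial log-likelihood (3)
coef(fit)                                     # estimates of beta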

2.4. Variable Selection and Estimation

Variable selection can be carried out by minimizing the penalized negative log-partial likelihood as follows:
$$-\,l(\boldsymbol{\beta}) + \sum_{j=1}^{p_n} P_{\lambda}(\beta_j), \qquad (4)$$
where $P_{\lambda}(\beta_j)$ is a penalty function applied to each component of $\boldsymbol{\beta}$, and $\lambda$ is a tuning parameter that controls the magnitude of penalization. We consider the following two popular methods:
  • LASSO. The LASSO estimator follows (4) with an $L_1$-norm penalty,
    $$P_{\lambda}(\beta_j) = \lambda|\beta_j|.$$
    As $\lambda$ increases, this penalty continuously shrinks the coefficients toward zero, and some coefficients become exactly zero if $\lambda$ is sufficiently large. The theoretical properties of the LASSO are well studied; see [21] for an extensive review.
  • Elastic Net (ENet). The Elastic Net estimator implements (4) with the combined penalty
    $$P_{\lambda}(\beta_j) = \lambda\{\alpha|\beta_j| + (1-\alpha)\beta_j^2\},$$
    where $0 \leq \alpha \leq 1$. When $\alpha = 1$, this reduces to the LASSO, and when $\alpha = 0$, it becomes Ridge regression. Combining the $L_1$ and $L_2$ penalties leverages the benefits of Ridge while still producing sparse solutions. Unlike LASSO, which can select at most $n$ variables, ENet has no such limitation when $p_n > n$. An illustrative R sketch of both penalized fits is given after this list.
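Continuing the simulated data from the previous sketch, the R code below shows how both penalized Cox fits can be obtained with the glmnet package. The mixing value alpha = 0.5 for the ENet and the use of lambda.min are our illustrative choices rather than settings prescribed by the paper.
library(glmnet)

y.cox <- cbind(time = y, status = delta)                       # response format for glmnet's Cox family

cv.lasso <- cv.glmnet(X, y.cox, family = "cox", alpha = 1)     # LASSO: pure L1 penalty
cv.enet  <- cv.glmnet(X, y.cox, family = "cox", alpha = 0.5)   # ENet: mixed L1/L2 penalty

b.lasso <- as.numeric(coef(cv.lasso, s = "lambda.min"))        # coefficients at CV-chosen lambda
b.enet  <- as.numeric(coef(cv.enet,  s = "lambda.min"))

S1.hat <- which(b.lasso != 0)                                  # candidate strong-signal set S1-hat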

2.4.1. Variable Selection Procedure for S 1 and S 2

We summarize the variable selection procedure for detecting the strong signals S 1 and the weak signals S 2 .
  • Step 1 (detection of $S_1$). Obtain a candidate subset $\hat{S}_1$ of strong signals using a penalized likelihood estimator (PLE). Specifically, consider
    $$\hat{\boldsymbol{\beta}}^{\mathrm{PLE}} = \arg\min_{\boldsymbol{\beta}}\Big\{-\,l_n(\boldsymbol{\beta}) + \sum_{j=1}^{p_n} P_{\lambda}(\beta_j)\Big\},$$
    where $P_{\lambda}(\beta_j)$ penalizes each $\beta_j$, shrinking weak effects toward zero and selecting the strong signals. The tuning parameter $\lambda > 0$ governs the size of the subset $\hat{S}_1$.
  • Step 2 (detection of $S_2$). To identify $\hat{S}_2$, first solve a penalized regression problem with a ridge penalty only on the variables in $\hat{S}_1^c$. Formally,
    $$\hat{\boldsymbol{\beta}}^{r} = \arg\min_{\boldsymbol{\beta}}\Big\{-\,l(\boldsymbol{\beta}) + r_n\,\|\boldsymbol{\beta}_{\hat{S}_1^c}\|_2^2\Big\},$$
    where $r_n > 0$ is a tuning parameter controlling the overall strength of regularization for variables in $\hat{S}_1^c$. We then define a post-selection weighted ridge (WR) estimator $\hat{\boldsymbol{\beta}}^{\mathrm{WR}}$ by
    $$\hat{\beta}_j^{\mathrm{WR}} = \begin{cases} \hat{\beta}_j^{r}, & j \in \hat{S}_1, \\ \hat{\beta}_j^{r}\, I\big(|\hat{\beta}_j^{r}| > a_n\big), & j \in \hat{S}_1^c, \end{cases}$$
    where $a_n$ is a thresholding parameter. The set $\hat{S}_2$ is then
    $$\hat{S}_2 = \big\{ j \in \hat{S}_1^c : \hat{\beta}_j^{\mathrm{WR}} \neq 0,\ 1 \leq j \leq p \big\}. \qquad (9)$$
    We apply this post-selection procedure only if $|\hat{S}_2| > 2$. In particular, we set
    $$a_n = c\,n^{-\kappa}, \qquad 0 < \kappa \leq \tfrac{1}{2}. \qquad (10)$$
    A schematic R sketch of this two-step procedure is given after this list.
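The following R code sketches Step 2 on the simulated data from the earlier sketches, assuming $\hat{S}_1$ is non-empty. It is schematic only: the ridge fit restricted to $\hat{S}_1^c$ is approximated through glmnet's penalty.factor argument, and the values of r.n and a.n are illustrative constants rather than the theoretical rates of Section 3.
pf <- rep(1, p); pf[S1.hat] <- 0                 # ridge penalty applied only outside S1-hat

r.n   <- 0.1                                     # illustrative ridge tuning parameter r_n
fit.r <- glmnet(X, y.cox, family = "cox", alpha = 0,
                lambda = r.n, penalty.factor = pf)
b.r   <- as.numeric(coef(fit.r))                 # weighted-ridge-type estimate beta-hat^r

a.n  <- 1 / sqrt(n)                              # illustrative threshold a_n = c * n^(-kappa)
b.WR <- b.r
b.WR[-S1.hat] <- b.r[-S1.hat] * (abs(b.r[-S1.hat]) > a.n)   # threshold coefficients outside S1-hat

S2.hat <- setdiff(which(b.WR != 0), S1.hat)      # detected weak-signal set S2-hat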

2.4.2. Post-Selection Shrinkage Estimation

We now propose a shrinkage estimator that combines information from the two post-selection estimators $\hat{\boldsymbol{\beta}}^{\mathrm{RE}}$ and $\hat{\boldsymbol{\beta}}^{\mathrm{WR}}$. Recall that
$$\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} = \big(\hat{\beta}_j^{r},\ j \in \hat{S}_1\big)^T, \qquad \hat{\boldsymbol{\beta}}_{\hat{S}_2}^{\mathrm{WR}} = \big(\hat{\beta}_j^{r}\, I(|\hat{\beta}_j^{r}| > a_n),\ j \in \hat{S}_2\big)^T.$$
Define the post-selection shrinkage estimator for S ^ 1 as
$$\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{SE}} = \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \frac{\hat{s}_2 - 2}{\hat{T}_n}\Big(\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{RE}}\Big),$$
where $\hat{s}_2 = |\hat{S}_2|$, and $\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{RE}}$ is the restricted estimator obtained by maximizing the partial log-likelihood (3) over the set $\hat{S}_1$. The term $\hat{T}_n$ is given by
$$\hat{T}_n = \big(\hat{\boldsymbol{\beta}}_{\hat{S}_2}^{\mathrm{WR}}\big)^T \big(\mathbf{X}_{\hat{S}_2}^T M_{\hat{S}_1} \mathbf{X}_{\hat{S}_2}\big)^{-1} \hat{\boldsymbol{\beta}}_{\hat{S}_2}^{\mathrm{WR}}, \qquad M_{\hat{S}_1} = I_n - \mathbf{X}_{\hat{S}_1}\hat{\Sigma}_{\hat{S}_1}^{-1}\mathbf{X}_{\hat{S}_1}^T,$$
using a generalized inverse if $\hat{\Sigma}_{\hat{S}_1}$ is singular.
To avoid over-shrinking when β ^ S ^ 1 WR and β ^ S ^ 1 SE have different signs, we define a positive shrinkage estimator via the convex combination
$$\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{PSE}} = \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \Big(\frac{\hat{s}_2 - 2}{\hat{T}_n} \wedge 1\Big)\Big(\hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{\hat{S}_1}^{\mathrm{RE}}\Big).$$
This modification is essential to prevent an overly aggressive shrinkage that might reverse the sign of estimates in β ^ S ^ 1 WR .
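A minimal R sketch of this shrinkage step, continuing the earlier sketches, is given below. The restricted estimator is refit by coxph on $\hat{S}_1$, and the quantity T.n is a schematic stand-in for $\hat{T}_n$ as defined above; both are illustrative approximations, not the exact implementation used in the paper.
fit.RE <- coxph(Surv(y, delta) ~ X[, S1.hat])     # restricted estimator on S1-hat
b.RE   <- as.numeric(coef(fit.RE))
b.WR1  <- b.WR[S1.hat]                            # WR estimator restricted to S1-hat
s2     <- length(S2.hat)                          # the procedure is applied only if s2 > 2

# Schematic version of the statistic T_n based on the thresholded weak signals
M1  <- diag(n) - X[, S1.hat] %*% solve(crossprod(X[, S1.hat])) %*% t(X[, S1.hat])
T.n <- drop(t(b.WR[S2.hat]) %*%
            solve(t(X[, S2.hat]) %*% M1 %*% X[, S2.hat]) %*% b.WR[S2.hat])

b.SE  <- b.WR1 - ((s2 - 2) / T.n) * (b.WR1 - b.RE)           # shrinkage estimator
b.PSE <- b.WR1 - min((s2 - 2) / T.n, 1) * (b.WR1 - b.RE)     # positive shrinkage estimator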

3. Asymptotic Properties

In this section, we study the asymptotic properties of the post-selection shrinkage estimators for the Cox regression model. To investigate the asymptotic theory, we need the following regularity conditions to be met.
(B1) 
$p = \exp(O(n^{\alpha}))$ for some $0 < \alpha < 1$.
(B2) 
$\varrho_1 = O(n^{\eta})$, where $\tau < \eta \leq 1$ for $\tau$ in (A2).
(B3) 
There exists a positive definite matrix $\Sigma_n$ such that $\lim_{n \to \infty} \Sigma_n = \Sigma$, where the eigenvalues of $\Sigma$ satisfy $0 < \kappa_1 < \lambda_{\min}(\Sigma) \leq \lambda_{\max}(\Sigma) < \kappa_2 < \infty$.
(B4) 
Sparse Riesz condition: for the random design matrix $\mathbf{X}$, any $S \subseteq M$ with $|S| = q$, $q \leq p$, and any vector $\mathbf{v} \in \mathbb{R}^{q}$, there exist constants $0 < c_* < c^* < \infty$ such that $c_* \leq \|\mathbf{X}_S \mathbf{v}\|_2^2 / \|\mathbf{v}\|_2^2 \leq c^*$ holds with probability tending to 1.
The following theorems will make it easier to compute the asymptotic distributional bias (ADB) and asymptotic distributional risk (ADR) of the proposed estimators:
Theorem 1.
Suppose that assumptions (A1)–(A3) and (B1)–(B4) hold. If we choose $r_n = c_2\, a_n^{2} (\log\log n)^3 \log(np)$ for some constant $c_2 > 0$, and $a_n$ as defined in (10) with $\nu < (\eta - \alpha - \tau)/3$, then $\hat{S}_2$ in (9) satisfies
$$\lim_{n \to \infty} P\big(\hat{S}_2 = S_2 \mid \hat{S}_1 = S_1\big) = 1,$$
where $\tau$, $\eta$, and $\alpha$ are defined in (A2), (B2), and (B1), respectively.
Theorem 2.
Let $s_n^2 = \mathbf{d}_n^T \Sigma_n^{-1} \mathbf{d}_n$ for any $(p_1 + p_2) \times 1$ vector $\mathbf{d}_n$ satisfying $\|\mathbf{d}_n\|_2^2 \leq 1$. Suppose assumptions (B1)–(B4) hold. Consider a sparse Cox model with signal strength as in (A1)–(A3) and with $0 < \tau < 1/2$. Suppose a pre-selected model with $S_1 \subseteq \hat{S}_1 \subseteq S_1 \cup S_2$ is obtained with probability 1. If we choose $r_n$ as in Theorem 1 with $\nu < \min\{(\eta - \alpha - \tau)/3,\ 1/4 - \tau/2\}$, then we have the asymptotic normality
$$n^{1/2}\, s_n^{-1}\, \mathbf{d}_n^T\big(\hat{\boldsymbol{\beta}}_{S_{\mathrm{null}}^c}^{\mathrm{WR}} - \boldsymbol{\beta}_{S_{\mathrm{null}}^c}\big) \xrightarrow{D} N(0, 1).$$

Asymptotic Distributional Bias and Risk Analysis

In order to compare the estimators, we use the asymptotic distributional bias (ADB) and the asymptotic distributional risk (ADR) expressions of the proposed estimators.
Definition 1.
For any estimator $\boldsymbol{\beta}_{1n}$ and $p_1$-dimensional vector $\mathbf{d}_{1n}$ satisfying $\|\mathbf{d}_{1n}\|_2^2 \leq 1$, the ADB and ADR of $\mathbf{d}_{1n}^T\boldsymbol{\beta}_{1n}$ are defined, respectively, as
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\boldsymbol{\beta}_{1n}) = \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\, \mathbf{d}_{1n}^T(\boldsymbol{\beta}_{1n} - \boldsymbol{\beta}_1) \big],$$
$$\mathrm{ADR}(\mathbf{d}_{1n}^T\boldsymbol{\beta}_{1n}) = \lim_{n \to \infty} E\big[ \big\{ n^{1/2} s_{1n}^{-1}\, \mathbf{d}_{1n}^T(\boldsymbol{\beta}_{1n} - \boldsymbol{\beta}_1) \big\}^2 \big],$$
where $s_{1n}^2 = \mathbf{d}_{1n}^T \Sigma_{S_1|S_2}^{-1} \mathbf{d}_{1n}$. Let $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_{p_2})^T \in \mathbb{R}^{p_2}$ and
$$\Delta_{d_{1n}} = \frac{\mathbf{d}_{1n}^T\big(\Sigma_{S_1}^{-1}\Sigma_{S_1 S_2}\,\boldsymbol{\delta}\boldsymbol{\delta}^T\,\Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\big)\mathbf{d}_{1n}}{\mathbf{d}_{1n}^T\big(\Sigma_{S_1}^{-1}\Sigma_{S_1 S_2}\,\Sigma_{S_2|S_1}^{-1}\,\Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\big)\mathbf{d}_{1n}}.$$
We have the following theorems on the expression of ADBs and ADRs of the post-selection estimators.
Theorem 3.
Let $\mathbf{d}_{1n}$ be any $p_1$-dimensional vector satisfying $0 < \|\mathbf{d}_{1n}\|_2^2 \leq 1$ and $s_{1n}^2 = \mathbf{d}_{1n}^T \Sigma_{S_1|S_2}^{-1}\mathbf{d}_{1n}$. Under the assumptions (A1)–(A3), we have
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}}) = 0,$$
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) = -\,s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2,$$
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{SE}}) = -\,(p_2 - 2)\, s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2^*\, E\big[\chi_{p_2}^{-2}(\Delta_{d_2})\big],$$
$$\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{PSE}}) = -\,s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2^*\Big[(p_2 - 2)\,E\big[\chi_{p_2}^{-2}(\Delta_{d_2})\big] + E\big[\chi_{p_2}^{-2}(\Delta_{d_2})\, I\big(\chi_{p_2}^{2}(\Delta_{d_2}) < (p_2 - 2)\big)\big] - H_{p_2}\big(p_2 - 2;\ \Delta_{d_2}\big)\Big],$$
where $\mathbf{d}_{2n} = \Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\mathbf{d}_{1n}$ and $E\big[\chi_{p_2}^{-2j}(\Delta_{d_2})\big] = \int_0^{\infty} x^{-2j}\, dH_{p_2}(x;\ \Delta_{d_2})$.
See Appendix A for a detailed proof.
Theorem 4.
Under the assumptions of Theorem 2, except that (A2) is replaced by $\beta_j = \delta_j/\sqrt{n}$ for $j \in S_2$, with $|\delta_j| < \delta_{\max}$ for some $\delta_{\max} > 0$, we have
$$\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}}) = 1,$$
$$\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) = 1 + (1-c)^{1/2}\big[2 + (1-c)^{1/2}(1 + 2\Delta_{d_2})\big],$$
A D R ( d 1 n T β ^ 1 n S E ) = 1 + ( 1 c ) 1 / 2 ( p 2 2 ) [ ( 1 c ) 1 / 2 ( p 2 2 ) { E [ χ p 2 + 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] } + 2 E [ χ p 2 + 2 2 ( Δ d 2 ) ] ] ,
A D R ( d 1 n T β ^ 1 n P S E ) = 1 + ( 1 c ) ( p 2 2 ) 2 { E [ χ p 2 + 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] + E [ χ p 2 + 2 4 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 n 2 ) ) ] } + 2 ( 1 c ) 1 / 2 ( p 2 2 ) { E [ χ p 2 + 2 2 ( Δ d 2 ) ] + E [ χ p 2 + 2 2 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ( p 2 2 ) E [ χ p 2 + 2 4 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ( 1 c ) 1 / 2 × [ E [ χ p 2 + 2 2 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] + ( s 2 1 d 2 T β 2 * ) 2 E [ χ p 2 2 ( Δ d 2 ) I ( χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ] } , + ( 1 c ) 1 / 2 [ ( 1 c ) 1 / 2 E [ χ p 2 + 2 2 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 * ) 2 H p 2 ( p 2 2 ; Δ d 2 ) + 2 H p 2 ( p 2 2 ; Δ d 2 ) ( p 2 2 ) E [ χ p 2 + 2 2 ( Δ d 2 ) I ( χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ) ] ] ,
where $c = \lim_{n \to \infty} \mathbf{d}_{1n}^T\Sigma_{S_1}^{-1}\mathbf{d}_{1n} \big/ \big(\mathbf{d}_{1n}^T\Sigma_{S_1|S_2}^{-1}\mathbf{d}_{1n}\big) \leq 1$ and $s_{2n}^2 = \mathbf{d}_{2n}^T\Sigma_{S_2|S_1}^{-1}\mathbf{d}_{2n}$.
It can be observed that these theoretical results differ from Theorem 3 of [19], which considered the ADR of the PSE for the linear model. In contrast, our Theorems 3 and 4 cover the PSE under the Cox proportional hazards model and yield feasible estimators. From Theorem 4, we can compare the ADRs of the estimators.
Corollary 1.
Under the assumptions in Theorem 4, we have
  • If $\|\boldsymbol{\delta}\|_2^2 \geq 1$, then $\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{PSE}}) \leq \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{SE}}) \leq \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}})$;
  • If $\|\boldsymbol{\delta}\|_2^2 = o(1)$ and $p_2 > 2$, then $\mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) < \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{PSE}}) \leq \mathrm{ADR}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}})$ for $\boldsymbol{\delta} = \mathbf{0}$.
Corollary 1 shows that the performance of the post-selection PSE is closely related to that of the RE. On the one hand, if $\hat{S}_1 \setminus (S_1 \cup S_2)$ and $(S_1 \cup S_2) \cap \hat{S}_1^c$ are large, then the post-selection PSE tends to dominate the RE. Further, if a variable selection method generates the right submodel and $\|\boldsymbol{\delta}\|_2^2 = o(1)$, that is, $\lim_{n \to \infty} \hat{S}_1 = S_1 \cup S_2$, then the post-selection likelihood estimator $\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}$ is the most efficient among all the post-selection estimators.
Remark 1.
The simultaneous variable selection and parameter estimation may not lead to a good estimation strategy when weak signals co-exist with zero signals. Even though the selected candidate subset models can be provided by some existing variable selection techniques when p > n , the prediction performance can be improved by the post-selection shrinkage strategy, especially when an under-fitted subset model is selected by an aggressive variable selection procedure.

4. Simulation Study

In this section, we present a simulation study designed to compare the quadratic risk performance of the proposed estimators under the Cox regression model. Each row of the design matrix X is generated i.i.d. from a N ( 0 , Σ ) distribution, where Σ follows an autoregressive covariance structure, as follows:
$$\Sigma_{jj'} = 0.5^{|j - j'|}, \qquad 1 \leq j, j' \leq p.$$
In this setup, we consider the following true regression coefficients:
$$\boldsymbol{\beta} = \Big(\underbrace{8,\ 9,\ 10}_{S_1},\ \underbrace{1,\ 0.8,\ 0.5,\ 0.2,\ \ldots,\ 0.2}_{p_2 - p_1,\ S_2},\ \underbrace{0,\ 0,\ 0,\ \ldots,\ 0}_{p - p_1 - p_2}\Big)^T,$$
where the subsets $S_1$ and $S_2$ correspond to strong and weak signals, respectively. The true survival times $Y$ are generated from an exponential distribution with parameter $\mathbf{X}\boldsymbol{\beta}$. Censoring times are drawn from a Uniform$(0, c)$ distribution, where $c$ is chosen to achieve the desired censoring rate. We consider censoring rates of 15% and 25%, and we explore sample sizes $n = 100, 300, 400$.
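The following R sketch generates one data set under this design. The hazard form exp(x'beta), the block sizes p1 and p2, and the way the censoring bound is tuned are our illustrative choices for approximating the stated censoring levels; they are not taken from the paper.
library(MASS)

n <- 100; p <- 300
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))          # AR(1) covariance: Sigma_{jj'} = 0.5^|j-j'|
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)

p1 <- 3; p2 <- 7                                # illustrative sizes of S1 and S2
beta <- c(8, 9, 10,                             # strong signals (S1)
          1, 0.8, 0.5, rep(0.2, p2 - 3),        # weak signals (S2)
          rep(0, p - p1 - p2))                  # null coefficients

Tfail <- rexp(n, rate = exp(X %*% beta))        # survival times (assumed hazard exp(x'beta))
C     <- runif(n, 0, quantile(Tfail, 0.90))     # censoring bound; adjust to hit the target rate
y     <- pmin(Tfail, C)
delta <- as.numeric(Tfail <= C)
mean(delta == 0)                                # realized censoring proportion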
We compare the performance of our proposed estimators against two well-known penalized likelihood methods, namely, LASSO and Elastic Net (ENet). We employ the R package glmnet to fit these penalized methods and choose the tuning parameters via cross-validation. For each combination of n and p, we run 1000 Monte Carlo simulations. Let β 1 n denote either β ^ 1 n PSE or β ^ 1 n RE after variable selection. We assess the performance using the relative mean squared error (RMSE) with respect to β ^ 1 n WR as follows:
$$\mathrm{RMSE}(\boldsymbol{\beta}_{1n}) = \frac{E\big\|\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}\big\|_2^2}{E\big\|\boldsymbol{\beta}_{1n} - \boldsymbol{\beta}\big\|_2^2}.$$
An RMSE ( β 1 n ) > 1 indicates that β 1 n outperforms β ^ 1 n WR , and a larger RMSE signifies a stronger degree of superiority over β ^ 1 n WR .
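In R, this simulated RMSE can be computed from the Monte Carlo replications as in the sketch below. The matrices B.WR and B.PSE of replicated estimates (one row per run) and the true subvector beta.S1 are assumed to be available; these names are hypothetical.
# Monte Carlo mean squared error of a matrix of replicated estimates (rows = runs)
mse <- function(B.hat, beta.true) {
  diff <- sweep(B.hat, 2, beta.true)            # subtract the true coefficients column-wise
  mean(rowSums(diff^2))                         # average squared L2 error over the runs
}

RMSE.PSE <- mse(B.WR, beta.S1) / mse(B.PSE, beta.S1)   # values > 1 favour the PSE over WR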
Table 1 presents the relative mean squared error (RMSE) values for different regression methods—LASSO and Elastic Net (ENet)—under varying sample sizes (n), number of predictors (p), and censoring percentages (15% and 25%). The RMSE values are averaged over 1000 simulation runs. The table compares three estimators, β ^ S 1 P L E , β ^ S 1 R E , and β ^ S 1 P S E , providing insight into their performance under different settings.
Figure 1 and Figure 2 visualize the RMSE trends for different values of p when comparing LASSO (Figure 1) and ENet (Figure 2) against the proposed estimators (RE and PSE). The plots indicate how RMSE varies as p increases for different sample sizes (n) and censoring levels.

Key Observations and Insights

  • Superior performance of post-selection estimators: Across all combinations of n and p, the post-selection estimators ( β ^ S 1 R E and β ^ S 1 P S E ) consistently demonstrate lower RMSEs compared to LASSO and ENet. This suggests that these estimators provide better predictive accuracy and stability.
  • Impact of censoring percentage:
    • When the censoring percentage increases from 15% to 25%, the RMSE values tend to increase across all methods, indicating the expected loss of predictive power due to increased censoring.
    • However, the post-selection estimators maintain a more stable RMSE trend, demonstrating their robustness in handling censored data.
  • Effect of increasing predictors (p):
    • As p increases, the RMSE for LASSO and ENet tends to rise, particularly under higher censoring rates.
    • This trend suggests that LASSO and ENet struggle with larger feature spaces, likely due to their tendency to aggressively shrink weaker covariates.
    • In contrast, the post-selection estimators show relatively stable RMSE behavior, indicating their ability to retain relevant information even in high-dimensional settings.
  • Impact of sample size (n) on RMSE stability:
    • Larger sample sizes (n) generally lead to lower RMSE values across all methods.
    • However, the gap between LASSO/ENet and the post-selection estimators remains consistent, reinforcing the advantage of the proposed methods even with more data.
  • Comparing LASSO and ENet:
    • ENet generally has lower RMSE values than LASSO, particularly for small sample sizes, indicating its advantage in balancing feature selection and regularization.
    • However, ENet still underperforms compared to post-selection estimators, suggesting that the additional shrinkage adjustments help mitigate underfitting issues.
To further compare the sparsity of the coefficient estimators, we also measure the False Positive Rate (FPR), as follows:
$$\mathrm{FPR}(\hat{\boldsymbol{\beta}}) = \frac{\big|\{ j : \hat{\beta}_j \neq 0 \ \text{and}\ \beta_j = 0 \}\big|}{\big|\{ j : \beta_j = 0 \}\big|}.$$
A higher FPR indicates that more non-informative variables are incorrectly included in the model, thereby complicating interpretation [22]. When β does not contain any zero components, the FPR is undefined. Table 2 compares the performance of LASSO and Elastic Net (ENet) in selecting variables in a high-dimensional Cox model under 15 % and 25 % censoring. As sample size (n) increases, both methods select more variables, but false positive rates (FPR) also rise, especially for ENet. LASSO is more conservative, selecting fewer variables with a lower FPR, while ENet selects more but at the cost of higher false discoveries. Higher censoring ( 25 % ) slightly increases FPR, reducing selection accuracy. Overall, LASSO offers better false positive control, whereas ENet captures more variables but with increased risk of selecting irrelevant ones.
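A direct R implementation of this criterion for a single fitted coefficient vector is sketched below, applied here to the LASSO coefficients and the true beta from the earlier sketches.
fpr <- function(b.hat, beta.true) {
  null.set <- which(beta.true == 0)             # indices of truly zero coefficients
  if (length(null.set) == 0) return(NA)         # FPR is undefined without true zeros
  sum(b.hat[null.set] != 0) / length(null.set)  # share of null coefficients wrongly selected
}

fpr(b.lasso, beta)                              # e.g., FPR of the LASSO fit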

5. Real Data Example

In this section, we illustrate the practical utility of our proposed methodology on two different high-dimensional datasets.

5.1. Example 1

We first apply our method to a gene expression dataset comprising n = 614 breast cancer patients, each with p = 1490 genes. All patients received anthracycline-based chemotherapy. Among these 614 individuals, there were 134 ( 21 % ) censored observations, and the mean time to treatment response was approximately 2.98 years. Using biological pathways to identify important genes, Ref. [23] previously selected 29 genes and reported a maximum area under the receiver operating characteristic curve (AUC) of about 62 % . This relatively low AUC suggests limited predictive power when only using these 29 genes.
To improve upon these findings, we begin by performing an initial noise-reduction procedure on the data. This step helps remove potential outliers and irrelevant features, thereby enhancing the quality of the subsequent variable selection and estimation processes. We applied LASSO and Elastic Net (ENet) for gene selection. The results show that LASSO selected 14 genes, whereas ENet selected 12 genes. We then applied the proposed post-selection shrinkage estimators introduced in Section 2 to evaluate their performance compared to standard methods such as LASSO and Elastic Net. Table 3 shows the estimated coefficients from different estimators, along with the AUC at the bottom. It is evident that the PSE estimate has slightly improved the prediction performance.

5.2. Example 2

We now consider the diffuse large B-cell lymphoma (DLBCL) dataset of [24], which is also high-dimensional, and use it as a second example to illustrate the effectiveness of the proposed post-selection shrinkage method. It consists of measurements on 7399 genes obtained from 240 patients via customized cDNA microarrays (lymphochip). Each patient’s survival time was recorded, ranging from 0 to 21.8 years; 127 patients had died (uncensored) and 95 were alive (censored) at the end of the study. Additional details on the dataset can be found in [24].
To obtain the post-selection shrinkage estimators, we first selected candidate subsets using two variable selection approaches—LASSO and Elastic Net (ENet). All tuning parameters were chosen via 10-fold cross-validation. Table 4 shows the estimated coefficients from both LASSO and ENet for the setting p = 6800 . The AUC results indicate that β ^ S ^ 1 PSE generally outperforms β ^ S ^ 1 RE and β ^ S ^ 1 PLE for both LASSO and ENet procedures. Notably, the ENet-based estimators appear more robust than those obtained via LASSO, underscoring the value of combining L 1 and L 2 penalties in high-dimensional survival analysis.

6. Conclusions

In this paper, we proposed high-dimensional post-selection shrinkage estimators for Cox’s proportional hazards models based on the work of [19]. We investigated the asymptotic risk properties of these estimators in relation to the risks of the subset candidate model, as well as the LASSO and ENet estimators. Our results indicate that the new estimators perform particularly well when the true model contains weak signals. The proposed strategy is also conceptually intuitive and computationally straightforward to implement.
Our theoretical analysis and simulation studies demonstrate that the post-selection shrinkage estimator exhibits superior performance relative to LASSO and ENet, in part because it mitigates the loss of efficiency often associated with variable selection. As a powerful tool for producing interpretable models, sparse modeling via penalized regularization has become increasingly popular for high-dimensional data analysis. Our post-selection shrinkage estimator preserves model interpretability while enhancing predictive accuracy compared to existing penalized regression techniques. Furthermore, two real-data examples illustrate the practical advantages of our method, confirming that its performance is robust and potentially valuable for a range of high-dimensional applications.

Author Contributions

Conceptualization, R.A.B. and S.E.A.; methodology, R.A.B., S.E.A. and A.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and the Engineering Research Council of Canada (NSERC).

Data Availability Statement

All data used in this study are publicly available.

Acknowledgments

The research is supported by the Natural Sciences and the Engineering Research Council of Canada (NSERC).

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

Symbol: Description
General Notation
n: Sample size (number of observations)
p: Number of covariates (predictor variables)
ℝ: Set of real numbers
P: Probability measure
E: Expectation operator
I(·): Indicator function
Regression and Estimators
β: Regression coefficient vector
β̂: Estimated regression coefficients
λ: Regularization parameter (for LASSO/ENet)
Ŝ₁: Selected subset of variables
d₁ₙ: p₁-dimensional vector in the selection model
β₁ₙ: Selected regression coefficient estimator
β̂^WR: Weighted ridge (WR) estimator
Survival Analysis Notation
L(β): Cox proportional hazards likelihood function
D: Dataset containing observations
X: Covariate matrix
Y: Response variable (time-to-event outcome)
h(t): Hazard function at time t
ĥ(t): Estimated hazard function
Λ(t): Cumulative hazard function
Evaluation Metrics
RMSE: Relative mean squared error
FPR: False positive rate
AUC: Area under the curve (for classification models)
Methods and Models
LASSO: Least absolute shrinkage and selection operator
ENet: Elastic Net
Cox-PH: Cox proportional hazards model
WR: Weighted ridge estimator
PSE: Post-selection shrinkage estimator
RE: Restricted estimator

Appendix A. Proofs

The technical proofs of Theorems 3 and 4 are included in this section.
(Proof of Theorem 3). 
Here, we provide the proof of the ADB expressions of the proposed estimators. Based on Theorem 2, it is clear that
$$\lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\big] = E\Big[\lim_{n \to \infty} n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\Big] = E[Z] = 0,$$
where $Z \sim N(0, 1)$. Then,
$$\begin{aligned}
\mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}) &= \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}} - \boldsymbol{\beta}_1\big)\big] \\
&= \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big\{\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big) - \big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}\big)\big\}\big] \\
&= \mathrm{ADB}(\mathbf{d}_{1n}^T\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}}) - \lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}\big)\big] \\
&= -\lim_{n \to \infty} E\big[ n^{1/2} s_{1n}^{-1}\mathbf{d}_{2n}^T\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}}\big] = -\lim_{n \to \infty} (s_{2n}/s_{1n})\, E\big[ n^{1/2} s_{2n}^{-1}\mathbf{d}_{2n}^T\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}}\big] \\
&= -(s_2/s_1)\, s_2^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2 = -s_1^{-1}\mathbf{d}_2^T\boldsymbol{\beta}_2,
\end{aligned}$$
where $\mathbf{d}_{2n} = \Sigma_{S_2 S_1}\Sigma_{S_1}^{-1}\mathbf{d}_{1n}$, $\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \hat{\boldsymbol{\beta}}_{1n}^{\mathrm{RE}}\big) = \mathbf{d}_{1n}^T\Sigma_{S_1}^{-1}\Sigma_{S_1 S_2}\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}} = \mathbf{d}_{2n}^T\hat{\boldsymbol{\beta}}_{2n}^{\mathrm{WR}}$, and $s_{2n}^2 = \mathbf{d}_{2n}^T\Sigma_{S_2|S_1}^{-1}\mathbf{d}_{2n}$.
Now, we compute the ADB of β ^ 1 n S E as follows
A D B ( d 1 n T β ^ 1 n S E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R [ ( p 2 n 2 ) T ^ n 1 ] ( β ^ 1 n W R β ^ 1 n R E ) β 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 lim n ( p 2 n 2 ) E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 R E T ^ n 1 = E [ Z ] ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β ^ 1 n R E ) T n 1 = ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R T n 1 = ( p 2 2 ) ( s 2 / s 1 ) E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T n 1 = ( p 2 2 ) s 1 1 d 2 T β 2 E χ p 2 2 ( Δ d 2 ) .
Finally, we obtain the ADB of β ^ 1 n P S E ,
A D B ( d 1 n T β ^ 1 n P S E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n P S E β 1 = lim n E [ n 1 / 2 s 1 n 1 d 1 n T { β ^ 1 n S E + [ 1 ( p 2 n 2 ) T ^ n 1 ] ( β ^ 1 n W R β ^ 1 n R E ) × I ( T ^ n < ( p 2 n 2 ) ) β 1 } ] = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 + lim n E n 1 / 2 s 1 n 1 d 1 n T [ 1 ( p 2 n 2 ) T ^ n 1 ] β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) + E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n S E β 1 ) + E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β ^ 1 n R E ) I T ^ n < ( p 2 n 2 ) ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β ^ 1 n R E ) T n 1 I T ^ n < ( p 2 n 2 ) = A D B ( d 1 n T β ^ 1 n S E ) E lim n n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R I T ^ n < ( p 2 n 2 ) + ( p 2 2 ) E lim n n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) = A D B ( d 1 n T β ^ 1 n S E ) ( s 2 / s 1 ) E Z I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) s 1 1 d 2 T β 2 H p 2 ( p 2 2 ; Δ d 2 ) + ( p 2 2 ) ( s 2 / s 1 ) E Z χ p 2 2 ( Δ d 2 ) I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( p 2 2 ) s 1 1 d 2 T β 2 E χ p 2 2 ( Δ d 2 ) I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) = A D B ( d 1 n T β ^ 1 n S E ) s 1 1 d 2 T β 2 H p 2 ( p 2 2 ; Δ d 2 ) + ( p 2 2 ) s 1 1 d 2 T β 2 E χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) = s 1 1 d 2 T β 2 [ ( p 2 2 ) { E χ p 2 2 ( Δ d 2 ) ] + E χ p 2 2 ( Δ d 2 ) I χ P 2 2 ( Δ d 2 ) < ( p 2 2 ) H p 2 ( p 2 2 ; Δ d 2 ) ] .
Proof of Theorem 4. 
We provide the proof of the ADR expressions of the proposed estimators. It is clear that
$$\lim_{n \to \infty} E\Big[\big\{ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\big\}^2\Big] = E\Big[\lim_{n \to \infty}\big\{ n^{1/2} s_{1n}^{-1}\mathbf{d}_{1n}^T\big(\hat{\boldsymbol{\beta}}_{1n}^{\mathrm{WR}} - \boldsymbol{\beta}_1\big)\big\}^2\Big] = E[Z^2] = 1,$$
where $Z \sim N(0, 1)$. Then,
A D R ( d 1 n T β ^ 1 n R E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n R E β 1 2 = lim n s 1 n 2 E n 1 / 2 d 1 n T β ^ 1 n W R β 1 β ^ 1 n W R β ^ 1 n R E 2 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 2 + lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E ) ( β ^ 1 n W R β 1 T d 1 n = I 1 + I 2 + I 3 .
From (23), we have I 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T ( β ^ 1 n W R β 1 ) 2 = 1 . Furthermore,
I 2 = lim n s 1 n 2 E n 1 / 2 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 = lim n ( s 2 n 2 / s 1 n 2 ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R 2 .
Since s 2 n 2 / s 1 n 2 1 c , then,
I 2 = ( 1 c ) lim n E χ 1 2 ( Δ d 2 n ) = ( 1 c ) ( 1 + 2 Δ d 2 ) .
Furthermore,
I 3 = 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n W R β 1 T d 1 n = 2 lim n ( s 2 n / s 1 n ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 β ^ 1 n W R β 1 T d 1 n = 2 ( 1 c ) 1 / 2 .
Now, we investigate (25). By using Equation (17), we have
A D R ( d 1 n T β ^ 1 n S E ) = lim n E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 ) 2 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 [ ( p 2 n 2 ) / T ^ n ] β ^ 1 n W R β ^ 1 n R E 2 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 2 + lim n E n 1 / 2 s 1 n 1 ( p 2 n 2 ) T n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 2 lim n E n s 1 n 2 ( p 2 n 2 ) T ^ n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n W R β 1 T d 1 n = J 1 + J 2 + J 3 .
Again, J 1 = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 2 = 1 . Then, we have
J 2 = lim n E n 1 / 2 s 1 n 1 ( p 2 n 2 ) T ^ n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 = lim n ( p 2 n 1 ) 2 E n 1 / 2 s 1 n 1 d 2 n T β ^ 2 n W R T n 1 2 = ( s 2 2 / s 1 2 ) ( p 2 1 ) 2 E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 2 = ( s 2 2 / s 1 2 ) ( p 2 1 ) 2 E [ Z 2 χ p 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] = ( 1 c ) ( p 2 2 ) 2 E [ χ p 2 + 2 4 ( Δ d 2 ) ] + ( s 2 1 d 2 T β 2 ) 2 E [ χ p 2 4 ( Δ d 2 ) ] ,
and
J 3 = 2 lim n E n s 1 n 2 ( p 2 n 2 ) T ^ n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n W R β 1 T d 1 n = 2 lim n ( s 2 n / s 1 n ) ( p 2 n 2 ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R s 1 n 1 ( β ^ 1 n W R β 1 ) T T ^ n 1 = 2 ( 1 c ) 1 / 2 ( p 2 2 ) E Z 2 χ p 2 2 ( Δ d 2 ) + s 2 1 d 2 T β 2 E Z χ p 2 2 ( Δ d 2 ) = 2 ( 1 c ) 1 / 2 ( p 2 2 ) E χ p 2 + 2 2 ( Δ d 2 ) .
Finally, we compute the ADR of β ^ 1 n P S E as follows:
A D R ( d 1 n T β ^ 1 n P S E ) = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n P S E β 1 2 = lim n E [ { n 1 / 2 s 1 n 1 d 1 n T [ β ^ 1 n S E β 1 + 1 ( p 2 n 2 ) T ^ n 1 β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) ] } 2 ] = lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 2 + lim n E n 1 / 2 s 1 n 1 d 1 n T 1 ( p 2 n 2 ) T ^ n 1 β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) 2 + 2 lim n E [ n 1 / 2 s 1 n 2 d 1 n T 1 ( p 2 n 2 ) T ^ n 1 β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T × I T ^ n < ( p 2 n 2 ) d 1 n ] = A D R ( d 1 n T β ^ 1 n S E ) + lim n E n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E I T ^ n < ( p 2 n 2 ) 2 + lim n ( p 2 n 2 ) 2 E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E ) T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 2 lim n ( p 2 n 2 ) E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E 2 T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n + 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T I T ^ n < ( p 2 n 2 ) d 1 n 2 lim n ( p 2 n 2 ) E [ { n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T × T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n } ] = A D R ( d 1 n T β ^ 1 n S E ) + K 1 + K 2 + K 3 + K 4 + K 5 ,
where
K 1 = lim n E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E ) I T ^ n < ( p 2 n 2 ) 2 = lim n E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R I T ^ n < ( p 2 n 2 ) 2 = lim n ( s 2 n / s 1 n ) 2 E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R I T ^ n < ( p 2 n 2 ) 2 = ( s 2 / s 1 ) 2 E Z 2 I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( s 2 1 d 2 T β 2 ) 2 H p 2 ( p 2 2 ; Δ d 2 ) = ( 1 c ) E χ p 2 + 2 2 ( Δ d 2 ) + ( s 2 1 d 2 T β 2 ) 2 H p 2 ( p 2 2 ; Δ d 2 ) ,
K 2 = lim n E [ n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β ^ 1 n R E ) ( p 2 n 2 ) T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 = lim n ( p 2 n 2 ) 2 ( s 1 n / s 2 n ) 2 E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 = ( p 2 2 ) 2 ( s 2 / s 1 ) 2 E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) 2 = ( p 2 2 ) 2 ( 1 c ) E Z 2 χ p 2 4 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) = ( p 2 2 ) 2 ( 1 c ) E χ p 2 + 2 4 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ,
K 3 = 2 lim n ( p 2 n 2 ) E [ n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E ) 2 T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n = 2 ( p 2 2 ) ( s 2 / s 1 ) 2 E [ lim n n 1 / 2 s 2 n 1 d 1 n T β ^ 2 n W R 2 T ^ n 1 I T ^ n < ( p 2 n 2 ) ] = 2 ( p 2 2 ) ( 1 c ) { E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( s 2 1 d 2 T β 2 ) 2 E χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) } = 2 ( p 2 2 ) ( 1 c ) { E χ p 2 + 2 2 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) + ( s 2 1 d 2 T β 2 ) 2 E χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) } ,
K 4 = 2 lim n E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T I T ^ n < ( p 2 n 2 ) d 1 n = 2 lim n ( s 2 n / s 1 n ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n S E β 1 T I T ^ n < ( p 2 n 2 ) = 2 ( s 2 / s 1 ) E [ n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R { n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 ( p 2 n 2 ) T ^ n 1 I T ^ n < ( p 2 n 2 ) ) T I T ^ n < ( p 2 n 2 ) ] = 2 ( 1 c ) 1 / 2 { E lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 I T ^ n < ( p 2 n 2 ) T ( p 2 2 ) E n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 T ^ n 1 I T ^ n < ( p 2 n 2 ) T } = 2 ( 1 c ) 1 / 2 E Z 2 I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) ( p 2 2 ) E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) = 2 ( 1 c ) 1 / 2 H p 2 + 2 ( p 2 2 ; Δ d 2 ) ( p 2 2 ) E χ p 2 + 2 2 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 )
and
K 5 = 2 lim n ( p 2 n 2 ) E n s 1 n 2 d 1 n T β ^ 1 n W R β ^ 1 n R E β ^ 1 n S E β 1 T T ^ n 1 I T ^ n < ( p 2 n 2 ) d 1 n = 2 ( p 2 2 ) ( s 2 / s 1 ) E [ lim n n 1 / 2 s 2 n 1 d 2 n T β ^ 2 n W R T ^ n 1 I T ^ n < ( p 2 n 2 ) × n 1 / 2 s 1 n 1 d 1 n T β ^ 1 n W R β 1 ( p 2 n 2 ) β ^ 1 n W R β ^ 1 n R E T ^ n 1 I T ^ n < ( p 2 n 2 ) T ] = 2 ( p 2 2 ) ( s 2 / s 1 ) { E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) ( p 2 2 ) E Z 2 χ p 2 2 ( Δ d 2 ) I χ p 2 2 ( Δ d 2 ) < ( p 2 2 ) } = 2 ( p 2 2 ) ( 1 c ) 1 / 2 { E χ p 2 + 2 2 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) ( p 2 2 ) E χ p 2 + 2 4 ( Δ d 2 ) I χ p 2 + 2 2 ( Δ d 2 ) < ( p 2 2 ) } .

References

  1. Ahmed, S.E. Big and Complex Data Analysis: Methodologies and Applications; Springer: Cham, Switzerland, 2017. [Google Scholar]
  2. Bradic, J.; Fan, J.; Jiang, J. Regularization for Cox’s proportional hazards model with np-dimensionality. Ann. Stat. 2011, 39, 3092–3120. [Google Scholar] [CrossRef] [PubMed]
  3. Bradic, J.; Song, R. Structured estimation for the nonparametric Cox model. Electron. J. Stat. 2015, 9, 492–534. [Google Scholar] [CrossRef]
  4. Gui, J.; Li, H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings with applications to microarray gene expression data. Bioinformatics 2005, 21, 3001–3008. [Google Scholar] [CrossRef]
  5. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [Google Scholar] [CrossRef]
  6. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  7. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  8. Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
  9. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320. [Google Scholar] [CrossRef]
  10. Sun, T.; Zhang, C.H. Scaled sparse linear regression. Biometrika 2012, 99, 879–898. [Google Scholar] [CrossRef]
  11. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 1997, 16, 385–395. [Google Scholar] [CrossRef]
  12. Zhang, H.; Lu, W. Adaptive lasso for Cox’s proportional hazards model. Biometrika 2007, 94, 691–703. [Google Scholar] [CrossRef]
  13. Zou, H. A note on path-based variable selection in the penalized proportional hazards model. Biometrika 2008, 95, 241–247. [Google Scholar] [CrossRef]
  14. Fan, J.; Li, R. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 2002, 6, 74–99. [Google Scholar] [CrossRef]
  15. Hong, H.; Li, Y. Feature selection of ultrahigh-dimensional covariates with survival outcomes: A selective review. Appl. Math. Ser. B 2017, 32, 379–396. [Google Scholar] [CrossRef] [PubMed]
  16. Hong, H.; Zheng, Q.; Li, Y. Forward regression for Cox models with high-dimensional covariates. J. Multivar. Anal. 2019, 173, 268–290. [Google Scholar] [CrossRef]
  17. Hong, H.; Chen, X.; Kang, J.; Li, Y. The Lq-norm learning for ultrahigh-dimensional survival data: An integrative framework. Stat. Sin. 2020, 30, 1213–1233. [Google Scholar] [CrossRef]
  18. Ahmed, S.E.; Ahmed, F.; Yüzbaşı, B. Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data; Chapman and Hall/CRC: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
  19. Gao, X.; Ahmed, S.E.; Feng, Y. Post selection shrinkage estimation for high-dimensional data analysis. Appl. Stoch. Models Bus. Ind. 2017, 33, 97–120. [Google Scholar] [CrossRef]
  20. Cox, D.R. Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B 1972, 34, 187–220. [Google Scholar] [CrossRef]
  21. Buhlmann, P.; van de Geer, S. Statistics for High-Dimensional Data: Methods, Theory and Applications, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  22. Kurnaz, S.F.; Hoffmann, I.; Filzmoser, P. Robust and sparse estimation methods for high-dimensional linear and logistic regression. J. Chemom. Intell. Lab. Syst. 2018, 172, 211–222. [Google Scholar] [CrossRef]
  23. Belhechmi, S.; Bin, R.D.; Rotolo, F.; Michiels, S. Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models. BMC Bioinform. 2020, 21, 277. [Google Scholar] [CrossRef] [PubMed]
  24. Rosenwald, A.; Wright, G.; Chan, W.C.; Connors, J.M.; Campo, E.; Fisher, R.I.; Gascoyne, R.D.; Muller-Hermelink, H.K.; Smeland, E.B.; Giltnane, J.M.; et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 2002, 25, 1937–1947. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Relative mean squared error (RMSE) of the proposed estimators compared to LASSO for different n and p.
Figure 2. Relative mean squared error (RMSE) of the proposed estimators compared to Elastic Net for different n and p.
Table 1. Simulated relative mean squared error (RMSE) across different values of p and n, averaged over N = 1000 simulation runs.
                              Censoring 15%                    Censoring 25%
n     p     Method     β̂^PLE    β̂^RE    β̂^PSE    |    β̂^PLE    β̂^RE    β̂^PSE
100   300   LASSO      1.04     1.66    1.19     |    1.08     1.96    1.23
            ENet       1.07     1.45    1.23     |    0.92     1.44    1.06
      400   LASSO      1.03     1.10    1.60     |    0.96     1.03    1.98
            ENet       0.90     1.00    1.45     |    0.89     0.98    1.36
      500   LASSO      1.08     1.13    1.66     |    0.98     1.05    1.37
            ENet       0.96     1.01    1.03     |    0.95     1.00    1.22
300   300   LASSO      0.85     1.60    0.98     |    0.92     1.64    1.06
            ENet       0.96     1.37    1.08     |    0.87     1.46    1.00
      350   LASSO      0.83     0.99    1.01     |    0.90     1.07    1.17
            ENet       0.85     0.99    1.52     |    0.87     1.02    1.56
      400   LASSO      0.90     1.03    1.25     |    0.81     0.95    1.73
            ENet       0.90     1.04    1.25     |    0.76     0.89    1.41
400   400   LASSO      0.99     1.52    1.12     |    0.82     1.50    0.94
            ENet       0.91     1.29    1.02     |    0.83     1.26    0.99
      450   LASSO      0.83     1.00    1.13     |    0.84     0.94    1.61
            ENet       0.92     1.05    1.46     |    0.85     0.99    1.79
      500   LASSO      0.89     0.93    1.83     |    0.81     0.93    1.90
            ENet       0.82     0.93    1.38     |    0.82     0.95    1.75
Table 2. Average number of selected predictors ( S ^ 1 ) and false positive rate (FPR) across different values of n and p, averaged over N = 1000 simulation runs.
                              Censoring 15%                 Censoring 25%
n     p     Method     Average |Ŝ1|    FPR      |    Average |Ŝ1|    FPR
100   300   LASSO      6.1             0.063    |    6.4             0.056
            ENet       6.2             0.063    |    6.6             0.052
      400   LASSO      4.9             0.072    |    5.2             0.085
            ENet       5.1             0.072    |    4.8             0.075
      500   LASSO      5.6             0.039    |    12.6            0.043
            ENet       4.9             0.039    |    4.0             0.033
300   300   LASSO      13.4            0.209    |    13.8            0.223
            ENet       12.9            0.209    |    16.3            0.282
      350   LASSO      15.6            0.202    |    15.8            0.208
            ENet       15.7            0.202    |    22.6            0.279
      400   LASSO      14.5            0.137    |    13.7            0.155
            ENet       13.5            0.137    |    14.2            0.173
400   400   LASSO      14.1            0.163    |    15.8            0.171
            ENet       14.2            0.163    |    20.4            0.212
      450   LASSO      18.4            0.217    |    23.5            0.24
            ENet       19.1            0.217    |    30.1            0.263
      500   LASSO      13.6            0.150    |    13.3            0.158
            ENet       13.3            0.150    |    13.6            0.158
Table 3. Estimated coefficients using the LASSO and ENet method for example 1.
Gene ID | LASSO: β̂^LASSO, β̂^RE, β̂^PSE | ENet: β̂^ENet, β̂^RE, β̂^PSE
18−0.020.260.21−0.030.070.20
970.010.270.000.010.260.01
1010.050.190.130.050.270.12
128−0.01
2320.04−0.42−0.280.040.20−0.25
3420.15−0.42−0.130.14−0.39−0.10
369−0.09−0.050.04−0.08−0.40−0.12
408−0.01−0.01−0.090.03
4100.03−0.26−0.150.03−0.06−0.14
445−0.00
4680.140.080.020.13−0.26−0.01
660−0.00−0.00
731−0.080.090.06−0.080.060.06
810−0.04−0.08−0.090.010.09−0.09
9070.01
934−0.00−0.00
952−0.01
961−0.05−0.05−0.080.20
1212−0.00
AUC0.620.630.650.630.640.66
Table 4. Estimated coefficients using the LASSO and ENet method for example 2.
Gene ID | LASSO: β̂^LASSO, β̂^RE, β̂^PSE | ENet: β̂^ENet, β̂^RE, β̂^PSE
950.02−0.34
1120.060.710.70−0.00−0.13−0.08
173−0.630.68
205
5511.601.691.57−0.11−0.28−0.20
1377−0.22−0.84−0.80−0.09−0.16−0.12
15260.410.670.560.02
1543−0.43−0.79−0.770.400.750.69
2003−0.111.10
20250.180.900.781.041.221.07
2439−0.01−0.14−0.12
2705−0.850.360.770.61
29730.591.230.99−0.63−1.12−0.81
32401.130.03
3598−0.22−0.59−0.540.290.550.49
38820.130.400.39−0.06−0.20−0.15
40150.340.810.76−0.08−0.13−0.12
4186−0.50−0.72−0.53
43570.09−0.59−0.70−0.65
46620.700.900.830.210.600.38
51310.540.800.710.010.010.01
5222−0.15−0.38−0.261.241.671.34
5541−0.52−0.72−0.68
55770.390.860.70−0.73−0.97−0.80
5778−0.62−0.09
58080.350.550.46
59511.292.121.70
6103−0.63−0.80−0.76
62540.250.560.48
64930.65
65100.861.090.99
AUC0.710.710.730.720.720.74
