Article

Varying Index Coefficient Model for Tail Index Regression

School of Mathematics, Harbin Institute of Technology, Xidazhi, Harbin 150001, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(13), 2011; https://doi.org/10.3390/math12132011
Submission received: 4 June 2024 / Revised: 24 June 2024 / Accepted: 26 June 2024 / Published: 28 June 2024

Abstract

Investigating the causes of extreme events is crucial across various fields. However, existing asymptotic theoretical models often lack flexibility and fail to capture the complex dependency structures inherent in extreme events. Additionally, the scarcity of extreme event data and the challenge of fully nonparametric estimation with high-dimensional covariates lead to the “curse of dimensionality”, complicating the analysis of extreme events. Considering the nonlinear interactions among covariates, we propose a flexible model that combines varying index coefficient models with extreme value theory to address these issues. This approach effectively avoids the curse of dimensionality while providing robust explanatory power and high flexibility. Our model also includes a variable selection process, for which we have demonstrated the consistency of the estimators and the oracle property of the variable selection. Monte Carlo simulation results validate the finite sample properties of the estimators. Furthermore, an empirical analysis of tail risk in financial markets offers valuable insights into the drivers of risk.

1. Introduction

The occurrence of extreme events is often accompanied by significant losses, such as those caused by floods, hurricanes, and financial crises. These rare but destructive events can result in damages that are difficult to quantify precisely. Therefore, in various disciplines—including actuarial science, economics, finance, geology, ecology, meteorology, and life sciences—intensive research on extreme events is particularly crucial. A profound understanding of the mechanisms behind extreme events is essential for the effective prevention and mitigation of large-scale disasters.
The tail index (TI), a crucial metric for assessing the probability of extreme events, significantly influences the behavior of the tail of the distribution. A lower tail index indicates a higher probability of extreme events, underscoring the importance of accurately estimating the tail index in extreme value theory (EVT). Classical literature [1,2,3] extensively analyzes the theoretical properties of traditional tail index estimators and their empirical applications.
In recent years, with the expansion of practical applications, scholars have increasingly recognized the importance of estimating tail indices in the presence of covariate information. When considering covariates, assuming that the tail index of the conditional distribution of the response variable may depend on these covariates provides a more complex and realistic scenario for the estimation of conditional tail indices. This type of work focuses on the statistical inference of conditional tail indices when covariates are random variables, with typical studies such as [4,5,6].
Despite significant progress in statistical inference for extreme value indices, previous research has primarily focused on the inference of conditional tail indices, neglecting a deeper consideration of the relationship between covariates and response variables. This research orientation has led to relatively poor performance of covariates in explaining the underlying mechanisms of extreme event occurrences, limiting the interpretability and applicability of the model in practical scenarios. However, in real-world applications, gaining a profound understanding of the reasons behind extreme events is crucial for preventing such occurrences. To address this limitation, introducing covariates related to extreme events and assuming a correlation between tail indices and these covariates is a reasonable and natural choice. In the study by [7], tail index regression parameters were estimated by assuming a linear relationship between tail indices and covariates. Subsequent research [8] established asymptotic properties of the parameters, and Ref. [9] constructed confidence intervals for regression coefficients using the empirical likelihood method. Ref. [10] studied covariates sampled at mixed frequencies and used them for financial tail risk measurement. However, in practical applications, the relationship between tail indices and covariates can be highly complex, and a linear assumption may be too simplistic, making it challenging to capture the true impact of covariates [11]. To enhance model flexibility, Ref. [11] proposed a partially linear semiparametric tail index regression model based on [8], assuming a partially linear semiparametric structure between tail indices and covariates, and established a large-sample theory for the resulting estimates. Furthermore, Ref. [12] introduced a varying coefficient model into tail index regression, assuming the coefficient is an unknown function of a single variable, and studied its statistical properties. Ref. [13] subsequently studied the hypothesis testing problems associated with this model. However, due to the sparsity of extreme data, nonparametric estimation faces the curse of dimensionality as the dimension of the covariates increases. Nonparametric estimation cannot guarantee efficiency for high-dimensional covariates, making it quite difficult to describe the entire distribution. Therefore, to overcome this challenge, a more flexible modeling approach is needed, one that combines dimensionality reduction, variable selection, and generalized tail event modeling techniques.
Inspired by the discussions above, this study aims to explore the integration of more flexible varying coefficient models (VCMs) with extreme value analysis to derive effective estimates of tail indices with interaction effects among covariates. A key feature of VCMs is their ability to allow the coefficients of covariates to vary smoothly with other variables, thus enabling the assessment of nonlinear interactions. Notably, the varying index coefficient model (VICM) proposed by [14], which encompasses a range of commonly used semiparametric models, offers sufficient flexibility for diverse applications.
The VICM can model and assess nonlinear interaction effects between grouped covariates on the response variable, addressing situations where individual covariate effects are weak but combined effects are strong. It effectively overcomes the curse of dimensionality commonly encountered in high-dimensional nonparametric estimation while combining the advantageous characteristics of the single-index model and the varying coefficient model, as highlighted by [15]. Moreover, it is easily interpretable in practical applications. As a highly useful semiparametric model, the VICM includes many other important statistical models as special cases, such as the partially linear single-index model [16], the additive model [17], the partially linear additive model [18], and the varying coefficient model [19]. Due to its numerous advantages, the VICM has been extensively studied in practice. For example, Ref. [20] extended it to time series data and developed the varying-index coefficient autoregressive model, while Ref. [21] extended it to the field of quantile regression, investigating its statistical properties in high-dimensional settings.
Despite significant advancements in the estimation methods and applications of coefficient-varying models in general cases, there is a notable literature gap in the context of extreme value analysis. Therefore, building upon the natural extensions proposed by [11,12], we extend the VICM to extreme value analysis, introducing a novel tail index regression model based on the VICM. This innovative approach aims to harness the power of the VICM technology to address the challenges posed by complex covariates in extreme value analysis.
Additionally, variable selection is incorporated into our model. In regression analysis, neglecting key predictor variables can lead to significant bias, while including irrelevant predictors can diminish estimation efficiency. Thus, variable selection is an indispensable aspect of modern statistical inference. Traditional methods for variable selection include strategies based on hypothesis testing and information criteria such as AIC and BIC [22]. Moreover, penalty-based techniques like $L_1$ [23], $L_2$ [24], ridge regression [25], LASSO [26], adaptive LASSO [27], and smoothly clipped absolute deviation (SCAD) [28] provide effective solutions. Variable selection in varying coefficient models (VCMs) has also been extensively studied; notable works include [29,30,31]. In particular, Ref. [30], building on the SCAD method, proposed an innovative composite penalty technique. This approach accurately identifies the true structure of single-index varying coefficient models (SIVCMs), effectively selects key variables, and precisely estimates unknown index parameters and coefficient functions. Given its desirable properties such as unbiasedness, sparsity, continuity, and the oracle property, we have integrated this advanced variable selection method into our model. This integration not only enhances the predictive accuracy of the model but also ensures the efficiency and effectiveness of variable selection.
This study innovatively combines varying index coefficient models with extreme value theory to construct a flexible framework for analyzing extreme events. This framework aims to investigate the impact of potential factors with nonlinear interactions on the probability of extreme events. During the model construction process, we incorporated a variable selection mechanism to ensure the consistency and predictive power of the model’s estimates. The broad applicability of varying index coefficient models allows this study to encompass mainstream regression-based extreme value analysis models, including the parametric approach by [8] and the semiparametric approach by [11], while effectively addressing their limitations in certain scenarios. However, constructing a comprehensive theoretical framework for this model presents significant challenges. It requires an in-depth analysis of both parametric and nonparametric estimates and the integration of more flexible tail index models. Moreover, the complex interactions between parameters and the intricacies of technical details make the study of asymptotic theory exceptionally challenging. Ref. [32] revealed the limitations of one-step spline function approximation in asymptotic distribution. Therefore, we adopted the two-step estimation strategy proposed by [33,34,35]. First, we approximate the nonparametric function using B-spline functions to obtain preliminary estimates of the parameters and the nonparametric function. Then, we update the nonparametric single-index function using the B-spline back-fitted kernel smoothing (BSBK) method, thereby establishing the asymptotic properties of the nonparametric function. To validate the model’s performance with finite samples and the effectiveness of variable selection, we conducted Monte Carlo simulations. Additionally, to demonstrate the practical application of the model, we provided a real data case study, analyzing risk factors using Chinese stock market index data.
This study makes significant contributions to the field in three main areas. First, we introduce a novel varying index coefficient tail index regression model (VICM-TIR) and systematically investigate its asymptotic properties, including the consistency of variable selection and the oracle property of the estimators, thus providing a robust theoretical foundation for complex data analysis. To the best of our knowledge, previous models have not achieved this. Second, the VICM-TIR model adeptly handles nonlinear interaction effects among covariates and effectively overcomes the curse of dimensionality, demonstrating exceptional flexibility and interpretability. It encompasses the mainstream models currently documented in the literature. Finally, in practical applications, the VICM-TIR model exhibits remarkable modeling flexibility and applicability. It performs exceptionally well even in small sample sizes and high-dimensional scenarios, offering a powerful tool for complex data analysis. Notably, it shows significant innovative value in the analysis of extreme events.
The subsequent sections of this paper are structured as follows: In Section 2, we introduce the varying index coefficient model for tail index regression, detailing the estimation procedure and the method for selecting tuning parameters. Section 3 establishes the asymptotic theory and the properties of the estimators. Section 4 presents the findings of a simulation study, evaluating the model's performance across various scenarios. Section 5 illustrates the practicality and effectiveness of the model through an analysis of real-world data. Finally, Section 6 concludes the paper, and Appendix A contains the proofs of the theorems presented.

2. Model and Method

2.1. Model Setting

When dealing with a heavy-tailed distribution featuring a response variable $Y \in \mathbb{R}$ and a set of covariates $(X, Z)$, where $X \in \mathcal{X} \subseteq \mathbb{R}^d$ and $Z \in \mathcal{Z} \subseteq \mathbb{R}^p$, each observation, denoted as $(Y_i, X_i, Z_i)$, is an independent sample from the joint distribution of $Y$, $X$, and $Z$. Utilizing the information provided by the covariates $(X, Z)$, we can define the conditional cumulative distribution function (CDF) of the response variable $Y$ as
$$ F(y \mid x, z) = P(Y \le y \mid X = x, Z = z). \tag{1} $$
We take into account a Pareto-type distribution; the conditional survival function, denoted by $S(y \mid x, z) = 1 - F(y \mid x, z)$, assumes the following form:
$$ S(y \mid x, z) = y^{-\gamma(x,z)}\, V(y \mid x, z), \tag{2} $$
where the unknown tail index $\gamma(x, z)$ depends on the covariates $x$ and $z$, while $V(y \mid x, z)$ represents a slowly varying function, assumed to have a specific form as follows:
$$ V(y; x, z) = c_0(x, z) + c_1(x, z)\, y^{-h(x,z)} + o\big( y^{-h(x,z)} \big), \tag{3} $$
where $c_0(x, z)$, $c_1(x, z)$, and $h(x, z)$ are unknown functions of the covariates $x$ and $z$. Furthermore, for each $(x, z)$, $o(y^{-h(x,z)})$ represents a remainder term that is dominated by $y^{-h(x,z)}$, while $c_0(x, z)$ and $c_1(x, z)$ are assumed to be uniformly bounded away from zero. The ratio $V(cy \mid x, z)/V(y \mid x, z)$ approaches 1 as $y \to \infty$, where $c > 0$ is a constant.
To enhance the flexibility of the model and account for nonlinear interactions among the covariates, we introduce the varying index coefficient model (VICM) to extend the tail index regression (TIR) model. Specifically, we partition the covariates into two components: a varying coefficient part associated with $X$ and a single-index function of $Z$, denoted as $\eta(\beta^\top Z)$. Consequently, the logarithm of the tail index can be expressed as
$$ \log\big( \gamma(x, z) \big) = \alpha(x, z) = \sum_{k=1}^{d} \eta_k( \beta_k^\top z )\, x_k, \tag{4} $$
where $\eta_k(\cdot)$ represents the unknown single-index function, and $\beta_k = ( \beta_{k1}, \ldots, \beta_{kp} )^\top \in \mathbb{R}^p$ denotes the unknown single-index parameter vector. Subsequently, given an observation $(y_i, x_i, z_i)$, the survival function of the Pareto-type model can be constructed as
$$ S(y_i \mid x_i, z_i) = y_i^{-\gamma(x_i, z_i)} \Big\{ c_0(x_i, z_i) + c_1(x_i, z_i)\, y_i^{-h(x_i, z_i)} + o\big( y_i^{-h(x_i, z_i)} \big) \Big\}, \tag{5} $$
and when $Y > w_n$, the approximate probability density function of $Y_i$ can be expressed as
$$ f(y \mid x, z) = \gamma(x, z)\, ( y / w_n )^{-\gamma(x,z)}\, y^{-1}. \tag{6} $$
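To make the structure of (4) concrete, the following minimal Python sketch evaluates the tail index for given index functions and directions. It is our illustration, not the authors' code, and every numerical value in it is a placeholder.

```python
import numpy as np

# Minimal sketch of the VICM log tail index in (4):
# log gamma(x, z) = sum_k eta_k(beta_k' z) x_k.
def tail_index(x, z, etas, betas):
    """Return gamma(x, z) = exp(sum_k eta_k(beta_k @ z) * x_k)."""
    log_gamma = sum(eta(beta @ z) * xk for eta, beta, xk in zip(etas, betas, x))
    return np.exp(log_gamma)

# Example with d = 2 coefficient functions and p = 3 covariates in z.
etas = [np.sin, np.cos]
betas = [np.array([0.6, 0.8, 0.0]), np.array([1.0, 0.0, 0.0])]
print(tail_index(np.array([0.5, -0.2]), np.array([0.3, 0.1, 0.7]), etas, betas))
```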

2.2. Estimation Procedure

To estimate the parameters $(\eta, \beta)$, the peaks-over-threshold (POT) method is adopted. This approach involves introducing a threshold $w_n$ and utilizing all observations that surpass this threshold for parameter estimation. Notably, conditional on $Y > w_n$, the transformed random variable $Y/w_n$ follows a Pareto-type tail distribution, as defined by
$$ P\big( Y / w_n > t \mid Y > w_n \big) = t^{-\gamma(x,z)}, \quad t \ge 1. \tag{7} $$
Then, the Pareto-type tail distribution can be approximated by a standard Pareto distribution. Leveraging the conditional density function $\tilde f( \cdot \mid x, z, y > w_n )$ of $Y/w_n$ conditioned on $Y > w_n$, we can derive the following expression:
$$ \tilde f\big( y / w_n \mid x, z, y > w_n \big) = \gamma(x, z)\, ( y / w_n )^{-\gamma(x,z) - 1}. \tag{8} $$
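For intuition, write $\gamma(x,z) = e^{\alpha(x,z)}$ and take the negative logarithm of the approximate density (8); a routine calculation gives the per-exceedance loss on which the estimation criterion below is built (the last term is free of parameters and can be dropped):
$$ -\log \tilde f\big( y/w_n \mid x, z, y > w_n \big) = \big( \gamma(x,z) + 1 \big) \log\frac{y}{w_n} - \log \gamma(x,z) = e^{\alpha(x,z)} \log\frac{y}{w_n} - \alpha(x,z) + \log\frac{y}{w_n}. $$
Summing the first two terms over the exceedances and adding SCAD penalties yields exactly the criterion (9) below.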
Therefore, to estimate the parameters, one can minimize the following penalized likelihood function:
$$ L_n( \eta, \beta \mid \lambda ) = \sum_{i=1}^{n} \Big\{ \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\Big( \sum_{k=1}^{d} \eta_k( \beta_k^\top z_i )\, x_{ki} \Big) - \sum_{k=1}^{d} \eta_k( \beta_k^\top z_i )\, x_{ki} \Big\} I( Y_i > w_n ) + n_0 \sum_{k=1}^{d} \sum_{l=1}^{p} p_{\lambda_{1kl}}( | \beta_{kl} | ) + n_0 \sum_{k=1}^{d} p_{\lambda_{2k}}( \| \eta_k \| ) + n_0 \sum_{k=1}^{d} p_{\lambda_{3k}}( \| \dot\eta_k \| ), \tag{9} $$
where $\| \eta_k(\cdot) \| = \big( \int \eta_k^2(u)\, du \big)^{1/2}$, and the effective sample size is denoted by $n_0$. The function $p_\lambda(\cdot)$ is the smoothly clipped absolute deviation (SCAD) penalty function [28], with a tuning parameter $\lambda$ determined through a data-driven method. It is defined through its first derivative,
$$ \dot p_\lambda( \omega ) = \lambda \Big\{ I( \omega \le \lambda ) + \frac{ ( a\lambda - \omega )_+ }{ ( a - 1 )\lambda }\, I( \omega > \lambda ) \Big\}, \tag{10} $$
with $a > 2$, $\omega > 0$, and $p_\lambda(0) = 0$. Adopting the recommendation of [28], we use $a = 3.7$, as it performs well across a wide range of settings.
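As a small illustration (ours, not from the paper), the SCAD derivative in (10) can be coded directly; the function name and vectorized form are our own choices.

```python
import numpy as np

# Sketch of the SCAD penalty derivative in (10), following Fan and Li (2001);
# the penalty p_lambda itself is recovered by integrating this from 0.
def scad_deriv(omega, lam, a=3.7):
    omega = np.abs(omega)
    return lam * ((omega <= lam)
                  + np.maximum(a * lam - omega, 0.0) / ((a - 1.0) * lam)
                  * (omega > lam))
```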
Given the unknown $\eta(\cdot)$ in (9), directly optimizing (9) is not feasible. Consequently, we approximate $\eta(\cdot)$ in (9) using the spline method. Specifically, we consider a knot sequence with $K$ interior knots $\xi_k$, where $K$ increases with the effective sample size $n_0$. Denoting the $r$-th order B-spline basis as $B_r(u)$ for $u \in [0, 1]$ with $r \ge 2$, the nonparametric functions $\eta_k(u_k)$ for $k = 1, \ldots, d$ can be estimated using the spline functions
$$ \hat\eta_k( u_k, \beta_k ) = \sum_{s=1}^{K} B_{r,s,k}( u_k )\, \hat b_{r,s,k}( \beta_k ) = B_{r,k}( u_k )^\top \hat b_k( \beta_k ), \tag{11} $$
where $B_{r,k}( u_k ) = \big( B_{r,s,k}( u_k ) : 1 \le s \le K \big)^\top$ and $\hat b_k( \beta_k ) = \big( \hat b_{s,k}( \beta_k ) : 1 \le s \le K \big)^\top$.
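In practice, such a basis can be generated numerically; the sketch below is one possible construction (an assumption on our part, using SciPy rather than anything specified in the paper), producing a clamped B-spline basis on $[0,1]$ with equally spaced interior knots.

```python
import numpy as np
from scipy.interpolate import BSpline

# Sketch: evaluate an order-r (degree r-1) B-spline basis on [0, 1] with
# n_interior equally spaced interior knots; columns are the basis functions.
def bspline_design(u, n_interior=3, degree=3):
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    return BSpline.design_matrix(u, knots, degree).toarray()

B = bspline_design(np.linspace(0.0, 1.0, 50))  # 50 x (n_interior + degree + 1)
```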
Recognizing that a zero derivative of the smooth coefficient function $\eta_k(u)$ implies a constant value $\eta_k$, indicating no interaction between $Z$ and $X_k$, we propose a strategy to detect the derivative of $\eta_k(u)$. Motivated by [36], we approximate the derivative $\dot\eta_k$ using spline functions of a lower order than those used for $\eta_k$. Specifically, the spline estimator for $\dot\eta_k$ is formulated as
$$ \hat{\dot\eta}_k( u_k, \beta_k ) = \sum_{s=1}^{K} \dot B_{s,r}( u_k )\, \hat b_{s,k}( \beta_k ) = \sum_{s=2}^{K} \frac{ ( r - 1 )\, B_{s,r-1}( u_k ) }{ \varpi_{s+r-1} - \varpi_s } \big( \hat b_{s,k}( \beta_k ) - \hat b_{s-1,k}( \beta_k ) \big) := D_k^\top \delta_k( \beta_k ), \tag{12} $$
where
$$ D_k = \left( \frac{ ( r - 1 )\, B_{2,r-1}( u_k ) }{ \varpi_{r+1} - \varpi_2 }, \ldots, \frac{ ( r - 1 )\, B_{K,r-1}( u_k ) }{ \varpi_{K+r-1} - \varpi_K } \right)^\top, \tag{13} $$
$$ \delta_k( \beta_k ) = \big( \hat b_{2,k}( \beta_k ) - \hat b_{1,k}( \beta_k ), \ldots, \hat b_{K,k}( \beta_k ) - \hat b_{K-1,k}( \beta_k ) \big)^\top. \tag{14} $$
Substituting the spline approximation for $\eta_k(u)$ into (9), we obtain the following penalized likelihood function:
$$ L_n( b, \beta \mid \lambda ) = \sum_{i=1}^{n} \Big\{ \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\big( G_i( \beta )^\top b( \beta ) \big) - G_i( \beta )^\top b( \beta ) \Big\} I( Y_i > w_n ) + n_0 \sum_{k=1}^{d} \sum_{l=1}^{p} p_{\lambda_{1kl}}( | \beta_{kl} | ) + n_0 \sum_{k=1}^{d} p_{\lambda_{2k}}( \| b_k( \beta ) \|_H ) + n_0 \sum_{k=1}^{d} p_{\lambda_{3k}}( \| \delta_k( \beta ) \|_Q ), \tag{15} $$
where
$$ G_i( \beta ) = \mathrm{vec}\big( B_i\, \mathrm{diag}( x_i ) \big), \quad \| b_k \|_H = ( b_k^\top H b_k )^{1/2}, \quad H = \int B(u) B(u)^\top du, \quad \| \delta_k \|_Q = ( \delta_k^\top Q \delta_k )^{1/2}, \quad Q = \int D(u) D(u)^\top du. \tag{16} $$
Here, $B_i = \big( B_r( u_{1i} ), \ldots, B_r( u_{di} ) \big)$ is a $K \times d$ basis matrix, $\mathrm{vec}$ is the operator stacking the columns of a given matrix, and $\mathrm{diag}( x_i )$ is the $d \times d$ diagonal matrix with $x_i$ as its diagonal elements.
To ensure identifiability, the $p$-dimensional single-index parameter $\beta_k$ must satisfy the normalization constraint $\| \beta_k \| = 1$ with a positive first component $\beta_{k1} > 0$ for $1 \le k \le d$. As a result, reparameterization is necessary. We introduce $\phi_k \in \{ ( \phi_{k1}, \ldots, \phi_{k,p-1} )^\top \in \mathbb{R}^{p-1} :\ \| \phi_k \| \le c \}$, where $0 < c < 1$, to facilitate this reparameterization. Subsequently, $\beta_k$ can be expressed as a function of $\phi_k$ in the form $\beta_k = \beta( \phi_k ) = \big( \sqrt{ 1 - \| \phi_k \|^2 },\ \phi_k^\top \big)^\top$. Assuming the existence of $\phi_k^0$ such that $\beta_k^0 = \beta( \phi_k^0 )$, the estimated logarithmic tail index (TI) can then be expressed as
$$ \hat\alpha\big( v_i, \hat\phi, \hat b( \hat\phi ) \big) = \log\big( \gamma( v_i, \hat\phi, \hat b( \hat\phi ) ) \big) = G_i( \hat\phi )^\top \hat b( \hat\phi ). \tag{17} $$
Subsequently, the penalized likelihood function used to estimate the parameters $(b, \phi)$ is given by
$$ L_n( b, \phi \mid \lambda ) = \sum_{i=1}^{n} \Big\{ \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\big( G_i( \phi )^\top b( \phi ) \big) - G_i( \phi )^\top b( \phi ) \Big\} I( Y_i > w_n ) + n_0 \sum_{k=1}^{d} \sum_{l=1}^{p-1} p_{\lambda_{1kl}}( | \phi_{kl} | ) + n_0 \sum_{k=1}^{d} p_{\lambda_{2k}}( \| b_k( \phi ) \|_H ) + n_0 \sum_{k=1}^{d} p_{\lambda_{3k}}( \| \delta_k( \phi ) \|_Q ). \tag{18} $$
Under the constraint $\| \phi_k \| < 1$ for all $1 \le k \le d$, we minimize the objective function (18) to obtain the parameter estimates. Fixing $\phi$, the profile likelihood estimator of $b( \phi )$ can be derived as
$$ \hat b( \phi ) = \mathop{\mathrm{argmin}}_{b \in \mathbb{R}^{d \times K}} L_n\big( b( \phi ), \phi \mid \lambda \big). \tag{19} $$
After obtaining the estimated B-spline coefficients $\hat b( \phi )$, we proceed to estimate the parameter vector $\phi$ as follows:
$$ \hat\phi = \mathop{\mathrm{argmin}}_{\phi \in \mathbb{R}^{(p-1) \times d}} L_n\big( \hat b( \phi ), \phi \mid \lambda \big). \tag{20} $$
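Putting (19) and (20) together, estimation is a nested optimization: an inner spline fit for each candidate index direction and an outer search over the direction. The following schematic sketch (ours; unpenalized, single index $d = 1$, and reusing `bspline_design` from the basis sketch above) is meant only to convey the flow of the profile procedure.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(b, phi, X, Z, Y, wn):
    s = max(1.0 - phi @ phi, 0.0)
    beta = np.r_[np.sqrt(s), phi]                  # beta(phi), ||beta|| = 1
    idx = Z @ beta
    u = (idx - idx.min()) / (np.ptp(idx) + 1e-12)  # map index onto [0, 1]
    alpha = (bspline_design(u) @ b) * X            # log tail index, d = 1
    ex = Y > wn
    return np.sum(np.log(Y[ex] / wn) * np.exp(alpha[ex]) - alpha[ex])

def profile_b(phi, data, b_dim=7):                 # inner step, cf. (19)
    return minimize(neg_loglik, np.zeros(b_dim), args=(phi, *data)).x

def fit_phi(phi0, data):                           # outer step, cf. (20)
    obj = lambda phi: neg_loglik(profile_b(phi, data), phi, *data)
    return minimize(obj, phi0, method="Nelder-Mead").x
```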

2.3. Tuning Parameter Selection

The proposed estimator comprises three tuning parameters: the threshold $w_n$, the knot count $K$, and the smoothing parameter $\lambda$.
Research indicates that the fit is less sensitive to knot selection than to $\lambda$ [37], and employing evenly spaced knots with a sufficiently large fixed $K$ is often sufficient. Drawing inspiration from [18], we employ cross-validation to select the optimal $\lambda$ by minimizing the cross-validation score. For setting the threshold $w_n$, we adopt a data-driven approach proposed by [8], which determines $w_n$ through a discrepancy measure. Specifically, we define $\hat Q_i = \exp\big[ -\exp\big( \eta( \beta^\top Z_i )^\top X_i \big) \log( Y_i / w_n ) \big]$, which approximately follows a uniform distribution on $[0, 1]$. By minimizing the difference between the theoretical and empirical uniform distributions, we can identify a suitable threshold value. The discrepancy is defined as follows:
$$ D( w_n ) = \frac{ \sum_{i=1}^{n} \big| \hat Q_i - \hat F( \hat Q_i ) \big|\, I( Y_i > w_n ) }{ \sum_{i=1}^{n} I( Y_i > w_n ) }, \tag{21} $$
where $\hat F(\cdot)$ is the empirical distribution of the $\hat Q_i$ based on $\{ \hat Q_i : Y_i > w_n,\ i = 1, \ldots, n \}$. Subsequently, the threshold $w_n$ is determined by minimizing the discrepancy measure $D( w_n )$.
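The threshold search itself is straightforward to sketch. In the hypothetical helper below, `gamma_hat` stands in for the fitted tail index at each observation (in the full procedure it would be re-estimated for every candidate threshold); the rest mirrors the discrepancy (21).

```python
import numpy as np

def discrepancy(wn, Y, gamma_hat):
    ex = Y > wn
    Q = np.exp(-gamma_hat[ex] * np.log(Y[ex] / wn))   # approx. Uniform(0, 1)
    F_emp = (np.argsort(np.argsort(Q)) + 1) / Q.size  # empirical CDF at each Q
    return np.mean(np.abs(Q - F_emp))

def select_threshold(candidates, Y, gamma_hat):
    # Choose the candidate w_n minimizing D(w_n) over the exceedances.
    return min(candidates, key=lambda w: discrepancy(w, Y, gamma_hat))
```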

3. Asymptotic Theory

In this section, we delve into the asymptotic properties of the proposed estimators. The detailed proofs of these results, as well as the underlying assumptions, are relegated to Appendix A.
To fix notation, we designate $\beta^0$ as the true value of $\beta$ throughout this text. For conciseness, we write $\phi$ for the concatenation of $( \phi_1, \phi_2, \ldots, \phi_d )$, where $\phi_j$ signifies the $j$-th component, $j = 1, \ldots, (p-1) \times d$. Without loss of generality, we presume that $\phi_j \ne 0$ for $j = 1, \ldots, h-1$ and $\phi_j = 0$ for $j = h, \ldots, (p-1) \times d$. Furthermore, $\eta_k(\cdot)$ denotes nonzero varying coefficient functions for $k = 1, \ldots, m$, nonzero constants for $k = m+1, \ldots, q$, and zero for $k = q+1, \ldots, d$.
Carrying on with the previous notation, we define $J( \phi_k )$ as the $p \times (p-1)$ Jacobian matrix of partial derivatives of $\beta_k$ with respect to $\phi_k$:
$$ J( \phi_k ) = \begin{pmatrix} -\big( 1 - \| \phi_k \|^2 \big)^{-1/2}\, \phi_k^\top \\ I_{p-1} \end{pmatrix}. \tag{22} $$
To simplify the notation, let $\varrho_k = \beta_k^\top Z$. Furthermore, we define the space $\mathcal{M}$ as the set of functions with a finite $L_2$ norm on the domain $[0,1]^d \times \mathbb{R}^d$:
$$ \mathcal{M} = \Big\{ g( \varrho, x ) = \sum_{k=1}^{d} g_k( \varrho_k )\, x_k :\ E\big[ g_l( \varrho_l ) \big]^2 < \infty \Big\}, \tag{23} $$
where $\varrho = ( \varrho_1, \ldots, \varrho_d )^\top$ and $x = ( x_1, \ldots, x_d )^\top$. To investigate the large-sample characteristics of the parameter estimators, we introduce $\beta^0$ as the vector of true parameters, where $\beta^0 = \big( ( \beta_1^0 )^\top, \ldots, ( \beta_d^0 )^\top \big)^\top$ and $\beta_k^0( \phi_k^0 ) = \big( \beta_{k1}^0, ( \phi_k^0 )^\top \big)^\top$ for $1 \le k \le d$. For each $1 \le j \le p$, let $g_j^0$ be the function that satisfies the condition
$$ P( Z_j ) = g_j^0\big( \varrho( \beta^0 ), X \big) = \sum_{k=1}^{d} g_{k,j}^0\big( \varrho_k( \beta_k^0 ) \big)\, X_k = \mathop{\mathrm{arg\,min}}_{g \in \mathcal{M}}\; E\big[ Z_j - g\big( \varrho( \beta^0 ), X \big) \big]^2. \tag{24} $$
Let $P( Z ) = \big( P( Z_1 ), \ldots, P( Z_p ) \big)^\top$ and $\tilde Z = Z - P( Z )$; the gradient of the log-TI is
$$ \dot\alpha\big( X, Z, \eta^0, \beta^0( \phi^0 ) \big) = \Big\{ \dot\eta_k^0\big( \varrho_k( \beta_k^0( \phi_k^0 ) ) \big)\, X_k\, J_k^\top( \phi_k^0 )\, \tilde Z,\ 1 \le k \le d \Big\}. \tag{25} $$
For any matrix $A$, denote $A^{\otimes 2} = A A^\top$. Then define
$$ \Omega( \eta^0, \phi^0 ) = E\Big[ \dot\alpha^{\otimes 2}\big( X, Z, \eta^0, \beta^0( \phi^0 ) \big) \Big]. \tag{26} $$
Theorem 1. 
Suppose that Assumptions A1–A11 in Appendix A hold and the number of knots satisfies $K = O_p( n_0^{1/(2r+1)} )$. Then, we have
(i) $\| \hat\beta - \beta^0 \| = O_p\big( n_0^{-r/(2r+1)} + a_n + R_{w_n} \big)$;
(ii) $\| \hat\eta_k(\cdot) - \eta_k^0(\cdot) \| = O_p\big( n_0^{-r/(2r+1)} + a_n + R_{w_n} \big)$, $k = 1, \ldots, d$, where
$$ a_n = \max_{j,k} \Big\{ \dot p_{\lambda_{1j}}( | \beta_j^0 | ),\ \dot p_{\lambda_{2k}}( \| b_k^0 \|_H ),\ \dot p_{\lambda_{3k}}( \| \delta_k^0 \|_Q ) :\ \beta_j^0 \ne 0,\ b_k^0 \ne 0,\ \delta_k^0 \ne 0 \Big\}, \qquad R_{w_n} = E\left[ c_0( x, z )\, w_n^{-\gamma(x,z)} + \frac{ c_1( x, z )\, \gamma( x, z ) }{ \gamma( x, z ) + h( x, z ) }\, w_n^{-\gamma(x,z) - h(x,z)} \right]. $$
Theorem 2. 
Suppose that Assumptions A1–A11 in Appendix A hold and the number of knots satisfies $K = O_p( n_0^{1/(2r+1)} )$. Let
$$ \lambda_{\max} = \max_{j,k}\, \{ \lambda_{1j}, \lambda_{2k}, \lambda_{3k} \}, \qquad \lambda_{\min} = \min_{j,k}\, \{ \lambda_{1j}, \lambda_{2k}, \lambda_{3k} \}, $$
and suppose
$$ \lambda_{\max} \to 0, \qquad n_0^{r/(2r+1)}\, \lambda_{\min} \to \infty \quad ( n \to \infty ). $$
Then, with probability approaching one, $\hat\beta$ and $\hat\eta(\cdot)$ must satisfy:
(i) $\hat\beta_j = 0$, $j = h+1, \ldots, p \times d$;
(ii) $\hat\eta_k(\cdot)$ are nonzero constants for $k = m+1, \ldots, q$ and $\hat\eta_k(\cdot) = 0$, $k = q+1, \ldots, d$.
Let $\phi^* = ( \phi_1, \phi_2, \ldots, \phi_{h-1} )^\top$, $\beta^* = ( \beta_1, \beta_2, \ldots, \beta_h )^\top$, and $\eta^*(\cdot) = \big( \eta_1(\cdot), \eta_2(\cdot), \ldots, \eta_q(\cdot) \big)^\top$.
Theorem 3. 
Under Assumptions A1–A11 in Appendix A and the conditions of Theorem 2, we have
$$ \sqrt{n_0}\; \Omega\, \big( \hat\beta^* - \beta^{0*} + \Omega^{-1/2}\, n_0^{-1/2}\, \varepsilon \big) \xrightarrow{D} N\big( 0,\ J \Omega J^\top \big), $$
where $J = J( \phi_k^{0*} ) = \mathrm{diag}\big( J( \phi_1^{0*} ), \ldots, J( \phi_m^{0*} ) \big)$ has dimension $mp \times m(p-1)$, $J( \phi_k^{0*} )$ is defined in (22), and $\Omega = \Omega( \eta^{0*}, \phi^{0*} )$ is given in (26).
Theorems 1 and 2 establish the consistency of the variable selection process. Furthermore, Theorems 1–3 collectively demonstrate the oracle property of $\beta$: the proposed estimators achieve the optimal convergence rate and share the asymptotic distribution of estimators based on the correct submodel. Notably, Theorem 1 shows that the spline estimator $\hat\eta_k(\cdot)$ obtained from the estimation procedure in (18) is a consistent estimator of $\eta_k(\cdot)$; however, its asymptotic distribution is not available. To address this issue, we employ a two-step spline backfitted local linear (SBLL) estimation method to further refine the nonparametric function $\eta_k(\cdot)$. Without loss of generality, we focus on the estimation of the first nonparametric function $\eta_1(\cdot)$, as the other functions can be treated similarly. Using the spline estimates $\hat\eta_k(\cdot)$ for $k \ge 2$ as initial estimates, define $C_{i,1} = \sum_{k=2}^{d} \eta_k( \hat\beta_k^\top Z_i )\, X_{ki}$. Then, let $\varrho_i = \varrho_{i1} = \hat\beta_1^\top Z_i$, and for each given point $\varrho_1$, $\eta_1( \varrho_i )$ is approximated through local linear fitting as $\eta_1( \varrho_i ) = \eta_1( \varrho_1 ) + \dot\eta_1( \varrho_1 )( \varrho_i - \varrho_1 ) + O_p( h_1^2 )$, where $| \varrho_i - \varrho_1 | \le h_1$ and $h_1$ is the bandwidth. We then derive the estimator $\hat\eta_{LL,1}( \varrho_1, \hat\beta_1 )$ by minimizing the following local kernel objective function:
$$ \sum_{i=1}^{n} K_{h_1}( \varrho_i - \varrho_1 ) \Big\{ \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\big( C_{i1} + \bar G_{i1}^\top \iota \big) - C_{i1} - \bar G_{i1}^\top \iota \Big\} I( Y_i > w_n ), \tag{28} $$
where $K_{h_1}( \varrho ) = K( \varrho / h_1 ) / h_1$ for a non-negative symmetric kernel function $K$, $\bar G_{i1} = \big( X_{1i},\ X_{1i}( \varrho_i - \varrho_1 ) / h_1 \big)^\top$, and $\iota = ( \iota_0, \iota_1 )^\top = \big( \eta( \varrho_1 ), \dot\eta( \varrho_1 ) \big)^\top$. Since $\eta_k( \varrho_k )$ for $k \ge 2$ are unknown, we adapt (28) by substituting the spline estimators $\hat\eta_k( \varrho_k, \hat\beta_k )$ from (18) for $\eta_k( \varrho_k )$; this substitution is equivalent to replacing $C_{i1}$ in (28) with $\hat C_{i1}$. The resulting modified SBLL estimator is denoted $\hat\eta_{\mathrm{SBLL},1}( \varrho_1, \hat\beta_1 )$. Denote $\mu_e( K ) = \int u^e K(u)\, du$ and $v_e = \int u^e K^2(u)\, du$ for $e = 0, 1, 2, 3$, and assume that the following expressions converge in probability as the sample size $n_0 \to \infty$:
$$ \frac{n}{n_0}\, \mu_{2(e-1)}\, E\big[ R_{w_n}( X_1, \varrho_1 ) \big]\, f_U( \varrho_1 ) \xrightarrow{p} \Xi_{ee}( \varrho_1 ), \quad e = 1, 2, \tag{29} $$
$$ \frac{n}{n_0}\, \frac{h_1^2}{2}\, \ddot\eta_1( \varrho_1 )\, \mu_2\, E\Big[ R_{w_n}( X_1, \varrho_1 ) + c_0( X_1, \varrho_1 )\, w_n^{-\gamma( X_1, \varrho_1 )} \Big]\, f_U( \varrho_1 ) \xrightarrow{p} \Sigma_{11}( \varrho_1 ), \tag{30} $$
$$ \frac{n}{n_0}\, E\left[ c_0( X_1, \varrho_1 )\, w_n^{-\gamma( X_1, \varrho_1 )} + c_1( X_1, \varrho_1 )\, \frac{ \gamma^2( X_1, \varrho_1 ) + h^2( X_1, \varrho_1 ) }{ \big( \gamma( X_1, \varrho_1 ) + h( X_1, \varrho_1 ) \big)^2 }\, w_n^{-\gamma( X_1, \varrho_1 ) - h( X_1, \varrho_1 )} \right] f_Z( \varrho_1 )\, v_0 \xrightarrow{p} \Lambda_{11}( \varrho_1 ), \tag{31} $$
where $f_Z( \varrho_1 )$ is the marginal probability density function of $\varrho_1$.
Theorem 4. 
Suppose that Assumptions A1–A11 in Appendix A are satisfied and $K = O_p( n_0^{1/(2r+1)} )$. As $n_0 \to \infty$, for any $\varrho_1 \in [ h_1, 1 - h_1 ]$, we have
$$ \sqrt{n_0 h_1}\, \Big( \hat\eta_{LL,1}( \varrho_1, \hat\beta_1 ) - \eta_1( \varrho_1 ) - \Xi_{11}^{-1}( \varrho_1 )\, \Sigma_{11}( \varrho_1 ) \Big) \xrightarrow{d} N( 0, \Pi ), $$
where $\Pi = \Xi_{11}^{-1}( \varrho_1 )\, \Lambda_{11}( \varrho_1 )\, \Xi_{11}^{-1}( \varrho_1 )$, with $\Xi_{11}( \varrho_1 )$, $\Sigma_{11}( \varrho_1 )$, and $\Lambda_{11}( \varrho_1 )$ defined in (29), (30), and (31), respectively.
Next, we establish the uniform oracle efficiency of the SBLL estimator $\hat\eta_{\mathrm{SBLL},1}( \varrho_1, \hat\beta_1 )$. Specifically, Theorem 5 demonstrates that the absolute difference between $\hat\eta_{\mathrm{SBLL},1}( \varrho_1, \hat\beta_1 )$ and $\hat\eta_{LL,1}( \varrho_1, \hat\beta_1 )$ is uniformly bounded by $O_p( K^{-r} )$. Consequently, this ensures that the two estimators share the same asymptotic distribution.
Theorem 5. 
Under Assumptions A1–A11 in Appendix A and $K = O_p( n_0^{1/(2r+1)} )$, we have
$$ \sup_{\varrho_1 \in [0,1]} \big| \hat\eta_{\mathrm{SBLL},1}( \varrho_1, \hat\beta_1 ) - \hat\eta_{LL,1}( \varrho_1, \hat\beta_1 ) \big| = O_p( K^{-r} ). $$
Corollary 6. 
Under Assumptions A1–A11 in Appendix A and $K = O_p( n_0^{1/(2r+1)} )$, as $n_0 \to \infty$, we have
$$ \sqrt{n_0 h_1}\, \Big( \hat\eta_{\mathrm{SBLL},1}( \varrho_1, \hat\beta_1 ) - \eta_1( \varrho_1 ) - \Xi_{11}^{-1}( \varrho_1 )\, \Sigma_{11}( \varrho_1 ) \Big) \xrightarrow{d} N( 0, \Pi ). $$

4. Monte Carlo Studies

In this section, we evaluate the finite-sample performance of the proposed estimator through Monte Carlo simulations. Adhering to the setup outlined in [8,11], we postulate that the response variable $Y_i$ follows a specific distribution, detailed as follows:
$$ 1 - F( y; x, z ) = \frac{ ( 1 + c )\, y^{-\gamma(x,z)} }{ 1 + c\, y^{-\gamma(x,z)} } = y^{-\gamma(x,z)} \Big[ ( 1 + c ) - c( 1 + c )\, y^{-\gamma(x,z)} + O\big( y^{-2\gamma(x,z)} \big) \Big]. $$
Afterwards, let $\log( \gamma( x, z ) ) = \eta^0( \beta^{0\top} z )^\top x$ with $p = 10$ and $d = 6$, where
$$ \beta_1^0 = \tfrac{1}{\sqrt{28}}\, ( 2, 2, 6, 2, 7, 5, 0, \ldots, 0 )^\top, \qquad \beta_2^0 = \tfrac{1}{\sqrt{41}}\, ( 2, 2, 5, 3, 5, 4, 0, \ldots, 0 )^\top, \qquad \beta_3^0 = \tfrac{1}{\sqrt{19}}\, ( 2, 2, 2, 1, 7, 1, 0, \ldots, 0 )^\top, \qquad \beta_k^0 = 0,\ k > 3. $$
By adjusting the parameter $c$ of the slowly varying function, a diverse set of distributions for $y$ can be generated, thereby facilitating simulations across a range of scenarios. Specifically, the values of $c$ were chosen as $c = 0.1, 0.25, 0.5$ to illustrate different scenarios, while the sample size was varied as $n = 500, 1000, 2000$ to examine performance across varying data sizes. For the parametric components, the marginal distributions of $X$ and $Z$ were specified as $Z_i \sim U( -1.5, 1.5 )$ and $X_i \sim U( -3, 3 )$. Additionally, the true single-index functions were represented by smooth functions, defined as
$$ \eta_1( \beta_1^\top Z ) = 0.5 \sin( \beta_1^\top Z ), \qquad \eta_2( \beta_2^\top Z ) = \cos( \beta_2^\top Z ), \qquad \eta_3( \beta_3^\top Z ) = 0.5 \big( \exp( -( \beta_3^\top Z )^2 ) + 0.5 ( \beta_3^\top Z )^2 \big), \qquad \eta_k( \beta_k^\top Z ) = 0,\ k > 3. $$
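For concreteness, the response can be generated by inverse-transform sampling: setting $u = 1 - F(y)$ in the distribution above and solving gives $y = \{ ( 1 + c - cu ) / u \}^{1/\gamma(x,z)}$. A minimal sketch with placeholder index values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_y(gamma, c, rng):
    # Invert 1 - F(y) = (1 + c) y^{-gamma} / (1 + c y^{-gamma}) at u ~ U(0, 1).
    u = rng.uniform(size=np.shape(gamma))
    return ((1.0 + c - c * u) / u) ** (1.0 / gamma)

# Placeholder tail indices gamma_i = exp(0.5 sin(rho_i)); not the full design.
gamma = np.exp(0.5 * np.sin(rng.uniform(-1.5, 1.5, size=1000)))
Y = sample_y(gamma, c=0.5, rng=rng)
```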
Adopting the methodology of [18], we utilize equidistant knots with $K$ fixed at 3. To ascertain the sample fraction, as suggested by [8], we analyze a set of 100 distinct $w_n$ values (denoted $\{ w_n^{(l)} : 1 \le l \le 100 \}$) along with their corresponding sample fractions $n_0 / n$, distributed uniformly within the range $[ \tfrac{1}{3}, 1 ]$. Additionally, for each model configuration, we conduct 5000 simulation runs. The precision of estimating $\beta$, $\eta(\cdot)$, and $\gamma$ is quantified by
$$ \mathrm{MSE}_\beta = \| \hat\beta_n - \beta^0 \|^2, $$
$$ \mathrm{ASE}_\eta = \frac{1}{N d} \sum_{k=1}^{N} \sum_{j=1}^{d} \big( \eta_j( \varrho_k ) - \hat\eta_j( \varrho_k ) \big)^2, $$
$$ \mathrm{ASE}_\gamma = \frac{1}{n} \sum_{i=1}^{n} \big( \hat\gamma_i - \gamma_i \big)^2, $$
where $\{ \varrho_k, k = 1, \ldots, N \}$ signifies the regular grid points for evaluating the function $\hat\eta( \varrho )$, $\gamma_i$ is the true value of $\gamma$ as defined in (2) at the $i$-th observation, and $\hat\gamma_i$ denotes its estimator. For our simulations, $N = 500$ is adopted. The results are summarized in Table 1, where columns $C_\beta$ and $C_\eta$ give the mean numbers of correctly identified nonzero coefficients, while $IC_\beta$ and $IC_\eta$ give the mean numbers of zero coefficients incorrectly identified as nonzero. The row “VICMTIR-VS” outlines the performance of our proposed estimator with the variable selection procedure, and the row “Oracle” depicts the estimator's performance based on the true model with known zero coefficients.
With an increasing sample size, there is a concurrent decline in mean squared errors (MSEs) and standard deviations (STDs) across diverse parameter settings for c in the slowly varying function. Simultaneously, the accuracy of variable selection is enhanced, highlighting the robustness of the model estimators. Furthermore, the third column of the table exhibits the median sample fraction derived from 5000 realizations, demonstrating a declining pattern as the sample size expands, which aligns with our expectations. Additionally, as the sample size enlarges, the variable selection method progressively approaches the performance of the oracle procedure concerning model error.
Graphically, Figure 1 depicts the bias of the nonzero parameters, while Figure 2 exhibits the bias of the zero parameters. Both figures reveal that the mean bias of the estimators is close to 0, demonstrating the effectiveness of the proposed estimation approach. Furthermore, Figure 3 showcases the fitted curves of $\eta( \varrho )$ alongside the corresponding 95% pointwise confidence intervals, indicating a satisfactory fit for the nonlinear functions.
Subsequently, a comparative evaluation is undertaken between our model and alternative parametric and nonparametric models that incorporate tail indices in diverse forms. The estimation performance of these models is analyzed in scenarios encompassing both low- and high-dimensional covariates, with dimensions set to $( p, d ) = ( 6, 3 ), ( 10, 6 )$, and $( 20, 9 )$. The parameter vector $\mathrm{par}$ comprises $( \beta, \zeta )$, where $\beta$ is defined as earlier and $\zeta$ is a $d$-dimensional vector. Without loss of generality, the parameter $c$ in the slowly varying function is set to 0.5 for the simulations. For the linear setting, we adopt the method from [8], using $\alpha = \mathrm{par}^\top ( z^\top, x^\top )^\top$. For the single-index model (SIM), we consider the approach from [38], applying $\alpha = \eta\big( \mathrm{par}^\top ( z^\top, x^\top )^\top \big)$. In the fully nonparametric setting (NPM), we follow the methodology described in [39], employing kernel smoothing with the Epanechnikov kernel, assuming equal bandwidths, and utilizing $\alpha = f( z, x )$. Finally, for the general varying coefficient model (VCM), we adhere to the approach from [12], implementing $\alpha = f( z )^\top x$.
Table 2 presents the outcomes of 500 Monte Carlo simulations. In comparing the average squared errors (ASEs) among the models, we observe that for the single-index model (SIM) and our varying index coefficient model with variable selection (VICMTIR-VS), the ASEs do not rise substantially as the dimensionality of the covariates grows under different sample sizes. Conversely, in the fully nonparametric model and the general varying coefficient model, an appreciable increase in ASEs is noted as the dimensionality of the covariates increases, highlighting the curse of dimensionality in high-dimensional nonparametric estimation. The oversimplified linear model, due to its inability to capture the nonlinear effects of factors, results in significant estimation errors. However, based on the experimental findings, the VICMTIR-VS approach maintains a commendable level of model flexibility and demonstrates satisfactory estimation accuracy, even with small sample sizes and large parameter dimensions.

5. Empirical Analysis

In the assessment of extreme financial occurrences and market risks, extreme value theory (EVT) emerges as a robust methodology for quantifying high-quantile random phenomena, widely acknowledged as an efficacious statistical modeling approach. Herein, we deploy this model to gauge tail risk in financial markets. Specifically, we leverage daily trading data from the CSI 300 Index in China, spanning the period from 8 April 2010, to 1 February 2023, comprising a total of 2657 observations. To authenticate the model’s effectiveness, we allocate the initial 80% of the dataset for in-sample parameter estimation, reserving the remaining 20% for out-of-sample validation.
In considering the selection of covariates, we acknowledge the direct influence of the index's financial indicators on tail risk. Furthermore, given the phenomenon of economic globalization, tail risk is also influenced by fluctuations in international markets. Consequently, we postulate that major global market indices impact the tail risk of the CSI 300 Index by influencing its underlying financial indicators. The precise definitions and settings of each variable are outlined in Table 3. The returns for each index are calculated with the formula $\Delta U_t = 100 \times ( \ln U_t - \ln U_{t-1} )$. The descriptive statistics in Table 4 show that most covariates exhibit skewed distributions. Therefore, for standardization, following the approach outlined in [8], the variables were transformed using rank transformations. Specifically, let $R_i$ be the rank of $X_i$ in the sample $\{ X_i : 1 \le i \le n \}$; then the rank transformation redefines $X_i := \Phi^{-1}\big( ( R_i - 3/8 ) / ( n + 1/4 ) \big)$ (the normal score transformation).
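The transformation is one line in code; the sketch below (ours) uses the same Blom-type constants.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_score(x):
    # X_i := Phi^{-1}((R_i - 3/8) / (n + 1/4)), with R_i the rank of X_i.
    r = rankdata(x)
    return norm.ppf((r - 0.375) / (len(x) + 0.25))
```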
First, the model is implemented on the training dataset, leveraging the threshold selection approach outlined previously to identify the effective sample for model estimation, which comprises roughly 20% of the total. Following this, the parameter estimations are presented in Table 5. Figure 4 illustrates the estimated varying coefficient functions, revealing a noteworthy nonlinear association between internal variables and tail risk. Notably, the influence of internal variables exhibits a distinct interplay with international indices. Based on the estimation outcomes, the trading volume and turnover rate exhibit primarily negative effects. As these metrics increase, the tail index diminishes, thereby intensifying tail risk. Conversely, the trading value and P/BV ratio predominantly have positive impacts. When these values rise, the tail index augments, leading to a mitigation of tail risk.
After estimating the parameters, we derive the tail index. To evaluate the goodness-of-fit of the model, we employ the QQ-plot methodology [40], constructing a plot of the pairs $( \hat U_t, \hat F_n( \hat U_t ) )$ for $Y_t > w_n$. Here, $\hat U_t$ is defined as $\exp\{ -\exp( \alpha( x_t, z_t ) ) \log( Y_t / w_n ) \}$ and $\hat F_n(\cdot)$ represents the empirical distribution of $\{ \hat U_t \}$. Ideally, if $w_n$ is large enough and the model fits well, $\hat U_t$ should follow a uniform distribution on $[0, 1]$. As depicted in Figure 5, the close alignment between the 45-degree reference line (solid line) and the QQ-plot (dashed line) suggests a robust fit of our VICM-TIR model. The tail index of the log-return distribution, shown in the right panel of Figure 6, reveals smaller indices, indicating heavier tails and a higher likelihood of extreme losses. Notably, the estimation results indicate that tail risk had already emerged before the turbulence observed in the Chinese stock market on 19 June 2015.
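The diagnostic itself reduces to a few lines; the following sketch (ours, with `alpha_hat` denoting the fitted log tail index at each observation) returns the two coordinates of the QQ-plot.

```python
import numpy as np

def uniform_qq(Y, alpha_hat, wn):
    ex = Y > wn
    U = np.sort(np.exp(-np.exp(alpha_hat[ex]) * np.log(Y[ex] / wn)))
    theo = (np.arange(1, U.size + 1) - 0.5) / U.size   # uniform quantiles
    return theo, U   # a good fit tracks the 45-degree line
```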

6. Discussion

This study reveals that investigating extreme events allows for a deeper understanding of their underlying causes, complexity, and severity. To address the limitations of traditional semiparametric models in estimating complex covariates, this research proposes a novel tail index regression model based on the varying index coefficient model (VICM-TIR). This model significantly enhances the flexibility and practicality of extreme event modeling by accurately capturing nonlinear interaction effects among covariates. Additionally, the study incorporates a variable selection mechanism and rigorously demonstrates the consistency and oracle properties of the estimators. Monte Carlo simulation experiments validate the finite sample properties of the model’s parameter estimates. Finally, the model’s effectiveness in practical applications is illustrated through an analysis of tail risk in financial markets. This research not only provides valuable insights into the understanding of extreme events but also offers a practical analytical tool for decision-makers and researchers across various fields.
Despite the strong potential of the VICM-TIR model, future research needs to address several key issues. Firstly, the current study primarily considers independently and identically distributed observations; future research should extend to broader scenarios, particularly incorporating lag effects of variables. Secondly, the estimation problems and theoretical properties under high-dimensional and ultra-high-dimensional covariate settings require further in-depth investigation. Finally, due to the uncertainty of the tail index, the generalized Pareto distribution (GPD) or generalized extreme value distribution (EVD) is considered for modeling tail behavior. Studies by Chavez-Demoulin and Davison [41] and Youngman [42] show that the GPD, developed through semiparametric regression, fits various data patterns effectively. Therefore, future research should extend the current methodology to include the GPD, enhancing the model’s robustness and real-world applicability.
Overall, the exploration of the causes of extreme events is a fascinating research area that deepens our understanding of the challenges posed by complex data and enhances model interpretability. By thoroughly investigating extreme events and their core causes, we can better reveal the complexity and severity of these events, promote the development of new models and methods, and provide more valuable references for policymakers and decision-makers.

Author Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by H.A. and B.T. The first draft of the manuscript was written by H.A. and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The authors did not receive support from any organization for the submitted work.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, [Boping Tian], upon reasonable request.

Conflicts of Interest

The authors report that there are no competing interests to declare.

Appendix A. Proofs

In order to establish the consistency and asymptotic normality of the proposed estimators, the following assumptions are required. First, denote $C^{(r)}[0,1] = \{ \psi \mid \psi^{(r)} \in C[0,1] \}$ as the space of $r$-th order smooth functions. Additionally, let $C^{0,1}( \mathcal{X}_\omega )$ represent the space of Lipschitz continuous functions on $\mathcal{X}_\omega$, i.e.,
$$ C^{0,1}( \mathcal{X}_\omega ) = \left\{ \psi :\ \| \psi \|_{0,1} = \sup_{\omega \ne \omega',\ \omega, \omega' \in \mathcal{X}_\omega} \frac{ | \psi( \omega ) - \psi( \omega' ) | }{ | \omega - \omega' | } < \infty \right\}, $$
where $\| \psi \|_{0,1}$ denotes the $C^{0,1}$-norm of $\psi$.
Assumption A1. 
The observations $( Y_i, X_i, Z_i )$ are assumed to be independently and identically distributed. For $\beta$ in a neighborhood of $\beta^0$, the marginal density $f_\varrho(\cdot)$ of the random variable $\varrho = \beta^\top Z$ is bounded away from zero on $\mathcal{Z}_w$ and $f_\varrho(\cdot) \in C^{0,1}( \mathcal{Z}_w )$, where $\mathcal{Z}_w = \{ \beta^\top Z :\ Z \in \mathcal{Z} \}$ and $\mathcal{Z}$ is a compact support set of $Z$. Without loss of generality, we assume $\mathcal{Z}_w = [0, 1]$.
Assumption A2. 
The parameter space of $\beta$ is compact, and the function $\alpha( v; \beta, \eta )$ is $t_2$-times continuously differentiable on the parameter space of $\beta$ for each fixed $v = ( x, z )$, with spline order $r \le t_2$. The true parameter vector $\beta^0$ is an interior point of the parameter space of $\beta$.
Assumption A3. 
Let $F( z ) = E[ X X^\top \mid Z = z ]$. Then, $F( z )$ is $r$-times continuously differentiable with respect to $z$. Furthermore, for a given $z$, $F( z )$ is a positive definite matrix, and the eigenvalues of $F( z )$ are bounded. Moreover, the matrices $\Xi_{ee}( \varrho )$, $e = 1, 2$, defined in (29) are nonsingular, while $\Lambda_{11}( \varrho )$ defined in (31) is positive definite.
Assumption A4. 
For $1 \le k \le d$ and $1 \le l \le p$, $g_{l,k}^0 \in C^{(1)}[0,1]$.
Assumption A5. 
The kernel function $K( \varrho )$ is a symmetric and continuous probability density function satisfying $\int_{\mathcal{Z}_w} K( \varrho )\, d\varrho = 1$, $\int_{\mathcal{Z}_w} \varrho^2 K( \varrho )\, d\varrho < \infty$, and $| \varrho |\, K( \varrho ) \to 0$ as $| \varrho | \to \infty$.
Assumption A6. 
The bandwidth $h_n$ satisfies $h_n \to 0$, $n_0 h_n^2 \to 0$, $n / ( n_0^2 h_n ) \to 0$, $n h_n^2 / n_0 \to 0$, and $n_0 h_n / \log( n_0 ) \to \infty$ as the sample size approaches infinity.
Assumption A7. 
Denote
$$ R = -\frac{n}{\sqrt{n_0}}\; E\left[ \tilde\alpha^{(1)}( v_i, \beta^0, \eta^0 )\, \frac{ c_1( v_i )\, h( v_i ) }{ \gamma( v_i ) + h( v_i ) }\, w_n^{-\gamma( v_i ) - h( v_i )} \right], $$
and assume that, as $n_0 \to \infty$, $\Omega^{-1/2} R \to \varepsilon$ for some nonzero constant vector $\varepsilon$.
Assumption A8. 
The term $o( y^{-h(x,z)} )$ in the slowly varying function $V( y; x, z )$ satisfies, as $y \to \infty$,
$$ \sup_{x \in \mathcal{X},\, z \in \mathcal{Z}} \big| y^{h(x,z)}\, o\big( y^{-h(x,z)} \big) \big| \to 0. $$
Assumption A9. 
Assume that
$$ \liminf_{n \to \infty}\; \liminf_{\phi_j \to 0^+}\; \lambda_{1j}^{-1}\, \dot p_{\lambda_{1j}}( \phi_j ) > 0, \qquad \liminf_{n \to \infty}\; \liminf_{\| b_k \|_H \to 0}\; \lambda_{2k}^{-1}\, \dot p_{\lambda_{2k}}( \| b_k \|_H ) > 0, \qquad \liminf_{n \to \infty}\; \liminf_{\| \delta_s \|_Q \to 0}\; \lambda_{3s}^{-1}\, \dot p_{\lambda_{3s}}( \| \delta_s \|_Q ) > 0, $$
where $j = h, \ldots, (p-1) \times d$, $k = m+1, \ldots, q$, and $s = q+1, \ldots, d$.
Assumption A10. 
Let
$$ b_n = \max_{j,k} \Big\{ \ddot p_{\lambda_{1j}}( | \phi_j^0 | ),\ \ddot p_{\lambda_{2k}}( \| b_k^0 \|_H ),\ \ddot p_{\lambda_{3k}}( \| \delta_k^0 \|_Q ) :\ \phi_j^0 \ne 0,\ b_k^0 \ne 0,\ \delta_k^0 \ne 0 \Big\}. $$
Then, $b_n \to 0$ as $n_0 \to \infty$.
Assumption A11. 
For any given nonzero $\omega$, we have
$$ \lim_{n \to \infty} n^{1/2}\, \dot p_\lambda( \omega ) = 0, \qquad \lim_{n \to \infty} n^{r/(2r+1)}\, \ddot p_\lambda( \omega ) = 0. $$
Satisfying these regularity conditions is fundamental to guaranteeing the asymptotic properties of our estimators. Firstly, Assumption A1 [43] is a common and standard prerequisite in the single-index model framework. Secondly, Assumption A2 [37] is pivotal in ensuring the existence and uniqueness of the estimator β ^ . Moreover, Assumptions A3–A7 are essential in establishing asymptotic normality, which lays the groundwork for understanding the estimator’s properties as the sample size tends to infinity. Notably, Assumptions A5 and A6, introduced by [11], are utilized to establish the asymptotic normality of nonparametric functions, thereby ensuring the distributional characteristics of the estimates in large-sample settings. Furthermore, Assumptions A7 and A8 [8] regulate the extreme behavior of slowly varying functions, with Assumption A7 implicitly indicating the optimal convergence rate of w n . Lastly, Assumptions A9, A10 [28], and A11 [44] precisely define the requirements for the penalty function, introducing appropriate penalties to promote sparsity in the estimation process.
Let $L_n( \phi, \eta ) = \Phi_n( \phi ) + P_n( \phi )$, where
$$ \Phi_n( \phi ) := \sum_{i=1}^{n} \Big\{ \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\big( \eta( \phi^\top z_i )^\top x_i \big) - \eta( \phi^\top z_i )^\top x_i \Big\} I( Y_i > w_n ), $$
$$ P_n( \phi ) := n_0 \sum_{j=1}^{(p-1) \times d} p_{\lambda_{1j}}( | \phi_j | ) + n_0 \sum_{k=1}^{d} p_{\lambda_{2k}}( \| b_k( \phi ) \|_H ) + n_0 \sum_{k=1}^{d} p_{\lambda_{3k}}( \| \delta_k( \phi ) \|_Q ). $$
To simplify the notation, we define
$$ \ell_i^0 = \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\big( \alpha^0( v_i; \phi^{0*}, \eta^{0*} ) \big)\, I( Y_i > w_n ) $$
and
$$ \hat\ell_i = \log\!\Big( \frac{Y_i}{w_n} \Big) \exp\big( \hat\alpha( v_i; \hat\phi^*, \hat b^* ) \big)\, I( Y_i > w_n ), $$
where $\hat\alpha( v_i; \hat\phi^*, \hat b^* )$ is defined in (17).
Lemma A1. 
Suppose that Assumptions A1–A10 hold and the number of knots satisfies $K = O_p( n_0^{1/(2r+1)} )$. Then, we have
$$ \frac{1}{n_0} \sum_{i=1}^{n} \ell_i^0\, \varpi_i^* \varpi_i^{*\top} - H_n^\top E_n^{-1} H_n \xrightarrow{P} \Omega, $$
where
$$ E_n = \frac{1}{n_0} \sum_{i=1}^{n} \ell_i^0\, G_i( \phi^{0*} )\, G_i( \phi^{0*} )^\top, \qquad H_n = \frac{1}{n_0} \sum_{i=1}^{n} \ell_i^0\, G_i( \phi^{0*} )\, \varpi_i^{*\top}, $$
$\varpi_i^* = \alpha^{(1)}( v_i; \phi^{0*}, \eta^{0*} )$, and '$\xrightarrow{P}$' denotes convergence in probability.
Proof of Lemma A1. 
Let
$$ \tilde\varpi_i^* = \varpi_i^* - H_n^\top E_n^{-1} G_i^*( \phi^{0*} ), \qquad G^* = \Big( \sqrt{\ell_1^0}\, G_1^*( \phi^{0*} ), \ldots, \sqrt{\ell_n^0}\, G_n^*( \phi^{0*} ) \Big)^\top, $$
and
$$ \varpi^* = \Big( \sqrt{\ell_1^0}\, \varpi_1^*, \ldots, \sqrt{\ell_n^0}\, \varpi_n^* \Big)^\top, \qquad \varpi^* = \varpi^* - \Gamma_n + \Gamma_n =: \Delta_n + \Gamma_n, $$
where
$$ \Gamma_n = \Big( H_1( \phi^{0*} )^\top E_1( \phi^{0*} )^{-1} \sqrt{\ell_1^0}\, G_1( \phi^{0*} ), \ldots, H_n( \phi^{0*} )^\top E_n( \phi^{0*} )^{-1} \sqrt{\ell_n^0}\, G_n( \phi^{0*} ) \Big)^\top. $$
Then, a simple calculation yields
$$ \frac{1}{n_0} \sum_{i=1}^{n} \ell_i^0\, \tilde\varpi_i^* \tilde\varpi_i^{*\top} = n_0^{-1}\, \varpi^{*\top} ( I - P )^\top ( I - P )\, \varpi^* = n_0^{-1} \Big\{ \Delta_n^\top \Delta_n + \Gamma_n^\top ( I - P )^\top ( I - P ) \Gamma_n + \Delta_n^\top ( I - P )^\top ( I - P ) \Gamma_n + \Gamma_n^\top ( I - P )^\top ( I - P ) \Delta_n - \Delta_n^\top P^\top P \Delta_n \Big\}, $$
where $P = G^* ( G^{*\top} G^* )^{-1} G^{*\top}$. To streamline the expression, we write $D_k( \varrho )$ for the difference between $\eta_k( \varrho )$ and $B( \varrho )^\top b_k^0$. According to [45] (Corollary 6.21), if the functions $\eta_k( \varrho )$ for $k = 1, \ldots, p$ satisfy Assumption A2, then there exists a positive constant $C$ such that
$$ \sup_\varrho | D_k( \varrho ) | \le C K^{-r}, \qquad \sup_\varrho | \dot D_k( \varrho ) | = \sup_\varrho \big| \dot\eta_k( \varrho ) - \dot B( \varrho )^\top b_k^0 \big| \le C K^{-r+1}. $$
Therefore, a matrix $M$ can be determined that satisfies $\| \Gamma_n - G^* M \| = O_p( n_0^{1/2} K^{-r} )$. Additionally, taking into account the projection matrix $P$, we derive
$$ \| ( I - P ) \Gamma_n \| = \| \Gamma_n - G^* M + G^* M - P \Gamma_n \| \le 2\, \| \Gamma_n - G^* M \| = O_p( n_0^{1/2} K^{-r} ). $$
Moreover, by direct calculation, $E( G^{*\top} \Delta_n \mid u_1, \ldots, u_n ) = 0$. Then, we have
$$ E( G^{*\top} \Delta_n ) = 0, \qquad E\big( \| G^{*\top} \Delta_n \|^2 \big) = E\Big( \sum_{i=1}^{n} \| G_i^* \Delta_{ni} \|^2 \Big) = O_p( n_0 K ), $$
where $\Delta_{ni}$ is the $i$-th row of $\Delta_n$. Hence, $\| G^{*\top} \Delta_n \| = O_p( n_0^{1/2} K^{1/2} )$. After that, we have
$$ \| P \Delta_n \|^2 \le \| G^{*\top} \Delta_n \|\, \big\| ( G^{*\top} G^* )^{-1} \big\|\, \| G^{*\top} \Delta_n \| = O_p( n_0^{1/2} K^{1/2} )\, O_p( n_0^{-1} K )\, O_p( n_0^{1/2} K^{1/2} ) = O_p( K^2 ), $$
so $\| P \Delta_n \| = O_p( K )$, and
$$ \| ( I - P ) \Delta_n \| = O_p( n_0^{1/2} ). $$
Consequently, by referencing Equations (A18)–(A21), all terms on the right-hand side of (A16) except the first are of order $o_p(1)$. Additionally, utilizing the law of large numbers, we deduce that the first term converges to $\Omega$ in probability, thereby validating the intended conclusion. □
Let $\tfrac{1}{\sqrt{n_0}}\, \Omega^{-1/2}\, \tilde\Phi_n( \phi^{0*} ) = \tfrac{1}{\sqrt{n_0}}\, \Omega^{-1/2} \sum_{i=1}^{n} \tilde\varpi_i^* \big( \ell_i^0 - I( Y_i > w_n ) \big)$; we will now show that $\tfrac{1}{\sqrt{n_0}}\, \Omega^{-1/2}\, \tilde\Phi_n( \phi^{0*} )$ is asymptotically normal with bias $\varepsilon$.
Lemma A2. 
Assume Assumptions A1–A10 hold; then we have
$$ \frac{1}{\sqrt{n_0}}\, \Omega^{-1/2}\, \tilde\Phi_n( \phi^{0*} ) = \frac{1}{\sqrt{n_0}}\, \Omega^{-1/2} \sum_{i=1}^{n} \tilde\varpi_i^* \big( \ell_i^0 - I( Y_i > w_n ) \big) \xrightarrow{d} N( \varepsilon, I ), $$
where '$\xrightarrow{d}$' stands for convergence in distribution.
Proof of Lemma A2. 
The asymptotic normality of $\tilde\Phi_n( \phi^{0*} )$ is derived in two stages, owing to its composition as a sum of $n$ independent and identically distributed random variables. Initially, it is demonstrated that the expectation $E\big[ \tfrac{1}{\sqrt{n_0}} \Omega^{-1/2} \tilde\Phi_n( \phi^{0*} ) \big]$ converges to $\varepsilon$. Subsequently, the covariance $\mathrm{cov}\big[ \tfrac{1}{\sqrt{n_0}} \Omega^{-1/2} \tilde\Phi_n( \phi^{0*} ) \big]$ tends to the identity matrix $I$. As a result, applying the Central Limit Theorem, we establish that $\tfrac{1}{\sqrt{n_0}} \Omega^{-1/2} \tilde\Phi_n( \phi^{0*} )$ converges in distribution to a normal distribution with mean $\varepsilon$ and variance–covariance matrix $I$. To simplify the notation, let $v_i = ( x_i, z_i )$ and define
$$ Q_{n1} = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\Big[ \tilde\varpi_i^*\, \exp\big( \alpha( v_i, \phi^{0*} ) \big) \log( Y_i / w_n )\, I( Y_i > w_n ) \Big], \qquad Q_{n2} = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\big[ \tilde\varpi_i^*\, I( Y_i > w_n ) \big]. $$
Then, $E\big[ \tfrac{1}{\sqrt{n_0}} \Omega^{-1/2} \tilde\Phi_n( \phi^{0*} ) \big] = Q_{n1} - Q_{n2}$. Subsequently, by Assumption A8, $Q_{n1}$ can be written as
$$ \begin{aligned} Q_{n1} & = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\Big[ \tilde\varpi_i^*\, \exp\big( \alpha( v_i, \phi^{0*} ) \big) \int_0^\infty P\big( \log( Y_i / w_n ) > t \big)\, dt \Big] \\ & = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\Big[ \tilde\varpi_i^*\, \exp\big( \alpha( v_i, \phi^{0*} ) \big) \int_0^\infty w_n^{-\gamma( v_i )}\, e^{-\gamma( v_i )\, t}\, V( w_n e^t, v_i )\, dt \Big] \\ & = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\Big[ \tilde\varpi_i^*\, \exp\big( \alpha( v_i, \phi^{0*} ) \big)\, w_n^{-\gamma( v_i )} \int_0^\infty e^{-\gamma( v_i )\, t} \big\{ c_0( v_i ) + c_1( v_i )\, w_n^{-h( v_i )}\, e^{-h( v_i )\, t} \big\}\, dt \Big] \times \{ 1 + o(1) \} \\ & = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2} \left\{ E\Big[ \tilde\varpi_i^*\, c_0( v_i )\, w_n^{-\gamma( v_i )} \int_0^\infty \gamma( v_i )\, e^{-\gamma( v_i )\, t}\, dt \Big] + E\Big[ \tilde\varpi_i^*\, c_1( v_i )\, w_n^{-\gamma( v_i ) - h( v_i )} \int_0^\infty \gamma( v_i )\, e^{-t \{ \gamma( v_i ) + h( v_i ) \}}\, dt \Big] \right\} \times \{ 1 + o(1) \} \\ & = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2} \left\{ E\Big[ \tilde\varpi_i^*\, c_0( v_i )\, w_n^{-\gamma( v_i )} \Big] + E\Big[ \tilde\varpi_i^*\, \frac{ c_1( v_i )\, \gamma( v_i ) }{ \gamma( v_i ) + h( v_i ) }\, w_n^{-\gamma( v_i ) - h( v_i )} \Big] \right\} \times \{ 1 + o(1) \}. \end{aligned} $$
Analogously, we have
$$ Q_{n2} = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\big[ \tilde\varpi_i^*\, w_n^{-\gamma( v_i )}\, V( w_n, v_i ) \big] = \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\big[ \tilde\varpi_i^*\, c_0( v_i )\, w_n^{-\gamma( v_i )} \big] + \frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\big[ \tilde\varpi_i^*\, c_1( v_i )\, w_n^{-\gamma( v_i ) - h( v_i )} \big] + o(1). $$
Assumption A7, together with (A24) and (A25), implies that
$$ Q_{n1} - Q_{n2} = -\frac{n}{\sqrt{n_0}}\, \Omega^{-1/2}\, E\Big[ \tilde\varpi_i^*\, \frac{ c_1( v_i )\, h( v_i ) }{ \gamma( v_i ) + h( v_i ) }\, w_n^{-\gamma( v_i ) - h( v_i )} \Big] + o(1) \to \varepsilon. $$
This completes the proof of the first step.
Step 2. We evaluate $\mathrm{cov}\big\{ \tfrac{1}{\sqrt{n_0}} \Omega^{-1/2} \tilde\Phi_n( \phi^{0*} ) \big\}$, which is given by
$$ \begin{aligned} \mathrm{cov}\Big\{ \tfrac{1}{\sqrt{n_0}}\, \Omega^{-1/2}\, \tilde\Phi_n( \phi^{0*} ) \Big\} & = \frac{n}{n_0}\, \Omega^{-1}\, \mathrm{cov}\Big[ \big( \exp( \alpha( v_i, \phi^{0*} ) ) \log( Y_i / w_n ) - 1 \big)\, \tilde\varpi_i^*\, I( Y_i > w_n ) \Big] \\ & = \frac{n}{n_0}\, \Omega^{-1}\, E\Big[ \big( \exp( \alpha( v_i, \phi^{0*} ) ) \log( Y_i / w_n ) - 1 \big)^2\, \tilde\varpi_i^* \tilde\varpi_i^{*\top}\, I( Y_i > w_n ) \Big] - \frac{1}{n}\, \varepsilon_n \varepsilon_n^\top, \end{aligned} $$
where $\varepsilon_n = Q_{n1} - Q_{n2}$. Based on the outcome of Step 1, it follows that $\varepsilon_n \to \varepsilon$. This implies that $n^{-1} \varepsilon_n \varepsilon_n^\top \to 0$. Accordingly,
$$ \mathrm{cov}\Big\{ \tfrac{1}{\sqrt{n_0}}\, \Omega^{-1/2}\, \tilde\Phi_n( \phi^{0*} ) \Big\} = \frac{n}{n_0}\, P( Y_i > w_n )\; \Omega^{-1}\, E\Big[ \big( \exp( \alpha( v_i, \phi^{0*} ) ) \log( Y_i / w_n ) - 1 \big)^2\, \tilde\varpi_i^* \tilde\varpi_i^{*\top} \,\Big|\, Y_i > w_n \Big] + o(1). $$
As illustrated by [8], $\frac{n}{n_0}\, P( Y_i > w_n )$ converges to 1. Furthermore, for sufficiently large $w_n$, conditional on $Y_i > w_n$, the quantity $\exp( \alpha( v_i, \phi^{0*} ) ) \log( Y_i / w_n )$ approximately follows a standard exponential distribution. Additionally, utilizing methods analogous to those in Step 1 and considering Assumption A7, the expectation in the display satisfies $E\big[ \tilde\varpi_i^* \tilde\varpi_i^{*\top} \mid Y_i > w_n \big] + o(1) \to \Omega$ as $w_n \to \infty$. This completes the proof. □
Proof of Theorem 1. 
(i) Define $\tau = n_0^{-r/(2r+1)} + a_n + R_{w_n}$ and set $\phi = \phi^0 + \tau \rho$. Then, the objective function $L_n( \phi )$ can be written as
$$ L_n( \phi ) = \sum_{i=1}^{n} \Big[ \log( Y_i / w_n ) \exp\big( \alpha( v_i, \phi ) \big) - \alpha( v_i, \phi ) \Big] I( Y_i > w_n ) + P_n( \phi ). $$
In addition, let the minimizer of the above objective function be $\hat\phi$. The Hessian matrix of $L_n( \phi )$ is
$$ \ddot L_n( \phi ) = \sum_{i=1}^{n} \Big[ \log( Y_i / w_n )\, \dot\alpha( v_i, \phi )^{\otimes 2} \exp\big( \alpha( v_i, \phi ) \big) + \ddot\alpha( v_i, \phi ) \big( \log( Y_i / w_n ) \exp( \alpha( v_i, \phi ) ) - 1 \big) \Big] I( Y_i > w_n ) + \ddot P_n( \phi ), $$
which is a positive definite matrix for any $\phi \in \mathbb{R}^{(p-1) \times d}$. Since $L_n( \phi )$ is a strictly convex function of $\phi$, if $L_n( \phi )$ possesses at least one local minimizer whose distance from $\phi^0$ is of order $O_p( \tau )$, then this local minimizer necessarily corresponds to the global minimizer; that is, the difference between the estimated parameter $\hat\phi$ and the true parameter $\phi^0$ satisfies $\hat\phi - \phi^0 = O_p( \tau )$. To analyze this further, consider the difference $L_n( \phi ) - L_n( \phi^0 )$. Applying a second-order Taylor expansion, we obtain
$$ n_0^{-1/2} \big[ L_n( \phi ) - L_n( \phi^0 ) \big] = n_0^{-1/2}\, \tau\, \rho^\top \dot L_n( \phi^0 ) + \tau^2\, \rho^\top n_0^{-1} \ddot L_n( \phi^0 )\, \rho / 2 + o_p(1). $$
Furthermore, applying Lemma A2 and Slutsky's theorem, we obtain
$$ n_0^{-1/2}\, \dot L_n( \phi^0 ) = n_0^{-1/2}\, \dot\Phi_n( \phi^0 ) + n_0^{-1/2}\, \dot P_n \xrightarrow{d} N\big( R + n_0^{-1/2}\, \dot P_n,\ \Omega\, ( \Omega + \ddot P_n )^{-2} \big) $$
and
$$ n_0^{-1}\, \ddot L_n( \phi^0 ) \xrightarrow{p} \Omega + \ddot P_n. $$
Hence, provided that the constant $C$ is adequately large, the quadratic term $\tfrac{1}{2} \rho^\top n_0^{-1} \ddot L_n( \phi^0 ) \rho$ dominates the linear term $n_0^{-1/2} \rho^\top \dot L_n( \phi^0 )$ with overwhelming probability. In particular, for any positive $\epsilon$, there exists a threshold $C$ such that
$$ \liminf_{n \to \infty} P\Big( \inf_{\| \rho \| = C} L_n( \phi ) > L_n( \phi^0 ) \Big) \ge 1 - \epsilon. $$
This implies that $L_n( \phi )$ has at least one local minimizer of order $O_p( \tau )$ (see [28]).
(ii) For given ϕ ^ , Let
b ^ = b ( ϕ ^ ) = b 0 ( ϕ ^ ) + δ ρ b .
Subsequently, we will establish that for any positive ϵ , there exists a sufficiently large constant C satisfying the following condition:
P inf ρ b = C L n ( b ) > L n ( b 0 ) 1 ϵ ,
where b 0 is the true value of b . Follows previous notation, let L n ( b ) = Φ ( b ) + P n ( b ) , where
Φ n ( b ) : = i = 1 n log ( Y i w n ) exp ( G i ( ϕ ) b ) ( G i ( ϕ ) b ) I ( Y i > w n ) , P n ( b ) : = n 0 j = 1 ( p 1 ) × d p λ 1 j ( ϕ j ) + n 0 k = 1 d p λ 2 k ( b k ( ϕ ) H ) + n 0 k = 1 d p λ 3 k ( δ k ( ϕ ) Q ) .
Furthermore, let us define T k ( ρ b ) = K 1 L n ( b ) L n ( b 0 ) . By utilizing the Taylor expansion and a straightforward calculation, we obtain
T k ( ρ b ) = 1 K L n ( b 0 + δ ρ b ) L n ( b 0 ) 1 K δ ρ b Φ ˙ n ( b 0 ) + 1 2 K ρ b Φ ¨ n ( b 0 ) ρ b δ 2 + o p ( 1 ) + n 0 K k = 1 d p λ 2 k ( b k H ) p λ 2 k ( b k 0 H ) + n 0 K k = 1 d p λ 3 k ( δ k H ) p λ 3 k ( δ k 0 H ) = S 1 + S 2 + S 3 + S 4 + o p ( 1 ) .
Continue to use the previous symbol
Φ ˙ n ( b 0 ) = i = 1 n α ^ ˙ ( v i , b 0 ) ( log ( Y i w n ) exp ( α ^ ( v i , b 0 ) ) 1 ) I ( Y i > w n ) = i = 1 n α ^ ˙ ( v i ; b 0 ) i 0 I ( Y i > w n ) + i = 1 n α ^ ˙ ( v i ; b 0 ) ^ i i 0 : = E 1 + E 2 .
where α ^ is defined in (17). By the Lemma A2, we know that E 1 = O p ( n 0 R ) . As (A17),
E 2 = i = 1 n α ^ ( 1 ) ( v i , b 0 ) { ^ i i 0 } = i = 1 n α ^ ( 1 ) ( v i , b 0 ) { i 0 ( α ^ ( v i , b 0 ) α ( v i , b 0 , η 0 ) ) } = O p ( n 0 K r ) .
Then,
S 1 = O p ( τ K [ n 0 R + n 0 K r ] ) ρ b .
Similarly,
Φ ¨ n ( b 0 ) = i = 1 n α ^ ( 1 ) ( v i , b 0 ) { ( log ( Y i w n ) exp ( α ^ ( v i , b 0 ) ) ) α ^ ( 1 ) ( v i , b 0 ) } I ( Y i > w n ) + i = 1 n α ^ ( 2 ) ( v i , b 0 ) { ( log ( Y i w n ) exp ( α ^ ( v i , b 0 ) ) 1 ) } I ( Y i > w n ) = i = 1 n α ^ ( 1 ) ( v i , b 0 ) ^ i α ^ ( 1 ) ( v i , b 0 ) i = 1 n α ( 1 ) ( v i , η 0 ) i 0 α ( 1 ) ( v i , η 0 ) + i = 1 n α ^ ( 2 ) ( v i , b 0 ) { ^ i I ( Y i > w n ) } i = 1 n α ( 2 ) ( v i , η 0 ) { i 0 I ( Y i > w n ) } + i = 1 n α ( 1 ) ( v i , η 0 ) i 0 α ( 1 ) ( v i , η 0 ) + i = 1 n α ( 2 ) ( v i , η 0 ) { i 0 I ( Y i > w n ) } : = E 4 + E 5 + E 6 .
Since
$$\begin{aligned}E_4&=\sum_{i=1}^{n}\hat{\alpha}^{(1)}(v_i,b_0)\,\hat{\ell}_i\,\hat{\alpha}^{(1)}(v_i,b_0)^{\top}-\sum_{i=1}^{n}\alpha^{(1)}(v_i,\eta_0)\,\ell_{i0}\,\alpha^{(1)}(v_i,\eta_0)^{\top}\\&=\sum_{i=1}^{n}\Big[\big(\hat{\alpha}^{(1)}(v_i;b_0)-\alpha^{(1)}(v_i,\eta_0)\big)\hat{\ell}_i\,\hat{\alpha}^{(1)}(v_i,b_0)^{\top}+\alpha^{(1)}(v_i,\eta_0)\big(\hat{\ell}_i-\ell_{i0}\big)\hat{\alpha}^{(1)}(v_i,b_0)^{\top}+\alpha^{(1)}(v_i,\eta_0)\,\ell_{i0}\big(\hat{\alpha}^{(1)}(v_i;b_0)-\alpha^{(1)}(v_i,\eta_0)\big)^{\top}\Big]\\&=n_0\,O_p(K^{-r}),\end{aligned}$$
and
$$E_5=\sum_{i=1}^{n}\hat{\alpha}^{(2)}(v_i,b_0)\big\{\hat{\ell}_i-I(Y_i>w_n)\big\}-\sum_{i=1}^{n}\alpha^{(2)}(v_i,\eta_0)\big\{\ell_{i0}-I(Y_i>w_n)\big\}=n_0\,O_p\big(K^{-r+2}\big),$$
it follows from Lemma A2 and Assumption A7 that
$$S_2=\frac{1}{2}\,\tau^{2}\,O_p\big(n_0\big[R^{2}+K^{-r+2}\big]\big)\|\rho_b\|^{2}.$$
Furthermore, invoking $p_{\lambda}(0)=0$ and the standard Taylor-expansion argument,
$$S_3\leq\frac{n_0}{K}\sum_{k=1}^{d}\Big[\tau\,\dot{p}_{\lambda_{2k}}(\|b_{k0}\|)\,\mathrm{sgn}(\|b_{k0}\|)\,\|\rho_{b_k}\|+\tau^{2}\,\ddot{p}_{\lambda_{2k}}(\|b_{k0}\|)\,\|\rho_{b_k}\|^{2}\{1+o(1)\}\Big]\leq s_1K^{-1}n_0\,\tau\,a_n\|\rho_{b_k}\|+n_0K^{-1}\tau^{2}\,b_n\|\rho_{b_k}\|^{2},$$
where $b_n$ is defined in (A4). Hence, by selecting a sufficiently large $C$, $S_2$ uniformly dominates $S_1$ and $S_3$ over $\|\rho_b\|=C$. By similar reasoning, $S_4$ is also uniformly dominated by $S_2$ over $\|\rho_b\|=C$. Consequently, with an adequate choice of $C$, condition (A36) is satisfied. Therefore, there exists a local minimizer $\hat{b}$ such that
$$\|\hat{b}-b_0\|=O_p(\tau).$$
Note that
$$\begin{aligned}\|\hat{\eta}_k(u)-\eta_{k0}(u)\|^{2}&=\int_{\mathcal{U}}\big\{\hat{\eta}_k(u)-\eta_{k0}(u)\big\}^{2}du=\int_{\mathcal{U}}\big\{B(u)^{\top}\hat{b}_k-B(u)^{\top}b_{k0}+D_k(u)\big\}^{2}du\\&\leq 2\int_{\mathcal{U}}\big\{B(u)^{\top}\hat{b}_k-B(u)^{\top}b_{k0}\big\}^{2}du+2\int_{\mathcal{U}}D_k^{2}(u)\,du\\&=2(\hat{b}_k-b_{k0})^{\top}H(\hat{b}_k-b_{k0})+2\int_{\mathcal{U}}D_k^{2}(u)\,du.\end{aligned}$$
Then, invoking $\|H\|=O(1)$, a simple calculation yields
$$(\hat{b}_k-b_{k0})^{\top}H(\hat{b}_k-b_{k0})=\tau^{2}\|\rho_b\|^{2}.$$
In addition, it is easy to show that
$$\int_{\mathcal{U}}D_k^{2}(u)\,du=O_p\big(n^{-2r/(2r+1)}\big).$$
Combining (A49) and (A50) completes the proof of (ii). □
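The rate in (A50) comes from the standard B-spline approximation bound $\int_{\mathcal{U}}D_k^{2}(u)\,du=O(K^{-2r})$ combined with $K\asymp n^{1/(2r+1)}$. A minimal numerical sketch of that bound, assuming cubic splines (order $r=4$) and a smooth test function of our own choosing:

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Illustrative only: the squared L2 approximation error of a smooth function
# by a spline with K interior knots decays like K^{-2r}; plugging in
# K ~ n^{1/(2r+1)} yields the nonparametric rate n^{-2r/(2r+1)}.
rng = np.random.default_rng(1)
u = np.sort(rng.uniform(0, 1, 5000))
eta = np.sin(2 * np.pi * u)                 # a smooth "true" eta_k
grid = np.linspace(0, 1, 4001)

for K in (2, 4, 8, 16):                     # number of interior knots
    k = 3                                   # cubic splines, spline order r = 4
    t = np.r_[[0.0] * (k + 1), np.linspace(0, 1, K + 2)[1:-1], [1.0] * (k + 1)]
    spl = make_lsq_spline(u, eta, t, k=k)   # least-squares B-spline fit
    err = np.mean((np.sin(2 * np.pi * grid) - spl(grid)) ** 2)
    print(K, err)                           # squared error drops roughly like K^{-8}
```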
Proof of Theorem 2. 
(i) As $\lambda_{\max}$ approaches zero, $a_n$ tends to zero for sufficiently large $n$. Consequently, according to Theorem 1, it suffices to demonstrate that for any $\phi_j$ satisfying
$$\|\phi_j-\phi_{j0}\|=O_p\big(n_0^{-r/(2r+1)}+R(w_n)\big),\quad j=1,\dots,h,$$
and any given small $\epsilon=Cn_0^{-r/(2r+1)}$, with probability approaching one as $n_0\to\infty$, we have
$$\frac{\partial L_n(\phi)}{\partial\phi_j}\begin{cases}>0,&0<\phi_j<\epsilon,\\<0,&-\epsilon<\phi_j<0,\end{cases}\qquad j=h+1,\dots,(p-1)\times d.$$
A simple calculation yields
$$\begin{aligned}\frac{\partial L(\phi)}{\partial\phi_j}&=\frac{\partial\Phi_n(\phi)}{\partial\phi_j}+n_0\,\dot{p}_{\lambda_{1j}}(|\phi_j|)\,\mathrm{sgn}(\phi_j)\\&=\frac{\partial\Phi_n(\phi_0)}{\partial\phi_j}+\sum_{k=1}^{(p-1)\times d}\frac{\partial^{2}\Phi_n(\phi_0)}{\partial\phi_j\partial\phi_k}(\phi_k-\phi_{k0})+\sum_{o=1}^{(p-1)\times d}\sum_{k=1}^{(p-1)\times d}\frac{\partial^{3}\Phi_n(\tilde{\phi})}{\partial\phi_j\partial\phi_k\partial\phi_o}(\phi_k-\phi_{k0})(\phi_o-\phi_{o0})+n_0\,\dot{p}_{\lambda_{1j}}(|\phi_j|)\,\mathrm{sgn}(\phi_j),\end{aligned}$$
where ϕ ˜ lies between ϕ and ϕ 0 . Note that by Lemma A2,
$$n_0^{-1}\frac{\partial\Phi_n(\phi_0)}{\partial\phi_j}=O_p\big(n_0^{-1/2}\big)$$
and
$$\frac{1}{n_0}\frac{\partial^{2}L(\phi_0)}{\partial\phi_j\partial\phi_k}=\frac{1}{n_0}E\Big[\frac{\partial^{2}L(\phi_0)}{\partial\phi_j\partial\phi_k}\Big]+o_p(1),$$
then, since Theorem 1 gives $\|\phi-\phi_0\|=O_p\big(n_0^{-r/(2r+1)}+R(w_n)\big)$, we have
$$\frac{\partial L(\phi)}{\partial\phi_j}=n_0\lambda_{1j}\Big\{\lambda_{1j}^{-1}\dot{p}_{\lambda_{1j}}(|\phi_j|)\,\mathrm{sgn}(\phi_j)+O_p\big((n_0^{-r/(2r+1)}+R(w_n))\lambda_{1j}^{-1}\big)\Big\}.$$
Since
$$\liminf_{n_0\to\infty}\ \liminf_{\phi_j\to 0}\ \lambda_{1j}^{-1}\dot{p}_{\lambda_{1j}}(|\phi_j|)>0,\qquad \lambda_{1j}\,n_0^{r/(2r+1)}\geq\lambda_{\min}\,n_0^{r/(2r+1)}\to\infty,$$
the sign of the derivative is dictated solely by the sign of $\phi_j$. Consequently, (A53) is satisfied, which completes the proof of part (i).
Subsequently, we proceed to prove part (ii). Given that $\hat{\eta}_k(u)=B(u)^{\top}\hat{b}_k$ and $\hat{\dot{\eta}}_k(u)=D_k\hat{\delta}_k$, to establish part (ii) it suffices to demonstrate that for any arbitrary $\epsilon>0$, there exists a sufficiently large $n_0$ such that $P(A_{nk})<\epsilon$ and $P(S_{nj})<\epsilon$, where $A_{nk}=\{\|\hat{b}_k\|_H\neq 0\}$ and $S_{nj}=\{\|\hat{\delta}_j\|_Q\neq 0\}$ for $k=q+1,\dots,d$ and $j=m+1,\dots,d$. Utilizing the properties of B-splines, we deduce that
$$\|\hat{b}_k\|_H=\|\hat{b}_k-b_{k0}\|_H=\|\hat{b}_k-b_{k0}\|\cdot O\big(K^{-1/2}\big)=O_p\big(n_0^{-r/(2r+1)}\big)O\big(n_0^{-1/(2(2r+1))}\big)=O_p\big(n_0^{-1/2}\big)$$
for $k=q+1,\dots,d$. Then, when $n_0$ is large enough, there exists some $C$ such that
$$P(A_{nk})<\epsilon/2+P\big(\|\hat{b}_k\|_H\neq 0,\ \|\hat{b}_k\|_H<Cn_0^{-1/2}\big).$$
If $\|\hat{b}_k\|_H\neq 0$, then by Assumption A11 we can prove that, when $n_0$ is large enough, there exists some $C$ such that
$$P\big(n_0^{1/2}\,\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H)>C\big)<\epsilon/2.$$
In addition, by Assumption A11, we obtain
$$\inf_{\|\hat{b}_k\|_H\leq Cn_0^{-1/2}}n_0^{1/2}\,\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H)\geq\inf_{\|\hat{b}_k\|_H\leq c_n}n_0^{1/2}\,\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H)=n_0^{1/2}\lambda_{2k}\inf_{\|\hat{b}_k\|_H\leq c_n}\lambda_{2k}^{-1}\,\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H),$$
where $c_n=Cn_0^{-(2r-1)/(2(2r+1))}$. That is, if $\|\hat{b}_k\|_H\neq 0$ and $\|\hat{b}_k\|_H<Cn_0^{-1/2}$, then for sufficiently large $n$ it follows that $n_0^{1/2}\,\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H)>C$. Consequently, in conjunction with (A59) and (A60), we obtain
$$P(A_{nk})<\epsilon/2+P\big(n_0^{1/2}\,\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H)>C\big)<\epsilon.$$
On the other hand, utilizing $\|\hat{b}-b_0\|=O_p\big(n_0^{-r/(2r+1)}\big)$, a straightforward computation yields $\|\hat{\delta}_k-\delta_{k0}\|=\|A(\hat{b}_k-b_{k0})\|=O_p\big(n_0^{-r/(2r+1)}\big)$ for $k=1,\dots,d$. Therefore, for $j=m+1,\dots,d$, we have $\|\hat{\delta}_j\|_Q=\|\hat{\delta}_j-\delta_{j0}\|_Q=\|\hat{\delta}_j-\delta_{j0}\|\cdot O\big(K^{1/2}\big)=O_p\big(n_0^{-r/(2r+1)}\big)O\big(n_0^{1/(2(2r+1))}\big)=O_p\big(n_0^{-(2r-1)/(2(2r+1))}\big)$. Consequently, for sufficiently large $n_0$, there exists a constant $C$ such that
$$P(S_{nj})<\epsilon/2+P\big(\|\hat{\delta}_j\|_Q\neq 0,\ \|\hat{\delta}_j\|_Q<Cn_0^{-(2r-1)/(2(2r+1))}\big).$$
If $\|\hat{\delta}_j\|_Q\neq 0$, then under Assumption A11 we can establish that, for sufficiently large $n$, there exists a constant $C$ such that
$$P\big(n_0^{(2r-1)/(2(2r+1))}\,\dot{p}_{\lambda_{3j}}(\|\hat{\delta}_j\|_Q)>C\big)<\epsilon/2.$$
In addition, by Assumption A9, we obtain
$$\inf_{\|\hat{\delta}_j\|_Q\leq c_n}n_0^{(2r-1)/(2(2r+1))}\,\dot{p}_{\lambda_{3j}}(\|\hat{\delta}_j\|_Q)=n_0^{(2r-1)/(2(2r+1))}\lambda_{3j}\inf_{\|\hat{\delta}_j\|_Q\leq c_n}\lambda_{3j}^{-1}\,\dot{p}_{\lambda_{3j}}(\|\hat{\delta}_j\|_Q).$$
In other words, if $\|\hat{\delta}_j\|_Q$ is nonzero and bounded above by $Cn_0^{-(2r-1)/(2(2r+1))}$, then for sufficiently large $n_0$, $n_0^{(2r-1)/(2(2r+1))}\,\dot{p}_{\lambda_{3j}}(\|\hat{\delta}_j\|_Q)>C$. Consequently, combining (A63) and (A64), we obtain $P(S_{nj})<\epsilon$, indicating that $\hat{\eta}_j(u)$ are nonzero constants for $j=m+1,\dots,d$. This concludes the proof of Theorem 2. □
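The proof only uses generic properties of the penalty: $p_{\lambda}(0)=0$, a derivative bounded away from zero (after scaling by $\lambda$) near the origin, and a derivative that vanishes for large arguments. The SCAD penalty of [28] is one concrete choice satisfying these conditions; the following sketch of its derivative, with the customary $a=3.7$, is an illustration of ours rather than a construction taken from the paper.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty of Fan and Li (2001):
    p'_lam(t) = lam { I(t <= lam) + max(a*lam - t, 0) / ((a-1)*lam) I(t > lam) }."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))

# The two properties used in the proof of Theorem 2:
lam = 0.1
print(scad_deriv(1e-6, lam) / lam)   # liminf_{t -> 0+} p'_lam(t) / lam = 1 > 0
print(scad_deriv(1.0, lam))          # p'_lam vanishes for |t| > a*lam
```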
Proof of Theorem 3. 
By Theorems 1 and 2, it is apparent that as $n_0\to\infty$, with probability tending to one, $L_n(\phi,b)$ attains its minimum at $(\hat{\phi}^{*},0)$ and $(\hat{b}^{*},0)$. Let
$$L_{1n}(\phi,b)=\frac{\partial L_n(\phi,b)}{\partial\phi^{*}},\qquad L_{2n}(\phi,b)=\frac{\partial L_n(\phi,b)}{\partial b^{*}}.$$
Then, $(\hat{\phi}^{*},0)$ and $(\hat{b}^{*},0)$ must satisfy
$$\frac{1}{n_0}L_{1n}\big((\hat{\phi}^{*},0),(\hat{b}^{*},0)\big)=\frac{1}{n_0}\sum_{i=1}^{n}\Big\{\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\big(\log(Y_i/w_n)\exp\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})\}-1\big)\Big\}I(Y_i>w_n)+C_1=0,$$
and
$$\frac{1}{n_0}L_{2n}\big((\hat{\phi}^{*},0),(\hat{b}^{*},0)\big)=\frac{1}{n_0}\sum_{i=1}^{n}\Big\{G_i^{*}(\hat{\phi}^{*})\big(\log(Y_i/w_n)\exp\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})\}-1\big)\Big\}I(Y_i>w_n)+C_2=0,$$
where $\hat{\alpha}(v_i;\phi,b)$ is defined in (17),
$$\hat{\alpha}^{(1)}(v_i;\phi,b)=\Big\{\dot{G}\big(\beta_k(\phi_k)^{\top}z_i\big)^{\top}b\,\big(1-\|\phi_k\|^{2}\big)^{-1/2}\big(\phi_k:I_{q-1}\big)^{\top}z_i,\ 1\leq k\leq d\Big\},$$
$$\dot{G}\big(\beta(\phi)^{\top}z_i\big)=\mathrm{vec}\big(\dot{B}(\beta(\phi)^{\top}z_i)\,\mathrm{diag}(x_i)\big),$$
and
$$C_1=\big(\dot{p}_{\lambda_{11}}(|\hat{\phi}_1|)\,\mathrm{sgn}(\hat{\phi}_1),\ \dots,\ \dot{p}_{\lambda_{1h}}(|\hat{\phi}_h|)\,\mathrm{sgn}(\hat{\phi}_h)\big)^{\top},$$
$$C_2=\big(\dot{p}_{\lambda_{21}}(\|b_1\|)\,\mathrm{sgn}(b_1),\ \dots,\ \dot{p}_{\lambda_{2q}}(\|b_q\|)\,\mathrm{sgn}(b_q),\ \dot{p}_{\lambda_{31}}(\|\delta_1\|)\,\mathrm{sgn}(\delta_1),\ \dots,\ \dot{p}_{\lambda_{3m}}(\|\delta_m\|)\,\mathrm{sgn}(\delta_m)\big)^{\top}.$$
By applying a Taylor expansion to $\dot{p}_{\lambda_{1j}}(|\hat{\phi}_j|)$, we obtain
$$\dot{p}_{\lambda_{1j}}(|\hat{\phi}_j|)=\dot{p}_{\lambda_{1j}}(|\phi_{j0}|)+\big\{\ddot{p}_{\lambda_{1j}}(|\phi_{j0}|)+o_p(1)\big\}(\hat{\phi}_j-\phi_{j0}).$$
Furthermore, Assumption A10 implies that $\ddot{p}_{\lambda_{1j}}(|\phi_{j0}|)=o_p(1)$, and since $\dot{p}_{\lambda_{1j}}(|\phi_{j0}|)=0$ as $\lambda_{\max}\to 0$, by Theorems 1 and 2 we have
$$\dot{p}_{\lambda_{1j}}(|\hat{\phi}_j|)\,\mathrm{sgn}(\hat{\phi}_j)=o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big).$$
Similarly, we can show
$$\dot{p}_{\lambda_{2k}}(\|\hat{b}_k\|_H)\,\frac{H\hat{b}_k}{\|\hat{b}_k\|_H}=o_p\big(\|\hat{b}^{*}-b_0^{*}\|\big)$$
and
$$\dot{p}_{\lambda_{3s}}(\|\hat{b}_s\|_Q)\,\frac{Q\hat{b}_s}{\|\hat{b}_s\|_Q}=o_p\big(\|\hat{b}^{*}-b_0^{*}\|\big).$$
Then, denoting $\Lambda_i=\hat{\ell}_i-\ell_{i0}$, we have
$$\begin{aligned}\Lambda_i&=\log(Y_i/w_n)\big[\exp\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})\}-\exp\{\alpha_0(v_i;\phi_0^{*},\eta_0^{*})\}\big]I(Y_i>w_n)\\&=\log(Y_i/w_n)\exp\{\alpha_0(v_i;\phi_0^{*},\eta_0^{*})\}\big\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})-\alpha_0(v_i;\phi_0^{*},\eta_0^{*})\big\}I(Y_i>w_n)+o_p(\|\hat{\alpha}-\alpha_0\|)\\&=\ell_{i0}\big\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})-\alpha_0(v_i;\phi_0^{*},\eta_0^{*})\big\}+o_p(\|\hat{\alpha}-\alpha_0\|)\\&=\ell_{i0}\big\{G_i(\hat{\phi}^{*})^{\top}\hat{b}^{*}-\eta_0(\phi_0^{*\top}u_i)^{\top}x_i\big\}+o_p(\|\hat{\alpha}-\alpha_0\|).\end{aligned}$$
Then,
$$\begin{aligned}G_i(\hat{\phi}^{*})^{\top}\hat{b}^{*}-\eta_0(\phi_0^{*\top}u_i)^{\top}x_i&=G_i(\phi_0^{*})^{\top}b_0^{*}-\eta_0(\phi_0^{*\top}u_i)^{\top}x_i+\big(G_i(\hat{\phi}^{*})-G_i(\phi_0^{*})\big)^{\top}b_0^{*}+G_i(\hat{\phi}^{*})^{\top}(\hat{b}^{*}-b_0^{*})\\&=G_i(\phi_0^{*})^{\top}b_0^{*}-\eta_0(\phi_0^{*\top}u_i)^{\top}x_i+\dot{G}_i(\phi_0^{*})^{\top}b_0^{*}\,(\hat{\phi}^{*}-\phi_0^{*})+G_i(\hat{\phi}^{*})^{\top}(\hat{b}^{*}-b_0^{*})+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|)\\&=D(\phi_0^{*\top}u_i)^{\top}x_i+\dot{\eta}_0(\phi_0^{*\top}u_i)^{\top}x_i\,(\hat{\phi}^{*}-\phi_0^{*})+G_i(\hat{\phi}^{*})^{\top}(\hat{b}^{*}-b_0^{*})+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|).\end{aligned}$$
We then obtain
$$\Lambda_i=\ell_{i0}\big\{D(\phi_0^{*\top}u_i)^{\top}x_i+\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})^{\top}(\hat{\phi}^{*}-\phi_0^{*})+G_i(\hat{\phi}^{*})^{\top}(\hat{b}^{*}-b_0^{*})\big\}+o_p(\|\hat{\alpha}-\alpha_0\|).$$
Hence, by (A68), a simple calculation yields
$$\begin{aligned}&\frac{1}{n_0}\sum_{i=1}^{n}\Big\{G_i^{*}(\hat{\phi}^{*})\big(\log(Y_i/w_n)\exp\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})\}-1\big)\Big\}I(Y_i>w_n)=\frac{1}{n_0}\sum_{i=1}^{n}G_i(\hat{\phi}^{*})\big\{\ell_{i0}-I(Y_i>w_n)+\Lambda_i\big\}\\&=\frac{1}{n_0}\sum_{i=1}^{n}G_i(\hat{\phi}^{*})\big\{\ell_{i0}-I(Y_i>w_n)+\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i\big\}+\frac{1}{n_0}\sum_{i=1}^{n}G_i(\hat{\phi}^{*})\,\ell_{i0}\big\{\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})^{\top}(\hat{\phi}^{*}-\phi_0^{*})\big\}+\frac{1}{n_0}\sum_{i=1}^{n}G_i(\hat{\phi}^{*})\,\ell_{i0}\big\{G_i(\hat{\phi}^{*})^{\top}(\hat{b}^{*}-b_0^{*})\big\}.\end{aligned}$$
Then, from Assumption A3, Theorem 1, and $\sup_u\|B(u)\|=O(1)$, we have
$$\hat{b}^{*}-b_0^{*}=-\big[\hat{E}_n+o_p(1)\big]^{-1}\big\{A_n+\hat{H}_n(\hat{\phi}^{*}-\phi_0^{*})\big\},$$
where
$$\hat{E}_n=\frac{1}{n_0}\sum_{i=1}^{n}\ell_{i0}\,G_i(\hat{\phi}^{*})G_i(\hat{\phi}^{*})^{\top},$$
$$\hat{H}_n=\frac{1}{n_0}\sum_{i=1}^{n}\ell_{i0}\,G_i(\hat{\phi}^{*})\,\varpi_i^{*\top},$$
$$A_n=\frac{1}{n_0}\sum_{i=1}^{n}G_i(\hat{\phi}^{*})\big\{\ell_{i0}-I(Y_i>w_n)+\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i\big\}.$$
Then, substituting into (A67), we obtain
$$\begin{aligned}0&=\frac{1}{n_0}\sum_{i=1}^{n}\Big\{\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\big(\log(Y_i/w_n)\exp\{\hat{\alpha}(v_i;\hat{\phi}^{*},\hat{b}^{*})\}-1\big)\Big\}I(Y_i>w_n)+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|)\\&=\frac{1}{n_0}\sum_{i=1}^{n}\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\big\{\ell_{i0}-I(Y_i>w_n)+\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i\big\}+\frac{1}{n_0}\sum_{i=1}^{n}\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\,\ell_{i0}\big\{\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})^{\top}(\hat{\phi}^{*}-\phi_0^{*})\big\}\\&\quad+\frac{1}{n_0}\sum_{i=1}^{n}\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\,\ell_{i0}\big\{G_i(\hat{\phi}^{*})^{\top}(\hat{b}^{*}-b_0^{*})\big\}+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|)\\&=\frac{1}{n_0}\sum_{i=1}^{n}\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\Big\{\ell_{i0}-I(Y_i>w_n)+\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i-\ell_{i0}G_i(\hat{\phi}^{*})^{\top}\big[\hat{E}_n+o_p(1)\big]^{-1}A_n\Big\}\\&\quad+\frac{1}{n_0}\sum_{i=1}^{n}\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})\,\ell_{i0}\Big\{\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})-\hat{H}_n^{\top}\big[\hat{E}_n+o_p(1)\big]^{-1}G_i(\hat{\phi}^{*})\Big\}^{\top}(\hat{\phi}^{*}-\phi_0^{*})+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|)\\&=:J_1+J_2+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|).\end{aligned}$$
For J 1 , a simple calculation yields
$$J_1=\frac{1}{n_0}\sum_{i=1}^{n}\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})M_{1i}+\frac{1}{n_0}\sum_{i=1}^{n}\big\{\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})-\hat{\alpha}^{(1)}(v_i;\phi_0^{*},b_0^{*})\big\}M_{1i}+\frac{1}{n_0}\sum_{i=1}^{n}\big\{\hat{\alpha}^{(1)}(v_i;\phi_0^{*},b_0^{*})-\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})\big\}M_{1i}=:J_{11}+J_{12}+J_{13},$$
where
$$M_{1i}=\ell_{i0}-I(Y_i>w_n)+\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i-\ell_{i0}G_i(\hat{\phi}^{*})^{\top}\big[\hat{E}_n+o_p(1)\big]^{-1}A_n.$$
Note that
$$\begin{aligned}&\frac{1}{n_0}\sum_{i=1}^{n}\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\Big\{\ell_{i0}-I(Y_i>w_n)+\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i-\ell_{i0}G_i(\hat{\phi}^{*})^{\top}\hat{E}_n^{-1}A_n\Big\}=0,\\&\frac{1}{n_0}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\,\ell_{i0}\,G_i^{*}(\hat{\phi}^{*})^{\top}=0,\\&J(\hat{\phi}^{*})-J(\phi_0^{*})=O_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big).\end{aligned}$$
Then, by Assumption A3 and $\|D(u)\|=O(K^{-r})$, we can derive that
$$\begin{aligned}J_{11}&=\frac{1}{n_0}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\big(\ell_{i0}-I(Y_i>w_n)\big)+\frac{1}{n_0}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\,\ell_{i0}D(\phi_0^{*\top}u_i)^{\top}x_i\\&\quad-\frac{1}{n_0}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\,\ell_{i0}\,G_i(\hat{\phi}^{*})^{\top}\big[\hat{E}_n+o_p(1)\big]^{-1}A_n+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|)\\&=\frac{1}{n_0}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\big(\ell_{i0}-I(Y_i>w_n)\big)+o_p(\|\hat{\phi}^{*}-\phi_0^{*}\|).\end{aligned}$$
In addition, by (A17), it is easy to show that
$$J_{12}=o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big).$$
Similarly, we can prove that
$$J_{13}=o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big).$$
We now deal with J 2 . A simple calculation yields
$$J_2=\frac{1}{n_0}\sum_{i=1}^{n}\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})M_{2i}+\frac{1}{n_0}\sum_{i=1}^{n}\big\{\hat{\alpha}^{(1)}(v_i;\hat{\phi}^{*},\hat{b}^{*})-\hat{\alpha}^{(1)}(v_i;\phi_0^{*},b_0^{*})\big\}M_{2i}+\frac{1}{n_0}\sum_{i=1}^{n}\big\{\hat{\alpha}^{(1)}(v_i;\phi_0^{*},b_0^{*})-\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})\big\}M_{2i}=:J_{21}+J_{22}+J_{23},$$
where
$$M_{2i}=\ell_{i0}\Big\{\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})-\hat{H}_n^{\top}\big[\hat{E}_n+o_p(1)\big]^{-1}G_i(\hat{\phi}^{*})\Big\}^{\top}(\hat{\phi}^{*}-\phi_0^{*}).$$
By arguments similar to those for $J_{12}$, we can obtain
$$J_{22}=o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big),\qquad J_{23}=o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big).$$
So,
$$J_2=\frac{1}{n_0}\sum_{i=1}^{n}\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})\,\ell_{i0}\Big\{\alpha^{(1)}(v_i;\phi_0^{*},\eta_0^{*})-\hat{H}_n^{\top}\big[\hat{E}_n+o_p(1)\big]^{-1}G_i(\hat{\phi}^{*})\Big\}^{\top}(\hat{\phi}^{*}-\phi_0^{*})=\frac{1}{n_0}\sum_{i=1}^{n}\ell_{i0}\,\varpi_i^{*}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i(\hat{\phi}^{*})\big\}^{\top}(\hat{\phi}^{*}-\phi_0^{*}).$$
From the foregoing discussion, we get
$$\frac{1}{n_0}\sum_{i=1}^{n}\ell_{i0}\,\varpi_i^{*}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i(\hat{\phi}^{*})\big\}^{\top}(\hat{\phi}^{*}-\phi_0^{*})=-\frac{1}{n_0}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\big(\ell_{i0}-I(Y_i>w_n)\big),$$
and thus
$$\sqrt{n_0}\,(\hat{\phi}^{*}-\phi_0^{*})=-\Big[\frac{1}{n_0}\sum_{i=1}^{n}\ell_{i0}\,\varpi_i^{*}\varpi_i^{*\top}-\hat{H}_n^{\top}\hat{E}_n^{-1}\hat{H}_n+o_p(1)\Big]^{-1}\frac{1}{\sqrt{n_0}}\sum_{i=1}^{n}\big\{\varpi_i^{*}-\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})\big\}\big(\ell_{i0}-I(Y_i>w_n)\big).$$
From Theorem 1, a simple calculation leads to
$$\hat{H}_n^{\top}\hat{E}_n^{-1}G_i^{*}(\hat{\phi}^{*})=H_n^{\top}E_n^{-1}G_i^{*}(\phi_0^{*})+o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big),$$
and
$$\hat{H}_n^{\top}\hat{E}_n^{-1}\hat{H}_n=H_n^{\top}E_n^{-1}H_n+o_p\big(\|\hat{\phi}^{*}-\phi_0^{*}\|\big).$$
Then, as in (A1), we have
$$\sqrt{n_0}\,(\hat{\phi}^{*}-\phi_0^{*})=-\Big[\frac{1}{n_0}\sum_{i=1}^{n}\ell_{i0}\,\tilde{\varpi}_i^{*}\tilde{\varpi}_i^{*\top}+o_p(1)\Big]^{-1}\frac{1}{\sqrt{n_0}}\sum_{i=1}^{n}\tilde{\varpi}_i^{*}\big(\ell_{i0}-I(Y_i>w_n)\big).$$
Consequently, invoking Lemmas A1 and A2 and Slutsky's theorem, we conclude that
$$\sqrt{n_0}\,\Omega\big(\hat{\beta}-\beta_0+\Omega^{-1/2}n_0^{-1/2}\varepsilon\big)\ \xrightarrow{D}\ N\big(0,\ J\Omega J^{\top}\big).$$
This completes the proof. □
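The central limit behaviour invoked above rests on the summands $\ell_{i0}-I(Y_i>w_n)$ being conditionally centred. A minimal Monte Carlo sketch, assuming exact Pareto tails (so that, given an exceedance, $\gamma\log(Y_i/w_n)$ is unit exponential) and a design entirely of our own choosing:

```python
import numpy as np

# Sketch: the standardized sum of l_i0 - I(Y_i > w_n)
# = [gamma * log(Y_i / w_n) - 1] I(Y_i > w_n) is approximately N(0, 1).
rng = np.random.default_rng(7)
gamma, n, reps = 1.5, 5000, 2000
stats = []
for _ in range(reps):
    y = rng.pareto(gamma, n) + 1.0      # exact Pareto: P(Y > y) = y^{-gamma}
    w_n = np.quantile(y, 0.95)          # threshold, top 5% are exceedances
    z = np.log(y[y > w_n] / w_n)        # unit exponential once scaled by gamma
    stats.append(np.sum(gamma * z - 1.0) / np.sqrt(z.size))
print(round(np.mean(stats), 3), round(np.std(stats), 3))   # close to 0 and 1
```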
Proof of Theorem 4. 
First, for given $C_{i1}$, take a first-order Taylor expansion of (28) around $\iota_0$ to obtain
$$\begin{aligned}&\frac{\partial}{\partial\iota}\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\Big[\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota\big)-C_{i1}-\bar{G}_{i1}^{\top}\iota\Big]I(Y_i>w_n)\\&=\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\Big[\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota\big)-1\Big]\bar{G}_{i1}\,I(Y_i>w_n)\\&=\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\Big[\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)-1\Big]\bar{G}_{i1}\,I(Y_i>w_n)\\&\quad+\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\tilde{\iota}\big)\,\bar{G}_{i1}\bar{G}_{i1}^{\top}\,I(Y_i>w_n)\,(\iota-\iota_0),\end{aligned}$$
where $\tilde{\iota}$ lies between $\iota_0$ and $\hat{\iota}_n$ for given $\varrho_1$ and satisfies $\tilde{\iota}\to\iota_0$ in probability. Consequently,
$$\hat{\iota}_n-\iota_0=-\Big[\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\tilde{\iota}\big)\bar{G}_{i1}\bar{G}_{i1}^{\top}I(Y_i>w_n)\Big]^{-1}\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\Big[\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)-1\Big]\bar{G}_{i1}\,I(Y_i>w_n).$$
For ease of notation, we write
$$A_1=\frac{1}{n_0}\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\tilde{\iota}\big)\bar{G}_{i1}\bar{G}_{i1}^{\top}\,I(Y_i>w_n),$$
and
$$\Sigma_1=\sqrt{\frac{h_1}{n_0}}\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\Big[\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)-1\Big]\bar{G}_{i1}\,I(Y_i>w_n).$$
Next, we establish the convergence in probability of $A_1$ to $A(\varrho_1)$, where $A(\varrho_1)$ is a diagonal matrix with entries
$$A_{kk}(\varrho_1)=\frac{n}{n_0}\,\mu_{2(k-1)}\,E\Big[c_0(x_i,\varrho_1)w_n^{-\gamma(x_i,\varrho_1)}+c_1(X_i,Z_1)\frac{\gamma(x_i,\varrho_1)}{\gamma(x_i,\varrho_1)+h(x_i,\varrho_1)}\,w_n^{-\gamma(x_i,\varrho_1)-h(x_i,\varrho_1)}\Big]f_Z(\varrho_1),\quad k=1,2.$$
Noting that the $(i,j)$th entry of $A_1$ is
$$(A_1)_{ij}=\frac{1}{n_0}\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\tilde{\iota}\big)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{i+j-2}I(Y_i>w_n),\quad i,j=1,2,$$
we demonstrate the convergence in probability of $(A_1)_{ij}$ to $A_{ij}$. Using the approximation $e^{x}-1\approx x$ as $x\to 0$ and following [46], we obtain
$$\begin{aligned}E\big[(A_1)_{ij}\big]&=\frac{n}{n_0}E\Big[K_{h_1}(\varrho_i-\varrho_1)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{i+j-2}\exp\big(C_{i1}+\eta_1(\varrho_i)\big)\,E\big\{\log(Y_1/w_n)I(Y_1>w_n)\big\}\Big]+O\Big(\frac{nh_1^{2}}{2n_0}\Big)\\&=\frac{n}{n_0}E\Big[K_{h_1}(\varrho_i-\varrho_1)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{i+j-2}\exp\big(C_{i1}+\eta_1(\varrho_i)\big)\int_0^{\infty}\mathrm{pr}\big(Y_i>w_ne^{t}\big)dt\Big]+O\Big(\frac{nh_1^{2}}{2n_0}\Big)\\&=\frac{n}{n_0}E\Big[K_{h_1}(\varrho_i-\varrho_1)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{i+j-2}\Big\{c_0(x_i,\varrho_i)w_n^{-\gamma(x_i,\varrho_i)}+c_1(x_i,\varrho_i)\frac{\gamma(x_i,\varrho_i)}{\gamma(x_i,\varrho_i)+h(x_i,\varrho_i)}\,w_n^{-\gamma(x_i,\varrho_i)-h(x_i,\varrho_i)}\Big\}\Big]+O\Big(\frac{nh_1^{2}}{2n_0}\Big)\\&=\frac{n}{n_0}\,\mu_{i+j-2}\,E\Big[c_0(x_i,\varrho_i)w_n^{-\gamma(x_i,\varrho_i)}+c_1(x_i,\varrho_i)\frac{\gamma(x_i,\varrho_i)}{\gamma(x_i,\varrho_i)+h(x_i,\varrho_i)}\,w_n^{-\gamma(x_i,\varrho_i)-h(x_i,\varrho_i)}\,\Big|\,\varrho_i=\varrho_1\Big]f_Z(\varrho_1)+o(1).\end{aligned}$$
Furthermore, the $(i,j)$th element of the variance–covariance matrix of $A_1$ is
$$\begin{aligned}\mathrm{var}\big[(A_1)_{ij}\big]&=\frac{n}{n_0^{2}}E\Big[\Big\{K_{h_1}(\varrho_i-\varrho_1)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\tilde{\iota}\big)\log(Y_i/w_n)I(Y_i>w_n)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{i+j-2}\Big\}^{2}\Big]\\&\quad-\frac{n}{n_0^{2}}\Big\{E\Big[K_{h_1}(\varrho_i-\varrho_1)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\tilde{\iota}\big)\log(Y_i/w_n)I(Y_i>w_n)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{i+j-2}\Big]\Big\}^{2}\\&=\frac{n}{n_0^{2}h_1}E\Big[K^{2}\big((\varrho_i-\varrho_1)/h_1\big)h_1^{-1}\exp\big\{2\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)\big\}\log^{2}(Y_i/w_n)I(Y_i>w_n)\Big(\frac{\varrho_i-\varrho_1}{h_1}\Big)^{2(i+j-2)}\Big]=O\Big(\frac{1}{n_0}\Big)+O\Big(\frac{nh_1}{n_0^{2}}\Big).\end{aligned}$$
Direct calculations and Assumption A6 indicate that $\mathrm{var}[(A_1)_{ij}]=O\big(n/(n_0^{2}h_n)\big)+O\big(nh_n/n_0^{2}\big)=o(1)$. Therefore, the convergence in probability of $A_1$ to $A(\varrho_1)$ follows from the property $R=E(R)+O_p\big(\sqrt{\mathrm{var}(R)}\big)$ for any random variable $R$. Additionally, the diagonal structure of $A(\varrho_1)$ is guaranteed by the definitions of $\mu_k$ and $\nu_k$. Focusing on $(\Sigma_1)_{11}$, being a sum of independent and identically distributed variables, we determine its asymptotic mean and variance under the conditions of the central limit theorem. Specifically, the mean of the $(1,1)$th entry of $\Sigma_1$ can be formulated as
$$\begin{aligned}E\big[(\Sigma_1)_{11}\big]&=\sqrt{\frac{n^{2}h_1}{n_0}}\,E\Big[K_{h_1}(\varrho_i-\varrho_1)\big(\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)-1\big)I(Y_i>w_n)\Big]\\&=\sqrt{\frac{n^{2}h_1}{n_0}}\Big\{E\Big[K_{h_1}(\varrho_i-\varrho_1)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)\int_0^{\infty}\mathrm{pr}\big(Y_i>w_ne^{t}\big)dt\Big]-E\big[K_{h_1}(\varrho_i-\varrho_1)\,\mathrm{pr}(Y_i>w_n)\big]\Big\}\\&=\sqrt{\frac{n^{2}h_1}{n_0}}\Big\{E\Big[K_{h_1}(\varrho_i-\varrho_1)\exp\big(C_{i1}+\eta_1(\varrho_i)\big)\Big(1-\frac{h_1^{2}}{2}\ddot{\eta}(\varrho_i)\Big)\int_0^{\infty}\mathrm{pr}\big(Y_i>w_ne^{t}\big)dt\Big]-E\big[K_{h_1}(\varrho_i-\varrho_1)\,\mathrm{pr}(Y_i>w_n)\big]\Big\}\\&=\sqrt{\frac{n^{2}h_1}{n_0}}\,E\Big[\frac{h_1^{2}}{2}\ddot{\eta}(\varrho_i)\,\mu_2\Big\{c_0(x_i,\varrho_i)w_n^{-\gamma(x_i,\varrho_i)}+c_1(x_i,\varrho_i)\frac{\gamma(x_i,\varrho_i)}{\gamma(x_i,\varrho_i)+h(x_i,\varrho_i)}\,w_n^{-\gamma(x_i,\varrho_i)-h(x_i,\varrho_i)}\Big\}\\&\qquad\quad-c_1(x_i,\varrho_i)\frac{h(x_i,\varrho_i)}{\gamma(x_i,\varrho_i)+h(x_i,\varrho_i)}\,w_n^{-\gamma(x_i,\varrho_i)-h(x_i,\varrho_i)}\,\Big|\,\varrho_i=\varrho_1\Big]f_Z(\varrho_1)\equiv\Sigma_{11}(\varrho_1),\end{aligned}$$
and the corresponding variance is
$$\begin{aligned}\mathrm{var}\big[(\Sigma_1)_{11}\big]&=\frac{nh_1}{n_0}\,\mathrm{var}\Big[K_{h_1}(\varrho_i-\varrho_1)\big(\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)-1\big)I(Y_i>w_n)\Big]\\&=\frac{nh_1}{n_0}E\Big[K_{h_1}^{2}(\varrho_i-\varrho_1)\big(\log(Y_i/w_n)\exp\big(C_{i1}+\bar{G}_{i1}^{\top}\iota_0\big)-1\big)^{2}I(Y_i>w_n)\Big]+O(1/n)\\&=\frac{n}{n_0}E\Big[c_0(x_i,\varrho_i)w_n^{-\gamma(x_i,\varrho_i)}+c_1(x_i,\varrho_i)\frac{\gamma^{2}(x_i,\varrho_i)+h^{2}(x_i,\varrho_i)}{\{\gamma(x_i,\varrho_i)+h(x_i,\varrho_i)\}^{2}}\,w_n^{-\gamma(x_i,\varrho_i)-h(x_i,\varrho_i)}\,\Big|\,\varrho_i=\varrho_1\Big]f_U(\varrho_1)\,\nu_0+o(1)\equiv\Lambda_{11}(\varrho_1).\end{aligned}$$
Thus, we conclude that
$$\sqrt{n_0h_1}\,\big(\hat{\eta}_1(\varrho_1)-\eta_1(\varrho_1)\big)=A_{11}(\varrho_1)^{-1}\Sigma_{11}(\varrho_1)+o_p\Big(\frac{1}{\sqrt{n_0h_1}}+h_1^{2}\Big),$$
which leads to the asymptotic variance matrix $A_{11}(\varrho_1)^{-1}\Lambda_{11}(\varrho_1)A_{11}(\varrho_1)^{-1}$. This completes the proof. □
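The expansion above amounts to a single Newton step for the kernel-weighted likelihood. The following sketch implements that step; it is illustrative only, with an Epanechnikov kernel chosen by us, and `z`, `exceed`, `C1`, and `Gbar` standing in for $\log(Y_i/w_n)$, the exceedance indicator, the offset $C_{i1}$, and the local design $\bar{G}_{i1}$ of (28).

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel (our own choice of K; any second-order kernel works)
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def local_newton_step(iota, rho, rho1, z, exceed, C1, Gbar, h1):
    """One Newton update for the kernel-weighted estimating equation
    sum_i K_h1(rho_i - rho1) [ z_i exp(C_i1 + Gbar_i' iota) - 1 ] Gbar_i I(Y_i > w_n) = 0,
    with z_i = log(Y_i / w_n)."""
    w = epanechnikov((rho - rho1) / h1) / h1 * exceed   # kernel weights on exceedances
    e = z * np.exp(C1 + Gbar @ iota)
    score = Gbar.T @ (w * (e - 1.0))                    # the local estimating function
    hess = (Gbar * (w * e)[:, None]).T @ Gbar           # sample analogue of A_1
    return iota - np.linalg.solve(hess, score)
```

Iterating this update from an initial value until the score vanishes numerically reproduces the local estimator whose asymptotic behaviour the proof characterizes.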
Proof of Theorem 5. 
Since $\tilde{\eta}_{LL,1}(\varrho_1)$ is the solution of
$$\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)X_{i1}\Big\{\log\big(Y_i/w_n\big)\exp(C_{i1})\exp\big(X_{i1}\tilde{\eta}_{LL,1}(\varrho_1)\big)-1\Big\}I(Y_i>w_n)=0,$$
and $\hat{\eta}_{SBLL,1}$ is the solution of
$$\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)X_{i1}\Big\{\log\big(Y_i/w_n\big)\exp(\hat{C}_{i1})\exp\big(X_{i1}\hat{\eta}_{SBLL,1}(\varrho_1)\big)-1\Big\}I(Y_i>w_n)=0,$$
combining the two equations, a simple calculation gives
$$\hat{\eta}_{SBLL,1}-\tilde{\eta}_{LL,1}=-\frac{\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\log(Y_i/w_n)X_{1i}\exp(\tilde{C}_{i1})\big(\hat{C}_{i1}-C_{i1}\big)I(Y_i>w_n)}{\sum_{i=1}^{n}K_{h_1}(\varrho_i-\varrho_1)\log(Y_i/w_n)X_{1i}^{2}\exp(\tilde{C}_{i1})I(Y_i>w_n)},$$
where $\tilde{C}_{i1}$ lies between $C_{i1}+X_{i1}\tilde{\eta}_{LL,1}(\varrho_1)$ and $\hat{C}_{i1}+X_{i1}\hat{\eta}_{SBLL,1}(\varrho_1)$. Then, according to [36], we have
$$\sup_{\varrho_1}\big|\hat{\eta}_{SBLL,1}(\varrho_1)-\tilde{\eta}_{LL,1}(\varrho_1)\big|=O_p\big(K^{-r}\big).$$
This completes the proof of Theorem 5. □

References

  1. Beirlant, J.; Goegebeur, Y.; Segers, J.; Teugels, J.L. Statistics of Extremes: Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2004; Volume 558. [Google Scholar]
  2. Castillo, E.; Hadi, A.S.; Balakrishnan, N.; Sarabia, J.M. Extreme Value and Related Models with Applications in Engineering and Science; Wiley: Hoboken, NJ, USA, 2005. [Google Scholar]
  3. Resnick, S.I. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  4. Goegebeur, Y.; Guillou, A.; Schorgen, A. Nonparametric regression estimation of conditional tails: The random covariate case. Statistics 2014, 48, 732–755. [Google Scholar] [CrossRef]
  5. Gardes, L.; Stupfler, G. An integrated functional Weissman estimator for conditional extreme quantiles. REVSTAT-Stat. J. 2019, 17, 109–144. [Google Scholar]
  6. Goegebeur, Y.; Guillou, A.; Osmann, M. A local moment type estimator for the extreme value index in regression with random covariates. Can. J. Stat. 2014, 42, 487–507. [Google Scholar] [CrossRef]
  7. Beirlant, J.; Goegebeur, Y. Regression with response distributions of Pareto-type. Comput. Stat. Data Anal. 2003, 42, 595–619. [Google Scholar] [CrossRef]
  8. Wang, H.; Tsai, C.L. Tail index regression. J. Am. Stat. Assoc. 2009, 104, 1233–1240. [Google Scholar] [CrossRef]
  9. Ma, Y.; Jiang, Y.; Huang, W. Empirical likelihood based inference for conditional Pareto-type tail index. Stat. Probab. Lett. 2018, 134, 114–121. [Google Scholar] [CrossRef]
  10. An, H.; Tian, B. Unleashing the Potential of Mixed Frequency Data: Measuring Risk with Dynamic Tail Index Regression Model. Comput. Econ. 2024, 1–49. [Google Scholar] [CrossRef]
  11. Li, R.; Leng, C.; You, J. Semiparametric Tail Index Regression. J. Bus. Econ. Stat. 2020, 40, 82–95. [Google Scholar] [CrossRef]
  12. Ma, Y.; Jiang, Y.; Huang, W. Tail index varying coefficient model. Commun. Stat.-Theory Methods 2019, 48, 235–256. [Google Scholar] [CrossRef]
  13. Momoki, K.; Yoshida, T. Hypothesis testing for varying coefficient models in tail index regression. Stat. Pap. 2024, 1–32. [Google Scholar] [CrossRef]
  14. Ma, S.; Song, P.X.K. Varying index coefficient models. J. Am. Stat. Assoc. 2015, 110, 341–356. [Google Scholar] [CrossRef]
  15. Dong, H.; Otsu, T.; Taylor, L. Estimation of varying coefficient models with measurement error. J. Econom. 2022, 230, 388–415. [Google Scholar] [CrossRef]
  16. Carroll, R.J.; Fan, J.; Gijbels, I.; Wand, M.P. Generalized partially linear single-index models. J. Am. Stat. Assoc. 1997, 92, 477–489. [Google Scholar] [CrossRef]
  17. Stone, C.J. [Generalized additive models]: Comment. Stat. Sci. 1986, 1, 312–314. [Google Scholar] [CrossRef]
  18. Yu, Y.; Ruppert, D. Penalized spline estimation for partially linear single-index models. J. Am. Stat. Assoc. 2002, 97, 1042–1054. [Google Scholar] [CrossRef]
  19. Hastie, T.; Tibshirani, R. Varying-coefficient models. J. R. Stat. Soc. Ser. B Stat. Methodol. 1993, 55, 757–779. [Google Scholar] [CrossRef]
  20. Chen, Y.; Rao, M.; Feng, K.; Niu, G. Modified varying index coefficient autoregression model for representation of the nonstationary vibration from a planetary gearbox. IEEE Trans. Instrum. Meas. 2023, 72, 3511812. [Google Scholar] [CrossRef]
  21. Lv, J.; Li, J. High-dimensional varying index coefficient quantile regression model. Stat. Sin. 2022, 32, 673–694. [Google Scholar] [CrossRef]
  22. Burnham, K.P.; Anderson, D.R. Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Methods Res. 2004, 33, 261–304. [Google Scholar] [CrossRef]
  23. Donoho, D.L.; Johnstone, I.M. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455. [Google Scholar] [CrossRef]
  24. Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  25. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  26. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  27. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
  28. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  29. Chu, W.; Li, R.; Liu, J.; Reimherr, M. Feature selection for generalized varying coefficient mixed-effect models with application to obesity GWAS. Ann. Appl. Stat. 2020, 14, 276. [Google Scholar] [CrossRef] [PubMed]
  30. Feng, S.; Xue, L. Model detection and estimation for single-index varying coefficient model. J. Multivar. Anal. 2015, 139, 227–244. [Google Scholar] [CrossRef]
  31. Song, Y.; Lin, L.; Jian, L. Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model. Commun. Nonlinear Sci. Numer. Simul. 2016, 36, 109–128. [Google Scholar] [CrossRef]
  32. Stone, C.J. Additive regression and other nonparametric models. Ann. Stat. 1985, 13, 689–705. [Google Scholar] [CrossRef]
  33. Wang, L.; Yang, L. Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Ann. Stat. 2007, 35, 2474–2503. [Google Scholar] [CrossRef]
  34. Wang, J.; Yang, L. Efficient and fast spline-backfitted kernel smoothing of additive models. Ann. Inst. Stat. Math. 2009, 61, 663–690. [Google Scholar] [CrossRef]
  35. Liu, R.; Yang, L. Spline-backfitted kernel smoothing of additive coefficient model. Econom. Theory 2010, 26, 29–59. [Google Scholar] [CrossRef]
  36. De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 1978; Volume 27. [Google Scholar]
  37. Ruppert, D. Selecting the number of knots for penalized splines. J. Comput. Graph. Stat. 2002, 11, 735–757. [Google Scholar] [CrossRef]
  38. Yoshida, T. Single-index models for extreme value index regression. arXiv 2022, arXiv:2203.05758. [Google Scholar]
  39. Goegebeur, Y.; Guillou, A.; Stupfler, G. Uniform asymptotic properties of a nonparametric regression estimator of conditional tails. Ann. l’IHP Probab. Stat. 2015, 51, 1190–1213. [Google Scholar] [CrossRef]
  40. Beirlant, J.; Vynckier, P.; Teugels, J.L. Tail index estimation, Pareto quantile plots, and regression diagnostics. J. Am. Stat. Assoc. 1996, 91, 1659–1667. [Google Scholar]
  41. Chavez-Demoulin, V.; Davison, A.C. Generalized additive modelling of sample extremes. J. R. Stat. Soc. Ser. C Appl. Stat. 2005, 54, 207–222. [Google Scholar] [CrossRef]
  42. Youngman, B.D. Generalized additive models for exceedances of high thresholds with an application to return level estimation for US wind gusts. J. Am. Stat. Assoc. 2019, 114, 1865–1879. [Google Scholar] [CrossRef]
  43. Cui, X.; Härdle, W.K.; Zhu, L. The EFM approach for single-index models. Ann. Stat. 2011, 39, 1658–1688. [Google Scholar] [CrossRef]
  44. Johnson, B.A.; Lin, D.; Zeng, D. Penalized estimating functions and variable selection in semiparametric regression models. J. Am. Stat. Assoc. 2008, 103, 672–680. [Google Scholar] [CrossRef]
  45. Schumaker, L.L. Spline Functions: Computational Methods; SIAM: Philadelphia, PA, USA, 2015. [Google Scholar]
  46. Fan, J. Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66; Routledge: London, UK, 2018. [Google Scholar]
Figure 1. Box plot for nonzero parameters. The bias is equal to the difference between the estimated and true values.
Figure 2. Box plot for zero parameters. The bias is equal to the difference between the estimated and true values.
Figure 3. The 95% pointwise confidence band for the single-index functions. The blue dashed line in the middle represents the fitted curve, while the red solid line represents the true η(u).
Figure 4. Single-index function estimation curves.
Figure 5. The blue solid line corresponds to the 45-degree reference line, while the red dashed line represents the estimator.
Figure 6. The top panel of the figure shows the actual logarithmic returns, while the bottom panel displays the estimated tail index.
Table 1. Simulation results.

| c | n | Fraction | Method | MSE β (STD) | ASE η (STD) | ASE γ (STD) | C β | C η | IC β | IC η |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 500 | 0.380 | VICMTIR-VS | 0.014 (0.116) | 0.039 (0.028) | 0.203 (0.174) | 17.236 | 2.930 | 0.610 | 0.304 |
|  |  |  | Oracle | 0.010 (0.111) | 0.037 (0.026) | 0.186 (0.173) | 18 | 3 | 0 | 0 |
|  | 1000 | 0.363 | VICMTIR-VS | 0.006 (0.077) | 0.018 (0.013) | 0.082 (0.112) | 17.740 | 2.982 | 0.248 | 0.098 |
|  |  |  | Oracle | 0.005 (0.071) | 0.015 (0.008) | 0.054 (0.044) | 18 | 3 | 0 | 0 |
|  | 2000 | 0.341 | VICMTIR-VS | 0.003 (0.053) | 0.011 (0.009) | 0.032 (0.039) | 18 | 3 | 0 | 0 |
|  |  |  | Oracle | 0.003 (0.049) | 0.009 (0.004) | 0.025 (0.032) | 18 | 3 | 0 | 0 |
| 0.25 | 500 | 0.394 | VICMTIR-VS | 0.012 (0.108) | 0.036 (0.027) | 0.199 (0.183) | 17.344 | 2.954 | 0.568 | 0.318 |
|  |  |  | Oracle | 0.010 (0.110) | 0.035 (0.025) | 0.162 (0.154) | 18 | 3 | 0 | 0 |
|  | 1000 | 0.380 | VICMTIR-VS | 0.006 (0.074) | 0.017 (0.013) | 0.079 (0.092) | 17.770 | 2.978 | 0.244 | 0.070 |
|  |  |  | Oracle | 0.005 (0.066) | 0.014 (0.007) | 0.057 (0.066) | 18 | 3 | 0 | 0 |
|  | 2000 | 0.364 | VICMTIR-VS | 0.002 (0.049) | 0.011 (0.009) | 0.032 (0.036) | 18 | 3 | 0 | 0 |
|  |  |  | Oracle | 0.002 (0.063) | 0.009 (0.004) | 0.027 (0.031) | 18 | 3 | 0 | 0 |
| 0.5 | 500 | 0.380 | VICMTIR-VS | 0.011 (0.105) | 0.037 (0.027) | 0.178 (0.174) | 17.418 | 2.956 | 0.548 | 0.342 |
|  |  |  | Oracle | 0.008 (0.101) | 0.035 (0.023) | 0.141 (0.151) | 18 | 3 | 0 | 0 |
|  | 1000 | 0.372 | VICMTIR-VS | 0.005 (0.068) | 0.019 (0.013) | 0.067 (0.083) | 17.802 | 2.972 | 0.258 | 0.050 |
|  |  |  | Oracle | 0.004 (0.063) | 0.015 (0.008) | 0.049 (0.055) | 18 | 3 | 0 | 0 |
|  | 2000 | 0.360 | VICMTIR-VS | 0.002 (0.046) | 0.012 (0.009) | 0.031 (0.045) | 18 | 3 | 0 | 0 |
|  |  |  | Oracle | 0.002 (0.043) | 0.010 (0.004) | 0.026 (0.045) | 18 | 3 | 0 | 0 |
Table 2. The ASEs (std) of all estimators were obtained via Monte Carlo simulation. Columns are indexed by (p, d) within each sample size n.

| Method | (6,3), n=500 | (10,6), n=500 | (20,9), n=500 | (6,3), n=1000 | (10,6), n=1000 | (20,9), n=1000 | (6,3), n=2000 | (10,6), n=2000 | (20,9), n=2000 |
|---|---|---|---|---|---|---|---|---|---|
| Linear | 2.152 (1.913) | 2.858 (2.927) | 3.704 (3.939) | 1.602 (0.918) | 1.967 (1.055) | 2.672 (1.853) | 1.253 (0.513) | 1.534 (0.639) | 1.934 (0.869) |
| NPM | 0.611 (0.305) | 0.736 (0.617) | 1.647 (0.461) | 0.553 (0.295) | 0.692 (0.392) | 0.866 (0.522) | 0.486 (0.102) | 0.531 (0.063) | 0.633 (0.091) |
| SIM | 0.244 (0.139) | 0.284 (0.181) | 0.345 (0.226) | 0.202 (0.092) | 0.235 (0.123) | 0.311 (0.207) | 0.185 (0.069) | 0.191 (0.069) | 0.222 (0.126) |
| VCM | 0.239 (0.093) | 0.438 (0.092) | 0.842 (0.114) | 0.191 (0.046) | 0.309 (0.046) | 0.479 (0.042) | 0.170 (0.029) | 0.270 (0.029) | 0.316 (0.025) |
| VICM-VS | 0.141 (0.151) | 0.178 (0.174) | 0.213 (0.190) | 0.049 (0.055) | 0.067 (0.083) | 0.076 (0.104) | 0.026 (0.045) | 0.031 (0.045) | 0.032 (0.034) |
Table 3. Variable definitions.

| | State Variables | Definition |
|---|---|---|
| Outer Variables | SP500 | The daily return of the S&P 500 Index. |
| | N225 | The daily return of the Nikkei 225 Index. |
| | LKS11 | The daily return of the Korean KS11 Index. |
| | IXIC | The daily return of the Nasdaq Composite Index. |
| | HSI | The daily return of the Hang Seng Index. |
| | FTSE | The daily return of the FTSE (Financial Times Stock Exchange) Index. |
| | DJIA | The daily return of the Dow Jones Industrial Average. |
| Internal Variables | Volume | The changes in trading volume, calculated as ΔX_t = ln X_t − ln X_{t−1}. |
| | Money | The changes in trading value, calculated as ΔX_t = ln X_t − ln X_{t−1}. |
| | Turnover | The daily turnover rate. |
| | P/BV | Calculated as price/book value. |
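For concreteness, the log-difference transformation defining Volume and Money can be computed as follows (a toy pandas sketch; the column names and figures are placeholders, not the paper's dataset):

```python
import numpy as np
import pandas as pd

# Toy illustration of the internal-variable construction in Table 3.
df = pd.DataFrame({"volume": [1.0e9, 1.1e9, 0.9e9],
                   "value":  [2.0e9, 2.3e9, 1.8e9]})
df["Volume"] = np.log(df["volume"]).diff()   # Delta X_t = ln X_t - ln X_{t-1}
df["Money"] = np.log(df["value"]).diff()
print(df[["Volume", "Money"]])
```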
Table 4. Descriptive statistics: The reported statistics include the minimum (Min) and maximum (Max), the mean, standard deviation (SD), Skewness (Skew.), and Kurtosis (Kurt.).

| | Min | Max | Mean | SD | Skew. | Kurt. |
|---|---|---|---|---|---|---|
| Panel A: daily return data | | | | | | |
| CSI 300 | −9.154 | 6.208 | −0.004 | 1.449 | −0.707 | 4.907 |
| Panel B: Outer Variables | | | | | | |
| SP500 | −12.765 | 8.968 | 0.036 | 1.147 | −0.791 | 4.769 |
| N225 | −11.153 | 7.813 | 0.033 | 1.33 | −0.364 | 5.389 |
| LKS11 | −6.42 | 8.251 | 0.008 | 1.046 | −0.089 | 5.701 |
| LIXIC | −13.149 | 8.935 | 0.044 | 1.317 | −0.67 | 8.936 |
| LHSI | −6.567 | 8.693 | 0.011 | 1.262 | 0.061 | 3.414 |
| LFTSE | −11.512 | 9.235 | 0.016 | 1.041 | −0.707 | 11.294 |
| DJIA | −13.842 | 10.764 | 0.033 | 1.124 | −0.911 | 20.879 |
| Panel C: Internal Variables | | | | | | |
| Volume | −1.197 | 1.093 | 0.000 | 0.215 | 0.438 | 1.279 |
| Money | −1.228 | 1.143 | 0.000 | 0.204 | 0.473 | 1.617 |
| Turnover | 0.137 | 3.095 | 0.519 | 0.358 | 2.953 | 10.939 |
| P/BV | 0.000 | 3.173 | 1.74 | 0.374 | 1.391 | 1.857 |
Table 5. Parameter estimation results of each variable: standard errors are reported in parentheses, where asterisks indicate significance at the *** 1%, ** 5%, and * 10% levels.

| | SP500 | N225 | LKS11 | LIXIC | LHSI | LFTSE | DJIA |
|---|---|---|---|---|---|---|---|
| Volume | 0.371 ** (0.212) | 0.503 ** (0.240) | 0 (0.246) | 0.244 ** (0.226) | 0.162 (0.313) | 0.537 ** (0.230) | 0.485 ** (0.225) |
| Money | 0.680 ** (0.381) | 0.062 (0.263) | 0 (0.267) | 0 (0.239) | 0.660 ** (0.332) | 0.193 * (0.121) | 0.246 ** (0.118) |
| Turnover | 0.219 ** (0.115) | 0 (0.256) | 0.071 (0.212) | 0.156 * (0.105) | 0.869 ** (0.387) | 0.386 ** (0.206) | 0.137 * (0.107) |
| P/BV | 0 (0.338) | 0.042 (0.251) | 0 (0.197) | 0.300 *** (0.077) | 0.949 *** (0.293) | 0.011 (0.176) | 0.087 (0.329) |