1. Introduction
The occurrence of extreme events is often accompanied by significant losses, such as those caused by floods, hurricanes, and financial crises. These rare but destructive events can result in damages that are difficult to quantify precisely. Therefore, in various disciplines—including actuarial science, economics, finance, geology, ecology, meteorology, and life sciences—intensive research on extreme events is particularly crucial. A profound understanding of the mechanisms behind extreme events is essential for the effective prevention and mitigation of large-scale disasters.
The tail index (TI), a crucial metric for assessing the probability of extreme events, governs the heaviness of the tail of a distribution. A lower tail index indicates a higher probability of extreme events, underscoring the importance of accurately estimating the tail index in extreme value theory (EVT). The classical literature [1,2,3] extensively analyzes the theoretical properties of traditional tail index estimators and their empirical applications.
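As a concrete point of reference, the best-known of these traditional estimators, the Hill estimator, can be sketched in a few lines. The code below is an illustrative sketch (not taken from the cited works): it estimates the extreme value index gamma = 1/TI from the k largest order statistics of a Pareto-type sample.

```python
import numpy as np

def hill_estimator(y, k):
    """Hill estimator of the extreme value index gamma = 1/alpha,
    computed from the k largest observations:
    (1/k) * sum_{i=1}^{k} log(Y_(n-i+1) / Y_(n-k))."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    # log-excesses of the top k order statistics over the (n-k)-th one
    log_excesses = np.log(y[n - k:]) - np.log(y[n - k - 1])
    return log_excesses.mean()

# Illustration: exact Pareto sample with tail index alpha = 2 (gamma = 0.5)
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
y = u ** (-1.0 / 2.0)            # inverse transform: P(Y > y) = y^{-2}
gamma_hat = hill_estimator(y, k=2_000)
```

The choice of k (the number of upper order statistics) drives the usual bias-variance trade-off in tail estimation.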
In recent years, with the expansion of practical applications, scholars have increasingly recognized the importance of estimating tail indices in the presence of covariate information. When covariates are available, allowing the tail index of the conditional distribution of the response variable to depend on them provides a more realistic, albeit more complex, setting for conditional tail index estimation. This line of work focuses on statistical inference for conditional tail indices when the covariates are random variables; typical studies include [4,5,6].
Despite significant progress in statistical inference for extreme value indices, previous research has primarily focused on inference for conditional tail indices, without deeper consideration of the relationship between covariates and the response variable. As a result, covariates explain the underlying mechanisms of extreme events relatively poorly, limiting the interpretability and applicability of such models in practice. In real-world applications, however, a thorough understanding of the causes of extreme events is crucial for their prevention. To address this limitation, it is natural to introduce covariates related to extreme events and to assume that the tail index depends on them. In [7], tail index regression parameters were estimated under a linear relationship between the tail index and covariates. Subsequent research [8] established the asymptotic properties of the resulting estimators, and Ref. [9] constructed confidence intervals for the regression coefficients using the empirical likelihood method. Ref. [10] studied mixed-frequency covariates and applied them to financial tail risk measurement. In practice, however, the relationship between the tail index and covariates can be highly complex, and a linear assumption may be too simplistic to capture the true impact of the covariates [11]. To enhance model flexibility, Ref. [11] proposed a partially linear semiparametric tail index regression model based on [8], assuming a partially linear semiparametric structure between the tail index and covariates, and established large-sample theory for the resulting estimators. Furthermore, Ref. [12] introduced a varying coefficient model for the tail index, assuming that the coefficient is an unknown function of a single variable, and studied its statistical properties; Ref. [13] subsequently studied the associated hypothesis testing problems. However, owing to the sparsity of extreme data, nonparametric estimation suffers from the curse of dimensionality as the covariate dimension increases: it cannot guarantee efficiency for high-dimensional covariates, making it difficult to describe the entire distribution. Overcoming this challenge therefore calls for a more flexible modeling approach that combines dimensionality reduction, variable selection, and generalized tail event modeling techniques.
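To make the linear specification of [7,8] concrete, the following sketch fits a tail index regression of the form alpha(x) = exp(x'theta) by maximizing an approximate Pareto likelihood over threshold exceedances. The function name, the threshold rule, and the optimizer are our own illustrative choices, not the exact procedure of those papers.

```python
import numpy as np
from scipy.optimize import minimize

def fit_tail_index_regression(x, y, threshold):
    """Fit alpha(x) = exp(x @ theta) by maximizing the approximate
    Pareto log-likelihood built from the exceedances y > threshold."""
    exceed = y > threshold
    xe = x[exceed]
    log_excess = np.log(y[exceed] / threshold)

    def neg_loglik(theta):
        eta = xe @ theta                       # log tail index per observation
        # Pareto log-density of Y/threshold, up to terms constant in theta
        return -(eta - np.exp(eta) * log_excess).sum()

    res = minimize(neg_loglik, np.zeros(x.shape[1]), method="BFGS")
    return res.x

# Illustration on synthetic data with true theta = (0.5, 1.0)
rng = np.random.default_rng(1)
n = 20_000
x = np.column_stack([np.ones(n), rng.uniform(size=n)])
alpha = np.exp(x @ np.array([0.5, 1.0]))
y = rng.uniform(size=n) ** (-1.0 / alpha)      # exact Pareto given x
theta_hat = fit_tail_index_regression(x, y, threshold=np.quantile(y, 0.9))
```

Because the conditional excess Y/threshold is exactly Pareto here, the exceedance likelihood is correctly specified and the estimates recover theta.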
Inspired by the discussions above, this study aims to integrate more flexible varying coefficient models (VCMs) with extreme value analysis to derive effective estimates of tail indices with interaction effects among covariates. A key feature of VCMs is that they allow the coefficients of covariates to vary smoothly with other variables, enabling the assessment of nonlinear interactions. Notably, the varying index coefficient model (VICM) proposed by [14], which encompasses a range of commonly used semiparametric models, offers sufficient flexibility for diverse applications.
The VICM can model and assess nonlinear interaction effects of grouped covariates on the response variable, addressing situations in which individual covariate effects are weak but their combined effect is strong. It effectively mitigates the curse of dimensionality commonly encountered in high-dimensional nonparametric estimation while combining the advantages of the single-index model and the varying coefficient model, as highlighted by [15]. Moreover, it is easily interpretable in practical applications. As a highly useful semiparametric model, the VICM includes many other important statistical models as special cases, for instance, the partially linear single-index model [16], the additive model [17], the partially linear additive model [18], and the varying coefficient model [19], among others. Owing to these numerous advantages, the VICM has been studied extensively. For example, Ref. [20] extended it to time series data and developed the varying-index coefficient autoregressive model, while Ref. [21] extended it to quantile regression and investigated its statistical properties in high-dimensional settings.
Despite significant advances in the estimation and application of varying coefficient models in general settings, a notable gap remains in the context of extreme value analysis. Therefore, building upon the natural extensions proposed by [11,12], we extend the VICM to extreme value analysis, introducing a novel tail index regression model based on the VICM. This approach harnesses the power of the VICM to address the challenges posed by complex covariates in extreme value analysis.
Additionally, variable selection is incorporated into our model. In regression analysis, omitting key predictors can introduce significant bias, while including irrelevant predictors can diminish estimation efficiency; variable selection is therefore an indispensable aspect of modern statistical inference. Traditional approaches include strategies based on hypothesis testing and on information criteria such as the AIC and BIC [22]. Moreover, penalty-based techniques such as those of [23,24], ridge regression [25], the LASSO [26], the adaptive LASSO [27], and the smoothly clipped absolute deviation (SCAD) penalty [28] provide effective solutions. Variable selection in varying coefficient models (VCMs) has also been studied extensively; notable works include [29,30,31]. In particular, Ref. [30], building on the SCAD method, proposed an innovative composite penalty technique that accurately identifies the true structure of single-index varying coefficient models (SIVCMs), effectively selects key variables, and precisely estimates the unknown index parameters and coefficient functions. Given its desirable properties, such as unbiasedness, sparsity, continuity, and the oracle property, we integrate this variable selection method into our model. This integration not only enhances the predictive accuracy of the model but also ensures the efficiency and effectiveness of variable selection.
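For reference, the SCAD penalty of [28], on which the composite penalty of [30] builds, can be written down directly. The implementation below is a straightforward sketch of the penalty function itself, with the conventional choice a = 3.7.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise.

    Piecewise form: lam*|b| for |b| <= lam; a concave quadratic blend
    for lam < |b| <= a*lam; constant (a+1)*lam^2/2 for |b| > a*lam,
    so large coefficients are not shrunk (near-unbiasedness)."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b
    middle = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    flat = (a + 1) * lam**2 / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, middle, flat))
```

The flat region beyond a*lam is what distinguishes SCAD from the LASSO, whose penalty keeps growing and therefore biases large coefficients.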
This study combines varying index coefficient models with extreme value theory to construct a flexible framework for analyzing extreme events, with the aim of investigating how potential factors with nonlinear interactions affect the probability of extreme events. During model construction, we incorporate a variable selection mechanism to ensure the consistency and predictive power of the model’s estimates. The broad applicability of varying index coefficient models allows this framework to encompass mainstream regression-based extreme value analysis models, including the parametric approach of [8] and the semiparametric approach of [11], while addressing their limitations in certain scenarios. However, constructing a comprehensive theoretical framework for this model is challenging: it requires an in-depth analysis of both parametric and nonparametric estimators and the integration of more flexible tail index models, and the complex interactions between parameters and the intricacies of the technical details make the study of the asymptotic theory exceptionally demanding. Ref. [32] revealed the limitations of one-step spline approximation for deriving asymptotic distributions. We therefore adopt the two-step estimation strategy proposed by [33,34,35]: first, we approximate the nonparametric function with B-splines to obtain preliminary estimates of the parameters and the nonparametric function; then, we update the nonparametric single-index function using the B-spline back-fitted kernel smoothing (BSBK) method, thereby establishing the asymptotic properties of the nonparametric function. To validate the model’s finite-sample performance and the effectiveness of the variable selection, we conduct Monte Carlo simulations. Additionally, to demonstrate the practical application of the model, we provide a real data case study analyzing risk factors with Chinese stock market index data.
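The first step of this strategy rests on a B-spline basis expansion of the nonparametric function. The following self-contained sketch evaluates such a basis via the Cox-de Boor recursion; the knot placement shown is an illustrative choice, not the one used in our estimation procedure.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at the points x using the
    Cox-de Boor recursion; `knots` is non-decreasing with boundary knots
    repeated degree+1 times. Returns a (len(x), n_basis) design matrix."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # degree 0: indicator of each knot interval [t_i, t_{i+1})
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    last = np.nonzero(t[:-1] < t[1:])[0].max()   # last nonempty interval
    B[x == t[-1], last] = 1.0                    # include the right endpoint
    for d in range(1, degree + 1):               # raise the degree step by step
        Bn = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                Bn[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                Bn[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = Bn
    return B

# Clamped cubic basis on [0, 1] with three interior knots
knots = np.concatenate(([0.0] * 4, [0.25, 0.5, 0.75], [1.0] * 4))
grid = np.linspace(0.0, 1.0, 41)
design = bspline_basis(grid, knots)
```

Regressing the response on such a design matrix gives the preliminary spline estimate; the basis functions are nonnegative and sum to one at every point of the domain.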
This study makes significant contributions to the field in three main areas. First, we introduce a novel varying index coefficient tail index regression model (VICM-TIR) and systematically investigate its asymptotic properties, including the consistency of variable selection and the oracle property of the estimators, thereby providing a robust theoretical foundation for complex data analysis. To the best of our knowledge, previous models have not achieved this. Second, the VICM-TIR model adeptly handles nonlinear interaction effects among covariates and effectively overcomes the curse of dimensionality, demonstrating exceptional flexibility and interpretability; it encompasses the mainstream models currently documented in the literature. Finally, in practical applications, the VICM-TIR model exhibits remarkable modeling flexibility and applicability. It performs well even with small sample sizes and in high-dimensional scenarios, offering a powerful tool for complex data analysis; notably, it shows significant innovative value in the analysis of extreme events.
The subsequent sections of this paper are structured as follows: Section 2 introduces the varying index coefficient model for tail index regression, detailing the estimation procedure and the method for selecting tuning parameters. Section 3 establishes the asymptotic theory and the properties of the estimators. Section 4 presents the findings of a simulation study evaluating the model’s performance across various scenarios. Section 5 illustrates the practicality and effectiveness of the model through an analysis of real-world data. Finally, Section 6 concludes the paper, and Appendix A contains the proofs of the theorems.
3. Asymptotic Theory
In this section, we investigate the asymptotic properties of the proposed estimators. The detailed proofs, as well as the underlying assumptions, are relegated to Appendix A.
To establish notation, we designate as the true value of throughout this text. For conciseness, we introduce as a concatenation of , where denotes the j-th component, . Without loss of generality, we assume that for and for . Furthermore, denotes the nonzero varying coefficients for , nonzero constants for , and zero values for .
Carrying on with the previous notation, we define as the Jacobian matrix of size representing the partial derivative of with respect to . To simplify the notation, let . Furthermore, we define the space as the set of functions with a finite norm on the domain , where and .
To investigate the large-sample characteristics of the parameter estimators, we introduce as the vector of true parameters, where and for . For each , let be the function that satisfies the given condition: Let , and the gradient matrix of the log-TI is For any matrix , denote . Then define
Theorem 1. Suppose that Assumptions A1–A11 in Appendix A hold and the number of knots . Then, we have (i) ; (ii) , where
Theorem 2. Suppose that Assumptions A1–A11 in Appendix A hold and the number of knots . If , then, with probability approaching one, and must satisfy: (i) ; (ii) are nonzero constants for and .
Let , , and
Theorem 3. Under Assumptions A1–A11 in Appendix A and the conditions of Theorem 2, we have where and its dimension is ; is defined in (22), and is given in (26).
Theorems 1 and 2 establish the consistency of the variable selection process. Furthermore, Theorems 1, 2, and 3 collectively demonstrate the oracle property of
. Specifically, these theorems indicate that our proposed estimators achieve the optimal convergence rate and share the asymptotic distribution of estimators based on the correct submodel. Notably, Theorem 1 shows that the spline estimator obtained from the estimation procedure in (18) is a consistent estimator of . However, the asymptotic distribution of is not available. To address this issue, we employ a two-step spline backfitted local linear (SBLL) estimation method to further refine the nonparametric function . Without loss of generality, we focus on the estimation of the first nonparametric function , as the other functions can be estimated similarly. We use the spline estimates for as initial estimates and define . Then, let , and for each given , is estimated through local linear fitting as , where and is the bandwidth. We then derive the estimator by minimizing the following local kernel objective function: where is a non-negative symmetric kernel function, and . Since for are unknown, we adapt (28) by substituting the spline estimators from (18) for . This substitution is equivalent to replacing in (28) with . The resulting modified SBLL estimator is denoted by . Denote and for , and assume that the following expressions converge in probability as the sample size , that is, where is the marginal probability density function of .
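The second-step update relies on standard local linear kernel fitting. The sketch below illustrates the generic building block on a simple regression problem; in the actual SBLL procedure, the response and weights are the pilot-spline-adjusted quantities described above, so this is an illustration of the technique, not our exact estimator.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a common non-negative symmetric choice."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def local_linear(u0, u, v, h):
    """Local linear estimate of E[v | u = u0] with bandwidth h:
    weighted least squares of v on (1, u - u0) with kernel weights."""
    w = epanechnikov((u - u0) / h)
    X = np.column_stack([np.ones_like(u), u - u0])
    WX = w[:, None] * X
    # small ridge term guards against empty windows
    beta = np.linalg.solve(X.T @ WX + 1e-10 * np.eye(2), X.T @ (w * v))
    return beta[0]            # the intercept is the fitted value at u0

# Illustration: recover a smooth function from noisy observations
rng = np.random.default_rng(2)
u = rng.uniform(-1.0, 1.0, size=5_000)
v = np.sin(np.pi * u) + 0.1 * rng.normal(size=5_000)
fit = local_linear(0.5, u, v, h=0.1)
```

Unlike a local constant (Nadaraya-Watson) fit, the local linear fit removes the leading boundary bias, which is why it is the standard second-step smoother in backfitted procedures.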
Theorem 4. Suppose that Assumptions A1–A11 in Appendix A are satisfied. For , as , for any , we have where with , and defined in (29), (30), and (31), respectively.
Next, we establish the uniform oracle efficiency of the SBLL estimator . Specifically, Theorem 5 demonstrates that the absolute difference between and is uniformly bounded by . Consequently, and share the same asymptotic distribution.
Theorem 5. Under Assumptions A1–A11 in Appendix A, and , we have
Corollary 6. Under Assumptions A1–A11 in Appendix A, and , as , we have
4. Monte Carlo Studies
In this section, we evaluate the finite-sample performance of the proposed estimator through Monte Carlo simulations. Adhering to the setup outlined in [8,11], we postulate that the response variable follows a specific distribution, detailed as follows: Afterwards, let and , . By adjusting the parameter c of the slowly varying function, a diverse set of distributions for y can be generated, facilitating simulations across a range of scenarios. Specifically, the values of c were chosen as to illustrate different scenarios, while the sample size n was varied as to examine performance across data sizes. For the parametric components, the marginal distributions of and were specified as and . Additionally, the true single-index function was represented by a smooth function, defined as:
Adopting the methodology of [18], we utilize equidistant knots with the constant K set to 3. To determine the sample fraction, as suggested by [8], we analyze a set of 100 distinct values (denoted as ) along with their corresponding sample fractions ( ) distributed uniformly within the range . For each model configuration, we conduct 5000 simulation runs. The precision of estimating and is quantified using mean squared errors, where signifies the regular grid points for evaluating the function , represents the true value of as defined in (2), and denotes its estimator. For our simulation, is adopted. The results are summarized in Table 1, where the columns and report the mean numbers of correctly identified nonzero coefficients, while and report the mean numbers of incorrectly identified zero coefficients. The row “VICMTIR-VS” reports the performance of our proposed estimator with the variable selection procedure, and the row “Oracle” reports the estimator’s performance under the true model with known zero coefficients.
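For readers who wish to reproduce a simplified version of this design, Pareto-type responses with a covariate-dependent tail index can be drawn by inverse-transform sampling. The sketch below uses the exact Pareto case and a hypothetical log-linear tail index; it is not the precise slowly varying function or index structure of our simulation.

```python
import numpy as np

def sample_conditional_pareto(alpha, rng):
    """Inverse-transform sampling: given per-observation tail indices
    alpha_i, return Y_i with P(Y_i > y) = y^{-alpha_i} for y >= 1."""
    u = rng.uniform(size=len(alpha))
    return u ** (-1.0 / alpha)

rng = np.random.default_rng(3)
z = rng.uniform(size=50_000)
alpha = np.exp(0.5 + z)          # hypothetical covariate-dependent tail index
y = sample_conditional_pareto(alpha, rng)
```

Adding a slowly varying factor to the survival function (the role of the parameter c above) perturbs this exact Pareto tail and lets one study how threshold choice interacts with model bias.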
With an increasing sample size, there is a concurrent decline in mean squared errors (MSEs) and standard deviations (STDs) across diverse parameter settings for c in the slowly varying function. Simultaneously, the accuracy of variable selection is enhanced, highlighting the robustness of the model estimators. Furthermore, the third column of the table exhibits the median sample fraction derived from 5000 realizations, demonstrating a declining pattern as the sample size expands, which aligns with our expectations. Additionally, as the sample size enlarges, the variable selection method progressively approaches the performance of the oracle procedure concerning model error.
Graphically, Figure 1 depicts the bias of the nonzero parameters, while Figure 2 exhibits the bias of the zero parameters. Both figures reveal that the mean bias of the estimators is close to 0, demonstrating the effectiveness of the proposed estimation approach. Furthermore, Figure 3 shows the fitted curves of alongside the corresponding 95% pointwise confidence intervals, indicating a satisfactory fit for the nonlinear function.
Subsequently, we compare our model with alternative parametric and nonparametric models that incorporate tail indices in diverse forms. The estimation performance of these models is analyzed for both low- and high-dimensional covariates, with dimensions set to and . The parameter vector comprises , where is defined as earlier and is a d-dimensional vector. Without loss of generality, the parameter c in the slowly varying function is set to for the simulations. For the linear setting, we adopt the method from [8], using . For the single-index model (SIM), we consider the approach from [38], applying . In the fully nonparametric setting (NPM), we follow the methodology described in [39], employing kernel smoothing with the Epanechnikov kernel, assuming equal bandwidths, and utilizing . Finally, for the general varying coefficient model (VCM), we adhere to the approach from [12], implementing .
Table 2 presents the outcomes of 500 Monte Carlo simulations. Comparing the average squared errors (ASEs) across models, we observe that for the single-index model (SIM) and our estimator with variable selection (VICMTIR-VS), the ASEs do not rise substantially as the dimensionality of the covariates grows under different sample sizes. Conversely, for the fully nonparametric model and the general varying coefficient model, an appreciable increase in the ASEs is observed as the covariate dimension increases, highlighting the curse of dimensionality in high-dimensional nonparametric estimation. The oversimplified linear model, unable to capture the nonlinear effects of the factors, yields significant estimation errors. The VICMTIR-VS approach, by contrast, maintains a commendable level of model flexibility and demonstrates satisfactory estimation accuracy even with small sample sizes and large parameter dimensions.
5. Empirical Analysis
In the assessment of extreme financial events and market risks, extreme value theory (EVT) is a robust and widely acknowledged methodology for quantifying high-quantile random phenomena. Here, we deploy our model to gauge tail risk in financial markets. Specifically, we use daily trading data on the CSI 300 Index in China, spanning the period from 8 April 2010 to 1 February 2023 and comprising 2657 observations. To validate the model’s effectiveness, we allocate the initial 80% of the dataset to in-sample parameter estimation, reserving the remaining 20% for out-of-sample validation.
In selecting covariates, we acknowledge the direct influence of the index’s financial indicators on tail risk. Furthermore, given economic globalization, tail risk is also influenced by fluctuations in international markets. Consequently, we postulate that major global market indices affect the tail risk of the CSI 300 Index by influencing its underlying financial indicators. The precise definitions and settings of the variables are outlined in Table 3. The returns for each index are calculated employing the formula . The descriptive statistics in Table 4 show that most covariates exhibit skewed distributions. Therefore, following the standardization approach outlined in [8], the variables were transformed using rank transformations. Specifically, let be ’s rank in the sample ; the rank transformation then redefines (the normal score transformation).
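The normal score transformation used here is simple to reproduce. The sketch below is one common implementation, mapping ranks through Phi^{-1}(R_i/(n+1)); the exact rank convention in [8] may differ.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_score(x):
    """Rank-based normal score transform: replace each value by
    Phi^{-1}(rank / (n + 1)), mapping any sample to approximate
    standard normality while preserving the ordering."""
    r = rankdata(x)                      # average ranks in case of ties
    return norm.ppf(r / (len(x) + 1))    # n + 1 in the denominator avoids +/-inf

# A heavily skewed sample becomes symmetric after the transform
rng = np.random.default_rng(4)
x = rng.lognormal(size=10_000)
z = normal_score(x)
```

Because the transform depends only on ranks, it is insensitive to the marginal skewness and heavy tails seen in Table 4.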
First, the model is fitted on the training dataset, using the threshold selection approach outlined previously to identify the effective sample for estimation, which comprises roughly 20% of the total. The parameter estimates are presented in Table 5.
Figure 4 illustrates the estimated varying coefficient functions, revealing a noteworthy nonlinear association between the internal variables and tail risk. Notably, the influence of the internal variables exhibits a distinct interplay with the international indices. Based on the estimation results, trading volume and turnover rate have primarily negative effects: as these metrics increase, the tail index diminishes, intensifying tail risk. Conversely, trading value and the P/BV ratio have predominantly positive impacts: when these values rise, the tail index increases, mitigating tail risk.
After estimating the parameters, we derive the tail index. To evaluate the goodness-of-fit of the model, we employ the QQ-plot methodology [40], constructing a plot for the pairs where . Here, is defined as and represents the empirical distribution of . Ideally, if is large enough and the model fits well, should follow a uniform distribution on . As depicted in Figure 5, the close alignment between the 45-degree reference line (solid line) and the QQ-plot (dashed line) suggests a robust fit of our VICM-TIR model. The tail indices of the log-return distribution, shown in the right panel of Figure 6, are small, indicating heavy tails and a high likelihood of extreme losses. Notably, the estimation results indicate that tail risk had already emerged before the turbulence observed in the Chinese stock market on 19 June 2015.
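The uniformity check underlying this QQ-plot can be sketched as follows: under a well-fitting Pareto-type tail, the probability integral transform of the exceedances is approximately Uniform(0,1). The helper names and the transform (Y/omega)^{-alpha_hat} are our illustrative rendering of this kind of construction, not the exact form used in [40].

```python
import numpy as np

def pit_uniform(y, threshold, alpha_hat):
    """Probability integral transform of the exceedances: if the fitted
    Pareto tail is correct, (Y/threshold)^{-alpha_hat} is ~ Uniform(0,1)."""
    exceed = y > threshold
    return (y[exceed] / threshold) ** (-alpha_hat[exceed])

def qq_pairs(u):
    """Pairs (theoretical uniform quantile, sorted PIT value) for a QQ-plot."""
    u = np.sort(u)
    k = len(u)
    return np.column_stack([np.arange(1, k + 1) / (k + 1), u])

# Under a correctly specified tail the points hug the 45-degree line
rng = np.random.default_rng(5)
alpha = np.full(20_000, 2.0)                 # constant tail index for the demo
y = rng.uniform(size=20_000) ** (-1.0 / alpha)
pairs = qq_pairs(pit_uniform(y, threshold=np.quantile(y, 0.9), alpha_hat=alpha))
```

Plotting the second column of `pairs` against the first reproduces the diagnostic in Figure 5: systematic departures from the diagonal signal tail misspecification.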