Article

A Comprehensive Analysis of MSE in Estimating Conditional Hazard Functions: A Local Linear, Single Index Approach for MAR Scenarios

by
Abderrahmane Belguerna
1,
Hamza Daoudi
2,*,
Khadidja Abdelhak
1,
Boubaker Mechab
3,
Zouaoui Chikr Elmezouar
4 and
Fatimah Alshahrani
5
1
Department of Mathematics, Sciences Institute, S.A University Center, P.O. Box 66, Naama 45000, Algeria
2
Department of Electrical Engineering, College of Technology, Tahri Mohamed University, Al-Qanadisa Road, P.O. Box 417, Bechar 08000, Algeria
3
Laboratory of Statistics and Stochastic Processes, University of Djillali Liabes, P.O. Box 89, Sidi Bel Abbes 22000, Algeria
4
Department of Mathematics, College of Science, King Khalid University, P.O. Box 9004, Abha 61413, Saudi Arabia
5
Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(3), 495; https://doi.org/10.3390/math12030495
Submission received: 1 January 2024 / Revised: 27 January 2024 / Accepted: 29 January 2024 / Published: 4 February 2024
(This article belongs to the Section Probability and Statistics)

Abstract:
In unveiling the non-parametric estimation of the conditional hazard function through the local linear method, our study yields key insights into the method’s behavior. We present rigorous analyses demonstrating the mean square convergence of the estimator, subject to specific conditions, within the realm of independent observations with missing data. Furthermore, our contributions extend to the derivation of expressions detailing both bias and variance of the estimator. Emphasizing the practical implications, we underscore the applicability of two distinct models discussed in this paper for single index estimation scenarios. These findings not only enhance our understanding of survival analysis methodologies but also provide practitioners with valuable tools for navigating the complexities of missing data in the estimation of conditional hazard functions. Ultimately, our results affirm the robustness of the local linear method in non-parametrically estimating the conditional hazard function, offering a nuanced perspective on its performance in the challenging context of independent observations with missing data.

1. Introduction

Functional statistics analysis, a fundamental aspect of statistical modeling, becomes considerably more intricate in the presence of missing data, a widespread problem in real-life situations. The adoption of the Missing at Random (MAR) context, which posits that the probability of missingness depends only on observed values and not on the unobserved ones, has spurred the development of sophisticated approaches for obtaining precise parameter estimates.
The local linear method, known for its flexibility in capturing local data characteristics, has emerged as a promising approach in the field of survival analysis. Specifically, its use becomes crucial when dealing with the complexities linked to MAR data. As we explore this work, we are influenced by a vast amount of previous research that has contributed to the theoretical and practical basis of functional statistics and survival analysis.
In the field of time series data analysis, ref. [1] laid the foundation for determining the rates at which conditional density estimators converge uniformly and exhibit asymptotic normality in the framework of single functional index modeling. In their study, ref. [2] investigated the use of local polynomial regression with functional predictors and scalar output, highlighting the effectiveness of these approaches in capturing intricate interactions.
Barrientos-Marin [3] conducted a comprehensive investigation of locally modeled regression and functional data analysis, significantly enhancing the range of statistical approaches available for analyzing functional data. Ref. [4] demonstrated the applicability and robustness of local polynomial regression for FDA, improving its comprehension. Ref. [5] examined the stability of the properties of the conditional distribution estimator in the single-index model, with a specific emphasis on achieving high levels of uniform consistency.
The study highlighted the challenges and potential benefits of addressing such circumstances. Ref. [6] extended the local linear method to tackle regression problems with missing at random functional data, emphasizing its practical importance. The literature abounds with research that tackles the difficulties presented by missing data. Ref. [7] provided an estimation of the average value in a similar context and dataset, which served as the foundation for future advancements.
They established asymptotic results for some conditional nonparametric functional parameters in associated data, using a methodology that carries local linear concepts over to the infinite-dimensional framework, as described in [8]. Ref. [9] made notable progress in the field of local polynomial modeling, while ref. [10] made substantial contributions to the analysis of nonlinear time series. These advances have improved the statistical toolset for assessing functional data.
Ref. [11] presented a thorough and complete overview of nonparametric functional data analysis in their influential book. The work of [12] investigates mean estimation for functional covariates with randomly missing data, offering methods to estimate the mean function under the assumption of random missingness, which is crucial in statistical modeling.
As we begin our inquiry, we utilize the combined knowledge from these studies, incorporating their findings into the assessment of the mean square error linked to the local linear technique. Adopting a single index model, as proposed by [13,14], achieves a trade-off between the intricacy and comprehensibility of the model. The objective of our study is to enhance the existing statistical approaches for survival analysis when dealing with missing data that occurs randomly.
Refs. [15,16,17,18] established the asymptotic normality of the estimator of the conditional hazard function for a Hilbertian random variable X in the functional single-index model [19,20,21,22]. In the realm of statistical analysis, the challenge of handling incomplete datasets is a recurrent theme, particularly in fields where data collection is subject to uncertainty and randomness [23,24,25,26,27]. Traditional statistical methods often grapple with the complexities posed by missing data, especially when the missingness is not random [28,29,30,31,32]. This paper delves into an innovative approach, employing a local linear method within a single index model, to estimate conditional hazard functions in scenarios where data are missing at random (MAR). The methodology not only addresses the inherent challenges of MAR data but also leverages the nuances of functional statistics to provide a robust analytical framework.
Our focus on MAR contexts stems from their prevalence in real-world data scenarios, ranging from clinical trials to financial time series, where missingness is often related to observed data but not to the unobserved data. This study aims to bridge the gap in the existing literature by providing a comprehensive analysis of the effectiveness of the local linear method in such contexts through a meticulous evaluation of the mean square error (MSE) of the estimates. The MSE measures the average of the squared errors, or deviations, between estimated and true values, and a model's strength is judged by how small this error is. It is used in quantitative fields like mathematics, statistics, and engineering, especially in areas involving prediction, estimation, or modeling, and it is a powerful tool in statistical analysis, particularly in the context of Functional Data Analysis (FDA).
The objective of this research is to achieve significant advancements in the methodology and improve the discourse surrounding functional statistics techniques in various scenarios and data types. This includes addressing the issue of survival analysis with missing data.
In the subsequent sections, we present a comprehensive outline of the local linear approach and its use in estimation. Next, we introduce the single index model and discuss its significance in estimating the conditional hazard function (Section 2). Section 3 outlines the procedure for assessing the mean square error and states the assumptions required to support our main result. The application and procedures are presented in Section 4, followed by a numerical study in Section 5. Lastly, we provide a summary and propose recommendations for future studies in Section 6; supplementary results and the corresponding proofs appear in Appendix A.

2. The Model and Estimator

We consider a collection of n independent, identically distributed stochastic processes ( X i , Y i ) , 1 ≤ i ≤ n , drawn from the original random process ( X , Y ) .
The space in which these processes take values is H × R , where H denotes a separable Hilbert space equipped with the norm ∥ · ∥ derived from an inner product < · , · > .
Regarding the single index component Θ of H , we define the semi-metric d Θ for any ( x , x ′ ) ∈ H 2 as the absolute value of the inner product of the difference x − x ′ with Θ , that is, d Θ ( x , x ′ ) = | < x − x ′ , Θ > | . In this scenario, we assume that Θ ∈ H represents a structure with a single index.
The functional index Θ acts as a filter allowing the extraction of the part of X explaining the response Y , and represents a functional direction that reveals a pertinent explanation of the response variable. In other words, we assume that the conditional distribution F is differentiable with respect to x and that < Θ , e 1 > = 1 , where e 1 is the first vector of an orthonormal basis of H .
This structure determines the conditional hazard function of Y given < X , Θ > = < x , Θ > , which we denote by λ Θ x ( · ) . It can be expressed as
λ Y X ( x , Θ , y ) = λ ( y | < x , Θ > ) , ( x , y ) ∈ H × R .
We ensure the model’s capacity to be uniquely identified. This implies that for any X in H , we have
λ 1 ( y | < · , Θ > ) = λ 2 ( y | < · , Θ ′ > ) ⟹ λ 1 ≡ λ 2 and Θ = Θ ′ .
Given X = x , the conventional version F Y X ( x , Θ , y ) of the conditional distribution function of Y exists for each x ∈ N x . We aim to estimate the conditional hazard function λ Y X ( x , Θ , y ) . We also assume that F Y X ( x , Θ , y ) admits a density f Y X ( x , Θ , y ) with respect to the Lebesgue measure on R . For y ∈ R with F Y X ( x , Θ , y ) < 1 , we define the hazard function as follows:
λ Y X ( x , Θ , y ) = f Y X ( x , Θ , y ) 1 F Y X ( x , Θ , y ) ,
The local linear estimator, denoted as λ ^ Y X ( x , Θ , y ) for λ Y X ( x , Θ , y ) , in this paper will be estimated for any y R by
λ ^ Y X ( x , Θ , y ) = f ^ Y X ( x , Θ , y ) 1 F ^ Y X ( x , Θ , y ) , with F ^ Y X ( x , Θ , y ) < 1
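Before turning to the missing-data setting, the ratio defining the estimator can be checked numerically. The sketch below is purely illustrative (the function name `hazard` is our own, and the plug-in density and distribution values come from an exponential law, whose hazard is the constant rate); it is not the paper's estimation procedure, only the λ = f / ( 1 − F ) relation it rests on.

```python
import numpy as np

def hazard(f_hat, F_hat):
    """Hazard ratio lambda = f / (1 - F), defined only where F < 1."""
    f_hat = np.asarray(f_hat, dtype=float)
    F_hat = np.asarray(F_hat, dtype=float)
    if np.any(F_hat >= 1.0):
        raise ValueError("hazard is undefined where F >= 1")
    return f_hat / (1.0 - F_hat)

# Exponential(rate = 2) check: f(y) = 2 exp(-2y), F(y) = 1 - exp(-2y),
# so the hazard should be identically 2 for every y.
y = np.linspace(0.0, 3.0, 7)
lam = hazard(2.0 * np.exp(-2.0 * y), 1.0 - np.exp(-2.0 * y))
```

Any plug-in estimates f ^ and F ^ with F ^ < 1 can be passed through the same ratio.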
Next, we examine the estimator when the data are incomplete, focusing on the Missing at Random (MAR) setting for the response variables. ( X i , Δ i , Y i ) , 1 ≤ i ≤ n , represents the accessible incomplete n-sample drawn from ( X , Δ , Y ) . In this sample, X i is always observed; Δ i = 1 when Y i is observed, and Δ i = 0 when Y i is not observed. In addition, the Bernoulli variable Δ satisfies
P ( Δ = 1 | < Θ , X > ) = P ( Δ = 1 | X , Y ) = P ( X )
Given an explanatory variable X, the conditional probability of seeing the response variable Y is represented by the unknown functional operator P ( X ) . In statistical analysis involving missing data, missing at random is a commonly assumed condition that can be applied in a variety of real-world scenarios ([14]).
The function F Θ x ( · ) is treated as a nonparametric regression model with dependent variable H ( λ H − 1 ( y − Y i ) ) , where λ H is a sequence of positive real numbers and H is a cumulative distribution function. The following observation serves as the basis for this consideration:
E [ H ( λ H − 1 ( y − Y i ) ) | < Θ , X i > = < Θ , x > ] ⟶ F Y X ( x , Θ , y )   as   λ H → 0 .
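This smoothing observation can be checked by simulation. In the sketch below, H is taken to be the standard normal distribution function — one admissible choice of cumulative kernel, not the paper's prescription — and the Monte Carlo mean of H ( λ H − 1 ( y − Y i ) ) recovers the true distribution function as λ H shrinks.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=50000)          # sample whose distribution we probe

def smoothed_cdf(y, sample, lam_H):
    """Monte Carlo estimate of E[H(lam_H^{-1}(y - Y))], H = standard normal CDF."""
    z = (y - sample) / lam_H
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return float(np.mean(Phi))

# For a standard normal Y, the true F(1) is about 0.8413;
# a small lam_H makes the smoothed mean approach it.
approx = smoothed_cdf(1.0, Y, 0.05)
```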
This method is combined with the assumption that our data are missing at random. In particular, we use the functional local polynomial modeling approach, in which a ^ estimates the conditional cumulative function F ^ Y X ( x , Θ , y ) ; the parameters ( a ^ , b ^ ) are determined by optimizing the corresponding weighted least-squares problem.
The expressions can be directly calculated to yield the following results:
F ^ Y X ( x , Θ , y ) = ∑ 1 ≤ i , j ≤ n H ( λ H − 1 ( y − Y j ) ) T i j ( Θ , x ) / ∑ 1 ≤ i , j ≤ n T i j ( Θ , x ) , ∀ y ∈ R
and
f ^ Y X ( x , Θ , y ) = ∑ 1 ≤ i , j ≤ n H ( 1 ) ( λ H − 1 ( y − Y j ) ) T i j ( Θ , x ) / ( λ H ∑ 1 ≤ i , j ≤ n T i j ( Θ , x ) ) , ∀ y ∈ R
where H ( 1 ) denotes the derivative of H.
Noteworthy is the fact that
F ^ Y X ( x , Θ , y ) = F ^ N ( x , Θ , y ) / F ^ D ( Θ , x ) and f ^ Y X ( x , Θ , y ) = f ^ N ( x , Θ , y ) / F ^ D ( Θ , x )
where
F ^ N ( x , Θ , y ) = 1 ( n λ K ϕ Θ , x ( λ K ) ) 2 ∑ i , j = 1 n H ( λ H − 1 ( y − Y j ) ) T i j ( Θ , x ) , f ^ N ( x , Θ , y ) = 1 ( n λ K ϕ Θ , x ( λ K ) ) 2 λ H ∑ i , j = 1 n H ( 1 ) ( λ H − 1 ( y − Y j ) ) T i j ( Θ , x ) .
and
F ^ D ( Θ , x ) = 1 ( n λ K ϕ Θ , x ( λ K ) ) 2 ∑ i , j = 1 n T i j ( Θ , x )
with
T i j ( Θ , x ) = β Θ ( X i , x ) ( β Θ ( X i , x ) − β Θ ( X j , x ) ) Δ i Δ j K ( λ K − 1 d Θ ( x , X i ) ) K ( λ K − 1 d Θ ( x , X j ) )
where β Θ : H × H → R , β Θ ( X i , x ) = < x − X i , Θ > , K is a kernel function, and λ K = λ K , n (resp. λ H = λ H , n ) is a sequence of positive real numbers. When dealing with a single functional index and randomly missing data, our estimator can be presented as
λ ^ Y X ( x , Θ , y ) = λ H − 1 ∑ i , j = 1 n T i j ( Θ , x ) H ( 1 ) ( λ H − 1 ( y − Y j ) ) / ( ∑ i , j = 1 n T i j ( Θ , x ) − ∑ i , j = 1 n T i j ( Θ , x ) H ( λ H − 1 ( y − Y j ) ) ) , for n ≥ 1 , y ∈ R .
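To make the construction concrete, here is a minimal NumPy sketch of this estimator for discretized curves, under illustrative choices we supply ourselves: a quartic kernel for K, the Gaussian distribution function and density for H and H ( 1 ) , and MAR indicators entering through Δ i Δ j exactly as in T i j above. It is a simplified sketch under these assumptions, not the authors' implementation.

```python
import math
import numpy as np

def hazard_llsim(x, y, X, Y, delta, theta, lam_K, lam_H):
    """Local linear single-index hazard estimate at (x, y).

    X: (n, p) discretized curves; Y: (n,) responses; delta: (n,) MAR
    indicators (1 = Y observed); theta: (p,) index direction.
    """
    beta = (x - X) @ theta                     # beta_Theta(X_i, x) = <x - X_i, Theta>
    t = beta / lam_K
    K = np.where(np.abs(t) <= 1.0, (1.0 - t ** 2) ** 2, 0.0)  # quartic kernel (assumed)
    w = delta * K                              # missing responses get zero weight
    u = (y - Y) / lam_H
    H = 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))   # cumulative kernel
    H1 = np.exp(-u ** 2 / 2.0) / math.sqrt(2.0 * math.pi)          # its derivative
    # sum_i T_ij = w_j * (sum_i beta_i^2 w_i - beta_j * sum_i beta_i w_i), per j
    col = w * (np.sum(beta ** 2 * w) - beta * np.sum(beta * w))
    num = np.sum(col * H1) / lam_H             # numerator of the estimator
    den = np.sum(col) - np.sum(col * H)        # sum_ij T_ij (1 - H_j)
    return num / den

# Sanity check: Y is Exponential(1) independently of X, so the true
# conditional hazard equals 1 everywhere; ~20% of responses are masked.
rng = np.random.default_rng(3)
n = 3000
X = rng.normal(size=(n, 2))
theta = np.array([1.0, 0.5])
Y = rng.exponential(1.0, size=n)
delta = rng.binomial(1, 0.8, size=n).astype(float)
est = hazard_llsim(np.zeros(2), 1.0, X, Y, delta, theta, lam_K=1.0, lam_H=0.3)
```

The bandwidth values here are arbitrary; in practice they would be selected by cross-validation as discussed later.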

3. The Asymptotic Results

3.1. Assumptions and Necessary Background Knowledge

We make the following assumptions in order to establish the mean square convergence of λ ^ Y X ( x , Θ , y ) to λ Y X ( x , Θ , y ) :
We denote by N x a neighborhood of x ∈ F and by N y a neighborhood of y ∈ R .
In addition, C > 0 and C ′ > 0 denote generic constants.
(H1)
For any r > 0 , ϕ Θ , x ( r ) : = ϕ Θ , x ( r , r ) > 0 . There exists a function χ Θ , x ( · ) such that
∀ t ∈ ( − 1 , 1 ) , lim λ K → 0 ϕ Θ , x ( t λ K , λ K ) / ϕ Θ , x ( λ K ) = χ Θ , x ( t ) .
(H2)
We define the function ψ Θ , j l by
ψ Θ , j l ( x , y ) = ∂ l F x ( j ) ( y ) / ∂ y l
and
Ψ Θ , j l ( s ) = E [ ψ Θ , j l ( X , y ) − ∂ l F x ( j ) ( y ) / ∂ y l | β Θ ( x , X ) = s ] , l ∈ { 0 , 2 } , j = 0 , 1 ,
where g ( k ) represents the k-th order derivative of g ; the first derivative Ψ Θ , j l ′ ( 0 ) and the second derivative Ψ Θ , j l ″ ( 0 ) of Ψ Θ , j l ( · ) exist.
(H3)
Functions γ Θ ( · , · ) and β Θ ( · , · ) are such that
∀ z ∈ F , C ′ | γ Θ ( x , z ) | ≤ | β Θ ( x , z ) | ≤ C | γ Θ ( x , z ) | , with C > 0 , C ′ > 0 ,
sup u ∈ B ( x , r ) | β Θ ( u , x ) − γ Θ ( x , u ) | = o ( r )
and
λ K ∫ B ( x , λ K ) β Θ ( u , x ) d P X ( u ) = o ( ∫ B ( x , λ K ) β Θ 2 ( u , x ) d P X ( u ) )
with B ( x , r ) = { z ∈ F : | γ Θ ( z , x ) | ≤ r } .
(H4)
 
(i)
Consider the differentiable function f Y X ( x , Θ , y ) of class C k .
(ii)
The kernel K, supported within [ − 1 , 1 ] , satisfies
K 2 ( 1 ) − ∫ − 1 1 ( K 2 ( u ) ) ′ χ Θ , x ( u ) d u > 0 .
(iii)
Given a differentiable kernel H whose derivative H ( 1 ) is positive, bounded, and Lipschitz continuous, we have
∫ | t | 2 H ( 1 ) ( t ) d t < ∞ , ∫ ( H ( 1 ) ) 2 ( t ) d t < ∞ and ∫ H ( 1 ) ( t ) d t = 1 .
(H5)
∃ μ < ∞ : f Y X ( x , Θ , y ) ≤ μ , ∀ x ∈ F , ∀ y ∈ R ; ∃ η ∈ ( 0 , 1 ) : F Y X ( x , Θ , y ) ≤ 1 − η , ∀ x ∈ F , ∀ y ∈ R .
(H6)
The bandwidths λ K and λ H satisfy
lim n → ∞ λ K = 0 , lim n → ∞ λ H = 0 , and lim n → ∞ n λ H j ϕ Θ , x ( λ K ) = ∞ , for j = 0 , 1 .
(H7)
In the neighborhood of x, P ( x ) is continuous, and 0 < P ( x ) < 1 .

3.2. Main Results

Theorem 1.
Under hypotheses (H1)–(H7), we show that
E [ λ ^ Y X ( x , Θ , y ) − λ Y X ( x , Θ , y ) ] 2 = B Θ , n 2 ( x , y ) + V Θ , H K ( x , y ) / ( n λ H ϕ Θ , x ( λ K ) ) + o ( λ H 4 ) + o ( λ K 4 ) + o ( 1 / ( n λ H ϕ Θ , x ( λ K ) ) ) ,
where
B Θ , n ( x , y ) = [ ( B f , H ( Θ , y , x ) − λ Θ x ( y ) B F , H ( Θ , y , x ) ) λ H 2 + ( B f , K ( Θ , y , x ) − λ Θ x ( y ) B F , K ( Θ , y , x ) ) λ K 2 ] / ( 1 − F Θ x ( y ) ) ,
with
B f , H ( Θ , y , x ) = ( 1 / 2 ) ( ∂ 2 f Y X ( x , Θ , y ) / ∂ y 2 ) ∫ t 2 H ( 1 ) ( t ) d t , B f , K ( Θ , y , x ) = ( 1 / 2 ) Ψ 0 , 1 ″ ( 0 ) [ K ( 1 ) − ∫ − 1 1 ( u 2 K ( u ) ) ′ χ Θ , x ( u ) d u ] / [ K ( 1 ) − ∫ − 1 1 K ′ ( u ) χ Θ , x ( u ) d u ] , B F , H ( Θ , y , x ) = ( 1 / 2 ) ( ∂ 2 F Y X ( x , Θ , y ) / ∂ y 2 ) ∫ t 2 H ( 1 ) ( t ) d t , B F , K ( Θ , y , x ) = ( 1 / 2 ) Ψ 0 , 0 ″ ( 0 ) [ K ( 1 ) − ∫ − 1 1 ( u 2 K ( u ) ) ′ χ Θ , x ( u ) d u ] / [ K ( 1 ) − ∫ − 1 1 K ′ ( u ) χ Θ , x ( u ) d u ] ,
also
V Θ , H K ( x , y ) = λ Y X ( x , Θ , y ) ( 1 − F Y X ( x , Θ , y ) ) [ K 2 ( 1 ) − ∫ − 1 1 ( K 2 ( u ) ) ′ χ Θ , x ( u ) d u ] / [ K ( 1 ) − ∫ − 1 1 K ′ ( u ) χ Θ , x ( u ) d u ] 2 .
Proof of Theorem 1.
To establish Theorem 1, we employ the following decomposition:
λ ^ Y X ( x , Θ , y ) − λ Y X ( x , Θ , y ) = 1 / ( 1 − F ^ Y X ( x , Θ , y ) ) [ ( f ^ Y X ( x , Θ , y ) − f Y X ( x , Θ , y ) ) + ( f Y X ( x , Θ , y ) / ( 1 − F Y X ( x , Θ , y ) ) ) ( F ^ Y X ( x , Θ , y ) − F Y X ( x , Θ , y ) ) ] ≤ 1 / ( 1 − F ^ Y X ( x , Θ , y ) ) [ ( f ^ Y X ( x , Θ , y ) − f Y X ( x , Θ , y ) ) + ( μ / η ) ( F ^ Y X ( x , Θ , y ) − F Y X ( x , Θ , y ) ) ] ≲ ( f ^ Y X ( x , Θ , y ) − f Y X ( x , Θ , y ) ) + ( μ / η ) ( F ^ Y X ( x , Θ , y ) − F Y X ( x , Θ , y ) ) .
From Theorems 2 and 3, and the ensuing inequality, we derive the proof of Theorem 1:
∃ ϵ > 0 such that ∑ n ∈ N P ( 1 − F ^ Y X ( x , Θ , y ) < ϵ ) < ∞ .
Theorem 2.
Assuming (H1)-(H7), we arrive at
E [ f ^ Y X ( x , Θ , y ) − f Y X ( x , Θ , y ) ] 2 = B f , H 2 ( Θ , y , x ) λ H 4 + B f , K 2 ( Θ , y , x ) λ K 4 + V Θ , H K f ( x , y ) / ( n λ H ϕ Θ , x ( λ K ) ) + o ( λ H 4 ) + o ( λ K 4 ) + o ( 1 / ( n λ H ϕ Θ , x ( λ K ) ) )
where V Θ , H K f ( x , y ) = f Θ x ( y ) [ K 2 ( 1 ) − ∫ − 1 1 ( K 2 ( u ) ) ′ χ Θ , x ( u ) d u ] / [ K ( 1 ) − ∫ − 1 1 K ′ ( u ) χ Θ , x ( u ) d u ] 2 ∫ ( H ( 1 ) ( t ) ) 2 d t .
We have
f ^ N ( x , Θ , y ) = 1 / ( n ( n − 1 ) λ H E [ T Θ , 12 ( x ) ] ) ∑ 1 ≤ i ≠ j ≤ n T i j ( Θ , x ) H ( 1 ) ( λ H − 1 ( y − Y j ) )
and
f ^ D ( Θ , x ) = 1 / ( n ( n − 1 ) E [ T Θ , 12 ( x ) ] ) ∑ 1 ≤ i ≠ j ≤ n T i j ( Θ , x )
then
f ^ Y X ( x , Θ , y ) = f ^ N ( x , Θ , y ) f ^ D ( Θ , x ) .
Proof of Theorem 2.
It is derived from the intermediate results presented below.
Lemma 1.
Based on the hypotheses of Theorem 2, we obtain
E [ f ^ N ( x , Θ , y ) ] − f Y X ( x , Θ , y ) = B f , H ( Θ , y , x ) λ H 2 + B f , K ( Θ , y , x ) λ K 2 + o ( λ H 2 ) + o ( λ K 2 ) .
Lemma 2.
Under the hypotheses of Theorem 2, we have
V a r ( f ^ N ( x , Θ , y ) ) = V H K f ( Θ , y , x ) / ( n λ H ϕ Θ , x ( λ K ) ) + o ( 1 / ( n λ H ϕ Θ , x ( λ K ) ) ) .
Lemma 3.
Again, under the hypotheses of Theorem 2, we have
C o v ( f ^ N ( x , Θ , y ) , f ^ D ( Θ , x ) ) = O ( 1 / ( n ϕ Θ , x ( λ K ) ) ) .
Lemma 4.
Likewise, under the hypotheses of Theorem 2, we obtain
V a r ( f ^ D ( Θ , x ) ) = O ( 1 / ( n ϕ Θ , x ( λ K ) ) ) .
Theorem 3.
Assuming (H1)-(H7), we arrive at
E [ F ^ Y X ( x , Θ , y ) − F Y X ( x , Θ , y ) ] 2 = B F , H 2 ( Θ , y , x ) λ H 4 + B F , K 2 ( Θ , y , x ) λ K 4 + V Θ , H K F ( x , y ) / ( n ϕ Θ , x ( λ K ) ) + o ( λ H 4 ) + o ( λ K 4 ) + o ( 1 / ( n ϕ Θ , x ( λ K ) ) )
where V Θ , H K F ( x , y ) = F Y X ( x , Θ , y ) ( 1 − F Y X ( x , Θ , y ) ) [ K 2 ( 1 ) − ∫ − 1 1 ( K 2 ( u ) ) ′ χ Θ , x ( u ) d u ] / [ K ( 1 ) − ∫ − 1 1 K ′ ( u ) χ Θ , x ( u ) d u ] 2 .
We remark that
F ^ Y X ( x , Θ , y ) = F ^ N ( x , Θ , y ) / f ^ D ( Θ , x )
where
F ^ N ( x , Θ , y ) = 1 / ( n ( n − 1 ) E [ T 12 ( Θ , x ) ] ) ∑ 1 ≤ i ≠ j ≤ n T i j ( Θ , x ) H ( λ H − 1 ( y − Y j ) ) .
Proof of Theorem 3.
The proof relies on the following lemmas.
Lemma 5.
Based on Theorem 3’s hypotheses, we obtain
E [ F ^ N ( x , Θ , y ) ] − F Y X ( x , Θ , y ) = B F , H ( Θ , y , x ) λ H 2 + B F , K ( Θ , y , x ) λ K 2 + o ( λ H 2 ) + o ( λ K 2 ) .
Lemma 6.
Again, under the hypotheses of Theorem 3, we obtain
V a r ( F ^ N ( x , Θ , y ) ) = V H K F ( Θ , y , x ) / ( n ϕ Θ , x ( λ K ) ) + o ( 1 / ( n ϕ Θ , x ( λ K ) ) ) .
Lemma 7.
Likewise, under the hypotheses of Theorem 3, we have
C o v ( F ^ N ( x , Θ , y ) , f ^ D ( Θ , x ) ) = O ( 1 / ( n ϕ Θ , x ( λ K ) ) ) .
Theorems 1–3 follow from these lemmas, whose proofs appear in Appendix A.

4. Application

In the ever-evolving field of finance, understanding the complex relationships between various financial indicators and stock prices is crucial for effective risk management and investment decision making. This scenario explores the application of statistical methods to model the dependence structure between stock prices and estimate the conditional hazard function of a financial event, such as a market downturn, using a single functional index.
To embark on this modeling journey, we begin with a dataset comprising daily stock prices of multiple companies observed over a certain period. The dataset includes two main variables:
  • X: a set of financial indicators representing the health and performance metrics of each company.
  • Y: daily stock returns of each company, which are indicative of the daily price changes.
The primary objective is to capture the common underlying factor that affects stock prices across different companies using a single functional index, denoted as Θ . This index aims to condense the information contained in the financial indicators X and express the conditional hazard function λ Y X ( x , Θ , y ) . This function represents the likelihood of a significant stock price movement given the financial indicators.
In real-world financial datasets, missing data can be prevalent due to various reasons, such as holidays, weekends, or incomplete data collection. The study addresses this issue by considering the missing data mechanism as missing at random (MAR), ensuring that the statistical modeling takes into account the potential biases introduced by missing observations.
Estimation: To estimate the conditional hazard function, the study employs the local linear estimator, denoted as λ ^ Y X ( x , Θ , y ) . This estimator provides a robust and efficient way to estimate the hazard function, and it is adapted for the infinite-dimensional framework.
The study aims to establish the mean square convergence of the estimator to the true conditional hazard function. This is essential for validating the reliability and consistency of the estimator under specified assumptions.
In practice, the application of this study involves several key steps:
  • Handling missing data: missing stock returns are addressed using the assumed missing at random (MAR) mechanism, ensuring that the analysis is not biased by the absence of certain data points.
  • Standardization: financial indicators and stock returns are standardized to achieve uniformity in their scales, making them directly comparable and easing the modeling process.
  • Kernel and bandwidth selection: choosing an appropriate kernel function and bandwidth is crucial for the local linear estimator's performance, since these choices impact the smoothing and accuracy of the hazard function estimation.
  • Estimating the single index: the study estimates the single functional index Θ , which encapsulates the common factor affecting stock prices across companies; this index is central to the dependence modeling process.
  • Utilizing the local linear estimator: the local linear estimator λ ^ Y X ( x , Θ , y ) is implemented to calculate the estimated conditional hazard function for various financial scenarios, allowing a nuanced understanding of how different combinations of financial indicators and Θ influence the likelihood of significant stock movements.
  • Interpreting the hazard function: the estimated hazard function λ ^ Y X ( x , Θ , y ) is interpreted to gain insights into the impact of financial indicators on the likelihood of significant stock price movements; researchers and analysts can identify patterns and trends in the data and make informed inferences about the factors driving stock price changes.
  • Ensuring reliability: validation confirms the asymptotic properties of the estimator, i.e., that it converges to the true conditional hazard function as the sample size increases, reassuring stakeholders that the model can be trusted for decision making in the financial domain.
In conclusion, this example showcases how statistical methods can be applied in finance to model the dependence structure between stock prices, estimate the conditional hazard function, and handle missing data. The results of such analyses can provide valuable insights for risk management, investment strategies, and understanding the dynamics of financial markets. This multidisciplinary approach illustrates the power of statistical modeling in making sense of complex financial data and facilitating informed decision making in the world of finance.

5. Numerical Study

In this section, we demonstrate our proposed methodology’s practical application in finance. We conduct a numerical study focused on estimating the conditional hazard function. We employ the local linear method for this purpose, utilizing financial time series data as the basis for our analysis.

5.1. Data Generation and Preprocessing

Data

To initiate our analysis, we embark on the generation of synthetic financial datasets that mimic the behavior of stock returns for an extensive portfolio of 100 companies. These datasets span a comprehensive timeframe encompassing 200 trading days. We create these datasets under the assumption of a normal distribution, adhering to statistical principles commonly observed in financial markets. This synthetic data generation process enables us to create realistic and diverse financial scenarios, laying the foundation for our subsequent analytical investigations.
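This generation step can be sketched as follows. The study specifies only normality, 100 companies, and 200 trading days; the drift and volatility values below are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_companies = 200, 100
mu, sigma = 0.0005, 0.02   # assumed daily drift and volatility (illustrative)

# Daily returns for each company, drawn i.i.d. from a normal distribution.
returns = rng.normal(loc=mu, scale=sigma, size=(n_days, n_companies))
```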

5.2. Plotting Data

For a comprehensive visual understanding of the stock return dynamics in our dataset, we employ a graphical representation Figure 1. This entails the creation of line plots, with the x-axis representing the trading days and the y-axis depicting the stock returns. Each line plot corresponds to a specific company within our portfolio of 100 entities, effectively providing a visual snapshot of their individual return patterns over the 200-day period. This graphical representation enables us to discern trends, fluctuations, and potential anomalies in the stock returns, facilitating a quick and intuitive overview of the dataset’s behavior. By visualizing the data in this manner, we gain essential insights into the diverse performance trajectories of the companies under consideration, setting the stage for more in-depth analysis and interpretation.

5.2.1. Introducing Missing Values

In our pursuit of realism and to mirror the intricacies of real-world financial data, we intentionally introduce missing values into our synthetic dataset. To emulate the unpredictability of data collection processes, we randomly designate approximately 20 % of the dataset as missing values. This stochastic introduction of missing data aligns with the Missing at Random (MAR) assumption, suggesting that the likelihood of data being missing is dependent on other observable variables within the dataset rather than being systematically determined. By incorporating this missing data mechanism, we aim to ensure that our analysis remains robust and relevant to the complexities often encountered in financial datasets. This simulated scenario allows us to assess the methodology’s effectiveness in handling and analyzing data with missing values, a common challenge in practical financial analysis.
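One way to sketch this masking step is shown below. Making the observation probability depend on an always-observed covariate (rather than on the missing value itself) is what keeps the mechanism MAR rather than MCAR; the covariate, the 0.85/0.70 observation probabilities, and the calibration to roughly 20% missingness are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
X = rng.normal(size=n)                       # always-observed covariate
Y = 0.5 * X + rng.normal(scale=0.5, size=n)  # response, subject to missingness

# MAR: the chance of observing Y depends only on the observed X;
# the constants are tuned to give roughly 20% missing entries.
p_obs = np.where(np.abs(X) > 1.0, 0.70, 0.85)
delta = rng.binomial(1, p_obs)               # 1 = observed, 0 = missing
Y_obs = np.where(delta == 1, Y, np.nan)
missing_rate = 1.0 - float(delta.mean())
```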

5.2.2. Single Index

In our analytical framework, we adopt a straightforward yet powerful approach by assuming a single index model (SIM). In this model, we express the interdependence between various financial indicators and stock returns through a linear relationship, succinctly represented by the vector Θ . This single index (see Figure 2) acts as a composite factor, capturing the common underlying driver that influences stock prices across the multiple companies in our dataset. While its simplicity may seem unassuming, the SIM has proven to be a valuable tool for modeling complex financial relationships, as it offers an elegant means of summarizing diverse indicators into a single, interpretable index. This assumption serves as the cornerstone of our study, enabling us to efficiently examine the impact of these financial indicators on the likelihood of significant stock movements.
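A minimal sketch of this SIM structure is given below. The coefficient values and the tanh link are illustrative assumptions; fixing the first coordinate of Θ to 1 mirrors the identification condition < Θ , e 1 > = 1 stated earlier.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))                   # p financial indicators per observation
theta = np.array([1.0, 0.6, 0.4, 0.2, 0.1])   # index direction, first coordinate fixed to 1

index = X @ theta                             # the single index <X_i, Theta>
# The response depends on the indicators only through the index.
Y = np.tanh(index) + 0.1 * rng.normal(size=n)
corr = float(np.corrcoef(Y, index)[0, 1])
```

A strong correlation between Y and the index, despite Y never seeing the individual indicators directly, is exactly the dimension reduction the SIM buys.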

5.3. Estimation of Conditional Hazard Function

To quantify the likelihood of significant stock movements within our dataset, we employ the estimation of the conditional hazard function, a pivotal aspect of our analysis. Our chosen methodology for this estimation task involves the utilization of the local linear approach. Specifically, we harness the “KDEMultivariateConditional” function, readily available within the versatile “statsmodels” library. This function equips us with the necessary tools to perform efficient, data-driven estimations of the conditional hazard function, considering both the financial indicators and stock returns. By applying the local linear method and this dedicated function, we leverage the framework of kernel density estimation to produce robust hazard function estimates. This estimation process aids us in uncovering valuable insights into the likelihood of significant stock movements and their dependency on the financial indicators, thus contributing to informed decision making in financial analysis.
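As a transparent stand-in for that library call, the sketch below estimates the same quantity with plain NumPy: Nadaraya-Watson kernel weights in the covariate give a conditional density and distribution function at a point, whose ratio yields the hazard. The function name `cond_hazard`, the Gaussian kernels, and the bandwidth values are our own illustrative choices, not the paper's or the library's.

```python
import math
import numpy as np

def cond_hazard(x0, y0, X, Y, hx, hy):
    """Kernel estimate of the conditional hazard f(y0|x0) / (1 - F(y0|x0))."""
    wx = np.exp(-0.5 * ((X - x0) / hx) ** 2)   # Gaussian weights in the covariate
    u = (y0 - Y) / hy
    f = np.sum(wx * np.exp(-0.5 * u ** 2)) / (hy * math.sqrt(2.0 * math.pi) * np.sum(wx))
    F = np.sum(wx * 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))) / np.sum(wx)
    return f / (1.0 - F)

# Sanity check: Y | X is Exponential(1) regardless of X,
# so the true conditional hazard is 1 everywhere.
rng = np.random.default_rng(5)
X = rng.normal(size=4000)
Y = rng.exponential(1.0, size=4000)
est = cond_hazard(0.0, 1.0, X, Y, hx=0.5, hy=0.25)
```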

5.4. Mean Square Error Evaluation

In our quest for reliable estimations of the conditional hazard function, it is essential to gauge the accuracy and performance of our estimator. To achieve this, we employ a quantitative metric known as the Mean Square Error (MSE). This evaluation process involves a thorough comparison between the hazard values estimated through our model and the ground truth values, which we assume for the purposes of simulation. The MSE serves as a pivotal yardstick, quantifying the extent to which our model aligns with the true hazard values. This numerical assessment provides a clear and objective measure of the goodness of fit for our model. By calculating the MSE, we gain valuable insights into the efficacy of our estimation approach, helping us to fine-tune and enhance our model’s performance and, subsequently, our ability to make informed financial decisions based on these estimations.
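The comparison itself is a one-line computation; the sketch below uses made-up estimated and true hazard values purely for illustration.

```python
import numpy as np

def mse(estimated, truth):
    """Mean square error between estimated and true values."""
    estimated = np.asarray(estimated, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimated - truth) ** 2))

# Illustrative values: four estimates of a hazard that is truly 1 everywhere.
err = mse([1.1, 0.9, 1.0, 1.05], [1.0, 1.0, 1.0, 1.0])
```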

5.5. Discussion

The obtained results, including the line plot of stock returns in Figure 3, the estimated hazard function, and the MSE, provide valuable insights into the effectiveness of the proposed methodology in modeling conditional hazards for financial time series data. These findings pave the way for further applications in risk management and decision-making processes within the finance domain. This section provides a detailed overview of each step in the numerical study, allowing readers to understand the process and interpret the results in the context of financial applications.

6. Conclusions

In summary, this study effectively demonstrates the use of a local linear method within a single index model for estimating conditional hazard functions, especially in contexts involving missing at random (MAR) data. This approach, grounded in the fundamentals of functional statistics, has shown its efficacy, particularly when evaluated through mean square error analysis. The results highlight the method’s capability in handling incomplete datasets, a common challenge in practical data analysis. Future research should focus on broadening the scope of this approach to diverse datasets, which would enhance the understanding of its applicability and generalizability. Investigating the method’s robustness across different missing data mechanisms, including MCAR (missing completely at random) and NMAR (not missing at random) scenarios, is also essential. Furthermore, examining the impact of varying sample sizes and the dimensionality of functional predictors is crucial for a deeper understanding of the method’s performance. The inclusion of time-varying covariates in future models could provide a more dynamic representation of data, addressing the evolving nature of many real-world scenarios. Exploring complex index structures and integrating machine learning techniques with traditional statistical models could lead to more robust and adaptive analytical tools. As data volumes continue to grow, enhancing computational efficiency remains a priority, necessitating the development of more sophisticated algorithms. These advancements are not just methodological enhancements but steps towards making functional data analysis more relevant and impactful in practical applications.

Author Contributions

All authors have made significant contributions to this manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by two funding sources: (1) Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R358), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia and (2) the Deanship of Scientific Research at King Khalid University, which provided a grant (R.G.P. 1/366/44) for a Small Group Research Project.

Data Availability Statement

Data sharing not applicable to this article.

Acknowledgments

The authors thank and extend their appreciation to the funders of this work: (1) Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R358), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; (2) The Deanship of Scientific Research at King Khalid University through the Research Groups Program under grant number R.G.P. 1/366/44.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The following notation will be used throughout this Appendix, for any $x\in\mathcal{F}$ and a generic constant $C>0$. For $i,j=1,\dots,n$, we write $K_{i}=K\big(\lambda_{K}^{-1}\gamma_{\Theta}(X_{i},x)\big)$, $T_{\Theta,ij}=T_{ij}(\Theta,x)$, $H_{j}=H\big(\lambda_{H}^{-1}(y-Y_{j})\big)$, and $H_{j}^{(1)}=H^{(1)}\big(\lambda_{H}^{-1}(y-Y_{j})\big)$.
Proof of Theorem 2.
We compute the bias and the variance separately:
\[
\mathbb{E}\Big[\hat{f}^{Y|X}(x,\Theta,y)-f^{Y|X}(x,\Theta,y)\Big]^{2}=\Big(\mathbb{E}\big[\hat{f}^{Y|X}(x,\Theta,y)\big]-f^{Y|X}(x,\Theta,y)\Big)^{2}+\mathrm{Var}\big(\hat{f}^{Y|X}(x,\Theta,y)\big).
\]
We use the classical computation of the bias and variance terms of the estimator. The bias term is first expanded for all $z\geq 0$ and $p\in\mathbb{N}^{*}$; taking $z=\hat{f}_{\Theta,D}^{x}-\mathbb{E}[\hat{f}_{\Theta,D}^{x}]$ and $p=1$ in this expansion, we get
\[
\begin{aligned}
\mathbb{E}\big[\hat{f}^{Y|X}(x,\Theta,y)\big]-f^{Y|X}(x,\Theta,y)
&=\frac{\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}
-\frac{\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}\\
&\quad+\frac{\mathbb{E}\big[\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)^{2}\,\hat{f}^{Y|X}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}-f^{Y|X}(x,\Theta,y)\\
&=\frac{\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}-\frac{A_{1}}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}+\frac{A_{2}}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}-f^{Y|X}(x,\Theta,y).
\end{aligned}
\]
Since $H$ is bounded, $\hat{f}^{Y|X}(x,\Theta,y)$ is bounded by $C/\lambda_{H}$ for some constant $C>0$, and
\[
A_{1}=\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)\big]=\mathrm{Cov}\big(\hat{f}_{N}(x,\Theta,y),\hat{f}_{D}(\Theta,x)\big),
\]
\[
A_{2}=\mathbb{E}\big[\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)^{2}\,\hat{f}^{Y|X}(x,\Theta,y)\big]=\mathrm{Var}\big(\hat{f}_{D}(\Theta,x)\big)\,O\big(\lambda_{H}^{-1}\big).
\]
Finally, we set
\[
\hat{B}_{n,1}(\Theta,x)=\frac{\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}-f^{Y|X}(x,\Theta,y).
\]
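The decomposition above is the exact sample identity MSE = bias² + variance; a quick simulation with a deliberately biased estimator (our toy example, unrelated to the paper's data) confirms it numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0
# 10,000 replications of a biased estimator: 0.9 times the mean of 20 Exp(theta) draws
est = 0.9 * rng.exponential(theta, size=(10_000, 20)).mean(axis=1)
mse = np.mean((est - theta) ** 2)
bias_sq = (est.mean() - theta) ** 2
var = est.var()  # ddof=0, so the identity holds exactly for sample moments
assert abs(mse - (bias_sq + var)) < 1e-9
```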
Proof of Lemma 1.
\[
\hat{B}_{n,1}(\Theta,x)=\frac{\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big]-f^{Y|X}(x,\Theta,y)\,\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}.
\]
Then,
\[
\hat{B}_{n,1}(\Theta,x)=\frac{\mathbb{E}\big[T_{\Theta,12}\big(\lambda_{H}^{-1}H_{2}-f^{Y|X}(x,\Theta,y)\big)\big]}{\mathbb{E}[T_{\Theta,12}]}.
\]
Since the $(X_{i},\Delta_{i},Y_{i})$ are identically distributed, we get
\[
\hat{B}_{n,1}(\Theta,x)=\frac{\mathbb{E}\big[T_{\Theta,12}\big(\lambda_{H}^{-1}\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]-f^{Y|X}(x,\Theta,y)\big)\big]}{\mathbb{E}[T_{\Theta,12}]}.
\]
To evaluate the conditional expectation $\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]$, we employ the standard change of variable $t=(y-z)/\lambda_{H}$:
\[
\lambda_{H}^{-1}\,\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]=\int_{\mathbb{R}}H(t)\,f^{\Theta,X_{2}}(y-\lambda_{H}t)\,dt.
\]
Assuming (H4) and applying a Taylor expansion, we derive
\[
\lambda_{H}^{-1}\,\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]=f^{\Theta,X_{2}}(y)+\frac{\lambda_{H}^{2}}{2}\int t^{2}H(t)\,dt\;\frac{\partial^{2}f^{\Theta,X_{2}}(y)}{\partial y^{2}}+o(\lambda_{H}^{2}),
\]
which we can rewrite as
\[
\lambda_{H}^{-1}\,\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]=\psi_{0}(X_{2},\Theta,y)+\frac{\lambda_{H}^{2}}{2}\int t^{2}H(t)\,dt\;\psi_{2}(X_{2},\Theta,y)+o(\lambda_{H}^{2}).
\]
Consequently, from (A4), we get
\[
\hat{B}_{n,1}(\Theta,x)=\frac{1}{\mathbb{E}[T_{\Theta,12}]}\left(\frac{\lambda_{H}^{2}}{2}\int t^{2}H(t)\,dt\;\mathbb{E}\big[T_{\Theta,12}\,\psi_{2}(X_{2},\Theta,y)\big]+\mathbb{E}\big[T_{\Theta,12}\big(\psi_{0}(X_{2},\Theta,y)-f^{Y|X}(x,\Theta,y)\big)\big]\right)+o(\lambda_{H}^{2}).
\]
In accordance with [12] for the regression estimation, we prove that
\[
\begin{aligned}
\mathbb{E}\big[T_{\Theta,12}\,\psi_{2}(X_{2},\Theta,y)\big]
&=\psi_{2}(x,\Theta,y)\,\mathbb{E}[T_{\Theta,12}]+\mathbb{E}\big[T_{\Theta,12}\big(\psi_{2}(X_{2},\Theta,y)-\psi_{2}(x,\Theta,y)\big)\big]\\
&=\psi_{2}(x,\Theta,y)\,\mathbb{E}[T_{\Theta,12}]+\mathbb{E}\big[T_{\Theta,12}\,\mathbb{E}\big(\psi_{2}(X_{2},\Theta,y)-\psi_{2}(x,\Theta,y)\mid\beta_{\Theta}(X_{2},x)\big)\big]\\
&=\psi_{2}(x,\Theta,y)\,\mathbb{E}[T_{\Theta,12}]+\mathbb{E}\big[T_{\Theta,12}\,\Psi_{\Theta,2}\big(\beta_{\Theta}(X_{2},x)\big)\big].
\end{aligned}
\]
Having noted that $\Psi_{\Theta,l}(0)=0$ and $\mathbb{E}\big[\beta_{\Theta}(X_{2},x)\,T_{\Theta,12}\big]=0$, a second-order Taylor expansion of $\Psi_{\Theta,l}$ around $0$ gives
\[
\mathbb{E}\big[T_{\Theta,12}\,\psi_{2}(X_{2},\Theta,y)\big]=\frac{1}{2}\,\Psi''_{\Theta,2}(0)\,\mathbb{E}\big[\beta_{\Theta}^{2}(X_{2},x)\,T_{\Theta,12}\big]+\psi_{2}(x,\Theta,y)\,\mathbb{E}[T_{\Theta,12}]+o\big(\mathbb{E}\big[\beta_{\Theta}^{2}(X_{2},x)\,T_{\Theta,12}\big]\big).
\]
Thus, treating the term in $\psi_{0}$ in the same way,
\[
\frac{\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}=f^{Y|X}(x,\Theta,y)+\frac{\lambda_{H}^{2}}{2}\,\frac{\partial^{2}f^{Y|X}(x,\Theta,y)}{\partial y^{2}}\int t^{2}H(t)\,dt+o(\lambda_{H}^{2})+\frac{\Psi''_{\Theta,0}(0)}{2}\,\frac{\mathbb{E}\big[\beta_{\Theta}^{2}(X_{2},x)\,T_{\Theta,12}\big]}{\mathbb{E}[T_{\Theta,12}]}+o\!\left(\frac{\mathbb{E}\big[\beta_{\Theta}^{2}(X_{2},x)\,T_{\Theta,12}\big]}{\mathbb{E}[T_{\Theta,12}]}\right).
\]
Given that the $(X_{i},\Delta_{i},Y_{i})$ are identically distributed, we can write
\[
\begin{aligned}
\mathbb{E}\big[\beta_{\Theta}^{2}(X_{2},x)\,T_{\Theta,12}\big]
&=\mathbb{E}\Big[\beta_{\Theta}^{2}(X_{2},x)\big(\Delta_{1}\beta_{\Theta}^{2}(X_{1},x)K_{1}(\Theta,x)\,\Delta_{2}K_{2}(\Theta,x)-\Delta_{1}\beta_{\Theta}(X_{1},x)K_{1}(\Theta,x)\,\Delta_{2}\beta_{\Theta}(X_{2},x)K_{2}(\Theta,x)\big)\Big]\\
&=\mathbb{E}\Big[\mathbb{E}\Big(\beta_{\Theta}^{2}(X_{2},x)\big(\Delta_{1}\beta_{\Theta}^{2}(X_{1},x)K_{1}(\Theta,x)\,\Delta_{2}K_{2}(\Theta,x)-\Delta_{1}\beta_{\Theta}(X_{1},x)K_{1}(\Theta,x)\,\Delta_{2}\beta_{\Theta}(X_{2},x)K_{2}(\Theta,x)\big)\,\Big|\,\langle\Theta,X_{2}\rangle\Big)\Big].
\end{aligned}
\]
Under the MAR mechanism, we have
\[
\mathbb{P}(\Delta_{2}=1\mid\langle\Theta,X_{2}\rangle)=P(X_{2})\quad\text{and}\quad\mathbb{P}(\Delta_{1}=1\mid\langle\Theta,X_{1}\rangle)=P(X_{1}).
\]
Subsequently,
\[
\mathbb{E}\big[\beta_{\Theta}^{2}(X_{2},x)\,T_{\Theta,12}\big]=\big(P(X_{1})+o(1)\big)\big(P(X_{2})+o(1)\big)\,C\Big(\big(\mathbb{E}[K_{\Theta,1}\beta_{\Theta,1}^{2}]\big)^{2}-\mathbb{E}[K_{\Theta,1}\beta_{\Theta,1}]\,\mathbb{E}[K_{\Theta,1}\beta_{\Theta,1}^{3}]\Big),
\]
and, assuming (H4), we get for all $a>0$
\[
\mathbb{E}\big[K_{\Theta,1}^{a}\beta_{\Theta,1}\big]\leq C\int_{B(x,\lambda_{K})}\beta_{\Theta}(u,x)\,dP_{X}(u).
\]
Applying the last part of assumption (H3), we thus obtain
\[
\lambda_{K}\,\mathbb{E}\big[K_{\Theta,1}^{a}\beta_{\Theta,1}\big]=o\!\left(\int_{B(x,\lambda_{K})}\beta_{\Theta}^{2}(u,x)\,dP_{X}(u)\right)=o\big(\lambda_{K}^{2}\,\phi_{\Theta,x}(\lambda_{K})\big).
\]
Thus, it is evident that
\[
\mathbb{E}\big[K_{\Theta,1}^{a}\beta_{\Theta,1}\big]=o\big(\lambda_{K}\,\phi_{\Theta,x}(\lambda_{K})\big).
\]
Furthermore, for every $b>1$, we are able to write
\[
\mathbb{E}\big[K_{\Theta,1}^{a}\beta_{\Theta,1}^{b}\big]=\mathbb{E}\big[K_{\Theta,1}^{a}\gamma_{\Theta}^{b}(x,X)\big]+\mathbb{E}\big[K_{\Theta,1}^{a}\big(\beta_{\Theta}^{b}(x,X)-\gamma_{\Theta}^{b}(x,X)\big)\big],
\]
and the terms on the right-hand side can be computed as in [30,31]. So,
\[
\mathbb{E}\big[\hat{f}_{N}(x,\Theta,y)\big]=f^{Y|X}(x,\Theta,y)+\frac{\lambda_{H}^{2}}{2}\,\frac{\partial^{2}f^{Y|X}(x,\Theta,y)}{\partial y^{2}}\int t^{2}H(t)\,dt+o(\lambda_{H}^{2})+\frac{\lambda_{K}^{2}}{2}\,\Psi''_{\Theta,0}(0)\,\frac{K(1)-\int_{-1}^{1}\big(u^{2}K(u)\big)^{(1)}\chi_{\Theta,x}(u)\,du}{K(1)-\int_{-1}^{1}K^{(1)}(u)\,\chi_{\Theta,x}(u)\,du}+o(\lambda_{K}^{2}).
\]
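The ratio of kernel functionals multiplying $\lambda_{K}^{2}$ in this expansion can be evaluated numerically once $K$ and $\chi_{\Theta,x}$ are specified. The sketch below does so for a quadratic kernel and a hypothetical linear choice of $\chi_{\Theta,x}$; both choices are ours, purely for illustration:

```python
import numpy as np

u = np.linspace(-1.0, 1.0, 200_001)
du = u[1] - u[0]
K = 0.75 * (1.0 - u**2)            # quadratic kernel on [-1, 1] (illustrative)
chi = (u + 1.0) / 2.0              # hypothetical choice for chi_{Theta,x}

def trap(f):
    # trapezoidal rule on the grid u
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * du)

d_u2K = np.gradient(u**2 * K, du)  # (u^2 K(u))^(1)
d_K = np.gradient(K, du)           # K^(1)(u)
num = K[-1] - trap(d_u2K * chi)    # K(1) - int (u^2 K)' chi
den = K[-1] - trap(d_K * chi)      # K(1) - int K' chi
ratio = num / den                  # multiplies (lambda_K^2 / 2) Psi''(0) in the bias
```

For these particular choices the ratio can also be obtained in closed form by integration by parts (it equals 0.2), which makes the sketch easy to check.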
Proof of Lemma 2.
We have
\[
\begin{aligned}
\mathrm{Var}\big(\hat{f}_{N}(x,\Theta,y)\big)&=\frac{1}{\big(n(n-1)\lambda_{H}\,\mathbb{E}[T_{\Theta,12}]\big)^{2}}\,\mathrm{Var}\Big(\sum_{1\leq i\neq j\leq n}T_{\Theta,ij}H_{j}\Big)\\
&=\frac{1}{\big(n(n-1)\lambda_{H}\,\mathbb{E}[T_{\Theta,12}]\big)^{2}}\Big[n(n-1)\,\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]+n(n-1)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}H_{2}H_{1}\big]\\
&\quad+n(n-1)(n-2)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}H_{2}H_{3}\big]+n(n-1)(n-2)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}H_{2}H_{3}\big]\\
&\quad+n(n-1)(n-2)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}H_{2}H_{1}\big]+n(n-1)(n-2)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}H_{2}^{2}\big]\\
&\quad-n(n-1)(4n-6)\,\big(\mathbb{E}[T_{\Theta,12}H_{2}]\big)^{2}\Big].
\end{aligned}
\]
Next,
\[
\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]=\mathbb{E}\Big[\big(\Delta_{1}\beta_{\Theta}^{2}(X_{1},x)K_{1}(\Theta,x)\,\Delta_{2}K_{2}(\Theta,x)-\Delta_{1}\beta_{\Theta}(X_{1},x)K_{1}(\Theta,x)\,\Delta_{2}\beta_{\Theta}(X_{2},x)K_{2}(\Theta,x)\big)^{2}H_{2}^{2}\Big].
\]
From (A2), we can derive
\[
\begin{aligned}
\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]
&=\mathbb{E}\Big[\Delta_{1}\Delta_{2}\big(\beta_{\Theta,1}^{4}K_{\Theta,1}^{2}K_{\Theta,2}^{2}H_{2}^{2}+\beta_{\Theta,1}^{2}K_{\Theta,1}^{2}\beta_{\Theta,2}^{2}K_{\Theta,2}^{2}H_{2}^{2}-2\beta_{\Theta,1}^{3}K_{\Theta,1}^{2}\beta_{\Theta,2}K_{\Theta,2}^{2}H_{2}^{2}\big)\Big]\\
&=\big(P(X_{1})+o(1)\big)\big(P(X_{2})+o(1)\big)\Big(\mathbb{E}\big[\beta_{\Theta,1}^{4}K_{\Theta,1}^{2}K_{\Theta,2}^{2}H_{2}^{2}\big]+\mathbb{E}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}^{2}\beta_{\Theta,2}^{2}K_{\Theta,2}^{2}H_{2}^{2}\big]-2\,\mathbb{E}\big[\beta_{\Theta,1}^{3}K_{\Theta,1}^{2}\beta_{\Theta,2}K_{\Theta,2}^{2}H_{2}^{2}\big]\Big).
\end{aligned}
\]
Assuming that (H2)–(H5) hold, we get
\[
\mathbb{E}\big[\beta_{\Theta,1}^{4}K_{\Theta,1}^{2}K_{\Theta,2}^{2}H_{2}^{2}\big]=\mathbb{E}\big[\beta_{\Theta,1}^{4}K_{\Theta,1}^{2}K_{\Theta,2}^{2}\,\mathbb{E}\big(H_{2}^{2}\mid\langle\Theta,X_{2}\rangle\big)\big]\leq C\lambda_{H}\,\mathbb{E}\big[\beta_{\Theta,1}^{4}K_{\Theta,1}^{2}\big]\,\mathbb{E}\big[K_{\Theta,2}^{2}\big]\leq C\lambda_{H}\lambda_{K}^{4}\,\phi_{\Theta,x}^{2}(\lambda_{K}),
\]
so that $\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]=O\big(\lambda_{K}^{4}\lambda_{H}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big)$. Performing identical computations, we evaluate the remaining terms on the right-hand side of Equation (A4):
\[
\begin{aligned}
&\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]=O\big(\lambda_{K}^{4}\lambda_{H}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big),\qquad\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}H_{2}H_{1}\big]=O\big(\lambda_{K}^{4}\lambda_{H}^{2}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}H_{2}H_{3}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}H_{2}H_{1}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}H_{2}H_{3}\big]=O\big(\lambda_{K}^{4}\lambda_{H}^{2}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}H_{2}^{2}\big]=\mathbb{E}^{2}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}\big]\,\mathbb{E}\big[K_{\Theta,1}^{2}H_{1}^{2}\big]+o\big(\lambda_{K}^{4}\lambda_{H}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big).
\end{aligned}
\]
The dominant term is the last one; substituting it into (A4) yields
\[
\mathrm{Var}\big(\hat{f}_{N}(x,\Theta,y)\big)\sim\frac{n(n-1)(n-2)}{\big(n(n-1)\lambda_{H}\,\mathbb{E}[T_{\Theta,12}]\big)^{2}}\,\mathbb{E}^{2}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}\big]\,\mathbb{E}\big[K_{\Theta,1}^{2}H_{1}^{2}\big].
\]
Following the arguments of the preceding lemma, it suffices to write
\[
\mathrm{Var}\big(\hat{f}_{N}(x,\Theta,y)\big)=\frac{\mathbb{E}\big[K_{\Theta,1}^{2}H_{1}^{2}\big]}{n\big(\lambda_{H}\,\mathbb{E}[K_{\Theta,1}]\big)^{2}}+o\!\left(\frac{1}{n\lambda_{H}\,\phi_{\Theta,x}(\lambda_{K})}\right).
\]
So
\[
\mathbb{E}\big[K_{\Theta,1}^{2}H_{1}^{2}\big]=\mathbb{E}\big[K_{\Theta,1}^{2}\,\mathbb{E}\big(H_{1}^{2}\mid\langle\Theta,X_{1}\rangle\big)\big]=\mathbb{E}\left[K_{\Theta,1}^{2}\int H^{2}\Big(\frac{y-z}{\lambda_{H}}\Big)f^{\Theta,X_{1}}(z)\,dz\right],
\]
with
\[
\mathbb{E}\big(H_{1}^{2}\mid\langle\Theta,X_{1}\rangle\big)=\lambda_{H}\int H^{2}(t)\,f^{\Theta,X_{1}}(y-\lambda_{H}t)\,dt.
\]
Next, we obtain the following by applying a first-order Taylor expansion of $f^{\Theta,X_{1}}(\cdot)$:
\[
f^{\Theta,X_{1}}(y-\lambda_{H}t)=f^{\Theta,X_{1}}(y)+O(\lambda_{H})=f^{\Theta,X_{1}}(y)+o(1).
\]
Consequently, (A5) implies that
\[
\mathbb{E}\big[K_{\Theta,1}^{2}H_{1}^{2}\big]=\lambda_{H}\int H^{2}(t)\,dt\;\mathbb{E}\big[K_{\Theta,1}^{2}\,f^{\Theta,X_{1}}(y)\big]+o\big(\lambda_{H}\,\mathbb{E}[K_{\Theta,1}^{2}]\big).
\]
Using the same procedures as in the proof of Lemma 1, we once more obtain
\[
\mathbb{E}\big[K_{\Theta,1}^{2}\,f^{\Theta,X_{1}}(y)\big]=f^{\Theta,x}(y)\,\mathbb{E}\big[K_{\Theta,1}^{2}\big]+o\big(\mathbb{E}[K_{\Theta,1}^{2}]\big).
\]
That means
\[
\mathbb{E}\big[K_{\Theta,1}^{2}H_{1}^{2}\big]=\lambda_{H}\,f^{\Theta,x}(y)\,\mathbb{E}\big[K_{\Theta,1}^{2}\big]\int H^{2}(t)\,dt+o\big(\lambda_{H}\,\mathbb{E}[K_{\Theta,1}^{2}]\big).
\]
Thus, from (A4)–(A6), we have
\[
\mathrm{Var}\big(\hat{f}_{N}(x,\Theta,y)\big)=\frac{f^{Y|X}(x,\Theta,y)}{n\lambda_{H}\,\phi_{\Theta,x}(\lambda_{K})}\int H^{2}(t)\,dt\;\frac{K^{2}(1)-\int_{-1}^{1}\big(K^{2}(u)\big)^{(1)}\chi_{\Theta,x}(u)\,du}{\Big(K(1)-\int_{-1}^{1}K^{(1)}(u)\,\chi_{\Theta,x}(u)\,du\Big)^{2}}+o\!\left(\frac{1}{n\lambda_{H}\,\phi_{\Theta,x}(\lambda_{K})}\right).
\]
Proof of Lemma 3.
A basic computation yields
\[
\begin{aligned}
\mathrm{Cov}\big(\hat{f}_{N}(x,\Theta,y),\hat{f}_{D}(\Theta,x)\big)&=\frac{1}{\big(n(n-1)\big)^{2}\lambda_{H}\big(\mathbb{E}[T_{\Theta,12}]\big)^{2}}\,\mathrm{Cov}\Big(\sum_{1\leq i\neq j\leq n}T_{\Theta,ij}H_{j},\sum_{1\leq i\neq j\leq n}T_{\Theta,ij}\Big)\\
&=\frac{1}{\big(n(n-1)\big)^{2}\lambda_{H}\big(\mathbb{E}[T_{\Theta,12}]\big)^{2}}\Big[n(n-1)\,\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}\big]+n(n-1)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}H_{2}\big]\\
&\quad+n(n-1)(n-2)\Big(\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}H_{2}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}H_{2}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}H_{2}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}H_{2}\big]\Big)\\
&\quad-n(n-1)(4n-6)\,\mathbb{E}\big[T_{\Theta,12}H_{2}\big]\,\mathbb{E}\big[T_{\Theta,12}\big]\Big].
\end{aligned}
\]
Through an adjustment, we obtain
\[
\begin{aligned}
&\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}H_{2}\big]=O\big(\lambda_{K}^{4}\lambda_{H}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}H_{2}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}H_{2}\big]=O\big(\lambda_{K}^{4}\lambda_{H}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}H_{2}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}H_{2}\big]=O\big(\lambda_{K}^{4}\lambda_{H}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big).
\end{aligned}
\]
Given that $\mathbb{E}[T_{\Theta,12}]=O\big(\lambda_{K}^{2}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big)$, we get
\[
\mathrm{Cov}\big(\hat{f}_{N}(x,\Theta,y),\hat{f}_{D}(\Theta,x)\big)=O\!\left(\frac{1}{n\,\phi_{\Theta,x}(\lambda_{K})}\right).
\]
Proof of Lemma 4.
The result follows by replacing $H_{j}$ with $1$ throughout the proof of the previous lemma. Consequently,
\[
\begin{aligned}
\mathrm{Var}\big(\hat{f}_{D}(\Theta,x)\big)&=\frac{1}{\big(n(n-1)\,\mathbb{E}[T_{\Theta,12}]\big)^{2}}\,\mathrm{Var}\Big(\sum_{1\leq i\neq j\leq n}T_{\Theta,ij}\Big)\\
&=\frac{1}{\big(n(n-1)\,\mathbb{E}[T_{\Theta,12}]\big)^{2}}\Big[n(n-1)\,\mathbb{E}\big[T_{\Theta,12}^{2}\big]+n(n-1)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}\big]\\
&\quad+n(n-1)(n-2)\Big(\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}\big]\Big)\\
&\quad-n(n-1)(4n-6)\,\big(\mathbb{E}[T_{\Theta,12}]\big)^{2}\Big].
\end{aligned}
\]
With a simple adjustment, we obtain
\[
\begin{aligned}
&\mathbb{E}\big[T_{\Theta,12}^{2}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}\big]=O\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}\big]=O\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}\big]=\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}\big]=O\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big).
\end{aligned}
\]
Thus, we have
\[
\mathrm{Var}\big(\hat{f}_{D}(\Theta,x)\big)=O\!\left(\frac{1}{n\,\phi_{\Theta,x}(\lambda_{K})}\right).
\]
Proof of Theorem 3.
The proof follows the same steps as that of Theorem 2. We write
\[
\mathbb{E}\Big[\hat{F}^{Y|X}(x,\Theta,y)-F^{Y|X}(x,\Theta,y)\Big]^{2}=\Big(\mathbb{E}\big[\hat{F}^{Y|X}(x,\Theta,y)\big]-F^{Y|X}(x,\Theta,y)\Big)^{2}+\mathrm{Var}\big(\hat{F}^{Y|X}(x,\Theta,y)\big).
\]
Using the results in [12], we simplify the bias and variance of the terms on the right to obtain
\[
\begin{aligned}
\mathbb{E}\big[\hat{F}^{Y|X}(x,\Theta,y)\big]-F^{Y|X}(x,\Theta,y)
&=\frac{\mathbb{E}\big[\hat{F}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}-\frac{\mathbb{E}\big[\hat{F}_{N}(x,\Theta,y)\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}\\
&\quad+\frac{\mathbb{E}\big[\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)^{2}\,\hat{F}^{Y|X}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}-F^{Y|X}(x,\Theta,y)\\
&=\frac{\mathbb{E}\big[\hat{F}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}-\frac{A_{1}}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}+\frac{A_{2}}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]^{2}}-F^{Y|X}(x,\Theta,y).
\end{aligned}
\]
The boundedness of the kernel $H$ allows $\hat{F}^{Y|X}(x,\Theta,y)$ to be bounded by a constant $C>0$. Consequently,
\[
\begin{aligned}
A_{1}&=\mathbb{E}\big[\hat{F}_{N}(x,\Theta,y)\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)\big]=\mathrm{Cov}\big(\hat{F}_{N}(x,\Theta,y),\hat{f}_{D}(\Theta,x)\big),\\
A_{2}&=\mathbb{E}\big[\big(\hat{f}_{D}(\Theta,x)-\mathbb{E}[\hat{f}_{D}(\Theta,x)]\big)^{2}\,\hat{F}^{Y|X}(x,\Theta,y)\big]=\mathrm{Var}\big(\hat{f}_{D}(\Theta,x)\big)\,O(1),
\end{aligned}
\]
and we set
\[
\hat{B}_{n,2}(\Theta,x)=\frac{\mathbb{E}\big[\hat{F}_{N}(x,\Theta,y)\big]}{\mathbb{E}\big[\hat{f}_{D}(\Theta,x)\big]}-F^{Y|X}(x,\Theta,y).
\]
Proof of Lemma 5.
We have
\[
\hat{B}_{n,2}(\Theta,x)=\frac{\mathbb{E}\big[T_{\Theta,12}\big(H_{2}-F^{\Theta,x}(y)\big)\big]}{\mathbb{E}[T_{\Theta,12}]}.
\]
Because the $(X_{i},\Delta_{i},Y_{i})$ have the same distribution, we have
\[
\hat{B}_{n,2}(\Theta,x)=\frac{\mathbb{E}\big[T_{\Theta,12}\big(\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]-F^{\Theta,x}(y)\big)\big]}{\mathbb{E}[T_{\Theta,12}]}.
\]
An integration by parts leads us to write
\[
\mathbb{E}[H_{2}\mid\langle\Theta,X_{2}\rangle]=\int_{\mathbb{R}}H^{(1)}(t)\,F^{\Theta,X_{2}}(y-\lambda_{H}t)\,dt.
\]
We then repeat the steps used in studying $\hat{B}_{n,1}(\Theta,x)$ to demonstrate that
\[
\hat{B}_{n,2}(\Theta,x)=\frac{\lambda_{H}^{2}}{2}\,\frac{\partial^{2}F^{\Theta,x}(y)}{\partial y^{2}}\int t^{2}H^{(1)}(t)\,dt+o(\lambda_{H}^{2})+\frac{\lambda_{K}^{2}}{2}\,\Psi''_{\Theta,0}(0)\,\frac{K(1)-\int_{-1}^{1}\big(u^{2}K(u)\big)^{(1)}\chi_{\Theta,x}(u)\,du}{K(1)-\int_{-1}^{1}K^{(1)}(u)\,\chi_{\Theta,x}(u)\,du}+o(\lambda_{K}^{2}).
\]
Proof of Lemma 6.
It is obvious that
\[
\begin{aligned}
\mathrm{Var}\big(\hat{F}_{N}(x,\Theta,y)\big)&=\frac{1}{\big(n(n-1)\,\mathbb{E}[T_{\Theta,12}]\big)^{2}}\Big[n(n-1)\,\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]+n(n-1)\,\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}H_{2}H_{1}\big]\\
&\quad+n(n-1)(n-2)\Big(\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}H_{2}H_{3}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}H_{2}H_{3}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}H_{2}H_{1}\big]+\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}H_{2}^{2}\big]\Big)\\
&\quad-n(n-1)(4n-6)\,\big(\mathbb{E}[T_{\Theta,12}H_{2}]\big)^{2}\Big].
\end{aligned}
\]
Using the same procedures as in Lemma 1, we get for these terms
\[
\begin{aligned}
&\mathbb{E}\big[T_{\Theta,12}^{2}H_{2}^{2}\big]=O\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big),\qquad\mathbb{E}\big[T_{\Theta,12}T_{\Theta,21}H_{1}H_{2}\big]=O\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,13}H_{2}H_{3}\big]=\big(F^{\Theta,x}(y)\big)^{2}\,\mathbb{E}\big[\beta_{\Theta,1}^{4}K_{\Theta,1}^{2}\big]\,\mathbb{E}^{2}\big[K_{\Theta,1}\big]+o\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,23}H_{2}H_{3}\big]=\big(F^{\Theta,x}(y)\big)^{2}\,\mathbb{E}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}\big]\,\mathbb{E}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}^{2}\big]\,\mathbb{E}\big[K_{\Theta,1}\big]+o\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,31}H_{2}H_{1}\big]=\big(F^{\Theta,x}(y)\big)^{2}\,\mathbb{E}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}\big]\,\mathbb{E}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}^{2}\big]\,\mathbb{E}\big[K_{\Theta,1}\big]+o\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}T_{\Theta,32}H_{2}^{2}\big]=F^{\Theta,x}(y)\,\mathbb{E}^{2}\big[\beta_{\Theta,1}^{2}K_{\Theta,1}\big]\,\mathbb{E}\big[K_{\Theta,1}^{2}\big]+o\big(\lambda_{K}^{4}\,\phi_{\Theta,x}^{3}(\lambda_{K})\big),\\
&\mathbb{E}\big[T_{\Theta,12}H_{2}\big]=O\big(\lambda_{K}^{2}\,\phi_{\Theta,x}^{2}(\lambda_{K})\big).
\end{aligned}
\]
As a result, (A7) and (A8) imply that
\[
\mathrm{Var}\big(\hat{F}_{N}(x,\Theta,y)\big)=\frac{F^{Y|X}(x,\Theta,y)\big(1-F^{Y|X}(x,\Theta,y)\big)}{n}\,\frac{\mathbb{E}\big[K_{\Theta,1}^{2}\big]}{\big(\mathbb{E}[K_{\Theta,1}]\big)^{2}}+o\!\left(\frac{1}{n\,\phi_{\Theta,x}(\lambda_{K})}\right).
\]
Furthermore,
\[
\mathrm{Var}\big(\hat{F}_{N}(x,\Theta,y)\big)=\frac{F^{Y|X}(x,\Theta,y)\big(1-F^{Y|X}(x,\Theta,y)\big)}{n\,\phi_{\Theta,x}(\lambda_{K})}\,\frac{K^{2}(1)-\int_{-1}^{1}\big(K^{2}(u)\big)^{(1)}\chi_{\Theta,x}(u)\,du}{\Big(K(1)-\int_{-1}^{1}K^{(1)}(u)\,\chi_{\Theta,x}(u)\,du\Big)^{2}}+o\!\left(\frac{1}{n\,\phi_{\Theta,x}(\lambda_{K})}\right).
\]
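The factor $F(1-F)$ in the leading variance term is the familiar binomial variance of an empirical distribution function evaluated at a point. A quick unconditional simulation (ours, with Exp(1) data rather than the paper's functional setting) illustrates the $F(1-F)/n$ scaling:

```python
import numpy as np

rng = np.random.default_rng(2)
n, y0 = 400, 1.0
F_true = 1.0 - np.exp(-y0)                 # Exp(1) cdf at y0
# 20,000 replications of the empirical cdf at y0 from n i.i.d. draws
reps = np.mean(rng.exponential(size=(20_000, n)) <= y0, axis=1)
emp_var = reps.var()
theory = F_true * (1.0 - F_true) / n       # F(1 - F) / n
```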

References

  1. Attaoui, S. Strong uniform consistency rates and asymptotic normality of conditional density estimator in the single functional index modeling for time series data. Adv. Stat. Anal. 2014, 98, 257–286. [Google Scholar] [CrossRef]
  2. Baillo, A.; Grané, A. Local linear regression for functional predictor and scalar response. J. Multivar. Anal. 2009, 100, 102–111. [Google Scholar] [CrossRef]
  3. Barrientos-Marin, J.; Ferraty, F.; Vieu, P. Locally modelled regression and functional data. J. Nonparametric Stat. 2010, 22, 617–632. [Google Scholar] [CrossRef]
  4. Berlinet, A.; Elamine, A.; Mas, A. Local linear regression for functional data. Ann. Inst. Stat. Math. 2011, 63, 1047–1075. [Google Scholar] [CrossRef]
  5. Bouchentouf, A.; Djebbouri, T.; Rabhi, A.; Sabri, K. Strong uniform consistency rates of some characteristics of the conditional distribution estimator in the functional single-index model. Appl. Math. 2014, 41, 301–322. [Google Scholar] [CrossRef]
  6. Benchiha, A.; Kaid, Z. Local linear estimate for functional regression with missing data at random. Int. J. Math. Stat. 2018, 19, 22–33. [Google Scholar]
  7. Cheng, P.E. Nonparametric estimation of mean functionals with data missing at random. J. Am. Stat. Assoc. 1994, 89, 81–87. [Google Scholar] [CrossRef]
  8. Demongeot, J.; Laksaci, A.; Madani, F.; Rachdi, M. Local linear estimation of the conditional density for functional data. Comptes Rendus Math. 2010, 348, 931–934. [Google Scholar] [CrossRef]
  9. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Monographs on Statistics and Applied Probability 66; Chapman and Hall/CRC: Boca Raton, FL, USA, 1996. [Google Scholar]
  10. Fan, J.; Yao, Q. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer Series in Statistics; Springer: New York, NY, USA, 2003. [Google Scholar]
  11. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006. [Google Scholar]
  12. Ferraty, F.; Mas, A.; Vieu, P. Nonparametric regression on functional data: Inference and practical aspects. Aust. N. Z. J. Stat. 2007, 49, 267–286. [Google Scholar] [CrossRef]
  13. Ferraty, F.; Park, J.; Vieu, P. Estimation of a functional single index model. In Recent Advances in Functional Data Analysis and Related Topics, Contribution to Statistics, Physica; Physica-Verlag HD: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  14. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2002. [Google Scholar]
  15. Boj, E.; Delicado, P.; Fortiana, J. Distance-based local linear regression for functional predictors. Comput. Stat. Data Anal. 2010, 54, 429–437. [Google Scholar] [CrossRef]
  16. Belabbaci, O.; Rabhi, A.; Soltani, S. Strong Uniform Consistency of Hazard Function with Functional Explicatory Variable in Single Functional Index Model under Censored Data. Appl. Appl. Math. Int. J. (AAM) 2015, 10, 114–138. [Google Scholar]
  17. Bouiadjra, H.B. Conditional hazard function estimate for functional data with missing at random. Int. J. Stat. Econ. 2017, 18, 45–58. [Google Scholar]
  18. El Methni, M.; Rachdi, M. Local weighted average estimation of the regression operator for functional data. Commun. Stat.-Theory Methods 2011, 40, 3141–3153. [Google Scholar] [CrossRef]
  19. Efromovich, S. Nonparametric regression with responses missing at random. J. Stat. Plan. Inference 2011, 141, 3744–3752. [Google Scholar] [CrossRef]
  20. Ferraty, F.; Peuch, A.; Vieu, P. Modèle à indice fonctionnel simple. C. R. Acad. Sci. Paris 2003, 336, 1025–1028. [Google Scholar] [CrossRef]
  21. Ferraty, F.; Sued, M.; Vieu, P. Mean estimation with data missing at random for functional covariables. Statistics 2013, 47, 688–706. [Google Scholar] [CrossRef]
  22. Fetitah, O.; Attouch, M.K.; Khardani, S.; Righi, A. Mean estimation with data missing at random for functional covariables. Metrika 2023, 86, 889–929. [Google Scholar] [CrossRef]
  23. Hamidi, N.; Mechab, B. Estimation of the Conditional Quantile for Functional Stationary Ergodic Data with Responses Missing at Random. J. Probab. Stat. Sci. 2018, 16, 131–149. [Google Scholar]
  24. Kenouza, J.; Mechab, B.; Benaissa, S. Functional local linear estimate of the conditional hazard function with missing at random. Int. J. Appl. Math. Stat. 2019, 58, 115–121. [Google Scholar]
  25. Kadiri, N.; Rabhi, A.; Bouchentouf, A. Strong uniform consistency rates of conditional quantile estimation in the single functional index model under random censorship. Depend. Model. 2018, 6, 197–227. [Google Scholar] [CrossRef]
  26. Ling, N.; Liang, L.; Vieu, P. Nonparametric regression estimation for functional stationary ergodic data with missing at random. J. Stat. Plan. Inference 2015, 162, 75–87. [Google Scholar] [CrossRef]
  27. Ling, N.; Liu, Y.; Vieu, P. Conditional mode estimation for functional stationary ergodic data with responses missing at random. Statistics 2016, 50, 991–1013. [Google Scholar] [CrossRef]
  28. Roussas, G. Hazard rate estimation under dependence conditions. J. Stat. Plan. Inference 1989, 22, 81–93. [Google Scholar] [CrossRef]
  29. Demongeot, J.; Laksaci, A.; Rachdi, M.; Rahmani, S. On the Local Linear Modelization of the Conditional Distribution for Functional Data. Sankhya A 2014, 76, 328–355. [Google Scholar] [CrossRef]
  30. Rachdi, M.; Laksaci, A.; Demongeot, J.; Abdali, A.; Madani, F. Theoretical and practical aspects of the quadratic error in the local linear estimation of the conditional density for functional data. Comput. Stat. Data Anal. 2014, 73, 53–68. [Google Scholar] [CrossRef]
  31. Merouan, T.; Massim, I.; Mechab, B. Quadratic error of the conditional hazard function in the local linear estimation for functional data. Int. J. Afr. Stat. 2018, 13, 1759–1777. [Google Scholar] [CrossRef]
  32. Tabti, H.; Ait Saidi, A. Estimation and simulation of conditional hazard function in the quasi-associated framework when the observations are linked via a functional single-index structure. Commun. Stat.-Theory Methods 2018, 47, 816–838. [Google Scholar]
Figure 1. Stock return data.
Figure 2. Single index structure of the α's and β's.
Figure 3. Censored and uncensored hazard function estimate.