
Change Point Test for Length-Biased Lognormal Distribution under Random Right Censoring

Mei Li, Wei Ning and Yubin Tian
1 Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China
2 Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
3 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(11), 1760; https://doi.org/10.3390/math12111760
Submission received: 22 April 2024 / Revised: 21 May 2024 / Accepted: 28 May 2024 / Published: 5 June 2024

Abstract

The length-biased lognormal distribution is the length-biased version of the lognormal distribution, developed to model length-biased lifetime data arising, for example, in biological investigations, medical research, and engineering. Owing to the presence of censoring in lifetime data, we study the change-point-testing problem for the length-biased lognormal distribution under random censoring in this paper. A procedure based on the modified information criterion is developed to detect changes in the parameters of this distribution. Under the sufficient condition that the Fisher information matrix is positive definite, it is proven that the null asymptotic distribution of the test statistic is a chi-square distribution. In order to evaluate the uncertainty of the change point location estimate, a way of calculating the coverage probabilities and average lengths of confidence sets for the change point location, based on the profile likelihood and the deviance function, is proposed. Simulations are conducted under uniform censoring and exponential censoring to investigate the validity of the proposed method. The results indicate that the proposed approach performs better than the method based on the likelihood ratio test in terms of test power, coverage probabilities, and average lengths of confidence sets. Subsequently, the proposed approach is applied to the analysis of survival data from heart transplant patients, and the results show that there are differences in the median survival time after heart transplantation among patients of different ages.

1. Introduction

In fields such as medical research [1], survival analysis [2,3], and reliability studies [4], censored data often arise when the lifetime model is a length-biased distribution. Typically, censoring can be classified into left censoring, interval censoring, and right censoring; right censoring can be further categorized into Type I censoring, Type II censoring, and random censoring, with random censoring being the most common type of right censoring [5,6]. Random censoring refers to situations where the total period of observation is fixed but subjects enter the study at different points in time; some subjects experience the event of interest, some are lost to follow-up, and others are still alive at the end of the study period. Under random censoring, the censored subjects do not all have the same censoring time [5]. In this case, if the censored observations are deleted, or if the censoring times are treated as complete lifetimes, important numerical characteristics such as the median or mean may be underestimated. Therefore, analyzing right-censored data in a reasonable manner is of significant importance for exploring the underlying patterns. In this article, we focus on random right-censored data.
The length-biased lognormal distribution is the length-biased version of the lognormal distribution. Its probability density function is defined as follows:
g(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^{2}}\left[\log x - (\mu + \sigma^{2})\right]^{2} \right\},
where $x > 0$, $-\infty < \mu < \infty$, $\sigma > 0$. For convenience, it is briefly denoted as $LBLN(\mu, \sigma)$. Because the length-biased lognormal distribution is important in describing the characteristics of lifetime distributions, it has been studied extensively. Sansgiry and Akman considered it for product life modeling and deduced its corresponding reliability function [7,8]. Ratnaparkhi and Naik-Nimbalkar studied the estimation problems of the length-biased lognormal distribution [9]. The existing literature primarily focuses on the statistical properties [7], parameter estimation [9], and practical applications of the length-biased lognormal distribution [8]. However, there is relatively limited research on the parameter change point test for this distribution under random censoring.
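As a quick check (our verification, not part of the original text), this density is exactly the size-biased weighting $x f(x;\mu,\sigma)/E(X)$ of the lognormal density $f(x;\mu,\sigma)$, whose mean is $\exp(\mu + \sigma^{2}/2)$:
\frac{x\,f(x;\mu,\sigma)}{E(X)} = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(\log x-\mu)^{2}}{2\sigma^{2}}-\mu-\frac{\sigma^{2}}{2}\right\} = \frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left\{-\frac{\left[\log x-(\mu+\sigma^{2})\right]^{2}}{2\sigma^{2}}\right\} = g(x),
since $-\frac{(\log x-\mu)^{2}}{2\sigma^{2}}-\mu-\frac{\sigma^{2}}{2} = -\frac{[\log x-(\mu+\sigma^{2})]^{2}}{2\sigma^{2}}-\log x$. In particular, if $X \sim LBLN(\mu,\sigma)$, then $\log X \sim N(\mu+\sigma^{2}, \sigma^{2})$, a fact used repeatedly below.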
The problem of change point detection originates from the field of industrial quality control [10] and has been one of the hot topics in statistical research since it was proposed. The research content mainly includes two aspects: one is to test whether a change point exists, and the other is to estimate the number and locations of change points when they do. Following over six decades of research and development, the study of the change point problem has been widely applied in epidemiology, biology, environmental science, and reliability engineering [11,12,13]. For censored data, a few studies have focused on change point testing and estimation for linear transformation models and hazard function parameters based on right-censored data. For instance, Kosorok and Song [14] utilized a score test statistic to investigate the estimation and testing problems of change points for regression coefficients in a linear transformation model for right-censored survival data. Rabhi and Asgharian [15] studied the problem of change point estimation for the hazard function under biased sampling and right-censored data. Chen et al. [16] proposed a detection procedure based on the empirical likelihood for changes in mean residual life functions with right-censored data.
While some of the literature has studied change point detection procedures for linear transformation models and interesting reliability parameters with right-censored data, research on change point testing for length-biased lognormal distribution with random right-censored data has been limited. In this change point detection problem, we can approach it as a model selection problem, where we aim to select the better option between the null hypothesis with no change and the alternative hypothesis with at least one change. Therefore, methods commonly used for model selection can be applied to change-point-testing problems.
Compared to the usual model selection problem, the change point problem introduces a special parameter: the change location. When the change point occurs near the middle of the process, there are no redundant parameters, making it easier to detect the change point. However, when it occurs near the beginning or the end of the sequence, the parameters of one segment of the data (the first part when it is near the beginning, the second part when it is near the end) become redundant. The traditional change point detection method is based on the likelihood ratio test (LRT), which does not consider the contribution of the change location.
To address this limitation, we utilize the modified information criterion (MIC), which accounts for the influence of the change point position on model complexity through the penalty term, to detect the change point in the length-biased lognormal distribution with random right-censored data. Moreover, the asymptotic properties of MIC-based test statistics under random right censoring have not been studied yet. This paper therefore investigates change point detection for the length-biased lognormal distribution under random right censoring.
The rest of this paper is organized as follows. In Section 2, a change-point-testing model for the parameters of the length-biased lognormal distribution is presented, which is based on random right censoring. A corresponding change point test method is introduced, and sufficient conditions for the positive definiteness of the Fisher information matrix under random right-censored data are provided, along with the asymptotic distribution of the test statistic. Then, a technique to calculate the coverage probabilities and average lengths of confidence sets of change point location based on profile likelihood function and deviation function is proposed. Simulations are carried out, with censoring rates of 10% and 30% under uniform and exponential censoring distributions, to indicate the performance of the detecting procedures in Section 3. In Section 4, we illustrate our method on a survival time dataset in heart transplant patients. Some discussion is provided in Section 5.

2. Methodology

2.1. Change Point Model

Suppose $T_1, T_2, \ldots, T_n$ are independent and identically distributed (i.i.d.) survival times. A common feature of survival data is the presence of right censoring. In the presence of censoring, we only observe $(U_1, \delta_1), (U_2, \delta_2), \ldots, (U_n, \delta_n)$, where $(U_i, \delta_i)$ is obtained as follows,
(U_i, \delta_i) = \left(\min\{T_i, C_i\},\ I(T_i \le C_i)\right), \quad i = 1, 2, \ldots, n,
where the $C_i$ are the potential censoring times of the $n$ subjects, which are often treated as random variables with distribution function $H(\eta)$ in statistical inference. We assume that $\{T_1, T_2, \ldots, T_n\}$ is independent of $\{C_1, C_2, \ldots, C_n\}$.
If the independent random variables $T_1, T_2, \ldots, T_n$ follow a length-biased lognormal distribution, that is, $T_i \sim LBLN(\mu_i, \sigma_i)$, then the density of the length-biased lognormal distribution for the random right-censored data pair $(U_i, \delta_i)$ is
p(u_i) = g(u_i)^{\delta_i}\, S_g(u_i)^{1-\delta_i},
where
g(u_i) = \frac{1}{u_i \sigma \sqrt{2\pi}} \exp\left\{ -\frac{\left[\log u_i - (\mu + \sigma^{2})\right]^{2}}{2\sigma^{2}} \right\}, \qquad
S_g(u_i) = 1 - \Phi\left( \frac{\log u_i - (\mu + \sigma^{2})}{\sigma} \right).
We are interested in testing the null hypothesis of no change in $\theta_i = (\mu_i, \sigma_i)$ of the length-biased lognormal distribution with random right censoring
H_0: \theta_1 = \theta_2 = \cdots = \theta_n = \theta
against the following alternative hypothesis
H_1: \theta_1 = \theta_2 = \cdots = \theta_{k_1} \neq \theta_{k_1+1} = \cdots = \theta_{k_2} \neq \theta_{k_2+1} \neq \cdots \neq \theta_{k_p+1} = \cdots = \theta_n,
where $k_1, k_2, \ldots, k_p$ are the locations of the $p$ changes in the sequence. Since the multiple change point problem can be reduced to a single change point problem by the binary segmentation method [17], we consider testing the null hypothesis $H_0$ against the following alternative hypothesis:
H_1: \theta_L = \theta_1 = \theta_2 = \cdots = \theta_k \neq \theta_{k+1} = \cdots = \theta_n = \theta_R,
where k is the unknown change point location, and θ L and θ R are the parameters before and after the change point, respectively.
Then, the likelihood function under the null hypothesis is
L_0(\theta) = \prod_{i=1}^{n} g(u_i)^{\delta_i} S_g(u_i)^{1-\delta_i} \cdot \prod_{i=1}^{n} h(u_i)^{1-\delta_i}\left[1 - H(u_i)\right]^{\delta_i}
 = \prod_{i=1}^{n}\left\{ \frac{1}{u_i \sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2\sigma^{2}}\left(\log u_i - (\mu+\sigma^{2})\right)^{2} \right] \right\}^{\delta_i} \prod_{i=1}^{n}\left\{ 1 - \Phi\left( \frac{\log u_i - (\mu+\sigma^{2})}{\sigma} \right) \right\}^{1-\delta_i} \prod_{i=1}^{n} h(u_i)^{1-\delta_i}\left[1 - H(u_i)\right]^{\delta_i},
where $h(\cdot)$ is the probability density function of the censoring distribution and $H(\cdot)$ is its distribution function. The log-likelihood function after removing the constant term is
l_0(\theta) = -\log\sigma \sum_{i=1}^{n}\delta_i - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\delta_i\left[\log u_i - (\mu+\sigma^{2})\right]^{2} + \sum_{i=1}^{n}(1-\delta_i)\log\left\{1 - \Phi\left(\frac{\log u_i - (\mu+\sigma^{2})}{\sigma}\right)\right\}.
Similarly, the log-likelihood function under the alternative hypothesis is
l_1(\theta_L, \theta_R) = -\log\sigma_L \sum_{i=1}^{k}\delta_i - \frac{1}{2\sigma_L^{2}}\sum_{i=1}^{k}\delta_i\left[\log u_i - (\mu_L+\sigma_L^{2})\right]^{2} + \sum_{i=1}^{k}(1-\delta_i)\log\left\{1 - \Phi\left(\frac{\log u_i - (\mu_L+\sigma_L^{2})}{\sigma_L}\right)\right\} - \log\sigma_R \sum_{i=k+1}^{n}\delta_i - \frac{1}{2\sigma_R^{2}}\sum_{i=k+1}^{n}\delta_i\left[\log u_i - (\mu_R+\sigma_R^{2})\right]^{2} + \sum_{i=k+1}^{n}(1-\delta_i)\log\left\{1 - \Phi\left(\frac{\log u_i - (\mu_R+\sigma_R^{2})}{\sigma_R}\right)\right\}.
Because the likelihood involves the nonlinear function $\Phi(\cdot)$, we obtain $\hat{\theta}$, $\hat{\theta}_L$, and $\hat{\theta}_R$ by numerical methods.
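As an illustration, the maximization can be carried out with a general-purpose optimizer. The following Python sketch is our illustration, not the authors' code; lbln_loglik and lbln_mle are names we introduce, and the same helpers are reused in later sketches.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def lbln_loglik(theta, u, delta):
    # Censored LBLN log-likelihood: sum_i [ delta_i*log g(u_i) + (1 - delta_i)*log S_g(u_i) ].
    mu, sigma = theta
    if sigma <= 0:
        return -np.inf
    y = (np.log(u) - (mu + sigma**2)) / sigma                      # standardized quantity Y_i
    log_g = -np.log(u) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * y**2
    log_s = norm.logsf(y)                                          # log(1 - Phi(y)), numerically stable
    return float(np.sum(delta * log_g + (1 - delta) * log_s))

def lbln_mle(u, delta, start=(0.0, 1.0)):
    # Numerical maximization of the censored log-likelihood (Nelder-Mead needs no derivatives).
    res = minimize(lambda th: -lbln_loglik(th, u, delta),
                   x0=np.asarray(start, dtype=float), method="Nelder-Mead")
    return res.x, -res.fun          # ((mu_hat, sigma_hat), maximized log-likelihood)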

2.2. Modified Information Criterion Procedure

The likelihood ratio test (LRT) procedure is one of the most popular methods for parametric change point analysis. Alternatively, a change point problem can be viewed as a model selection problem; that is, we choose the better model between the null hypothesis and the alternative hypothesis. Choosing the model under the null hypothesis corresponds to the situation of no change; otherwise, at least one change occurs. Hence, information criteria, such as the Akaike information criterion (AIC) [18] and the Schwarz information criterion (SIC) [19], can be used for the change point test, and many studies have been conducted in this direction. For instance, Chen and Gupta [20] developed a binary procedure combined with the SIC to search for all possible variance change points in a sequence of independent Gaussian random variables. Chen and Gupta [21] provided the testing and estimation of a single change point in the means and variances of a sequence of independent Gaussian random variables based on the SIC and its unbiased version.
Model complexity is a very important factor in model selection based on the information criteria. The dimension of parameter space is usually used to measure model complexity. However, the method based on SIC does not consider the influence of the change location, which may cause redundancy when the change nears the beginning or the end of data. To solve this issue, Chen et al. [22] modified the traditional information criterion by making the model complexity a function of the change point location, and denoted it as a modified information criterion (MIC). In this paper, we study the change point problem in parameters of the LBLN distribution with random right censoring based on MIC. Under the null hypothesis, the MIC is defined as
\mathrm{MIC}(n) = -2\log L_0(\hat{\theta}) + 2\log(n),
where $\hat{\theta} = (\hat{\mu}, \hat{\sigma})$ is the MLE of $\theta = (\mu, \sigma)$ and $n$ is the sample size. The associated MIC statistic under the alternative hypothesis is defined as
\mathrm{MIC}(k) = -2\log L_1(\hat{\theta}_L, \hat{\theta}_R) + \left[4 + \left(\frac{2k}{n} - 1\right)^{2}\right]\log(n),
where $k$ is the unknown change point location, and $\hat{\theta}_L$ and $\hat{\theta}_R$ are the maximum likelihood estimators of $\theta_L$ and $\theta_R$. Consider the penalty term $\left[4 + (2k/n - 1)^{2}\right]\log(n)$ in Equation (13). When $k \to 1$ or $k \to n$, the suspected change point lies near one of the two ends of the sequence and the penalty term in $\mathrm{MIC}(k)$ approaches $5\log(n)$; when $k \approx n/2$, the penalty term approaches $4\log(n)$, and the model complexity is minimized. The reason is that when the change point is close to either end, one segment contains too few observations to estimate its parameters well, which may result in a large variance of the parameter estimates, so a larger penalty should be imposed. When the change point is close to the middle of the data, both segments contain enough observations, the variance of the parameter estimates is small, and a smaller penalty suffices. In other words, when the suspected change point lies near either end, stronger evidence is required to declare such a change; therefore, a larger penalty is set when $k$ is close to 1 or $n$.
In order to ensure sufficient observations for parameter estimation, the range of $k$ is restricted to $K = \{k \mid k_0 \le k \le n - k_0\}$, where $k_0 = 2[\log n]$. Based on the minimum information criterion, the model with the smallest MIC value is considered the best one to fit the data. Then, $\min_{k \in K}\mathrm{MIC}(k)$ corresponds to the best model under $H_1$. Further, we accept $H_0$ if
\mathrm{MIC}(n) \le \min_{k \in K}\mathrm{MIC}(k),
that is, the model with no change is the best one. We reject H 0 if
\mathrm{MIC}(n) > \min_{k \in K}\mathrm{MIC}(k),
which indicates that the best model under H 1 is more appropriate to describe the data than the model under H 0 . It leads us to conclude that there exists at least one change in the data. Correspondingly, the location of change can be estimated as follows:
\hat{k} = \arg\min_{k \in K}\mathrm{MIC}(k).
In order to make the conclusion more statistically convincing, we construct the MIC-based test statistic as follows:
M_n = \mathrm{MIC}(n) - \min_{k \in K}\mathrm{MIC}(k) + 2\log(n).
Given the critical value $M$ at a certain significance level $\alpha$, one can decide whether to accept or reject $H_0$ by comparing $M$ and $M_n$. To calculate the critical value of $M_n$, the asymptotic distribution of $M_n$ under certain conditions is given in Theorem 1, and the following lemma is required to prove this theorem.
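A sketch of how $\mathrm{MIC}(n)$, $\min_{k\in K}\mathrm{MIC}(k)$, and $M_n$ could be computed with the hypothetical lbln_mle helper introduced in Section 2.1 (again our illustration, not the authors' implementation):

def mic_change_point_test(u, delta):
    # M_n = MIC(n) - min_{k in K} MIC(k) + 2*log(n), with K = {k : k0 <= k <= n - k0}, k0 = 2[log n].
    n = len(u)
    k0 = 2 * int(np.floor(np.log(n)))
    _, ll0 = lbln_mle(u, delta)                          # MLE under H0 (no change)
    mic_n = -2.0 * ll0 + 2.0 * np.log(n)
    best_mic, k_hat = np.inf, None
    for k in range(k0, n - k0 + 1):
        _, ll_left = lbln_mle(u[:k], delta[:k])          # left-segment MLE
        _, ll_right = lbln_mle(u[k:], delta[k:])         # right-segment MLE
        mic_k = -2.0 * (ll_left + ll_right) + (4.0 + (2.0 * k / n - 1.0) ** 2) * np.log(n)
        if mic_k < best_mic:
            best_mic, k_hat = mic_k, k
    m_n = mic_n - best_mic + 2.0 * np.log(n)
    return m_n, k_hat                                    # reject H0 when m_n exceeds the critical value

The critical value can be obtained by simulation, as in Section 3, or from the asymptotic chi-square distribution established in Theorem 1 below.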
Lemma 1. 
Under the random right-censoring data $(U_i, \delta_i)$, if $(\delta_i A_{11} + (1-\delta_i)B_{11})(\delta_i A_{22} + (1-\delta_i)B_{22}) > (\delta_i A_{12} + (1-\delta_i)B_{12})^{2}$, then the Fisher information matrix based on these data is positive definite, where
A(\theta) = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad A_{ij} = -E\left[\frac{\partial^{2}\log g(U_i)}{\partial\theta\,\partial\theta^{T}}\right]_{ij}, \quad i, j = 1, 2,
B(\theta) = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad B_{ij} = -E\left[\frac{\partial^{2}\log S_g(U_i)}{\partial\theta\,\partial\theta^{T}}\right]_{ij}, \quad i, j = 1, 2.
Proof of Lemma 1. 
According to the definition of the random right-censoring data pairs in Section 2.1, the density function under right censoring is $p(U_i; \theta)$, where
p(U_i; \theta) = g(U_i; \theta)^{\delta_i}\, S_g(U_i; \theta)^{1-\delta_i}, \qquad
g(U_i; \theta) = \frac{1}{U_i \sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^{2}}\left[\log U_i - (\mu+\sigma^{2})\right]^{2} \right\}, \qquad
S_g(U_i; \theta) = 1 - \Phi\left( \frac{\log U_i - (\mu+\sigma^{2})}{\sigma} \right).
Then, the logarithmic density function is
\log p(U_i; \theta) = \delta_i \log g(U_i; \theta) + (1-\delta_i)\log S_g(U_i; \theta).
For convenience in writing, let $Y_i = \left[\log U_i - (\mu + \sigma^{2})\right]/\sigma$. Therefore, under the random right-censoring data, the Fisher information matrix of the density function $p(U_i; \theta)$ based on the length-biased lognormal distribution is as follows:
I(\theta) = -E\begin{pmatrix} \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\mu^{2}} & \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma} \\[2mm] \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma\,\partial\mu} & \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma^{2}} \end{pmatrix},
where
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu^{2}} = -\frac{\delta_i}{\sigma^{2}} + \frac{1-\delta_i}{\sigma^{2}}\left[ \frac{Y_i\,\phi(Y_i)}{1-\Phi(Y_i)} - \frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}} \right],
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma} = -\delta_i\left( \frac{2}{\sigma} + \frac{2}{\sigma^{2}}Y_i \right) + (1-\delta_i)\left[ -\frac{1}{\sigma^{2}}\frac{\phi(Y_i)}{1-\Phi(Y_i)} + \frac{1}{\sigma}Y_i\left(2+\frac{Y_i}{\sigma}\right)\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \frac{1}{\sigma}\left(2+\frac{Y_i}{\sigma}\right)\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}} \right],
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma^{2}} = \delta_i\left( \frac{1}{\sigma^{2}} - \frac{6}{\sigma}Y_i - \frac{3}{\sigma^{2}}Y_i^{2} - 4 \right) + (1-\delta_i)\left[ Y_i\left(2+\frac{Y_i}{\sigma}\right)^{2}\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \frac{1}{\sigma^{2}}Y_i\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \frac{1}{\sigma}\left(2+\frac{Y_i}{\sigma}\right)\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \left(2+\frac{Y_i}{\sigma}\right)^{2}\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}} \right],
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma\,\partial\mu} = \frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma}.
Further, the expectations for the second derivative are
E\left[\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu^{2}}\right] = -\frac{\delta_i}{\sigma^{2}} + \frac{1-\delta_i}{\sigma^{2}}\left\{ E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] - E\left[\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right] \right\},
E\left[\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma}\right] = -\delta_i E\left[\frac{2}{\sigma} + \frac{2}{\sigma^{2}}Y_i\right] + (1-\delta_i)\left\{ -\frac{1}{\sigma^{2}}E\left[\frac{\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{1}{\sigma}E\left[\frac{Y_i\left(2+\frac{Y_i}{\sigma}\right)\phi(Y_i)}{1-\Phi(Y_i)}\right] - \frac{1}{\sigma}E\left[\frac{\left(2+\frac{Y_i}{\sigma}\right)\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right] \right\},
E\left[\frac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma^{2}}\right] = \delta_i E\left[\frac{1}{\sigma^{2}} - \frac{6}{\sigma}Y_i - \frac{3}{\sigma^{2}}Y_i^{2} - 4\right] + (1-\delta_i)\left\{ E\left[\frac{Y_i\left(2+\frac{Y_i}{\sigma}\right)^{2}\phi(Y_i)}{1-\Phi(Y_i)}\right] - \frac{1}{\sigma^{2}}E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] - \frac{1}{\sigma}E\left[\frac{\left(2+\frac{Y_i}{\sigma}\right)\phi(Y_i)}{1-\Phi(Y_i)}\right] - E\left[\frac{\left(2+\frac{Y_i}{\sigma}\right)^{2}\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right] \right\}.
When calculating the above expectations, it is found that some of the integrals have no analytical expression, so we approximate them by Monte Carlo integration. In order to reduce the number of such integrals and Monte Carlo approximations, the following expectations can be simplified by integration by parts,
E\left[\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right] = 2E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right], \qquad
E\left[\frac{Y_i\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right] = -E\left[\frac{\phi(Y_i)}{1-\Phi(Y_i)}\right] + 2E\left[\frac{Y_i^{2}\phi(Y_i)}{1-\Phi(Y_i)}\right], \qquad
E\left[\frac{Y_i^{2}\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right] = -2E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] + 2E\left[\frac{Y_i^{3}\phi(Y_i)}{1-\Phi(Y_i)}\right].
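For instance, the first identity can be checked (our verification) by integrating the derivative of $\phi^{2}(y)/\{1-\Phi(y)\}$ over the real line; since $Y_i$ is standard normal under the fitted LBLN model, the boundary terms vanish:
\frac{d}{dy}\left[\frac{\phi^{2}(y)}{1-\Phi(y)}\right] = -\frac{2y\,\phi^{2}(y)}{1-\Phi(y)} + \frac{\phi^{3}(y)}{\{1-\Phi(y)\}^{2}}, \qquad
0 = \int_{-\infty}^{\infty}\frac{d}{dy}\left[\frac{\phi^{2}(y)}{1-\Phi(y)}\right]dy = -2E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] + E\left[\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}}\right],
and the other two identities follow in the same way from the derivatives of $y\,\phi^{2}(y)/\{1-\Phi(y)\}$ and $y^{2}\phi^{2}(y)/\{1-\Phi(y)\}$.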
Then, the elements of the Fisher information matrix are
-E\left[\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu^{2}}\right] = \frac{\delta_i}{\sigma^{2}} + \frac{1-\delta_i}{\sigma^{2}}E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right],
-E\left[\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma}\right] = \frac{2\delta_i}{\sigma} + (1-\delta_i)\left\{ \frac{2}{\sigma}E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{1}{\sigma^{2}}E\left[\frac{Y_i^{2}\phi(Y_i)}{1-\Phi(Y_i)}\right] \right\},
-E\left[\frac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma^{2}}\right] = \delta_i\left(\frac{2}{\sigma^{2}} + 4\right) + (1-\delta_i)\left\{ -\frac{2}{\sigma}E\left[\frac{\phi(Y_i)}{1-\Phi(Y_i)}\right] + 4E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{4}{\sigma}E\left[\frac{Y_i^{2}\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{1}{\sigma^{2}}E\left[\frac{Y_i^{3}\phi(Y_i)}{1-\Phi(Y_i)}\right] \right\}.
Thus,
I(\theta) = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix} = \delta_i A(\theta) + (1-\delta_i)B(\theta) = \delta_i\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} + (1-\delta_i)\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} \delta_i A_{11} + (1-\delta_i)B_{11} & \delta_i A_{12} + (1-\delta_i)B_{12} \\ \delta_i A_{21} + (1-\delta_i)B_{21} & \delta_i A_{22} + (1-\delta_i)B_{22} \end{pmatrix},
where
A(\theta) = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sigma^{2}} & \dfrac{2}{\sigma} \\[2mm] \dfrac{2}{\sigma} & \dfrac{2}{\sigma^{2}} + 4 \end{pmatrix}, \qquad B(\theta) = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},
B_{11} = \frac{1}{\sigma^{2}}E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right], \qquad
B_{12} = B_{21} = \frac{2}{\sigma}E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{1}{\sigma^{2}}E\left[\frac{Y_i^{2}\phi(Y_i)}{1-\Phi(Y_i)}\right],
B_{22} = -\frac{2}{\sigma}E\left[\frac{\phi(Y_i)}{1-\Phi(Y_i)}\right] + 4E\left[\frac{Y_i\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{4}{\sigma}E\left[\frac{Y_i^{2}\phi(Y_i)}{1-\Phi(Y_i)}\right] + \frac{1}{\sigma^{2}}E\left[\frac{Y_i^{3}\phi(Y_i)}{1-\Phi(Y_i)}\right].
Since $(\delta_i A_{11} + (1-\delta_i)B_{11})(\delta_i A_{22} + (1-\delta_i)B_{22}) > (\delta_i A_{12} + (1-\delta_i)B_{12})^{2}$, and $(\delta_i A_{11} + (1-\delta_i)B_{11})(\delta_i A_{22} + (1-\delta_i)B_{22}) - (\delta_i A_{12} + (1-\delta_i)B_{12})^{2} = I_{11}I_{22} - I_{12}I_{21} = |I(\theta)| > 0$, the Fisher information matrix $I(\theta)$ is positive definite. □
Theorem 1. 
Under certain Wald conditions and regularity conditions, and if $(\delta_i A_{11} + (1-\delta_i)B_{11})(\delta_i A_{22} + (1-\delta_i)B_{22}) > (\delta_i A_{12} + (1-\delta_i)B_{12})^{2}$, the asymptotic null distribution of the test statistic $M_n$ for the length-biased lognormal distribution with random right censoring is
M_n \xrightarrow{D} \chi^{2}(2), \quad \text{as } n \to \infty,
where $M_n$ is defined in Equation (15), and the Wald conditions and regularity conditions are listed in Appendix A.
Proof of Theorem 1. 
Lemma 3 in [22] indicates that $k/n$ approaches $1/2$ under $H_0$; accordingly, let $\Delta_\epsilon = \{k : |k/n - 1/2| < \epsilon\}$. This lemma was proved under the Wald conditions and regularity conditions. Let $\theta_L = (\mu_L, \sigma_L)$, $\theta_R = (\mu_R, \sigma_R)$, and let $\theta = (\mu, \sigma)$ lie in a small neighborhood of the true value $\theta_0 = (\mu_0, \sigma_0)$, that is, $M_\delta = \{\theta : |\theta - \theta_0| < \delta\}$ with $\delta > 0$. Then, for any $\epsilon > 0$ and $\delta > 0$, it can be obtained that
M_n = \mathrm{MIC}(n) - \min_{k \in \Delta_\epsilon}\mathrm{MIC}(k) + 2\log n = \max_{k \in \Delta_\epsilon}\{\mathrm{MIC}(n) - \mathrm{MIC}(k)\} + 2\log n
 = \max_{k \in \Delta_\epsilon}\left\{ 2\log L_1(\theta_L, \theta_R) - 2\log L_0(\theta) - \left(\frac{2k}{n} - 1\right)^{2}\log n \right\}
 \le 2\max_{k \in \Delta_\epsilon}\left\{ \sup_{\theta_L \in M_\delta}\sum_{i=1}^{k}\log p(U_i;\theta_L) + \sup_{\theta_R \in M_\delta}\sum_{i=k+1}^{n}\log p(U_i;\theta_R) - \sup_{\theta \in M_\delta}\sum_{i=1}^{n}\log p(U_i;\theta) \right\} + o_p(1),
where
p(U_i; \theta) = g(U_i; \theta)^{\delta_i}\, S_g(U_i; \theta)^{1-\delta_i}, \qquad
g(U_i; \theta) = \frac{1}{U_i \sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^{2}}\left[\log U_i - (\mu+\sigma^{2})\right]^{2} \right\}, \qquad
S_g(U_i; \theta) = 1 - \Phi\left( \frac{\log U_i - (\mu+\sigma^{2})}{\sigma} \right).
Let $\tilde{\theta}_L$, $\tilde{\theta}_R$, and $\tilde{\theta}$ be the maximum points of $\sum_{i=1}^{k}\log p(U_i;\theta)$, $\sum_{i=k+1}^{n}\log p(U_i;\theta)$, and $\sum_{i=1}^{n}\log p(U_i;\theta)$, respectively, in the neighborhood $(\theta_0 - \delta, \theta_0 + \delta)$. Then, there must exist an $\eta$ with $|\eta - \theta_0| < \delta$ when $\theta$ equals $\tilde{\theta}_L$, $\tilde{\theta}_R$, or $\tilde{\theta}$. Furthermore, the Taylor expansion of $\log p(U_i;\theta)$ at $\theta = \theta_0 = (\mu_0, \sigma_0)$ can be obtained as follows:
\log p(U_i;\theta) - \log p(U_i;\theta_0) = S(U_i;\theta)\big|_{\theta=\theta_0}(\theta - \theta_0)^{T} + \frac{1}{2}(\theta - \theta_0)H(\theta_0)(\theta - \theta_0)^{T} + \frac{1}{6}M\delta^{3} + o_p(1),
where $(\theta - \theta_0) = (\mu - \mu_0, \sigma - \sigma_0)$ and $S(U_i;\theta) = \left( \partial\log p(U_i;\theta)/\partial\mu,\ \partial\log p(U_i;\theta)/\partial\sigma \right)$ is the score function of $p(U_i;\theta)$. For convenience in writing, let $Y_i = \left[\log U_i - (\mu + \sigma^{2})\right]/\sigma$; then, the elements of $S(U_i;\theta)$ are
\frac{\partial\log p(U_i;\theta)}{\partial\mu} = \frac{\delta_i}{\sigma}Y_i + \frac{1-\delta_i}{\sigma}\,\frac{\phi(Y_i)}{1-\Phi(Y_i)}, \qquad
\frac{\partial\log p(U_i;\theta)}{\partial\sigma} = \delta_i\left( -\frac{1}{\sigma} + 2Y_i + \frac{1}{\sigma}Y_i^{2} \right) + (1-\delta_i)\left(2 + \frac{Y_i}{\sigma}\right)\frac{\phi(Y_i)}{1-\Phi(Y_i)}.
$H(\theta)$ is the matrix of second-order derivatives of $\log p(U_i;\theta)$ with respect to the parameters, that is,
H(\theta) = \begin{pmatrix} \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\mu^{2}} & \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma} \\[2mm] \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma\,\partial\mu} & \dfrac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma^{2}} \end{pmatrix},
and elements of H ( θ ) are represented as follows:
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu^{2}} = -\frac{\delta_i}{\sigma^{2}} + \frac{1-\delta_i}{\sigma^{2}}\left[ \frac{Y_i\,\phi(Y_i)}{1-\Phi(Y_i)} - \frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}} \right],
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma} = -\delta_i\left( \frac{2}{\sigma} + \frac{2}{\sigma^{2}}Y_i \right) + (1-\delta_i)\left[ -\frac{1}{\sigma^{2}}\frac{\phi(Y_i)}{1-\Phi(Y_i)} + \frac{1}{\sigma}Y_i\left(2+\frac{Y_i}{\sigma}\right)\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \frac{1}{\sigma}\left(2+\frac{Y_i}{\sigma}\right)\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}} \right],
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma^{2}} = \delta_i\left( \frac{1}{\sigma^{2}} - \frac{6}{\sigma}Y_i - \frac{3}{\sigma^{2}}Y_i^{2} - 4 \right) + (1-\delta_i)\left[ Y_i\left(2+\frac{Y_i}{\sigma}\right)^{2}\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \frac{1}{\sigma^{2}}Y_i\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \frac{1}{\sigma}\left(2+\frac{Y_i}{\sigma}\right)\frac{\phi(Y_i)}{1-\Phi(Y_i)} - \left(2+\frac{Y_i}{\sigma}\right)^{2}\frac{\phi^{2}(Y_i)}{\{1-\Phi(Y_i)\}^{2}} \right],
\frac{\partial^{2}\log p(U_i;\theta)}{\partial\sigma\,\partial\mu} = \frac{\partial^{2}\log p(U_i;\theta)}{\partial\mu\,\partial\sigma}.
Then, Equation (16) can be further simplified as
M_n \le \max_{k \in \Delta_\epsilon}\left\{ \sup_{\theta_L \in M_\delta}\left[ 2\sum_{i=1}^{k}S(U_i;\theta_0)(\theta_L - \theta_0)^{T} + \sum_{i=1}^{k}(\theta_L - \theta_0)H(\theta_0)(\theta_L - \theta_0)^{T} \right] + \sup_{\theta_R \in M_\delta}\left[ 2\sum_{i=k+1}^{n}S(U_i;\theta_0)(\theta_R - \theta_0)^{T} + \sum_{i=k+1}^{n}(\theta_R - \theta_0)H(\theta_0)(\theta_R - \theta_0)^{T} \right] - \sup_{\theta \in M_\delta}\left[ 2\sum_{i=1}^{n}S(U_i;\theta_0)(\theta - \theta_0)^{T} + \sum_{i=1}^{n}(\theta - \theta_0)H(\theta_0)(\theta - \theta_0)^{T} \right] \right\} + o_p(1).
Let $I(\theta)$ denote the Fisher information matrix, so that $I(\theta) = -E(H(\theta))$. According to Lemma 1, when $(\delta_i A_{11} + (1-\delta_i)B_{11})(\delta_i A_{22} + (1-\delta_i)B_{22}) > (\delta_i A_{12} + (1-\delta_i)B_{12})^{2}$, the Fisher information matrix is positive definite and its inverse exists. Combining this with the properties of quadratic functions, Equation (17) can be further expressed as
M_n \le \max_{k \in \Delta_\epsilon}\left\{ \left(\frac{1}{\sqrt{k}}\sum_{i=1}^{k}S(U_i;\theta_0)\right) I^{-1}(\theta_0) \left(\frac{1}{\sqrt{k}}\sum_{i=1}^{k}S(U_i;\theta_0)\right)^{T} + \left(\frac{1}{\sqrt{n-k}}\sum_{i=k+1}^{n}S(U_i;\theta_0)\right) I^{-1}(\theta_0) \left(\frac{1}{\sqrt{n-k}}\sum_{i=k+1}^{n}S(U_i;\theta_0)\right)^{T} - \left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}S(U_i;\theta_0)\right) I^{-1}(\theta_0) \left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}S(U_i;\theta_0)\right)^{T} \right\} + o_p(1).
Let
Z_k = \sum_{i=1}^{k} I^{-\frac{1}{2}}(\theta_0)\, S(U_i;\theta_0), \qquad Z_n = \sum_{i=1}^{n} I^{-\frac{1}{2}}(\theta_0)\, S(U_i;\theta_0),
then
M_n \le \max_{k \in \Delta_\epsilon}\left\{ \frac{1}{k}Z_k Z_k^{T} + \frac{1}{n-k}(Z_n - Z_k)(Z_n - Z_k)^{T} - \frac{1}{n}Z_n Z_n^{T} \right\} + o_p(1)
 = \max_{k \in \Delta_\epsilon}\left\{ \frac{n(n-k)\,Z_k Z_k^{T} + nk\,(Z_n - Z_k)(Z_n - Z_k)^{T} - k(n-k)\,Z_n Z_n^{T}}{n\,k\,(n-k)} \right\} + o_p(1)
 = \max_{k \in \Delta_\epsilon}\left\{ \left[ n\cdot\frac{k}{n}\left(1 - \frac{k}{n}\right) \right]^{-\frac{1}{2}} \left( Z_k - \frac{k}{n}Z_n \right)\left( Z_k - \frac{k}{n}Z_n \right)^{T} \left[ n\cdot\frac{k}{n}\left(1 - \frac{k}{n}\right) \right]^{-\frac{1}{2}} \right\} + o_p(1)
 \le \max_{k \in \Delta_\epsilon} V(t)V^{T}(t) + o_p(1),
where
V(t) = \left[ \frac{[nt]}{n}\left(1 - \frac{[nt]}{n}\right) \right]^{-\frac{1}{2}} n^{-\frac{1}{2}} \left\{ Z_{[nt]}^{T} + (nt - [nt])\, S(U_{[nt]+1};\theta_0) - \frac{[nt]}{n} Z_n^{T} \right\} = (V_1(t), V_2(t)),
for $nt \in \Delta_\epsilon$.
Without loss of generality, for $t \in \left[\frac{1}{2} - \epsilon, \frac{1}{2} + \epsilon\right]$,
V_i(t) \xrightarrow{D} [t(1-t)]^{-\frac{1}{2}} B^{0}(t), \quad i = 1, 2,
where $\xrightarrow{D}$ denotes convergence in distribution and $B^{0}(t)$ is the Brownian bridge. Then,
\sup_{nt \in \Delta_\epsilon}\left| B^{0}(t) - B^{0}\!\left(\tfrac{1}{2}\right) \right| \to 0,
and
\frac{1}{\frac{1}{2}\left(1 - \frac{1}{2}\right)}\left[ B^{0}\!\left(\tfrac{1}{2}\right) \right]^{2} \sim \chi^{2}_{1}.
So
M_n \le \sup_{|t - \frac{1}{2}| < \epsilon} V(t)V^{T}(t) + o_p(1) \le \sup_{|t - \frac{1}{2}| < \epsilon}\left\{ \frac{1}{t(1-t)}\left[B_{1}^{0}(t)\right]^{2} + \frac{1}{t(1-t)}\left[B_{2}^{0}(t)\right]^{2} \right\} + o_p(1),
It then follows that $P(M_n \le s) \ge P\left(\chi^{2}(2) \le s - o_p(1)\right)$; that is,
\varliminf_{n \to \infty} P(M_n \le s) \ge P\left(\chi^{2}_{2} \le s\right).
On the other hand, because $M_n \ge \mathrm{MIC}(n) - \mathrm{MIC}(n/2) + 2\log n$,
\varlimsup_{n \to \infty} P(M_n \le s) \le P\left(\chi^{2}_{2} \le s\right).
Therefore, $M_n \xrightarrow{D} \chi^{2}(2)$ as $n \to \infty$. □

2.3. Profile Likelihood Function and Deviance Function

The confidence distribution can be regarded as a distribution that depends on the sample and can be used to study interval estimation and point estimation of the parameter of interest [23]. In particular, it can provide a confidence interval for the parameter of interest at any nominal level through the confidence curve. In the change point problem, the location of the change point is a discrete variable, and assessing the uncertainty of the estimated change location is a challenging task. Similar to reference [24], we construct the confidence curve of the change point location based on the profile likelihood and the deviance function to analyze the uncertainty of the change location estimate.
Assume $X_1, X_2, \ldots, X_k$ is a sample from the population density function $f(x, \theta_L)$, with $x_1, x_2, \ldots, x_k$ the corresponding observations, and $X_{k+1}, \ldots, X_n$ is a sample from the population density function $f(x, \theta_R)$, with $x_{k+1}, \ldots, x_n$ the corresponding observations. Then, the log-likelihood function of $x_1, x_2, \ldots, x_n$ is
l(k, \theta_L, \theta_R) = \sum_{i=1}^{k}\log f(x_i, \theta_L) + \sum_{i=k+1}^{n}\log f(x_i, \theta_R).
For a given change point k, the profile log-likelihood function can be obtained by maximizing the log-likelihood function (21), where the profile log-likelihood function is defined as follows:
l_{prof}(k) = \max_{\theta_L, \theta_R} l(k, \theta_L, \theta_R) = l(k, \hat{\theta}_L, \hat{\theta}_R),
where θ ^ L , θ ^ R are MLEs of θ L and θ R for a given k. Then, k ^ can be obtained by
\hat{k} = \arg\max_{k} l_{prof}(k).
The deviance function of k ^ is given by
D(k, X) = 2\left\{ l_{prof}(\hat{k}) - l_{prof}(k) \right\},
where $X = (X_1, X_2, \ldots, X_n)$. To construct the confidence curve of $k$ based on the deviance function, we consider the distribution of $D(k, X)$ at $k$, denoted as $R_k(t) = P_{k, \hat{\theta}_L, \hat{\theta}_R}\{D(k, X) \le t\}$. However, because the change point location is discrete, $R_k(t)$ does not satisfy Wilks' theorem; therefore, we compute $R_k(t)$ through simulation. The confidence curve of $k$ is defined by
cc(k, x_{obs}) = P_{k, \hat{\theta}_L, \hat{\theta}_R}\left\{ D(k, X) \le D(k, x_{obs}) \right\},
and it can be obtained through the following simulation,
cc(k, x_{obs}) = \frac{1}{B}\sum_{j=1}^{B} I\left\{ D(k, X_j^{*}) < D(k, x_{obs}) \right\},
where $B$ is a large number, typically $B = 1000$, and $X_j^{*}$ is a sample generated from $f(x, \hat{\theta}_L)$ and $f(x, \hat{\theta}_R)$ with a given $k$.
For comparison purposes, we estimate the location of the change point through (14) when computing the deviance function. The specific simulation steps are as follows.
Step 1.
Given the sample size $n$ and the change point $k_{true}$, generate a group of random samples with change point $k_{true}$ based on parameters $\theta_L$ and $\theta_R$, record them as $x_{obs}$, and calculate the deviance $D(k, x_{obs})$ at each possible change point location based on $x_{obs}$.
Step 2.
Compute ( k ^ , θ ^ L , θ ^ R ) based on Step 1.
Step 3.
Generate $B = 1000$ random samples $X_j^{*}$ $(j = 1, \ldots, B)$, where $x_{j1}^{*}, \ldots, x_{jk}^{*}$ are from $f(x, \hat{\theta}_L)$ and $x_{j,k+1}^{*}, \ldots, x_{jn}^{*}$ are from $f(x, \hat{\theta}_R)$, and compute $D(k, X_j^{*})$. Then, the confidence curve $cc(k, x_{obs})$ at the possible change locations is the frequency of $D(k, X_j^{*}) < D(k, x_{obs})$.
Step 4.
Repeat Steps 1 to 3 $N = 1000$ times to obtain $N$ confidence curves for the change point $k$.
Step 5.
Given a significance level $\alpha$, the corresponding confidence set of each confidence curve is $K_{set} = \{k : cc(k, x_{obs}) \le 1 - \alpha\}$.
Step 6.
Coverage probabilities of confidence sets. Based on the confidence set $K_{set}$ obtained from each confidence curve, the frequency with which $k_{true}$ falls in $K_{set}$ is the coverage probability of the confidence set at the corresponding confidence level $1 - \alpha$:
cp = \frac{1}{N}\sum_{i=1}^{N} I\{k_{true} \in K_{set}\}.
Step 7.
Average size of confidence sets. Since the confidence set is a set of estimates of the change location, it is a set of discrete points. Thus, unlike the length of a continuous interval, we define the number of elements in the confidence set as its size. Denoting the number of elements in each confidence set $K_{set}$ by $m_i$ $(i = 1, \ldots, N)$, the average size of the confidence sets is obtained by
l_{set} = \frac{1}{N}\sum_{i=1}^{N} m_i.
In conclusion, when $cp$ is close to $1 - \alpha$, a smaller $l_{set}$ indicates a better estimation method.
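The following Python sketch outlines Steps 1-3 for a single observed sample. It is our illustration only: profile_loglik builds on the lbln_mle helper from Section 2.1, and simulate_censored_lbln is a hypothetical data generator (one possible version is sketched in Section 3). It is written for clarity rather than speed; the nested loops repeat the segment-wise maximizations many times, so B may need to be reduced in practice.

def profile_loglik(u, delta, k):
    # Profile log-likelihood l_prof(k): maximize over (theta_L, theta_R) for a fixed split at k.
    _, ll_left = lbln_mle(u[:k], delta[:k])
    _, ll_right = lbln_mle(u[k:], delta[k:])
    return ll_left + ll_right

def confidence_curve(u, delta, k_grid, B=1000, seed=None):
    # Monte Carlo confidence curve cc(k, x_obs) for the change point location (Steps 1-3).
    rng = np.random.default_rng(seed)
    n = len(u)
    lp = np.array([profile_loglik(u, delta, k) for k in k_grid])
    k_hat = k_grid[int(np.argmax(lp))]                    # Step 2: estimated change location
    d_obs = 2.0 * (lp.max() - lp)                         # observed deviances D(k, x_obs)
    theta_l, _ = lbln_mle(u[:k_hat], delta[:k_hat])       # Step 2: segment MLEs at k_hat
    theta_r, _ = lbln_mle(u[k_hat:], delta[k_hat:])
    cc = np.zeros(len(k_grid))
    for i, k in enumerate(k_grid):                        # Step 3: parametric bootstrap at each candidate k
        for _ in range(B):
            u_s, d_s = simulate_censored_lbln(n, k, theta_l, theta_r, rng)
            lp_s = np.array([profile_loglik(u_s, d_s, kk) for kk in k_grid])
            cc[i] += (2.0 * (lp_s.max() - lp_s[i]) < d_obs[i])
    return cc / B, k_hat
# A level-(1 - alpha) confidence set is then {k : cc[k] <= 1 - alpha}, as in Step 5.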

3. Simulation Study

Owing to the non-informative nature of the uniform distribution and the memorylessness of the exponential distribution, these two distributions are commonly used as censoring distributions [16,25,26,27,28]. In this study, simulations are conducted in terms of Type I error, power, coverage probabilities of confidence sets, and average size of confidence sets, based on uniform and exponential censoring distributions with various censoring rates, sample sizes, and change locations.
Case 1: Firstly, the sample sizes are set as $n = \{50, 100, 150\}$, the true change locations are $k = \{n/4, n/2, 3n/4\}$, and the significance level is $\alpha = 0.05$. For a given change location $k$, we generate a set of survival time data $\{T_1, T_2, \ldots, T_k\}$ from $LBLN(0, 1)$ and $\{T_{k+1}, T_{k+2}, \ldots, T_n\}$ from $LBLN(\mu_R, \sigma_R)$, where $\mu_R = \{-2, -1, 0, 1, 2\}$ and $\sigma_R = \{0.5, 1.0, 1.5, 2.0\}$. The censoring times are generated from $U(0, m)$, where $m$ determines the censoring rate of the survival time observations and can be calculated from $P(T_i > C_i) = cr$, where $cr$ is the censoring proportion given in advance. Let $cr = \{10\%, 30\%\}$; then $m = \{10e^{1.5}, (10/3)e^{1.5}\}$.
Case 2: The sample sizes, change locations, significance level, and parameters are the same as in Case 1. Different from Case 1, the censoring times are generated from $Exp(m)$, where $m$ is determined by the censoring rate of the survival time observations. Two different values, $m = \{0.025, 0.101\}$, are calculated from $P(T_i > C_i) = 10\%$ and $P(T_i > C_i) = 30\%$.
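One possible way to generate such data in Python (our sketch; the function names and the default censoring bound are ours, with the default m = 10e^{1.5} corresponding to the 10% uniform-censoring setting of Case 1):

def rlbln(size, mu, sigma, rng):
    # LBLN(mu, sigma) variates: log T ~ N(mu + sigma^2, sigma^2).
    return np.exp(rng.normal(mu + sigma**2, sigma, size=size))

def simulate_censored_lbln(n, k, theta_l, theta_r, rng,
                           censoring="uniform", m=10.0 * np.exp(1.5)):
    # First k survival times from LBLN(theta_l), the rest from LBLN(theta_r),
    # censored by C ~ U(0, m) or C ~ Exp(rate m); returns (U, delta) = (min(T, C), I(T <= C)).
    t = np.concatenate([rlbln(k, *theta_l, rng), rlbln(n - k, *theta_r, rng)])
    if censoring == "uniform":
        c = rng.uniform(0.0, m, size=n)
    else:
        c = rng.exponential(1.0 / m, size=n)   # numpy uses the scale (= 1/rate) parametrization
    return np.minimum(t, c), (t <= c).astype(float)

# Example: n = 100, change at k = 50, roughly 10% uniform censoring as in Case 1
# u, delta = simulate_censored_lbln(100, 50, (0.0, 1.0), (1.0, 1.5), np.random.default_rng(1))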

3.1. Critical Values and Probability of Type I Error

First of all, we obtain the critical values of the test and the probability of Type I error through simulations. The probability of Type I error is approximated by the frequency of Type I errors in $N$ repeated simulation experiments, given by $P_{\mathrm{error}}^{I} = S_n^{I}/N$, where $S_n^{I}$ represents the number of Type I errors in $N$ repeated trials with a sample size of $n$. According to the Central Limit Theorem, $S_n^{I}/N$ approximately follows a normal distribution with mean $\alpha$ and variance $\alpha(1-\alpha)/N$, that is,
\frac{S_n^{I}}{N} \;\dot\sim\; N\left(\alpha, \frac{\alpha(1-\alpha)}{N}\right),
so $S_n^{I}/N$ falls within $\left[\alpha - 3\sqrt{\alpha(1-\alpha)/N},\ \alpha + 3\sqrt{\alpha(1-\alpha)/N}\right]$ with high probability. This means that the values of $S_n^{I}/N$ fluctuate within this interval, with the length of the fluctuation interval primarily determined by the number of repeated simulation experiments $N$. In our simulation, $\alpha = 0.05$ and $N = 1000$, so $S_n^{I}/N$ is highly likely to fluctuate within $[0.029, 0.071]$. The specific results are shown in Table 1. From columns 7-10 of Table 1, it can be seen that the frequency of Type I errors under both censoring rates almost always fluctuates within $[0.029, 0.071]$ for both uniform and exponential censoring, which means that the Type I error is effectively controlled.
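The band $[0.029, 0.071]$ quoted above can be reproduced directly (our check):

alpha, N = 0.05, 1000
half_width = 3.0 * np.sqrt(alpha * (1.0 - alpha) / N)                # three standard errors of S_n^I / N
print(round(alpha - half_width, 3), round(alpha + half_width, 3))    # prints: 0.029 0.071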

3.2. Power Comparison

Table 2 and Table 3 show the power of the likelihood ratio test method and the MIC criterion test method in the case of uniform and exponential censoring, respectively.
Taking the test results under the uniform censoring distribution as an example, we can draw the following conclusions. First, excluding the combinations with equal powers, the test power calculated based on MIC is mostly higher than that based on the likelihood ratio under the same parameter settings. Second, the powers of both the LRT-based and MIC-based methods increase as the sample size increases. For example, when $(\mu_R, \sigma_R) = (1.0, 0.5)$, $cr = 10\%$, and $k = n/2$, the power of the LRT-based method increases from 0.514 ($n = 50$) to 0.910 ($n = 150$), and the power of the MIC-based method increases from 0.547 ($n = 50$) to 0.913 ($n = 150$). Third, in most cases, the closer the change point is to the midpoint of the data series, the higher the power of the test. For example, the power of the MIC-based method is 0.587 at $k = 25$, while it is 0.427 at $k = 12$, under $(\mu_R, \sigma_R) = (1, 1)$ and $cr = 10\%$. Fourth, when the change point position and parameter settings are the same, the power of the test with a censoring rate of 10% is, in most cases, noticeably higher than that with a censoring rate of 30%. For instance, the power of the MIC-based method is 0.817 when $n = 100$, $k = 25$, $cr = 10\%$, and $(\mu_R, \sigma_R) = (-2, 2)$, but the power is 0.539 when $cr = 30\%$. This may be because a larger censoring rate masks the true pattern of change and increases the difficulty of the change point test. Finally, when the sample size, change point position, and censoring rate are the same, the greater the parameter change, the higher the power of the change point test. For example, when the parameter changes from $(0, 1)$ to $(0, 1.5)$, the test power based on MIC is $(0.515, 0.769, 0.596)$ when $n = 50$ and $cr = 10\%$, and it increases to $(0.976, 0.999, 0.978)$ when the parameter changes to $(1, 1.5)$.

3.3. Coverage Probabilities and Average Length of Confidence Sets

To evaluate the uncertainty of change point location estimation, some simulations are conducted to study the coverage probabilities and average lengths of confidence sets for change point location estimation under a uniform censored distribution with censoring rates of 10% and 30%, respectively. The specific results are shown in Table 4 and Table 5.
From Table 4, we can observe that the coverage probability of confidence sets for change point location estimation with a censoring rate of 10% is higher than that corresponding to a censoring rate of 30%, and closer to the given nominal level.
From Table 5, it can be seen that, under the same sample size, change point location, and parameter settings, the average length of the confidence sets for change point location estimation with a censoring rate of 10% is smaller than that with a censoring rate of 30%. When the change point is located closer to the middle of the dataset, the average length of the confidence set is shorter. In terms of testing methods, except when the two methods tie, the MIC-based method provides shorter confidence sets. Under the same testing method and parameter settings, the average length of the confidence sets becomes shorter as the sample size increases. In summary, the smaller the censoring rate, the closer the coverage probability of the confidence set for the change point location is to the nominal level, and the shorter the average length of the confidence sets.

4. Application: Analysis of Survival Data for Heart Transplantation

This section applies the proposed testing method to the Stanford heart transplantation data of February 1980 [29]. The Stanford heart transplantation project ran from October 1967 to February 1980, and in total, 184 patients underwent heart transplantation. One patient's survival time was 0 and is excluded from this study. Therefore, there are 183 samples, of which 71 patients are censored, giving a censoring rate of 38.798%, and the censoring is random. Because only patients who had undergone heart transplantation and were alive at the beginning of data collection are included as observations, this dataset is length-biased. In addition, some survival times are right-censored, meaning that some patients were still alive after the end of the observation period and their exact survival times could not be determined. For this randomly right-censored dataset, the proposed method is used to test whether there is a change point in the data.
Before testing, the patients were divided into 43 groups according to their age, and the median survival time of each group was calculated, as shown in Figure 1a. A Q-Q plot was generated to examine whether the median survival times follow a length-biased lognormal distribution, as shown in Figure 1b. The results indicate that the patient survival times approximately follow a length-biased lognormal distribution. To make this conclusion more convincing, we performed a Kolmogorov-Smirnov test on the dataset. The test statistic was 0.210 and the corresponding p value was 0.306. Since the p value is much greater than 0.05, we fail to reject the null hypothesis, indicating that the data are consistent with a length-biased lognormal distribution.
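This distributional check can be reproduced along the following lines (our sketch, not the authors' code): fit $(\mu, \sigma)$ to the 43 group medians, treated as uncensored, and apply a one-sample Kolmogorov-Smirnov test with the LBLN distribution function $F(x) = \Phi\left(\{\log x - (\mu + \sigma^{2})\}/\sigma\right)$; here, medians is a placeholder array for the group-wise median survival times.

from scipy.stats import kstest

def lbln_cdf(x, mu, sigma):
    # CDF of LBLN(mu, sigma), since log X ~ N(mu + sigma^2, sigma^2).
    return norm.cdf((np.log(x) - (mu + sigma**2)) / sigma)

(mu_hat, sigma_hat), _ = lbln_mle(medians, np.ones_like(medians))    # medians treated as uncensored
stat, pval = kstest(medians, lambda x: lbln_cdf(x, mu_hat, sigma_hat))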
Combined with the binary segmentation method, the LRT-based and MIC-based methods are applied to these data. The values of the test statistics and the corresponding p values are calculated and presented in Table 6.
In Table 6, p L R T and p M I C represent the p values for the two testing methods. From the p values, it can be observed that all p values are less than 0.05, which means that the null hypothesis for no change point should be rejected. Therefore, there are three change points in the dataset located at positions { 15 , 30 , 34 } , corresponding to ages 30, 46, and 50 in the data. This means that there are differences in the survival time among patients of different age groups after undergoing heart transplant surgery. The parameter estimation results before and after the change points are ( μ ^ 1 , σ ^ 1 ) = ( 1.807 , 2.308 ) , ( μ ^ 2 , σ ^ 2 ) = ( 5.424 , 0.996 ) , ( μ ^ 3 , σ ^ 3 ) = ( 6.346 , 0.056 ) , and ( μ ^ 4 , σ ^ 4 ) = ( 3.598 , 0.945 ) , respectively.

5. Conclusions and Discussion

Due to the frequent occurrence of random right censoring in life data, this study investigates the parameter change point test problem of length-biased lognormal distribution based on random right-censoring data. For the parameter change point model with random right censoring, a test statistic based on MIC is constructed, and the corresponding testing method is presented. Under the sufficient condition of the Fisher information matrix being positive definite, it is proven that the asymptotic distribution of the MIC-based test statistic is a chi-square distribution with two degrees of freedom under the null hypothesis.
To demonstrate the performance of the testing method, simulations are conducted to investigate the power of the change point test under uniform and exponential censoring distributions. The simulation study considered various combinations of censoring rates and parameter settings. The simulation results indicate that the power of the MIC-based method is generally higher than that of the LRT-based method. Moreover, as the censoring rate decreases, the test power increases: fewer observations are censored and more complete lifetimes are available, so the data contain more valuable lifetime information, leading to an increase in the power of the hypothesis test.
For the purpose of assessing the uncertainty in estimation of change point location, the coverage probability and the average lengths of confidence sets are calculated. Simulation results indicate that as the censoring rate decreases, the coverage rate approaches the nominal level in most cases, and the average length of the confidence sets becomes shorter. From the perspective of testing methods, the method based on the MIC criterion yields shorter confidence sets. Under the same testing method and parameter settings, with an increase in sample size, the average length of the confidence sets decreases.
From the simulations, we observe that differences in sample size, change point location, and censoring rate all have an impact on the power of change point testing in random right-censored data. Therefore, factors such as testing methods, censored distribution, and censored rate should be considered comprehensively when investigating the change point problem with censored data.
The unique contributions of the proposed method can be concluded as follows. In our study, the change point problem is viewed as a model selection problem, where we aim to select the better option between the null hypothesis of no change and the alternative hypothesis of at least one change. As a result, methods commonly used for model selection can be applied to change-point-testing problems. However, the change point problem involves a special parameter, namely, the change location. In this paper, we employ the MIC-based method, which takes into account the contribution of change point positions to model complexity in the penalty term. This method is utilized to detect change points in the length-biased lognormal distribution with randomly right-censored data.
The traditional change point detection method is based on the likelihood ratio. In the simulation section, we conduct a comprehensive comparison of the detection performance between the MIC-based method and the LRT-based method. The results indicate that the coverage probability and the average length of the confidence sets for change point estimation are comparable between the two methods. However, under the same parameter settings, the test powers based on the MIC method are generally higher than those based on the likelihood ratio.
There is still much work to be done on change point detection for the length-biased lognormal distribution in the future. (i) In terms of statistical theory, this study primarily focuses on the asymptotic distribution of the test statistic. The estimation of the change point location is discrete, and asymptotic properties, such as the consistency of the estimator, are challenging issues that need to be addressed in future research. (ii) In terms of application, the method proposed in this paper can be used for change point detection in data with length-biased distribution characteristics. This will help to establish accurate models, make reasonable predictions of patients' remaining lifetimes, and analyze the effectiveness of treatment methods or medications for patients.

Author Contributions

Conceptualization, W.N. and Y.T.; methodology, M.L.; writing—original draft preparation, M.L.; writing—review and editing, W.N. and Y.T.; supervision, Y.T.; funding acquisition, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 12131001.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LBLN   Length-biased lognormal distribution
LRT    Likelihood ratio test
AIC    Akaike information criterion
SIC    Schwarz information criterion
MIC    Modified information criterion
MLE    Maximum likelihood estimation

Appendix A

  • Wald Conditions and Regularity Conditions.
W1.
The distribution of X is either discrete for ( μ , σ ) or is absolutely continuous for ( μ , σ ) .
W2.
For sufficiently small δ and for sufficiently large ρ , the expected values are
E log sup | μ μ | < δ , | σ σ | < δ f ( X ; μ , σ ) 2 < ,
E log sup | μ μ | > ρ , | σ σ | > ρ f ( X ; μ , σ ) 2 < .
W3.
The density function f ( x ; μ , σ ) is continuous in ( μ , σ ) for all x.
W4.
The cumulative distribution function F ( x ; μ , σ ) is identifiable.
W5.
$\lim_{\|\theta\| \to \infty} f(x; \theta) = 0$ for all $x$.
W6.
The parameter space Θ is a closed subset of the two-dimensional Cartesian space.
W7.
$f(x; \mu, \sigma, \delta) = \sup_{|\mu'-\mu|<\delta,\ |\sigma'-\sigma|<\delta} f(x; \mu', \sigma')$ is a measurable function of $x$ for any fixed $(\mu, \sigma)$ and $\delta$.
R1.
For each θ = ( μ , σ ) Θ , the following derivatives exist for all x,
\frac{\partial \log f(x; \mu, \sigma)}{\partial\theta}, \quad \frac{\partial^{2} \log f(x; \mu, \sigma)}{\partial\theta^{2}}, \quad \frac{\partial^{3} \log f(x; \mu, \sigma)}{\partial\theta^{3}}.
R2.
For $\theta$ in the neighborhood $N(\theta_0)$, there exist functions $g(x)$ and $H(x)$ such that the following relations hold for all $x$,
\left|\frac{\partial f(x, \theta)}{\partial\theta}\right| \le g(x), \quad \left|\frac{\partial^{2} f(x, \theta)}{\partial\theta^{2}}\right| \le g(x),
\left|\frac{\partial^{2} \log f(x, \theta)}{\partial\theta^{2}}\right|^{2} \le H(x), \quad \left|\frac{\partial^{3} \log f(x, \theta)}{\partial\theta^{3}}\right| \le H(x),
and
\int g(x)\,dx < \infty, \qquad E_{\theta}(H(X)) < \infty.
R3.
For each θ Θ ,
0 < E_{\theta}\left[\frac{\partial \log f(X; \theta)}{\partial\theta}\right]^{2} < \infty, \qquad E_{\theta}\left|\frac{\partial \log f(X; \theta)}{\partial\theta}\right|^{3} < \infty.

References

  1. Asgharian, M.; M'Lan, C.E.; Wolfson, D.B. Length-Biased Sampling with Right Censoring. J. Am. Stat. Assoc. 2002, 97, 201–209.
  2. Asgharian, M.; Wolfson, D.B. Asymptotic behavior of the unconditional NPMLE of the length-biased survivor function from right censored prevalent cohort data. Ann. Stat. 2005, 33, 2109–2131.
  3. Qin, J.; Ning, J.; Liu, H.; Shen, Y. Maximum Likelihood Estimations and EM Algorithms with Length-Biased Data. J. Am. Stat. Assoc. 2011, 106, 1434–1449.
  4. Kvam, P. Length Bias in the Measurements of Carbon Nanotubes. Technometrics 2008, 50, 462–467.
  5. Turkson, A.J.; Ayiah-Mensah, F.; Nimoh, V. Handling Censoring and Censored Data in Survival Analysis: A Standalone Systematic Literature Review. Int. J. Math. Math. Sci. 2021, 2021, 9307475.
  6. Klein, J.P.; Moeschberger, M.L. Survival Analysis: Techniques for Censored and Truncated Data; Springer: New York, NY, USA, 2003; pp. 63–90.
  7. Sansgiry, P.; Akman, O. Transformations of the lognormal distribution as a selection model. Am. Stat. 2000, 54, 307–309.
  8. Sansgiry, P.; Akman, O. Reliability estimation via length-biased transformation. Commun. Stat.-Theory Methods 2006, 30, 2473–2479.
  9. Ratnaparkhi, M.; Naik-Nimbalkar, U. The length-biased lognormal distribution and its application in the analysis of data from oil field exploration studies. J. Mod. Appl. Stat. Meth. 2012, 11, 255–260.
  10. Page, E.S. Continuous Inspection Schemes. Biometrika 1954, 41, 100–115.
  11. Cai, X.; Said, K.K.; Ning, W. Change-point Analysis with Bathtub Shape for the Exponential Distribution. J. Appl. Stat. 2016, 43, 2740–2750.
  12. Wang, P.; Tang, Y.; Bae, S.J.; He, Y. Bayesian Analysis of Two-Phase Degradation Data Based on Change-Point Wiener Process. Reliab. Eng. Syst. Saf. 2018, 170, 244–256.
  13. Cai, X.; Tian, Y.; Ning, W. Change-point Analysis of the Failure Mechanisms Based on Accelerated Life Tests. Reliab. Eng. Syst. Saf. 2019, 188, 515–522.
  14. Kosorok, M.R.; Song, R. Inference under Right Censoring for Transformation Models with a Change-point Based on a Covariate Threshold. Ann. Stat. 2007, 35, 957–989.
  15. Rabhi, Y.; Asgharian, M. Inference under Biased Sampling and Right Censoring for a Change Point in the Hazard Function. Bernoulli 2017, 23, 2720–2745.
  16. Chen, Y.J.; Ning, W.; Gupta, A.K. Empirical likelihood based detection procedure for change point in mean residual life functions under random censorship. Pharm. Stat. 2016, 15, 246–254.
  17. Vostrikova, L. Detecting Disorder in Multidimensional Random Processes. Sov. Math. Dokl. 1981, 24, 55–59.
  18. Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle. In Second International Symposium on Information Theory; Petrov, B.N., Czaki, F., Eds.; Springer: New York, NY, USA, 1973; pp. 267–281.
  19. Schwarz, G.E. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 31–38.
  20. Chen, J.; Gupta, A.K. Testing and Locating Variance Changepoints with Application to Stock Prices. J. Am. Stat. Assoc. 1997, 92, 739–747.
  21. Chen, J.; Gupta, A.K. Change Point Analysis of a Gaussian Model. Stat. Pap. 1999, 40, 323–333.
  22. Chen, J.; Gupta, A.K.; Pan, J. Information Criterion and Change Point Problem for Regular Models. Sankhyā Ind. J. Stat. 2006, 68, 252–282.
  23. Singh, K.; Xie, M.; Strawderman, W.E. Confidence Distribution (CD): Distribution Estimator of a Parameter. In Complex Datasets and Inverse Problems: Tomography, Networks and Beyond; Lecture Notes—Monograph Series; IMS: Ann Arbor, MI, USA, 2007; Volume 54, pp. 132–150.
  24. Cunen, C.; Hermansen, G.; Hjort, N.L. Confidence Distributions for Change-Points and Regime Shifts. J. Stat. Plan. Infer. 2017, 195, 14–34.
  25. Wan, F. Simulating Survival Data with Predefined Censoring Rates for Proportional Hazards Models. Statist. Med. 2017, 36, 838–854.
  26. Deng, Y.; You, C.; Liu, Y.; Qin, J.; Zhou, X.H. Estimation of Incubation Period and Generation Time Based on Observed Length-Biased Epidemic Cohort with Censoring for COVID-19 Outbreak in China. Biometrics 2021, 77, 929–941.
  27. Wan, F. Simulating Survival Data with Predefined Censoring Rates under a Mixture of Non-Informative Right Censoring Schemes. Commun. Stat. Simul. Comput. 2022, 51, 3851–3867.
  28. Shahin, A.H.; Zhao, A.; Whitehead, A.C.; Alexander, D.C.; Jacob, J.; Barber, D. CenTime: Event-Conditional Modelling of Censoring in Survival Analysis. Med. Image Anal. 2024, 91, 103016.
  29. Miller, R.; Halpern, J. Regression with Censored Data. Biometrika 1982, 69, 521–531.
Figure 1. (a) Scatter plot of the median survival time. (b) Q-Q plot for the length-biased lognormal distribution.
Table 1. Critical values and the probability of Type I error.

                 Critical Values                                    Frequency of Type I Error
                 Uniform Censoring    Exponential Censoring         Uniform Censoring    Exponential Censoring
cr     n         LRT       MIC        LRT       MIC                 LRT      MIC         LRT      MIC
10%    50        12.833    11.849     12.721    11.499              0.043    0.044       0.051    0.053
       100       12.952    11.644     13.282    12.195              0.065    0.061       0.062    0.055
       150       13.792    12.139     13.454    11.981              0.061    0.062       0.071    0.064
       200       19.839    17.790     15.305    13.361              0.045    0.051       0.050    0.048
       300       19.413    17.283     16.399    14.036              0.049    0.053       0.047    0.050
       400       21.132    19.067     15.850    13.406              0.048    0.049       0.048    0.053
30%    50        14.103    13.131     15.249    14.127              0.057    0.056       0.045    0.044
       100       16.057    14.486     15.600    14.720              0.045    0.041       0.059    0.054
       150       16.122    14.594     16.692    15.377              0.048    0.049       0.054    0.052
       200       19.232    17.951     14.800    13.103              0.054    0.051       0.053    0.054
       300       21.202    19.020     15.169    13.408              0.052    0.053       0.050    0.051
       400       21.871    19.655     16.126    13.543              0.047    0.051       0.045    0.053
Table 2. The power of the change point test under uniform censoring (α = 0.05).

n     k     cr     (−2, 2)          (1, 0.5)         (2, 0.5)         (1, 1)           (0, 1.5)         (1, 1.5)
                   LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC
50    12    10%    0.367   0.371    0.456   0.460    0.887   0.887    0.422   0.427    0.514   0.515    0.975   0.976
      25           0.624   0.677    0.514   0.547    0.950   0.961    0.587   0.629    0.732   0.769    0.997   0.999
      38           0.502   0.502    0.313   0.316    0.840   0.843    0.422   0.425    0.594   0.596    0.977   0.978
      12    30%    0.216   0.222    0.314   0.322    0.738   0.744    0.421   0.421    0.420   0.422    0.877   0.884
      25           0.365   0.410    0.328   0.347    0.900   0.915    0.525   0.555    0.622   0.670    0.976   0.985
      38           0.284   0.293    0.234   0.239    0.759   0.764    0.392   0.395    0.462   0.466    0.892   0.892
100   25    10%    0.800   0.817    0.713   0.730    0.986   0.988    0.809   0.803    0.911   0.916    1.000   1.000
      50           0.962   0.972    0.793   0.815    0.998   0.998    0.935   0.953    0.982   0.987    1.000   1.000
      75           0.847   0.858    0.657   0.643    0.984   0.986    0.814   0.815    0.928   0.931    1.000   1.000
      25    30%    0.519   0.539    0.473   0.500    0.981   0.980    0.753   0.766    0.823   0.834    1.000   1.000
      50           0.687   0.724    0.539   0.540    0.994   0.999    0.887   0.897    0.925   0.945    1.000   1.000
      75           0.575   0.555    0.382   0.391    0.986   0.986    0.738   0.756    0.704   0.801    0.999   0.999
150   38    10%    0.977   0.981    0.885   0.888    1.000   1.000    0.961   0.963    0.998   0.998    1.000   1.000
      75           0.996   0.996    0.910   0.913    1.000   1.000    0.995   0.996    0.998   0.999    1.000   1.000
      112          0.978   0.982    0.829   0.844    0.999   0.999    0.969   0.973    0.992   0.994    1.000   1.000
      38    30%    0.756   0.778    0.609   0.645    0.999   0.999    0.932   0.935    0.949   0.941    1.000   1.000
      75           0.911   0.925    0.650   0.666    1.000   1.000    0.985   0.992    0.995   0.998    1.000   1.000
      112          0.772   0.785    0.540   0.565    1.000   1.000    0.931   0.935    0.942   0.943    1.000   1.000
Table 3. The power of the change point test under exponential censoring (α = 0.05).

n     k     cr     (−2, 2)          (1, 0.5)         (2, 0.5)         (1, 1)           (0, 1.5)         (1, 1.5)
                   LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC
50    12    10%    0.372   0.373    0.490   0.486    0.871   0.876    0.455   0.447    0.520   0.524    0.977   0.975
      25           0.617   0.664    0.513   0.543    0.931   0.942    0.612   0.661    0.749   0.788    0.997   0.997
      38           0.527   0.516    0.312   0.325    0.816   0.815    0.445   0.448    0.599   0.596    0.982   0.981
      12    30%    0.242   0.253    0.319   0.307    0.742   0.739    0.404   0.408    0.422   0.415    0.903   0.898
      25           0.377   0.420    0.349   0.361    0.884   0.906    0.563   0.616    0.589   0.631    0.972   0.976
      38           0.292   0.285    0.225   0.229    0.755   0.766    0.417   0.417    0.457   0.455    0.878   0.876
100   25    10%    0.824   0.820    0.755   0.748    0.992   0.991    0.826   0.806    0.912   0.906    1.000   1.000
      50           0.964   0.977    0.782   0.794    0.999   0.999    0.920   0.943    0.990   0.994    1.000   1.000
      75           0.876   0.867    0.661   0.647    0.991   0.991    0.852   0.838    0.933   0.930    1.000   1.000
      25    30%    0.503   0.488    0.543   0.535    0.978   0.979    0.768   0.761    0.815   0.802    0.999   0.999
      50           0.733   0.754    0.547   0.548    0.998   0.998    0.891   0.913    0.916   0.939    1.000   1.000
      75           0.574   0.567    0.418   0.401    0.979   0.980    0.780   0.758    0.801   0.794    0.999   0.999
150   38    10%    0.978   0.975    0.898   0.886    1.000   1.000    0.970   0.966    0.993   0.990    1.000   1.000
      75           0.997   0.997    0.922   0.934    1.000   1.000    0.997   0.997    1.000   1.000    1.000   1.000
      112          0.983   0.983    0.808   0.798    0.999   0.999    0.965   0.962    0.996   0.996    1.000   1.000
      38    30%    0.789   0.777    0.602   0.584    1.000   1.000    0.926   0.920    0.953   0.952    1.000   1.000
      75           0.908   0.923    0.663   0.663    1.000   1.000    0.992   0.994    0.996   0.996    1.000   1.000
      112          0.817   0.803    0.567   0.542    0.998   0.998    0.933   0.926    0.952   0.945    1.000   1.000
Table 4. The coverage probability of confidence sets for change point location estimation under different parameter combinations.

                                 cr = 10%                          cr = 30%
                                 k = 12           k = 25           k = 12           k = 25
n     (μR, σR)      1 − α        LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC
50    (1.0, 1.5)    0.90         0.879   0.883    0.884   0.886    0.858   0.860    0.854   0.854
                    0.95         0.932   0.934    0.936   0.936    0.917   0.917    0.914   0.914
                    0.99         0.978   0.978    0.982   0.982    0.971   0.971    0.974   0.974
      (1.0, 2.0)    0.90         0.882   0.883    0.893   0.893    0.839   0.839    0.840   0.840
                    0.95         0.933   0.933    0.933   0.933    0.902   0.902    0.911   0.911
                    0.99         0.980   0.980    0.979   0.979    0.973   0.973    0.984   0.984
      (2.0, 2.0)    0.90         0.882   0.889    0.878   0.879    0.833   0.834    0.787   0.789
                    0.95         0.930   0.931    0.928   0.928    0.901   0.900    0.836   0.834
                    0.99         0.983   0.983    0.974   0.974    0.966   0.966    0.887   0.885
                                 k = 25           k = 50           k = 25           k = 50
100   (1.0, 1.5)    0.90         0.880   0.880    0.870   0.870    0.910   0.910    0.870   0.870
                    0.95         0.960   0.960    0.930   0.930    0.950   0.950    0.930   0.930
                    0.99         0.990   0.990    0.980   0.980    0.990   0.990    0.960   0.960
      (1.0, 2.0)    0.90         0.883   0.883    0.860   0.860    0.930   0.930    0.920   0.910
                    0.95         0.933   0.933    0.920   0.920    0.970   0.970    0.960   0.940
                    0.99         0.984   0.984    0.980   0.980    0.990   0.990    0.980   0.980
      (2.0, 2.0)    0.90         0.880   0.880    0.850   0.860    0.820   0.820    0.850   0.850
                    0.95         0.920   0.920    0.880   0.890    0.900   0.900    0.960   0.960
                    0.99         0.990   0.990    0.970   0.980    0.970   0.970    0.970   0.970
Table 5. The average length of the confidence set for estimating the position of change points under different parameter combinations.

                                 cr = 10%                          cr = 30%
                                 k = 12           k = 25           k = 12           k = 25
n     (μR, σR)      1 − α        LRT     MIC      LRT     MIC      LRT     MIC      LRT     MIC
50    (1.0, 1.5)    0.90         1.617   1.606    1.258   1.256    2.728   2.712    2.194   2.194
                    0.95         2.189   2.186    1.606   1.606    3.854   3.840    3.106   3.106
                    0.99         4.068   4.062    2.832   2.832    6.842   6.818    5.392   5.392
      (1.0, 2.0)    0.90         1.892   1.886    1.842   1.840    3.196   3.173    3.102   3.102
                    0.95         2.735   2.732    2.581   2.581    4.677   4.652    4.413   4.412
                    0.99         4.884   4.881    4.436   4.437    8.218   8.198    7.437   7.436
      (2.0, 2.0)    0.90         1.345   1.338    1.279   1.277    2.157   2.151    1.911   1.899
                    0.95         1.758   1.754    1.691   1.691    3.078   3.068    2.691   2.682
                    0.99         3.207   3.207    2.999   2.999    5.598   5.593    4.767   4.755
                                 k = 25           k = 50           k = 25           k = 50
100   (1.0, 1.5)    0.90         1.230   1.230    1.310   1.310    1.990   1.990    2.170   2.170
                    0.95         1.680   1.680    1.690   1.690    2.770   2.770    3.070   3.070
                    0.99         3.010   3.010    2.790   2.790    5.010   5.010    5.080   5.080
      (1.0, 2.0)    0.90         1.772   1.771    1.820   1.820    3.340   3.360    2.230   2.160
                    0.95         2.459   2.459    2.420   2.420    4.650   4.650    3.400   3.290
                    0.99         4.163   4.163    3.970   3.970    7.790   7.740    5.500   5.320
      (2.0, 2.0)    0.90         1.240   1.240    1.250   1.240    2.020   2.020    2.110   2.110
                    0.95         1.680   1.680    1.600   1.610    2.780   2.770    2.940   2.940
                    0.99         2.930   2.930    2.830   2.780    4.820   4.810    4.820   4.820
Table 6. Change point test results for heart transplant survival time data.

No.       T_n        p_LRT     M_n        p_MIC     k̂
1–43      16.303     0.001     15.032     0.001     34
1–34      8.093      0.013     8.045      0.012     15
16–34     17.726     0.005     17.066     0.005     30