Parameter Estimation of the Lomax Lifetime Distribution Based on Middle-Censored Data: Methodology, Applications, and Comparative Analysis

Ren, Peiyao; Gui, Wenhao; Liang, Shan

doi:10.3390/axioms14050330


by Peiyao Ren, Wenhao Gui * and Shan Liang

School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China

* Author to whom correspondence should be addressed.

Axioms 2025, 14(5), 330; https://doi.org/10.3390/axioms14050330 (registering DOI)

Submission received: 14 March 2025 / Revised: 11 April 2025 / Accepted: 22 April 2025 / Published: 26 April 2025

(This article belongs to the Special Issue Computational Statistics and Its Applications)


Abstract:

The Lomax distribution has important applications in survival analysis, reliability engineering, insurance, finance, and other fields. Middle-censoring is an important censoring scheme, and data with middle-censoring will produce censoring in random intervals. This paper studies the parameter estimation of the Lomax distribution based on middle-censored data. The expectation–maximization algorithm is employed to compute the maximum likelihood estimates of the two unknown parameters of the Lomax distribution. After processing the data using the midpoint approach estimation, the parameter estimates are obtained by two computational methods: the Newton–Raphson iteration method and the fixed-point method. Moreover, the calculation methods for the asymptotic confidence intervals of the two parameters are provided, with the confidence interval coverage rate serving as one of the criteria for evaluating the estimation performance. In the Bayesian estimation aspect, the shape parameter is estimated using a Gamma prior distribution, and the Gibbs sampling method is employed for the solution. Finally, both simulation data and real data are used to compare the accuracy of the various estimation methods.

Keywords:

Lomax life distribution; middle-censoring; Bayesian estimation; EM algorithm; MPA estimation; Gibbs sampling

MSC:

62N01; 62F15

1. Introduction

The Lomax distribution, due to its flexibility in describing extreme events, is widely applied in the fields of survival analysis, reliability engineering, insurance, and finance. Middle-censoring is an important mechanism for data censoring and has extensive applications in various fields, including medical research, public health and epidemiology, economics and finance, and more. Although there is a substantial amount of literature on the parameter estimation of the Lomax distribution and middle-censored data separately, their combination has not yet been sufficiently explored. The following is a detailed elaboration on the relevant content.

In lifetime testing, data censoring often occurs, so the exact lifetimes of the observed samples may be only partially available. Depending on how it arises, censoring can be categorized as right censoring, left censoring, double censoring, and so on. Leiderman et al. [1] used a doubly censored model for the time infants take to learn to complete specific tasks during their first year. Groeneboom and Wellner [2] proposed the interval-censoring model, and Schick and Yu [3] proposed the mixed case interval-censoring (IC) model.

The type of censoring we focus on in this paper is middle-censoring. Middle-censoring arises in survival analysis, and this type of data was proposed successively by Huang [4] and Jammalamadaka [5] from two different perspectives. It refers to situations in practical research where continuous observation is not possible. If an event occurs in an interval not observed by researchers, it is known only that the subject experienced the event within that interval, not the exact time of occurrence. In the middle-censoring scheme, the lifetimes of the n sample units are denoted as $T_{1}, T_{2}, \dots, T_{n}$. For the i-th unit (i = 1, 2, …, n), there exists a random censoring interval $[L_{i}, R_{i}]$, which follows an unknown bivariate distribution and is independent of the lifetime. The exact failure time $T_{i}$ of the i-th unit can be observed only if $T_{i} \notin [L_{i}, R_{i}]$. Otherwise, only the censoring interval $[L_{i}, R_{i}]$ is observed.

Many studies have been conducted on middle-censoring for parametric models. Jammalamadaka and Iyer [6] considered the approximate self-consistency of middle-censored data. Yu, Wong and Li [7] established strong consistency, asymptotic normality, and asymptotic efficiency of the self-consistent estimator and the generalized MLE with middle-censored data. Srikanth K. Iyer [8] studied the maximum likelihood estimation of exponential distributions with middle-censored data and investigated its consistency and asymptotic normality properties. Bennett et al. [9] provided a proof that the EM algorithm converges in such cases. Wang et al. [10] studied the bivariate Weibull distribution and considered the statistical analysis of related competing risks models, obtaining the maximum likelihood estimation (MLE), midpoint approximation (MPA) estimation, and approximate confidence intervals (ACI) for the unknown parameters. Jammalamadaka et al. [11] analyzed discrete lifetime data that follow a geometric distribution that is subject to middle-censoring.

The Lomax distribution, often referred to as the Pareto distribution of the second kind, was first proposed by K.S. Lomax [12] in 1954. It plays an important role in the analysis of lifetime data in medicine, biological sciences, and engineering. In recent years, many articles have studied extensions of this distribution and its related statistical properties. For example, Lun [13] pointed out the connection between the Lomax lifetime distribution and the exponential distribution under specific conditions. Aljohani et al. [14] investigated the statistical inference of the stress–strength reliability $R = P (X > Y)$ of the Lomax distribution under accelerated life testing, using both classical and Bayesian methods for point estimation of R. Alsuhabi et al. [15] extended the Lomax lifetime distribution by proposing a new continuous lifetime model with four parameters, estimating the model parameters using classical methods (e.g., maximum likelihood estimation (MLE) and maximum product of spacings) and non-classical methods (e.g., Bayesian analysis).

Many researchers have studied the estimation methods for the parameters of the Lomax distribution under censoring. Cramer et al. [16] discussed the competing risks model of the Lomax distribution under progressive Type-II censoring, establishing the maximum likelihood estimation of the distribution parameters. Alfaer et al. [17] studied the Lomax lifetime distribution under a balanced joint progressive Type-I censoring scheme, providing point estimates and asymptotic confidence intervals for the model parameters using maximum likelihood estimation. Bayesian estimates and related credible intervals were obtained using MCMC techniques under independent Gamma priors. El-Sherpieny [18] studied the Power Lomax distribution under progressive Type-II hybrid censoring. The authors discussed parameter estimation for this distribution using maximum likelihood estimation (MLE) and the maximum product spacing method. Hasaballah et al. [19] conducted research on parameter estimation for the Lomax lifetime distribution under censored data, employing both Bayesian and non-Bayesian methods to estimate the parameters.

There is an extensive body of research on the Lomax distribution and middle-censoring, and the existing research results are shown in Figure 1.

However, there has been no research to date that establishes a connection between these two concepts. Therefore, this paper aims to address this issue by studying parameter estimation of the Lomax distribution based on middle-censored data and by comparing and analyzing different estimation methods.

In Section 2 of our paper, we introduce the Lomax distribution as a continuous probability distribution and conduct modeling for middle-censored data. We provide a detailed expression for the maximum likelihood estimation under middle-censoring for the Lomax distribution and discuss its properties, which are crucial for the subsequent estimation methods.

In Section 3, we delve into frequentist methods for parameter estimation. We propose the EM algorithm to address the maximum likelihood estimation (MLE) problem for the two unknown parameters of the Lomax distribution. We also provide two alternative solutions for midpoint approach estimation (MPA). Additionally, we calculate the approximate confidence intervals for these parameters, which are necessary for statistical inference.

In Section 4, we explore Bayesian estimation methods, which offer a different perspective on parameter estimation compared to frequentist methods. We use a Gamma distribution as the prior for the shape parameter of the Lomax distribution and apply Bayesian techniques to estimate the parameters. This approach allows us to incorporate prior knowledge and provides a complete posterior distribution for the parameters.

In Section 5, we present the results of simulation experiments and experiments with real datasets. These experiments are designed to assess the accuracy of the various parameter estimation methods developed in the previous sections and to demonstrate their practical utility in real-life scenarios.

Finally, we summarize the main findings of this paper and discuss the significance of this research. In addition, we propose directions for future research and potential solutions.

2. Model Description

2.1. Lomax Distribution

The exponential distribution, commonly encountered in daily life, has a thin tail and is suitable for describing memoryless random events. In contrast, the Lomax distribution is a heavy-tailed distribution, making it more appropriate for modeling data with heavy-tailed characteristics and for describing extreme events. Additionally, compared with the Weibull distribution, whose shape parameter can control the thickness of the tail, the Lomax distribution may be more suitable for describing certain specific types of heavy-tailed data. In practical applications, the Lomax distribution is widely used in areas such as survival analysis (e.g., the lifetime of power transformers), wealth distribution, and queuing service times. Suppose the random variable T follows the Lomax distribution; its probability density function (PDF) and cumulative distribution function (CDF) are then given by Equations (1) and (2).

f (t) = \frac{θ}{λ} {(1 + \frac{t}{λ})}^{- (θ + 1)}, t > 0,

(1)

F (t) = 1 - {(1 + \frac{t}{λ})}^{- θ}, t > 0 .

(2)

where $λ$ is called the scale parameter, $θ$ is called the shape parameter, $λ$ > 0, $θ$ > 0.
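As a minimal sketch of Equations (1) and (2), the following Python functions evaluate the density and distribution function and draw lifetimes by inverting Equation (2). The function names are hypothetical, and Python is used here only for illustration; the computations reported in this paper were carried out in R.

```python
import random


def lomax_pdf(t, theta, lam):
    """Density from Equation (1): (theta/lam) * (1 + t/lam)^-(theta + 1) for t > 0."""
    if t <= 0:
        return 0.0
    return (theta / lam) * (1.0 + t / lam) ** (-(theta + 1.0))


def lomax_cdf(t, theta, lam):
    """Distribution function from Equation (2): 1 - (1 + t/lam)^-theta for t > 0."""
    if t <= 0:
        return 0.0
    return 1.0 - (1.0 + t / lam) ** (-theta)


def lomax_quantile(u, theta, lam):
    """Solve F(t) = u for t using Equation (2), with 0 <= u < 1."""
    return lam * ((1.0 - u) ** (-1.0 / theta) - 1.0)


def lomax_sample(n, theta, lam, rng):
    """Draw n lifetimes by inverse-transform sampling."""
    return [lomax_quantile(rng.random(), theta, lam) for _ in range(n)]


# Illustrative parameter values (not taken from the paper).
rng = random.Random(1)
sample = lomax_sample(5, theta=2.0, lam=1.5, rng=rng)
```

Inverse-transform sampling is used because Equation (2) can be inverted in closed form, so no external library is required.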

The probability density functions of the Lomax distribution with different parameters are shown in Figure 2.

The Lomax distribution can be regarded as a heavy-tailed version of the exponential distribution. While the tail of the exponential distribution decays exponentially, the density of the Lomax distribution decays according to a power law, in proportion to ${(t + λ)}^{- (θ + 1)}$. When $θ$ is small, the Lomax distribution has a heavier tail, making it more suitable for describing extreme events. Compared to the exponential distribution, the shape parameter $θ$ and the scale parameter $λ$ of the Lomax distribution provide sufficient flexibility to adapt to different types of data distributions. Therefore, the Lomax distribution has broader applications than the exponential distribution in some cases.

2.2. The Middle-Censoring Model of Lomax

The type of censoring we focus on in this paper is middle-censoring. If a data point falls within a random interval and is not observable, middle-censoring occurs. The middle-censoring scheme can be described as follows.

Figure 3 shows a schematic diagram of middle-censored data. For some observations, only an interval is observed, not the exact value.

For middle-censored data, there are two modeling approaches. In the first, proposed by Huang [4], observers inspect only at certain times. When a lifetime ends at an observation time, it is observed exactly; when it ends between two observation times, it can only be recorded as an interval with those two observation times as its endpoints. In the second, proposed by Jammalamadaka [5], the data are observed continuously except during certain intervals. When a lifetime ends within such an unobserved interval, its exact value is not observable. The two approaches start from different angles but lead to the same kind of data: both are descriptions of middle-censored data. This paper adopts the second modeling approach.

Assume that n items are tested, and the lifetimes of these items are $T_{1}, \dots, T_{n}$. For item i, there exists a random censoring interval $[L_{i}, R_{i}]$, which follows some unknown bivariate distribution. $T_{i}$ can be observed only when $T_{i} \notin [L_{i}, R_{i}]$. Let $δ_{i} = I (T_{i} \notin [L_{i}, R_{i}])$, where I (·) denotes the indicator function. When $δ_{i} = 1$, the observation is not censored, and we observe the actual value $t_{i} = T_{i}$; in this case, $[L_{i}, R_{i}]$ is not observed. On the other hand, when $δ_{i} = 0$, we observe only the censoring interval $[L_{i}, R_{i}]$. For item i, the observation is given by Formula (3):

\begin{cases} (t_{i}, 1), & \text{if } T_{i} \notin [L_{i}, R_{i}], \\ ([L_{i}, R_{i}], 0), & \text{otherwise.} \end{cases}

(3)

In this section, the joint distribution of the two endpoints of the censoring interval is left unspecified. In the subsequent parameter estimation experiments, exponential distributions with different means are used to generate the two endpoints, so that the data are censored artificially.
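The observation scheme in Formula (3) can be sketched in a few lines of Python. The text states only that the interval endpoints are generated from exponential distributions with different means; the specific construction below, with an exponential left endpoint and an exponential interval width so that $L_{i} \le R_{i}$, is an assumption made for illustration, as are all function names and parameter values.

```python
import random


def lomax_draw(theta, lam, rng):
    """Inverse-transform draw from the Lomax CDF in Equation (2)."""
    return lam * ((1.0 - rng.random()) ** (-1.0 / theta) - 1.0)


def observe(t, left, right):
    """Formula (3): return (t, 1) if t lies outside [left, right], else the interval with 0."""
    if left <= t <= right:
        return ((left, right), 0)
    return (t, 1)


def middle_censored_sample(n, theta, lam, mean_left, mean_width, rng):
    """Simulate n records; the interval mechanism is an assumed construction."""
    records = []
    for _ in range(n):
        t = lomax_draw(theta, lam, rng)
        left = rng.expovariate(1.0 / mean_left)           # L_i: exponential with mean mean_left
        right = left + rng.expovariate(1.0 / mean_width)  # R_i - L_i: exponential with mean mean_width
        records.append(observe(t, left, right))
    return records


# Illustrative settings (not taken from the paper).
rng = random.Random(2025)
records = middle_censored_sample(50, theta=2.0, lam=1.5,
                                 mean_left=0.5, mean_width=1.0, rng=rng)
n_censored = sum(1 for _, delta in records if delta == 0)
```

Changing `mean_left` and `mean_width` changes the proportion of censored records, which is one way to vary the degree of censoring in a simulation.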

3. Frequentist Estimation

In this section, we use the classical frequentist method and an approximation method to obtain point and interval estimates of the unknown parameters.

3.1. Maximum Likelihood Estimation

Suppose that the number of exactly observed lifetimes is $n_{1}$, the number of middle-censored observations is $n_{2}$, and the total sample size is $n = n_{1} + n_{2}$. Let $t_{i}$ be an observed data point, and let $L_{i}$ and $R_{i}$ be the endpoints of an observed middle-censoring interval. With the exact observations listed first, the observed data are

\begin{matrix} \{(t_{1}, 1), (t_{2}, 1), \dots, (t_{n_{1}}, 1), \\ ([L_{n_{1} + 1}, R_{n_{1} + 1}], 0), ([L_{n_{1} + 2}, R_{n_{1} + 2}], 0), \dots, ([L_{n_{1} + n_{2}}, R_{n_{1} + n_{2}}], 0)\} . \end{matrix}

(4)

The likelihood function of the shape parameter $θ$ and the scale parameter $λ$ based on middle-censored data from the Lomax lifetime distribution [20] is

L (θ, λ) = \prod_{i = 1}^{n_{1}} f (t_{i}) \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} [F (R_{i}) - F (L_{i})] .

(5)

Substituting Equations (1) and (2) into Equation (5), we obtain Equation (6).

L (θ, λ) = \frac{θ^{n_{1}}}{λ^{n_{1}}} \prod_{i = 1}^{n_{1}} {(1 + \frac{t_{i}}{λ})}^{- (θ + 1)} \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} [{(1 + \frac{L_{i}}{λ})}^{- θ} - {(1 + \frac{R_{i}}{λ})}^{- θ}] .

(6)

Taking the logarithm of Function (6), we obtain the log-likelihood function.

\begin{matrix} ln L (θ, λ) = & n_{1} ln θ - n_{1} ln λ - (θ + 1) \sum_{i = 1}^{n_{1}} ln (1 + \frac{t_{i}}{λ}) \\ + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} ln [{(1 + \frac{L_{i}}{λ})}^{- θ} - {(1 + \frac{R_{i}}{λ})}^{- θ}] . \end{matrix}

(7)
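A direct transcription of the log-likelihood in Equation (7) may help when checking the derivatives that follow. This is a sketch only; the function name and the representation of the data as two lists (exact lifetimes and censoring intervals) are assumptions.

```python
import math


def log_likelihood(theta, lam, exact, intervals):
    """Log-likelihood of Equation (7).

    exact     -- list of exactly observed lifetimes t_i
    intervals -- list of (L_i, R_i) pairs for the middle-censored units
    """
    n1 = len(exact)
    value = n1 * math.log(theta) - n1 * math.log(lam)
    value -= (theta + 1.0) * sum(math.log1p(t / lam) for t in exact)
    for left, right in intervals:
        surv_left = (1.0 + left / lam) ** (-theta)    # 1 - F(L_i)
        surv_right = (1.0 + right / lam) ** (-theta)  # 1 - F(R_i)
        value += math.log(surv_left - surv_right)     # log of F(R_i) - F(L_i)
    return value


# Illustrative data (not taken from the paper).
exact = [0.2, 0.7, 1.9]
intervals = [(0.5, 1.5), (1.0, 4.0)]
value = log_likelihood(2.0, 1.5, exact, intervals)
```

Each censored unit contributes $ln [F (R_{i}) - F (L_{i})]$, which the code writes as a difference of survival probabilities.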

By differentiating Function (7) with respect to the two parameters, we obtain Functions (8) and (9).

\begin{matrix} \frac{\partial ln L (θ, λ)}{\partial θ} = & \frac{n_{1}}{θ} - \sum_{i = 1}^{n_{1}} ln (1 + \frac{t_{i}}{λ}) \\ + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{{(λ + L_{i})}^{θ} ln (1 + \frac{R_{i}}{λ}) - {(λ + R_{i})}^{θ} ln (1 + \frac{L_{i}}{λ})}{{(λ + R_{i})}^{θ} - {(λ + L_{i})}^{θ}} . \end{matrix}

(8)

\begin{matrix} \frac{\partial ln L (θ, λ)}{\partial λ} = & - \frac{n_{1}}{λ} + (θ + 1) \sum_{i = 1}^{n_{1}} \frac{t_{i}}{λ^{2} + λ t_{i}} \\ + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{\frac{θ}{λ} [L_{i} {(λ + R_{i})}^{θ + 1} - R_{i} {(λ + L_{i})}^{θ + 1}]}{{(λ + R_{i})}^{θ + 1} (λ + L_{i}) - {(λ + L_{i})}^{θ + 1} (λ + R_{i})} . \end{matrix}

(9)

Setting both partial derivatives to zero gives the system of likelihood equations (10).

\{\begin{matrix} \frac{n_{1}}{θ} - \sum_{i = 1}^{n_{1}} ln (1 + \frac{t_{i}}{λ}) + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{{(λ + L_{i})}^{θ} ln (1 + \frac{R_{i}}{λ}) - {(λ + R_{i})}^{θ} ln (1 + \frac{L_{i}}{λ})}{{(λ + R_{i})}^{θ} - {(λ + L_{i})}^{θ}} = 0 \\ - \frac{n_{1}}{λ} + (1 + θ) \sum_{i = 1}^{n_{1}} \frac{t_{i}}{λ^{2} + λ t_{i}} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{\frac{θ}{λ} [L_{i} {(λ + R_{i})}^{θ + 1} - R_{i} {(λ + L_{i})}^{θ + 1}]}{{(λ + R_{i})}^{θ + 1} (λ + L_{i}) - {(λ + L_{i})}^{θ + 1} (λ + R_{i})} = 0 . \end{matrix}

(10)

The strong consistency, asymptotic normality, and asymptotic efficiency of the maximum likelihood estimator under middle-censoring have been proven [7]. When the data are complete, the likelihood equations can be solved directly for the shape parameter $θ$ and the scale parameter $λ$, and the existence and uniqueness of the solution can be proven [21]. The MLE with middle-censored data has similar properties, but because of the censored observations, the maximum likelihood estimates cannot be derived directly from Equation (10). To solve this problem, we use the EM algorithm, treating the censored lifetimes as missing data.

3.2. Expectation–Maximization Algorithm

The expectation–maximization (EM) algorithm [22] is an iterative optimization algorithm commonly used for models with latent variables or missing data. In the parameter estimation of middle-censored data, the EM algorithm has significant advantages, as it can handle the missing data and improve the parameter estimates step by step through iteration. In most cases, the algorithm converges to a local optimum [23], which is sufficiently good for many practical problems. Its simplicity, flexibility, and numerical stability make it a preferred method in practice. In particular, for exponential family distributions [22], the E-step and M-step calculations usually have closed-form solutions; the Lomax distribution belongs to this class with respect to $θ$ when the scale parameter $λ$ is held fixed.

The core idea of the EM algorithm is to improve the parameter estimates step by step by alternating an expectation step (E-step) and a maximization step (M-step). Let $t_{i}^{*}$ ($i = n_{1} + 1, n_{1} + 2, \dots, n$) denote the failure time in the interval $[L_{i}, R_{i}]$ at iteration step k; the complete data are then

\{(t_{1}, 1), (t_{2}, 1), \dots, (t_{n_{1}}, 1), (t_{n_{1} + 1}^{*}, 0), \dots, (t_{n_{1} + n_{2}}^{*}, 0)\} .

(11)

Based on the complete data, the likelihood function is given by Function (12).

L_{c} (θ, λ) = \prod_{i = 1}^{n_{1}} f (t_{i}) \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} f (t_{i}^{*}) .

(12)

The log-likelihood function is

ln L_{c} (θ, λ) = n ln θ - n ln λ - (θ + 1) \sum_{i = 1}^{n_{1}} ln (1 + \frac{t_{i}}{λ}) - (θ + 1) \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} ln (1 + \frac{t_{i}^{*}}{λ}) .

(13)

The conditional expectations required to update the shape and scale parameters are given by Equations (14) and (15), respectively, where $f (t; θ, λ)$ and $F (t; θ, λ)$ denote the probability density function and cumulative distribution function of the Lomax distribution with parameters $θ$ and $λ$.

E_{θ, λ} [ln (1 + \frac{T_{i}}{λ}) |T_{i} \in (L_{i}, R_{i})] = \frac{\int_{L_{i}}^{R_{i}} ln (1 + \frac{t}{λ}) f (t; θ, λ) d t}{F (R_{i}; θ, λ) - F (L_{i}; θ, λ)},

(14)

E_{θ, λ} [\frac{T_{i}}{λ^{2} + λ T_{i}} |T_{i} \in (L_{i}, R_{i})] = \frac{\int_{L_{i}}^{R_{i}} \frac{t}{λ^{2} + λ t} f (t; θ, λ) d t}{F (R_{i}; θ, λ) - F (L_{i}; θ, λ)} .

(15)
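The integrals in Equations (14) and (15) can be evaluated numerically. The sketch below uses a composite Simpson rule, which is one possible choice and is not stated in the text; the function names and the number of panels are likewise assumptions.

```python
import math


def simpson(func, a, b, panels=200):
    """Composite Simpson rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = func(a) + func(b)
    for j in range(1, panels):
        total += (4.0 if j % 2 else 2.0) * func(a + j * h)
    return total * h / 3.0


def lomax_pdf(t, theta, lam):
    """Equation (1), evaluated for t >= 0."""
    return (theta / lam) * (1.0 + t / lam) ** (-(theta + 1.0))


def lomax_cdf(t, theta, lam):
    """Equation (2), evaluated for t >= 0."""
    return 1.0 - (1.0 + t / lam) ** (-theta)


def conditional_expectation(g, left, right, theta, lam):
    """E[g(T) | left < T < right] for a Lomax(theta, lam) lifetime T."""
    numerator = simpson(lambda t: g(t) * lomax_pdf(t, theta, lam), left, right)
    denominator = lomax_cdf(right, theta, lam) - lomax_cdf(left, theta, lam)
    return numerator / denominator


def e_log_term(left, right, theta, lam):
    """Equation (14): conditional expectation of ln(1 + T/lam)."""
    return conditional_expectation(lambda t: math.log1p(t / lam),
                                   left, right, theta, lam)


def e_ratio_term(left, right, theta, lam):
    """Equation (15): conditional expectation of T / (lam^2 + lam*T)."""
    return conditional_expectation(lambda t: t / (lam * lam + lam * t),
                                   left, right, theta, lam)


# Illustrative interval and parameter values (not taken from the paper).
e14 = e_log_term(0.5, 2.0, theta=2.0, lam=1.5)
e15 = e_ratio_term(0.5, 2.0, theta=2.0, lam=1.5)
```

Because $ln (1 + T / λ)$ is exponentially distributed with rate $θ$ when T follows the Lomax distribution, both expectations also have closed forms, which gives a convenient check on the quadrature.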

The EM algorithm is frequently used to handle parameter estimation problems under middle-censored data [24], and the corresponding algorithm steps are as follows (Algorithm 1):

Algorithm 1: EM Algorithm for parameter estimation.

1: Initialize the parameter values $λ^{(0)}$ and $θ^{(0)}$ using an exponential distribution, and set the tolerance threshold $ε = 0.05$ .
2: E-step: During the k-th iteration, calculate the expected values. $E_{1 i}^{(k)}$ and $E_{3 i}^{(k)}$ apply to the censored observations ($i = n_{1} + 1, \dots, n_{1} + n_{2}$), and $E_{2 i}^{(k)}$ and $E_{4 i}^{(k)}$ apply to the exact observations ($i = 1, \dots, n_{1}$).

$\begin{matrix} E_{1 i}^{(k)} = E_{θ^{(k - 1)}, λ^{(k - 1)}} [ln (1 + \frac{T_{i}}{λ}) ∣ T_{i} \in (L_{i}, R_{i})] \\ E_{2 i}^{(k)} = E_{θ^{(k - 1)}, λ^{(k - 1)}} [ln (1 + \frac{T_{i}}{λ}) ∣ T_{i} = t_{i}] \\ E_{3 i}^{(k)} = E_{θ^{(k - 1)}, λ^{(k - 1)}} [\frac{T_{i}}{λ^{2} + λ T_{i}} ∣ T_{i} \in (L_{i}, R_{i})] \\ E_{4 i}^{(k)} = E_{θ^{(k - 1)}, λ^{(k - 1)}} [\frac{T_{i}}{λ^{2} + λ T_{i}} ∣ T_{i} = t_{i}] \end{matrix}$
3: M-step: Update the parameter estimates by maximizing the likelihood function $L_{c} (θ, λ)$ based on the complete data.

$\begin{matrix} θ^{(k)} = \frac{n}{\sum_{i = 1}^{n_{1}} E_{2 i}^{(k)} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} E_{1 i}^{(k)}} \\ λ^{(k)} = \frac{n}{(θ^{(k)} + 1) (\sum_{i = 1}^{n_{1}} E_{4 i}^{(k)} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} E_{3 i}^{(k)})} \end{matrix}$
4: Convergence Check: If both of the following conditions are met, stop the iteration; otherwise, return to Step 2.

$\begin{matrix} | θ^{(k)} - θ^{(k - 1)} | < ε \\ | λ^{(k)} - λ^{(k - 1)} | < ε \end{matrix}$

The value of $ε$ in the algorithm should be adjusted to the application. When higher estimation precision is required, a smaller $ε$ can be chosen; when higher computational efficiency is desired, a larger $ε$ can be chosen.

By iterating through the above EM algorithm, the estimated values of $λ$ and $θ$ can be obtained.

3.3. Midpoint Approach Estimation

Besides the EM estimation, the midpoint approach estimation (MPA) is also a method used for handling censored data [25], which is especially common in survival analysis and reliability engineering. This method replaces the censored data with the midpoint of the censored interval, thereby performing parameter estimation.

For each censored observation, we replace the unobserved lifetime with the average of the two endpoints of its censoring interval, as in Equation (16), which yields the completed data set (17).

\hat{t_{i}} = \frac{L_{i} + R_{i}}{2} .

(16)

\{(t_{1}, 1), (t_{2}, 1), \dots, (t_{n_{1}}, 1), (\hat{t_{n_{1} + 1}}, 0), \dots, (\hat{t_{n_{1} + n_{2}}}, 0)\} .

(17)

After obtaining the complete data using the midpoint estimation method, the estimates of the shape and scale parameters can be obtained by solving the equations using the Newton–Raphson iteration method and the fixed-point method. For the complete data, the likelihood function is given by Function (18).

L_{M} (θ, λ) = \prod_{i = 1}^{n_{1}} f (t_{i}) \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} f (\hat{t_{i}}) = \prod_{i = 1}^{n_{1}} \frac{θ λ^{θ}}{{(t_{i} + λ)}^{θ + 1}} \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{θ λ^{θ}}{{(\hat{t_{i}} + λ)}^{θ + 1}} .

(18)

3.3.1. The Newton–Raphson Iteration Method for Solving Under the MPA Estimation

Since the censored interval data have been imputed by the midpoint values of the intervals, the statistical software can directly compute the solutions for the two parameters. Therefore, the Newton–Raphson iteration method can be employed to obtain the estimates of the scale parameter and the shape parameter.

The negative log-likelihood function is given by Function (19).

\begin{matrix} \mathrm{neg\_log\_likelihood} & = - ln L_{M} (θ, λ) \\ = - (\sum_{i = 1}^{n_{1}} ln (\frac{θ λ^{θ}}{{(t_{i} + λ)}^{θ + 1}}) + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} ln (\frac{θ λ^{θ}}{{(\hat{t_{i}} + λ)}^{θ + 1}})) . \end{matrix}

(19)

The calculation is carried out in R (version 4.3.2). (All the code in this paper was run on the same device: LAPTOP-90NAADHP, equipped with an Intel(R) Core(TM) i7-10510U CPU running at 1.80 GHz with a boost speed of 2.30 GHz, and 16.0 GB of RAM, of which 15.8 GB is available. The device ID is B0B39209-7A3F-4006-81CE-AB49F27A02E1, and the product ID is 00342-35692-61862-AAOEM. The system is a 64-bit operating system on an x64-based processor.) After selecting appropriate initial values $λ_{0}$ and $θ_{0}$, we use the general-purpose optimization function optim to find the minimum of the negative log-likelihood function, thereby obtaining the estimates of the shape and scale parameters.
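The midpoint completion in Equations (16) and (17) and the objective in Equation (19) can be sketched as follows. This Python fragment only builds the completed data and evaluates the negative log-likelihood; it does not reproduce the R optim call, and any general-purpose minimizer could be applied to `neg_log_likelihood`. The function names and data values are hypothetical.

```python
import math


def midpoint_complete(exact, intervals):
    """Equations (16)-(17): replace each censoring interval by its midpoint."""
    return list(exact) + [(left + right) / 2.0 for left, right in intervals]


def neg_log_likelihood(theta, lam, data):
    """Equation (19), with density theta * lam^theta / (t + lam)^(theta + 1)."""
    total = 0.0
    for t in data:
        total += (math.log(theta) + theta * math.log(lam)
                  - (theta + 1.0) * math.log(t + lam))
    return -total


# Illustrative data (not taken from the paper).
exact = [0.2, 0.7, 1.9]
intervals = [(0.5, 1.5), (1.0, 4.0)]
completed = midpoint_complete(exact, intervals)
objective = neg_log_likelihood(2.0, 1.5, completed)
```

Once the intervals are replaced by midpoints, the objective depends only on a list of point values, which is why standard complete-data optimization routines apply.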

3.3.2. The Fixed-Point Method for Solving Under the MPA Estimation

After taking the logarithm of the likelihood Function (18) for the complete data and differentiating with respect to the two parameters, the resulting system of equations, where the derivatives are set to zero, is given as Equation (20).

\{\begin{matrix} \frac{n}{θ} - (\sum_{i = 1}^{n_{1}} ln (1 + \frac{t_{i}}{λ}) + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} ln (1 + \frac{\hat{t_{i}}}{λ})) = 0 \\ - \frac{n}{λ} + (1 + θ) (\sum_{i = 1}^{n_{1}} \frac{t_{i}}{λ^{2} + λ t_{i}} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{\hat{t_{i}}}{λ^{2} + λ \hat{t_{i}}}) = 0 . \end{matrix}

(20)

Solving the first equation of (20) for the shape parameter $θ$ gives its expression in terms of the scale parameter $λ$.

θ (λ) = \frac{n}{\sum_{i = 1}^{n_{1}} ln (1 + \frac{t_{i}}{λ}) + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} ln (1 + \frac{\hat{t_{i}}}{λ})} .

(21)

Substituting Function (21) into the second equation of (20) results in

\frac{n}{λ} - (1 + θ (λ)) [\sum_{i = 1}^{n_{1}} \frac{t_{i}}{λ^{2} + λ t_{i}} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \frac{\hat{t_{i}}}{λ^{2} + λ \hat{t_{i}}}] = 0 .

(22)

To solve Equation (22), we use the fixed-point method. First, we define the function $g (λ)$ as

g (λ) = \frac{1}{n} (1 + \frac{n}{\sum_{i = 1}^{n_{1}} Q_{i} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} {\hat{Q}}_{i}}) (\sum_{i = 1}^{n_{1}} P_{i} + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} {\hat{P}}_{i}) .

(23)

The symbols in this equation are defined as

Q_{i} = ln (1 + \frac{t_{i}}{λ}), {\hat{Q}}_{i} = ln (1 + \frac{\hat{t_{i}}}{λ}), P_{i} = \frac{t_{i}}{1 + \frac{t_{i}}{λ}}, \hat{P_{i}} = \frac{\hat{t_{i}}}{1 + \frac{\hat{t_{i}}}{λ}} .

(24)

Select an appropriate initial value $λ_{0}$, and use fixed-point iteration in R (version 4.3.2) to solve $λ = g (λ)$, which is equivalent to Equation (22). This yields a solution $λ$. Substituting $λ$ into Function (21) gives the estimate of $θ$.
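The fixed-point scheme built on Equations (21) and (23) can be sketched in Python as below. The stopping rule, the iteration cap, the starting value, and the function names are assumptions made for illustration, and the data are hypothetical midpoint-completed values. The iteration is not guaranteed to converge for every data set, so the sketch returns a flag alongside the last iterate.

```python
import math


def theta_of_lambda(lam, data):
    """Equation (21): shape parameter implied by a given scale parameter."""
    return len(data) / sum(math.log1p(t / lam) for t in data)


def g(lam, data):
    """Equation (23), with P_i = t_i / (1 + t_i / lam) from Equation (24)."""
    n = len(data)
    sum_p = sum(t / (1.0 + t / lam) for t in data)
    return (1.0 + theta_of_lambda(lam, data)) * sum_p / n


def fixed_point(func, start, tol=1e-8, max_iter=10000):
    """Iterate x <- func(x); return (x, converged_flag)."""
    x = start
    for _ in range(max_iter):
        x_next = func(x)
        if abs(x_next - x) < tol:
            return x_next, True
        x = x_next
    return x, False


# Hypothetical completed data (exact lifetimes and interval midpoints together).
data = [0.1, 0.2, 0.3, 0.5, 0.8, 1.5, 4.0, 12.0]
lam_hat, converged = fixed_point(lambda lam: g(lam, data), start=1.0)
theta_hat = theta_of_lambda(lam_hat, data)
```

Multiplying Equation (22) by $λ^{2} / n$ shows that its left-hand side equals $(n / λ^{2}) (λ - g (λ))$, so a fixed point of g is a root of Equation (22).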

3.4. The Estimation of Asymptotic Confidence Interval

Censoring, here in the form of lifetimes that are missing inside random intervals, may lead to biased parameter estimates. The censoring mechanism affects the sample distribution, so standard asymptotic theory cannot be applied directly, because the sample distribution is no longer a simple known distribution. Moreover, with censored data, increasing the sample size does not necessarily result in more reliable estimates, especially when the degree of censoring is high, as the sample may fail to reflect the true characteristics of the population.

The asymptotic confidence interval (ACI) is a statistical method used for estimating the confidence interval of a parameter. Unlike the classical confidence interval, the ACI is derived based on the properties when the sample size approaches infinity.

For the parameters $θ$ and $λ$, it is impossible to obtain an exact confidence interval. Therefore, we use the asymptotic properties of maximum likelihood estimators to construct the ACI [26].

The observed Fisher information matrix for parameters $θ$ and $λ$ is represented as

I (λ, θ) = [\begin{matrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{matrix}],

(25)

I_{11} = - \frac{\partial^{2} ln L (θ, λ)}{\partial λ^{2}}, I_{22} = - \frac{\partial^{2} ln L (θ, λ)}{\partial θ^{2}},

(26)

I_{12} = I_{21} = - \frac{\partial^{2} ln L (θ, λ)}{\partial λ \partial θ} .

(27)

Here, $ln L (θ, λ)$ is the log-likelihood. We calculate $I_{11}$, $I_{12}$, $I_{21}$, and $I_{22}$, and substitute the estimates $\hat{λ}$ and $\hat{θ}$ into the Fisher information matrix. Inverting the matrix yields Function (28).

W (λ, θ) = {[\begin{matrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{matrix}]}^{- 1} |_{λ = \hat{λ}, θ = \hat{θ}} = [\begin{matrix} \hat{W_{11}} & \hat{W_{12}} \\ \hat{W_{21}} & \hat{W_{22}} \end{matrix}] .

(28)

By the asymptotic normality of the maximum likelihood estimators, $(λ - \hat{λ}) / \sqrt{\hat{W_{11}}}$ and $(θ - \hat{θ}) / \sqrt{\hat{W_{22}}}$ each approximately follow a standard normal distribution. At a confidence level of $1 - γ$, the asymptotic confidence intervals for the parameters $λ$ and $θ$ are given by (29) and (30), where $Z_{\frac{γ}{2}}$ denotes the upper $γ / 2$ quantile of the standard normal distribution.

[\hat{λ} - Z_{\frac{γ}{2}} \sqrt{\hat{W_{11}}}, \hat{λ} + Z_{\frac{γ}{2}} \sqrt{\hat{W_{11}}}],

(29)

[\hat{θ} - Z_{\frac{γ}{2}} \sqrt{\hat{W_{22}}}, \hat{θ} + Z_{\frac{γ}{2}} \sqrt{\hat{W_{22}}}] .

(30)
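The construction in (25)–(30) can be sketched numerically. In the Python fragment below, the second derivatives are approximated by central finite differences rather than derived analytically; this substitution, the step size, and the function names are assumptions. The usage example applies the routine to a toy quadratic log-likelihood, not to the Lomax model.

```python
from statistics import NormalDist


def observed_information(loglik, theta, lam, h=1e-4):
    """Approximate (25)-(27) by central differences; rows and columns ordered (lam, theta)."""
    f0 = loglik(theta, lam)
    d2_lam = (loglik(theta, lam + h) - 2.0 * f0 + loglik(theta, lam - h)) / h ** 2
    d2_theta = (loglik(theta + h, lam) - 2.0 * f0 + loglik(theta - h, lam)) / h ** 2
    d2_cross = (loglik(theta + h, lam + h) - loglik(theta + h, lam - h)
                - loglik(theta - h, lam + h) + loglik(theta - h, lam - h)) / (4.0 * h ** 2)
    return [[-d2_lam, -d2_cross], [-d2_cross, -d2_theta]]


def invert_2x2(m):
    """Inverse of a 2x2 matrix, as needed for Function (28)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def asymptotic_intervals(loglik, theta_hat, lam_hat, level=0.95):
    """Intervals (29) and (30); assumes the information matrix is positive definite."""
    w = invert_2x2(observed_information(loglik, theta_hat, lam_hat))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # upper (1 - level)/2 normal quantile
    half_lam = z * w[0][0] ** 0.5
    half_theta = z * w[1][1] ** 0.5
    return ((lam_hat - half_lam, lam_hat + half_lam),
            (theta_hat - half_theta, theta_hat + half_theta))


def toy_loglik(theta, lam):
    """Toy quadratic log-likelihood with maximum at lam = 2, theta = 3 (illustration only)."""
    return -0.5 * ((lam - 2.0) ** 2 / 0.25 + (theta - 3.0) ** 2)


lam_interval, theta_interval = asymptotic_intervals(toy_loglik, theta_hat=3.0, lam_hat=2.0)
```

In practice, `loglik` would be the middle-censored log-likelihood evaluated at the estimates; if the numerical information matrix is not positive definite at the supplied point, the square roots are not real and the intervals are not meaningful.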

4. Bayes Estimation

4.1. The Posterior Distribution of the Parameters of the Lomax Distribution Under Middle-Censoring

When the scale parameter $λ$ is known, we choose the Gamma distribution with parameters a and b as the prior distribution for the shape parameter $θ$ [27], and denote it as $π (θ)$.

π (θ) = \frac{b^{a}}{Γ (a)} θ^{a - 1} e^{- b θ} .

(31)

Since the Gamma distribution is the conjugate prior for the shape parameter of the Lomax distribution, it has desirable properties when selected as the prior distribution. The proof of this conjugacy is included in Appendix A to ensure the rigor of the paper.

Suppose the experimental data with middle-censoring are denoted as data; then, the likelihood function of data can be represented as $p (data ∣ θ)$, and the joint distribution of data and $θ$ is $h (data, θ)$. By integrating $h (data, θ)$ over the parameter space $Θ$, we can obtain the marginal density of data, denoted as $m (data)$.

\begin{matrix} data = & \{(t_{1}, 1), (t_{2}, 1), \dots, (t_{n_{1}}, 1), \\ ([L_{n_{1} + 1}, R_{n_{1} + 1}], 0), ([L_{n_{1} + 2}, R_{n_{1} + 2}], 0), \dots, ([L_{n_{1} + n_{2}}, R_{n_{1} + n_{2}}], 0)\} . \end{matrix}

(32)

p (data | θ) = L (θ, λ) = \frac{θ^{n_{1}}}{λ^{n_{1}}} \prod_{i = 1}^{n_{1}} {(1 + \frac{t_{i}}{λ})}^{- (θ + 1)} \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} [{(1 + \frac{L_{i}}{λ})}^{- θ} - {(1 + \frac{R_{i}}{λ})}^{- θ}] .

(33)

\begin{matrix} h (data, θ) & = p (data | θ) π (θ) \\ = \frac{θ^{n_{1}}}{λ^{n_{1}}} \prod_{i = 1}^{n_{1}} {(1 + \frac{t_{i}}{λ})}^{- (θ + 1)} \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} [{(1 + \frac{L_{i}}{λ})}^{- θ} - {(1 + \frac{R_{i}}{λ})}^{- θ}] \frac{b^{a}}{Γ (a)} θ^{a - 1} e^{- b θ} . \end{matrix}

(34)

m (data) = \int_{Θ} h (data, θ) d θ = \int_{Θ} p (data | θ) π (θ) d θ .

(35)

From Functions (34) and (35), we can derive the posterior distribution of the parameter $θ$ given data.

\begin{matrix} π (θ | data) & = \frac{h (data, θ)}{m (data)} \\ \propto θ^{n_{1}} \prod_{i = 1}^{n_{1}} {(1 + \frac{t_{i}}{λ})}^{- θ} θ^{a - 1} e^{- b θ} \prod_{i = n_{1} + 1}^{n_{1} + n_{2}} [{(1 + \frac{L_{i}}{λ})}^{- θ} - {(1 + \frac{R_{i}}{λ})}^{- θ}] . \end{matrix}

(36)

Using the absolute error loss function, we can obtain the Bayesian estimate of the shape parameter $θ$. The mean squared error (MSE) of M estimates ${\hat{θ}}_{i}$, $i = 1, \dots, M$, is computed as

MSE = \frac{1}{M} \sum_{i = 1}^{M} {(θ - {\hat{θ}}_{i})}^{2} .

(37)

4.2. The Bayesian Estimation of the Shape Parameter Based on Gibbs Sampling When the Scale Parameter Is Known

Directly solving the posterior distribution by integration poses some difficulties. Therefore, we resort to Gibbs sampling for the relevant computations of the posterior distribution. Gibbs sampling is a sampling technique within the Markov Chain Monte Carlo methods and is widely used in Bayesian inference. It generates approximate samples from a target distribution by iteratively sampling from conditional distributions [28].

To utilize Gibbs sampling, we first need to establish the condition that, when the data are complete and the scale parameter is known, the conditional posterior density of the shape parameter follows a Gamma distribution. Let the complete data be denoted as

{data}^{*}

.

Theorem 1.

When the data are complete and the scale parameter is known, the conditional posterior density of the shape parameter is a Gamma distribution.

Proof.

Through the likelihood function of the Lomax distribution, we can obtain the conditional posterior density of the shape parameter when the data are complete.

π (θ | data^{*}) \propto θ^{n} \prod_{i = 1}^{n} {(1 + \frac{t_{i}}{λ})}^{- θ} θ^{a - 1} e^{- b θ} = θ^{n + a - 1} \exp \{- θ [b + \sum_{i = 1}^{n} \ln (1 + \frac{t_{i}}{λ})]\} .

(38)

The right-hand side is the kernel of a Gamma density in θ. Thus, when the scale parameter is known, the conditional posterior distribution of the shape parameter is

Gamma (n + a, b + \sum_{i = 1}^{n} \ln (1 + \frac{t_{i}}{λ})) .

(39)

Therefore, the conclusion is proved. □

The steps to obtain the shape parameter using Gibbs sampling when the scale parameter is known are shown in Algorithm 2.

Algorithm 2: Gibbs sampling method for estimating shape parameter

θ

.

1: Determine the initial value of the shape parameter $θ_{0}$ . Define the number of Gibbs sampling iterations as M and the number of initial samples to be discarded as Q.
2: In the k-th iteration, the missing data in each censoring interval are replaced by their conditional expectations, and the replaced values are denoted as $t_{i}^{k\#}$ , for $i = n_{1} + 1, n_{1} + 2, \dots, n_{1} + n_{2}$ .

$\begin{matrix} t_{i}^{k\#} = E (T | T \in (L_{i}, R_{i}); λ, θ_{k - 1}) = \frac{\int_{L_{i}}^{R_{i}} T f (T; λ, θ_{k - 1}) d T}{F (R_{i}; λ, θ_{k - 1}) - F (L_{i}; λ, θ_{k - 1})} \end{matrix}$
3: Substitute the imputed data into the following conditional posterior distribution, where $n = n_{1} + n_{2}$ , and randomly draw a sample from it to obtain $θ_{k}$ .

$\begin{matrix} Gamma (n + a, b + \sum_{i = 1}^{n_{1}} \ln (1 + \frac{t_{i}}{λ}) + \sum_{i = n_{1} + 1}^{n_{1} + n_{2}} \ln (1 + \frac{t_{i}^{k\#}}{λ})) . \end{matrix}$
4: Repeat Steps 2 and 3 M times to obtain M draws $θ_{k}$ from the conditional posterior distribution. After discarding the first Q draws, the estimate of the shape parameter $θ$ under the squared error loss function is

$\begin{matrix} {\hat{θ}}_{bayes} = \frac{\sum_{k = Q + 1}^{M} θ_{k}}{M - Q} . \end{matrix}$

By following the above Gibbs sampling steps, the Bayesian estimate of the shape parameter can be obtained through programming when the scale parameter is known.
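The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: the function names are hypothetical, the conditional expectation is approximated with composite Simpson's rule rather than a closed form, the Gamma distribution is parameterized by shape and rate, and the estimate averages the draws retained after discarding the first Q.

```python
import math
import random


def lomax_pdf(t, theta, lam):
    return (theta / lam) * (1.0 + t / lam) ** (-(theta + 1.0))


def lomax_cdf(t, theta, lam):
    return 1.0 - (1.0 + t / lam) ** (-theta)


def conditional_mean(left, right, theta, lam, panels=200):
    """E(T | left < T < right), integral by composite Simpson (panels even)."""
    h = (right - left) / panels
    total = left * lomax_pdf(left, theta, lam) + right * lomax_pdf(right, theta, lam)
    for j in range(1, panels):
        t = left + j * h
        total += (4 if j % 2 else 2) * t * lomax_pdf(t, theta, lam)
    numerator = total * h / 3.0
    return numerator / (lomax_cdf(right, theta, lam) - lomax_cdf(left, theta, lam))


def gibbs_shape(exact, intervals, lam, a, b, theta0=1.0, M=1000, Q=100, rng=None):
    """Return (posterior-mean estimate, all M draws) for the shape parameter."""
    rng = rng or random.Random()
    n = len(exact) + len(intervals)
    log_sum_exact = sum(math.log1p(t / lam) for t in exact)
    theta, draws = theta0, []
    for _ in range(M):
        # Step 2: impute each censored lifetime by its conditional expectation.
        imputed = [conditional_mean(l, r, theta, lam) for l, r in intervals]
        rate = b + log_sum_exact + sum(math.log1p(t / lam) for t in imputed)
        # Step 3: draw from Gamma(n + a, rate); gammavariate expects a scale.
        theta = rng.gammavariate(n + a, 1.0 / rate)
        draws.append(theta)
    kept = draws[Q:]  # Step 4: discard the first Q draws, average the rest.
    return sum(kept) / len(kept), draws
```

For example, `gibbs_shape(exact, intervals, lam=0.7, a=2.5, b=5.0)` returns the estimate together with the full chain, which can be used to draw a trace plot.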

5. Simulation Experiments and Real Data Analysis

5.1. Analysis of Simulation Experiments

5.1.1. Analysis of the Solution Methods for MPA Estimation

Using the R program (version 4.3.2), we generate data with three different sets of parameters, where

λ

is the scale parameter and

θ

is the shape parameter. For each data point, censoring intervals are randomly generated using an exponential distribution with different parameters to determine whether the data point is censored. We then perform parameter estimation using the aforementioned methods on the middle-censored data.

The estimated parameters are compared with the true parameters used for data generation. The accuracy of the parameter estimates is evaluated using bias and mean absolute error (MAE). The relevant formulas are as follows:

\begin{matrix} L_{M A E} = \frac{1}{M} \sum_{i = 1}^{M} |λ - \hat{λ_{i}}|, L_{B i a s} = \frac{1}{M} \sum_{i = 1}^{M} (λ - \hat{λ_{i}}), \\ T_{M A E} = \frac{1}{M} \sum_{i = 1}^{M} |θ - \hat{θ_{i}}|, T_{B i a s} = \frac{1}{M} \sum_{i = 1}^{M} (θ - \hat{θ_{i}}) . \end{matrix}

(40)
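As a small illustration of Formula (40), the following Python sketch computes the bias and MAE of a list of estimates. The function name is hypothetical; the sign convention (true value minus estimate) follows the formula above.

```python
def bias_and_mae(true_value, estimates):
    """Return (bias, MAE) using the convention true value minus estimate."""
    m = len(estimates)
    bias = sum(true_value - est for est in estimates) / m
    mae = sum(abs(true_value - est) for est in estimates) / m
    return bias, mae
```

Because positive and negative deviations cancel in the bias but not in the MAE, the two metrics can differ substantially for the same set of estimates.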

We also calculate the coverage probability of the asymptotic confidence intervals for the parameter estimates to assess the reliability of these intervals in containing the true parameters. The coverage probability is defined as the proportion of intervals that include the true parameter value.

I_{A C I} (λ) = \{\begin{matrix} 1, λ \in [\hat{λ} - Z_{\frac{γ}{2}} \sqrt{\hat{W_{11}}}, \hat{λ} + Z_{\frac{γ}{2}} \sqrt{\hat{W_{11}}}] \\ 0, λ \notin [\hat{λ} - Z_{\frac{γ}{2}} \sqrt{\hat{W_{11}}}, \hat{λ} + Z_{\frac{γ}{2}} \sqrt{\hat{W_{11}}}] \end{matrix},

(41)

I_{A C I} (θ) = \{\begin{matrix} 1, θ \in [\hat{θ} - Z_{\frac{γ}{2}} \sqrt{\hat{W_{22}}}, \hat{θ} + Z_{\frac{γ}{2}} \sqrt{\hat{W_{22}}}] \\ 0, θ \notin [\hat{θ} - Z_{\frac{γ}{2}} \sqrt{\hat{W_{22}}}, \hat{θ} + Z_{\frac{γ}{2}} \sqrt{\hat{W_{22}}}] \end{matrix} .

(42)

A C_{θ} = E (I_{A C I} (θ)), A C_{λ} = E (I_{A C I} (λ)) .

(43)

In subsequent simulations and experiments with real data, the significance level

γ

of the confidence interval was set to 0.05.
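The coverage computation in Formulas (41) to (43) can be sketched as follows. This is a minimal sketch with hypothetical function names; the `variance` argument stands in for the estimated variance $\hat{W_{11}}$ or $\hat{W_{22}}$, which is assumed to have been computed elsewhere, and the expectation in Formula (43) is approximated by the average of the indicators over the simulation runs.

```python
from statistics import NormalDist


def aci_covers(true_value, estimate, variance, gamma=0.05):
    """Indicator that the asymptotic confidence interval contains the true value."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)  # upper gamma/2 normal quantile
    half_width = z * variance ** 0.5
    return estimate - half_width <= true_value <= estimate + half_width


def coverage_rate(true_value, estimates, variances, gamma=0.05):
    """Proportion of intervals that contain the true parameter value."""
    hits = [aci_covers(true_value, e, v, gamma) for e, v in zip(estimates, variances)]
    return sum(hits) / len(hits)
```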

We tested the selection of initial values in the simulation experiments and found that it does not affect the convergence of the three estimation methods, which still provide good estimation results. For the fixed-point estimation method, the initial value of the scale parameter is drawn from an exponential distribution with a mean of 1. For the Newton–Raphson algorithm, the initial values of the shape parameter and the scale parameter are each drawn from an exponential distribution with a mean of 1.

During estimation, a randomly generated middle-censored data set may be poorly behaved, which can compromise the convergence of the algorithm and produce a large discrepancy between the estimated parameters and the true parameters. This can distort the assessment of the accuracy of an estimation method. Therefore, the following outlier removal scheme is applied to the parameter estimates from the repeated simulations: if an estimate falls below

P_{1} - 1.5 \times IQR

or above

P_{2} + 1.5 \times IQR

, it is considered an outlier and is removed before assessing the accuracy of the estimation results.

P_{1}

represents the first quartile of the estimated parameter results from all iterations,

P_{2}

represents the third quartile of the estimated parameter results from all iterations, and

IQR = P_{2} - P_{1}

. In the simulation results tables, OR denotes the outlier ratio, that is, the proportion of simulation runs whose estimates are flagged as outliers out of the total number of runs.
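The outlier removal scheme and the outlier ratio can be sketched as follows. The function name is hypothetical, and the quartiles are computed with the default method of Python's `statistics.quantiles`, which may differ slightly from the quantile definition used in the original R computations.

```python
from statistics import quantiles


def remove_outliers(estimates):
    """Drop estimates outside [P1 - 1.5*IQR, P2 + 1.5*IQR]; return (kept, OR)."""
    p1, _, p2 = quantiles(estimates, n=4)  # first quartile, median, third quartile
    iqr = p2 - p1
    low, high = p1 - 1.5 * iqr, p2 + 1.5 * iqr
    kept = [e for e in estimates if low <= e <= high]
    outlier_ratio = 1.0 - len(kept) / len(estimates)
    return kept, outlier_ratio
```

Bias, MAE, and coverage would then be computed on `kept` only, with `outlier_ratio` reported alongside them.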

By comparing the Newton–Raphson iterative method and the fixed-point estimation method, we obtain the accuracy and efficiency of the two methods under different parameter combinations. The results are shown in Table 1 and Table 2.

With the legend given in Figure 4, Figures 5–10 present line charts, drawn from Table 1 and Table 2, of the Newton–Raphson iteration method and the fixed-point estimation method in MPA estimation. These charts allow a more intuitive horizontal comparison of the bias, mean absolute error, and confidence interval coverage rate of the two methods, as well as a vertical comparison of how estimation accuracy changes with sample size, so that the strengths and weaknesses of the two methods can be evaluated comprehensively.

Considering that parameter estimation in real scenarios requires not only the accuracy of computational results but also computational efficiency, we introduce a new metric: computation time per cycle. The computation time per cycle is obtained by dividing the total calculation time by the total number of simulation cycles H. This metric is used to compare the computational efficiency of different methods, and the corresponding formula is Formula (44).

Computation time per cycle = \frac{Total cycle time}{H} .

(44)

The computational efficiency of the two estimation methods under different parameter combinations and data scales is shown in Table 3. Here, FP represents the computation time per cycle of the fixed-point estimation method, and NR represents the computation time per cycle of the Newton–Raphson iteration method.

In this simulation, the generated data were then censored. The endpoints of the censoring intervals follow exponential distributions with parameters 0.5 and 1, respectively. Censoring was applied to each data point: if the data point fell within the censoring interval, it was replaced by the interval; otherwise, it was observed exactly. The number of estimation repetitions, H, was set to 1000. The initial shape and scale parameters for the first iteration followed exponential distributions with parameters 0.5 and 1, respectively.
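The data generation and censoring scheme can be sketched in Python as follows. Several details here are assumptions made for illustration, since the text does not fix them: the two exponential parameters are treated as rates, the first governs the left endpoint and the second the interval length (so that the right endpoint always exceeds the left), and lifetimes are drawn by inverting the Lomax distribution function. The function name is hypothetical.

```python
import random


def simulate_middle_censored(n, theta, lam, rate_left=0.5, rate_len=1.0, rng=None):
    """Return (exact lifetimes, censoring intervals) for n Lomax lifetimes."""
    rng = rng or random.Random()
    exact, intervals = [], []
    for _ in range(n):
        # Inverse-CDF draw from F(t) = 1 - (1 + t / lam)^(-theta).
        u = rng.random()
        t = lam * ((1.0 - u) ** (-1.0 / theta) - 1.0)
        # Assumed censoring mechanism: exponential left endpoint and length.
        left = rng.expovariate(rate_left)
        right = left + rng.expovariate(rate_len)
        if left < t < right:
            intervals.append((left, right))  # lifetime replaced by the interval
        else:
            exact.append(t)  # lifetime observed exactly
    return exact, intervals
```

Repeating this generation H times and applying an estimation method to each data set yields the collection of estimates from which the bias, MAE, coverage, and outlier ratio are computed.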

Comparing the results of the MPA parameter estimation using the fixed-point estimation method and the Newton–Raphson iteration method, we can draw the following conclusions:

Computational Efficiency: The Newton–Raphson iteration method optimizes the two parameters by iterating the negative log-likelihood function, yielding the estimation results. The fixed-point estimation method, by incorporating the relationship between the two parameters of the Lomax distribution, simplifies the optimization process. In terms of computational efficiency, the fixed-point method is relatively more efficient.
Point Estimation Results:
(1) As the sample size increases, the bias and mean absolute error of both estimation methods decrease correspondingly (for both the shape and scale parameters). Once the sample size reaches a certain level, the error reduction slows down. In practical applications, therefore, a moderate number of sample points can yield satisfactory estimation results, whereas sampling too many points increases the workload while providing limited improvement in accuracy.
(2) Across the three sets of parameters, the Newton–Raphson iteration method overall outperforms the fixed-point method for MPA estimation, and its advantage is more pronounced in the estimation of the shape parameter $θ$ .
(3) The scale parameter $λ$ is estimated less accurately than the shape parameter $θ$ .
Interval Estimation Results:
(1) The coverage rate of the confidence intervals for the two parameters in the Newton–Raphson iteration method increases with the sample size. The coverage rate of the confidence interval for the shape parameter is particularly satisfactory. For the fixed-point estimation method, however, the coverage rate sometimes decreases as the amount of data increases; we speculate that this interval estimation is not well suited to the fixed-point estimation method.
(2) The Newton–Raphson iteration method generally yields better confidence interval coverage rates than the fixed-point estimation method.

In addition, we also notice that the proportion of outliers decreases when the data volume is large. This indicates that increasing the data volume can enhance the convergence of the estimation results in the simulation experiments and reduce the occurrence of outliers.

Considering the comparison results of the two MPA estimation methods mentioned above, the method with better overall performance, namely the Newton–Raphson solution, is selected for further comparison with the EM algorithm in the following text.

5.1.2. Comparison of the EM Algorithm and MPA Estimation

In this simulation data estimation, the same data generation and censoring schemes are employed, and the same outlier removal method is applied to the results, thus obtaining the relevant results in Table 4 and Table 5.

In the EM algorithm experiment, the initial values of the shape parameter are drawn from an exponential distribution with a mean of 2, and the initial values of the scale parameter are drawn from an exponential distribution with a mean of 1.

Similarly, for intuitive comparison, Figures 12–17 present line charts drawn from these tables, with Figure 11 serving as their legend.

From the tables, we can draw the following conclusions:

Computational Efficiency: As can be seen from Table 6, the Newton–Raphson solution for MPA estimation runs faster than the EM algorithm. This is because MPA does not require iteration over the samples, whereas the EM algorithm needs to perform sample iteration and integration operations.
Point Estimation Results:
(1) Both the maximum likelihood estimates obtained by the EM algorithm and the MPA estimates obtained by the Newton–Raphson method show significant bias when the sample size is small, but achieve higher accuracy when the sample size is large.
(2) In terms of estimating the shape parameter, the two estimation methods have similar estimation errors overall. However, when estimating the scale parameter, the EM algorithm provides more accurate results with smaller errors.
Interval Estimation Results:
(1) In terms of confidence interval coverage, the coverage for the scale parameter $λ$ under both algorithms is inferior to the coverage for the shape parameter $θ$ . We speculate that the nature of the scale parameter causes this phenomenon.
(2) As the amount of data increases, the confidence interval coverage of both methods increases.

Similarly, we can see from the results that the proportion of outliers decreases when the data volume is large.

5.1.3. Bayesian Estimation Results for the Shape Parameter

Before using Gibbs sampling for estimation, the convergence of the sampling results should first be verified. We selected the parameter combination

(θ, λ) = (0.7, 0.7)

and plotted the trace plots of the sampling results when the sample sizes were 1000 and 1500, respectively.

From Figure 18 and Figure 19, we can observe that the sampling results have stabilized, with relatively fixed ranges of fluctuation. Most of the sampled values (black line) fluctuate around the true parameter value used to generate the data (red line). Moreover, the range of fluctuation becomes narrower when the sample size is larger, indicating that the chain has converged. Tests with other parameter combinations gave the same results. These results support the reliability of the Bayesian estimation of the Lomax shape parameter under middle-censoring using Gibbs sampling.

After numerically verifying the convergence of the Gibbs sampling estimates, we also experimented with the choice of hyperparameters for the prior distribution. The results show that different hyperparameter choices have very little impact on the estimation error. A likely reason is that the sampled values are updated continuously during sampling and gradually converge around the correct estimate, while the early values most influenced by the initial settings are discarded and thus do not significantly affect the results. In the subsequent simulation experiments, we evaluated the accuracy and confidence interval coverage of the Bayesian estimate of the shape parameter

θ

under different numbers of Gibbs sampling iterations.

In the simulation experiments, the number of periods N is set to 1000. The trace plots show that the samples converge quickly. Therefore, we set the number of initial samples to be discarded, Q, to 100 to ensure both the accuracy of the estimation results and the efficiency of the computation. To compare the impact of the number of samples on the estimation results, we conducted experiments with 200, 500, and 1000 samples. The hyperparameters in the prior distribution are chosen as

a = 2.5

and

b = 5

.

From Table 7, it can be concluded that the Bayesian estimation of the shape parameter, under the condition of a known scale parameter, achieves good estimation results. As the sample size increases, the accuracy improves significantly. When the number of initial samples discarded is 100, drawing 500 versus 1000 samples does not significantly affect the accuracy of the results. Therefore, an appropriate number of samples can yield good estimation results while saving computation.

5.2. Real Data Analysis

When testing the estimation methods on the following two real data sets, we used the same scheme for setting the initial iteration values and the same hyperparameters as in the simulation experiments.

5.2.1. COVID-19 Datasets

According to the daily COVID-19 death counts divided by the number of new cases reported in [15], we obtain a set of real data samples that conform to the Lomax lifetime distribution.

Using censoring intervals whose endpoints follow exponential distributions with parameters 10 and 30, we perform middle-censoring for each data point and obtain the data shown in Table 8.

Estimation results for complete data:

θ = 8.9602

,

λ = 1.1554

. The estimation results for the censored data are shown in Table 9 and Table 10 and are in line with theoretical expectations.

5.2.2. Global Extreme Weather and Climate Disaster Event Loss Datasets

The Global Extreme Weather and Climate Disaster Loss Dataset for 2000–2021 [29] covers data related to extreme weather and climate disaster events worldwide during the period from 2000 to 2021. We selected the number of fatalities caused by these disasters from 2017 to 2021 as the subject of our analysis. After detailed data processing and statistical analysis, we found that the data essentially conform to the Lomax distribution.

Using censoring intervals whose endpoints follow exponential distributions with parameters 0.002 and 0.001, we perform middle-censoring for each data point and obtain the data shown in Table 11.

Estimation results for complete data:

θ = 2.9346

,

λ = 653.5948

. The estimation results for the censored data are shown in Table 12 and Table 13. In this practical application, the advantages of the EM algorithm are better demonstrated.

6. Summary and Future Prospects for Parameter Estimation of Lomax Distribution Under Middle-Censoring Data

In the field of Lomax parameter estimation with middle-censored data, this paper makes the following contributions:

We present the details of the EM algorithm for estimating the two unknown parameters of the Lomax distribution from middle-censored data and verify the results by simulation experiments.
We propose two computational methods for parameter estimation after processing data using the Midpoint Approximation Method (MPA): the Newton–Raphson iteration method and the fixed-point method.
We calculate the ACIs for the two parameters under middle-censored data and take the coverage probability of the confidence interval as one of the criteria for evaluating estimation performance. The effectiveness of each estimation method is tested, and the applicability of the asymptotic confidence interval under middle-censored data is also evaluated.
In the Bayesian estimation framework, we specifically estimate the shape parameter under middle-censoring using the Gamma prior distribution and the Gibbs sampling method, achieving satisfactory estimation results.
Through experiments with both simulated and real data, we validate the effectiveness and applicability of the proposed methods.

Based on the methods and achievements presented in this paper, the following future prospects are proposed, along with potential solutions and a ranking according to their importance:

(1): This paper only experiments with data where the middle-censoring endpoints follow an exponential distribution, without discussing the impact of different distributions of the censoring interval endpoints on estimation, such as the Weibull distribution, uniform distribution, etc. Most of the current research on middle-censoring focuses on endpoints that follow an exponential distribution. Not only for the Lomax distribution, but also for other distributions, the study of different distributions of the endpoints of middle-censoring intervals holds research value.
(2): In Bayesian estimation, we only discuss the estimation of the shape parameter $θ$ when the scale parameter $λ$ is known. However, in practice, it is rare to encounter situations where only one parameter needs to be estimated. Therefore, the Bayesian estimation of both parameters is an important issue that needs to be addressed in the future. Additionally, our Bayesian estimation only considers the Gamma distribution as the prior distribution, without discussing other types of prior distributions, such as non-informative priors or Jeffreys priors.
(3): In simulation experiments, the estimation of confidence intervals is affected by the randomness of generating and censoring data, leading to situations where confidence intervals cannot be calculated. Therefore, the calculation of confidence interval coverage is restricted to generated middle-censored data sets for which asymptotic confidence intervals can be calculated. Another problem is that the asymptotic confidence interval coverage of the scale parameter in simulation experiments is not high, which may be related to the nature of the parameter and to sample sizes that are too small for the asymptotic approximation. To address these interval estimation issues, Monte Carlo simulation can be considered for estimating confidence intervals in the future, or modified asymptotic confidence intervals can be adopted.
(4): Regarding the existence and uniqueness of the maximum likelihood estimation solution for the parameters of the Lomax distribution under middle-censored data, this paper does not provide theoretical proof due to the complexity of the formulas. Future research needs to continue exploring the parameter estimation problems for related distributions under middle-censored data to further improve research in this area.
(5): The EM algorithm has achieved good results in parameter estimation for the Lomax distribution under middle-censored data. However, during its operation, the issue of long computational time has arisen. In future research, the EM algorithm can be optimized through methods such as optimizing initial parameters, changing convergence criteria, and parallel computing. This will increase its computational speed while ensuring its accuracy.

Author Contributions

Conceptualization: P.R. and W.G.; Methodology: P.R. and W.G.; Software: P.R. and S.L.; Investigation: P.R. and S.L.; Writing—Original Draft: P.R.; Writing—Review and Editing: W.G.; Supervision: W.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Project 202510004172 which was supported by National Training Program of Innovation and Entrepreneurship for Undergraduates. Wenhao’s work was partially supported by the Science and Technology Research and Development Project of China State Railway Group Company, Ltd. (No. N2023Z020).

Data Availability Statement

The data presented in this study are openly available in [15,29].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Theorem A1.

The Gamma distribution is a conjugate prior distribution for the shape parameter of the Lomax lifetime distribution.

Proof.

Suppose the sample data is denoted as data, consisting of n sample data points, which are

t_{1}, t_{2}, \dots, t_{n}

, respectively.

The prior distribution of the shape parameter is the Gamma distribution with parameters a and b, given by

π (θ) = \frac{b^{a}}{Γ (a)} θ^{a - 1} e^{- b θ} .

(A1)

Denote the experimental data as data; then, the likelihood function of data can be represented as

p (data ∣ θ)

, and the joint distribution of data and

θ

is

h (data, θ)

. By integrating

h (data, θ)

over the parameter space $Θ$ , we can obtain the marginal density of data, denoted as

m (data)

.

p (d a t a | θ) = \frac{θ^{n}}{λ^{n}} \prod_{i = 1}^{n} {(1 + \frac{t_{i}}{λ})}^{- (θ + 1)}

(A2)

\begin{matrix} h (d a t a, θ) = p (d a t a | θ) π (θ) = \frac{θ^{n}}{λ^{n}} \prod_{i = 1}^{n} {(1 + \frac{t_{i}}{λ})}^{- (θ + 1)} \frac{b^{a}}{Γ (a)} θ^{a - 1} e^{- b θ} . \end{matrix}

(A3)

The posterior distribution of the shape parameters obtained by computational simplification is as follows:

π (θ | d a t a) \propto θ^{n} \prod_{i = 1}^{n} {(1 + \frac{t_{i}}{λ})}^{- θ} θ^{a - 1} e^{- b θ} .

(A4)

Thus, the conditional posterior distribution of the shape parameter is

Gamma (n + a, b + \sum_{i = 1}^{n} \ln (1 + \frac{t_{i}}{λ})) .

(A5)

In summary, when the Gamma distribution is selected as the prior distribution of the shape parameter, the resulting posterior distribution is also a Gamma distribution. Therefore, the Gamma distribution is a conjugate prior distribution for the Lomax shape parameter.

□

References

1. Leiderman, P.H.; Babu, B.; Kagia, J.; Kraemer, H.C.; Leiderman, G.F. African Infant Precocity and Some Social Influences during the First Year. Nature 1973, 242, 247–249.
2. Groeneboom, P.; Wellner, J.A. Information Bounds and Nonparametric Maximum Likelihood Estimation; Birkhäuser: Basel, Switzerland, 1992; Volume 19.
3. Schick, A.; Yu, Q. Consistency of the GMLE with Mixed Case Interval-Censored Data. Scand. J. Stat. 2000, 27, 45–55.
4. Huang, J. Asymptotic properties of nonparametric estimation based on partly interval-censored data. Stat. Sin. 1999, 9, 501–520.
5. Jammalamadaka, S.R.; Mangalam, V. Nonparametric estimation for middle-censored data. J. Nonparametr. Stat. 2003, 15, 253–265.
6. Jammalamadaka, S.R.; Iyer, S.K. Approximate self-consistency for middle-censored data. J. Stat. Plan. Inference 2004, 124, 75–86.
7. Yu, Q.Q.; Wong, G.Y.C.; Li, L.X. Asymptotic properties of self-consistent estimators with mixed interval-censored data. Ann. Inst. Stat. Math. 2001, 53, 469–486.
8. Iyer, S.K.; Jammalamadaka, S.R.; Kundu, D. Analysis of middle-censored data with exponential lifetime distributions. J. Stat. Plan. Inference 2008, 138, 3550–3560.
9. Bennett, N.A. Some Contributions to Middle-Censoring. Ph.D. Thesis, University of California, Santa Barbara, CA, USA, 2011; p. 3473719. Available online: https://www.proquest.com/dissertations-theses/some-contributions-middle-censoring/docview/896352567/se-2 (accessed on 1 June 2011).
10. Yan, W.; Yimin, S.; Min, W. Statistical inference for dependence competing risks model under middle censoring. J. Syst. Eng. Electron. 2019, 30, 209–222.
11. Jammalamadaka, S.R.; Leong, E. Analysis of discrete lifetime data under middle-censoring and in the presence of covariates. J. Appl. Stat. 2015, 42, 905–913.
12. Lomax, K.S. Business failure: Another example of the analysis of the failure data. J. Am. Stat. Assoc. 1954, 49, 847–852.
13. Lun, Z.; Khattree, R. An overlooked Lomax-exponential connection. Commun. Stat.–Theory Methods 2022, 51, 26–28.
14. Aljohani, H.M. Estimation for the P(X > Y) of Lomax distribution under accelerated life tests. Heliyon 2024, 10, e25802.
15. Alsuhabi, H.; Alkhairy, I.; Almetwally, E.M.; Almongy, H.M.; Gemeay, A.M.; Hafez, E.H.; Aldallal, R.A.; Sabry, M. A superior extension for the Lomax distribution with application to COVID-19 infections real data. Alex. Eng. J. 2022, 61, 11077–11090.
16. Cramer, E.; Schmiedt, A.B. Progressively Type-II censored competing risks data from Lomax distributions. Comput. Stat. Data Anal. 2011, 55, 1285–1303.
17. Alfaer, N.M.; Aljohani, H.M. Balanced joint progressively hybrid Type-I censoring samples in estimating the lifetime Lomax distributions. Complexity 2021, 2021, 9929691.
18. El-Sherpieny, E.S.A.; Almetwally, E.M.; Muhammed, H.Z. Progressive Type-II hybrid censored schemes based on maximum product spacing with application to Power Lomax distribution. Phys. A Stat. Mech. Its Appl. 2020, 553, 124251.
19. Hasaballah, M.M.; Balogun, O.S.; Bakr, M.E. Non-Bayesian and Bayesian estimation for Lomax distribution under randomly censored with application. AIP Adv. 2024, 14, 025318.
20. Davarzani, N.; Parsian, A. Statistical inference for discrete middle-censored data. J. Stat. Plan. Inference 2011, 141, 1455–1462.
21. Giles, D.E.A.; Feng, H.; Godwin, R.T. On the Bias of the Maximum Likelihood Estimator for the Two-Parameter Lomax Distribution. Commun. Stat.–Theory Methods 2013, 42, 1934–1950.
22. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
23. Wu, C.F.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103.
24. Asl, M.N.; Belaghi, R.A.; Bevrani, H. Classical and Bayesian inferential approaches using Lomax model under progressively Type-I hybrid censoring. J. Comput. Appl. Math. 2018, 343, 397–412.
25. Turkson, A.; Ayiah-Mensah, F.; Nimoh, V. Handling censoring and censored data in survival analysis: A standalone systematic literature review. Int. J. Math. Math. Sci. 2021, 2021, 9307475.
26. Buzaridah, M.M.; Ramadan, D.A.; El-Desouky, B.S. Estimation of some lifetime parameters of flexible reduced logarithmic-inverse Lomax distribution under progressive Type-II censored data. J. Math. 2022, 2022, 1690458.
27. Okasha, H.; Lio, Y.; Albassam, M. On reliability estimation of Lomax distribution under adaptive Type-I progressive hybrid censoring scheme. Mathematics 2021, 9, 2903.
28. Kundu, D. Bayesian inference and life testing plan for the Weibull distribution in presence of progressive censoring. Technometrics 2008, 50, 144–154.
29. Yang, S.N. Global Extreme Weather and Climate Disaster Loss Dataset (2000–2021). 2023. Available online: https://data.casearth.cn/dataset/65388536819aec0f26f4c93a (accessed on 27 May 2023).

Figure 1. Research about the Lomax lifetime distribution and middle-censoring.

Figure 2. Density curves of Lomax distribution with different parameters.

Figure 3. Example of middle-censored data.

Figure 4. The legend of the estimated result line chart in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10.

Figure 5. Estimation results of the shape parameter

θ

under the parameter

(0.7, 0.7)