1. Introduction
The phase-type aging model (PTAM) belongs to a class of Coxian Markovian models that were introduced in
Cheng et al. (
2021). The purpose of the PTAM is to provide a quantitative description of the effects of well-known aging characteristics resulting from a genetically determined, progressive, and irreversible process. It provides a means of quantifying the heterogeneity in aging among individuals and identifying the effects of anti-selection (
Cheng 2021;
Cheng et al. 2021).
The PTAM has a unique structure, including a constant transition rate for the aging process and a functional form for the relationship between aging and death (
Cheng et al. 2021). This structure gives rise to flat profile likelihood functions, putting the reliability of parameter estimates into question even if MLEs can be obtained (
Cheng 2021). This problem, which is referred to as the non-estimability issue, was studied in
Raue et al. (
2009). In this context, many estimated values of
m, the number of states, produced nearly the same profile likelihood values, resulting in similar model fitting. Actually, for certain statistical models such as the PTAM, the quality of model fitting is not the only consideration since the model parameters convey biological meaning. If the parameter estimates do not make sense in this context, the research problem cannot be satisfactorily addressed, even if the model fitting quality is found to be adequate. Accordingly, we investigated the problem of flat likelihood functions in the literature, focusing on the field of estimability, also referred to as practical identifiability.
Primarily, estimability must be assessed after identifiability, which can be defined as follows according to
Lehmann and Casella (
1998):
Definition 1. Let 𝓜 = {F(·; θ) : θ ∈ Θ} be a statistical model with either a finite- or infinite-dimensional parameter space Θ. We say that 𝓜 is identifiable if F(x; θ₁) = F(x; θ₂) for all x implies that θ₁ = θ₂.
If a model is assessed to be non-identifiable, then it proves unnecessary to consider its estimability (
Miao et al. 2011). The non-identifiability issue relates to properties of the model structure and can only be removed analytically by adding constraints to the model parameters or proposing new model representations (
Hengl et al. 2007;
Raue et al. 2009). On the other hand, the non-estimability issue relates to experimental protocols involving data quality (such as being insufficient or too noisy), algorithm approximation, or other noisy measurements. Thus, one must eliminate the non-identifiability issue before assessing estimability, in which case the remaining unreliability of the parameter estimates can be ascribed to experimental protocols (
Miao et al. 2011). It has been established that the PTAM is identifiable when the number of states is greater than or equal to six (
Nie 2022).
We now highlight this paper’s main objectives and contributions with respect to the concept of estimability.
1.1. Defining the Concept of Estimability via an Objective Threshold
Estimability has largely remained an open problem since existing methods for assessing it require thresholds whose specification is left to the experimenters' subjective judgment. To the best of our knowledge,
Gontier and Pfister (
2020) seems to be the only paper that investigated and tackled the issue of threshold specification. They proposed a new definition of estimability that is based on a model selection perspective, where the subjective threshold is replaced with Bayes factors. The definition advocated in this paper addresses the issue from a parallel but distinct perspective: rather than eliminating the threshold, we tie it to the experiment itself, thereby making it an objective quantity.
The proposed definition of estimability bears a resemblance to the profile likelihood function method introduced in
Raue et al. (
2009), where the non-estimability issue is defined as occurring when the profile-likelihood-based confidence region is infinite up to a subjective threshold. In order to make the threshold experiment-based and, therefore, objective, one needs to develop a methodology that relates the confidence region to the experimental protocol. The proposed definition achieves this by replacing the profile-likelihood-based confidence region with an innovative confidence region that is based on a carefully designed c.d.f. sensitivity measure, whose purpose is to relate the confidence region to the experimental error by quantifying each as a single number under the same measure. This enables one to compare the confidence region and the experimental error indirectly. The threshold is thereby tailored to the experimental protocol and becomes objective. Since the measure is solely experiment-based, it quantifies the degree of estimability objectively, so that the experimenter knows quantitatively the extent to which the experimental design should be improved in order to make the model estimable. The key to our approach is the empirical c.d.f., or ECDF; this is also the case for the Anderson–Darling and Kolmogorov–Smirnov tests, which compare a hypothesized distribution with the ECDF. The ECDF constitutes an objective measure as it represents the distribution of the experimental data.
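To illustrate the role played by the ECDF, the following Python sketch computes a Kolmogorov–Smirnov-type sup-distance between a hypothesized c.d.f. and the ECDF of a sample; `ks_distance` is a hypothetical helper, not a function from the cited works.

```python
import math
import random

def ks_distance(F, sample):
    """sup_x |F(x) - F_n(x)|: it suffices to scan the ECDF's jump
    points, checking both sides of each step of height 1/n."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        d = max(d, abs(F(x) - i / n), abs(F(x) - (i - 1) / n))
    return d

random.seed(42)
data = [random.expovariate(1.0) for _ in range(500)]
F = lambda x: 1.0 - math.exp(-x)   # hypothesized Exp(1) c.d.f.
print(ks_distance(F, data))        # small when the model matches the data
```

A small value indicates that the hypothesized distribution is close to the distribution of the experimental data, which is the sense in which the ECDF acts as an objective yardstick.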
1.2. Extending the Applicability of the Concept of Estimability to Statistical Inference
The concept of estimability originated in the field of biology, where an ODE model was utilized to model dynamic biological systems. It is also interchangeably referred to as “practical identifiability” in the literature. However, very little research on estimability has been conducted in connection with statistical models.
Gontier and Pfister (
2020) applied the concept of estimability to a statistical model. However, the scope of that paper still remained in systems biology, as a binomial model was introduced in connection with synaptic transmissions. We address the applicability of the concept of estimability in statistical inference from an objective perspective. Although this is principally illustrated via an application involving the PTAM, the proposed approach is applicable in other statistical inference contexts, as illustrated by means of two additional numerical examples, one involving a discrete distribution and the other, a continuous one. This constitutes the second main contribution of this paper.
1.3. Structure of the Paper
This paper is organized as follows. Preliminaries on the PTAM are introduced in
Section 2.
Section 3 provides a literature review on estimability.
Section 4 introduces the proposed definition of estimability, which is validated in
Section 5, where it is established to be innately sound. This definition is implemented to assess the estimability of the PTAM via a simulation study in
Section 6.
Section 7 contains certain remarks and a conclusion.
2. The Phase-Type Aging Model
The phase-type aging model (PTAM) stems from the mortality model introduced by
Lin and Liu (
2007). Phase-type mortality models enable one to link their parameters to biological and physiological mechanisms of aging, which constitutes a definite advantage. Thus, for instance, the longevity risk facing annuity products can be measured more accurately. Experimental results show that the phase-type mortality model with a four-state developmental period and a subsequent aging period achieved very satisfactory fitting results with respect to the Swedish and USA cohort mortality data sets (
Lin and Liu 2007). Later on,
Su and Sherris (
2012) applied the phase-type mortality model to an Australian cohort mortality data set.
In a further study of the PTAM,
Cheng et al. (
2021) introduced a parsimonious yet flexible representation that allowed for the modeling of various aging patterns. In addition, an efficient algorithm for evaluating the likelihood of the PTAM was developed in
Cheng (
2023b). The main objective of the PTAM is to describe the human aging process in terms of the evolution of the distribution of physiological ages while utilizing mortality rates as aging-related variables. Thus, although the PTAM can reproduce mortality patterns, it ought not to be treated as a mortality model. In this context, the PTAM is most applicable at human ages beyond the attainment of adulthood, where, relatively speaking, the aging process is the most significant factor that contributes to the variability in lifetimes (
Cheng et al. 2021).
2.1. Preliminaries
Definition 2. Let {X(t), t ≥ 0} be a continuous-time Markov chain (CTMC) defined on a finite state space {0, 1, …, m}, where 0 is the absorbing state and {1, …, m} is the set of transient states. Let X(0) have initial distribution π = (π₁, …, π_m) over the transient states such that π𝟙 = 1, and let the transition intensity matrix be

Q = [[S, s], [0ᵀ, 0]],

where S is an m × m matrix, 0ᵀ is the transpose of a null vector, and s = −S𝟙, with 𝟙 denoting a vector of ones. Define T = inf{t ≥ 0 : X(t) = 0} as the time until absorption. Then, T is said to follow a continuous phase-type (CPH) distribution, denoted by CPH(π, S) of order m, and s is defined as the exit vector.

Result 1. Given T ∼ CPH(π, S) of order m, the c.d.f. of T is F(t) = 1 − π exp(S t) 𝟙, t ≥ 0.
It is well known that, given a CPH distribution of order m, if π = (1, 0, …, 0) and S has an upper-bidiagonal structure, with diagonal entries −(λᵢ + qᵢ) for i = 1, …, m − 1 and −q_m in the last state, and superdiagonal entries λᵢ, where λᵢ > 0 denotes the rate of transition from state i to state i + 1 and qᵢ ≥ 0 denotes the exit rate to the absorbing state, then the resulting distribution constitutes a Coxian distribution with no probability mass at zero (Cox 1955a, 1955b). A phase diagram such as that displayed in
Figure 1 illustrates the process.
Definition 3. Given that , the PTAM of order m is defined as a Coxian distribution of order m with transition intensity matrix and exit rate vector such thatwhere , , andwhere . This is denoted by . As can be seen from
Figure 2, the PTAM has a phase diagram similar to that of the Coxian distribution shown in
Figure 1 except that the transition rates are constant and the exit rates are functionally related, as specified in (
2).
Note the following:
- (i)
In
Figure 2, each state in the Markov process represents the physiological age—a variable that reflects an individual’s health condition or frailty level. As the aging process proceeds, the frailty level increases until the last state, where the individual’s health conditions have deteriorated to the point of causing death.
- (ii)
The transition rate is assumed to be constant. The exit rates are the dying rates, or forces of mortality. With this setup, an individual is randomly located in a certain state at a given calendar age. This mathematically describes the fact that individuals will have different physiological ages at the same calendar age.
- (iii)
The assumption that dying rates have the structure given in (
2) is somewhat reminiscent of the well-known Box–Cox transformation introduced by
Box and Cox (
1964). The first and last dying rates
and
are included in the model parameters, whereas the remaining in-between rates are interpolated in terms of the parameter
s, which is a model parameter related to the curvature of the exit rate pattern. To verify this,
Figure 3 and
Figure 4 present the effect of
s on the pattern of the exit rates for
and
, respectively. When
, the dying rates have a linear relationship. When
, the rates are concave, and when
, the rates are convex. In particular, when
, the rates behave exponentially. In practice, it is likely that
s is less than one when calibrating to mortality data (
Cheng et al. 2021). That is, the dying rates increase faster than linearly as an individual ages. Throughout this paper, it is assumed that
follows the structure given in (
2) for
.
- (iv)
The value of
needs to be commensurate with the value of m; otherwise, there would be no need for a large number of states m when
is small. Accordingly, we let their ratio be a constant; that is,
which can be seen as a reparameterization of the PTAM that involves the five parameters
. We note that such a reparameterization establishes a positive covariance between
and
m, which is more in line with the biological interpretation. From now on, this parameterization of the PTAM is utilized.
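To make the structure of the PTAM concrete, the following Python sketch assembles the sub-intensity matrix of a Coxian model with a constant transition rate and interpolated exit rates. The Box–Cox-type interpolation in `ptam_exit_rates` is an assumption patterned on the description above (linear when s = 1, exponential in the limit s → 0), not necessarily the exact expression (2) of Cheng et al. (2021).

```python
def ptam_exit_rates(m, q1, qm, s):
    """Interpolate m exit rates between q1 (first state) and qm (last state).

    ASSUMED Box-Cox-type form: for s != 0,
        q_i = (q1**s + (i-1)/(m-1) * (qm**s - q1**s))**(1/s),
    with the s -> 0 limit giving geometric (exponential) interpolation."""
    rates = []
    for i in range(1, m + 1):
        w = (i - 1) / (m - 1)
        if abs(s) > 1e-12:
            rates.append((q1 ** s + w * (qm ** s - q1 ** s)) ** (1.0 / s))
        else:
            rates.append(q1 ** (1 - w) * qm ** w)
    return rates

def ptam_generator(m, lam, q1, qm, s):
    """Sub-intensity matrix S of the Coxian PTAM: constant transition rate
    lam from each state to the next, state-dependent exit rates q_i."""
    q = ptam_exit_rates(m, lam, q1, qm) if False else ptam_exit_rates(m, q1, qm, s)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        S[i][i] = -(lam + q[i]) if i < m - 1 else -q[i]
        if i < m - 1:
            S[i][i + 1] = lam
    return S
```

With s = 1 the sketch produces linearly increasing exit rates, and each row of S sums to the negative of the corresponding exit rate, consistent with the exit vector of Definition 2.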
2.2. Identifiability of the PTAM
The identifiability of the PTAM was established in
Nie (
2022) for
, where
m denotes the number of states. Illustrative examples of non-identifiable PTAM were also provided for
to clarify the concept. The identifiability, or mathematical uniqueness, guarantees that no other c.d.f.-equivalent representations of the model exist and that the likelihood function has a unique global maximum.
However, identifiability does not imply estimability. Although the model representation is unique, parameter estimates can still be unreliable when the profile likelihood functions are extremely flat, as a large range of different estimates can produce nearly identical likelihood values (
Raue et al. 2009). According to
Cheng (
2021), the profile likelihood functions of the PTAM’s parameters
,
, and
s turn out to be flat, which gives rise to this estimability issue.
3. Literature Review
3.1. Methods for Assessing Estimability
Estimability is also referred to as practical identifiability in the literature. The non-estimability issue arises in the case of flat likelihood functions (
Raue et al. 2009) or, equivalently, the insensitivity of the model c.d.f. with respect to its parameters (
McLean and McAuley 2012), which may be due to either of the following situations:
- (i)
The model c.d.f. is insensitive with respect to parameter changes. Accordingly, this aspect involves model sensitivity.
- (ii)
The effect of one parameter on the c.d.f. may be offset by that of one of the other parameters, this being defined as parameter correlation.
If the model is identifiable, then unreliable parameter estimates, if any, are caused by non-estimability issues that may be due to experimental errors, including data quality (insufficient or too noisy), algorithm approximation, or other noisy measurements (
Miao et al. 2011). Unlike identifiability, estimability is less well-defined, and its characterization has remained an open problem. While it is straightforward to think qualitatively that parameters can be “loosely estimated” under noisy measurements, one would need to define quantitatively what this really means (
Gontier and Pfister 2020;
Raue et al. 2009).
Estimability has been widely studied in systems biology, where an ODE model is utilized to model dynamic biological systems. For instance, it is assumed that

dx(t)/dt = g(x(t), u(t), θ),   y(t) = h(x(t), θ) + ε(t),

where x(t) denotes the vector of state variables, u(t) the input, y(t) the noisy measurements, θ the parameter vector, and ε(t) the measurement error.
Four broad types of methodologies are employed. We briefly describe them next.
3.1.1. The Monte Carlo Method
This approach, which was utilized by
Aslett (
2012) in connection with certain lifetime models, involves repeated parameter estimation from a large number of data sets that were simulated by means of the Monte Carlo method. To apply this method, threshold values for parameter uncertainty levels are required to distinguish estimable from non-estimable parameters. Let
θ⁰ be the nominal parameter vector obtained from fitting the model to the original data or from prior knowledge (Miao et al. 2011). Let θ̂⁽ⁱ⁾ be the vector of the parameter estimates at the i-th of M trials, which are based on data simulated from the model having θ⁰ as its parameter vector. Then, the average relative estimation error (ARE) associated with θₖ, the k-th element of θ⁰, is given by

AREₖ = (1/M) Σᵢ₌₁ᴹ |θₖ⁰ − θ̂ₖ⁽ⁱ⁾| / |θₖ⁰| × 100%,

where θ̂ₖ⁽ⁱ⁾ is the k-th element of θ̂⁽ⁱ⁾, for i = 1, …, M.
Miao et al. (
2011) defined non-estimability as occurring when the ARE of a parameter is sufficiently high or, equivalently, exceeds a pre-selected threshold
.
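As an illustration, the ARE computation can be sketched as follows; `are_percent`, `sim`, and `est` are hypothetical helpers, shown here for an exponential model whose rate MLE is the reciprocal of the sample mean.

```python
import random

def are_percent(theta0, simulate, estimate, M=200, seed=1):
    """Monte Carlo average relative estimation error, in percent, for each
    component of the nominal parameter vector theta0:
        ARE_k = (100/M) * sum_i |theta0_k - thetahat_k(i)| / |theta0_k|."""
    rng = random.Random(seed)
    p = len(theta0)
    acc = [0.0] * p
    for _ in range(M):
        est = estimate(simulate(theta0, rng))   # refit on simulated data
        for k in range(p):
            acc[k] += abs(est[k] - theta0[k]) / abs(theta0[k])
    return [100.0 * a / M for a in acc]

# toy check: Exp(rate) data with the MLE rate = n / sum(data)
sim = lambda th, rng: [rng.expovariate(th[0]) for _ in range(200)]
est = lambda data: [len(data) / sum(data)]
print(are_percent([2.0], sim, est))  # a few percent for n = 200
```

A parameter would then be declared non-estimable if its ARE exceeds the pre-selected threshold.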
3.1.2. Methods Based on the Correlation Matrix or the Fisher Information Matrix
According to
Petersen et al. (
2001), the Fisher information matrix (FIM) associated with the ODE model is given by

F = Σᵢ₌₁ⁿ S(tᵢ)ᵀ Σ⁻¹ S(tᵢ),   (3)

where S(tᵢ) denotes the sensitivity matrix evaluated at observation time tᵢ and Σ is the covariance matrix of the measurement errors.
Rodriguez-Fernandez et al. (
2006) proposed a correlation matrix approach for analyzing the estimability of the ODE model. By the Cramér–Rao Theorem, the covariance matrix can be obtained as C = F⁻¹,
the correlation between θᵢ and θⱼ being rᵢⱼ = Cᵢⱼ / (Cᵢᵢ Cⱼⱼ)^{1/2}.
Similarly,
Quaiser and Mönnigmann (
2009) proposed a total correlation measure. In this instance,
and
are deemed to be non-estimable if their correlation is sufficiently high or, equivalently, exceeds a certain threshold
.
There exist several methods focusing on the FIM.
Dochain and Vanrolleghem (
2001) proposed that the condition number, which is defined as the ratio of the largest eigenvalue to the smallest eigenvalue of the FIM, could also be used to assess estimability. The larger the condition number, the more correlated the parameters and the less estimable they are. The model is then non-estimable if the condition number is sufficiently high or, equivalently, exceeds a certain threshold
. In addition,
Brun et al. (
2001) proposed a collinearity index to measure the parameter correlations. The model is deemed less estimable if the collinearity index is relatively large, which also requires the specification of a threshold.
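For a two-parameter model, these diagnostics can be sketched directly from the sensitivity matrix; `fim_2p`, `condition_number`, and `correlation` are hypothetical helpers that assume unit measurement-error variance, so that the FIM reduces to SᵀS.

```python
import math

def fim_2p(sens_rows):
    """FIM = S^T S for a two-parameter sensitivity matrix, supplied as rows
    [dy/dtheta1, dy/dtheta2] at each observation time (unit error variance)."""
    a = sum(r[0] * r[0] for r in sens_rows)
    b = sum(r[0] * r[1] for r in sens_rows)
    c = sum(r[1] * r[1] for r in sens_rows)
    return a, b, c  # FIM = [[a, b], [b, c]]

def condition_number(a, b, c):
    """Ratio of largest to smallest eigenvalue of the 2x2 symmetric FIM."""
    half_tr = (a + c) / 2.0
    disc = math.hypot((a - c) / 2.0, b)
    return (half_tr + disc) / (half_tr - disc)

def correlation(a, b, c):
    """Correlation of the two estimates via Cov = FIM^{-1} (Cramer-Rao);
    the inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det."""
    return -b / math.sqrt(a * c)

# nearly collinear sensitivity columns -> huge condition number
print(condition_number(*fim_2p([[1.0, 0.99], [2.0, 1.98], [1.0, 1.01]])))
```

When the sensitivity columns are nearly collinear, the condition number blows up and the correlation approaches one in absolute value, both signaling non-estimability once a threshold is imposed.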
3.1.3. Methods Based on the Model Sensitivity
Another approach is based on the model sensitivity. As can be seen from (
3), the FIM is obtained in terms of the sensitivity matrix. Thus, the sensitivity matrix may be extracted from the FIM and analyzed specifically. The sensitivity matrix
S, with observation times t₁, …, tₙ, is defined as Sᵢⱼ = ∂y(tᵢ)/∂θⱼ, for i = 1, …, n and j = 1, …, p.
Several methods are based on the sensitivity matrix.
Jacquez and Greif (
1985) calculated a sample correlation between the matrix columns. If the correlation between two columns is close to one within a certain threshold, then the corresponding parameters are considered to be non-estimable. Other methods exist, such as the principal components analysis (PCA) technique (
Degenring et al. 2004), the orthogonal method (
Yao et al. 2003), and the eigenvalue method (
Vajda et al. 1989), all of which rely on subjective thresholds.
If the model is sufficiently simplified and involves only a few parameters, the sensitivity function
can be solved analytically, in which case the sensitivity matrix is not needed.
Holmberg (
1982) proposed a visual inspection approach to the sensitivity function. The larger the sensitivity measure of one parameter, the greater the change in the model c.d.f. with respect to the change of that parameter. If the sensitivity functions of certain parameters are linearly dependent, then those parameters are functionally related. The drawback of this approach is that a correlation cannot be quantified based on graphs. Moreover, a visual inspection entails a subjective assessment: whether the graphs of the sensitivity functions are truly dependent rests on the experimenter's judgment.
3.1.4. Methods Based on Profile Likelihood
Raue et al. (
2009) proposed an explicit definition of estimability that is based on the profile likelihood function. They defined the profile likelihood confidence interval for parameter θᵢ as

CIᵢ = {θᵢ : ℓ(θ̂) − PL(θᵢ) ≤ Δ},   (4)

where PL(θᵢ) is the profile likelihood function of θᵢ and Δ is a subjectively chosen threshold. Then, θᵢ is said to be non-estimable if CIᵢ is infinite. In other words, given a certain threshold Δ, there exists a θ* such that for all θᵢ beyond θ*, ℓ(θ̂) − PL(θᵢ) ≤ Δ holds true. This definition is mathematically clear as it relies on a binary event: whether CIᵢ is infinite or not. However, a subjective threshold Δ is still required, as is the case for the other methods.
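The profile-likelihood criterion can be sketched as follows; `profile_ci` is a hypothetical grid-based helper, illustrated on the mean of a normal model with known unit variance, for which the profile log-likelihood is available in closed form.

```python
import random

def profile_ci(loglik, grid, delta):
    """Profile-likelihood confidence set {theta : l_max - l_prof(theta) <= delta},
    scanned over a finite grid; an interval still open at the grid's edge
    would signal a possibly unbounded (non-estimable) region."""
    prof = [loglik(t) for t in grid]
    lmax = max(prof)
    inside = [t for t, l in zip(grid, prof) if lmax - l <= delta]
    return min(inside), max(inside)

random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(100)]
# profile log-likelihood of mu in a N(mu, 1) model, up to a constant
ll = lambda mu: -0.5 * sum((x - mu) ** 2 for x in data)
lo, hi = profile_ci(ll, [i / 100 for i in range(0, 601)], delta=1.92)
print((lo, hi))  # roughly mu_hat +/- 1.96 / sqrt(100)
```

Here delta = 1.92 corresponds to the usual chi-square-based 95% cutoff; the point of the criterion is that a flat profile would make the set `inside` stretch to the boundary of the parameter range.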
3.2. Relationships between Identifiability, Estimability, and Sensitivity
Informally, sensitivity refers to the degree to which a model is affected by its parameter values. Graphically, the more sensitive a model is with respect to one parameter, the more noticeably the model’s c.d.f. is affected by changes in that parameter. In this paper, we quantify this concept by introducing the c.d.f. sensitivity measure that is specified in Definition 4.
Identifiability and estimability are related via the concept of sensitivity. From the perspective of sensitivity, if a statistical model
is non-identifiable with the non-identifiable set being
then
f has zero sensitivity if the parameter vector changes within
since the model c.d.f. does not change. Thus, the non-identifiability issue cannot be overcome by improving the experimental design since the model output is then the same for all
x.
On the other hand, if the sensitivity is not zero, then different parameters produce different model c.d.f. values. In this case, different values of x produce different model outputs, and the experiment contributes some information toward parameter inference. The more sensitive the model with respect to one parameter, the more estimable that parameter is. Although, as previously mentioned, different measures, such as those based on correlations, condition numbers, and eigenvalues, may be employed, in each case, a threshold must be set.
It should be emphasized that identifiability and estimability are equally important. If a model is non-identifiable, then the likelihood function has multiple global maxima. In that case, although we are relying on a numerical algorithm that aims to maximize the likelihood function, we may question the reliability of the parameter estimates produced by that algorithm since other maxima may potentially exist. In the case of estimability, although we are certain that the MLEs are unique, we may as well question the reliability of the parameter estimates produced by that algorithm since a wide variety of estimates can yield nearly the same likelihood values.
4. A Methodology for Objectively Assessing Estimability
A thorough review of the literature indicates that very little research on estimability has been conducted in connection with statistical models. To the best of our knowledge, there is only one paper, namely
Gontier and Pfister (
2020), that studied estimability with respect to a statistical model, wherein a new definition of estimability based on a model selection perspective was proposed. A significant contribution of theirs was the elimination of the subjective threshold in (
4) by introducing a Bayes factor into the new definition. Our contribution also aims to address the problem of having to set a subjective threshold. Rather than eliminating it, we shall make the threshold an objective, experiment-based quantity.
In order to do so, a methodology needs to be established that relates the confidence region to the experimental protocol. To achieve this, we rely on the following considerations:
- (i)
The curvature of the likelihood function reflects the sensitivity of the model (c.d.f.) with respect to the parameters (
McLean and McAuley 2012).
- (ii)
The non-estimability issue is defined as occurring when the likelihood-based confidence interval is infinite (
Raue et al. 2009).
In regards to the first consideration, we replace the likelihood-based confidence region introduced in
Raue et al. (
2009) with an innovative confidence region that is based on a carefully designed c.d.f. sensitivity measure. Such a sensitivity measure relates the confidence region to the experimental error by quantifying each as a single number under the same measure, so that the two can be compared indirectly. In this case, the threshold is tailored to the experimental protocol and, thus, becomes objective. Additionally, in light of the second consideration, we define non-estimability as occurring when the proposed confidence region is infinite.
Several preliminary definitions are needed before defining estimability:
Definition 4. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Then, for , the c.d.f. sensitivity between and with respect to the random sample is defined as Definition 5. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Then, for all real numbers , and are said to be indistinguishable with respect to if their c.d.f. sensitivity with respect to the random sample is no greater than . That is, for , Definition 6. For a given statistical model , consider the procedure utilized for obtaining parameter estimates for that are based on the random sample and the numerical algorithm being implemented. Such a procedure is referred to as experiment Φ.
Definition 7. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Let be an estimated model of f with respect to experiment Φ
and be the empirical c.d.f. (ECDF) obtained from Φ.
Then, the experimental error associated with Φ
is defined as the c.d.f. sensitivity between the estimated model and the ECDF. That is, Definition 8. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Let be an estimated model of f with respect to experiment Φ.
Then, is said to be indistinguishable with respect to Φ
if and are indistinguishable with respect to the experimental error ϵ as defined in (5). That is, for , Definition 9. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Let be an estimated model of f with respect to experiment Φ;
then the set is called a c.d.f. sensitivity-based confidence region (CSCR) with respect to Φ
ifwhere ϵ is as defined in (5). Definition 10. Consider an identifiable statistical model , where Θ is the parameter space with . Given , let be a sub-space of Θ with , and define so that . Let the parameters in be , and let the parameters in be . Let denote the boundary of the domain in the parameter space Θ.
Then, a statistical model is said to be the sub-model of ifwhere is the boundary of and is the associated c.d.f. Definition 11. Consider an identifiable statistical model , where Θ is the parameter space. Then, is said to be the sub-model family of if it comprises all the sub-models of . Namely,where is a sub-model of and is the number of sub-models. This paper’s principal contribution is the definition of the estimability of a statistical model that follows.
Definition 12. Consider an identifiable statistical model with parameter space Θ. Then, is said to be non-estimable with respect to experiment Φ if its CSCR with respect to experiment Φ is infinite. Accordingly, is said to be estimable if its CSCR with respect to experiment Φ is bounded.
Observe that expression (
5) quantifies the experimental error as ϵ. It can also be interpreted as the tolerance level within which the estimated model c.d.f. can vary. The CSCR essentially includes all possible parameters such that the c.d.f. sensitivity of a model having those parameters is less than the experimental error. Clearly, the smaller the experimental error, the smaller the CSCR, this being due to the fact that the experimental error is set as an upper bound in (
6). The next step is to make Definition 12 applicable in practice, as this definition may not be of practical use if utilized directly. This caveat is addressed in the next two theorems.
Theorem 1. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Assume that has a sub-model family . Then, is non-estimable if there exists at least one sub-model that satisfies both of the following conditions:
- (i)
, where ϵ is as defined in (5), and F and are the associated c.d.f. values. - (ii)
contains ∞ or .
Proof. Let a sub-model
satisfy both of the conditions, where
. By Definition 10, we have
Let
be the estimates of
for given values of
. We write it this way to reflect its dependence on
. Assuming that the parameters are estimated under the same numerical algorithm, then for all
x, we have
Subsequently, based on (
7), we have
Denote the
element in
by
. In light of condition (ii),
then becomes either
∞ or
. On applying condition (i), (
8) then implies that for some
j,
Thus, belongs to the CSCR by Definition 9. In that case, the CSCR is infinite given the parameters included in . Then, by Definition 12, is non-estimable with respect to experiment . □
Theorem 2. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Assume that has a sub-model family . Then, is estimable if, for all , conditions (i) and (ii) specified in Theorem 1 cannot be simultaneously satisfied.
Proof. If condition (ii) is not satisfied, then
does not contain
∞ or
. Denote the
element in
by
, and denote the
element in
by
. Then, (
7) implies that for any
and for all
,
such that
,
.
Therefore, no matter whether or not condition (i) is satisfied, is already contained in a bounded region. The CSCR, which is a sub-region of this bounded region, is, of course, bounded as well.
It remains to consider the case when condition (i) is not satisfied while condition (ii) is. In that case, since condition (i) is not verified, we have
Since condition (ii) is satisfied, (
8) then implies that for all
j,
This is equivalent to
Therefore, for each parameter of
that is in the CSCR by Definition 9, its range in absolute value is covered by a bounded interval
. We may then conclude that the entire CSCR is a bounded region. Thus, by Definition 12,
is estimable with respect to experiment
. □
Assuming that the statistical model under consideration is identifiable, the steps to follow for implementing the proposed methodology are enumerated in the next algorithm.
Algorithm 1 The estimability analysis for a statistical model
- 1: Determine the sub-models of the statistical model—a requirement to conduct the proposed estimability analysis.
- 2: Obtain a parameter estimate of the model. The estimator should be consistent. It is recommended to proceed as explained in Section 7.1.
- 3: Compute the experimental error, as specified in Definition 7.
- 4: Proceeding as in Step 2, obtain the parameter estimates for all the sub-models of the statistical model.
- 5: Compute the c.d.f. sensitivities between the estimated statistical model and each one of its estimated sub-models using Definition 4.
- 6: Compare all the c.d.f. sensitivities obtained in Step 5 with the experimental error determined in Step 3. If all c.d.f. sensitivities are greater than the experimental error, then the statistical model is estimable with respect to the current experiment in light of Theorem 2. Otherwise, it is non-estimable with respect to the current experiment by virtue of Theorem 1.
- 7: If non-estimable, one may consider improving the experimental design by making use of the techniques described in Section 5.3 and then reassess estimability, starting with Step 2.
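The steps above can be sketched generically as follows. Since the formula of the c.d.f. sensitivity measure (Definition 4) is not reproduced here, `cdf_sensitivity` assumes a sup-norm discrepancy over the observed sample points; `estimability_check` is a hypothetical helper implementing Steps 3–6.

```python
import math
import random

def cdf_sensitivity(F1, F2, sample):
    """ASSUMED form of the c.d.f. sensitivity: the largest discrepancy
    between two c.d.f.s evaluated at the observed sample points."""
    return max(abs(F1(x) - F2(x)) for x in sample)

def estimability_check(F_hat, sub_cdfs, sample):
    """Steps 3-6 of Algorithm 1: the experimental error is the sensitivity
    between the fitted c.d.f. and the ECDF; the model is flagged estimable
    when every fitted sub-model lies farther from the fit than that error."""
    xs = sorted(sample)
    n = len(xs)
    ecdf = lambda x: sum(1 for v in xs if v <= x) / n
    eps = cdf_sensitivity(F_hat, ecdf, sample)              # experimental error
    sens = [cdf_sensitivity(F_hat, G, sample) for G in sub_cdfs]
    return all(s > eps for s in sens), eps, sens

# toy run: exponential model, one deliberately distant "sub-model"
random.seed(3)
data = [random.expovariate(1.0) for _ in range(300)]
rate = len(data) / sum(data)                                # exponential MLE
F_hat = lambda x: 1.0 - math.exp(-rate * x)
print(estimability_check(F_hat, [lambda x: 1.0 - math.exp(-10.0 * x)], data))
```

In this toy run, the distant sub-model's sensitivity comfortably exceeds the experimental error, so the check returns estimable; a sub-model whose c.d.f. nearly coincided with the fit would trigger the non-estimable branch of Step 6.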
5. Validation of the Proposed Definition
In this section, we substantiate the proposed definition and conclude that it is innately sound. This is achieved by establishing the validity of the associated theoretical results, which happen to be consistent with common sense.
5.1. Validation of the Data Noise, the Algorithm Noise, and the Experimental Error
We first validate the data noise, the algorithm noise, and the experimental error as specified in Definitions 13 and 7. Without any loss of generality, it is assumed that the experimental error originates from the data noise and the algorithm noise (
Chis et al. 2011). The data noise and the algorithm noise are next defined as c.d.f. sensitivity measures:
Definition 13. Consider an identifiable statistical model , where Θ is the parameter space and is the associated c.d.f. Let be an estimated model with respect to experiment Φ.
Denoting the true value of the parameter estimate by , the data noise and the algorithm noise of with respect to experiment Φ
can be defined as the following c.d.f. sensitivities, respectively: We now show that both types of noise tend to zero as the sample size increases. As the sample size goes to infinity,
where
is the true parameter of the model (unknown, of course) and
is the support of
. Note that the limits of
and
are based on asymptotic results on the consistency of the estimator
and the convergence of the ECDF
, respectively. Thus,
is a valid measure of the data noise that behaves as expected.
Now, consider the algorithm noise. As the accuracy of the algorithm tends to perfection,
Thus, as the algorithm is becoming nearly exact, the approximation of
is more accurate, which also agrees with common sense. Accordingly,
is a valid measure of the algorithm noise as well.
After having validated the data noise and the algorithm noise, we may now validate the experimental error. Observe that
Then, as the experimental design tends to perfection, both the data noise and the algorithm noise tend to zero. Consequently, tends to zero by appealing to the so-called “Squeeze Theorem”. It is worth emphasizing that and are unknown to the experimenter since the true value of the parameter estimate is also unknown. Instead, one only knows , which is the output of the numerical algorithm. However, this does not prevent us from establishing that is a valid measure of the experimental error. Otherwise, one would be able to calculate analytically without resorting to numerical algorithms.
5.2. Validation of the c.d.f. Sensitivity-Based Confidence Region
We next validate the definition of the CSCR. Observe that the experimental error is an upper bound in (6). Thus, as the experimental error decreases, the CSCR shrinks, making the model more estimable, which is consistent with intuition. Now, as in Section 5.1, suppose that the experimental design tends to perfection, the limit being evaluated at the true (unknown) parameter of the model. The CSCR then collapses to precisely the parameter set over which the model is non-identifiable. Therefore, if the experimental error goes to zero, then any remaining unreliability associated with the parameter estimates must originate from the non-identifiability issue. This agrees with the statement found in the literature to the effect that it is unnecessary to analyze estimability if the model is non-identifiable, since the non-identifiability issue cannot be overcome by improvements in the experimental design. Accordingly, the demonstrated consistency with respect to the CSCR further validates the proposed definition of estimability.
5.3. Validation by Known Techniques
The proposed definition can also be substantiated by making use of known methods for improving estimability. These are discussed next.
5.3.1. Increasing the Sample Size
Increasing the sample size can decrease both the data noise and the algorithm noise. More observations not only reduce the data noise but also the algorithm noise, as the likelihood functions then become more and more concave. In light of the arguments presented in Section 5.1 and Section 5.2, the experimental error decreases and the CSCR shrinks, which improves estimability. As the sample size tends to infinity, the experimental error tends to zero, and the experiment tends to perfection. Thus, the proposed definition is consistent with an increase in the sample size.
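The effect of the sample size can be sketched with a hypothetical Bernoulli experiment (not one of the experiments considered in this paper): the half-width of a Wald confidence interval for the success probability, a rough analogue of the size of the CSCR, shrinks at the familiar one-over-root-n rate.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3
for n in (50, 500, 5000):
    x = rng.binomial(1, p_true, size=n)
    p_hat = x.mean()                                       # MLE of the success probability
    half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)   # 95% Wald CI half-width
    print(n, round(p_hat, 3), round(half_width, 4))
```

Each tenfold increase in the sample size reduces the interval half-width by roughly a factor of three, in line with the shrinking CSCR described above.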
5.3.2. Increasing the Concavity of the Log-Likelihood Function
Another way to increase the effective sample size is to clone the simulated data multiple times. This is called the data cloning method (Lele et al. 2007, 2010; Cheng 2023a). With this approach, the likelihood functions become more and more concave, which makes the algorithm approximation more accurate. Accordingly, the algorithm noise decreases. However, this approach only reduces the algorithm noise; it cannot improve the data noise, as the underlying data do not change. Thus, the proposed definition is consistent with the data cloning method.
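The mechanics of data cloning can be sketched as follows. This is a minimal example assuming a unit-variance normal model rather than the PTAM: replicating the sample k times leaves the MLE unchanged while multiplying the curvature of the log-likelihood at its peak by k, i.e., the likelihood becomes more sharply concave.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)

def loglik(mu, data):
    # Normal log-likelihood with known unit variance (additive constants dropped).
    return -0.5 * np.sum((data - mu) ** 2)

mu_grid = np.linspace(1.5, 2.5, 1001)
h = mu_grid[1] - mu_grid[0]
for k in (1, 5, 25):                       # number of clones
    cloned = np.tile(x, k)                 # data cloning: replicate the sample k times
    ll = np.array([loglik(m, cloned) for m in mu_grid])
    i = int(np.argmax(ll))
    mle = mu_grid[i]
    # Curvature at the peak via a finite-difference second derivative.
    curv = -(ll[i - 1] - 2 * ll[i] + ll[i + 1]) / h ** 2
    print(k, round(mle, 3), round(curv, 1))
```

The printed MLE is essentially identical across clone counts, while the curvature grows linearly in k: cloning sharpens the likelihood peak (helping the optimizer) without adding any new information about the parameter.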
5.3.3. Improving the Algorithm Design
One can also decrease the algorithm noise by improving the efficiency of the algorithm approximation. However, the data noise then remains unchanged, which is consistent with intuition as the improvement only pertains to the algorithm.
5.3.4. Securing More Complete Information
Estimability can also be enhanced if more information is secured. As the data convey additional information, the data noise decreases. Then, in light of the arguments presented in Section 5.1 and Section 5.2, the experimental error decreases and the CSCR shrinks, which improves estimability. Thus, the proposed definition is consistent with obtaining more complete information.
5.4. Illustrative Examples
In Section 5.1, Section 5.2 and Section 5.3, we have theoretically validated our definition of estimability by establishing that it is consistent with common sense. Examples illustrating its applicability to discrete and continuous statistical models follow.
5.4.1. Example 1
Consider the following constrained binomial model:
With the binomial distribution being identifiable, its estimability may be assessed, and the quantities required by Definition 11 can be specified accordingly. The sub-model family of the constrained binomial model is then the Poisson family, since the Poisson distribution is the limiting distribution of the binomial distribution as the number of trials tends to infinity with the mean (the product of the number of trials and the success probability) held fixed by the constraint.
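This limiting relationship can be checked numerically. In the sketch below (with an illustrative mean of λ = 3, not a value taken from the paper), the sup-norm distance between the Binomial(n, λ/n) c.d.f. and the Poisson(λ) c.d.f. shrinks as n grows, which is precisely what makes the sub-model hard to distinguish from the underlying model:

```python
import numpy as np
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0
ks = np.arange(0, 31)
F_pois = np.cumsum([poisson_pmf(k, lam) for k in ks])
for n in (10, 100, 1000):
    p = lam / n                       # constraint: the product np is held fixed at lam
    F_binom = np.cumsum([binom_pmf(k, n, p) if k <= n else 0.0 for k in ks])
    print(n, round(float(np.max(np.abs(F_binom - F_pois))), 5))
```

The distance decays roughly like λ/n, so for large n only a very small experimental error allows the two fitted c.d.f.s to be told apart.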
Let the underlying model have the stated parameter values, and consider the following three experiments:
Experiment 1: The sample size is .
Experiment 2: The sample size is .
Experiment 3: The sample size is .
The estimability assessment results are presented in Table 1, Figure 5 and Figure 6.
Table 1 compares the experimental error with the c.d.f. sensitivity between the fitted binomial distribution and its sub-model, which is a direct application of Theorems 1 and 2.
It can be observed that the experimental error decreases as the sample size increases, which again validates the definition of the experimental error. According to Theorem 1, the model is non-estimable with respect to Experiments 1 and 2 since the c.d.f. sensitivity measure is less than the experimental error. On the other hand, according to Theorem 2, the model becomes estimable with respect to Experiment 3 since the c.d.f. sensitivity measure is then greater.
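The decision rule underlying Theorems 1 and 2 can be summarized in a few lines of code. The numerical values below are hypothetical and merely mimic the qualitative pattern just described, in which the c.d.f. sensitivity stays put while the experimental error falls as the sample size grows:

```python
def assess_estimability(cdf_sensitivity, experimental_error):
    """Decision rule in the spirit of Theorems 1 and 2: the model is declared
    non-estimable when the c.d.f. sensitivity between the fitted model and its
    sub-model does not exceed the experimental error, and estimable otherwise."""
    return "estimable" if cdf_sensitivity > experimental_error else "non-estimable"

# Hypothetical (sensitivity, error) pairs for Experiments 1, 2, and 3:
for sens, err in [(0.010, 0.030), (0.010, 0.012), (0.010, 0.004)]:
    print(assess_estimability(sens, err))
# → non-estimable, non-estimable, estimable
```

Only in the third case, where the shrinking experimental error has dropped below the sensitivity, does the comparison favor estimability.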
The above conclusions are further supported by Figure 5, where the CSCRs for Experiments 1, 2, and 3 are visualized. In line with Theorem 1, the CSCRs for Experiments 1 and 2 are both infinite. As the sample size increases, the CSCR shrinks until it becomes bounded in Experiment 3, where the model becomes estimable, which is in line with Theorem 2.
A more intuitive way of interpreting the concept of estimability is displayed in Figure 6. In Experiments 1 and 2, the model is assessed to be non-estimable. This can be interpreted as follows: the inferential power of the data displayed in the histogram is not sufficient to distinguish the estimated p.m.f.s of the underlying model and its sub-model. This is exactly why estimability is also referred to as “practical identifiability”: the shape of the histogram cannot “practically identify” the estimated p.m.f. values of the underlying model and its sub-model. On the other hand, in Experiment 3, the model is assessed to be estimable. Thus, the inferential power is sufficient to distinguish the estimated p.m.f. values; in this case, the shape of the histogram suffices to favor the underlying model and dismiss its sub-model.
However, these conclusions cannot be reached by only inspecting Figure 6, which is why the CSCR associated with the proposed definition is crucial.
5.4.2. Example 2
Consider the following constrained Lomax (Pareto Type II; henceforth Pareto) model:
With the Pareto distribution being identifiable, its estimability may be assessed, and the quantities required by Definition 11 can be specified accordingly. The sub-model family of the constrained Pareto model is then the exponential family, since the exponential distribution is the limiting distribution of the Pareto distribution as the shape parameter tends to infinity with the ratio of the scale parameter to the shape parameter held fixed by the constraint.
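As in Example 1, the limiting relationship can be verified numerically. The sketch below (with an illustrative exponential rate of 0.5, not a value from the paper) shows the sup-norm distance between the Lomax c.d.f. with scale equal to shape/rate and the exponential c.d.f. vanishing as the shape grows:

```python
import numpy as np

def lomax_cdf(x, alpha, sigma):
    """Lomax (Pareto Type II) c.d.f. with shape alpha and scale sigma."""
    return 1.0 - (1.0 + x / sigma) ** (-alpha)

def exp_cdf(x, rate):
    return 1.0 - np.exp(-rate * x)

x = np.linspace(0.0, 20.0, 2001)
rate = 0.5
for alpha in (2.0, 20.0, 200.0):
    sigma = alpha / rate              # constraint: alpha / sigma is held fixed at the rate
    print(alpha, round(float(np.max(np.abs(lomax_cdf(x, alpha, sigma) - exp_cdf(x, rate)))), 5))
```

With the ratio held fixed, the two c.d.f.s become numerically indistinguishable for large shape values, which is what drives the estimability question in this example.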
Let the underlying model have the stated parameter values, and consider the following three experiments:
Experiment 1: The sample size is .
Experiment 2: The sample size is .
Experiment 3: The sample size is .
The estimability assessment results are presented in Table 2, Figure 7 and Figure 8.
Table 2 compares the experimental error with the c.d.f. sensitivity between the fitted Pareto distribution and its sub-model, which is a direct application of Theorems 1 and 2. The results yield the same conclusions that were obtained in Example 1.
It should be emphasized that the visualization of the CSCR displayed in Figure 5 and Figure 7 is not achievable if the parameter space extends to more than three dimensions, such as in the case of the PTAM, which is considered next. In that case, we have to fully rely on Theorems 1 and 2 and compare the c.d.f. sensitivity with the experimental error. The two-dimensional illustrative examples that were previously presented provide supporting evidence corroborating the validity of Theorems 1 and 2. Another important aspect to point out is that the algorithm utilized to obtain the parameter estimates coincides with that recommended in Section 7.1. This aspect is discussed further in the remainder of this paper.