Article

Robustness and Efficiency Considerations When Testing Process Reliability with a Limit of Detection

Department of Statistics and Actuarial Science, University of Waterloo, 200 University Ave. W., Waterloo, ON N2L 3G1, Canada
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(8), 1274; https://doi.org/10.3390/math13081274
Submission received: 10 March 2025 / Revised: 10 April 2025 / Accepted: 11 April 2025 / Published: 12 April 2025
(This article belongs to the Special Issue Improved Mathematical Methods in Decision Making Models)

Abstract

Processes in biotechnology are considered reliable if they produce samples satisfying regulatory benchmarks. For example, laboratories may be required to show that levels of an undesirable analyte rarely (e.g., in less than 5% of samples) exceed a tolerance threshold. This can be challenging when measurement systems feature a lower limit of detection, rendering some observations left-censored. We investigate the implications of detection limits on location-scale model-based inference in reliability studies, including their impact on large and finite sample properties of various estimators and the sensitivity of results to model misspecification. To address the need for robust methods, we introduce a flexible weakly parametric model in which the right tail of the response distribution is approximated using a piecewise-constant hazard model. Simulation studies are reported that investigate the performance of the established and proposed methods, and an illustrative application is given to a study of drinking can weights. We conclude with a discussion of areas warranting future work.

1. Introduction

Testing and manufacturing processes in biotechnology are considered reliable if they produce samples satisfying regulatory benchmarks. For example, laboratories separating blood into transfusable components are required to show that levels of an undesirable analyte in the blood units produced rarely (e.g., in less than 5% of samples) exceed a tolerance threshold [1]. Likewise, the United States Food and Drug Administration stipulates maximum allowable levels of residual chemicals and bacterial toxins for medical devices labelled as sterile [2], and of various ingredients for food products labelled as allergen- or gluten-free [3]. Similar situations arise in environmental science, where sites such as nuclear and waste disposal plants must demonstrate that the concentration of a contaminant in the surrounding soil, air, or water is sufficiently low [4].
In these scientific settings, measurement systems often feature a lower limit of detection (LLD), rendering some observations left-censored. Two common ad hoc approaches for dealing with such data are “deletion”, in which samples yielding values below the LLD are discarded, and “substitution”, in which values are taken to be zero, the LLD itself, or some intermediate value (such as half the LLD) [5,6]. While these methods continue to be widely used [7], much of the literature in environmental science has focused on characterizing their limitations and investigating improved techniques for left-censored data (e.g., [8,9,10,11,12,13,14]). These techniques include Kaplan–Meier estimation [15], maximum likelihood estimation for parametric models [16], and “regression on order statistics” [11], the latter of which involves conducting least squares regression on the quantiles of the log-transformed observations. The targets of inference in this work have mostly been central features of the measurement distribution, such as the mean, variance, median, and interquartile range. Our interest lies in the impact of an LLD in studies aiming to demonstrate process reliability, which inherently aims at inferences regarding the upper tail of the distribution. For example, demonstrating that the levels of an analyte are “low enough” (i.e., below some specified threshold) in at least 95% of blood samples requires inferences regarding the 95th percentile of the response distribution. Huynh et al. [17] investigated a fully parametric Bayesian approach to estimating various quantities but found that this method performed comparably to or worse than substitution methods when estimating the 95th percentile. The impact of an LLD on frequentist inference in the upper tail of a distribution has been less studied and motivates our work.
In Section 2, we introduce some notation, specify the hypothesis test of interest, and discuss analysis in the setting without an LLD, and in Section 3, we examine the large sample impacts of an LLD on estimation and inferences from a (possibly misspecified) normal model. Tests of reliability based on misspecified models will often not control rejection rates at the desired levels. Moreover, the power of goodness of fit tests for detecting model misspecification is generally low for typical sample sizes, particularly in the presence of an LLD. In Section 4, we address these issues by introducing a flexible weakly parametric method in which the right tail of the measurement distribution is approximated using a piecewise-constant hazard model. In Section 5, we use simulation studies to investigate finite sample settings with sample sizes commonly encountered in laboratory research. We also report on findings from simulation studies which can help guide piecewise-constant hazard-based model selection as well as sample size specification under various models, and an R function is available for custom investigations. An application to a study of drinking can weights is given for illustration in Section 6. We conclude in Section 7 with some recommendations for the design of future studies and discuss areas for future work.

2. Background on Methods

Let Y denote the real-valued response with cumulative distribution function (c.d.f.) F and density function f. Suppose n independent observations $y_1, \ldots, y_n$ are available from the distribution. For $i = 1, \ldots, n$, suppose observation $y_i$ is subject to LLD $d_i$ such that the measurement $y_i$ is available if $y_i > d_i$, but otherwise, we only know $I(y_i < d_i)$. One could consider settings with different LLDs for different subsets of observations; this may be suitable in studies involving multiple laboratories or measurement devices. One may also view the LLD as a quantity that varies randomly depending on environmental factors such as ambient temperature or humidity. For simplicity, we consider the setting of a study with a single measurement device and set $d_i = d$ for all i.
We consider the goal of demonstrating that a process produces satisfactory measurements, defined as a desired percentage of values falling below some threshold $\tau$. We therefore aim to show $F(\tau) = P(Y < \tau) > p_0$ at the $\alpha$ significance level (i.e., with "$100(1-\alpha)\%$ confidence") for some probability $p_0$. Letting $p_1 = F(\tau)$ denote the true value of the c.d.f. at $\tau$, this entails a test of reliability:
$$H_0: p_1 = p_0 \quad \text{vs.} \quad H_A: p_1 > p_0. \tag{1}$$
To carry out a test of this null hypothesis, we focus on the broadly applicable and mathematically convenient family of parametric location-scale models [18]. By convention, we work with the real-valued forms of these distributions (e.g., normal, logistic, extreme value) rather than the non-negative analogs (e.g., log-normal, log-logistic, Weibull); as a result, when measurements take non-negative values (e.g., concentrations), it may be necessary to re-map the original measurements (and the original threshold), say using the log transform, in order to obtain real-valued Y and τ .
We now consider the hypothesis test of interest in (1) based on parametric location-scale models. Suppose Y is in the location-scale family with location parameter θ 1 and scale parameter θ 2 with θ = ( θ 1 , θ 2 ) . We write
$$p_1 = F(\tau; \theta) = F_0\!\left(\frac{\tau - \theta_1}{\theta_2}\right)$$
where $F_0(\cdot)$ is the c.d.f. of the standardized random variable $(Y - \theta_1)/\theta_2$, so that the hypotheses of interest can be expressed equivalently as
$$H_0: F_0\!\left(\frac{\tau - \theta_1}{\theta_2}\right) = p_0 \quad \text{vs.} \quad H_A: F_0\!\left(\frac{\tau - \theta_1}{\theta_2}\right) > p_0.$$
With a lower limit of detection, the data for a sample of n independent observations are { ( x i , δ i ) , i = 1 , , n } , where x i = max ( y i , d ) and δ i = I ( x i = y i ) , i = 1 , , n . The central idea in constructing the testing procedure is to exploit the assumption of the parametric model to learn about the shape of the distribution and enhance inference (i.e., increase the precision and power of tests) regarding the upper tail of the distribution. Under a working model defined by the error distribution F 0 ( · ) , the likelihood is
$$L(\theta; x, \delta) = \prod_{i=1}^{n} \left[\frac{1}{\theta_2} f_0\!\left(\frac{x_i - \theta_1}{\theta_2}\right)\right]^{\delta_i} \left[F_0\!\left(\frac{x_i - \theta_1}{\theta_2}\right)\right]^{1-\delta_i}$$
where x = ( x 1 , , x n ) and δ = ( δ 1 , , δ n ) . Such a model can be fitted to left-censored data via maximum likelihood [16] using the survreg function in R version 4.3.2 [19] or using Parametric Distribution Analysis (Arbitrary Censoring) in Minitab® version 21.4.0. To get a 100 p % confidence interval (CI) for
$$F(\tau; \theta) = F_0\!\left(\frac{\tau - \theta_1}{\theta_2}\right) = F_0(\psi_1 + \psi_2 \tau)$$
where $\psi_1 = -\theta_1/\theta_2$ and $\psi_2 = 1/\theta_2$, we first use the delta method [20] to obtain a $100p\%$ CI for $\psi_1 + \psi_2\tau$ and then apply $F_0$ to the endpoints of this CI. We reject $H_0$ if the lower limit of the CI exceeds $p_0$ and fail to reject $H_0$ otherwise.
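As an illustration of this procedure, the following R sketch fits a normal location-scale model to left-censored data with the survival package and carries out the one-sided test; the data vectors, the helper name, and the use of a one-sided lower confidence limit are illustrative assumptions rather than part of the original analysis.

```r
library(survival)

# x: observations with left-censored values recorded as the LLD d; delta: 1 = observed, 0 = censored
# tau: threshold on the real-valued (e.g., log) scale; alpha: one-sided significance level
test_reliability_normal <- function(x, delta, tau, p0 = 0.95, alpha = 0.025) {
  fit <- survreg(Surv(x, delta, type = "left") ~ 1, dist = "gaussian")
  theta1 <- unname(coef(fit))                        # location estimate
  theta2 <- fit$scale                                # scale estimate
  g_hat  <- (tau - theta1) / theta2                  # psi1-hat + psi2-hat * tau
  # Delta method: vcov(fit) is the covariance of (theta1-hat, log(theta2-hat))
  grad <- c(-1 / theta2, -g_hat)                     # gradient of g w.r.t. (theta1, log theta2)
  se_g <- sqrt(drop(t(grad) %*% vcov(fit) %*% grad))
  lower <- pnorm(g_hat - qnorm(1 - alpha) * se_g)    # lower limit of the CI for F(tau)
  c(F_tau_hat = pnorm(g_hat), lower_CL = lower, reject_H0 = lower > p0)
}
```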

3. Inference Under Model Misspecification

The normal distribution is often the default model for responses in applications, so we next consider properties of estimators and inference procedures when one assumes $Y_1, \ldots, Y_n$ are (independently) generated from a normal distribution with mean $\theta_{1a}$ and variance $\theta_{2a}^2$ while the data-generating distribution may differ. As illustrations, we consider the cases where Y is generated from (i) a logistic distribution with c.d.f.
$$F_T(y; \theta) = \left[1 + \exp\!\left(-\frac{y - \theta_1}{\theta_2}\right)\right]^{-1}, \tag{2}$$
and (ii) an extreme value distribution with c.d.f.
$$F_T(y; \theta) = 1 - \exp\!\left\{-\exp\!\left(\frac{y - \theta_1}{\theta_2}\right)\right\}. \tag{3}$$
Here, we use subscript T to emphasize that F T is the true (not assumed) c.d.f. of Y.
Now let $X_i = \max(Y_i, d)$ with $\Delta_i = I(X_i = Y_i)$, and fix the left-censoring rate $q = P(Y_i \le d) = E\{1 - \Delta_i\}$. Then, the observed data likelihood for the assumed normal model is
$$L(\theta_a; x, \delta) = \prod_{i=1}^{n} \left[\frac{1}{\theta_{2a}} \exp\!\left\{-\frac{1}{2}\left(\frac{x_i - \theta_{1a}}{\theta_{2a}}\right)^{2}\right\}\right]^{\delta_i} \left[F_{a,0}\!\left(\frac{x_i - \theta_{1a}}{\theta_{2a}}\right)\right]^{1-\delta_i},$$
or, equivalently,
$$L(\psi_a; x, \delta) = \prod_{i=1}^{n} \left[\psi_{2a} \exp\!\left\{-\tfrac{1}{2}(\psi_{1a} + \psi_{2a} x_i)^{2}\right\}\right]^{\delta_i} \left[F_{a,0}(\psi_{1a} + \psi_{2a} x_i)\right]^{1-\delta_i}$$
where $F_{a,0}$ denotes the c.d.f. of $N(0,1)$, and $\psi_a = (\psi_{1a}, \psi_{2a})' = (-\theta_{1a}/\theta_{2a},\; 1/\theta_{2a})'$ such that
$$\psi_{1a} + \psi_{2a} x = \frac{x - \theta_{1a}}{\theta_{2a}}.$$
Since δ i = I ( x i = y i ) , we can also write this as
$$L(\psi_a; y, \delta) = \prod_{i=1}^{n} \left[\psi_{2a} \exp\!\left\{-\tfrac{1}{2}(\psi_{1a} + \psi_{2a} y_i)^{2}\right\}\right]^{\delta_i} \left[F_{a,0}(\psi_{1a} + \psi_{2a} d)\right]^{1-\delta_i}. \tag{4}$$
As shown by White [21], the estimator $\hat\psi_a$ solves the score equations $S(\psi_a; y, \delta) = \sum_{i=1}^{n} S_i(\psi_a; y_i, \delta_i) = 0$ corresponding to (4) and satisfies
$$\sqrt{n}\,(\hat\psi_a - \psi_a^*) \rightarrow \mathrm{MVN}\!\left(0, \; A^{-1}(\psi_a^*)\, B(\psi_a^*)\, A^{-1}(\psi_a^*)\right)$$
where
  • $\psi_a^*$ solves $E_T\{S_i(\psi_a; Y_i, \Delta_i)\} = \int S_i(\psi_a; y_i, \delta_i)\, dF_T(y_i; \psi) = 0$,
  • $A(\psi_a^*) = E_T\!\left\{\frac{\partial}{\partial \psi_a^\top} S_i(\psi_a; Y_i, \Delta_i)\right\}\Big|_{\psi_a = \psi_a^*}$, and
  • $B(\psi_a^*) = E_T\!\left\{S_i(\psi_a; Y_i, \Delta_i)\, S_i^\top(\psi_a; Y_i, \Delta_i)\right\}\Big|_{\psi_a = \psi_a^*}$
and $E_T\{\cdot\}$ denotes the expectation under the true data-generating model; see also Gourieroux, Monfort, and Trognon [22] and Qin and Lawless [23]. Appendix A shows how to compute the limiting values and implicit estimands $\psi_a^*$ when the true distribution is logistic or extreme value (case (i) or (ii) above, respectively). Recall that $\psi_1 = -\theta_1/\theta_2$ and $\psi_2 = 1/\theta_2$, so to recover the limiting values of the estimators of $\theta_1$, $\log(\theta_2)$, $\psi_1 + \psi_2\tau$, and $F(\tau; \psi)$, we, respectively, use the following:
  • $-\psi_{1a}^*/\psi_{2a}^*$;
  • $-\log(\psi_{2a}^*)$;
  • $\psi_{1a}^* + \psi_{2a}^*\tau$;
  • $F_{a,0}(\psi_{1a}^* + \psi_{2a}^*\tau)$.
The bias of an estimator is defined as its estimand under the assumed model minus its intended estimand (e.g., $\mathrm{bias}(\hat\theta_{1a}) = -\psi_{1a}^*/\psi_{2a}^* - \theta_1$). Plots of bias as a function of the left-censoring rate q for the first two estimands listed above are displayed in Figure 1; here, we use scale $\theta_2 = 1$ (i.e., $\log(\theta_2) = 0$) and location $\theta_1$ chosen to align the 95th percentile of the data-generating distribution with that of the standard normal, resulting in $\theta_1 = -1.30$ for the logistic distribution and $\theta_1 = 0.55$ for the extreme value distribution. Examining these plots at q = 0 shows that, in the absence of an LLD, model misspecification may lead to substantial bias. Note that larger LLDs may or may not be associated with less absolute bias, indicating that a loss of information in the lower tail of the distribution may, in some cases, improve performance in terms of bias, but the LLD cannot be relied upon for this purpose.
The bias curves for the second two estimands, displayed in Figure 2, further demonstrate the meaningful bias that may result from model misspecification in the absence of an LLD, even at high reliability levels. As before, increasing the left-censoring rate may or may not reduce bias here, although the reliability estimator under the extreme value error distribution shows marked reduction in asymptotic bias with a larger LLD.
In order to investigate the relationship between the LLD and rejection rates for tests of the hypotheses in Section 2, we next examine the asymptotic rejection rate as a function of the left-censoring rate. We begin by reformulating our hypothesis test in terms of the quantity
$$c'\psi = \frac{\tau - \theta_1}{\theta_2}$$
where $c = (1, \tau)'$ and $\psi = (\psi_1, \psi_2)' = (-\theta_1/\theta_2,\; 1/\theta_2)'$; in particular, since $F_a(\tau; \psi) = F_{a,0}(c'\psi)$, where $F_{a,0}$ is again the standardized c.d.f. under the assumed model, and we wish to test
$$H_0: F_a(\tau; \psi) = p_0 \quad \text{vs.} \quad H_A: F_a(\tau; \psi) > p_0,$$
we apply $F_{a,0}^{-1}$ to both sides of each hypothesis to get
$$H_0: c'\psi = F_{a,0}^{-1}(p_0) \quad \text{vs.} \quad H_A: c'\psi > F_{a,0}^{-1}(p_0).$$
Now, let ψ ^ a be the estimate of ψ under the assumed model with limiting value ψ a * . Given the robust (“true”) variance Var T ( ψ ^ a ) of White [21] and the corresponding variance Var a ( ψ ^ a ) under the assumed model, we can compute the asymptotic rejection rate (at the 0.025 one-sided significance level) for our reformulated hypothesis test using
$$P\!\left(\frac{c'\hat\psi_a - F_{a,0}^{-1}(p_0)}{\sqrt{\mathrm{Var}_a(c'\hat\psi_a)}} > 1.96\right) = 1 - \Phi\!\left(\frac{F_{a,0}^{-1}(p_0) - c'\psi_a^* + 1.96\sqrt{c'\,\mathrm{Var}_a(\hat\psi_a)\,c}}{\sqrt{c'\,\mathrm{Var}_T(\hat\psi_a)\,c}}\right). \tag{5}$$
Explicit expressions for Var T ( ψ ^ a ) and Var a ( ψ ^ a ) are derived under an assumed normal model in Appendix B and Appendix C.
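Once the limiting value $\psi_a^*$ and the two covariance matrices are available (e.g., computed as in Appendix A, Appendix B and Appendix C), Formula (5) is simple to evaluate. A minimal R sketch follows; the function name and the assumption that the matrices are already scaled by 1/n are ours.

```r
# Asymptotic rejection rate (5) of the one-sided 0.025-level test under a normal working model.
# psi_star: limiting value (psi1a*, psi2a*); V_a, V_T: assumed and robust 2x2 covariance
# matrices of psi-hat (already divided by n); tau and p0 as defined in Section 2.
asymptotic_rejection_rate <- function(psi_star, V_a, V_T, tau, p0 = 0.95) {
  cc  <- c(1, tau)
  num <- qnorm(p0) - sum(cc * psi_star) + 1.96 * sqrt(drop(t(cc) %*% V_a %*% cc))
  1 - pnorm(num / sqrt(drop(t(cc) %*% V_T %*% cc)))
}
```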
In a statistical hypothesis test, the “type I error rate” refers to the probability of rejecting H 0 when it is true (i.e., incorrectly concluding that p 1 > p 0 when the truth is p 1 = p 0 in this case), while “power” refers to the probability of rejecting H 0 when it is false (i.e., correctly concluding that p 1 > p 0 ) across multiple samples from the same population. An illustration of the rejection rates resulting from (5) is shown in Figure 3 (type I error; p 1 = 0.95 ) and Figure 4 (power; p 1 = 0.97 , 0.98 , 0.99 ), respectively; here, we use p 0 = 0.95 , n = 80 , scale θ 2 = 1 , and location θ 1 , chosen to align the p 1 quantile of the data-generating distribution with that of the standard normal.
As expected, when the error distribution is normal (and thus matches the assumed model), the type I error rate is at the nominal level of 0.025 (red dashed line in Figure 3) and power is a strictly decreasing function of the left-censoring rate q (Figure 4). When the error distribution is logistic, we see an elevated false positive rate relative to the normal data case (Figure 3) as well as increased power (Figure 4); by contrast, when it is extreme value, the false positive rate is very low and power is also greatly diminished. Inference is invalid in both of these cases due to poor control of the type I error rate. Interestingly, the impact of q on asymptotic power does not depend much on the true reliability level p 1 in the normal and logistic error settings but increases substantially as a function of p 1 with extreme value errors (Figure 4).

4. A Piecewise-Constant Hazard-Based Model

In Appendix D, we investigate the performance of the Shapiro–Francia test [24] of normality under an LLD and find that some location-scale distributions that differ greatly in their tails (and, therefore, differ greatly in their reliability estimates) may be indistinguishable to whole-distribution goodness of fit tests. Based on this finding, we now introduce a flexible class of models based on piecewise-constant hazard models from the field of survival analysis [18]. This approach requires fewer parametric assumptions and intentionally discards observations in the lower tail of the measurement distribution to improve the robustness of inference in the upper tail.
We first consider non-negative values $V_i = \exp(Y_i)$, $i = 1, \ldots, n$, and corresponding threshold value $\tau_V = \exp(\tau)$. To limit the influence of observations in the lower tail on inference in the upper tail, we next restrict attention to values of $v > \tau_V - \epsilon$ for some tolerance $\epsilon \ge 0$. In particular, we take $v_0 = \tau_V - \epsilon$ and simply assign a mass to the event $V < v_0$ given by $p = P(V < v_0)$. Letting $\eta = \mathrm{logit}(p)$ yields
$$F_V(v_0; \eta) = P(V < v_0; \eta) = \frac{\exp(\eta)}{1 + \exp(\eta)} = \mathrm{expit}(\eta).$$
Then, to model $\bar F_V(v) = 1 - F_V(v)$ for $v > v_0$, we partition $(v_0, \infty)$ into K intervals using cut-points $v_0 = b_0 < b_1 < \cdots < b_K = \infty$ and let $B_k = [b_{k-1}, b_k)$ for $k = 1, \ldots, K$. We then use the model
$$\bar F_V(v; p, \rho) = P(V > v; p, \rho) = (1 - p)\, e^{-\Lambda(v_0, v; \rho)}$$
where the following apply:
  • $\Lambda(v_0, v; \rho) = \sum_{k=1}^{K} W_k(v_0, v)\, \rho_k$ is the “cumulative hazard”;
  • $W_k(v_0, v) = \int_{v_0}^{v} I(u \in B_k)\, du$ is the “time at risk” in $B_k$ over the interval $[v_0, v)$.
Letting $\beta_k = \log \rho_k$ for $k = 1, \ldots, K$ and $\beta = (\beta_1, \ldots, \beta_K)'$, we can write
$$\bar F_V(v; \eta, \beta) = (1 - \mathrm{expit}(\eta)) \exp\!\left\{-\sum_{k=1}^{K} W_k(v_0, v)\, e^{\beta_k}\right\}.$$
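This survivor function is easy to evaluate numerically; the short R sketch below does so for a vector of values, with the cut-point and parameter arguments as illustrative names (a sketch under the model above, not code from the original analysis).

```r
# Survivor function of the piecewise-constant hazard model:
# cuts = (b_0, ..., b_{K-1}) with b_K = Inf implied; eta = logit P(V < v0); beta = (beta_1, ..., beta_K).
surv_pch <- function(v, cuts, eta, beta) {
  b <- c(cuts, Inf)
  # W_k(v0, v): time at risk in B_k = [b_{k-1}, b_k) accumulated over [v0, v)
  W <- sapply(seq_along(beta), function(k) pmax(0, pmin(v, b[k + 1]) - b[k]))
  W <- matrix(W, nrow = length(v))
  (1 - plogis(eta)) * exp(-drop(W %*% exp(beta)))
}
```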

4.1. Maximum Likelihood Estimation

In what follows, we suppress the subscript V on the density $f_V$, c.d.f. $F_V$, and survivor function $\bar F_V$ of V, for ease of notation. Since $dW_k(v_0, v)/dv = I(v \in B_k)$, the corresponding density (assuming $v \ge v_0$ so that $I(v \in B_k) = 1$ for some $k \in \{1, \ldots, K\}$) is given by
$$f(v; \eta, \beta) = (1 - \mathrm{expit}(\eta)) \exp\!\left\{-\sum_{k=1}^{K} W_k(v_0, v)\, e^{\beta_k}\right\} \sum_{k=1}^{K} I(v \in B_k)\, e^{\beta_k}. \tag{6}$$
Now, for a sample of n observations from independent units, let $\delta_i = I(v_i \ge v_0)$ for $i = 1, \ldots, n$, with $\delta_\cdot = \sum_{i=1}^{n} \delta_i$. Further, let $S_{ik} = W_k(v_0, v_i)$ with $S_{\cdot k} = \sum_{i=1}^{n} S_{ik}$, and $\delta_{ik} = I(v_i \in B_k)$ with $\delta_{\cdot k} = \sum_{i=1}^{n} \delta_{ik}$, for $k = 1, \ldots, K$. Note that since v can only fall in interval $B_k$ if $v \ge v_0$, then $\delta_i \delta_{ik} = \delta_{ik}$, and since $S_{ik} = 0$ when $\delta_i = 0$, then $\delta_i S_{ik} = S_{ik}$. Lastly, we replace $\sum_{k=1}^{K} I(v \in B_k)\, e^{\beta_k}$ with $\prod_{k=1}^{K} e^{I(v \in B_k)\beta_k}$ in (6). Then, accounting for both $v \in (0, v_0)$ and $v \in (v_0, \infty)$, the likelihood is
$$\begin{aligned}
L(\eta, \beta; v) &= \prod_{i=1}^{n} F(v_0; \eta)^{1-\delta_i}\, f(v_i; \eta, \beta)^{\delta_i} \\
&= \prod_{i=1}^{n} [\mathrm{expit}(\eta)]^{1-\delta_i} \left[(1 - \mathrm{expit}(\eta)) \exp\!\left\{-\sum_{k=1}^{K} W_k(v_0, v_i)\, e^{\beta_k}\right\} \prod_{k=1}^{K} e^{I(v_i \in B_k)\beta_k}\right]^{\delta_i} \\
&= [\mathrm{expit}(\eta)]^{n - \delta_\cdot}\, (1 - \mathrm{expit}(\eta))^{\delta_\cdot} \exp\!\left\{-\sum_{i=1}^{n}\sum_{k=1}^{K} \delta_i S_{ik}\, e^{\beta_k}\right\} \prod_{i=1}^{n}\prod_{k=1}^{K} e^{\delta_i \delta_{ik} \beta_k} \\
&= [\mathrm{expit}(\eta)]^{n} \left(\frac{1 - \mathrm{expit}(\eta)}{\mathrm{expit}(\eta)}\right)^{\delta_\cdot} \exp\!\left\{\sum_{k=1}^{K} \left(\beta_k \delta_{\cdot k} - S_{\cdot k}\, e^{\beta_k}\right)\right\}.
\end{aligned} \tag{7}$$
Now, let $\theta = (\eta, \beta')'$. We have
$$\ell(\theta) = \log L(\theta) = n \log(\mathrm{expit}(\eta)) - \eta\, \delta_\cdot + \sum_{k=1}^{K} \left(\beta_k \delta_{\cdot k} - S_{\cdot k}\, e^{\beta_k}\right).$$
Taking the first partial derivatives of $\ell(\theta)$ and solving the resulting estimating equations readily yields the maximum likelihood estimates
$$\hat\eta = \mathrm{logit}\!\left(1 - \frac{\delta_\cdot}{n}\right) \quad \text{and} \quad \hat\beta_k = \log\!\left(\frac{\delta_{\cdot k}}{S_{\cdot k}}\right), \quad k = 1, \ldots, K.$$
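These closed-form estimates depend only on the sufficient statistics $\delta_\cdot$, $\delta_{\cdot k}$, and $S_{\cdot k}$, as the short R sketch below illustrates; the function and argument names are ours, and the cut-points are supplied on the exponentiated (V) scale.

```r
# Closed-form MLEs for the piecewise-constant hazard model of Section 4.
# v: measurements on the V = exp(Y) scale; cuts = (b_0, b_1, ..., b_{K-1}), with b_K = Inf implied.
fit_pch <- function(v, cuts) {
  b <- c(cuts, Inf)
  K <- length(cuts)
  delta <- as.numeric(v >= cuts[1])                                          # delta_i = I(v_i >= v0)
  S_k   <- sapply(1:K, function(k) sum(pmax(0, pmin(v, b[k + 1]) - b[k])))   # S_.k
  d_k   <- sapply(1:K, function(k) sum(v >= b[k] & v < b[k + 1]))            # delta_.k
  list(eta_hat  = qlogis(1 - mean(delta)),     # logit(1 - delta_. / n)
       beta_hat = log(d_k / S_k))              # log(delta_.k / S_.k)
}
```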

4.2. Profile Likelihood Ratio Test

To test process reliability using this model, we use a profile likelihood ratio test [16], which is conducted as follows (an R sketch implementing these steps is given after the list):
  1. Write $L(\gamma, \beta)$ where $\gamma = \gamma(\theta) = \bar F(\tau_V; \theta)$: letting
     $$\bar F_1(\tau; \beta) = \exp\!\left\{-\sum_{k=1}^{K} W_k(v_0, \tau)\, e^{\beta_k}\right\},$$
     we have $\gamma = (1 - \mathrm{expit}(\eta))\, \bar F_1(\tau_V; \beta)$, so
     $$\mathrm{expit}(\eta) = \frac{\bar F_1(\tau_V; \beta) - \gamma}{\bar F_1(\tau_V; \beta)} \quad \text{and} \quad \frac{1 - \mathrm{expit}(\eta)}{\mathrm{expit}(\eta)} = \frac{\gamma}{\bar F_1(\tau_V; \beta) - \gamma}.$$
     Hence, continuing from the likelihood function (7) above, we have
     $$L(\gamma, \beta) = \frac{\gamma^{\delta_\cdot}\, [\bar F_1(\tau_V; \beta) - \gamma]^{n - \delta_\cdot}}{\bar F_1(\tau_V; \beta)^{n}} \exp\!\left\{\sum_{k=1}^{K} \left(\beta_k \delta_{\cdot k} - S_{\cdot k}\, e^{\beta_k}\right)\right\}.$$
  2. Compute the profile likelihood estimate $\hat\beta(\gamma)$: we can maximize the log-likelihood
     $$\begin{aligned}
     \ell(\gamma, \beta) &= \delta_\cdot \log\gamma + (n - \delta_\cdot)\log\!\left(\bar F_1(\tau_V; \beta) - \gamma\right) - n \log \bar F_1(\tau_V; \beta) + \sum_{k=1}^{K}\left(\beta_k \delta_{\cdot k} - S_{\cdot k}\, e^{\beta_k}\right) \\
     &= \delta_\cdot \log\gamma + (n - \delta_\cdot)\log\!\left(\exp\!\left\{-\sum_{k=1}^{K} W_k(v_0, \tau_V)\, e^{\beta_k}\right\} - \gamma\right) + n \sum_{k=1}^{K} W_k(v_0, \tau_V)\, e^{\beta_k} + \sum_{k=1}^{K}\left(\beta_k \delta_{\cdot k} - S_{\cdot k}\, e^{\beta_k}\right)
     \end{aligned}$$
     numerically over $\beta$ for a given value of $\gamma$ to get $\hat\beta(\gamma)$.
  3. Under $H_0: \gamma = \gamma_0$, we have
     $$-2\left[\ell(\gamma_0, \hat\beta(\gamma_0)) - \ell(\tilde\gamma, \tilde\beta)\right] \sim \chi^2_{(1)},$$
     where $\tilde\eta$ and $\tilde\beta$ are the (non-profile) MLEs derived in Section 4.1, and $\tilde\gamma = \tilde\gamma(\tilde\eta, \tilde\beta)$. Thus, a $100(1-\alpha)\%$ profile likelihood interval for $\gamma$ is given by
     $$\left\{\gamma : \ell(\gamma, \hat\beta(\gamma)) \ge \ell(\tilde\gamma, \tilde\beta) - \chi^2_{(1, 1-\alpha)}/2\right\}$$
     where $\chi^2_{(1, 1-\alpha)}$ is the $1-\alpha$ quantile of $\chi^2_{(1)}$.
  4. Reject $H_0$ if $\gamma_0 = 1 - p_0$ falls outside the profile likelihood interval computed in step (3). This constitutes a hypothesis test at significance level $\alpha$.
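The sketch below implements these four steps in R; the function name, the use of optim with a BFGS search for step (2), and the directional check appended to step (4) are our illustrative choices rather than part of the original implementation.

```r
# Profile likelihood ratio test of H0: gamma = 1 - p0 for the model of Section 4.
# v: measurements on the V = exp(Y) scale; cuts = (b_0, ..., b_{K-1}); tauV = exp(tau).
plrt_pch <- function(v, cuts, tauV, p0 = 0.95, alpha = 0.05) {
  b <- c(cuts, Inf); K <- length(cuts); n <- length(v)
  W   <- function(x, k) pmax(0, pmin(x, b[k + 1]) - b[k])          # W_k(v0, x)
  d.  <- sum(v >= cuts[1])                                         # delta_.
  d.k <- sapply(1:K, function(k) sum(v >= b[k] & v < b[k + 1]))    # delta_.k
  S.k <- sapply(1:K, function(k) sum(W(v, k)))                     # S_.k
  Wt  <- sapply(1:K, function(k) W(tauV, k))                       # W_k(v0, tauV)
  ll <- function(gamma, beta) {                                    # l(gamma, beta) from step (2)
    Fbar1 <- exp(-sum(Wt * exp(beta)))
    if (Fbar1 <= gamma) return(-1e10)                              # gamma must lie below Fbar1
    d. * log(gamma) + (n - d.) * log(Fbar1 - gamma) - n * log(Fbar1) +
      sum(beta * d.k - S.k * exp(beta))
  }
  beta_t  <- log(d.k / S.k)                                        # unrestricted MLEs (Section 4.1)
  gamma_t <- (d. / n) * exp(-sum(Wt * exp(beta_t)))                # gamma-tilde
  gamma0  <- 1 - p0
  prof    <- optim(beta_t, function(be) -ll(gamma0, be), method = "BFGS")   # step (2) at gamma0
  lrt     <- 2 * (ll(gamma_t, beta_t) + prof$value)                # step (3) statistic
  c(reliability_hat = 1 - gamma_t, LRT = lrt,
    reject_H0 = (lrt > qchisq(1 - alpha, 1)) && (gamma_t < gamma0))  # step (4), one-sided direction
}
```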

5. Empirical Studies

5.1. Bias, Power, Robustness, and Efficiency

To assess the bias and efficiency of estimation and the power of reliability tests for these methods, and to assess their robustness to model misspecification, we consider several simulation studies. In each of the simulations detailed below, two thousand simulation iterations are run and we conduct one-sided tests for p 0 = 0.95 reliability at both the 0.025 and 0.05 significance levels.
For each of the methods, we run a simulation for each combination of (a) sample size n; (b) true reliability $p_1$; (c) true data-generating distribution; (d) fitted model; and (e) LLD value d. Possible sample sizes n include 20, 40, 60, and 80, while possible values for $p_1$ include 0.95 (representing the null hypothesis), 0.97, and 0.99. All true data-generating distributions are set to have scale parameter $\theta_2 = 1$; under this constraint, the possible true distributions include the standard normal, as well as logistic and extreme value distributions, with location parameter $\theta_1$ chosen to align the $p_1$ quantile of the true distribution with that of the standard normal. See Equations (2) and (3) for the specification of the logistic and extreme value distributions. Possible fitted models vary depending on the modelling method. In the case of the fully parametric models of Section 2, we fit normal models, so the fitted model is correctly specified when the true data-generating distribution is standard normal but misspecified otherwise. In the case of the piecewise-constant hazard-based models, we let $Q_p$ denote the pth percentile of the true data-generating distribution and consider models with interval cut-points at $b_0 = Q_{90}$; at $b_0 = Q_{85}$, $b_1 = Q_{90}$, and $b_2 = Q_{95}$; and at $b_0 = Q_{85}$, $b_1 = Q_{89}$, and $b_2 = Q_{93}$; as well as models with cut-points at the corresponding empirical percentiles. We focus here on cut-points $Q_p$ with $p \ge 85$ to ensure robustness in approximating the upper tail of the measurement distribution near the critical reliability threshold $\tau$. Varying the LLD only affects the fully parametric method since the other models deliberately avoid distinguishing between observations above and below the LLD for sufficiently small values in $y_1, \ldots, y_n$. For this method, one possible value of the LLD is $d = Q_{10}^{N}$, where $Q_p^{N}$ denotes the pth percentile of the standard normal distribution. Under this LLD, the asymptotic percentage of left-censored observations is 10% when the data-generating distribution is normal but greater than 10% under the logistic and extreme value distributions. Thus, to disentangle the effects of these differing left-censoring rates from the effects of model misspecification itself, we also consider an LLD of $d = Q_{10}$, based on the true data-generating distribution.
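To make these settings concrete, the location shift that aligns the $p_1$ quantile of each distribution with that of the standard normal, and the generation of a left-censored sample, can be coded directly; the following R sketch uses illustrative names and the smallest-extreme-value parameterization of Equation (3).

```r
# Generate a left-censored sample with theta2 = 1 whose p1 quantile matches the standard normal.
# dist: "normal", "logistic", or "extreme" (smallest extreme value, Equation (3)); d: the LLD.
gen_censored <- function(n, p1, dist = "normal", d = qnorm(0.10)) {
  q_std <- switch(dist,
    normal   = qnorm(p1),
    logistic = qlogis(p1),
    extreme  = log(-log(1 - p1)))              # p1 quantile of the standard distribution
  theta1 <- qnorm(p1) - q_std                  # aligns the p1 quantile with the standard normal
  y <- theta1 + switch(dist,
    normal   = rnorm(n),
    logistic = rlogis(n),
    extreme  = log(-log(1 - runif(n))))        # standard variates via the inverse c.d.f.
  data.frame(x = pmax(y, d), delta = as.numeric(y > d))   # left-censored at the LLD d
}
```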
In order to more extensively explore the spectrum of models from fully parametric to fully non-parametric, we compare the results of the normal and piecewise-constant hazard-based models to those obtained using a non-parametric method, where p-values and smoothed “mid-p” values are computed according to a one-sided exact binomial test [25].
A selection of simulation results is shown in Table 1 and Table 2 (full results can be found in the Supplementary Materials online). Table 1 shows the estimates of F ( τ ) obtained using each of the models described above, while Table 2 shows the corresponding rejection rates for testing H 0 : F ( τ ) = p 0 vs. H A : F ( τ ) > p 0 where p 0 = 0.95 . In each table, the first section of twelve lines shows the results of simulating data under the null hypothesis that reliability is at the nominal level of p 0 = 0.95 , while in the second and third sections, the reliability exceeds p 0 . Note that each column heading for the piecewise-constant hazard-based models indicates the p 1 , , p K empirical percentiles Q ^ p 1 , , p K defining the interval cut-points of the fitted model; the model with cut-points at Q ^ 85 , Q ^ 89 , and Q ^ 93 is omitted when n = 20 since no observations would fall in the interval [ Q ^ 85 , Q ^ 89 ) , and such a model would, therefore, not be used in this case.
The leftmost column of Table 1 shows that while fitting a normal model leads to accurate estimates of F ( τ ) when the data are truly normal, bias becomes appreciable under the logistic and, particularly, the extreme value error distribution; this matches the asymptotic results shown in Figure 2. Similarly, Table 2 shows that type I error is well controlled and power reasonably high for larger sample sizes under correct model specification, but (as expected based on Figure 3 and Figure 4) these rejection rates are inflated and severely depressed, respectively, for the logistic and extreme value error distributions. The last column of Table 1 and the last two columns of Table 2 show that the exact tests address the bias issue and perform roughly the same across all settings since they do not rely on distributional assumptions; both p-values and mid-p-values here result in very low rejection rates, however, even when the true reliability of F ( τ ) = 0.99 far exceeds the nominal reliability of 0.95 that we aim to demonstrate. Prohibitively large sample sizes would be required to achieve acceptable levels of power using exact tests.
The piecewise-constant hazard-based models generally outperform the normal model in terms of both bias in the reliability estimate (Table 1) and power (Table 2) when the data are extreme value, while at the same time, they perform comparably to the normal model when data are generated under a normal or logistic distribution. Bias is least for larger values of F ( τ ) where down-weighted values in the left tail become less relevant for estimation (Table 1). Type I error is relatively well controlled for all simulation settings, particularly at larger sample sizes, which is a notable improvement in the cases of logistic and extreme value data generation (Table 2). Overall, the piecewise-constant hazard-based method provides a balance between the power obtained by correctly specified parametric models and the robustness of non-parametric methods. While the small sample sizes we have focused on here are typical in reliability studies, the performance of these piecewise-constant models is highly scalable as larger samples would allow one to increase the number of cut-points (thereby increasing flexibility and robustness) without sacrificing power.

5.2. Model Selection

Although Table 1 and Table 2 show that no one piecewise-constant hazard-based model uniformly outperforms the others, model cut-points may be chosen by optimizing the worst possible outcome across all simulation settings. Figure 5 shows the minimum sample size needed to achieve the worst-case empirical power of 90% across normal, logistic, and extreme value data-generating distributions, using a piecewise-constant hazard-based model with two cut-points b 0 and b 1 , as well as the largest possible empirical type I error rate across those distributions when the sample size is fixed at this optimal value. As before, each of these distributions has θ 2 = 1 and a 95th percentile equal to that of a standard normal.
Here, we see that when testing for p 0 = 0.95 and the true reliability is p 1 = 0.99 , the two-cut-point model that best balances power and the type I error rate appears to have its cut-points at the 90th and 98th percentiles of the observed data (Figure 5). It is important to note that even in this optimal case, we require a sample size of 150 to achieve the desired power of 90%, and the corresponding type I error rate is slightly inflated at 0.039. If controlling the type I error rate is of greater importance, one may prefer a model with cut-points at the 92nd and 98th percentiles, where we see a required sample size of 180 and a worst-case type I error rate of 0.022.
Figure 5 illustrates the trade-off between type I error rates and the sample sizes required to achieve the specified power level using two-cut-point models (for particular values of p 0 , p 1 , threshold τ , significance level α , and worst-case empirical power), and similar analyses could be conducted for different choices of tolerance ϵ and number of intervals K (as well as other specifications of the reliability test). Piecewise-constant hazard-based models do not typically require many cut-points to accurately capture the shape of the target distribution [26], and in the context of reliability studies, we recommend using two to four cut-points ( 1 K 3 ). Moreover, the smaller the tolerance ϵ , the closer the first cut-point b 0 will be to the exponentiated threshold τ V and the closer the piecewise-constant hazard-based reliability test will be to the exact binomial test from Section 5.1; smaller ϵ values, therefore, result in greater robustness but worse efficiency, as seen by the larger sample sizes and better-controlled type I error rates at the lower ends of Figure 5a and Figure 5b, respectively. The effects of changing the number and spacing of the other cut-points are more difficult to characterize, and fitting multiple models is recommended in practice to assess the sensitivity of results to a given choice.

6. Application

In this section, we apply the piecewise-constant hazard-based models introduced in Section 4 to a study of drink can weights, used as an example in the documentation for the PROC CAPABILITY statement in SAS® version 9.4 [27], and we compare the results to those obtained using the fully parametric models of Section 2. The corresponding dataset consists of 100 can weights that we suppose were measured using an industrial scale with an LLD of 0.9 oz; eight of these measurements fall below the LLD, while 98 fall below the threshold exp ( τ ) = 1.12 . Here, interest may lie in demonstrating that at least 95% of cans weigh at most 1.12 oz (say, in order to meet freight shipment weight limits). We therefore aim to test H 0 : F ( τ ) = p 0 vs. H A : F ( τ ) > p 0 where τ = log ( 1.12 ) , p 0 = 0.95 , and F is the c.d.f. of the distribution of log-can weights. We perform analysis on the log scale, as indicated in Section 2.
Figure 6 shows the empirical c.d.f. of the data for observations above the LLD. Non-parametric methods yield F ^ ( τ ) = 98 / 100 = 0.98 , with exact binomial tests resulting in
$$\text{p-value} = \sum_{u=98}^{100} \binom{100}{u}\, 0.95^{u}\, (1 - 0.95)^{100-u} = 0.118$$
and
$$\text{mid-p-value} = \text{p-value} - \frac{1}{2}\binom{100}{98}\, 0.95^{98}\, (1 - 0.95)^{100-98} = 0.078.$$
While the estimate F ^ ( τ ) is above p 0 = 0.95 , neither the p-value nor the mid-p-value provides evidence of process reliability at the 0.025 significance level.
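Both of these quantities are easy to reproduce with base R (a short check, not part of the original analysis):

```r
# Exact one-sided binomial test and mid-p value for 98 of 100 cans below the threshold.
n <- 100; s <- 98; p0 <- 0.95
p_value <- sum(dbinom(s:n, n, p0))             # P(X >= 98 | p = 0.95) = 0.118
mid_p   <- p_value - 0.5 * dbinom(s, n, p0)    # = 0.078
```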
The upper tails of various fitted fully parametric (Section 2) and piecewise-constant hazard-based (Section 4) models are shown in Figure 7, with corresponding reliability estimates and p-values shown in Table 3. Here, we fit logistic and extreme value models in addition to the normal models examined in the derivations of Appendix A, Appendix B and Appendix C and the simulation study of Section 5.1. Each piecewise-constant hazard-based model is specified by either the vector of empirical pth percentiles $\hat Q_p$ used as its cut-points (Table 3) or, for ease of plotting, the corresponding vector of probabilities p (Figure 7); these models are chosen as follows:
  • ( Q ^ 90 ) is one of the models considered in the simulation study of Section 5.1;
  • ( Q ^ 85 , Q ^ 93 , Q ^ 95 ) is an adaptation of another model ( Q ^ 85 , Q ^ 90 , Q ^ 95 ) from the simulation study, with the middle cut-point adjusted upward to ensure that each interval of the resulting model strictly contains at least one observation;
  • ( Q ^ 92 , Q ^ 98 ) is suggested by the simulation from Section 5.2;
  • ( Q ^ 91 , Q ^ 99 ) and ( Q ^ 94 , Q ^ 97 ) are the models that are, respectively, the closest to and farthest from demonstrating reliability from a grid search of two-cut-point models whose first cut-point is in { Q ^ 91 , Q ^ 92 , Q ^ 93 , Q ^ 94 } and whose second cut-point is in { Q ^ 95 , Q ^ 96 , , Q ^ 99 } .
Note that the third model ( Q ^ 85 , Q ^ 89 , Q ^ 93 ) from the simulation study is omitted because Q ^ 85 = Q ^ 89 for this dataset, meaning that it is effectively a two-cut-point model.
Table 3 shows that, as with the non-parametric method, each of these eight model-based reliability estimates falls above the desired level of p 0 = 0.95 . Unlike the exact binomial test, however, some of the corresponding confidence and likelihood intervals have lower limits above 0.95 and, therefore, suggest that the drink can manufacturing process is reliable. We also see that likelihood intervals based on the piecewise-constant hazard-based models are much less variable than the confidence intervals based on their fully parametric counterparts.
Practitioners often use so-called probability plots to visualize the outcomes of fully parametric model fitting, and these are shown in Figure 8. Here, we plot the log-can weights y on the horizontal axis and ψ ^ 1 + ψ ^ 2 y on the vertical axis but label these axes with can weights exp ( y ) and reliability estimates F a , 0 ( ψ ^ 1 + ψ ^ 2 y ) , respectively, where F a , 0 is the c.d.f. of the assumed standard error distribution; see Section 2 for the definition of ψ .
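A probability plot of this kind can be produced directly from a fitted survreg object; the following R sketch (for the normal fit, with hypothetical object and argument names) labels the axes as described above.

```r
# Normal probability plot for a fitted location-scale model (exactly observed values only).
# fit: a survreg() fit as in Section 2 with dist = "gaussian"; y: log-can weights; delta: 1 = observed.
prob_plot <- function(fit, y, delta) {
  z <- (y[delta == 1] - unname(coef(fit))) / fit$scale          # psi1-hat + psi2-hat * y
  plot(exp(y[delta == 1]), z, log = "x",
       xlab = "Can weight (oz)", ylab = "Estimated reliability", yaxt = "n")
  probs <- c(0.5, 0.9, 0.95, 0.99)
  axis(2, at = qnorm(probs), labels = probs)                    # label the vertical axis with F_{a,0}(z)
}
```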

7. Conclusions

The impact of an LLD on inference in the centre of a distribution has been well studied in the environmental science literature. In this paper, we show that when interest lies in the right tail of the distribution and parametric models are misspecified, an LLD can attenuate bias and even slightly increase the power of tests for reliability, but these modest benefits are not guaranteed and, critically, come at the cost of invalid inference due to poor control of type I error rates. Moreover, the power of goodness of fit tests can be limited even with larger sample sizes, and the presence of an LLD exacerbates this issue; selection of an appropriate fully parametric model, therefore, presents a substantial challenge. In order to circumvent this challenge, we introduce a weakly parametric method of testing process reliability based on piecewise-constant hazard models that de-emphasizes observations in the lower tail of the measurement distribution and successfully balances efficiency with robustness to model misspecification. We use simulations to identify two two-cut-point piecewise-constant hazard-based models that may provide good asymptotic results for particular values of the true and required reliability levels, and apply the methods discussed in this paper to a study of drinking can weights.
While the methods proposed here provide a basic framework for right-tail inference under an LLD, one limitation is that (as seen by Figure 5) the results from piecewise-constant hazard-based models are somewhat sensitive to one’s choice of cut-points. To address this issue, one could consider using splines (say, using survPen in R version 4.3.2 [28]), which may offer more robustness at a modest price in terms of efficiency and power. Practical considerations can also complicate analysis. One such issue is that LLDs may vary between subsets of observations; for example, systematic variation might occur if technological advancements decrease an LLD over time by improving measurement device precision [5,9], while random variation might occur if environmental factors such as wind speed or ambient temperature affect device performance [29]. In other settings involving multiple laboratories, different devices may be used to measure concentrations of analytes, so device-to-device variation in detection limits can also raise challenges. Models for the LLD may be helpful in these situations. Other practical issues that merit attention occur when multiple different LLDs are present (e.g., due to use of multiple measurement devices) and when measurements between the LLD and a so-called “limit of quantification” are said by the device manufacturer to be observed with some uncertainty. More work is required to develop methods suited to these more complicated settings.
Finally, one reviewer raised the notion of Bayesian reliability testing. As noted in Section 1, when one uses fully parametric models of the form presented in Section 2, Bayesian methods have been found to estimate the 95th percentile comparably (at best) to substitution methods [17]. A Bayesian approach could also be compatible, however, with the piecewise-constant hazard-based models that we have proposed here. In particular, one would need to specify a prior for each parameter in the model (e.g., a gamma prior for the exponential hazard rate in each piece [30] and a beta prior for the probability p), calculate the posterior distribution for the true reliability p 1 , and stipulate a decision rule for using this posterior to conduct the test of reliability given by (1). Such a test could be done by constructing a 100 ( 1 α ) % credible interval for p 1 and checking whether it covers p 0 [31]. This is an interesting possible direction for future research.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13081274/s1.

Author Contributions

Conceptualization, L.S.B. and R.J.C.; methodology, L.S.B. and R.J.C.; software, L.S.B.; validation, L.S.B.; formal analysis, L.S.B.; resources, R.J.C.; writing—original draft preparation, L.S.B.; writing—review and editing, L.S.B. and R.J.C.; visualization, L.S.B.; supervision, R.J.C.; funding acquisition, R.J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada through Discovery Grants to Richard J. Cook (RGPIN-2017-04207).

Data Availability Statement

The data that support the findings of this study are openly available at https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/qcug/qcug_code_capspec2.htm (accessed on 26 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Parameter Estimands in Misspecified Location-Scale Models with Left-Censored Data

Given likelihood (4), the observed log-likelihood is
$$\ell(\psi_a; y, \delta) = \sum_{i=1}^{n} \delta_i\left[\log \psi_{2a} - \tfrac{1}{2}(\psi_{1a} + \psi_{2a} y_i)^{2}\right] + (1 - \delta_i)\log F_0(z_d), \tag{A1}$$
where $z_d = \psi_{1a} + \psi_{2a} d$. Now, consider the resulting random vector $S_i(\psi_a; Y_i, \Delta_i)$ of score functions for a single observation, obtained by taking partial derivatives of a contribution to (A1) with respect to $\psi_a$; we drop the subscript i for notational convenience to denote this by $S(\psi_a; Y, \Delta)$. Note $\partial z_d/\partial \psi_{1a} = 1$ and $\partial z_d/\partial \psi_{2a} = d$, so
$$S_1(\psi_a; Y, \Delta) = \frac{\partial \ell}{\partial \psi_{1a}} = -\Delta(\psi_{1a} + \psi_{2a} Y) + (1 - \Delta)\frac{f_0(z_d)}{F_0(z_d)}$$
and
$$S_2(\psi_a; Y, \Delta) = \frac{\partial \ell}{\partial \psi_{2a}} = \Delta\left[\frac{1}{\psi_{2a}} - Y(\psi_{1a} + \psi_{2a} Y)\right] + (1 - \Delta)\, d\, \frac{f_0(z_d)}{F_0(z_d)}$$
where $f_0$ denotes the p.d.f. of $N(0,1)$. Hence,
$$E\{S_1(\psi_a; Y, \Delta)\} = -(1 - q)\left[\psi_{1a} + \psi_{2a} E_T\{Y \mid Y > d\}\right] + q\, \frac{f_0(z_d)}{F_0(z_d)}$$
and
$$E\{S_2(\psi_a; Y, \Delta)\} = (1 - q)\left[\frac{1}{\psi_{2a}} - \psi_{1a} E_T\{Y \mid Y > d\} - \psi_{2a} E_T\{Y^2 \mid Y > d\}\right] + q\, d\, \frac{f_0(z_d)}{F_0(z_d)}$$
where E T { · } denotes the expectation under the true data-generating model.
Solving the system of equations given by E S 1 ( ψ a ; Y , Δ ) = 0 and E S 2 ( ψ a ; Y , Δ ) = 0 determines the values ψ a * of the estimands in this analysis. Algebraic manipulations of this system show that
$$\psi_{1a} = \frac{\dfrac{1}{\psi_{2a}} + \psi_{2a}\left[d\, E_T\{Y \mid Y > d\} - E_T\{Y^2 \mid Y > d\}\right]}{E_T\{Y \mid Y > d\} - d},$$
which can be combined with the two estimating equations to solve for ψ a * ; the conditional expectations can be evaluated numerically.
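For instance, when the true distribution is logistic (case (i)), the limiting values can be obtained numerically with a few lines of R; the function name, the use of integrate for the conditional moments, and the least-squares root search via optim are our illustrative choices.

```r
# Limiting values psi_a* of the normal-model estimators when the data are logistic
# with location theta1 and scale 1, and the LLD d induces left-censoring.
psi_star_logistic <- function(d, theta1 = qnorm(0.95) - qlogis(0.95)) {
  q  <- plogis(d, theta1)                                   # left-censoring rate
  Ek <- function(k) integrate(function(y) y^k * dlogis(y, theta1),
                              lower = d, upper = Inf)$value / (1 - q)
  E1 <- Ek(1); E2 <- Ek(2)
  scores <- function(psi) {                                 # E{S1} and E{S2} from Appendix A
    zd <- psi[1] + psi[2] * d
    h  <- dnorm(zd) / pnorm(zd)
    c(-(1 - q) * (psi[1] + psi[2] * E1) + q * h,
      (1 - q) * (1 / psi[2] - psi[1] * E1 - psi[2] * E2) + q * d * h)
  }
  optim(c(-theta1, 1), function(psi) sum(scores(psi)^2))$par  # minimize sum of squared scores
}
```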

Appendix B. Robust Asymptotic Variance of Parameter Estimators in Misspecified Location-Scale Models with Left-Censored Data

Let ψ ^ a be the estimate of ψ under the assumed location-scale model with limiting value ψ a * . In this appendix, we derive the robust (“true”) variance
$$\mathrm{Var}_T(\hat\psi_a) = \frac{1}{n}\, A^{-1}(\psi_a^*)\, B(\psi_a^*)\, A^{-1}(\psi_a^*)$$
of $\hat\psi_a$ [21] under the assumption of a normal model, where
$$A(\psi_a^*) = E_T\!\left\{\frac{\partial}{\partial \psi_a^\top} S(\psi_a; Y, \Delta)\right\}\bigg|_{\psi_a = \psi_a^*}$$
and
$$B(\psi_a^*) = E_T\!\left\{S(\psi_a; Y, \Delta)\, S^\top(\psi_a; Y, \Delta)\right\}\bigg|_{\psi_a = \psi_a^*}.$$
Firstly, it is straightforward to show that $f_0'(u) = -u f_0(u)$ and $f_0''(u) = f_0(u)(u^2 - 1)$, where $f_0$ is the density of a standard normal distribution. Thus, for $A(\psi_a)$, we have
$$\frac{\partial}{\partial \psi_{1a}} S_1(\psi_a; Y, \Delta) = -\Delta + (1 - \Delta)\left[\frac{f_0'(z_d)}{F_0(z_d)} - \frac{f_0^2(z_d)}{F_0^2(z_d)}\right], \qquad \frac{\partial}{\partial \psi_{2a}} S_2(\psi_a; Y, \Delta) = -\Delta\left[\frac{1}{\psi_{2a}^2} + Y^2\right] + (1 - \Delta)\, d^2\left[\frac{f_0'(z_d)}{F_0(z_d)} - \frac{f_0^2(z_d)}{F_0^2(z_d)}\right],$$
and
$$\frac{\partial}{\partial \psi_{2a}} S_1(\psi_a; Y, \Delta) = -\Delta Y + (1 - \Delta)\, d\left[\frac{f_0'(z_d)}{F_0(z_d)} - \frac{f_0^2(z_d)}{F_0^2(z_d)}\right],$$
so
$$[A(\psi_a)]_{11} = -(1 - q) + q\left[\frac{f_0'(z_d)}{F_0(z_d)} - \frac{f_0^2(z_d)}{F_0^2(z_d)}\right], \qquad [A(\psi_a)]_{22} = -(1 - q)\left[\frac{1}{\psi_{2a}^2} + E_T\{Y^2 \mid Y > d\}\right] + q\, d^2\left[\frac{f_0'(z_d)}{F_0(z_d)} - \frac{f_0^2(z_d)}{F_0^2(z_d)}\right],$$
and
$$[A(\psi_a)]_{12} = [A(\psi_a)]_{21} = -(1 - q)\, E_T\{Y \mid Y > d\} + q\, d\left[\frac{f_0'(z_d)}{F_0(z_d)} - \frac{f_0^2(z_d)}{F_0^2(z_d)}\right].$$
Now, for $B(\psi_a)$, note that $\Delta = \Delta^2$, $(1 - \Delta) = (1 - \Delta)^2$, and $\Delta(1 - \Delta) = 0$, so
$$S_1^2(\psi_a; Y, \Delta) = \Delta(\psi_{1a} + \psi_{2a} Y)^2 + (1 - \Delta)\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2, \qquad S_2^2(\psi_a; Y, \Delta) = \Delta\left[\frac{1}{\psi_{2a}} - Y(\psi_{1a} + \psi_{2a} Y)\right]^2 + (1 - \Delta)\, d^2\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2,$$
and
$$\begin{aligned}
S_1(\psi_a; Y, \Delta)\, S_2(\psi_a; Y, \Delta) &= -\Delta(\psi_{1a} + \psi_{2a} Y)\left[\frac{1}{\psi_{2a}} - Y(\psi_{1a} + \psi_{2a} Y)\right] + (1 - \Delta)\, d\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2 \\
&= -\Delta\left[\frac{\psi_{1a}}{\psi_{2a}} + (1 - \psi_{1a}^2)Y - 2\psi_{1a}\psi_{2a} Y^2 - \psi_{2a}^2 Y^3\right] + (1 - \Delta)\, d\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2.
\end{aligned}$$
Taking the expectations of these expressions and simplifying algebraically results in
$$[B(\psi_a)]_{11} = (1 - q)\left[\psi_{1a}^2 + 2\psi_{1a}\psi_{2a} E_1 + \psi_{2a}^2 E_2\right] + q\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2, \qquad [B(\psi_a)]_{22} = (1 - q)\left[\frac{1}{\psi_{2a}^2} - 2\frac{\psi_{1a}}{\psi_{2a}} E_1 + (\psi_{1a}^2 - 2) E_2 + 2\psi_{1a}\psi_{2a} E_3 + \psi_{2a}^2 E_4\right] + q\, d^2\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2,$$
and
$$[B(\psi_a)]_{12} = [B(\psi_a)]_{21} = q\, d\left[\frac{f_0(z_d)}{F_0(z_d)}\right]^2 - (1 - q)\left[\frac{\psi_{1a}}{\psi_{2a}} + (1 - \psi_{1a}^2) E_1 - 2\psi_{1a}\psi_{2a} E_2 - \psi_{2a}^2 E_3\right],$$
where we now let $E_k$ denote the conditional expectation $E_T\{Y^k \mid Y > d\}$ for notational convenience. Plugging $\psi_a = \psi_a^*$ from Appendix A into the equations above yields $A(\psi_a^*)$ and $B(\psi_a^*)$.

Appendix C. Assumed Asymptotic Variance of Parameter Estimators in Misspecified Location-Scale Models with Left-Censored Data

As in Appendix B, let ψ ^ a be the estimate of ψ under the assumed location-scale model with limiting value ψ a * . In this appendix, we derive the variance Var a ( ψ ^ a ) of ψ ^ a under the assumed model. Using the delta method [20], we can show
$$\mathrm{Var}_a(\hat\psi_a) \approx G(\phi_a^*)\, \mathrm{Var}_a(\hat\phi_a)\, G^\top(\phi_a^*)$$
where $\phi_a^* = (\phi_{1a}^*, \phi_{2a}^*)' = (\theta_{1a}^*, \log\theta_{2a}^*)'$, $\hat\phi_a$ is the estimate of $\phi = (\theta_1, \log\theta_2)'$ under the assumed model,
$$G(\phi) = \begin{pmatrix} -e^{-\phi_2} & \phi_1 e^{-\phi_2} \\ 0 & -e^{-\phi_2} \end{pmatrix},$$
and $\mathrm{Var}_a^{-1}(\hat\phi_a)$ is the expected information with respect to $\phi_a$ evaluated at $\phi_a = \phi_a^*$. Based on (A1), the log-likelihood parameterized by $\phi_a$ can be written as
$$\ell(\phi_a; y, \delta) = \sum_{i=1}^{n} \delta_i\left[-\phi_{2a} - \frac{1}{2}\left(\frac{y_i - \phi_{1a}}{\exp(\phi_{2a})}\right)^{2}\right] + (1 - \delta_i)\log F_0(z_d)$$
where $z_d = (d - \phi_{1a})/\exp(\phi_{2a})$. It is straightforward to show $\partial z_d/\partial \phi_{1a} = -1/\exp(\phi_{2a})$ and $\partial z_d/\partial \phi_{2a} = -z_d$. As before, we omit the subscript i and denote the random vector $S_i(\phi_a; Y_i, \Delta_i)$ of score functions for a single observation by $S(\phi_a; Y, \Delta)$. Then,
$$S_1(\phi_a; Y, \Delta) = \frac{\partial \ell}{\partial \phi_{1a}} = \Delta\, \frac{Y - \phi_{1a}}{[\exp(\phi_{2a})]^2} - \frac{1 - \Delta}{\exp(\phi_{2a})}\, \frac{f_0(z_d)}{F_0(z_d)}$$
and
$$S_2(\phi_a; Y, \Delta) = \frac{\partial \ell}{\partial \phi_{2a}} = \Delta\left[\left(\frac{Y - \phi_{1a}}{\exp(\phi_{2a})}\right)^{2} - 1\right] - (1 - \Delta)\, z_d\, \frac{f_0(z_d)}{F_0(z_d)},$$
so
$$\frac{\partial}{\partial \phi_{1a}} S_1(\phi_a; Y, \Delta) = -\frac{1}{[\exp(\phi_{2a})]^2}\left[\Delta + (1 - \Delta)\, \frac{f_0(z_d)\left[z_d F_0(z_d) + f_0(z_d)\right]}{F_0^2(z_d)}\right], \qquad \frac{\partial}{\partial \phi_{2a}} S_2(\phi_a; Y, \Delta) = -2\Delta\left(\frac{Y - \phi_{1a}}{\exp(\phi_{2a})}\right)^{2} - (1 - \Delta)\, z_d\, \frac{f_0(z_d)\left[(z_d^2 - 1) F_0(z_d) + z_d f_0(z_d)\right]}{F_0^2(z_d)},$$
and
$$\frac{\partial}{\partial \phi_{2a}} S_1(\phi_a; Y, \Delta) = -\frac{1}{\exp(\phi_{2a})}\left[2\Delta\, \frac{Y - \phi_{1a}}{\exp(\phi_{2a})} + (1 - \Delta)\, \frac{f_0(z_d)\left[(z_d^2 - 1) F_0(z_d) + z_d f_0(z_d)\right]}{F_0^2(z_d)}\right].$$
Thus
$$\left[\mathrm{Var}_a^{-1}(\hat\phi_a)\right]_{11} = \frac{1}{[\exp(\phi_{2a})]^2}\left[1 - q + q\, \frac{f_0(z_d)\left[z_d F_0(z_d) + f_0(z_d)\right]}{F_0^2(z_d)}\right], \qquad \left[\mathrm{Var}_a^{-1}(\hat\phi_a)\right]_{22} = \frac{2(1 - q)}{[\exp(\phi_{2a})]^2}\left[E_2 - 2\phi_{1a} E_1 + \phi_{1a}^2\right] + q\, z_d\, \frac{f_0(z_d)\left[(z_d^2 - 1) F_0(z_d) + z_d f_0(z_d)\right]}{F_0^2(z_d)},$$
and
$$\left[\mathrm{Var}_a^{-1}(\hat\phi_a)\right]_{12} = \left[\mathrm{Var}_a^{-1}(\hat\phi_a)\right]_{21} = \frac{1}{\exp(\phi_{2a})}\left[\frac{2(1 - q)\left(E_1 - \phi_{1a}\right)}{\exp(\phi_{2a})} + q\, \frac{f_0(z_d)\left[(z_d^2 - 1) F_0(z_d) + z_d f_0(z_d)\right]}{F_0^2(z_d)}\right].$$
As in Appendix B, E k denotes the conditional expectation E T { Y k | Y > d } .

Appendix D. Testing Goodness of Fit Under LLD

Here, we consider a small simulation study to investigate the finite sample performance of the Shapiro–Francia test for normality under left-censoring. We conduct this test using the gofTestCensored function in the EnvStats package of R version 4.3.2 [32]. Specifically, suppose $X = (X_1, \ldots, X_n)$ are the order statistics of a distribution with c.d.f. F and interest lies in testing the null hypothesis that F is the c.d.f. of an arbitrary normal distribution against the alternative that it is the c.d.f. of some other distribution. In the absence of censoring, this can be done using the Shapiro–Francia statistic [24], which is defined as
$$W = \frac{\left(\sum_{i=1}^{n} a_i X_i\right)^{2}}{\sum_{i=1}^{n} (X_i - \bar X)^{2}}$$
where $a_i$ is a standardized version of $m_i$, the expectation of the ith order statistic for a random sample of size n from the standard normal distribution; that is, if $m = (m_1, \ldots, m_n)$ with $a = (a_1, \ldots, a_n)$, then $a = m/\sqrt{m^\top m}$. Note that $a^\top a = 1$, and since the normal distribution is symmetric, the averages $\bar m$ and $\bar a$ satisfy $\bar m = 0$ and $\bar a = 0$. Using these facts, it is straightforward to show that $W = r^2(a, X)$ where
$$r(u, v) = \frac{\sum_{i=1}^{n} (u_i - \bar u)(v_i - \bar v)}{\sqrt{\sum_{i=1}^{n} (u_i - \bar u)^{2}\, \sum_{i=1}^{n} (v_i - \bar v)^{2}}}$$
denotes the sample Pearson correlation coefficient of two length-n vectors u and v. This W statistic can be approximated by replacing $m_i$ in a with the Blom score
$$\tilde m_i = \Phi^{-1}\!\left(\frac{i - 3/8}{n + 1/4}\right)$$
for $i = 1, \ldots, n$ [33], where $\Phi$ denotes the c.d.f. of the standard normal distribution, resulting in $\tilde W = r^2(b, X)$ where $b = \tilde m/\sqrt{\tilde m^\top \tilde m}$ [34]. To adapt $\tilde W$ to left-censored data, we suppose k observations in a realization x of X have been censored, and let $x' = (x_{k+1}, \ldots, x_n)$ be the subset of x that has been observed exactly. We further let $b' = (b_{k+1}, \ldots, b_n)$. Now, if F is the c.d.f. of a normal distribution, then the standardized expected order statistics a will be highly linearly correlated with x, and, thus, so will the approximation b of a. In this case, $r(b, x) \approx 1 \approx r(b', x')$, and we can simply use
$$\tilde w = r^2(b', x'); \tag{A2}$$
this seems to be the argument invoked by Royston [35], whom the authors of gofTestCensored cite to justify this step [32]. To compute p-values based on (A2), Royston [35] explores transformations $Z = g(\tilde W)$ that are approximately normal. The parameters $\mu_Z$ and $\sigma_Z$ of this normal distribution are estimated by regressing $z_\alpha = g(\tilde w_\alpha)$ on $\Phi^{-1}(\alpha)$ for $\alpha = 0.9, 0.95, 0.99$, where the quantiles $\tilde w_\alpha$ of $\tilde W$ have been approximated via simulation [36], and we compute
$$\text{p-value} = 1 - \Phi\!\left(\frac{z - \hat\mu_Z}{\hat\sigma_Z}\right)$$
where z is the observed value of Z [35].
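As a concrete illustration, the censored statistic $\tilde w$ itself can be computed in a few lines of R (the Royston transformation used for p-values is omitted, and the function name is ours); in practice, gofTestCensored carries out the full test.

```r
# Censored Shapiro-Francia statistic based on Blom scores (statistic only, no p-value).
# x: the full sample of size n, of which the k smallest values are left-censored.
sf_censored <- function(x, k) {
  n <- length(x)
  m_tilde <- qnorm(((1:n) - 3/8) / (n + 1/4))     # Blom scores approximating E(order statistics)
  b <- m_tilde / sqrt(sum(m_tilde^2))             # standardized scores
  x_obs <- sort(x)[(k + 1):n]                     # exactly observed order statistics
  cor(b[(k + 1):n], x_obs)^2                      # w-tilde = r^2(b', x')
}
```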
To generate data for this simulation study, we use three location-scale distributions with location parameter $\theta_1$ and scale parameter $\theta_2$: normal; logistic with the c.d.f. given by (2); and extreme value with the c.d.f. given by (3). For each of these models, we take $\theta_2 = 1$, and, given our goal of inference in the upper tail, we choose $\theta_1$ such that the 95th percentile equals that of a standard normal. We specify the LLD via a fixed left-censoring rate in each scenario, leading to an LLD of $-1.28$ for the normal data, $-3.50$ for the logistic data, and $-1.70$ for the extreme value data when the left-censoring rate is 10%; and an LLD of $-0.67$ for the normal data, $-2.40$ for the logistic data, and $-0.70$ for the extreme value data when the left-censoring rate is 25%. The results of these simulations are shown in Table A1.
Table A1. Empirical rejection rates at the 0.05 significance level for the Shapiro–Francia test of normality under various left-censoring rates, based on 2000 simulated samples of size n. All Monte Carlo standard errors for the rejection rate estimates are < 0.001.
                                    Data-Generation Model
Left-Censoring Rate    n      Normal     Logistic    Extreme Value
0%                     20     0.048      0.116       0.332
                       40     0.056      0.182       0.582
                       60     0.056      0.215       0.768
                       80     0.058      0.238       0.879
10%                    20     0.047      0.117       0.124
                       40     0.056      0.160       0.224
                       60     0.056      0.209       0.346
                       80     0.050      0.229       0.491
25%                    20     0.047      0.112       0.080
                       40     0.060      0.164       0.124
                       60     0.059      0.222       0.180
                       80     0.056      0.252       0.264
With data generated under the logistic distribution, we see that the Shapiro–Francia test can have very low power with moderately large sample sizes, even in the absence of left-censoring. On the other hand, with extreme value data generation, power is initially reasonably good for the larger sample sizes but can be severely compromised by even modest levels of left-censoring; this may be due to the fact that much of the substantial difference between the normal and extreme value distributions is found in the tails.

References

  1. United States Food and Drug Administration. Considerations for the Development of Dried Plasma Products Intended for Transfusion: Guidance for Industry; Docket Number FDA-2018-D-3759; United States Food and Drug Administration: Silver Spring, MD, USA, 2019. Available online: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/considerations-development-dried-plasma-products-intended-transfusion (accessed on 17 October 2024).
  2. United States Food and Drug Administration. Submission and Review of Sterility Information in Premarket Notification (510(k)) Submissions for Devices Labeled as Sterile: Guidance for Industry and Food and Drug Administration Staff; Docket Number FDA-2008-D-0611; United States Food and Drug Administration: Silver Spring, MD, USA, 2024. Available online: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/submission-and-review-sterility-information-premarket-notification-510k-submissions-devices-labeled (accessed on 17 October 2024).
  3. United States Food and Drug Administration. Approaches to Establish Thresholds for Major Food Allergens and for Gluten in Food; United States Food and Drug Administration: Silver Spring, MD, USA, 2006. Available online: https://www.fda.gov/food/food-labeling-nutrition/approaches-establish-thresholds-major-food-allergens-and-gluten-food (accessed on 17 October 2024).
  4. Government of Ontario. R.R.O. 1990, Reg. 347: General—Waste Management. In Environmental Protection Act; Government of Ontario: Toronto, ON, Canada, 2024. Available online: https://www.ontario.ca/laws/regulation/900347 (accessed on 17 October 2024).
  5. Newman, M.C.; Dixon, P.M.; Looney, B.B.; Pinder, J.E., III. Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations. JAWRA J. Am. Water Resour. Assoc. 1989, 25, 905–916. [Google Scholar] [CrossRef]
  6. Helsel, D.R. Much Ado About Next to Nothing: Incorporating Nondetects in Science. Ann. Occup. Hyg. 2010, 54, 257–262. [Google Scholar] [CrossRef] [PubMed]
  7. Hwang, M.; Lee, S.C.; Park, J.-H.; Choi, J.; Lee, H.-J. Statistical methods for handling nondetected results in food chemical monitoring data to improve food risk assessments. Food Sci. Nutr. 2023, 11, 5223–5235. [Google Scholar] [CrossRef] [PubMed]
  8. Gilliom, R.J.; Helsel, D.R. Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resour. Res. 1986, 22, 135–146.
  9. Helsel, D.R.; Cohn, T.A. Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resour. Res. 1988, 24, 1997–2004.
  10. Farewell, V.T. Some comments on analysis techniques for censored water quality data. Environ. Monit. Assess. 1989, 13, 285–294.
  11. Shumway, R.H.; Azari, R.S.; Kayhanian, M. Statistical Approaches to Estimating Mean Water Quality Concentrations with Detection Limits. Environ. Sci. Technol. 2002, 36, 3345–3353.
  12. Antweiler, R.C.; Taylor, H.E. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics. Environ. Sci. Technol. 2008, 42, 3732–3738.
  13. Shoari, N.; Dubé, J.-S.; Chenouri, S. On the use of the substitution method in left-censored environmental data. Hum. Ecol. Risk Assess. Int. J. 2015, 22, 435–446.
  14. Tekindal, M.A.; Erdoğan, B.D.; Yavuz, Y. Evaluating Left-Censored Data Through Substitution, Parametric, Semi-parametric, and Nonparametric Methods: A Simulation Study. Interdiscip. Sci. Comput. Life Sci. 2017, 9, 153–172.
  15. Kaplan, E.L.; Meier, P. Nonparametric Estimation from Incomplete Observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
  16. Sprott, D.A. Statistical Inference in Science; Springer: New York, NY, USA, 2000.
  17. Huynh, T.; Quick, H.; Ramachandran, G.; Banerjee, S.; Stenzel, M.; Sandler, D.P.; Engel, L.S.; Kwok, R.K.; Blair, A.; Stewart, P.A. A Comparison of the β-Substitution Method and a Bayesian Method for Analyzing Left-Censored Data. Ann. Occup. Hyg. 2016, 60, 56–73.
  18. Lawless, J.F. Statistical Models and Methods for Lifetime Data, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2003.
  19. Therneau, T.M.; Grambsch, P.M. Modeling Survival Data: Extending the Cox Model; Springer: New York, NY, USA, 2000.
  20. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury: Pacific Grove, CA, USA, 2002.
  21. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
  22. Gourieroux, C.; Monfort, A.; Trognon, A. Pseudo Maximum Likelihood Methods: Theory. Econometrica 1984, 52, 681–700.
  23. Qin, J.; Lawless, J. Empirical Likelihood and General Estimating Equations. Ann. Stat. 1994, 22, 300–325.
  24. Shapiro, S.S.; Francia, R.S. An Approximate Analysis of Variance Test for Normality. J. Am. Stat. Assoc. 1972, 67, 215–216.
  25. Berry, G.; Armitage, P. Mid-P confidence intervals: A brief review. J. R. Stat. Soc. Ser. D Stat. 1995, 44, 417–423.
  26. Lawless, J.F.; Zhan, M. Analysis of interval-grouped recurrent event data using piecewise-constant rate functions. Can. J. Stat. 1998, 26, 549–565.
  27. SAS Institute Inc. Reading Spec Limits from an Input Data Set; SAS Institute Inc.: Cary, NC, USA, n.d.; Available online: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/qcug/qcug_code_capspec2.htm (accessed on 11 January 2025).
  28. Fauvernier, M.; Remontet, L.; Uhry, Z.; Bossard, N.; Roche, L. survPen: An R package for hazard and excess hazard modelling with multidimensional penalized splines. J. Open Source Softw. 2019, 4, 1434.
  29. Wigle, A.; Béliveau, A.; Blackmore, D.; Lapeyre, P.; Osadetz, K.; Lemieux, C.; Daun, K.J. Estimation and Applications of Uncertainty in Methane Emissions Quantification Technologies: A Bayesian Approach. ACS ES&T Air 2024, 1, 1000–1014.
  30. Arjas, E.; Gasbarra, D. Nonparametric Bayesian Inference from Right Censored Survival Data, Using the Gibbs Sampler. Stat. Sin. 1994, 4, 505–524.
  31. Ghosh, J.K.; Delampady, M.; Samanta, T. An Introduction to Bayesian Analysis: Theory and Methods, 1st ed.; Springer: New York, NY, USA, 2006.
  32. Millard, S.P. EnvStats: An R Package for Environmental Statistics; Springer: New York, NY, USA, 2013.
  33. Blom, G. Statistical Estimates and Transformed Beta-Variables; John Wiley & Sons: Hoboken, NJ, USA, 1958.
  34. Weisberg, S.; Bingham, C. An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 1975, 17, 133–134.
  35. Royston, P. A Toolkit for Testing for Non-Normality in Complete and Censored Samples. J. R. Stat. Soc. Ser. D Stat. 1993, 42, 37–43.
  36. Verrill, S.; Johnson, R.A. Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. J. Am. Stat. Assoc. 1988, 83, 1192–1197.
Figure 1. Asymptotic bias of estimates of the location parameter $\theta_1$ and log-scale parameter $\log(\theta_2)$ as a function of the left-censoring rate $q$ under logistic (left panels) and extreme value (right panels) error distributions. The red dashed line represents the reference value of zero asymptotic bias.
Figure 2. Asymptotic bias of estimates of the standardized threshold $(\tau - \theta_1)/\theta_2$ and the reliability $F(\tau)$ as a function of the left-censoring rate $q$ under logistic (left panels) and extreme value (right panels) error distributions. The red dashed line represents the reference value of zero asymptotic bias.
Figure 3. Asymptotic type I error rate for a hypothesis test of $H_0: p_1 = 0.95$ vs. $H_A: p_1 > 0.95$ as a function of the left-censoring rate $q$ under logistic and extreme value (EVD) data generation. The red dashed lines indicate the nominal type I error rate of $\alpha = 0.025$. The sample size is $n = 80$.
Figure 4. Asymptotic power of hypothesis tests of $H_0: p_1 = 0.95$ vs. $H_A: p_1 > 0.95$ as a function of the left-censoring rate $q$ under normal, logistic, and extreme value (EVD) data generation. The sample size is $n = 80$.
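A note connecting Figures 2–4: under the location-scale formulation used throughout, the reliability and the standardized threshold are equivalent parameterizations, so hypotheses about $p_1$ translate directly into hypotheses about $(\tau - \theta_1)/\theta_2$. The display below records this identity for a standardized error c.d.f. $G$ and is included only as a reader aid.

$$p_1 = F(\tau) = G\!\left(\frac{\tau - \theta_1}{\theta_2}\right) \quad\Longleftrightarrow\quad \frac{\tau - \theta_1}{\theta_2} = G^{-1}(p_1),$$

so that $H_0: p_1 = 0.95$ is equivalent to $(\tau - \theta_1)/\theta_2 = G^{-1}(0.95)$.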
Figure 5. (a) Minimum sample size $n$ needed to achieve worst-case empirical power of 90% across normal, logistic, and extreme value data-generating distributions, using a piecewise-constant hazard-based model with two cut-points $b_0$ and $b_1$ placed at the specified empirical quantiles; and (b) worst-case empirical type I error rate across those distributions when the sample size in each case is the value $n$ shown in (a). Tests are conducted at the 0.025 significance level with $p_0 = 0.95$ and $p_1 = 0.99$, and each empirical rejection rate is obtained using 2000 simulations. Green cells indicate more desirable configurations of sample size and control of type I error, while red cells indicate configurations with impractically large sample sizes or unacceptably elevated type I error rates.
Figure 6. Empirical c.d.f. of the log-can weight distribution. Open and closed circles, respectively, denote observations below and above the LLD. The dashed blue line shows the target reliability level of $p_0 = 0.95$, while the dashed red lines show the non-parametric reliability estimate $\hat{p}_1$.
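Because left-censored observations are known to lie below the LLD, they still count as falling below any threshold $\tau$ that exceeds the LLD, which is why a non-parametric estimate of $F(\tau)$ such as $\hat{p}_1$ in Figure 6 remains available in the upper tail. The snippet below is a minimal sketch of that calculation; the data, variable names, and threshold are hypothetical and are not the can-weight data.

```python
import numpy as np

def nonparametric_reliability(y, censored, tau):
    """Empirical estimate of F(tau) = P(Y <= tau) with left-censoring at an LLD.

    y        : recorded values (the LLD itself for censored observations)
    censored : boolean array, True if the observation fell below the LLD
    tau      : reliability threshold, assumed to lie above the LLD
    """
    y = np.asarray(y, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    # A left-censored observation lies below the LLD and hence below tau,
    # so it contributes to the numerator just like an observed value <= tau.
    below = np.where(censored, True, y <= tau)
    return below.mean()

# Illustrative (hypothetical) log-scale data with an LLD at 0 and threshold tau
rng = np.random.default_rng(1)
lld, tau = 0.0, 1.7
y_true = rng.normal(loc=1.0, scale=0.4, size=80)
censored = y_true < lld
y_obs = np.where(censored, lld, y_true)
print(nonparametric_reliability(y_obs, censored, tau))
```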
Figure 7. Fitted c.d.f.s of the log-can weight distribution based on fully parametric and piecewise-constant hazard-based models. Dotted black lines show 95% CIs for the fully parametric models. Each piecewise-constant hazard-based model is specified by the vector of probabilities $p$ defining the empirical percentiles $\hat{Q}_p$ used as its cut-points; the vertical dashed grey lines show the values of these cut-points. The horizontal dashed blue lines show the target reliability level of $p_0 = 0.95$, while the dashed red lines show the reliability estimates $\hat{p}_1$ based on the fitted models.
Figure 8. Probability plots of the log-can weight distribution based on fully parametric models. Dashed grey lines show the estimated 5th through 95th percentiles of the distribution (horizontal) and the corresponding can weights (vertical). Dotted black lines show 95% CIs. The dashed blue line shows the target reliability level of $p_0 = 0.95$, while the dashed red lines show the reliability estimates $\hat{p}_1$.
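For readers who wish to reproduce a fit of the kind summarized in Figures 7 and 8, the sketch below illustrates maximum likelihood estimation of a normal location-scale model from left-censored data, followed by the plug-in estimate of $F(\tau)$. It is a simplified illustration under an assumed normal model, not the authors' implementation; the simulated data, optimizer choice, and variable names are assumptions.

```python
import numpy as np
from scipy import optimize, stats

def censored_normal_mle(y, censored):
    """ML fit of N(theta1, theta2^2) to data left-censored at an LLD.

    y        : recorded values (the LLD itself for censored observations)
    censored : boolean array, True if the observation fell below the LLD
    """
    y = np.asarray(y, dtype=float)
    censored = np.asarray(censored, dtype=bool)

    def negloglik(par):
        theta1, log_theta2 = par
        theta2 = np.exp(log_theta2)                   # keeps the scale positive
        z = (y - theta1) / theta2
        ll_obs = stats.norm.logpdf(z[~censored]) - np.log(theta2)  # observed values
        ll_cen = stats.norm.logcdf(z[censored])                    # P(Y < LLD) terms
        return -(ll_obs.sum() + ll_cen.sum())

    start = np.array([y.mean(), np.log(y.std(ddof=1))])
    fit = optimize.minimize(negloglik, start, method="Nelder-Mead")
    return fit.x[0], np.exp(fit.x[1])

# Illustrative use with simulated (hypothetical) log-scale measurements
rng = np.random.default_rng(2)
lld, tau = 0.0, 1.7
y_true = rng.normal(loc=1.0, scale=0.4, size=80)
censored = y_true < lld
y_obs = np.where(censored, lld, y_true)

theta1_hat, theta2_hat = censored_normal_mle(y_obs, censored)
F_tau_hat = stats.norm.cdf((tau - theta1_hat) / theta2_hat)  # plug-in estimate of F(tau)
print(round(F_tau_hat, 3))
```

Analogous fits under logistic or extreme value error distributions would replace the normal density and c.d.f. with the corresponding logistic or extreme value forms.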
Table 1. Estimates of $F(\tau)$ obtained under 10% left-censoring based on 2000 simulated samples of size $n$.
F(τ)   Data-Generating Model   n    Normal   Piecewise-Constant 1                  Exact Test
                                             Q̂90     Q̂85,90,95   Q̂85,89,93
0.95   Normal                  20   0.949    0.957   0.955       –            0.949
0.95   Normal                  40   0.950    0.951   0.951       0.950        0.949
0.95   Normal                  60   0.950    0.949   0.950       0.946        0.949
0.95   Normal                  80   0.949    0.947   0.950       0.947        0.949
0.95   Logistic                20   0.951    0.953   0.955       –            0.950
0.95   Logistic                40   0.950    0.943   0.949       0.945        0.949
0.95   Logistic                60   0.951    0.938   0.948       0.940        0.950
0.95   Logistic                80   0.951    0.935   0.947       0.941        0.950
0.95   Extreme Value           20   0.925    0.959   0.957       –            0.951
0.95   Extreme Value           40   0.923    0.954   0.952       0.952        0.950
0.95   Extreme Value           60   0.923    0.952   0.951       0.948        0.949
0.95   Extreme Value           80   0.923    0.952   0.951       0.949        0.949
0.97   Normal                  20   0.968    0.972   0.972       –            0.969
0.97   Normal                  40   0.969    0.970   0.970       0.970        0.969
0.97   Normal                  60   0.969    0.969   0.969       0.968        0.969
0.97   Normal                  80   0.969    0.968   0.969       0.968        0.969
0.97   Logistic                20   0.973    0.969   0.973       –            0.970
0.97   Logistic                40   0.973    0.962   0.968       0.966        0.969
0.97   Logistic                60   0.974    0.958   0.966       0.961        0.969
0.97   Logistic                80   0.974    0.956   0.965       0.960        0.970
0.97   Extreme Value           20   0.942    0.974   0.974       –            0.970
0.97   Extreme Value           40   0.941    0.972   0.971       0.972        0.969
0.97   Extreme Value           60   0.941    0.971   0.970       0.971        0.969
0.97   Extreme Value           80   0.941    0.971   0.970       0.971        0.969
0.99   Normal                  20   0.988    0.990   0.990       –            0.990
0.99   Normal                  40   0.989    0.990   0.989       0.990        0.990
0.99   Normal                  60   0.989    0.990   0.989       0.991        0.990
0.99   Normal                  80   0.989    0.990   0.989       0.990        0.989
0.99   Logistic                20   0.993    0.989   0.991       –            0.990
0.99   Logistic                40   0.994    0.987   0.989       0.988        0.990
0.99   Logistic                60   0.994    0.986   0.988       0.987        0.990
0.99   Logistic                80   0.995    0.986   0.987       0.986        0.990
0.99   Extreme Value           20   0.964    0.989   0.990       –            0.991
0.99   Extreme Value           40   0.964    0.989   0.990       0.990        0.990
0.99   Extreme Value           60   0.964    0.989   0.990       0.991        0.990
0.99   Extreme Value           80   0.964    0.990   0.990       0.990        0.990
1 Column headings indicate the $p_1, \ldots, p_K$ empirical percentiles $\hat{Q}_{p_1}, \ldots, \hat{Q}_{p_K}$ defining the interval cut-points of the fitted piecewise-constant hazard-based model.
Table 2. Empirical null hypothesis rejection rates at the $\alpha = 0.025$ significance level, testing for 95% reliability under 10% left-censoring, based on 2000 simulated samples of size $n$.
F(τ)   Data-Generating Model   n    Normal   Piecewise-Constant 1                  Exact Test
                                             Q̂90     Q̂85,90,95   Q̂85,89,93    p-Value   mid-p
0.95   Normal                  20   0.039    0.035   0.015       –            0.000     0.000
0.95   Normal                  40   0.029    0.031   0.042       0.048        0.000     0.000
0.95   Normal                  60   0.031    0.026   0.035       0.053        0.000     0.045
0.95   Normal                  80   0.026    0.023   0.032       0.037        0.020     0.020
0.95   Logistic                20   0.048    0.059   0.017       –            0.000     0.000
0.95   Logistic                40   0.049    0.047   0.044       0.056        0.000     0.000
0.95   Logistic                60   0.050    0.042   0.042       0.058        0.000     0.050
0.95   Logistic                80   0.058    0.028   0.030       0.033        0.021     0.021
0.95   Extreme Value           20   0.002    0.027   0.018       –            0.000     0.000
0.95   Extreme Value           40   0.000    0.022   0.030       0.039        0.000     0.000
0.95   Extreme Value           60   0.001    0.023   0.029       0.044        0.000     0.048
0.95   Extreme Value           80   0.000    0.019   0.029       0.034        0.017     0.017
0.97   Normal                  20   0.100    0.072   0.032       –            0.000     0.000
0.97   Normal                  40   0.124    0.087   0.077       0.090        0.000     0.000
0.97   Normal                  60   0.177    0.113   0.085       0.137        0.000     0.153
0.97   Normal                  80   0.225    0.125   0.111       0.128        0.090     0.090
0.97   Logistic                20   0.167    0.132   0.037       –            0.000     0.000
0.97   Logistic                40   0.241    0.169   0.110       0.159        0.000     0.000
0.97   Logistic                60   0.340    0.155   0.119       0.166        0.000     0.164
0.97   Logistic                80   0.405    0.154   0.120       0.144        0.093     0.093
0.97   Extreme Value           20   0.005    0.059   0.023       –            0.000     0.000
0.97   Extreme Value           40   0.003    0.064   0.062       0.080        0.000     0.000
0.97   Extreme Value           60   0.002    0.089   0.076       0.129        0.000     0.156
0.97   Extreme Value           80   0.000    0.108   0.098       0.116        0.078     0.078
0.99   Normal                  20   0.355    0.192   0.055       –            0.000     0.000
0.99   Normal                  40   0.639    0.374   0.253       0.364        0.000     0.000
0.99   Normal                  60   0.848    0.524   0.406       0.557        0.000     0.538
0.99   Normal                  80   0.924    0.620   0.524       0.604        0.429     0.429
0.99   Logistic                20   0.583    0.339   0.112       –            0.000     0.000
0.99   Logistic                40   0.846    0.539   0.383       0.486        0.000     0.000
0.99   Logistic                60   0.958    0.603   0.474       0.580        0.000     0.538
0.99   Logistic                80   0.989    0.648   0.541       0.608        0.448     0.448
0.99   Extreme Value           20   0.029    0.132   0.044       –            0.000     0.000
0.99   Extreme Value           40   0.035    0.273   0.200       0.277        0.000     0.000
0.99   Extreme Value           60   0.048    0.435   0.364       0.515        0.000     0.557
0.99   Extreme Value           80   0.072    0.590   0.542       0.603        0.459     0.459
1 Column headings indicate the $p_1, \ldots, p_K$ empirical percentiles $\hat{Q}_{p_1}, \ldots, \hat{Q}_{p_K}$ defining the interval cut-points of the fitted piecewise-constant hazard-based model.
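One natural exact test of reliability treats the number of samples falling at or below the threshold as a binomial count. As a rough illustration of how one-sided exact and mid-p values [25] could be computed for $H_0: p_1 = 0.95$ vs. $H_A: p_1 > 0.95$, consider the sketch below; the function name and example counts are placeholders, and this sketch need not match the exact-test implementation underlying Table 2.

```python
from scipy import stats

def exact_reliability_test(n_below, n, p0=0.95):
    """One-sided exact binomial test of H0: p1 = p0 vs HA: p1 > p0.

    n_below : number of samples with response at or below the threshold tau
    n       : total number of samples
    Returns (exact p-value, mid-p value).
    """
    # Under H0 the count is Binomial(n, p0); large counts favour HA.
    p_exact = stats.binom.sf(n_below - 1, n, p0)  # P(X >= n_below)
    p_mid = stats.binom.sf(n_below, n, p0) + 0.5 * stats.binom.pmf(n_below, n, p0)
    return p_exact, p_mid

# Example: 79 of 80 samples at or below the threshold
print(exact_reliability_test(79, 80))
```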
Table 3. Reliability estimates at threshold $\tau = \log(1.12)$ and corresponding confidence/likelihood intervals obtained for the distribution of log-can weights using fully parametric and piecewise-constant hazard-based models.
Model 1             F̂(τ)    Confidence/Likelihood Interval 2
Normal              0.961   (0.920, 0.983)
Logistic            0.951   (0.839, 0.986)
Extreme Value       0.982   (0.965, 0.992)
(Q̂90)               0.976   (0.952, 1.000)
(Q̂91, Q̂99)          0.974   (0.949, 1.000)
(Q̂92, Q̂98)          0.972   (0.922, 0.994)
(Q̂94, Q̂97)          0.973   (0.881, 0.993)
(Q̂85, Q̂93, Q̂95)     0.975   (0.929, 1.000)
1 Each piecewise-constant hazard-based model (last five rows) is specified by the vector of empirical $p$th percentiles $\hat{Q}_p$ used as its cut-points. 2 This is a two-sided 95% confidence interval for the fully parametric models (first three rows) and a one-sided 97.5% likelihood interval for the piecewise-constant hazard-based models (last five rows).