Article

Bayesian Analysis of Tweedie Compound Poisson Partial Linear Mixed Models with Nonignorable Missing Response and Covariates

Department of Mathematics and Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(3), 506; https://doi.org/10.3390/e25030506
Submission received: 28 January 2023 / Revised: 24 February 2023 / Accepted: 13 March 2023 / Published: 15 March 2023
(This article belongs to the Special Issue Statistical Methods for Modeling High-Dimensional and Complex Data)

Abstract

Under the Bayesian framework, this study proposes a Tweedie compound Poisson partial linear mixed model on the basis of Bayesian P-spline approximation to nonparametric function for longitudinal semicontinuous data in the presence of nonignorable missing covariates and responses. The logistic regression model is simultaneously used to specify the missing response and covariate mechanisms. A hybrid algorithm combining the Gibbs sampler and the Metropolis–Hastings algorithm is employed to produce the joint Bayesian estimates of unknown parameters and random effects as well as nonparametric function. Several simulation studies and a real example relating to the osteoarthritis initiative data are presented to illustrate the proposed methodologies.

1. Introduction

Semicontinuous data, characterized by nonnegative continuous value with a discrete mass of zero, appear frequently in many fields, such as medicine, health, economics, and ecology. Models for longitudinal semicontinuous data have, in particular, been receiving a lot of attention in two ways. The first approach is the two-part mixed model wherein a mixture of Bernoulli with positive support distribution is used to model zero and positive components separately (Olsen and Schafer [1]; Berk and Lachenbruch [2]; Tooze et al. [3]; Su et al. [4,5]; Liu et al. [6]; Zhou et al. [7]). However, Hasan et al. [8] and Yan and Ma [9] pointed out that such artificial separation based on the two-part modeling method breaks down the serial patterns in the analysis of time series and longitudinal data. The second approach is the compound Poisson mixed model for modelling longitudinal and repeated measurement or cluster data in an integral way. For example, Zhang [10] investigated several statistical inference methods for Tweedie compound Poisson linear mixed models from the frequentist and Bayesian perspective. Swallow et al. [11] developed a Bayesian hierarchical Tweedie regression model by incorporating serial temporal and spatial correlation into the Tweedie distribution in the analysis of longitudinal semicontinuous ecological data. Ye et al. [12] investigated the sensitivity analysis for priors in Tweedie compound Poisson random effect models under a Bayesian framework. In particular, Yan and Ma [9] incorporated serially dependent distribution-free random effects into the compound Poisson regression model for longitudinal semicontinuous data. However, all the abovementioned compound Poisson mixed models have limitations in that they either do not consider nonlinear smooth effects of covariates, such as time and age variables, or do not deal with missing responses and covariates.
It is well known that handling missing data has become an active research field in data analysis. Many methods have been proposed to make statistical inference on various regression models with nonignorable missing response or covariates. For example, Ibrahim et al. [13,14] proposed two methods by which to estimate unknown parameters in generalized linear models with nonignorable missing covariates and generalized linear mixed models with nonignorable missing responses by using the EM algorithm, respectively. In addition, based on these frequentist approaches of handling nonignorable missing response or covariate data, their Bayesian analogues have been extended to various regression models. For example, from a Bayesian perspective, see Huang et al. [15] for generalized linear models with nonignorably missing covariates, Lee and Tang [16] for nonlinear structural equation models with nonignorable missing data, Tang and Zhao [17] for nonlinear reproductive dispersion mixed models for longitudinal data with nonignorable missing covariates, Tang et al. [18] for a nonlinear dynamic factor analysis model with nonparametric prior and possible nonignorable missingness, Zhou et al. [7] for two-part hidden Markov models for semicontinuous longitudinal data with nonignorable missing covariates, Wang and Tang [19] for Bayesian quantile regression with mixed discrete and nonignorable missing covariates, and Wang et al. [20] for Bayesian latent factor on image regression with nonignorable missing data. Therefore, we propose a fully Bayesian method by which to simultaneously estimate unknown parameters, random effects and nonparametric function in a Tweedie compound Poisson partial linear mixed models on the basis of Bayesian P-spline approximation to nonparametric function in the presence of nonignorable missing covariates and responses, where the nonignorable missing data mechanism is specified by a logistic regression model.
For brevity and readability, the main mathematical symbols used in the rest of the paper and their descriptions are summarized in Table 1.
The paper is organized as follows. In Section 2, we describe the data. In Section 3, we describe a Tweedie compound Poisson partial linear mixed model in the presence of nonignorable missing covariates and responses, and present the Bayesian P-spline used to model the nonparametric function. The logistic regression model is simultaneously used to specify the missing response and covariate mechanisms, and a sequence of one-dimensional conditional distributions is used to model the joint probability function of the missing covariates. In Section 4, the prior distributions and posterior distributions of unknown parameters and latent variables are presented. In Section 5, two simulation studies and a real example are given to illustrate our proposed methodologies. In Section 6, we give some conclusions. In Appendix A and Appendix B, the conditional distributions for Gibbs sampling and the Metropolis–Hastings algorithm are given.

2. Data Description

In this section, we describe the Osteoarthritis Initiative (OAI) database, which is available at https://www.oai.ucsf.edu (accessed on 4 April 2017). The OAI cohort study investigated the causes of knee osteoarthritis for 4796 patients aged 45 and older, and collected some information such as age, sex, and body mass index (BMI) for these patients at baseline, 12 months, 24 months, 36 months, and 48 months. Thus, this information is collected at most five times because of the missing data involved. In addition, this OAI study adopted the Western Ontario and McMaster Universities Arthritis Index (WOMAC) disability scores to assess the pain intensity in these patients with hip and/or knee osteoarthritis. Higher scores on the WOMAC score indicate worse pain, stiffness, and functional limitations for these patients. A sample of two patients (denoted by ID 9019406 and ID 9025191) from the OAI study is presented in Table 2.
The missing rates for the longitudinal WOMAC score outcome at baseline, 12 months, 24 months, 36 months, and 48 months are 0.3%, 7.1%, 10.9%, 12.1%, and 12.3%, respectively. Moreover, the missing rates for the covariate BMI at the five time points are 0%, 11.5%, 16.0%, 18.9%, and 21%, respectively. It can be seen from Figure 1 that the observed WOMAC numeric scores at 12 months, 24 months, 36 months, and 48 months are right-skewed with a large proportion of zeros, where the bold line on the left of each histogram denotes the frequency of zero. Specifically, more than 36.9% of the observations across all time points are zeros; thus, we consider the WOMAC numeric score as a longitudinal semicontinuous response with missing data in this article.

3. Statistical Models

3.1. Tweedie Compound Poisson Distribution

As in Ma and Jørgensen [21], the probability density function of the Tweedie compound Poisson distribution has the following form,
$$f_p(y; \mu, \phi) = c_p(y; \phi) \exp\left\{ \frac{1}{\phi} \left( \frac{y \mu^{1-p}}{1-p} - \frac{\mu^{2-p}}{2-p} \right) \right\}, \tag{1}$$
where $p$ is the power parameter satisfying $1 < p < 2$, $\mu$ and $\phi$ are the mean parameter and dispersion parameter, respectively, and the expression for $c_p(y; \phi)$ is not analytically tractable when $y > 0$. If a nonnegative random variable $Y$ follows a Tweedie compound Poisson distribution, we write $Y \sim \mathrm{Tw}_p(\mu, \phi)$ in the rest of the paper. Moreover, we have $E(Y) = \mu$ and $\mathrm{Var}(Y) = \phi \mu^{p}$. Furthermore, a random draw $Y$ from the Tweedie compound Poisson distribution is readily generated from the following stochastic representation
$$Y = \sum_{i=1}^{U} X_i, \tag{2}$$
where $U$ is distributed as a Poisson distribution with mean $\lambda$, the $X_i$ are independent and identically distributed gamma random variables with mean $\alpha\gamma$ and variance $\alpha\gamma^2$, and $U$ and $X_i$ are assumed to be independent. After some calculations, the relationship between the two sets of parameters in Equations (1) and (2) is derived as
$$\begin{aligned} \mu &= \lambda \alpha \gamma, & \lambda &= \frac{\mu^{2-p}}{\phi (2-p)}, \\ p &= \frac{\alpha + 2}{\alpha + 1}, & \alpha &= \frac{2-p}{p-1}, \\ \phi &= \frac{\lambda^{1-p} (\alpha\gamma)^{2-p}}{2-p}, & \gamma &= \phi (p-1) \mu^{p-1}. \end{aligned}$$
It follows from Equation (2) that the joint probability distribution of $(Y, U)$ is given by
$$p_{Y,U}(y, u \mid \lambda, \alpha, \gamma) = p_{Y \mid U}(y \mid u, \alpha, \gamma) \times p_U(u \mid \lambda) = \begin{cases} \exp(-\lambda), & (y, u) = (0, 0), \\ \dfrac{y^{u\alpha - 1} \exp(-y/\gamma)}{\Gamma(u\alpha)\, \gamma^{u\alpha}} \times \dfrac{\lambda^{u}}{u!} \exp(-\lambda), & (y, u) \in R^{+} \times Z^{+}. \end{cases}$$
Thus, the marginal distribution of $Y$, obtained by summing the joint distribution of $(Y, U)$ over $U$, has the form given in Equation (1).
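The stochastic representation in Equation (2) can be sketched in code. The following is a minimal illustration rather than the authors' implementation: it assumes NumPy is available, the function names `tweedie_to_compound` and `rtweedie` are hypothetical, and the parameter values are chosen only for demonstration. It converts $(\mu, \phi, p)$ to $(\lambda, \alpha, \gamma)$ and uses the fact that a sum of $u$ independent gamma variables with shape $\alpha$ and scale $\gamma$ is again gamma with shape $u\alpha$ and scale $\gamma$.

```python
import numpy as np


def tweedie_to_compound(mu, phi, p):
    """Map (mu, phi, p) to the Poisson mean and the gamma shape and scale."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson mean lambda
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape alpha
    gam = phi * (p - 1.0) * mu ** (p - 1.0)     # gamma scale gamma
    return lam, alpha, gam


def rtweedie(n, mu, phi, p, rng):
    """Draw n values of Y = X_1 + ... + X_U with U ~ Poisson(lambda)."""
    lam, alpha, gam = tweedie_to_compound(mu, phi, p)
    u = rng.poisson(lam, size=n)
    y = np.zeros(n)
    pos = u > 0
    # The sum of u i.i.d. Gamma(alpha, scale=gam) draws is Gamma(u * alpha, scale=gam).
    y[pos] = rng.gamma(shape=u[pos] * alpha, scale=gam)
    return y, u


rng = np.random.default_rng(2023)
mu, phi, p = 2.0, 0.5, 1.5                      # illustrative values only
lam, alpha, gam = tweedie_to_compound(mu, phi, p)
y, u = rtweedie(200_000, mu, phi, p, rng)

print("lambda * alpha * gamma:", lam * alpha * gam)     # recovers mu up to rounding
print("sample mean:", y.mean(), "sample variance:", y.var())
print("theoretical variance phi * mu**p:", phi * mu ** p)
```

The sample mean and variance should be close to $\mu$ and $\phi\mu^{p}$, and every draw with $u = 0$ is exactly zero, which is the point mass at zero that makes the response semicontinuous.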

3.2. The Model

For modeling, we first introduce some notation. Let $y_{ij}$ be the longitudinal semicontinuous outcome, subject to missingness, of the $i$th patient with osteoarthritis measured at time $t_{ij}$ ($i = 1, \ldots, n$, $j = 1, \ldots, n_i$). In the OAI study, $n = 4796$ is the number of patients and $n_i = 5$ is the number of repeated observations per patient. Given the random effects $b_i$, $Y_{i1}, \ldots, Y_{in_i}$ are conditionally independent, and each $Y_{ij} \mid b_i$ is assumed to follow the Tweedie compound Poisson distribution; that is,
$$Y_{ij} \mid b_i \sim \mathrm{Tw}_p(\mu_{ij}, \phi),$$
where $\mu_{ij}$ is the conditional expectation of the response $Y_{ij}$, $\phi$ is the dispersion parameter to be estimated, and $1 < p < 2$. Following the generalized linear mixed model (GLMM) approach, the conditional expectation $\mu_{ij}$ is modeled by
$$\log(\mu_{ij}) = \eta_{ij} = x_{ij}^{T} \beta + z_{ij}^{T} b_i + g(t_{ij}),$$
where $\beta$ is a $q \times 1$ vector of unknown regression parameters of interest, $x_{ij}$ is a $q \times 1$ vector of covariates subject to missingness, $b_i$ is distributed as $N_r(0, \Sigma)$, $z_{ij}$ is an $r \times 1$ vector of covariates relating to the random effects $b_i$, and $g(t_{ij})$ denotes an unknown, twice-differentiable nonparametric function of the time effect $t_{ij}$. In this article, the model defined by the two equations above is referred to as a Tweedie compound Poisson partial linear mixed model.
Inspired by Lang and Brezger [22], we use the Bayesian P-spline method, based on a linear combination of B-spline basis functions, to approximate the unknown nonparametric function; that is,
$$g(t_{ij}) = \sum_{h=1}^{H} \xi_h B_h(t_{ij}),$$
where $B_h(\cdot)$ is the $h$th B-spline basis function, $H$ is the number of B-spline basis functions, and the $\xi_h$ are the B-spline coefficients to be estimated. Under the Bayesian framework, $\xi_h$ is treated as a random variable and defined by a first-order random walk; that is, $\xi_h = \xi_{h-1} + v_h$, where $v_h \sim N(0, \tau_\xi^2)$ for $h = 2, \ldots, H$, and $\xi_1$ has a diffuse prior proportional to a constant. The variance parameter $\tau_\xi^2$ is viewed as a global smoothing parameter. Although the global smoothing parameter is easy to estimate, a single global parameter cannot adequately capture highly oscillating features of the underlying nonparametric function $g(t)$. To overcome this issue, we introduce additional hyperparameters $\delta_h$ as local smoothing parameters, which can improve the estimation of a function whose curvature differs markedly across time points $t_{ij}$. Thus, $v_h$ is assumed to follow a normal distribution with heterogeneous variance; that is, $v_h \sim N(0, \tau_\xi^2 / \delta_h)$ for $h = 2, \ldots, H$. Furthermore, let $\xi = (\xi_1, \ldots, \xi_H)^{T}$ and $\delta = (\delta_2, \ldots, \delta_H)^{T}$. The prior distribution for $\xi$ can be written in matrix form as
$$\xi \mid \tau_\xi^2 \propto \exp\left( -\frac{1}{2\tau_\xi^2}\, \xi^{T} Q\, \xi \right),$$
where the penalty matrix $Q$ is given by
$$Q = \begin{pmatrix} \delta_2 & -\delta_2 & 0 & \cdots & 0 & 0 \\ -\delta_2 & \delta_2 + \delta_3 & -\delta_3 & \cdots & 0 & 0 \\ 0 & -\delta_3 & \delta_3 + \delta_4 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \delta_{H-1} + \delta_H & -\delta_H \\ 0 & 0 & 0 & \cdots & -\delta_H & \delta_H \end{pmatrix}.$$
Here, the smoothing parameter $\tau_\xi^2$ is assigned an inverse gamma prior; that is, $p(\tau_\xi^2) \sim IG(a_\tau, b_\tau)$.
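The structure of the penalty matrix follows from the random walk: with weights $\delta_2, \ldots, \delta_H$, the quadratic form $\xi^{T} Q \xi$ equals $\sum_{h=2}^{H} \delta_h (\xi_h - \xi_{h-1})^2$. The sketch below, which assumes NumPy and uses a hypothetical helper `penalty_matrix` with made-up weights, builds $Q$ as $D^{T} \mathrm{diag}(\delta) D$ from the first-difference matrix $D$ and checks this identity numerically.

```python
import numpy as np


def penalty_matrix(delta):
    """Build Q = D' diag(delta) D for local weights delta = (delta_2, ..., delta_H)."""
    delta = np.asarray(delta, dtype=float)
    H = delta.size + 1
    D = np.diff(np.eye(H), axis=0)      # (H - 1) x H first-difference matrix
    return D.T @ np.diag(delta) @ D


delta = np.array([1.0, 2.0, 0.5, 4.0])  # illustrative delta_2, ..., delta_5 (H = 5)
Q = penalty_matrix(delta)

rng = np.random.default_rng(0)
xi = rng.normal(size=delta.size + 1)    # arbitrary coefficient vector
quad_form = xi @ Q @ xi
weighted_diffs = np.sum(delta * np.diff(xi) ** 2)

print(Q)
print(quad_form, weighted_diffs)        # the two numbers agree
```

Each row of $Q$ sums to zero, so $Q$ has rank $H - 1$; this reflects the diffuse prior on $\xi_1$, under which the level of the curve is not penalized.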

3.3. Missing Data Mechanism Assumptions

In this article, let $y_i = (y_{i1}, \ldots, y_{in_i})^{T}$ be an $n_i \times 1$ vector of responses ($i = 1, \ldots, n$) and $x_{ij}$ be a $q \times 1$ vector of covariates, both subject to missingness, whereas $z_{ij}$ is completely observed. In what follows, we assume that the missing data mechanisms for the response and covariates are nonignorable. Let $y_i = (y_{oi}^{T}, y_{mi}^{T})^{T}$ and $x_{ij} = (x_{oij}^{T}, x_{mij}^{T})^{T}$, where $y_{oi}$ ($n_{1i} \times 1$) and $y_{mi}$ ($n_{2i} \times 1$) are the vectors of observed and missing components of $y_i$, satisfying $n_{1i} + n_{2i} = n_i$, and $x_{oij}$ ($q_{1ij} \times 1$) and $x_{mij}$ ($q_{2ij} \times 1$) are the vectors of observed and missing covariates in $x_{ij}$, satisfying $q_{1ij} + q_{2ij} = q$. Let $r_{yi} = (r_{yi1}, \ldots, r_{yin_i})^{T}$ be a vector of indicator variables showing which components of $y_i = (y_{i1}, \ldots, y_{in_i})^{T}$ are missing; that is,
$$r_{yij} = \begin{cases} 1, & y_{ij} \text{ is missing}, \\ 0, & y_{ij} \text{ is observed}. \end{cases}$$
Following Ibrahim et al. [14], we specify a Bernoulli distribution for the nonignorable missing mechanism. Given $y_i$ and the unknown parameter $\varphi_y$, the conditional probability function of $r_{yij}$ is
$$p(r_{yij} \mid y_i, \varphi_y) = \left\{ \Pr(r_{yij} = 1 \mid y_i, \varphi_y) \right\}^{r_{yij}} \left\{ 1 - \Pr(r_{yij} = 1 \mid y_i, \varphi_y) \right\}^{1 - r_{yij}},$$
where $\Pr(r_{yij} = 1 \mid y_i, \varphi_y)$ is specified by a logistic regression model,
$$\operatorname{logit} \Pr(r_{yij} = 1 \mid y_i, \varphi_y) = \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} y_{i,j-1} = u_{yij},$$
in which $\operatorname{logit} \Pr(r_{yij} = 1 \mid y_i, \varphi_y) = \log \dfrac{\Pr(r_{yij} = 1 \mid y_i, \varphi_y)}{1 - \Pr(r_{yij} = 1 \mid y_i, \varphi_y)}$.
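To make the response mechanism concrete, the following sketch evaluates the missingness probability and draws indicators for one subject. It uses only the Python standard library; the function names, the coefficient values, and the responses are hypothetical. The text does not say how $y_{i,j-1}$ is defined at the first visit, so the sketch assumes a placeholder value of zero there.

```python
import math
import random


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def missing_prob(y_curr, y_prev, phi_y):
    """Pr(r_yij = 1 | y_i, phi_y) under the logistic model for the response."""
    phi0, phi1, phi2 = phi_y
    return expit(phi0 + phi1 * y_curr + phi2 * y_prev)


def draw_indicators(y_i, phi_y, rng, y_before_first=0.0):
    """Draw r_yi1, ..., r_yin for one subject (1 = missing, 0 = observed).

    y_before_first stands in for y_{i,0}; using zero is an assumption.
    """
    indicators, prev = [], y_before_first
    for y in y_i:
        indicators.append(1 if rng.random() < missing_prob(y, prev, phi_y) else 0)
        prev = y
    return indicators


phi_y = (-2.0, 0.3, 0.1)            # hypothetical coefficients
y_i = [0.0, 1.2, 0.0, 3.5, 2.1]     # hypothetical complete responses
rng = random.Random(1)

probs = [missing_prob(y, prev, phi_y) for y, prev in zip(y_i, [0.0] + y_i[:-1])]
print([round(pr, 3) for pr in probs])
print(draw_indicators(y_i, phi_y, rng))
```

The mechanism is nonignorable whenever $\varphi_{y1} \neq 0$, because the probability that $y_{ij}$ is missing then depends on the value of $y_{ij}$ itself, which is unobserved when $r_{yij} = 1$.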
Similarly, let $r_{xij} = (r_{xij1}, \ldots, r_{xijq})^{T}$ be a vector of indicator variables showing which components of $x_{ij}$ are missing, where each $r_{xijk}$ is defined as follows:
$$r_{xijk} = \begin{cases} 1, & x_{ijk} \text{ is missing}, \\ 0, & x_{ijk} \text{ is observed}. \end{cases}$$
For the conditional probability $\Pr(r_{xij} \mid x_{ij}, \varphi_x)$, we consider the following nonignorable missing data mechanism,
$$\begin{aligned} \Pr(r_{xij} \mid x_{ij}, \varphi_x) ={} & \Pr(r_{xijq} \mid x_{ij1}, \ldots, x_{ijq}, r_{xij1}, \ldots, r_{xij,q-1}, \varphi_{xq}) \\ & \times \Pr(r_{xij,q-1} \mid x_{ij1}, \ldots, x_{ij,q-1}, r_{xij1}, \ldots, r_{xij,q-2}, \varphi_{x,q-1}) \\ & \times \cdots \times \Pr(r_{xij2} \mid x_{ij1}, x_{ij2}, r_{xij1}, \varphi_{x2}) \Pr(r_{xij1} \mid x_{ij1}, \varphi_{x1}), \end{aligned}$$
in which $\Pr(r_{xijk} \mid x_{ij1}, \ldots, x_{ijk}, r_{xij1}, \ldots, r_{xij,k-1}, \varphi_{xk})$ is defined by a logistic regression model
$$\operatorname{logit} \Pr(r_{xijk} = 1 \mid x_{ij1}, \ldots, x_{ijk}, r_{xij1}, \ldots, r_{xij,k-1}, \varphi_{xk}) = \varphi_{xk0} + \varphi_{xk1} x_{ij1} + \cdots + \varphi_{xkk} x_{ijk} + \varphi_{xk,k+1} r_{xij1} + \cdots + \varphi_{xk,2k-1} r_{xij,k-1} = v_{xijk},$$
where $\varphi_{xk} = (\varphi_{xk0}, \varphi_{xk1}, \ldots, \varphi_{xk,2k-1})^{T}$.
In this article, we also consider a second type of nonignorable missing mechanism for the response and covariates. In this type, the nonignorable missing mechanism for the response is specified by a logistic regression model,
$$\operatorname{logit} \Pr(r_{yij} = 1 \mid y_i, \varphi_y, x_{ij}) = \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} x_{ij1} + \varphi_{y3} x_{ij2} + \cdots + \varphi_{y,k+1} x_{ijk},$$
where $x_{ij1}, \ldots, x_{ijk}$ are the covariates subject to missingness. For a missing covariate, $\Pr(r_{xijk} \mid x_{ij1}, \ldots, x_{ijk}, y_{ij}, \varphi_{xk})$ is given by a logistic regression model,
$$\operatorname{logit} \Pr(r_{xijk} = 1 \mid x_{ij1}, \ldots, x_{ijk}, y_{ij}, \varphi_{xk}) = \varphi_{xk0} + \varphi_{xk1} x_{ij1} + \cdots + \varphi_{xkk} x_{ijk} + \varphi_{xk,k+1} y_{ij} = v_{xijk},$$
where $\varphi_{xk} = (\varphi_{xk0}, \varphi_{xk1}, \ldots, \varphi_{xk,k+1})^{T}$.
In what follows, we assume that the covariate vector $x_{ij} = (x_{ij1}, x_{ij2}, \ldots, x_{ijq})^{T}$ is continuous, with missingness in the first $m$ components and complete observation in the remaining $q - m$ components. Following Ibrahim et al. [13], the joint probability function of the missing covariates is written as a product of one-dimensional conditional distributions,
$$p(x_{ij1}, x_{ij2}, \ldots, x_{ij,m-1}, x_{ijm} \mid x_0, \alpha) = p(x_{ijm} \mid x_0, x_{ij1}, x_{ij2}, \ldots, x_{ij,m-1}, \alpha_m) \times \cdots \times p(x_{ij2} \mid x_0, x_{ij1}, \alpha_2)\, p(x_{ij1} \mid x_0, \alpha_1),$$
where $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$, $x_0 = (x_{ij,m+1}, x_{ij,m+2}, \ldots, x_{ijq})$, and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$. Here, the covariates $x_0$ do not need to be modelled because they are always observed. In addition, continuous missing covariates are generally assumed to follow a normal distribution. For example,
$$p(x_{ijk} \mid x_0, x_{ij1}, x_{ij2}, \ldots, x_{ij,k-1}, \alpha_k) \sim N(\mu_{ijk}, \sigma_k^2), \quad k = 1, 2, \ldots, m,$$
where the mean parameter $\mu_{ijk}$ is given by
$$\mu_{ijk} = \alpha_{k,0} + \alpha_{k1} x_{ij1} + \alpha_{k2} x_{ij2} + \cdots + \alpha_{k,k-1} x_{ij,k-1} + \alpha_{kk} x_{ij,k+1} + \alpha_{k,k+1} x_{ij,k+2} + \cdots + \alpha_{k,q-1} x_{ijq}.$$
Here, $\alpha_k = (\alpha_{k,0}, \alpha_{k1}, \ldots, \alpha_{k,q-1})$.
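As a worked instance of the factorization, consider $q = 3$ covariates with missingness in the first $m = 2$, so that $x_0 = x_{ij3}$. This is the configuration used in the simulation studies of Section 5.1; the coefficient labels below follow that section. The product of one-dimensional conditionals then has two factors:

$$p(x_{ij1}, x_{ij2} \mid x_{ij3}, \alpha) = p(x_{ij2} \mid x_{ij3}, x_{ij1}, \alpha_2)\, p(x_{ij1} \mid x_{ij3}, \alpha_1),$$

$$x_{ij1} \mid x_{ij3} \sim N(\alpha_{10} + \alpha_{11} x_{ij3}, \sigma_{x1}^2), \qquad x_{ij2} \mid x_{ij1}, x_{ij3} \sim N(\alpha_{20} + \alpha_{21} x_{ij1} + \alpha_{22} x_{ij3}, \sigma_{x2}^2).$$

Each missing covariate is therefore handled by a univariate normal regression on the covariates that precede it in the sequence and on the fully observed covariate.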

4. Bayesian Inference

To investigate the Bayesian inference on parameters of interest, we first introduce the following notations. Let $Y_o = \{y_{o1}, \ldots, y_{on}\}$ and $Y_m = \{y_{m1}, \ldots, y_{mn}\}$ be the sets of observed and missing values of response variables, respectively. Similarly, $X_o = \{x_{o11}, \ldots, x_{o1n_1}, \ldots, x_{on1}, \ldots, x_{onn_n}\}$ and $X_m = \{x_{m11}, \ldots, x_{m1n_1}, \ldots, x_{mn1}, \ldots, x_{mnn_n}\}$ are the sets of observed and missing values corresponding to covariates, respectively. Let $U = \{u_{ij}: i = 1, \ldots, n, j = 1, \ldots, n_i\}$ denote the latent variable. Let $b = \{b_1, \ldots, b_n\}$ and $Z = \{z_{ij}: i = 1, \ldots, n, j = 1, \ldots, n_i\}$ denote the vector of random effects and the vector of covariates relating to random effects. Let $T = \{t_{ij}: i = 1, \ldots, n, j = 1, \ldots, n_i\}$ be the vector of time effects relating to the nonparametric part. Denote the vector of indicator variables and parameters relating to missing data mechanism by $r = \{r_y, r_x\}$ and $\varphi = \{\varphi_x, \varphi_y\}$, where $r_y = \{r_{y1}, \ldots, r_{yn}\}$ and $r_x = \{r_{x11}, \ldots, r_{x1n_1}, \ldots, r_{xn1}, \ldots, r_{xnn_n}\}$. On the whole, let $\theta = \{\beta, p, \phi, \Sigma, \xi, \tau_\xi^2, \delta, \varphi\}$, $\alpha$, and $\sigma_m^2 = \{\sigma_{xk}^2: k = 1, \ldots, m\}$ be all the parameters to be estimated in our considered model. Given the observed data $\{Y_o, X_o, Z, T, r\}$, the joint posterior distribution of $\theta$, $\alpha$, $\sigma_m^2$ is given by
$$p(\theta, \alpha, \sigma_m^2 \mid Y_o, X_o, Z, T, r) \propto p(Y_o \mid X_o, Z, T, \theta)\, p(r \mid Y_o, X_o, \varphi)\, p(X_o \mid \alpha, \sigma_m^2)\, p(\theta)\, p(\alpha)\, p(\sigma_m^2), \tag{14}$$
where $p(Y_o \mid X_o, Z, T, \beta, p, \phi, \xi) = \int p(Y, U, b \mid X, Z, T, \beta, p, \phi, \xi)\, dY_m\, dX_m\, dU\, db$ and $p(r \mid Y_o, X_o, \varphi) = \int p(r \mid Y, X, \varphi)\, dY_m\, dX_m$.
Clearly, it is difficult to generate random samples from the posterior distribution $p(\theta, \alpha, \sigma_m^2 \mid Y_o, X_o, Z, T, r)$ because Equation (14) involves high-dimensional integration. Thus, inspired by the data augmentation method (Tanner and Wong [23]), we work with the augmented posterior distribution $p(\theta, \alpha, \sigma_m^2 \mid Y, X, T, Z, r, b, U)$ to avoid the high-dimensional integration. Random samples from $p(Y_m, X_m, b, U, \theta, \alpha, \sigma_m^2 \mid Y_o, X_o, Z, T, r)$ can then be generated via the Gibbs sampler (Geman and Geman [24]). That is, random samples of $Y_m$, $X_m$, $U$, $b$, $\theta$, $\alpha$, $\sigma_m^2$ are iteratively generated from the following conditional distributions: $p(Y_m \mid Y_o, X, U, Z, T, b, r, \theta)$, $p(X_m \mid Y, X_o, U, Z, T, b, r, \varphi_x, \alpha, \sigma_m^2)$, $p(U \mid Y, X, Z, T, p, \phi)$, $p(b \mid Y, X, U, Z, T, \theta)$, $p(\beta, p, \phi, \Sigma, \xi \mid Y, X, U, Z, T)$, $p(\tau_\xi^2 \mid \xi, \delta)$, $p(\delta_h \mid \xi, \tau_\xi^2)$, $p(\varphi \mid Y, X, r)$, $p(\alpha \mid X, \sigma_m^2)$, and $p(\sigma_m^2 \mid X, \alpha)$. To derive the abovementioned conditional distributions, we adopt the following joint logarithmic likelihood function of $(Y, U, b)$:
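$$\begin{aligned} l(\theta; Y, U, b) &= \log \prod_{i=1}^{n} \left\{ \prod_{j=1}^{n_i} p(y_{ij}, u_{ij} \mid b_i, \theta) \right\} p(b_i) \\ &= \sum_{y_{ij} = 0} \log p(y_{ij}, u_{ij} \mid b_i, \theta) + \sum_{y_{ij} > 0} \log p(y_{ij}, u_{ij} \mid b_i, \theta) + \sum_{i=1}^{n} \log p(b_i) \\ &= -\frac{1}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{y_{ij} > 0} \left\{ \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} + \log \Gamma\left( u_{ij} \frac{2-p}{p-1} \right) + \log u_{ij}! + \log y_{ij} \right\} \\ &\quad + \sum_{y_{ij} > 0} u_{ij} \left\{ \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} - \frac{\log \phi}{p-1} - \log(2-p) \right\} + \sum_{i=1}^{n} \log p(b_i). \end{aligned}$$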
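The last line of the display can be checked against the joint density of $(Y, U)$ from Section 3.1. The sketch below uses only the standard library, with hypothetical function names and arbitrary test values. It evaluates the contribution $\log p(y_{ij}, u_{ij} \mid b_i, \theta)$ of a single observation in two ways: directly from the Poisson and gamma densities in $(\lambda, \alpha, \gamma)$, and from the closed form in $(\mu, \phi, p)$.

```python
import math


def compound_params(mu, phi, p):
    """Convert (mu, phi, p) to (lambda, alpha, gamma)."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    gam = phi * (p - 1) * mu ** (p - 1)
    return lam, alpha, gam


def log_joint_direct(y, u, mu, phi, p):
    """log p(y, u) from the Poisson and gamma densities."""
    lam, alpha, gam = compound_params(mu, phi, p)
    if u == 0:
        return -lam                               # point mass at (y, u) = (0, 0)
    return ((u * alpha - 1) * math.log(y) - y / gam
            - math.lgamma(u * alpha) - u * alpha * math.log(gam)
            + u * math.log(lam) - math.lgamma(u + 1) - lam)


def log_joint_closed(y, u, mu, phi, p):
    """The same quantity written in terms of (mu, phi, p)."""
    out = -mu ** (2 - p) / (phi * (2 - p))
    if y == 0:
        return out
    a = (2 - p) / (p - 1)
    out -= (y / (phi * (p - 1) * mu ** (p - 1)) + math.lgamma(u * a)
            + math.lgamma(u + 1) + math.log(y))
    out += u * (a * math.log(y / (p - 1)) - math.log(phi) / (p - 1) - math.log(2 - p))
    return out


cases = [(0.0, 0), (1.3, 1), (4.2, 3)]            # arbitrary (y, u) pairs
for y, u in cases:
    print(y, u, log_joint_direct(y, u, 2.0, 0.5, 1.5),
          log_joint_closed(y, u, 2.0, 0.5, 1.5))
```

The two columns agree, which confirms that the terms in $\log \mu_{ij}$ cancel when $\lambda$, $\alpha$, and $\gamma$ are substituted, so $\mu_{ij}$ enters only through $\mu_{ij}^{2-p}$ and $\mu_{ij}^{p-1}$.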
Moreover, the prior distributions of $\beta$, $p$, $\phi$, $\Sigma$, $\xi$, $\tau_\xi^2$, $\delta_h$, $\varphi$, $\alpha$, and $\sigma_{xk}^2$ are given by
$$\begin{aligned} & p(\beta) \sim N(\beta^0, A), \quad \operatorname{logit}(p - 1) \sim N(0, 10000), \quad \log(\phi) \sim N(0, 10000), \quad p(\Sigma) \sim IW_k(\rho_0, R_0), \\ & \xi \mid \tau_\xi^2 \propto \exp\left( -\frac{1}{2\tau_\xi^2}\, \xi^{T} Q\, \xi \right), \quad p(\tau_\xi^2) \sim IG(a_\tau, b_\tau), \quad p(\delta_h) \sim \Gamma(a_\delta, b_\delta), \\ & p(\varphi_y) \sim N(\varphi_y^0, B), \quad p(\varphi_x) \sim N(\varphi_x^0, C), \quad p(\alpha) \sim N(\alpha_x^0, D), \quad p(\sigma_{xk}^2) \sim IG(a_{xk}, b_{xk}), \end{aligned}$$
where $\rho_0$, $a_\tau$, $b_\tau$, $a_\delta$, $b_\delta$, $a_{xk}$, $b_{xk}$, $\varphi_x^0$, $\varphi_y^0$, $\alpha_x^0$, $A$, $B$, $C$, $D$, and $R_0$ are pregiven hyperparameters, $N$ is the normal distribution, $IW_k(\cdot, \cdot)$ is the $k$-dimensional inverse Wishart distribution, $\Gamma$ is the gamma distribution, $IG$ is the inverse gamma distribution, and $\operatorname{logit}(a) = \log \dfrac{a}{1 - a}$. As for the choice of hyperparameters in the Bayesian P-spline method, Lang and Brezger [22] pointed out that $a_\tau = 1$ and a small value for $b_\tau$, for example, $b_\tau = 0.005$ or $b_\tau = 0.0005$, lead to an almost diffuse prior for $\tau_\xi^2$. Moreover, the hyperparameters $a_\delta$ and $b_\delta$ are both taken to be $0.5$, which can characterize the highly oscillating features of some nonparametric functions. As for the power parameter $p$, Ye et al. [12] adopted the following priors to conduct a sensitivity analysis: $p \sim \mathrm{Uniform}(1, 2)$, $\operatorname{logit}(p - 1) \sim N(0, 100)$, $p - 1 \sim \mathrm{Beta}(0.1, 0.1)$, and $p - 1 \sim \mathrm{Beta}(0.01, 0.01)$. On the basis of that sensitivity analysis, Ye et al. [12] chose the $\operatorname{logit}(p - 1) \sim N(0, 100)$ prior as the optimal prior for $p$ in the Tweedie compound Poisson distribution. The choices of hyperparameters for the other prior distributions are discussed in Section 5. The conditional distributions, the Gibbs sampler, and the Metropolis–Hastings algorithm are given in Appendix A and Appendix B.

Bayesian Estimates

Let $\{(\beta^{(l)}, \phi^{(l)}, p^{(l)}, \Sigma^{(l)}, \alpha^{(l)}, \varphi_y^{(l)}, \varphi_x^{(l)}, \sigma_{xk}^{2(l)}): l = 1, \ldots, L\}$ be random samples from the joint posterior distribution $p(\beta, \phi, p, \Sigma, \alpha, \varphi_y, \varphi_x, \sigma_{xk}^2 \mid Y, X, Z, r, b)$. The Bayesian estimates of the parameters $\beta$, $\phi$, $p$, $\Sigma$, $\alpha$, $\varphi_y$, $\varphi_x$, and $\sigma_{xk}^2$ can be obtained by
$$\begin{aligned} \hat{\beta} &= \frac{1}{L} \sum_{l=1}^{L} \beta^{(l)}, & \hat{\phi} &= \frac{1}{L} \sum_{l=1}^{L} \phi^{(l)}, & \hat{p} &= \frac{1}{L} \sum_{l=1}^{L} p^{(l)}, & \hat{\Sigma} &= \frac{1}{L} \sum_{l=1}^{L} \Sigma^{(l)}, \\ \hat{\alpha} &= \frac{1}{L} \sum_{l=1}^{L} \alpha^{(l)}, & \hat{\varphi}_y &= \frac{1}{L} \sum_{l=1}^{L} \varphi_y^{(l)}, & \hat{\varphi}_x &= \frac{1}{L} \sum_{l=1}^{L} \varphi_x^{(l)}, & \hat{\sigma}_{xk}^2 &= \frac{1}{L} \sum_{l=1}^{L} \sigma_{xk}^{2(l)}. \end{aligned}$$
Similarly, consistent estimates of the posterior covariance matrix $\operatorname{var}(\beta, \phi, p, \Sigma, \alpha, \varphi_y, \varphi_x, \sigma_{xk}^2 \mid Y, X, Z, r, b)$ for the parameters $\beta$, $\phi$, $p$, $\Sigma$, $\alpha$, $\varphi_y$, $\varphi_x$, and $\sigma_{xk}^2$ can be obtained from the sample covariance matrix of their random samples. For example, the posterior covariance matrix $\operatorname{var}(\beta \mid Y, X, Z, r, b)$ can be consistently estimated by
$$\widehat{\operatorname{var}}(\beta \mid Y, X, Z, r, b) = \frac{1}{L - 1} \sum_{l=1}^{L} (\beta^{(l)} - \hat{\beta}) (\beta^{(l)} - \hat{\beta})^{T}.$$
In addition, the corresponding posterior standard deviations can be estimated by the square roots of the diagonal elements of the sample covariance matrix of the random sample sequence.
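These summaries are simple to compute once the draws are stored row by row. The following sketch assumes NumPy; the helper name `posterior_summary` is hypothetical, and the draws are synthetic stand-ins for sampler output rather than results from the paper.

```python
import numpy as np


def posterior_summary(draws):
    """Posterior mean, covariance (divisor L - 1), and standard deviations.

    draws has one row per retained iteration and one column per parameter.
    """
    draws = np.asarray(draws, dtype=float)
    mean = draws.mean(axis=0)
    cov = np.atleast_2d(np.cov(draws, rowvar=False))   # np.cov uses divisor L - 1
    sd = np.sqrt(np.diag(cov))
    return mean, cov, sd


rng = np.random.default_rng(0)
# Synthetic stand-in for L = 5000 retained draws of a three-dimensional beta.
fake_draws = rng.normal(loc=[1.0, -0.5, 0.2], scale=[0.1, 0.2, 0.3], size=(5000, 3))
beta_hat, beta_cov, beta_sd = posterior_summary(fake_draws)
print(beta_hat)
print(beta_sd)
```

The standard deviations are the square roots of the diagonal of the sample covariance matrix, matching the estimator displayed above for $\beta$.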

5. Numerical Examples

In this section, two simulation studies and a real example relating to the OAI data are conducted to investigate the performance of our proposed Bayesian methodologies.

5.1. Simulation Studies

In the first simulation study, longitudinal semicontinuous datasets $\{y_{ij}: i = 1, \ldots, n, j = 1, \ldots, n_i\}$ with $n = 150$ and $n_i = 4$ are simulated from the Tweedie compound Poisson distribution $\mathrm{Tw}_p(\mu_{ij}, \phi)$, where the conditional mean $\mu_{ij}$ is given by
$$\log(\mu_{ij}) = x_{ij1} \beta_1 + x_{ij2} \beta_2 + x_{ij3} \beta_3 + b_i + g(t_{ij}),$$
where covariate $x_{ij3}$ is generated from the standard normal distribution, and $x_{ij1}$ and $x_{ij2}$ are independently simulated from the normal distributions $N(\alpha_{10} + \alpha_{11} x_{ij3}, \sigma_{x1}^2)$ and $N(\alpha_{20} + \alpha_{21} x_{ij1} + \alpha_{22} x_{ij3}, \sigma_{x2}^2)$, respectively. In addition, the random effects $b_i$ are independent and identically distributed as $N(0, \Sigma)$, and the true curve of the nonparametric function is given by $g(t) = \sin(2\pi t)$ with $t_{ij} \sim U(0, 1)$. The true values of the abovementioned parameters are taken to be $\sigma_{x1}^2 = 0.25$, $\sigma_{x2}^2 = 0.36$, $p = 1.5$, $\phi = 0.5$, $\Sigma = 0.64$, $\beta = (\beta_1, \beta_2, \beta_3)^{T} = (1, 1, 1)^{T}$, $\alpha_1 = (\alpha_{10}, \alpha_{11})^{T} = (0.05, 0.5)^{T}$, and $\alpha_2 = (\alpha_{20}, \alpha_{21}, \alpha_{22})^{T} = (0.9, 0.05, 0.9)^{T}$. In what follows, it is assumed that covariate $x_{ij3}$ is completely observed, while the response $y_{ij}$ and covariates $x_{ij1}$, $x_{ij2}$ are subject to missingness. Thus, the nonignorable missing mechanisms for these three variables are modelled by the following logistic regression models,
$$\begin{aligned} \operatorname{logit} \Pr(r_{yij} = 1 \mid y_{ij}, y_{i,j-1}, \varphi_y) &= \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} y_{i,j-1}, \\ \operatorname{logit} \Pr(r_{xij1} = 1 \mid x_{ij1}, \varphi_{x1}) &= \varphi_{x10} + \varphi_{x11} x_{ij1}, \\ \operatorname{logit} \Pr(r_{xij2} = 1 \mid x_{ij1}, x_{ij2}, r_{xij1}, \varphi_{x2}) &= \varphi_{x20} + \varphi_{x21} x_{ij1} + \varphi_{x22} x_{ij2} + \varphi_{x23} r_{xij1}, \end{aligned} \tag{18}$$
where the true values of $\varphi_y$, $\varphi_{x1}$, and $\varphi_{x2}$ are given by $\varphi_y = (\varphi_{y0}, \varphi_{y1}, \varphi_{y2})^{T} = (2.4, 0.1, 0.1)^{T}$, $\varphi_{x1} = (\varphi_{x10}, \varphi_{x11})^{T} = (2.5, 0.1)^{T}$, and $\varphi_{x2} = (\varphi_{x20}, \varphi_{x21}, \varphi_{x22}, \varphi_{x23})^{T} = (1.9, 0.05, 0.05, 0.3)^{T}$. The missing data for $x_{ij1}$, $x_{ij2}$, and $y_{ij}$ were generated by (18), and the average proportions of missing data for $x_{ij1}$, $x_{ij2}$, and $y_{ij}$ over 50 replications are 8.7%, 13.3%, and 7.5%, respectively.
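The complete-data generation step of this design can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: it assumes NumPy, uses the parameter values exactly as printed in the text, draws each Tweedie response through the Poisson-gamma representation of Section 3.1, and stops before the missingness indicators are applied.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_i = 150, 4

# Parameter values as printed in the text.
beta = np.array([1.0, 1.0, 1.0])
alpha1 = np.array([0.05, 0.5])
alpha2 = np.array([0.9, 0.05, 0.9])
sig2_x1, sig2_x2 = 0.25, 0.36
p, phi, Sigma = 1.5, 0.5, 0.64

# Covariates: x3 is standard normal; x1 and x2 follow the sequential normal models.
x3 = rng.normal(size=(n, n_i))
x1 = alpha1[0] + alpha1[1] * x3 + rng.normal(scale=np.sqrt(sig2_x1), size=(n, n_i))
x2 = (alpha2[0] + alpha2[1] * x1 + alpha2[2] * x3
      + rng.normal(scale=np.sqrt(sig2_x2), size=(n, n_i)))

# Random intercepts, observation times, and the conditional mean on the log scale.
b = rng.normal(scale=np.sqrt(Sigma), size=(n, 1))     # one b_i per subject
t = rng.uniform(size=(n, n_i))
mu = np.exp(beta[0] * x1 + beta[1] * x2 + beta[2] * x3 + b + np.sin(2 * np.pi * t))

# Tweedie draws via U ~ Poisson(lambda) and Y | U ~ Gamma(U * alpha, scale=gamma).
lam = mu ** (2 - p) / (phi * (2 - p))
shape = (2 - p) / (p - 1)
scale = phi * (p - 1) * mu ** (p - 1)
u = rng.poisson(lam)
y = np.zeros_like(mu)
pos = u > 0
y[pos] = rng.gamma(shape=u[pos] * shape, scale=scale[pos])

print("proportion of exact zeros:", np.mean(y == 0))
```

The array `b` has shape `(n, 1)` so that the same random intercept is shared by all four observations of a subject through broadcasting.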
To investigate the effect of different prior information on the Bayesian estimate for unknown parameters, three types of prior information are considered as follows.
Type I: The hyperparameters $\beta^0$, $\varphi_y^0$, $\varphi_{x1}^0$, $\varphi_{x2}^0$, $\alpha_{x1}^0$, and $\alpha_{x2}^0$ are set to the true values of their corresponding parameters; $\rho_0 = 8$, $R_0 = 2$, $a_\tau = 1$, $b_\tau = 0.005$, $a_\delta = 0.5$, and $b_\delta = 0.5$; $A$, $B$, $C_{x1}$, $C_{x2}$, $D_{x1}$, and $D_{x2}$ are taken to be $0.25 I_3$, $0.25 I_3$, $0.25 I_2$, $0.25 I_4$, $0.25 I_2$, and $0.25 I_3$, where $I_d$ denotes the $d \times d$ identity matrix. This scenario represents good prior information.
Type II: The hyperparameters $\beta^0$, $\varphi_y^0$, $\varphi_{x1}^0$, $\varphi_{x2}^0$, $\alpha_{x1}^0$, and $\alpha_{x2}^0$ are set to twice the true values of their corresponding parameters; $A$, $B$, $C_{x1}$, $C_{x2}$, $D_{x1}$, and $D_{x2}$ are taken to be $0.75 I_3$, $0.75 I_3$, $0.75 I_2$, $0.75 I_4$, while the other hyperparameters are the same as those given in Type I. This scenario represents inaccurate prior information.
Type III: The hyperparameters $\beta^0$, $\varphi_y^0$, $\varphi_{x1}^0$, $\varphi_{x2}^0$, $\alpha_{x1}^0$, and $\alpha_{x2}^0$ are set to zero vectors; $A$, $B$, $C_{x1}$, $C_{x2}$, $D_{x1}$, and $D_{x2}$ are taken to be $100 I_3$, $100 I_3$, $100 I_2$, $100 I_4$, while the other hyperparameters are the same as those given in Type I. This scenario represents noninformative prior information.
For each of the 50 generated datasets, the hybrid algorithm combining the block Gibbs sampler and the Metropolis–Hastings algorithm is used to produce the joint Bayesian estimates of unknown parameters, random effects, and the nonparametric function. To allow the hybrid algorithm to converge in each replication, we discarded the first 5000 iterations and collected the next 5000 observations to calculate Bayesian estimates, which are reported in Table 3, where “Bias” is the difference between the mean of the estimates over the 50 replications and the true value, “SD” is the standard deviation of the estimates over the 50 replications, and “RMS” is the root mean square of the differences between the estimates over the 50 replications and the true value. It can be seen from Table 3 that (i) the Bayesian estimates of unknown parameters are reasonably accurate under all three types of prior information because all Bias values are less than 0.1, and (ii) the SD and RMS values are less than 0.5 and differ little from each other regardless of the prior. Thus, the Bayesian estimates are not sensitive to the three types of prior information considered. In addition, examination of Figure 2 indicates that the proposed Bayesian P-spline approximation to the nonparametric function is feasible because the estimated curves of $g(t)$ match the true curve well in the simulation studies considered.
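The three summary measures can be computed for one parameter as in the sketch below. It uses only the standard library; the function name and the four estimates are hypothetical, and the use of the sample standard deviation (divisor $R - 1$ for $R$ replications) is an assumption, since the text does not state the divisor.

```python
import math
import statistics


def replication_summary(estimates, true_value):
    """Return (Bias, SD, RMS) for one parameter across replications."""
    bias = statistics.fmean(estimates) - true_value
    sd = statistics.stdev(estimates)    # divisor R - 1; the divisor is an assumption
    rms = math.sqrt(statistics.fmean([(e - true_value) ** 2 for e in estimates]))
    return bias, sd, rms


estimates = [0.9, 1.1, 1.0, 1.2]        # hypothetical estimates from four replications
bias, sd, rms = replication_summary(estimates, true_value=1.0)
print(bias, sd, rms)
```

With these definitions, $\mathrm{RMS}^2 = \mathrm{Bias}^2 + \frac{R-1}{R} \mathrm{SD}^2$, so SD and RMS are nearly equal whenever the bias is small relative to the spread of the estimates.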
In the second simulation study, the setup is the same as in the first simulation study except for the missing mechanism. Here, the second type of nonignorable missing mechanism model for $y_{ij}$, $x_{ij1}$, and $x_{ij2}$ is given by
$$\begin{aligned} \operatorname{logit} \Pr(r_{yij} = 1 \mid y_{ij}, x_{ij1}, x_{ij2}, \varphi_y) &= \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} x_{ij1} + \varphi_{y3} x_{ij2}, \\ \operatorname{logit} \Pr(r_{xij1} = 1 \mid x_{ij1}, y_{ij}, \varphi_{x1}) &= \varphi_{x10} + \varphi_{x11} x_{ij1} + \varphi_{x12} y_{ij}, \\ \operatorname{logit} \Pr(r_{xij2} = 1 \mid x_{ij1}, x_{ij2}, y_{ij}, \varphi_{x2}) &= \varphi_{x20} + \varphi_{x21} x_{ij1} + \varphi_{x22} x_{ij2} + \varphi_{x23} y_{ij}, \end{aligned}$$
where the true values of unknown parameters are taken to be $\varphi_y = (\varphi_{y0}, \varphi_{y1}, \varphi_{y2}, \varphi_{y3})^{T} = (2.2, 0.1, 0.1, 0.1)^{T}$, $\varphi_{x1} = (\varphi_{x10}, \varphi_{x11}, \varphi_{x12})^{T} = (2.5, 0.2, 0.1)^{T}$, and $\varphi_{x2} = (\varphi_{x20}, \varphi_{x21}, \varphi_{x22}, \varphi_{x23})^{T} = (1.9, 0.05, 0.05, 0.3)^{T}$. The average proportions of missing data for $x_{ij1}$, $x_{ij2}$, and $y_{ij}$ are 10%, 11%, and 8.5%, respectively. As in the first simulation study, we considered three types of prior information for the corresponding hyperparameters. The findings in Table 4 and Figure 3 show that (i) all Bias values corresponding to unknown parameters are less than 0.1, except that the Bias value for $\varphi_{x10}$ is 0.1013 under the Type II prior and the Bias values for $\varphi_{y0}$ and $\varphi_{x23}$ are 0.1102 and 0.1029 under the Type III prior, respectively, and (ii) the estimated curves of the nonparametric function $g(t)$ also match the true curve well under all three priors. Clearly, Bayesian estimates under the first two priors are better than those obtained from the Type III prior. All in all, our proposed Bayesian approach is feasible under the missing mechanisms considered.

5.2. Real Example

In this section, the application of our proposed semiparametric Bayesian approach is illustrated by the analysis of longitudinal semicontinuous data from the OAI, which was discussed in Section 2. The OAI longitudinal data have been analyzed by various approaches, such as those of Chen and Wehrly [25,26]. However, these authors only considered the observed data by reducing 4796 patients to 1499 patients and assumed that the log transformation of the WOMAC score plus 1 approximately follows the normal distribution. In this study, our scientific interest is to link the covariates, such as age, sex, and BMI, with the outcome WOMAC score while accounting for the point mass at zero in the response and for nonignorable missingness in both the response and the covariates. In addition, we viewed age as the individual-level covariate modeled nonparametrically, with the other covariates modeled parametrically. Let the outcome $Y_{ij}$ represent the WOMAC numeric score for the right knee of the $i$th ($i = 1, 2, \ldots, 4796$) patient recorded at the $j$th time point ($j = 1, 2, \ldots, 5$, corresponding to 0, 12, 24, 36, and 48 months). As discussed in Section 2, we regarded the WOMAC numeric score as a longitudinal semicontinuous outcome in this real example.
Here, given the random effect $b_i$, $Y_{ij} \mid b_i$ follows the Tweedie compound Poisson distribution; that is, $Y_{ij} \mid b_i \sim \mathrm{Tw}_p(\mu_{ij}, \phi)$. The conditional mean $\mu_{ij}$ is linked to the covariates, the random effect, and the nonparametric function as follows:
$$\log(\mu_{ij}) = \beta_0 + \beta_1 \mathrm{BMI}_{ij} + \beta_2 \mathrm{SEX}_{ij} + b_i + g(\mathrm{AGE}_{ij}),$$
where the covariates $\mathrm{SEX}_{ij}$ (1 for male, 2 for female) and $\mathrm{AGE}_{ij}$ are completely observed, while the outcome $Y_{ij}$ and the covariate $\mathrm{BMI}_{ij}$ are subject to missingness, with missing rates of 8.5% and 13.5%, respectively. Furthermore, we consider the following missing data mechanisms for the covariate $\mathrm{BMI}_{ij}$ and the outcome $Y_{ij}$:
$$\mathrm{logit}\,\Pr(r_{\mathrm{BMI}_{ij}} = 1 \mid \mathrm{BMI}_{ij}, \varphi_{BMI}) = \varphi_{BMI0} + \varphi_{BMI1}\,\mathrm{BMI}_{ij},$$
$$\mathrm{logit}\,\Pr(r_{Y_{ij}} = 1 \mid Y_{ij}, Y_{i,j-1}, \varphi_Y) = \varphi_{Y0} + \varphi_{Y1} Y_{ij} + \varphi_{Y2} Y_{i,j-1},$$
where $\varphi_{BMI} = (\varphi_{BMI0}, \varphi_{BMI1})^T$ and $\varphi_Y = (\varphi_{Y0}, \varphi_{Y1}, \varphi_{Y2})^T$. In addition, we assume that the covariate $\mathrm{BMI}_{ij}$ follows the normal distribution $N(\alpha_{10} + \alpha_{11}\mathrm{SEX}_{ij}, \sigma_{BMI}^2)$ and that the random effect $b_i$ follows the normal distribution $N(0, \Sigma)$. The Bayesian estimates of the unknown parameters and their standard errors are given in Table 5, and the estimated nonparametric function is displayed in Figure 4. Table 5 indicates that the covariates BMI and SEX both have significant positive effects on the WOMAC score at the 0.05 significance level. The WOMAC score increases with BMI: the higher a patient's BMI, the greater the intensity of knee osteoarthritis the patient suffers. The significant positive effect of SEX indicates that the average WOMAC score is higher for females than for males; that is, women are more vulnerable to greater intensity than men. Chen and Wehrly [25,26] modeled the age effect on the WOMAC score parametrically as linear and found it to be insignificant. In contrast, Figure 4 suggests that the Bayesian estimate of the nonparametric function $g(\mathrm{AGE}_{ij})$ based on the P-spline method has a significant nonlinear trend. Specifically, there is a sharp decrease from age 45 to approximately age 49 and again from age 60 to approximately age 73, after which the curve appears to stabilize. In the missing mechanism models, the Bayesian estimates of $\varphi_{BMI1}$ and $\varphi_{Y2}$ deviate significantly from zero. Thus, it is reasonable to incorporate the missing data mechanisms into our proposed semiparametric Bayesian model in the analysis of the OAI dataset, because the missing data mechanisms for $Y_{ij}$ and $\mathrm{BMI}_{ij}$ are nonignorable.
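To make the model of this section concrete, the following minimal Python sketch simulates one response from a Tweedie compound Poisson distribution with a given conditional mean and evaluates the logistic mechanism for the response indicator. It is an illustration only, not the authors' code: the function names are hypothetical, the draw uses the standard Poisson-sum-of-gammas representation for $1 < p < 2$, and the Poisson sampler (Knuth's multiplication method) is a convenience choice that suits small rates.

```python
import math
import random


def rpois(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rtweedie_cp(mu, phi, p, rng):
    """Draw (y, u) from a Tweedie compound Poisson with mean mu and 1 < p < 2."""
    lam = mu ** (2 - p) / (phi * (2 - p))    # Poisson rate for the latent count u
    alpha = (2 - p) / (p - 1)                # gamma shape contributed by each jump
    scale = phi * (p - 1) * mu ** (p - 1)    # gamma scale
    u = rpois(lam, rng)
    y = rng.gammavariate(u * alpha, scale) if u > 0 else 0.0
    return y, u


def prob_indicator(y_now, y_prev, phi_y):
    """Pr(r = 1) when logit Pr(r = 1) = phi_y0 + phi_y1 * y_ij + phi_y2 * y_{i,j-1}."""
    eta = phi_y[0] + phi_y[1] * y_now + phi_y[2] * y_prev
    return 1.0 / (1.0 + math.exp(-eta))
```

In this representation the exact zeros arise when the latent count `u` is zero, and the mean of the positive part is chosen so that the overall mean equals `mu`, which is what lets a single model cover both the point mass at zero and the continuous part.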

6. Conclusions

In this paper, we have introduced a new Tweedie compound Poisson partial linear mixed model with nonignorable missing covariates and responses, in which the random effect follows a multivariate normal distribution and the nonparametric function is modeled by Bayesian P-splines. Logistic regression models are used to specify the missing response and covariate mechanisms. This article makes the following contributions: (i) the proposed Bayesian semiparametric mixed effects model handles the zero and positive components of longitudinal semicontinuous data in an integral way while accounting for nonignorable missing responses and covariates simultaneously; (ii) the proposed partial linear mixed model based on Bayesian P-splines can characterize the nonlinear smooth effects of covariates in the analysis of longitudinal semicontinuous data; (iii) the conditional distributions required by the Gibbs sampler and the Metropolis–Hastings algorithm for the proposed model are derived; and (iv) two simulation studies and a real example illustrate the effectiveness and feasibility of the proposed approach under the several missing mechanisms considered.

Author Contributions

Conceptualization, Z.W. and X.D.; methodology, X.D. and Z.W.; software, Z.W., X.D. and W.Z.; validation, Z.W., X.D. and W.Z.; formal analysis, Z.W. and X.D.; investigation, Z.W., X.D. and W.Z.; preparation of the original work draft, X.D. and Z.W.; visualization, Z.W. and W.Z.; supervision, funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (12161014), the National Statistical Science Research Project (2021LY011), the Guizhou Provincial Science and Technology Project ([2020]1Y009), and the Innovative Exploration and New Academic Seedling Project of Guizhou University of Finance and Economics (No. 2022XSXMA18).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The research data are available on the website: https://www.oai.ucsf.edu, accessed on 4 April 2017.

Acknowledgments

We are grateful to Zhixian Yang for careful English editing during the preparation of the revision.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Conditional Distributions for the Gibbs Sampling Algorithm

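First, the conditional distributions of the missing part used in the Gibbs sampling algorithm are as follows.
(1) The logarithmic joint conditional distribution of $(y_{mi}, u_i)$ is
$$
\begin{aligned}
&\log p(y_{mi}, u_i \mid y_{oi}, x_i, z_i, t_i, b_i, \Sigma, \beta, p, \phi, \xi, r_{yi}, \varphi_y) \\
&\quad \propto \log p(y_i, u_i \mid x_i, z_i, t_i, b_i, \theta) + \log p(r_{yi} \mid y_i, \varphi_y) \\
&\quad = \log \prod_{j=1}^{n_i} p(y_{ij}, u_{ij} \mid x_{ij}, z_{ij}, t_{ij}, b_i, \theta) + \log \prod_{j=1}^{n_i} p(r_{yij} \mid y_i, \varphi_y) \\
&\quad = \sum_{j=1}^{n_i} \log p(y_{ij}, u_{ij} \mid x_{ij}, z_{ij}, t_{ij}, b_i, \theta) + \sum_{j=1}^{n_i} \log p(r_{yij} \mid y_i, \varphi_y) \\
&\quad = -\frac{1}{\phi} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{j=1}^{n_i} \left\{ \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} + \log \Gamma\!\left(u_{ij} \frac{2-p}{p-1}\right) + \log u_{ij}! + \log y_{ij} \right\} I_{\{y_{ij} > 0\}} \\
&\qquad + \sum_{j=1}^{n_i} u_{ij} \left\{ \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} - \frac{\log \phi}{p-1} - \log(2-p) \right\} I_{\{y_{ij} > 0\}} \\
&\qquad + \sum_{j=1}^{n_i} \log \left[ \left( \frac{\exp(u_{yij})}{1 + \exp(u_{yij})} \right)^{r_{yij}} \left( 1 - \frac{\exp(u_{yij})}{1 + \exp(u_{yij})} \right)^{1 - r_{yij}} \right] \\
&\quad \propto -\sum_{j=1}^{n_i} \left\{ \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} + \log y_{ij} \right\} I_{\{y_{ij} > 0\}} + \sum_{j=1}^{n_i} u_{ij} \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} I_{\{y_{ij} > 0\}} \\
&\qquad + \sum_{j=1}^{n_i} \left\{ r_{yij} u_{yij} - \log[1 + \exp(u_{yij})] \right\},
\end{aligned}
$$
where $u_{yij} = \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} y_{i,j-1}$ or $u_{yij} = \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} x_{ij1} + \varphi_{y3} x_{ij2} + \cdots + \varphi_{y,k+1} x_{ijk}$ (with $x_{ij1}, x_{ij2}, \ldots, x_{ijk}$ the covariates subject to missingness), $x_i = (x_{i1}, \ldots, x_{in_i})$, $z_i = (z_{i1}, \ldots, z_{in_i})$, $t_i = (t_{i1}, \ldots, t_{in_i})$, and $u_i = (u_{i1}, \ldots, u_{in_i})$.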
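The Tweedie part of the expression in item (1) is the log joint density of a response and its latent Poisson count. As a minimal sketch (the function name is hypothetical and this is not the authors' implementation), it can be evaluated for a single observation as follows, assuming $1 < p < 2$ and the convention that $y = 0$ occurs exactly when $u = 0$.

```python
import math


def log_joint_tweedie(y, u, mu, phi, p):
    """log p(y, u | mu, phi, p) for one observation of a Tweedie compound Poisson."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson case requires 1 < p < 2")
    # Term shared by zero and positive responses: -mu^(2-p) / (phi * (2-p)).
    log_p = -mu ** (2 - p) / (phi * (2 - p))
    if y == 0:
        return log_p if u == 0 else -math.inf
    if u < 1:
        return -math.inf
    alpha = (2 - p) / (p - 1)
    # Terms that appear only when y > 0.
    log_p -= (y / (phi * (p - 1) * mu ** (p - 1))
              + math.lgamma(u * alpha) + math.lgamma(u + 1) + math.log(y))
    log_p += u * (alpha * math.log(y / (p - 1))
                  - math.log(phi) / (p - 1) - math.log(2 - p))
    return log_p
```

Summing this function over $j$ and adding the Bernoulli log-likelihood of the missingness indicators gives the unnormalised log conditional in item (1); `math.lgamma(u + 1)` is used for $\log u!$ to avoid overflow for large counts.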
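(2) The logarithmic conditional distribution of $x_{mi}$ is
$$
\begin{aligned}
&\log p(x_{mi} \mid x_{oi}, u_i, y_i, z_i, t_i, b_i, \Sigma, \beta, p, \phi, \xi, \alpha, \varphi_x) \\
&\quad \propto \log p(y_i, u_i \mid x_i, z_i, t_i, b_i, \theta) + \log p(r_{xi} \mid x_i, \varphi_x) + \log p(x_{mi} \mid \alpha) \\
&\quad = \log \prod_{j=1}^{n_i} p(y_{ij}, u_{ij} \mid x_{ij}, z_{ij}, t_{ij}, b_i, \theta) + \log \prod_{k=1}^{q} p(r_{xijk} \mid x_i, \varphi_x) + \log p(x_{mi} \mid \alpha) \\
&\quad = \sum_{j=1}^{n_i} \log p(y_{ij}, u_{ij} \mid x_{ij}, z_{ij}, t_{ij}, b_i, \theta) + \sum_{k=1}^{q} \log p(r_{xijk} \mid x_i, \varphi_x) + \log p(x_{mi} \mid \alpha) \\
&\quad = -\frac{1}{\phi} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{j=1}^{n_i} \left\{ \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} + \log \Gamma\!\left(u_{ij} \frac{2-p}{p-1}\right) + \log u_{ij}! + \log y_{ij} \right\} I_{\{y_{ij} > 0\}} \\
&\qquad + \sum_{j=1}^{n_i} u_{ij} \left\{ \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} - \frac{\log \phi}{p-1} - \log(2-p) \right\} I_{\{y_{ij} > 0\}} \\
&\qquad + \sum_{k=1}^{q} \log \left[ \left( \frac{\exp(v_{xijk})}{1 + \exp(v_{xijk})} \right)^{r_{xijk}} \left( 1 - \frac{\exp(v_{xijk})}{1 + \exp(v_{xijk})} \right)^{1 - r_{xijk}} \right] + \log p(x_{mi} \mid \alpha) \\
&\quad \propto -\frac{1}{\phi} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{j=1}^{n_i} \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} I_{\{y_{ij} > 0\}} + \sum_{k=1}^{q} \left\{ r_{xijk} v_{xijk} - \log[1 + \exp(v_{xijk})] \right\} + \log p(x_{mi} \mid \alpha),
\end{aligned}
$$
where $v_{xijk} = \varphi_{xk0} + \varphi_{xk1} x_{ij1} + \cdots + \varphi_{xkk} x_{ijk} + \varphi_{xk,k+1} r_{xij1} + \cdots + \varphi_{xk,2k-1} r_{xij,k-1}$ or $v_{xijk} = \varphi_{xk0} + \varphi_{xk1} x_{ij1} + \cdots + \varphi_{xkk} x_{ijk} + \varphi_{xk,k+1} y_{ij}$, and $p(x_{mi} \mid \alpha)$ is specified by (12) and (13).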
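(3) The conditional distribution of $\varphi_y$ is
$$
\begin{aligned}
p(\varphi_y \mid Y, r_y) &\propto p(r_y \mid Y, \varphi_y)\, p(\varphi_y) = \prod_{i=1}^{n} \prod_{j=1}^{n_i} p(r_{yij} \mid y_i, \varphi_y)\, p(\varphi_y) \\
&\propto \exp\left\{ \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( r_{yij} u_{yij} - \log(1 + \exp(u_{yij})) \right) - \frac{1}{2} (\varphi_y - \varphi_y^{0})^T B^{-1} (\varphi_y - \varphi_y^{0}) \right\},
\end{aligned}
$$
where $u_{yij} = \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} y_{i,j-1}$ or $u_{yij} = \varphi_{y0} + \varphi_{y1} y_{ij} + \varphi_{y2} x_{ij1} + \varphi_{y3} x_{ij2} + \cdots + \varphi_{y,k+1} x_{ijk}$ (with $x_{ij1}, x_{ij2}, \ldots, x_{ijk}$ the covariates subject to missingness).
(4) The conditional distribution of $\varphi_{xk}$ is
$$
p(\varphi_{xk} \mid X, r_x) \propto \exp\left\{ \sum_{i=1}^{n} \sum_{j=1}^{n_i} \left( r_{xijk} v_{xijk} - \log(1 + \exp(v_{xijk})) \right) - \frac{1}{2} (\varphi_{xk} - \varphi_{xk}^{0})^T C^{-1} (\varphi_{xk} - \varphi_{xk}^{0}) \right\},
$$
where $v_{xijk} = \varphi_{xk0} + \varphi_{xk1} x_{ij1} + \cdots + \varphi_{xkk} x_{ijk} + \varphi_{xk,k+1} r_{xij1} + \cdots + \varphi_{xk,2k-1} r_{xij,k-1}$ or $v_{xijk} = \varphi_{xk0} + \varphi_{xk1} x_{ij1} + \cdots + \varphi_{xkk} x_{ijk} + \varphi_{xk,k+1} y_{ij}$.
(5) The conditional distribution of $\alpha$ is
$$
\begin{aligned}
p(\alpha \mid X, r_x) &\propto p(X \mid \alpha, r_x)\, p(\alpha) \\
&\propto \prod_{i=1}^{n} \prod_{j=1}^{n_i} p(x_{ijq} \mid x_{ij1}, \ldots, x_{ij,q-1}, \alpha_q)^{r_{xijq}} \cdots p(x_{ij2} \mid x_{ij1}, \alpha_2)^{r_{xij2}} \times p(x_{ij1} \mid \alpha_1)^{r_{xij1}}\, p(\alpha) \\
&= \prod_{i=1}^{n} \prod_{j=1}^{n_i} \prod_{k=1}^{q} p(x_{ijk} \mid x_{ij1}, \ldots, x_{ij,k-1}, \alpha_k)^{r_{xijk}}\, p(\alpha) \\
&\propto \prod_{i=1}^{n} \prod_{j=1}^{n_i} \prod_{k=1}^{q} p(x_{ijk} \mid x_{ij1}, \ldots, x_{ij,k-1}, \alpha_k)^{r_{xijk}} \exp\left\{ -\frac{1}{2} (\alpha - \alpha^{0})^T D^{-1} (\alpha - \alpha^{0}) \right\}.
\end{aligned}
$$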
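(6) According to (13), $x_{ij1} \sim N(\mu_{ij1}, \sigma_{x1}^2)$, and the prior is $\sigma_{x1}^2 \sim IG(a_{x1}, b_{x1})$. Then the conditional distribution of $\sigma_{x1}^2$ is
$$
\begin{aligned}
p(\sigma_{x1}^2 \mid x_{ij1}, \alpha_1) &\propto (\sigma_{x1}^2)^{-a_{x1} - 1} \exp\left( -\frac{b_{x1}}{\sigma_{x1}^2} \right) \prod_{i=1}^{n} \prod_{j=1}^{n_i} \frac{1}{\sqrt{2\pi \sigma_{x1}^2}} \exp\left\{ -\frac{(x_{ij1} - \mu_{ij1})^2}{2 \sigma_{x1}^2} \right\} \\
&\propto (\sigma_{x1}^2)^{-a_{x1} - \frac{n n_i}{2} - 1} \exp\left\{ -\frac{1}{\sigma_{x1}^2} \left[ b_{x1} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n_i} (x_{ij1} - \mu_{ij1})^2 \right] \right\}.
\end{aligned}
$$
Clearly, $\sigma_{x1}^2 \mid x_{ij1}, \alpha_1 \sim IG\left( a_{x1} + \frac{n n_i}{2},\; b_{x1} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n_i} (x_{ij1} - \mu_{ij1})^2 \right)$, where $IG$ denotes the inverse gamma distribution. In addition, the conditional distributions of $\sigma_{x2}^2, \sigma_{x3}^2, \ldots, \sigma_{xm}^2$ have the same form as that of $\sigma_{x1}^2$.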
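Because the conditional in item (6) is a standard inverse gamma distribution, this Gibbs step can be drawn directly. A minimal sketch, assuming the residuals $x_{ij1} - \mu_{ij1}$ have already been collected into a flat list (the function and argument names are hypothetical):

```python
import random


def draw_sigma2(residuals, a, b, rng):
    """Draw sigma^2 from IG(a + N/2, b + 0.5 * sum(e^2)), N = number of residuals."""
    shape = a + 0.5 * len(residuals)
    rate = b + 0.5 * sum(e * e for e in residuals)
    # If G ~ Gamma(shape, scale=1), then rate / G ~ InverseGamma(shape, rate).
    return rate / rng.gammavariate(shape, 1.0)
```

Here the length of `residuals` plays the role of the observation count written as $n n_i$ in the expression above.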
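The conditional distributions of the nonparametric part used in the Gibbs sampling algorithm are as follows.
(1) The logarithmic conditional distribution of $\xi$ is
$$
\log p(\xi \mid U, Y, X, T, \beta, p, \phi, b, \Sigma) \propto -\frac{1}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{y_{ij} > 0} \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} - \frac{1}{2 \tau_\xi^2} \xi^T Q \xi.
$$
(2) The conditional distribution of $\tau_\xi^2$ is
$$
p(\tau_\xi^2 \mid \xi, \delta) \propto (\tau_\xi^2)^{-\frac{\mathrm{rank}(Q)}{2} - a_\tau - 1} \exp\left\{ -\frac{1}{2 \tau_\xi^2} \xi^T Q \xi - \frac{b_\tau}{\tau_\xi^2} \right\}.
$$
Clearly, $\tau_\xi^2 \mid \xi, \delta \sim IG\left( \frac{\mathrm{rank}(Q)}{2} + a_\tau,\; \frac{1}{2} \xi^T Q \xi + b_\tau \right)$, where $IG$ denotes the inverse gamma distribution.
(3) The conditional distribution of $\delta_\rho$ is
$$
p(\delta_\rho \mid \xi, \tau_\xi^2) \propto (\delta_\rho)^{a_\delta - 1/2} \exp\left\{ -\delta_\rho \left( \frac{(\xi_\rho - \xi_{\rho-1})^2}{2 \tau_\xi^2} + b_\delta \right) \right\}.
$$
Clearly, $\delta_\rho \mid \xi, \tau_\xi^2 \sim \Gamma\left( a_\delta + \frac{1}{2},\; \frac{(\xi_\rho - \xi_{\rho-1})^2}{2 \tau_\xi^2} + b_\delta \right)$, where $\Gamma$ denotes the gamma distribution.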
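The two conjugate updates in items (2) and (3) can be sketched together. The sketch below rests on an assumption that is not spelled out in this appendix: the penalty is a first-order random walk with local weights, so that $\xi^T Q \xi = \sum_\rho \delta_\rho (\xi_\rho - \xi_{\rho-1})^2$ and $\mathrm{rank}(Q) = H - 1$. The function name and argument layout are hypothetical.

```python
import random


def update_smoothing(xi, delta, a_tau, b_tau, a_delta, b_delta, rng):
    """One Gibbs pass for tau^2 and the local weights delta_2, ..., delta_H."""
    diffs = [xi[k] - xi[k - 1] for k in range(1, len(xi))]
    # Assumed penalty form: xi' Q xi = sum_rho delta_rho * (xi_rho - xi_{rho-1})^2.
    quad = sum(d * e * e for d, e in zip(delta, diffs))
    # tau^2 ~ IG(rank(Q)/2 + a_tau, quad/2 + b_tau), drawn as rate / Gamma(shape, 1).
    tau2 = (b_tau + 0.5 * quad) / rng.gammavariate(a_tau + 0.5 * len(diffs), 1.0)
    # delta_rho ~ Gamma(a_delta + 1/2, rate); gammavariate expects scale = 1 / rate.
    new_delta = [
        rng.gammavariate(a_delta + 0.5, 1.0 / (b_delta + e * e / (2.0 * tau2)))
        for e in diffs
    ]
    return tau2, new_delta
```

Note that Python's `random.gammavariate` is parameterised by shape and scale, so the rate that appears in the gamma conditional must be inverted before it is passed in.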
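Finally, the conditional distributions of the other parameters used in the Gibbs sampling algorithm are as follows.
(1) The logarithmic conditional distribution of $\beta$ is
$$
\log p(\beta \mid U, Y, X, T, \phi, p, b, \Sigma, \xi) \propto -\frac{1}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{y_{ij} > 0} \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} - \frac{1}{2} (\beta - \beta^{0})^T A^{-1} (\beta - \beta^{0}).
$$
(2) The logarithmic conditional distribution of $p$ is
$$
\begin{aligned}
\log p(p \mid U, Y, X, T, \beta, \phi, b, \Sigma, \xi) \propto{} & -\frac{1}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{y_{ij} > 0} \left\{ \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} + \log \Gamma\!\left(u_{ij} \frac{2-p}{p-1}\right) \right\} \\
& + \sum_{y_{ij} > 0} u_{ij} \left\{ \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} - \frac{\log \phi}{p-1} - \log(2-p) \right\} - \frac{\left( \log \frac{p-1}{2-p} \right)^2}{2 \times 10{,}000}.
\end{aligned}
$$
(3) The logarithmic conditional distribution of $\phi$ is
$$
\log p(\phi \mid U, Y, X, T, \beta, p, b, \Sigma, \xi) \propto -\frac{1}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{y_{ij} > 0} \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} - \sum_{y_{ij} > 0} u_{ij} \frac{\log \phi}{p-1} - \frac{(\log \phi)^2}{2 \times 10{,}000}.
$$
(4) The logarithmic conditional distribution of $b$ is
$$
\log p(b \mid U, Y, X, T, \beta, p, \phi, \xi, \Sigma) \propto -\frac{1}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{y_{ij} > 0} \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} - \sum_{i=1}^{n} \frac{1}{2} b_i^T \Sigma^{-1} b_i.
$$
Thus, $\log p(b_i \mid U, Y, X, T, \beta, p, \phi, \xi, \Sigma)$ is proportional to
$$
-\frac{1}{\phi} \sum_{j=1}^{n_i} \frac{\mu_{ij}^{2-p}}{2-p} - \sum_{j=1}^{n_i} \frac{y_{ij}}{\phi (p-1) \mu_{ij}^{p-1}} I_{\{y_{ij} > 0\}} - \frac{1}{2} b_i^T \Sigma^{-1} b_i.
$$
(5) The conditional distribution of $\Sigma$ is
$$
p(\Sigma \mid U, Y, X, T, \beta, p, \phi, b, \xi) \propto |\Sigma|^{-\frac{\rho_0 + n + d + 1}{2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left[ \Sigma^{-1} \left( \sum_{i=1}^{n} b_i b_i^T + R_0 \right) \right] \right\}.
$$
Clearly, $\Sigma \mid U, Y, X, T, \beta, p, \phi, b, \xi \sim IW_d\left( n + \rho_0,\; R_0 + \sum_{i=1}^{n} b_i b_i^T \right)$, where $IW_k(\cdot, \cdot)$ denotes the $k$-dimensional inverse Wishart distribution.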
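(6) The logarithmic conditional distribution of $U$ is
$$
\log p(U \mid Y, \phi, p) \propto -\sum_{y_{ij} > 0} \left\{ \log \Gamma\!\left(u_{ij} \frac{2-p}{p-1}\right) + \log u_{ij}! + \log y_{ij} \right\} + \sum_{y_{ij} > 0} u_{ij} \left\{ \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} - \frac{\log \phi}{p-1} - \log(2-p) \right\},
$$
thus,
$$
\log p(u_{ij} \mid Y, \phi, p) \propto -\left\{ \log \Gamma\!\left(u_{ij} \frac{2-p}{p-1}\right) + \log u_{ij}! + \log y_{ij} \right\} I_{\{y_{ij} > 0\}} + u_{ij} \left\{ \frac{2-p}{p-1} \log \frac{y_{ij}}{p-1} - \frac{\log \phi}{p-1} - \log(2-p) \right\} I_{\{y_{ij} > 0\}}.
$$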
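For intuition about the conditional of $u_{ij}$ in item (6), the sketch below evaluates its unnormalised log mass and draws from it by enumerating a truncated support. This enumeration is a swapped-in illustration, not the Metropolis–Hastings update with a zero-truncated Poisson proposal that the paper uses; the cut-off `u_max` and the function names are assumptions.

```python
import math
import random


def log_cond_u(u, y, phi, p):
    """Unnormalised log p(u | y, phi, p) for y > 0 and u >= 1 (terms free of u dropped)."""
    alpha = (2 - p) / (p - 1)
    return (-math.lgamma(u * alpha) - math.lgamma(u + 1)
            + u * (alpha * math.log(y / (p - 1))
                   - math.log(phi) / (p - 1) - math.log(2 - p)))


def sample_u(y, phi, p, rng, u_max=200):
    """Draw u by normalising the conditional over 1..u_max; u = 0 when y = 0."""
    if y == 0:
        return 0
    support = range(1, u_max + 1)
    logw = [log_cond_u(u, y, phi, p) for u in support]
    top = max(logw)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logw]
    return rng.choices(support, weights=weights)[0]
```

The mass decays faster than geometrically in $u$ because of the $\log \Gamma$ and $\log u!$ terms, so a moderate cut-off is usually enough for illustration, although an appropriate value depends on $y$, $\phi$, and $p$.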

Appendix B. Metropolis–Hastings Algorithm

To implement the Metropolis–Hastings algorithm, suppose that the values of $u_{ij}$, $\beta$, $p$, $\phi$, $b_i$, and $\xi$ at the $l$th iteration are $u_{ij}^{(l)}$, $\beta^{(l)}$, $p^{(l)}$, $\phi^{(l)}$, $b_i^{(l)}$, and $\xi^{(l)}$. The proposal distributions for the new samples $u_{ij}^{*}$, $\beta^{*}$, $p^{*}$, $\phi^{*}$, $b_i^{*}$, and $\xi^{*}$ are chosen as the zero-truncated Poisson distribution $f(u_{ij}^{(l)}; \sigma_u^2, \lambda \mid u_{ij}^{(l)} > 0)$, the multivariate normal distribution $N(\beta^{(l)}, \sigma_\beta^2 \Omega_\beta)$, the normal distribution $\log \frac{p-1}{2-p} \sim N(p^{(l)}, \sigma_p^2)$, the normal distribution $\log \phi \sim N(\phi^{(l)}, \sigma_\phi^2)$, the normal distribution $N(b_i^{(l)}, \sigma_{b_i}^2 \Omega_{b_i})$, and the multivariate normal distribution $N(\xi^{(l)}, \sigma_\xi^2 \Omega_\xi)$, respectively, where $\lambda$ denotes the mean parameter of the Poisson distribution, $N$ denotes the normal distribution, and $\sigma_u^2$, $\sigma_\beta^2$, $\sigma_p^2$, $\sigma_\phi^2$, $\sigma_{b_i}^2$, and $\sigma_\xi^2$ are tuning parameters. Furthermore, $\Omega_\beta$, $\Omega_{b_i}$, and $\Omega_\xi$ are given by
$$
\begin{aligned}
\Omega_\beta &= \left[ \frac{2-p}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \mu_{ij}^{2-p} x_{ij}^T x_{ij} + \frac{p-1}{\phi} \sum_{y_{ij} > 0} y_{ij} \mu_{ij}^{1-p} x_{ij}^T x_{ij} + A^{-1} \right]^{-1}, \\
\Omega_{b_i} &= \left[ \frac{2-p}{\phi} \sum_{j=1}^{n_i} \mu_{ij}^{2-p} + \frac{p-1}{\phi} \sum_{j=1}^{n_i} y_{ij} \mu_{ij}^{1-p} I_{\{y_{ij} > 0\}} + 1 \right]^{-1}, \\
\Omega_\xi &= \left[ \frac{2-p}{\phi} \sum_{i=1}^{n} \sum_{j=1}^{n_i} \mu_{ij}^{2-p} B^T(t_{ij}) B(t_{ij}) + \frac{p-1}{\phi} \sum_{y_{ij} > 0} y_{ij} \mu_{ij}^{1-p} B^T(t_{ij}) B(t_{ij}) + \frac{Q}{\tau_\xi^2} \right]^{-1},
\end{aligned}
$$
where $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. Finally, the acceptance probabilities for $u_{ij}$, $\beta$, $p$, $\phi$, $b_i$, and $\xi$ in the Metropolis–Hastings algorithm are, respectively,
$$
\begin{aligned}
&\min\left\{ \frac{p(u_{ij}^{*} \mid Y, p, \phi)\, f(u_{ij}^{(l)}; \sigma_u^2, \lambda \mid u_{ij}^{(l)} > 0)}{p(u_{ij}^{(l)} \mid Y, p, \phi)\, f(u_{ij}^{*}; \sigma_u^2, \lambda \mid u_{ij}^{*} > 0)},\, 1 \right\}, \quad
\min\left\{ \frac{p(\beta^{*} \mid Y, X, Z, T, \phi, p, b, \Sigma, \xi)}{p(\beta^{(l)} \mid Y, X, Z, T, \phi, p, b, \Sigma, \xi)},\, 1 \right\}, \\
&\min\left\{ \frac{p(p^{*} \mid U, Y, X, Z, T, \beta, \phi, b, \Sigma, \xi)}{p(p^{(l)} \mid U, Y, X, Z, T, \beta, \phi, b, \Sigma, \xi)},\, 1 \right\}, \quad
\min\left\{ \frac{p(\phi^{*} \mid U, Y, X, Z, T, \beta, p, b, \Sigma, \xi)}{p(\phi^{(l)} \mid U, Y, X, Z, T, \beta, p, b, \Sigma, \xi)},\, 1 \right\}, \\
&\min\left\{ \frac{p(b_i^{*} \mid U, Y, X, Z, T, \beta, p, \phi, \xi, \Sigma)}{p(b_i^{(l)} \mid U, Y, X, Z, T, \beta, p, \phi, \xi, \Sigma)},\, 1 \right\}, \quad
\min\left\{ \frac{p(\xi^{*} \mid U, Y, X, Z, T, \beta, p, \phi, b, \Sigma)}{p(\xi^{(l)} \mid U, Y, X, Z, T, \beta, p, \phi, b, \Sigma)},\, 1 \right\}.
\end{aligned}
$$
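For the parameters whose acceptance probability above reduces to a ratio of conditional densities, a single update can be sketched as a generic random-walk Metropolis–Hastings step. The sketch is scalar and deliberately simplified: the function name is hypothetical, the proposal is a plain normal with standard deviation `proposal_sd` rather than the curvature-scaled covariances $\sigma^2 \Omega$, and `log_target` stands for any of the unnormalised log conditionals of Appendix A (for $p$ and $\phi$, written in terms of the transformed quantity being proposed).

```python
import math
import random


def mh_step(current, log_target, proposal_sd, rng):
    """One random-walk Metropolis-Hastings update for a scalar parameter."""
    proposal = current + rng.gauss(0.0, proposal_sd)
    # Symmetric proposal: the acceptance ratio is a ratio of target densities.
    log_ratio = log_target(proposal) - log_target(current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return current, False
```

Working on the log scale avoids underflow when the conditional density is very small, and the returned flag makes it easy to monitor the acceptance rate while adjusting the tuning parameter.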

References

  1. Olsen, M.K.; Schafer, J.L. A two-part random-effects model for semicontinuous longitudinal data. J. Am. Stat. Assoc. 2001, 96, 730–745.
  2. Berk, K.; Lachenbruch, P.A. Repeated measures with zeros. Stat. Methods Med. Res. 2002, 11, 303–316.
  3. Tooze, J.A.; Grunwald, G.K.; Jones, R.H. Analysis of repeated measures data with clumping at zero. Stat. Methods Med. Res. 2002, 11, 341–355.
  4. Su, L.; Tom, B.D.; Farewell, V.T. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics 2009, 10, 374–389.
  5. Su, L.; Tom, B.D.; Farewell, V.T. A likelihood-based two-part marginal model for longitudinal semicontinuous data. Stat. Methods Med. Res. 2015, 24, 194–205.
  6. Liu, L.; Strawderman, R.L.; Johnson, B.A.; O’Quigley, J.M. Analyzing repeated measures semi-continuous data, with application to an alcohol dependence study. Stat. Methods Med. Res. 2016, 25, 133–152.
  7. Zhou, X.X.; Kang, K.; Song, X.Y. Two-part hidden Markov models for semicontinuous longitudinal data with nonignorable missing covariates. Stat. Med. 2020, 39, 1801–1816.
  8. Hasan, M.T.; Yan, G.H.; Ma, R.J. Analysis of periodic patterns of daily precipitation through simultaneous modeling of its serially observed occurrence and amount. Environ. Ecol. Stat. 2014, 21, 811–824.
  9. Yan, G.H.; Ma, R.J. Modelling occurrence and quantity of longitudinal semicontinuous data simultaneously with nonparametric unobserved heterogeneity. Can. J. Stat. 2023, in press.
  10. Zhang, Y.W. Likelihood-based and Bayesian Methods for Tweedie Compound Poisson Linear Mixed Models. Stat. Comput. 2013, 23, 743–757.
  11. Swallow, B.; Buckland, S.T.; King, R.; Toms, M.P. Bayesian hierarchical modelling of continuous non-negative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter. Biom. J. 2016, 58, 357–371.
  12. Ye, T.; Lachos, V.H.; Wang, X.J.; Dey, D.K. Comparisons of zero-augmented continuous regression models from a Bayesian perspective. Stat. Med. 2021, 40, 1073–1100.
  13. Ibrahim, J.G.; Lipsitz, S.R.; Chen, M.H. Missing covariates in generalized linear models when the missing data mechanism is non-ignorable. J. R. Stat. Soc. Ser. B 1999, 61, 173–190.
  14. Ibrahim, J.G.; Chen, M.H.; Lipsitz, S.R. Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika 2001, 88, 551–564.
  15. Huang, L.; Chen, M.H.; Ibrahim, J.G. Bayesian analysis for generalized linear models with nonignorably missing covariates. Biometrics 2005, 61, 767–780.
  16. Lee, S.Y.; Tang, N.S. Bayesian analysis of nonlinear structural equation models with nonignorable missing data. Psychometrika 2006, 71, 541–564.
  17. Tang, N.S.; Zhao, H. Bayesian analysis of nonlinear reproductive dispersion mixed models for longitudinal data with nonignorable missing covariates. Commun. Stat. Simul. Comput. 2014, 43, 1265–1287.
  18. Tang, N.S.; Chow, S.M.; Ibrahim, J.G.; Zhu, H.T. Bayesian sensitivity analysis of a nonlinear dynamic factor analysis model with nonparametric prior and possible nonignorable missingness. Psychometrika 2017, 82, 875–903.
  19. Wang, Z.Q.; Tang, N.S. Bayesian quantile regression with mixed discrete and nonignorable missing covariates. Bayesian Anal. 2020, 15, 579–604.
  20. Wang, X.Q.; Song, X.Y.; Zhu, H.T. Bayesian latent factor on image regression with nonignorable missing data. Stat. Med. 2021, 40, 920–932.
  21. Ma, R.; Jørgensen, B. Nested generalized linear mixed models: An orthodox best linear unbiased predictor approach. J. R. Stat. Soc. Ser. B 2007, 69, 625–641.
  22. Lang, S.; Brezger, A. Bayesian P-splines. J. Comput. Graph. Stat. 2004, 13, 183–212.
  23. Tanner, M.A.; Wong, W.H. The Calculation of Posterior Distributions by Data Augmentation. J. Am. Stat. Assoc. 1987, 82, 528–540.
  24. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 721–741.
  25. Chen, H.C.; Wehrly, T.E. Assessing correlation of clustered mixed outcomes from a multivariate generalized linear mixed model. Stat. Med. 2015, 34, 704–720.
  26. Chen, H.C.; Wehrly, T.E. Approximate uniform shrinkage prior for a multivariate generalized linear mixed model. J. Multivar. Anal. 2016, 145, 148–161.
Figure 1. Histogram for the observed WOMAC numeric score in the OAI dataset.
Figure 2. The estimated function and true function of $g(t)$ for three priors: type I (left panel), type II (middle panel), and type III (right panel) in the first simulation.
Figure 3. The estimated function and true function of $g(t)$ for three priors: type I (left panel), type II (middle panel), and type III (right panel) in the second simulation.
Figure 4. Nonparametric estimate of effects of age on the WOMAC numeric score in the OAI dataset.
Table 1. Symbols and description.

| Symbol | Description |
|---|---|
| $U$ | A Poisson distribution random variable |
| $\mu$ | The mean parameter of Tweedie compound Poisson distribution |
| $\phi$ | The dispersion parameter of Tweedie compound Poisson distribution |
| $p$ | The power parameter of Tweedie compound Poisson distribution |
| $\beta$ | A $q \times 1$ vector of unknown regression parameters |
| $b_i$ | A $d \times 1$ vector of random effects ($b_i \sim N_d(0, \Sigma)$) |
| $\Sigma$ | A $d \times d$ covariance matrix |
| $g(\cdot)$ | An unknown nonparametric function |
| $\xi$ | An $H \times 1$ vector of B-spline coefficients ($\xi = (\xi_1, \ldots, \xi_H)^T$) |
| $B(\cdot)$ | The B-spline basis function |
| $\tau_\xi^2$ | A global smoothing parameter |
| $Q$ | The $H \times H$ penalty matrix with elements $\delta = (\delta_2, \ldots, \delta_H)^T$ |
| $Y_o$ | Set of observed values of response variable $Y$ ($Y_o = \{y_{o1}, \ldots, y_{on}\}$) |
| $Y_m$ | Set of missing values of response variable $Y$ ($Y_m = \{y_{m1}, \ldots, y_{mn}\}$) |
| $X_o$ | Set of observed values of covariates $X$ ($X_o = \{x_{o11}, \ldots, x_{o1n_1}, \ldots, x_{on1}, \ldots, x_{onn_n}\}$) |
| $X_m$ | Set of missing values of covariates $X$ ($X_m = \{x_{m11}, \ldots, x_{m1n_1}, \ldots, x_{mn1}, \ldots, x_{mnn_n}\}$) |
| $Z$ | Vector of covariates relating to random effects ($Z = \{z_{ij} : i = 1, \ldots, n,\ j = 1, \ldots, n_i\}$) |
| $T$ | Vector of time effects ($T = \{t_{ij} : i = 1, \ldots, n,\ j = 1, \ldots, n_i\}$) |
| $r$ | Vector of indicator variables relating to missing data mechanism ($r = \{r_y, r_x\}$) |
| $\varphi$ | Vector of parameters relating to missing data mechanism ($\varphi = \{\varphi_x, \varphi_y\}$) |
| $\alpha$ | The parameter in the covariates' distribution ($\alpha = (\alpha_1, \ldots, \alpha_m)$) |
| $\sigma_m^2$ | The parameter in the covariates' distribution ($\sigma_m^2 = \{\sigma_{xk}^2 : k = 1, \ldots, m\}$) |
Table 2. Sample data from the OAI study (M denotes the missing data).

| ID | Month | WOMAC Score (response variable) | BMI | SEX | AGE |
|---|---|---|---|---|---|
| 9019406 | 0 | 0 | 23.5 | Male | 71 |
| 9019406 | 12 | 1 | 22.6 | Male | 72 |
| 9019406 | 24 | 0 | 22.9 | Male | 73 |
| 9019406 | 36 | M | M | Male | 74 |
| 9019406 | 48 | M | M | Male | 75 |
| 9025191 | 0 | 1 | 24.2 | Female | 55 |
| 9025191 | 12 | 0 | 24.7 | Female | 56 |
| 9025191 | 24 | 1.06 | 24.7 | Female | 57 |
| 9025191 | 36 | 1 | 25.4 | Female | 58 |
| 9025191 | 48 | 0 | 25.4 | Female | 59 |
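Table 3. Bayesian estimates of parameters in the first simulation study.

| Par. | Type I Bias | Type I SD | Type I RMS | Type II Bias | Type II SD | Type II RMS | Type III Bias | Type III SD | Type III RMS |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_1$ | 0.0007 | 0.0795 | 0.0787 | −0.0068 | 0.0664 | 0.0661 | 0.0042 | 0.0763 | 0.0756 |
| $\beta_2$ | −0.0011 | 0.0509 | 0.0504 | 0.0026 | 0.0567 | 0.0562 | 0.0045 | 0.0553 | 0.0549 |
| $\beta_3$ | 0.0004 | 0.0723 | 0.0716 | −0.0051 | 0.0678 | 0.0673 | −0.0068 | 0.0815 | 0.0810 |
| $p$ | −0.0143 | 0.0265 | 0.0299 | −0.0180 | 0.0227 | 0.0288 | −0.0205 | 0.0206 | 0.0289 |
| $\phi$ | −0.0247 | 0.0473 | 0.0529 | −0.0355 | 0.0408 | 0.0538 | −0.0371 | 0.0406 | 0.0547 |
| $\Sigma$ | −0.0355 | 0.0974 | 0.1028 | −0.0195 | 0.0826 | 0.0841 | −0.0354 | 0.0871 | 0.0932 |
| $\alpha_{10}$ | 0.0193 | 0.0653 | 0.0675 | 0.0159 | 0.0673 | 0.0685 | 0.0293 | 0.0749 | 0.0797 |
| $\alpha_{11}$ | −0.0031 | 0.0866 | 0.0858 | −0.0148 | 0.0650 | 0.0660 | 0.0023 | 0.0813 | 0.0805 |
| $\alpha_{20}$ | 0.0195 | 0.0677 | 0.0698 | 0.0290 | 0.0724 | 0.0773 | −0.0021 | 0.0733 | 0.0726 |
| $\alpha_{21}$ | 0.0152 | 0.1205 | 0.1203 | −0.0075 | 0.1125 | 0.1116 | 0.0041 | 0.1479 | 0.1464 |
| $\alpha_{22}$ | −0.0171 | 0.0963 | 0.0968 | 0.0144 | 0.0827 | 0.0831 | −0.0123 | 0.1036 | 0.1033 |
| $\varphi_{y0}$ | 0.0016 | 0.1453 | 0.1439 | −0.0672 | 0.1680 | 0.1794 | −0.0565 | 0.1603 | 0.1685 |
| $\varphi_{y1}$ | −0.0156 | 0.0592 | 0.0607 | 0.0003 | 0.0468 | 0.0463 | 0.0027 | 0.0539 | 0.0535 |
| $\varphi_{y2}$ | −0.0005 | 0.0640 | 0.0633 | 0.0090 | 0.0520 | 0.0522 | 0.0054 | 0.0569 | 0.0566 |
| $\varphi_{x10}$ | −0.0119 | 0.1308 | 0.1301 | −0.0802 | 0.1660 | 0.1828 | −0.0622 | 0.1568 | 0.1673 |
| $\varphi_{x11}$ | −0.0275 | 0.1552 | 0.1561 | 0.0095 | 0.2089 | 0.2070 | 0.0126 | 0.1889 | 0.1874 |
| $\varphi_{x20}$ | −0.0139 | 0.1512 | 0.1503 | −0.0461 | 0.2133 | 0.2162 | −0.0886 | 0.1723 | 0.1921 |
| $\varphi_{x21}$ | −0.0323 | 0.1915 | 0.1923 | −0.0283 | 0.1914 | 0.1915 | 0.0616 | 0.2577 | 0.2624 |
| $\varphi_{x22}$ | 0.0088 | 0.1319 | 0.1309 | 0.0270 | 0.1426 | 0.1437 | −0.0375 | 0.1528 | 0.1558 |
| $\varphi_{x23}$ | −0.0452 | 0.2707 | 0.2717 | 0.0018 | 0.3127 | 0.3096 | −0.0170 | 0.4563 | 0.4521 |
| $\sigma_{x1}^2$ | 0.0200 | 0.0170 | 0.0262 | 0.0245 | 0.0182 | 0.0304 | 0.0215 | 0.0198 | 0.0291 |
| $\sigma_{x2}^2$ | 0.0237 | 0.0233 | 0.0330 | 0.0275 | 0.0214 | 0.0347 | 0.0299 | 0.0220 | 0.0370 |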
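Table 4. Bayesian estimation of parameters in the second simulation study.

| Par. | Type I Bias | Type I SD | Type I RMS | Type II Bias | Type II SD | Type II RMS | Type III Bias | Type III SD | Type III RMS |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_1$ | −0.0073 | 0.0794 | 0.0790 | −0.0050 | 0.0730 | 0.0724 | 0.0029 | 0.0717 | 0.0711 |
| $\beta_2$ | −0.0123 | 0.0551 | 0.0559 | 0.0119 | 0.0560 | 0.0567 | −0.0044 | 0.0577 | 0.0573 |
| $\beta_3$ | 0.0121 | 0.0764 | 0.0766 | −0.0124 | 0.0786 | 0.0788 | 0.0003 | 0.0711 | 0.0704 |
| $p$ | −0.0182 | 0.0274 | 0.0327 | −0.0132 | 0.0255 | 0.0285 | −0.0140 | 0.0237 | 0.0273 |
| $\phi$ | −0.0298 | 0.0444 | 0.0531 | −0.0262 | 0.0406 | 0.0480 | −0.0267 | 0.0416 | 0.0491 |
| $\Sigma$ | −0.0343 | 0.0737 | 0.0806 | −0.0423 | 0.0904 | 0.0990 | −0.0545 | 0.0870 | 0.1019 |
| $\alpha_{10}$ | 0.0885 | 0.0785 | 0.1177 | 0.0837 | 0.0595 | 0.1023 | 0.0852 | 0.0748 | 0.1129 |
| $\alpha_{11}$ | 0.0057 | 0.0775 | 0.0770 | 0.0180 | 0.0627 | 0.0646 | 0.0106 | 0.0781 | 0.0780 |
| $\alpha_{20}$ | −0.0466 | 0.0834 | 0.0948 | −0.0603 | 0.0661 | 0.0889 | −0.0458 | 0.0711 | 0.0840 |
| $\alpha_{21}$ | −0.0126 | 0.1330 | 0.1322 | −0.0493 | 0.1282 | 0.1362 | −0.0392 | 0.1376 | 0.1418 |
| $\alpha_{22}$ | 0.0065 | 0.1015 | 0.1007 | 0.0071 | 0.1039 | 0.1031 | 0.0001 | 0.0993 | 0.0983 |
| $\varphi_{y0}$ | 0.0094 | 0.1394 | 0.1383 | −0.0769 | 0.1987 | 0.2112 | −0.1102 | 0.1917 | 0.2194 |
| $\varphi_{y1}$ | 0.0070 | 0.0509 | 0.0508 | 0.0108 | 0.0498 | 0.0505 | 0.0167 | 0.0453 | 0.0479 |
| $\varphi_{y2}$ | −0.0251 | 0.1807 | 0.1806 | −0.0055 | 0.1977 | 0.1958 | 0.0114 | 0.2291 | 0.2271 |
| $\varphi_{y3}$ | 0.0317 | 0.1131 | 0.1164 | −0.0203 | 0.1385 | 0.1386 | −0.0355 | 0.1435 | 0.1464 |
| $\varphi_{x10}$ | −0.0540 | 0.1440 | 0.1524 | −0.1013 | 0.1761 | 0.2016 | −0.0449 | 0.1860 | 0.1895 |
| $\varphi_{x11}$ | −0.0042 | 0.1922 | 0.1903 | 0.0322 | 0.2089 | 0.2093 | −0.0226 | 0.2019 | 0.2011 |
| $\varphi_{x12}$ | 0.0085 | 0.0505 | 0.0507 | 0.0094 | 0.0698 | 0.0697 | 0.0061 | 0.0563 | 0.0561 |
| $\varphi_{x20}$ | −0.0337 | 0.2063 | 0.2070 | −0.0246 | 0.2234 | 0.2225 | −0.0077 | 0.2564 | 0.2540 |
| $\varphi_{x21}$ | −0.0478 | 0.2089 | 0.2123 | 0.0187 | 0.2069 | 0.2057 | 0.0169 | 0.2182 | 0.2167 |
| $\varphi_{x22}$ | 0.0007 | 0.1472 | 0.1457 | −0.0201 | 0.1322 | 0.1324 | 0.0045 | 0.1744 | 0.1727 |
| $\varphi_{x23}$ | −0.0221 | 0.1341 | 0.1346 | −0.0780 | 0.1768 | 0.1916 | −0.1029 | 0.2064 | 0.2288 |
| $\sigma_{x1}^2$ | 0.0273 | 0.0219 | 0.0349 | 0.0230 | 0.0194 | 0.0300 | 0.0251 | 0.0216 | 0.0330 |
| $\sigma_{x2}^2$ | 0.0353 | 0.0306 | 0.0465 | 0.0336 | 0.0297 | 0.0447 | 0.0364 | 0.0260 | 0.0446 |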
Table 5. Bayesian estimates and standard errors in the real example.

| Parameter | Est | SD |
|---|---|---|
| $\beta_0$ | −0.492835 | 0.0778 |
| $\beta_1$ | 0.058405 | 0.0024 |
| $\beta_2$ | 0.141541 | 0.0370 |
| $p$ | 1.258745 | 0.0031 |
| $\phi$ | 3.146945 | 0.0245 |
| $\Sigma$ | 1.840797 | 0.0550 |
| $\alpha_{10}$ | 21.004105 | 0.3076 |
| $\alpha_{11}$ | 4.520023 | 0.1779 |
| $\varphi_{Y0}$ | −2.694960 | 0.0196 |
| $\varphi_{Y1}$ | 0.000522 | 0.0024 |
| $\varphi_{Y2}$ | 0.042958 | 0.0023 |
| $\varphi_{BMI0}$ | −1.338632 | 0.1937 |
| $\varphi_{BMI1}$ | −0.017045 | 0.0067 |
| $\sigma_{BMI}^2$ | 30.067984 | 0.5088 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wu, Z.; Duan, X.; Zhang, W. Bayesian Analysis of Tweedie Compound Poisson Partial Linear Mixed Models with Nonignorable Missing Response and Covariates. Entropy 2023, 25, 506. https://doi.org/10.3390/e25030506