Small Area Estimation under Poisson–Dirichlet Process Mixture Models

Qiu, Xiang; Ke, Qinchun; Zhou, Xueqin; Liu, Yulu

doi:10.3390/axioms13070432

Open Access Article


¹ School of Science, Shanghai Institute of Technology, Shanghai 201418, China

² Shanghai Institute of Applied Mathematics and Mechanics, School of Mechanics and Engineering Science, Shanghai University, Shanghai 200072, China

* Author to whom correspondence should be addressed.

Axioms 2024, 13(7), 432; https://doi.org/10.3390/axioms13070432

Submission received: 18 April 2024 / Revised: 15 June 2024 / Accepted: 25 June 2024 / Published: 27 June 2024


Abstract

In this paper, we propose an improved Nested Error Regression model in which the random effects for each area are given a prior distribution using the Poisson–Dirichlet Process. Based on this model, we construct parameter estimators using the Empirical Bayesian (EB) method, combining Maximum Likelihood Estimation (MLE) with a Markov chain Monte Carlo algorithm to estimate the model parameters jointly. The viability of the model is verified using numerical simulation, and the proposed model is applied to an actual small area estimation problem. Compared to the conventional linear model with normal random effects, the proposed model estimates complex real-world data more accurately, which makes it suitable for a broader range of application contexts.

Keywords:

small area estimation; nested error regression models; Bayesian nonparametric estimation; Poisson–Dirichlet process; MCMC algorithm

MSC:

62G05; 62J05

1. Introduction

Small area estimation has received increasing attention due to the growing demand for reliable estimates in governmental departments, enterprises, agriculture, commerce, and socioeconomic fields. The problem of improving estimation accuracy and obtaining reliable estimates of subpopulation parameters is known as small area estimation. The term “small area” usually refers to a small geographic area, such as a county, city, or state, or to a small population group that is cross-classified by demographic characteristics, such as a certain age, gender, or ethnic group. Since most sample surveys are designed to estimate parameters for the overall population, they impose no particular requirements on sample sizes for subgroups, so the parameter estimates for subgroups may not be as accurate as desired. Furthermore, if the sample size of a survey is chosen to achieve a specified level of accuracy on a large scale, there may not be enough resources to conduct a second survey aimed at achieving a similar level of accuracy on a smaller scale.

The application of indirect estimation, in particular model-based estimation for small areas, to address this issue has been widely accepted. By recognizing that several small areas are similar in certain respects, we can construct reliable small area estimates with the help of well-defined models, thus borrowing relevant auxiliary information from other variables in the area, from neighboring small areas, or even from other sources, such as population census data. This concept has greatly contributed to the development of small area estimation, and various models and methods have been proposed to estimate small area parameters by borrowing auxiliary information. For reviews, see Rao [1] and Pfeffermann [2]. Rao and Molina [3] give a detailed description of the models and methods for small area estimation.

Fay and Herriot [4] were the first to propose an area level model, the Fay-Herriot model, to solve the problem of small area estimation of per capita income in the USA. Battese, Harter, and Fuller [5] were the first to adopt the unit level model, the Nested Error Regression Model, which combines agricultural survey data and satellite data, to estimate the average acreage of crops in twelve counties in the state of Iowa in the United States. With the growing demand for small area estimation, there has been a large amount of literature on extending these two base models to accommodate different data structures and requirements. For example, Serena et al. [6] provided two extensions of the Fay–Herriot model. The first one is a multivariate extension, which jointly models the survey estimates of two or more different but related demographic characteristics; the second extension is to build a functional measurement error model into the original Fay–Herriot model to consider the case of covariates with errors. Yang and Chen [7] improved the Nested Error Regression model by clustering small areas based on their centers to obtain a new model and estimate the model parameters based on the model.

The accuracy and precision of small area estimators depend on the validity of the model. In this context, we pay particular attention to an important underlying assumption of the model, namely the normality of random effects. This assumption is often adopted for computational convenience rather than because it is reasonable for the data at hand. It is also difficult to check in practice because it involves unobservable quantities. Therefore, it is necessary to investigate the flexible modeling of random effects. Datta et al. [8] considered the case where random effects do not exist and proposed a bootstrap method for hypothesis testing that allows us to determine the presence or absence of random effects in the model, since including random effects increases the variability of the estimates when the actual data structure does not contain them. Sugasawa and Kubokawa [9] also proposed a Nested Error Regression model that utilizes uncertain random effects and gave estimates of the corresponding model parameters. Ferrante and Pacei [10] considered the asymmetry of the data they examined and thus relaxed the normality assumption of the Fay–Herriot model by adopting a skewed distribution. They proposed a multivariate skewed small area model and applied the model to the business statistics of firms. Fabrizi and Trivisano [11] considered two extensions of the Fay–Herriot model, both of which concern the assumptions on the random effects: an exponential power distribution and a skewed power distribution. Chakraborty et al. [12] proposed a mixture model based on the Fay–Herriot model with random effects following a two-component normal mixture. Diallo and Rao [13] addressed the skewed distribution of the response variable and suggested replacing the assumption of normality for both random effects and errors with a skew normal distribution. Tsujino and Kubokawa [14] investigated a model in which the random effects remain normal but the errors follow a skew normal distribution, and they gave an expression for predicting the random effects.

In small area estimation, parametric mixed models are usually used. However, parametric models can suffer from model misspecification, which may produce unreliable small area estimates. The application of nonparametric and semiparametric models to small area estimation has been partially considered in the literature. Opsomer et al. [15] used a P-spline approach for the nonparametric estimation of the regression component in a Nested Error Regression model. Polettini [16] used a Dirichlet Process mixture model to construct the random effects in the Fay–Herriot model.

To further extend the Nested Error Regression model, this paper proposes an improved Nested Error Regression model in which the default normality assumption for random effects is replaced by a nonparametric specification, i.e., using the Poisson–Dirichlet Process. In Nested Error Regression models, random effects are typically assumed to be normally distributed. This assumption simplifies the theoretical and computational analysis of the model. However, the normality assumption may not hold in certain cases, thus leading to inaccurate model estimates. To address this limitation, we propose the use of a nonparametric specification to replace the default normality assumption. Nonparametric methods do not rely on specific parameters or distributional forms; instead, they learn their structure or characteristics directly from the data. The Poisson–Dirichlet Process is proposed as a prior distribution for the random effects. This approach differs from the conventional assumption that the random effects in each area follow a fixed distribution. Instead, the model allows the distribution of these random effects to change adaptively based on the data. The Poisson–Dirichlet Process is a two-parameter generalization of the Dirichlet Process, where, in addition to the concentration parameter, an additional parameter called the discount parameter is also added. Similar to the Dirichlet Process, samples from the Poisson–Dirichlet Process correspond to discrete distributions that have the same support as its underlying distribution. The underlying distribution of the Poisson–Dirichlet Process is the Poisson–Dirichlet Distribution introduced by Pitman and Yor [17]. Due to its properties, we can apply it to the prior distribution of random effects, which has an unobservable variable. There are many studies related to the Dirichlet Process and the Poisson–Dirichlet Process in Bayesian nonparametrics. For example, Al-labadi et al. 
[18], by applying Bayesian estimation of the Dirichlet Process, proposed a novel methodology for computing varentropy and varextropy. Handa [19] studied a two-parameter Poisson–Dirichlet Process based on point process theory. Favaro et al. [20] developed a two-parameter Poisson–Dirichlet Process model for dealing with the problem of the predictive sampling of species or kinds. Performing exact Bayesian inference on nonparametric models is always challenging, because it is difficult to derive the posterior distribution. This drives us to use the Markov chain Monte Carlo (MCMC) algorithm [21] for approximate inference.

The paper is organized as follows: Section 2 reviews the classical Nested Error Regression model and the Poisson–Dirichlet Process. Section 3 describes the proposed model. In Section 4, the methods corresponding to the estimation of each parameter in the model and the algorithm flow are given. In Section 5 and Section 6, the feasibility of the model and the corresponding parameter estimation methods are investigated by applying them to simulated and real data. The article is briefly summarized in the last section.

2. Theoretical Background

2.1. Nested Error Regression Models

Battese, Harter, and Fuller [5] published a seminal paper that contributed to the popularization of model-based small area estimation in government statistics. Model-based small area estimation requires the use of appropriate regression models that relate area target variables to appropriate auxiliary variables obtained from other surveys and records administered by government agencies. These authors presented Nested Error Regression (NER) models, as well as a unit-level model that combines satellite data and farm-level survey observations to determine the average corn and soybean acreage in twelve counties in Iowa, USA.

Suppose that the population of interest U is partitioned into m mutually independent small areas $U_{i}$, where the total size is $N = \sum_{i = 1}^{m} N_{i}$, and the sample size is $n = \sum_{i = 1}^{m} n_{i}$. The sequence $(y_{i 1}, x_{i 1}), \dots, (y_{i n_{i}}, x_{i n_{i}})$ gives the target variable and the corresponding covariates for the sample units $j = 1, \dots, n_{i}$ in the ith small area, where $i = 1, 2, \dots, m$ indexes the small areas, and $x_{i j} = (1, x_{i j 1}, \dots, x_{i j (p - 1)})$ is a known p-dimensional vector of covariates. Battese et al. [5] proposed the following normal NER model:

y_{i j} = x_{i j}^{'} β + ν_{i} + ε_{i j}

(1)

where $β = (β_{0}, \dots, β_{p - 1})$ denotes the unknown p-dimensional vector of the regression coefficients, and $ν_{i}$ and $ε_{i j}$ denote the area-specific random effect and the error, respectively, for the ith small area. It is assumed that the random effects and errors follow normal distributions, $ν_{i} \sim N (0, σ_{ν}^{2})$ and $ε_{i j} \sim N (0, σ_{ε}^{2})$, and that they are independent of each other. We frequently seek to estimate the means $ρ = (ρ_{1}, \dots, ρ_{m})$ of the target variable for all small areas through the small area model, also known as the small area means. Assuming that the covariate mean ${\bar{x}}_{i}$ is available for each small area, the ith small area mean $ρ_{i}$ is defined by the following formula:

ρ_{i} = {\bar{x}}_{i}^{'} β + ν_{i}

(2)

The hierarchical Bayesian approach to predicting $ρ_{i}$ based on the model in Equation (1) assigns a prior distribution to the unknown model parameter $ψ = (β, σ_{ν}^{2}, σ_{ε}^{2})$.

The NER model can be described as follows:

Conditional on $ρ = (ρ_{1}, \dots, ρ_{m})$ and $ψ = (β, σ_{ν}^{2}, σ_{ε}^{2})$, the estimator $Y_{i} \sim N (ρ_{i}, σ_{ν}^{2})$, for $i = 1, 2, \dots, m$, independently;
Conditional on $ψ = (β, σ_{ν}^{2}, σ_{ε}^{2})$, the small area means $ρ_{i} \sim N ({\bar{x}}_{i}^{'} β, σ_{ν}^{2})$, for $i = 1, 2, \dots, m$, independently;
The model parameters $ψ = (β, σ_{ν}^{2}, σ_{ε}^{2})$ are given a prior distribution with a density $π (ψ)$.
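
To make Equations (1) and (2) concrete, the following minimal sketch simulates data from the normal NER model. It is an illustration only: the function name `simulate_ner`, the parameter values, and the area sizes are hypothetical, and the sample covariate mean is used as a stand-in for the area covariate mean in Equation (2).

```python
import random

def simulate_ner(x_by_area, beta, sigma2_nu, sigma2_eps, rng):
    """Simulate y_ij from Equation (1) and return the area means of Equation (2)."""
    y_by_area, rho = [], []
    for x_rows in x_by_area:
        nu_i = rng.gauss(0.0, sigma2_nu ** 0.5)  # one random effect per area
        y_by_area.append([
            sum(b * x for b, x in zip(beta, row)) + nu_i
            + rng.gauss(0.0, sigma2_eps ** 0.5)
            for row in x_rows
        ])
        # Assumption: the sample covariate mean stands in for x_bar_i.
        x_bar = [sum(col) / len(x_rows) for col in zip(*x_rows)]
        rho.append(sum(b * x for b, x in zip(beta, x_bar)) + nu_i)
    return y_by_area, rho

rng = random.Random(1)
# Three hypothetical areas with an intercept and one covariate each.
x_by_area = [[[1.0, rng.uniform(0, 10)] for _ in range(n_i)] for n_i in (5, 8, 3)]
y_by_area, rho = simulate_ner(x_by_area, beta=[2.0, 0.5],
                              sigma2_nu=1.0, sigma2_eps=0.25, rng=rng)
```

Every unit in an area shares the same draw of the random effect, which is the feature that the model in Section 3 relaxes.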

2.2. Poisson–Dirichlet Process

The Poisson–Dirichlet Process (PDP), also known as the Pitman–Yor Process, is a two-parameter extension of the Dirichlet Process. Similar to the Dirichlet Process, the Poisson–Dirichlet Process is a distribution placed on top of a distribution. The distribution underlying the Poisson–Dirichlet Process is the Poisson–Dirichlet Distribution. Assume that there exists a pair of parameters

α

and

θ

such that

0 \leq α < 1

and

θ > - α

. Here, we name

α

as the discount parameter and

θ

as the concentration parameter. Let $V_{1}, V_{2}, \dots$ be a sequence of mutually independent random variables, where $V_{k}$ follows the distribution $V_{k} | α, θ \sim B e t a (1 - α, θ + k α)$, and define $(p_{1}, p_{2}, \dots)$ as follows:

p_{1} = V_{1}, p_{k} = V_{k} \prod_{i = 1}^{k - 1} (1 - V_{i})

(3)

Meanwhile,

(p_{1}, p_{2}, \dots)

satisfies

\sum_{k = 1}^{\infty} p_{k} = 1

. If we arrange $(p_{1}, p_{2}, \dots)$ in descending order to obtain $p = ({\tilde{p}}_{1}, {\tilde{p}}_{2}, \dots)$, then p follows a Poisson–Dirichlet distribution, which is denoted as $p \sim P D (α, θ)$. Having defined the Poisson–Dirichlet Distribution, we can formally define the Poisson–Dirichlet Process. Assume that the base distribution

H_{0}

is a probability distribution on the measurable space

(χ, B)

. Let

X_{1}, X_{2}, \dots

be a sequence of independently and identically distributed random variables from the base distribution

H_{0}

, and assume that

p \sim P D (α, θ)

; then, we define the random probability measure G on

(χ, B)

to be the Poisson–Dirichlet Process with parameters

α

and

θ

and base distribution

H_{0}

, which is given by the following:

G (x | α, θ, H_{0}) = \sum_{k = 1}^{\infty} p_{k} δ_{X_{k}} (x)

(4)

where $δ_{X_{k}} (x)$ denotes the Dirac measure degenerate at the point $X_{k}$:

δ_{X_{k}} (x) = \begin{cases} 1, & x = X_{k} \\ 0, & \text{otherwise} \end{cases}

This stick-breaking distribution G is also denoted as $G \sim P D (α, θ; H_{0})$. The parameters $α$ and $θ$ determine the power law properties of the Poisson–Dirichlet Process. In practical modeling, the Poisson–Dirichlet Process is often more appropriate than the Dirichlet Process because it can capture power law behavior, such as that observed in natural language; the case $α = 0$ corresponds to the Dirichlet Process.
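
The stick-breaking construction of Equations (3) and (4) can be sketched as follows. This is a truncated illustration, not an exact sampler: only the first `n_atoms` weights are generated, the function name `stick_breaking_weights` is hypothetical, and a standard normal base distribution $H_{0}$ is assumed purely for the example.

```python
import random

def stick_breaking_weights(alpha, theta, n_atoms, rng):
    """Draw the first n_atoms weights p_1, ..., p_K of Equation (3)."""
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for k in range(1, n_atoms + 1):
        v_k = rng.betavariate(1 - alpha, theta + k * alpha)
        weights.append(v_k * remaining)  # p_k = V_k * prod_{i<k} (1 - V_i)
        remaining *= 1 - v_k
    return weights, remaining

rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=0.5, theta=1.0, n_atoms=50, rng=rng)
# Atoms X_k drawn i.i.d. from the assumed base distribution H_0 = N(0, 1).
atoms = [rng.gauss(0.0, 1.0) for _ in weights]
```

The pairs `(weights[k], atoms[k])` give a truncated version of the discrete random measure G in Equation (4); `leftover` is the mass carried by the atoms that were not generated.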

3. Small Area Model with PDP Random Effects

As in the NER model, in the proposed model, we assume that the observations $y_{i} = (y_{i 1}, y_{i 2}, \dots, y_{i n_{i}}), i = 1, 2, \dots, m$ are associated with random effects $ν_{i} = (ν_{i 1}, \dots, ν_{i n_{i}}), i = 1, 2, \dots, m$. We replace the normal prior distribution of the random effects in the NER model with a Bayesian nonparametric prior; namely, we assume that the components of $ν_{i}$ are independently and identically distributed according to an unknown probability measure $H_{i}, i = 1, 2, \dots, m$. The distribution $H_{i}$ of the random effects, as an unknown quantity, can then be given a Bayesian nonparametric prior. In this section, assuming $0 \leq α < 1$ and $θ > - α$, we use the Poisson–Dirichlet Process with parameters $α$ and $θ$ and a base distribution $H_{0}$ as the prior for the distributions $H_{i}, i = 1, 2, \dots, m$ of the random effects, and we obtain a unit-level small area model with PDP random effects:

y_{i j} = x_{i j}^{'} β + ν_{i j} + ε_{i j}, j = 1, 2, \dots, n_{i}, i = 1, 2, \dots, m

(5)

where $x_{i j}$ is a known p-dimensional vector of auxiliary variables associated with the observation $y_{i j}$, $β$ is an unknown p-dimensional vector of regression coefficients, $ε_{i j}$ is a random error with a mean of 0 and a variance of $V a r (ε_{i j}) = σ_{ε}^{2}$, and $ν_{i j}$ is a random effect whose magnitude reflects the differences between units in different areas; $ε_{i j}$ and $ν_{i j}$ are mutually independent, and they can be expressed as follows:

ε_{i j} \sim N (0, σ_{ε}^{2}), \quad j = 1, 2, \dots, n_{i}; \; i = 1, 2, \dots, m

(6)

ν_{i 1}, \dots, ν_{i n_{i}} | H_{i} \overset{i i d}{\sim} H_{i}, \quad i = 1, 2, \dots, m

(7)

H_{i} \sim P D (α, θ; H_{0})

(8)

where

H_{0}

is the base distribution,

α

is the discount parameter, and

θ

is the concentration parameter. Parameters

α

and

θ

determine not only the power law properties of the Poisson–Dirichlet Process, but also the probability that a new random effect

ν_{i j}

will be sampled if

ν_{i 1}, \dots, ν_{i j - 1}

already exists and given

H_{i}

. In the proposed model, when a new random effect

ν_{i j}

is drawn, it either comes from one of the previous classes of

ν_{i 1}, \dots, ν_{i j - 1}

or a new one from

H_{i}

. If

ν_{i j}

comes from one of the previous classes of

ν_{i 1}, \dots, ν_{i j - 1}

, its probability is positively related to the number of data points contained in this class. The larger the value of the parameter

θ

, the greater the probability that

ν_{i j}

will be drawn from

H_{i}

in a new class. For the random effects variable

ν_{i} = (ν_{i 1}, \dots, ν_{i n_{i}})

, let

ν_{i 1}^{*}, \dots, ν_{i K_{i}}^{*}

be the elements of

ν_{i} = (ν_{i 1}, \dots, ν_{i n_{i}})

that are not identical to each other, let

K_{i}

be the number of classes in

ν_{i} = (ν_{i 1}, \dots, ν_{i n_{i}})

, and let

m_{i k}

be the number of elements in the kth class; then the sampling probability of

ν_{i j}

is as follows:

P r (ν_{i j} = ν_{i k}^{*} | ν_{i 1}, \dots, ν_{i j - 1}) = \frac{m_{i k} - α}{j - 1 + θ}, k = 1, 2, \dots, K_{i}

(9)

P r (ν_{i j} \notin \{ν_{i 1}^{*}, \dots, ν_{i K_{i}}^{*}\} | ν_{i 1}, \dots, ν_{i j - 1}) = \frac{K_{i} α + θ}{j - 1 + θ}

(10)
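
The sequential sampling rule in Equations (9) and (10) can be illustrated with a short sketch. The function name `sample_area_effects`, the parameter values, and the standard normal distribution from which new class values are drawn are assumptions made for this example only.

```python
import random

def sample_area_effects(n_i, alpha, theta, draw_base, rng):
    """Draw nu_i1, ..., nu_in_i sequentially using Equations (9) and (10)."""
    distinct, counts, effects = [], [], []
    for j in range(1, n_i + 1):
        if not distinct:
            k = 0  # the first effect always opens a new class
        else:
            # Unnormalised weights: m_ik - alpha for each existing class and
            # K_i * alpha + theta for a new one; the common denominator
            # j - 1 + theta cancels.
            weights = [m - alpha for m in counts] + [len(distinct) * alpha + theta]
            k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(distinct):
            distinct.append(draw_base(rng))  # value of the new class
            counts.append(1)
        else:
            counts[k] += 1
        effects.append(distinct[k])
    return effects, distinct, counts

rng = random.Random(42)
effects, distinct, counts = sample_area_effects(
    n_i=30, alpha=0.3, theta=1.0,
    draw_base=lambda r: r.gauss(0.0, 1.0),  # assumed base distribution
    rng=rng,
)
```

Here `distinct` plays the role of $ν_{i 1}^{*}, \dots, ν_{i K_{i}}^{*}$ and `counts` the role of $m_{i k}$; larger classes are more likely to be revisited, and a larger $θ$ or $α$ raises the chance of opening a new class.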

The hierarchical structure of our model can be represented as follows.

The NER model with PDP random effects:

Conditional on $ν = (ν_{1}, \dots, ν_{m})$, $ν_{i} = (ν_{i 1}, \dots, ν_{i n_{i}})$, $β$, and $σ_{ε}^{2}$, the observation $y_{i j}$ is distributed as $y_{i j} \sim N (x_{i j}^{'} β + ν_{i j}, σ_{ε}^{2})$, for $j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m$, independently;
Conditional on $α$, $θ$, and $H_{0}$, the random effects $ν$ are given by $ν_{i 1}, \dots, ν_{i n_{i}} | H_{i} \overset{i i d}{\sim} H_{i}$ and $H_{i} \sim P D (α, θ; H_{0})$, for $j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m$, independently.

4. Estimation

In small area estimation, we aim to obtain good estimators of the small area mean $ρ_{i}$. Based on the proposed model, we mainly consider the estimation of the model parameters $β$, $σ_{ε}^{2}$, $α$, $θ$, and $H_{0}$, and of the conditional mean $\tilde{ρ_{i}} (β, σ_{ε}^{2}, α, θ, H_{0}, y_{i j})$ given the data $y_{11}, \dots, y_{m n_{m}}$.

In the small area model with PDP random effects, for the ith small area, the random effects $ν_{i 1}, \dots, ν_{i n_{i}}$ drawn from $ν_{i 1}, \dots, ν_{i n_{i}} | H_{i} \sim H_{i}$ can be divided into mutually exclusive classes with distinct values $ν_{i 1}^{*}, \dots, ν_{i K_{i}}^{*}$. We define $m_{i} = (m_{i 1}, \dots, m_{i K_{i}})$, where $m_{i j}$ denotes the number of elements in the jth class of the ith small area, and $K_{i}$ denotes the number of mutually exclusive classes in $ν_{i 1}, \dots, ν_{i n_{i}}$. Assume that we assign the data points $y_{i 1}, \dots, y_{i n_{i}}$ according to the classes of $ν_{i 1}, \dots, ν_{i n_{i}}$ to obtain $K_{i}$ groups: $(y_{i 1}^{1}, \dots, y_{i m_{i 1}}^{1}), \dots, (y_{i 1}^{K_{i}}, \dots, y_{i m_{i K_{i}}}^{K_{i}})$. The likelihood function of the model parameters $β$, $σ_{ε}^{2}$, $α$, $θ$, and $H_{0}$ is

\begin{matrix} L (β, σ_{ε}^{2}, α, θ, H_{0}) & = E_{ν} (\prod_{j = 1}^{n_{i}} f (y_{i j} | ν_{i j})) = E_{ν} (\prod_{j = 1}^{m_{i 1}} f (y_{i j}^{1} | ν_{i 1}^{*}) \dots \prod_{j = 1}^{m_{i K_{i}}} f (y_{i j}^{K_{i}} | ν_{i K_{i}}^{*})) \\ & = E_{K, m} (\int \prod_{j = 1}^{m_{i 1}} f (y_{i j}^{1} | u) h_{0} (u) d u \dots \int \prod_{j = 1}^{m_{i K_{i}}} f (y_{i j}^{K_{i}} | u) h_{0} (u) d u) \end{matrix}

(11)

where $E_{x}$ denotes the marginal expectation with respect to x, f denotes the normal probability density function, and $h_{0}$ denotes the density function of the base distribution $H_{0}$. This likelihood function is too complex to be maximized analytically or numerically, so we apply empirical Bayesian estimation to estimate the model parameters.

In this section, we apply empirical Bayesian nonparametric methods to estimate the regression coefficients $β$ and the error variance $σ_{ε}^{2}$ of the small area model with PDP random effects, as well as the discount parameter $α$, the concentration parameter $θ$, and the base distribution $H_{0}$ of the Poisson–Dirichlet Process, given the observed data $y_{11}, \dots, y_{1 n_{1}}, \dots, y_{m 1}, \dots, y_{m n_{m}}$. We explain the algorithms used to derive these estimates in detail.

4.1. Proposed Approach

4.1.1. Estimation of Regression Coefficients and Error Variance

The first consideration we make is the estimation of the regression coefficients

β

and the error variance

σ_{ε}^{2}

in the model. Assuming that the random effects

ν_{i j}

are known, by rewriting Equation (5) and defining

y_{i j}^{*}

in the model, we obtain the following:

y_{i j}^{*} = y_{i j} - ν_{i j} = x_{i j}^{'} β + ε_{i j} j = 1, 2, \dots, n_{i}, i = 1, 2, \dots, m

(12)

We consider a matrix representation of the above equation such that

Y^{*} = {(Y_{1}^{*}, \dots, Y_{m}^{*})}^{'}

, and

X = {(X_{1}, \dots, X_{m})}^{'}

is a matrix with column rank

r (X)

and error

ε = {(ε_{1}, \dots, ε_{m})}^{'}

. In this case,

Y_{i}^{*} = {(y_{i 1}^{*}, \dots, y_{i n_{i}}^{*})}^{'}

,

X_{i} = {(x_{i 1}, \dots, x_{i n_{i}})}^{'}

, and

ε_{i} = {(ε_{i 1}, \dots, ε_{i n_{i}})}^{'}

; thus, we again obtain the following:

Y^{*} = X β + ε

(13)

ε \sim N (0, σ_{ε}^{2} I_{n})

(14)

For the above linear regression model, we consider the idea of using the classical algorithm of parameter estimation for solving the regression model, that is, the least squares estimation algorithm, to obtain the estimates of the regression coefficients

β

and the error variance

σ_{ε}^{2}

.

For the regression coefficients

β

, the objective of least squares estimation is to find an estimate of the regression coefficients

\hat{β}

that minimizes the sum of the squared residuals of all sample observations

Q (\hat{β}) = \sum {(Y_{i}^{*} - {\hat{Y}}_{i}^{*})}^{2}

. Expanding the sum of the squared residuals gives

Q (\hat{β}) = Y^{*'} Y^{*} - 2 {\hat{β}}^{'} X^{'} Y^{*} + {\hat{β}}^{'} X^{'} X \hat{β}

(15)

By minimizing the sum of the squared residuals

Q (\hat{β})

and assuming that

{(X^{'} X)}^{- 1}

exists, an estimate of the regression coefficient

β

can be obtained as follows.

\hat{β} = {(X^{'} X)}^{- 1} X^{'} Y^{*}

(16)

For the estimation of the error variance

σ_{ε}^{2}

, we first consider that the error vector

ε = Y^{*} - X β

is an unobservable vector. Suppose we replace

β

with the least squares estimate

\hat{β}

of

β

, thus defining the residual vector

\hat{ε} = Y^{*} - X \hat{β}

. It is natural to consider using the residual sum of squares

R S S = {\hat{ε}}^{'} \hat{ε}

as a measure of the magnitude of

σ_{ε}^{2}

. Substituting Equation (16) into $\hat{ε} = Y^{*} - X \hat{β}$ gives the residual sum of squares in closed form:

R S S = (Y^{*'} Y^{*} - Y^{*'} X {(X^{'} X)}^{- 1} X^{'} Y^{*})

(17)

Furthermore, computing the expectation $E (R S S) = (n - r (X)) σ_{ε}^{2}$ of the $R S S$, we obtain an unbiased estimate of the error variance $σ_{ε}^{2}$:

{\hat{σ}}_{ε}^{2} = \frac{1}{n - r (X)} (Y^{*'} Y^{*} - Y^{*'} X {(X^{'} X)}^{- 1} X^{'} Y^{*})

(18)
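
As an illustration of Equations (16) and (18), the sketch below computes the least squares estimates for a small hypothetical data set. It assumes that X has full column rank, so that $r (X) = p$, and it solves the normal equations by Gaussian elimination instead of forming the inverse explicitly. The function names and the data are made up for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def least_squares(x_rows, y_star):
    """Return (beta_hat, sigma2_hat) following Equations (16) and (18)."""
    n, p = len(x_rows), len(x_rows[0])
    xtx = [[sum(r[a] * r[b] for r in x_rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y for r, y in zip(x_rows, y_star)) for a in range(p)]
    beta_hat = solve(xtx, xty)  # solves X'X beta = X'Y*
    rss = sum((y - sum(b * v for b, v in zip(beta_hat, r))) ** 2
              for r, y in zip(x_rows, y_star))
    sigma2_hat = rss / (n - p)  # assumes r(X) = p
    return beta_hat, sigma2_hat

# Hypothetical adjusted responses y*_ij = y_ij - nu_ij with one covariate.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_star = [1.1, 2.9, 5.2, 6.8]
beta_hat, sigma2_hat = least_squares(x_rows, y_star)
```

In the algorithm of Section 4.2 the random effects are not observed, so these estimates would be evaluated on sampled values of $ν_{i j}$ and then averaged.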

4.1.2. Estimation of the Base Distribution and Two Parameters of the Poisson–Dirichlet Process

We first discuss the estimation of the base distribution

H_{0}

of the Poisson–Dirichlet Process for random effects. Yang and Wu [22] proposed the application of the multivariate kernel density method to estimate the base distribution

H_{0}

under the Dirichlet Process prior; Qiu, Yuan, and Zhou [23] considered applying the multivariate kernel density method to estimate the base distribution

H_{0}

of the Poisson–Dirichlet Process under a multigroup data structure. We can also apply the multivariate kernel density method to realize the estimation of the base distribution of our model.

Assuming that the random effect

ν_{i j}

is known, we can equivalently obtain

ν_{i j}^{*} \sim H_{0}

,

K_{1}, \dots, K_{m}

, and

m_{i j}

. The density function

h_{0}

of the base distribution

H_{0}

can then be estimated using the following equation:

{\tilde{h}}_{0} (\cdot) = \frac{1}{\sum K_{i}} \sum_{i = 1}^{m} \sum_{j = 1}^{K_{i}} ω_{t} (\cdot, ν_{i j}^{*})

(19)

where

ω_{t} (x, ν_{i j}^{*}) = \frac{1}{t} ω (\frac{x - ν_{i j}^{*}}{t})

is some kernel function with bandwidth

t > 0

. We choose the kernel function as a Gaussian kernel function to realize the estimation.
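
A minimal sketch of the kernel estimate in Equation (19) with a Gaussian kernel is given below. The distinct random effect values and the bandwidth are hypothetical; in practice the bandwidth t would have to be chosen by some selection rule, which is not specified here.

```python
import math

def gaussian_kernel_density(distinct_effects, bandwidth):
    """Return the function x -> h0_tilde(x) of Equation (19)."""
    # Pool the distinct values nu*_ij of all areas; their number is sum_i K_i.
    pooled = [v for area in distinct_effects for v in area]
    norm = bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        total = 0.0
        for v in pooled:
            z = (x - v) / bandwidth
            total += math.exp(-0.5 * z * z) / norm  # omega_t(x, nu*_ij)
        return total / len(pooled)

    return density

# Hypothetical distinct random effect values for m = 3 areas.
distinct_effects = [[-0.4, 0.1], [0.3], [0.9, 1.2, -1.0]]
h0_tilde = gaussian_kernel_density(distinct_effects, bandwidth=0.5)
```

Because each kernel integrates to one, the estimate `h0_tilde` is itself a proper density.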

Then, we discuss the estimation of the two parameters

α

and

θ

of the Poisson–Dirichlet Process for random effects. Similar to Carlton [24], who studied the estimation of the parameters of the Poisson–Dirichlet Process for a single set of data, we apply the maximum likelihood method to estimate the parameters

α

and

θ

.

For each small area

i = 1, 2, \dots, m

, define

A_{j}^{i} ⩾ 0

to denote the number of categories containing j individuals in the ith small area, where

\sum_{j = 1}^{n_{i}} A_{j}^{i} = K_{i}

, and

\sum_{j = 1}^{n_{i}} j A_{j}^{i} = n_{i}

. Denote

A^{i} = (A_{1}^{i}, A_{2}^{i}, \dots, A_{n_{i}}^{i}), i = 1, 2, \dots, m

. Then, the log-likelihood function for the parameters $α$ and $θ$ is given below:

\begin{matrix} l (α, θ) & = \sum_{i = 1}^{m} log P r (A^{i} = a^{i}) = \sum_{i = 1}^{m} log N (a^{i}) - \sum_{i = 1}^{m} \sum_{l = 1}^{n_{i} - 1} log (θ + l) \\ + \sum_{i = 1}^{m} \sum_{l = 1}^{K_{i} - 1} log (θ + l α) + \sum_{i = 1}^{m} \sum_{j = 2}^{n_{i} - 1} a_{j}^{i} \sum_{l = 1}^{j - 1} log (l - α) \end{matrix}

(20)

where

A^{i} = a^{i}

is the given observed data, and

N (a^{i}) = \frac{n_{i}!}{\prod_{j = 1}^{n_{i}} {(j!)}^{a_{j}^{i}} (a_{j}^{i})!}

. The maximum likelihood estimates of the parameters

α

and

θ

are obtained by solving the equations:

l_{α} (α, θ) = \frac{\partial l (α, θ)}{\partial α} = \sum_{i = 1}^{m} \sum_{l = 1}^{K_{i} - 1} \frac{l}{θ + l α} - \sum_{i = 1}^{m} \sum_{j = 2}^{n_{i} - 1} a_{j}^{i} \sum_{l = 1}^{j - 1} \frac{1}{l - α} = 0

(21)

l_{θ} (α, θ) = \frac{\partial l (α, θ)}{\partial θ} = - \sum_{i = 1}^{m} \sum_{l = 1}^{n_{i} - 1} \frac{1}{θ + l} + \sum_{i = 1}^{m} \sum_{l = 1}^{K_{i} - 1} \frac{1}{θ + l α} = 0

(22)

Here, we use the numerical method of the Newton–Raphson iteration to obtain the above maximum likelihood estimates

\tilde{α}

and

\tilde{θ}

.
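
The log-likelihood of Equation (20) can be evaluated directly from the class sizes, as in the following sketch. Several choices here are assumptions made for illustration: the constant term $\sum_{i} log N (a^{i})$ is dropped because it does not depend on the parameters, the last term is accumulated class by class (which agrees with Equation (20) unless a single class contains all $n_{i}$ units of an area), the search is restricted to $θ > 0$, and a coarse grid search stands in for the Newton–Raphson iteration. The class sizes are hypothetical.

```python
import math

def pdp_log_likelihood(alpha, theta, class_sizes_by_area):
    """Equation (20) without the constant term sum_i log N(a^i)."""
    total = 0.0
    for sizes in class_sizes_by_area:
        n_i, k_i = sum(sizes), len(sizes)
        total -= sum(math.log(theta + l) for l in range(1, n_i))
        total += sum(math.log(theta + l * alpha) for l in range(1, k_i))
        for m_ik in sizes:  # a class of size j contributes sum_{l<j} log(l - alpha)
            total += sum(math.log(l - alpha) for l in range(1, m_ik))
    return total

# Hypothetical class sizes m_ik for m = 3 areas.
class_sizes = [[5, 2, 1, 1], [3, 3, 1], [7, 1, 1, 1, 1]]

# Grid search over alpha in {0, 0.05, ..., 0.95} and theta in {0.1, ..., 10}.
ll_hat, alpha_hat, theta_hat = max(
    (pdp_log_likelihood(a / 20, t / 10, class_sizes), a / 20, t / 10)
    for a in range(0, 20)
    for t in range(1, 101)
)
```

A grid maximizer of this kind could also serve as a starting point for solving Equations (21) and (22) iteratively.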

4.2. Algorithms

The random effects

ν_{i j}

are unobservable hidden variables, and we construct the following pseudoestimates:

{\tilde{\tilde{h}}}_{0} (\cdot; h_{0}, β, σ_{ε}^{2}, α, θ) = E ({\tilde{h}}_{0} (\cdot, ν) | Y) = E (\frac{1}{\sum K_{i}} \sum_{i = 1}^{m} \sum_{j = 1}^{K_{i}} ω_{t} (\cdot, ν_{i j}^{*}) | Y)

(23)

(\tilde{\tilde{β}}, {\tilde{\tilde{σ}}}_{ε}^{2}, \tilde{\tilde{α}}, \tilde{\tilde{θ}}) (h_{0}, β, σ_{ε}^{2}, α, θ) = E (\tilde{β} (ν), {\tilde{σ}}_{ε}^{2} (ν), \tilde{α} (ν), \tilde{θ} (ν) | Y)

(24)

Given the observed data Y, we can introduce an algorithm that is computed according to the following iterative formula given some initial values of the parameters

h_{0}

,

β

,

σ_{ε}^{2}

,

α

, and

θ

:

{\hat{h}}_{0}^{(r + 1)} (\cdot) = E_{({\hat{h}}_{0}^{(r)}, {\hat{β}}^{(r)}, {\hat{σ}}_{ε}^{2 (r)}, {\hat{α}}^{(r)}, {\hat{θ}}^{(r)})} ({\tilde{h}}_{0} (\cdot, ν) | Y)

(25)

({\hat{β}}^{(r + 1)}, {\hat{σ}}_{ε}^{2 (r + 1)}, {\hat{α}}^{(r + 1)}, {\hat{θ}}^{(r + 1)}) = E_{({\hat{h}}_{0}^{(r)}, {\hat{β}}^{(r)}, {\hat{σ}}_{ε}^{2 (r)}, {\hat{α}}^{(r)}, {\hat{θ}}^{(r)})} (\tilde{β} (ν), {\tilde{σ}}_{ε}^{2} (ν), \tilde{α} (ν), \tilde{θ} (ν) | Y)

(26)

where the subscript $({\hat{h}}_{0}^{(r)}, {\hat{β}}^{(r)}, {\hat{σ}}_{ε}^{2 (r)}, {\hat{α}}^{(r)}, {\hat{θ}}^{(r)})$ indicates that the posterior expectations $E ({\tilde{h}}_{0} (\cdot, ν) | Y)$ and $E (\tilde{β} (ν), {\tilde{σ}}_{ε}^{2} (ν), \tilde{α} (ν), \tilde{θ} (ν) | Y)$ are calculated with the current estimates in place of the unknown parameters $(h_{0}, β, σ_{ε}^{2}, α, θ)$ when estimating the parameters in the $(r + 1)$th iteration. Then, we can obtain the parameter estimates:

({\hat{h}}_{0}, \hat{β}, {\hat{σ}}_{ε}^{2}, \hat{α}, \hat{θ}) = lim_{r \to \infty} ({\hat{h}}_{0}^{(r)}, {\hat{β}}^{(r)}, {\hat{σ}}_{ε}^{2 (r)}, {\hat{α}}^{(r)}, {\hat{θ}}^{(r)})

(27)

However, it is not an easy task to compute the above posterior expectation expression during the iterative process, and we must consider applying the MCMC algorithm to seek its numerical solution. In the following, we will discuss the process of computational estimation of the model, which is divided into three stages, namely the selection of the initial values, the full conditional distribution of the MCMC algorithm, and the sampling and estimation.

4.2.1. Selection of Initial Values

During each iteration, we need to give the initial values of the parameters to achieve the corresponding parameter estimation.

We first consider the initial value of the base distribution

{\hat{h}}_{0}^{(0)}

, assuming that f is a normal density function, and we obtain the maximum likelihood estimate

{\hat{ν}}_{i j} = Y_{i j}

by solving the equation

f (Y_{i j} | {\hat{ν}}_{i j}) = {max}_{u} f (Y_{i j} | u)

. This results in a kernel estimate of

h_{0}

as the initial value of the base distribution

{\hat{h}}_{0}^{(0)}

:

{\hat{h}}_{0}^{(0)} (\cdot) = \frac{1}{n} \sum_{i = 1}^{m} \sum_{j = 1}^{n_{i}} ω_{t} (\cdot, {\hat{ν}}_{i j})

(28)
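The kernel estimate in Equation (28) can be sketched numerically as follows. This is only an illustration under stated assumptions: the kernel ω_t is taken to be Gaussian with bandwidth t, the initial latent values are set to the pooled observations, and the function name `kernel_base_density` and the bandwidth value are hypothetical choices that do not come from the paper.

```python
import numpy as np

def kernel_base_density(grid, nu_hat, t=0.3):
    """Evaluate a Gaussian-kernel estimate of h_0 on `grid`.

    nu_hat : 1-D array of initial latent values (here the pooled Y_ij).
    t      : bandwidth, an assumed tuning constant.
    """
    grid = np.asarray(grid, dtype=float)[:, None]      # shape (G, 1)
    nu_hat = np.asarray(nu_hat, dtype=float)[None, :]  # shape (1, n)
    z = (grid - nu_hat) / t
    kernels = np.exp(-0.5 * z**2) / (t * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)                        # (1/n) * sum of kernels

rng = np.random.default_rng(0)
y = rng.normal(size=600)             # stand-in for the pooled observations Y_ij
grid = np.linspace(-6.0, 6.0, 1201)
h0_init = kernel_base_density(grid, y)
```

A smaller bandwidth follows the data more closely, while a larger one gives a smoother starting density; the value used here is arbitrary.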

For the selection of the two parameters

α

and

θ

, in the absence of information, we can choose

{\hat{α}}_{0}^{(0)}

and

{\hat{θ}}_{0}^{(0)}

to be some random values that satisfy the requirements

0 \leq α < 1

and

θ > - α

. Given

{\hat{h}}_{0}^{(0)}

,

{\hat{α}}_{0}^{(0)}

and

{\hat{θ}}_{0}^{(0)}

, we obtain the hidden variable

ν_{i j}

by drawing it from

P D ({\hat{α}}_{0}^{(0)}, {\hat{θ}}_{0}^{(0)}; {\hat{h}}_{0}^{(0)})

so that we can obtain the initial values

β_{0}^{(0)}

and

{σ_{ε}^{2}}_{0}^{(0)}

of the regression coefficient

β

and the error variance

σ_{ε}^{2}

from the least squares estimation.
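The initialization step can be sketched as below. The sketch assumes that a draw from the Poisson–Dirichlet Process is generated through its generalized Pólya urn (the same predictive rule that appears in Equation (30)), and that the base distribution is replaced by a standard normal stand-in rather than the kernel estimate. The names `draw_pd_urn` and `initial_beta_sigma2` are hypothetical.

```python
import numpy as np

def draw_pd_urn(n, alpha, theta, base_sampler, rng):
    """Draw n latent values via the generalized Polya urn of PD(alpha, theta; h_0)."""
    values, counts, draws = [], [], []
    for _ in range(n):
        k = len(values)
        if k == 0:
            idx = 0                       # the first draw always comes from h_0
        else:
            # Existing value k has weight (m_k - alpha); a new value has (theta + alpha * K).
            w = np.array([m - alpha for m in counts] + [theta + alpha * k])
            idx = rng.choice(k + 1, p=w / w.sum())
        if idx == k:                      # open a new distinct value
            values.append(base_sampler(rng))
            counts.append(1)
        else:
            counts[idx] += 1
        draws.append(values[idx])
    return np.array(draws)

def initial_beta_sigma2(y, X, nu):
    """Least squares fit of (y - nu) on X; returns (beta, sigma2)."""
    target = y - nu
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)                    # covariate values are an assumption
X = np.column_stack([np.ones(n), x])
nu0 = draw_pd_urn(n, alpha=0.2, theta=5.0, base_sampler=lambda r: r.normal(), rng=rng)
# Toy check only: y is built from the same nu0, so the fit should recover beta = (1, 2).
y = 1.0 + 2.0 * x + nu0 + rng.normal(scale=0.1, size=n)
beta0, sigma2_0 = initial_beta_sigma2(y, X, nu0)
```

In the actual algorithm the drawn latent values are independent of the data, so the initial estimates are only a rough starting point that later iterations refine.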

4.2.2. Full Conditional Distributions of the MCMC Algorithm

Given the observations

Y_{i j}

and

ν_{i j}

, we write

ν_{i}^{- j} = (ν_{i 1}, ν_{i 2}, \dots, ν_{i j - 1}, ν_{i j + 1}, \dots, ν_{i n_{i}})

for the vector that remains after removing

ν_{i j}

from

ν_{i} = (ν_{i 1}, \dots, ν_{i n_{i}})

,

K_{i}^{- j}

for the number of distinct values in

ν_{i}^{- j}

, and

m_{i k}

for the number of elements of

ν_{i}^{- j}

that take the k-th distinct value

ν_{i k}^{*}

. To apply the MCMC algorithm to the posterior expectation, we need the full conditional distribution of each hidden variable.

Theorem 1.

For each

j = 1, 2, \dots, n_{i}

, given

ν_{i}^{- j}

and

Y_{i j}

, the conditional distribution of

ν_{i j}

is

(ν_{i j} | ν_{i}^{- j}, Y_{i j}, σ_{ε}^{2}, β, α, θ, h_{0}) \sim q_{0} H_{ν} + \sum_{k = 1}^{K_{i}^{- j}} q_{k} δ (ν_{i k}^{*}, ν_{i j})

(29)

where

q_{0} \propto (θ + α K_{i}^{- j}) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β)

, and

q_{k} \propto (m_{i k} - α) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β)

, thus satisfying the condition

q_{0} + \sum_{k = 1}^{K_{i}^{- j}} q_{k} = 1

;

H_{ν}

denotes the posterior distribution of

ν_{i j}

given observation

Y_{i j}

.

Proof of Theorem 1.

The posterior distribution of the Poisson–Dirichlet Process is known to be

P r (ν_{i j} \in \cdot | ν_{i}^{- j}, α, θ, h_{0}) = \frac{θ + α K_{i}^{- j}}{θ + n_{i} - 1} h_{0} (\cdot) + \frac{1}{θ + n_{i} - 1} \sum_{k = 1}^{K_{i}^{- j}} (m_{i k} - α) δ (ν_{i k}^{*}, \cdot)

(30)

The conditional distribution of

ν_{i j}

given

(ν_{i}^{- j}, Y_{i j})

is then obtained as follows:

\begin{matrix} d H (ν_{i j} | ν_{i}^{- j}, Y_{i j}, σ_{ε}^{2}, β, α, θ, h_{0}) \\ = \frac{d F (ν_{i j}, ν_{i}^{- j}, Y_{i j} | σ_{ε}^{2}, β, α, θ, h_{0})}{\int d F (ν_{i j}, ν_{i}^{- j}, Y_{i j} | σ_{ε}^{2}, β, α, θ, h_{0}) d ν_{i j}} \\ = \frac{f (Y_{i j} | ν_{i j}, ν_{i}^{- j}, σ_{ε}^{2}, β) d F (ν_{i j}, ν_{i}^{- j} | α, θ, h_{0})}{\int f (Y_{i j} | ν_{i j}, ν_{i}^{- j}, σ_{ε}^{2}, β) d F (ν_{i j}, ν_{i}^{- j} | α, θ, h_{0})} \\ = \frac{f (Y_{i j} | ν_{i j}, ν_{i}^{- j}, σ_{ε}^{2}, β) d F (ν_{i j} | ν_{i}^{- j}, α, θ, h_{0}) f (ν_{i}^{- j})}{\int f (Y_{i j} | ν_{i j}, ν_{i}^{- j}, σ_{ε}^{2}, β) d F (ν_{i j} | ν_{i}^{- j}, α, θ, h_{0}) f (ν_{i}^{- j})} \\ = \frac{f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) d F (ν_{i j} | ν_{i}^{- j}, α, θ, h_{0})}{\int f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) d F (ν_{i j} | ν_{i}^{- j}, α, θ, h_{0})} \\ = \frac{(θ + α K_{i}^{- j}) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) h_{0} (ν_{i j}) d ν_{i j} + \sum_{k = 1}^{K_{i}^{- j}} (m_{i k} - α) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) δ (ν_{i k}^{*}, ν_{i j})}{(θ + α K_{i}^{- j}) \int f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) h_{0} (d ν_{i j}) + \sum_{k = 1}^{K_{i}^{- j}} (m_{i k} - α) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β)} \end{matrix}

(31)

We let

\land (Y_{i j}) = \frac{1}{(θ + α K_{i}^{- j}) \int f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) h_{0} (d ν_{i j}) + \sum_{k = 1}^{K_{i}^{- j}} (m_{i k} - α) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β)}

(32)

Then, we obtain

\begin{matrix} d H (ν_{i j} | ν_{i}^{- j}, Y_{i j}) = \land (Y_{i j}) (θ + α K_{i}^{- j}) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) h_{0} (ν_{i j}) d ν_{i j} \\ + \land (Y_{i j}) \sum_{k = 1}^{K_{i}^{- j}} (m_{i k} - α) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β) δ (ν_{i k}^{*}, ν_{i j}) \end{matrix}

(33)

Let

q_{0} = \land (Y_{i j}) (θ + α K_{i}^{- j}) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β)

, and let

q_{k} = \land (Y_{i j}) (m_{i k} - α) f (Y_{i j} | ν_{i j}, σ_{ε}^{2}, β)

, thus satisfying the condition

q_{0} + \sum_{k = 1}^{K_{i}^{- j}} q_{k} = 1

and making

H_{ν} = h_{0} (ν_{i j}) d ν_{i j}

; thus, the conditional distribution of

ν_{i j}

is obtained as follows:

(ν_{i j} | ν_{i}^{- j}, Y_{i j}, σ_{ε}^{2}, β, α, θ, h_{0}) \sim q_{0} H_{ν} + \sum_{k = 1}^{K_{i}^{- j}} q_{k} δ (ν_{i k}^{*}, ν_{i j})

(34)

□
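The mixture form of the full conditional can be illustrated with a small sketch. It assumes a conjugate pair that the paper does not require: a normal likelihood for the residual Y_ij − x_ij'β and a base distribution h_0 = N(0, τ²). Under that assumption, the weight of a fresh value uses the marginal likelihood (the integral in the denominator of Equation (31)), and a fresh value is drawn from the normal posterior of ν_ij given Y_ij. The function names and the numerical inputs are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gibbs_update_nu(resid, nu_others, alpha, theta, sigma2, tau2, rng):
    """One draw of nu_ij given the other latent values of the same area.

    resid     : Y_ij - x_ij' beta (scalar).
    nu_others : 1-D array holding nu_i^{-j}.
    tau2      : variance of the assumed base distribution h_0 = N(0, tau2).
    """
    uniq, counts = np.unique(nu_others, return_counts=True)
    k = len(uniq)
    # Fresh value: (theta + alpha * K) times the marginal likelihood under h_0.
    q0 = (theta + alpha * k) * normal_pdf(resid, 0.0, sigma2 + tau2)
    # Existing distinct values: (m_k - alpha) times the likelihood at that value.
    qk = (counts - alpha) * normal_pdf(resid, uniq, sigma2)
    probs = np.append(qk, q0)
    probs = probs / probs.sum()
    choice = rng.choice(k + 1, p=probs)
    if choice < k:
        return uniq[choice], probs
    # Normal-normal posterior of nu_ij given the residual.
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    post_mean = tau2 / (sigma2 + tau2) * resid
    return rng.normal(post_mean, np.sqrt(post_var)), probs

rng = np.random.default_rng(1)
nu_others = np.array([0.4, 0.4, 0.4, -1.2])
new_nu, probs = gibbs_update_nu(0.35, nu_others, alpha=0.5, theta=1.0,
                                sigma2=0.01, tau2=1.0, rng=rng)
```

In this toy call the residual 0.35 lies close to the existing value 0.4, which already has three members, so most of the probability mass goes to joining that value rather than opening a new one.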

4.2.3. Sampling and Estimation

Given the initial values of the parameters and given the full conditional distribution of the MCMC algorithm, we present the sampling stage of the iterative computation. For all

r = 0, 1, \dots

, with the parameter estimates

({\hat{α}}^{(r)}, {\hat{θ}}^{(r)}, {\hat{h}}_{0}^{(r)}, β^{(r)}, {σ_{ε}^{2}}^{(r)})

already obtained for the rth iteration, we consider sampling from the posterior distribution of the hidden variable

ν_{i j}

. The sampling phase is divided into two steps: the first step is Gibbs sampling based on the full conditional distribution of the MCMC algorithm, and the second step is to consider the use of an accelerated step, that is, to consider the introduction of an auxiliary parameter to sample

ν_{i j}^{*}

at the end of each iteration. This is because if the sampling is done directly based on the full conditional distribution of the MCMC algorithm, the problem may arise that f and

h_{0}

are not conjugate, or the MCMC chain is slow to converge when

\sum_{k = 1}^{K_{i}^{- j}} q_{k}

is relatively large with respect to

q_{0}

. Introducing auxiliary parameters addresses both of these problems.

To start, we draw

ν_{i j}^{b}

from

P D ({\hat{α}}^{(r)}, {\hat{θ}}^{(r)}; {\hat{h}}_{0}^{(r)})

, for

b = 0, 1, 2, \dots, B

, where B is the number of repetitions within an iteration and is chosen sufficiently large. We then update

ν_{i j}^{b}

in two steps using an auxiliary variable, based on Equation (34) and following the method of Neal [25].

Define the auxiliary variable

c (ν_{i}^{b}) = (c_{1}, c_{2}, \dots, c_{n_{i}})

to denote which category

ν_{i j}^{b}

belongs to in

ν_{i}^{* b} = (ν_{i 1}^{* b}, ν_{i 2}^{* b}, \dots, ν_{i K_{i}}^{* b}, 0, \dots, 0)

, and use

m^{b} = (m_{i 1}, m_{i 2}, \dots, m_{i K_{i}}, 0, \dots, 0)

to denote the number of occurrences of

ν_{i j}^{b}

in

ν_{i}^{b}

. Assuming that

ν_{i}^{b}

,

c (ν_{i}^{b})

,

ν_{i}^{* b}

, and

m^{b}

have been obtained, we first update the auxiliary variables.

For the update of

c_{i j}, j = 1, 2, \dots, n_{i}

, suppose that

m_{i c_{j}} = m_{i c_{j}} - 1

if

c_{i j}

is removed; at this point,

K_{i}^{- j} = K_{i} - 1

if

m_{i c_{j}} = 0

. Otherwise, the value of

K_{i}^{- j}

is the same as that of

K_{i}

. Then, the new

c_{i j}

is drawn again, and the distribution of the new

c_{i j}

is taken as follows:

\begin{matrix} q_{c} & = P r (c_{i j} = c | Y_{i j}, c_{i}^{- j}, ν_{i 1}^{*}, \dots, ν_{i K_{i}^{- j}}^{*}) \\ = \{\begin{matrix} w (m_{i c} - {\hat{α}}^{(r)}) f (Y_{i j} | ν_{i c}^{*}, {\hat{σ}}_{ε}^{2 (r)}, {\hat{β}}^{(r)}), & 1 ⩽ c ⩽ n_{i}, m_{i c} \neq 0 \\ w ({\hat{θ}}^{(r)} + {\hat{α}}^{(r)} K_{i}^{- j}) f (Y_{i j} | ν_{i c}^{*}, {\hat{σ}}_{ε}^{2 (r)}, {\hat{β}}^{(r)}), & 1 ⩽ c ⩽ n_{i}, m_{i c} = 0 \end{matrix} \end{matrix}

(35)

where w is a normalization parameter designed to satisfy the condition

\sum_{j = 0}^{n_{i}} q_{j} = 1

.
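The label update in Equation (35) can be sketched as follows, in the spirit of the auxiliary-parameter approach of Neal [25]. The sketch makes several assumptions that are not taken from the paper: a normal likelihood for the residual, a single auxiliary (empty) component per update whose value is drawn from h_0 (or reused if the unit was alone in its component), weights computed on the log scale for numerical stability, and θ > 0. All function and variable names are hypothetical.

```python
import numpy as np

def normal_logpdf(x, mean, var):
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def update_label(j, labels, values, resid_j, alpha, theta, sigma2, base_sampler, rng):
    """Resample the label c_ij of unit j within one area.

    labels : int array, labels[u] indexes the distinct value of unit u.
    values : array of distinct latent values nu*_ik for this area.
    Returns compact (labels, values) and the normalized weights q.
    """
    labels = np.asarray(labels, dtype=int)
    values = np.asarray(values, dtype=float)
    counts = np.bincount(np.delete(labels, j), minlength=len(values))  # m_ic without unit j
    occupied = np.flatnonzero(counts > 0)
    k_minus = len(occupied)                                            # K_i^{-j}
    # Auxiliary component: reuse the vacated value if unit j was a singleton,
    # otherwise draw a candidate value from the base distribution.
    aux_value = values[labels[j]] if counts[labels[j]] == 0 else base_sampler(rng)
    cand_values = np.append(values[occupied], aux_value)
    prior_w = np.append(counts[occupied] - alpha, theta + alpha * k_minus)
    log_q = np.log(prior_w) + normal_logpdf(resid_j, cand_values, sigma2)
    q = np.exp(log_q - log_q.max())
    q = q / q.sum()                                # plays the role of the constant w
    pick = rng.choice(len(q), p=q)
    # Keep the auxiliary value only if it was selected.
    new_values = cand_values if pick == k_minus else cand_values[:k_minus]
    remap = {int(old): new for new, old in enumerate(occupied)}
    new_labels = np.array([remap.get(int(c), -1) for c in labels])
    new_labels[j] = pick
    return new_labels, new_values, q

rng = np.random.default_rng(2)
labels = np.array([0, 0, 1, 2])
values = np.array([0.5, -1.0, 2.0])
new_labels, new_values, q = update_label(
    j=3, labels=labels, values=values, resid_j=0.45,
    alpha=0.5, theta=1.0, sigma2=0.01,
    base_sampler=lambda r: r.normal(), rng=rng)
```

Here unit 3 starts alone in its component, so K_i^{-j} drops by one when it is removed; its residual is close to the first distinct value, and the update moves it there and discards the vacated component.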

Through the sampling stage, we can obtain

ν_{i}^{b}, b = 1, 2, \dots, B

. Next, we can estimate the parameters based on the obtained hidden variables

ν_{i}^{b}

. We obtain the estimates of each parameter

({\tilde{h}}_{0} (ν_{i}^{b}), {\tilde{σ}}_{ε}^{2} (ν_{i}^{b}), \tilde{β} (ν_{i}^{b}), \tilde{α} (ν_{i}^{b}), \tilde{θ} (ν_{i}^{b}))

according to the algorithm mentioned earlier; then, we can obtain the

r + 1

th iteration estimates:

({\hat{h}}_{0}^{(r + 1)} (\cdot), {\hat{σ}}_{ε}^{2 (r + 1)}, {\hat{β}}^{(r + 1)}, {\hat{α}}^{(r + 1)}, {\hat{θ}}^{(r + 1)}) = \frac{1}{B} \sum_{b = 1}^{B} ({\tilde{h}}_{0} (ν_{i}^{b}), {\tilde{σ}}_{ε}^{2} (ν_{i}^{b}), \tilde{β} (ν_{i}^{b}), \tilde{α} (ν_{i}^{b}), \tilde{θ} (ν_{i}^{b}))

(36)
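The averaging in Equation (36) is a plain Monte Carlo mean over the B sampled latent vectors. A minimal sketch, assuming that the scalar estimates are stored row by row and that each base density estimate is evaluated on a common grid; the numbers are placeholders for illustration only and the function name is hypothetical.

```python
import numpy as np

def average_over_draws(scalar_draws, h0_draws):
    """Average per-draw estimates over b = 1, ..., B.

    scalar_draws : array of shape (B, P), one row of scalar estimates per draw.
    h0_draws     : array of shape (B, G), each row a density estimate on a common grid.
    """
    scalar_draws = np.asarray(scalar_draws, dtype=float)
    h0_draws = np.asarray(h0_draws, dtype=float)
    return scalar_draws.mean(axis=0), h0_draws.mean(axis=0)

# Placeholder values (columns: sigma2, beta0, beta1, alpha, theta).
scalar_draws = [[0.011, 1.02, 1.98, 0.48, 9.6],
                [0.009, 0.99, 2.03, 0.52, 10.3],
                [0.010, 1.01, 2.00, 0.50, 10.1]]
h0_draws = [[0.1, 0.4, 0.1],
            [0.2, 0.3, 0.2],
            [0.0, 0.5, 0.1]]
next_params, next_h0 = average_over_draws(scalar_draws, h0_draws)
```

The averaged values then serve as the plug-in parameters for the next iteration.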

5. Simulation

This section presents simulation results to study the performance of the proposed parameter estimators under a small area model with PDP random effects.

5.1. Model Setup and Simulation Conditions

We first design a finite population containing

m = 20

small areas, and we draw a sample from each small area. For convenience, we set the sample size of each small area to

n_{i} = 30

.

Then, we provide the following model

y_{i j} = β_{0} + x_{i j_{1}} β_{1} + ν_{i j} + ε_{i j}, j = 1, 2, \dots, n_{i}, i = 1, 2, \dots, m

(37)

ν_{i 1}, \dots, ν_{i n_{i}} | H_{i} \overset{i i d}{\sim} H_{i} i = 1, 2, \dots, m

(38)

H_{i} \sim P D (α, θ; H_{0})

(39)

ε_{i j} \sim N (0, σ_{ε}^{2}) j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m

(40)

The basic simulation assumptions are as follows:

Five choices of parameters $(α, θ)$ are: $(0.5, 10)$ , $(0.3, 5)$ , $(0.9, 3)$ , $(0.4, 7)$ , or $(0.5, 2)$ , and the base distribution is set to be $h_{0} \sim N (0, 1)$ , $h_{0} \sim t (5)$ , or $h_{0} \sim t (2)$ ;
The error $ε_{i j}$ comes from a normal distribution $N (0, 0 . 1^{2})$ ;
The true value of the regression coefficient is set to $β = {(β_{0}, β_{1})}^{'} = {(1, 2)}^{'}$ ;
The initial parameter values $({\hat{α}}^{(0)}, {\hat{θ}}^{(0)})$ are set to be $(0.2, 5)$ , $(0.2, 2)$ , $(0.5, 1)$ , $(0.1, 5)$ , or $(0.1, 1)$ when $(α, θ)$ is $(0.5, 10)$ , $(0.3, 5)$ , $(0.9, 3)$ , $(0.4, 7)$ , or $(0.5, 2)$ ;
The initial random effects are set to ${\hat{ν}}_{i j}^{(0)} = y_{i j}$ ;
The number of iterations for the MCMC algorithm is set to $R = 200$ .
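The simulation settings listed above can be collected in a small configuration sketch. The pairing of each (α, θ) with a base distribution follows the panel labels of Figure 1; the class name `Scenario`, the constant names, and the helper `sample_base` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass(frozen=True)
class Scenario:
    alpha: float
    theta: float
    base: str                 # "normal" for N(0, 1) or "t" for a Student t
    df: Optional[float]       # degrees of freedom when base == "t"
    alpha_init: float
    theta_init: float

SCENARIOS = [
    Scenario(0.5, 10.0, "normal", None, 0.2, 5.0),
    Scenario(0.3, 5.0, "normal", None, 0.2, 2.0),
    Scenario(0.9, 3.0, "normal", None, 0.5, 1.0),
    Scenario(0.4, 7.0, "t", 5.0, 0.1, 5.0),
    Scenario(0.5, 2.0, "t", 2.0, 0.1, 1.0),
]

M_AREAS = 20          # number of small areas
N_PER_AREA = 30       # sample size per area
BETA = (1.0, 2.0)     # true regression coefficients
SIGMA_EPS = 0.1       # standard deviation of the error term
R_ITER = 200          # number of iterations

def sample_base(scn, size, rng):
    """Draw `size` values from the scenario's base distribution h_0."""
    if scn.base == "normal":
        return rng.normal(0.0, 1.0, size=size)
    return rng.standard_t(scn.df, size=size)

rng = np.random.default_rng(0)
atoms = sample_base(SCENARIOS[3], size=5, rng=rng)
```

Every scenario and every initial pair satisfies the constraints 0 ≤ α < 1 and θ > −α.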

5.2. Simulation Results and Analysis

The simulation results are given in the following figures and tables. Table 1 reports the estimation results for different true values of the parameters

α

and

θ

and for different base distributions under the PDP prior for the random effects in the proposed model. The first column in Table 1 lists the estimated parameters, the second column gives the base distribution for each case, the third column gives the true values of the parameters, the fourth and fifth columns give the bias and the MSE, and the sixth column gives the 95% confidence interval. Figure 1 shows the density estimation curves for the parameters

α

and

θ

under different simulation data.
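The summary measures in Table 1 can be computed along the following lines. The source does not state how the confidence interval is formed; the sketch assumes a normal-approximation interval for the mean of the estimates, which is consistent with the tabulated intervals being centered at the true value plus the bias. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def summarize_estimates(estimates, true_value, z=1.96):
    """Return bias, MSE, and an approximate 95% interval for the mean estimate."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    mse = np.mean((est - true_value) ** 2)
    half_width = z * est.std(ddof=1) / np.sqrt(len(est))
    return bias, mse, (est.mean() - half_width, est.mean() + half_width)

# Toy inputs for illustration only.
bias, mse, ci = summarize_estimates([0.4, 0.5, 0.6], true_value=0.5)
```

Because such an interval describes the mean of the estimates rather than the parameter itself, it narrows as the number of draws grows and can exclude the true value even when the bias is small.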

In Table 1, we calculated the bias, MSE, and confidence intervals of

\hat{α}

and

\hat{θ}

for different types of base distributions. We assumed that the base distribution follows the normal distribution

N (0, 1)

and the t distribution, respectively. Here, we chose t distributions with five and two degrees of freedom as the base distributions for estimation. The results of these estimators obtained under two different types of base distribution conditions were similar. The biases of

\hat{α}

were inside the interval

(- 0.073, 0.091)

, and the biases of

\hat{θ}

were inside the interval

(- 0.901, 0.486)

. The MSEs of

\hat{α}

were less than 0.04, whereas the MSEs of

\hat{θ}

were large, and most of the estimated confidence intervals failed to capture the true values of the parameters. Meanwhile, we considered the bias, MSE, and confidence intervals of

\hat{α}

and

\hat{θ}

with different true values for

α

and

θ

under the condition where the base distribution follows a normal distribution or a t distribution. From Table 1, we can see that when the base distribution followed the normal distribution and

(α, θ) = (0.5, 10)

, the bias and MSE of

\hat{θ}

were larger than those of other cases. The bias and MSE of

\hat{θ}

were the smallest when the base distribution followed

t (2)

and

(α, θ) = (0.5, 2)

. The density curve of

\hat{α}

and

\hat{θ}

in Figure 1 also agrees with the estimated results of Table 1.

Table 1 lists the estimation results of the regression coefficients given different values of the parameters

α

and

θ

and different base distribution conditions. The biases and MSEs of

β_{0}

and

β_{1}

were maintained at very low levels, which reflect the accuracy and reliability of the estimates. Figure 2 shows the density estimation curve of

β_{0}

and

β_{1}

when

(α, θ) = (0.5, 10)

based on

N (0, 1)

, which coincides with the estimation results in Table 1.

Table 2 presents the results of a comparison between the estimates of the regression coefficients

β_{0}

and

β_{1}

of the proposed model and those of the NER model, as derived from five sets of simulations. Table 2 demonstrates that the estimates of

β_{0}

and

β_{1}

obtained from the estimation of both the proposed model and the NER model were highly close to the true values. However, for the simulation where the base distribution followed the normal distribution, the NER model estimation was slightly more accurate than the proposed model. Conversely, when the base distribution followed the t distribution, the proposed model estimation was more precise than the NER model estimation.

The Total Variation Distance (TVD) is a statistical distance between probability distributions. Here it measures the distance between the true and the estimated base distribution

h_{0}

and thus serves as a basis for evaluating the estimation performance. Figure 3 shows the Total Variation Distance between the two distributions for each iteration under these five scenarios. As can be seen from Figure 3, the gap between the estimated distribution and the true distribution was small, and the total variation distance was less than 0.3 in each iteration.
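For two densities evaluated on a common grid, the total variation distance is half the integral of their absolute difference. The sketch below approximates that integral with the trapezoidal rule; the grid, the two normal densities, and the function names are illustrative assumptions and are not the distributions from the simulation.

```python
import numpy as np

def total_variation_distance(p, q, grid):
    """Approximate 0.5 * integral |p - q| with the trapezoidal rule."""
    diff = np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
    return 0.5 * np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid))

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

grid = np.linspace(-10.0, 11.0, 4001)
tvd_same = total_variation_distance(normal_pdf(grid, 0.0, 1.0), normal_pdf(grid, 0.0, 1.0), grid)
tvd_shift = total_variation_distance(normal_pdf(grid, 0.0, 1.0), normal_pdf(grid, 1.0, 1.0), grid)
```

The distance is 0 for identical densities and at most 1 for densities with disjoint support, so a value below 0.3 indicates substantial overlap between the estimated and true base distributions.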

Table 3 shows the simulation results for the estimates of all small area means

ρ_{i}

when

(α, θ) = (0.5, 10)

and the base distribution was

N (0, 1)

. In Table 3, the Sample Mean column gives the mean of

ρ_{i}

, and the Estimate column gives the estimated small area mean. As can be seen from Table 3, the estimates of the small area means were generally close to the means.

In order to more directly reflect the reliability of the estimation of the small area means, we introduced squared residuals to measure the degree of matching between the estimated values of the small area means and the true values, and we give five graphs of the squared residuals of the small area means under these five scenarios. As shown in Figure 4, the squared residuals for all small area means were small, with most of the squared residuals centered between 0 and 0.3.
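As a worked illustration of the squared residual, the snippet below applies the definition to the first five rows of Table 3, taking the Sample Mean column as the reference value. Only the arithmetic is shown; the variable names are hypothetical.

```python
import numpy as np

# Rows 1 to 5 of Table 3: reference means and model estimates.
sample_mean = np.array([2.089628, 1.793437, 1.692420, 2.428903, 1.550903])
estimate = np.array([2.220492, 1.916604, 1.731447, 2.606485, 2.462545])

# Squared residual per area: (estimate - reference mean)^2.
squared_residuals = (estimate - sample_mean) ** 2
```

For area 1, for example, the difference is 2.220492 − 2.089628 = 0.130864, so the squared residual is approximately 0.0171.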

5.3. Simulated Normal Data

In this section, we demonstrate the estimation on simulated data in order to assess the strengths and weaknesses of the model. As in the previous section, and for ease of computation, we assumed that there were

m = 20

small areas, and the sample size for each small area was set to

n_{i} = 30

. The simulated data

y_{i j}

were derived from the following model structure:

y_{i j} = β_{0} + x_{i j_{1}} β_{1} + ν_{i j} + ε_{i j}, j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m

(41)

ν_{i j} \sim N (0, σ_{ν}^{2}) j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m

(42)

ε_{i j} \sim N (0, σ_{ε}^{2}) j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m

(43)

where the random effects

ν_{i j}

come from a normal distribution

N (0, 4)

, and the error

ε_{i j}

comes from a normal distribution

N (0, 0 . 1^{2})

. The true value of the regression coefficient was set to

β = {(β_{0}, β_{1})}^{'} = {(1, 2)}^{'}

.
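Data from the model in Equations (41)–(43) can be generated with a few lines. The distribution of the covariate is not specified in the text, so a uniform covariate on (0, 1) is assumed here; the function name and the seed are hypothetical.

```python
import numpy as np

def simulate_normal_ner(m=20, n_i=30, beta=(1.0, 2.0), sigma_nu=2.0, sigma_eps=0.1, seed=0):
    """Simulate y_ij = beta0 + beta1 * x_ij + nu_ij + eps_ij with normal random effects."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(m, n_i))        # covariate law is an assumption
    nu = rng.normal(0.0, sigma_nu, size=(m, n_i))   # N(0, 4), i.e. standard deviation 2
    eps = rng.normal(0.0, sigma_eps, size=(m, n_i)) # N(0, 0.1^2)
    y = beta[0] + beta[1] * x + nu + eps
    return x, y, nu

x, y, nu = simulate_normal_ner()
```

Note that N(0, 4) is parameterized by its variance, so the random effects have standard deviation 2, which is much larger than the error standard deviation of 0.1.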

Then, we used the proposed model with PDP random effects to estimate and obtain the corresponding estimates. Table 4 shows the results of parameter estimation. The estimates of the fixed effects in Table 4 were very close to the true values, and through Table 4, we can also observe the parameters

α

and

θ

of the PDP prior for the random effects of the model. Based on the estimates of the parameters

α

and

θ

and the estimated base distribution of the random effects, we further obtained the estimates of the means of all the small areas through the model, which are presented in Table 5. Table 5 shows that the estimates of the small area means were reasonably reliable and close to the means without major deviations. Figure 5 shows the squared residuals for all the small area means. The squared residuals of all small area means were small, which supports the validity of our model and method.

6. Application

Following this, we applied the proposed model to a dataset of combined income and other sociological variables for the Spanish provinces [26], which is available in the R package sae [27]. This dataset contains 20 regions, 21 variables, and a total of 1050 observation units. We retained and integrated these variables to select four variables, that is, incomedata, age, edu, and sex, thus representing total income, age, education level, and gender, respectively. Because the central aim of the survey was to understand the income levels of the Spanish provinces, incomedata was used directly as a response variable in the model. The remaining three variables, age, edu, and sex, are considered to be closely related to income and therefore served as auxiliary variables in the model. We removed any units containing missing values in these variables and normalized these data.

The proposed model can be described as follows:

i n c o m e d a t a_{i j} = β_{0} + β_{1} e d u_{i j} + β_{2} s e x_{i j} + β_{3} a g e_{i j} + ν_{i j} + ε_{i j}, j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m

(44)

ν_{i 1}, \dots, ν_{i n_{i}} | H_{i} \overset{i i d}{\sim} H_{i} i = 1, 2, \dots, m

(45)

H_{i} \sim P D (α, θ; h_{0})

(46)

ε_{i j} \sim N (0, σ_{ε}^{2}) j = 1, 2, \dots, n_{i}; i = 1, 2, \dots, m

(47)

Based on the proposed model, by applying the parameter estimation method in Section 4 and the provided algorithm to run two hundred rounds, the estimation results of each parameter were obtained and are shown in Table 6. Figure 6 gives the density plot of the base distribution obtained from our estimation.

7. Conclusions

In this paper, we proposed using the Poisson–Dirichlet Process in a Nested Error Regression model to provide prior distributions for random effects in unit-level data. In the small area model, since the random effects are hidden variables that are not directly observable, we applied the MCMC algorithm to draw the random effects at fixed initial values and constructed parameter estimates in the prior; we then gave estimates of the parameters such as regression coefficients and the base distributions with known random effects. Through numerical simulations and the application of example data, we demonstrated the feasibility of the studied model and the practicality of the estimation algorithm.

Our proposed model and its parameter estimation method have significant advantages. Firstly, the Poisson–Dirichlet process as a prior is able to flexibly capture the nonparametric properties of the random effects, thus overcoming the limitations of the traditional normality assumption and improving the adaptability and accuracy of the model. Second, the effective application of the MCMC algorithm ensures the robustness and accuracy of parameter estimation, especially when dealing with complex data and models. Both the theoretical and simulation results confirm these advantages, thus making our model and method widely applicable and effective in practical applications.

Although our proposed model and method performed well in several aspects, there are still some shortcomings and directions worthy of further research. The computational complexity of the models is high, especially when dealing with large-scale datasets, and the computational cost may become a limiting factor for their application. Therefore, developing more efficient computational methods and algorithms is a future research focus. In addition, with the development of data science, combining new statistical techniques and machine learning algorithms to improve and optimize the model is also a worthy research direction.

Author Contributions

Conceptualization, X.Z.; methodology, Q.K. and X.Z.; software, Q.K. and X.Z.; validation, Q.K. and X.Z.; formal analysis, Q.K. and X.Z.; investigation, X.Z.; resources, X.Q.; writing—original draft, Q.K.; writing—review and editing, X.Z.; visualization, X.Q.; supervision, Y.L.; project administration, X.Q. and Y.L.; funding acquisition, X.Q. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (Grant Nos. 12032016 and 12372277).

Data Availability Statement

The data that support the findings of this study are openly available, which can be downloaded from https://cran.r-project.org/web/packages/sae/index.html (accessed on 15 April 2024).

Acknowledgments

We thank the associate editor and the reviewers for their useful feedback that improved the quality and clarity of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Rao, J.N. Small Area Estimation; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 331.
Pfeffermann, D. New important developments in small area estimation. Stat. Sci. 2013, 28, 40–68.
Rao, J.N.; Molina, I. Small Area Estimation; John Wiley & Sons: Hoboken, NJ, USA, 2015.
Fay, R.E., III; Herriot, R.A. Estimates of income for small places: An application of James-Stein procedures to census data. J. Am. Stat. Assoc. 1979, 74, 269–277.
Battese, G.E.; Harter, R.M.; Fuller, W.A. An error-components model for prediction of county crop areas using survey and satellite data. J. Am. Stat. Assoc. 1988, 83, 28–36.
Arima, S.; Bell, W.R.; Datta, G.S.; Franco, C.; Liseo, B. Multivariate Fay–Herriot Bayesian estimation of small area means under functional measurement error. J. R. Stat. Soc. Ser. A Stat. Soc. 2017, 180, 1191–1209.
Yang, Z.; Chen, J. Small area mean estimation after effect clustering. J. Appl. Stat. 2020, 47, 602–623.
Datta, G.S.; Hall, P.; Mandal, A. Model selection by testing for the presence of small-area effects, and application to area-level data. J. Am. Stat. Assoc. 2011, 106, 362–374.
Sugasawa, S.; Kubokawa, T. Bayesian estimators in uncertain nested error regression models. J. Multivar. Anal. 2017, 153, 52–63.
Ferrante, M.R.; Pacei, S. Small domain estimation of business statistics by using multivariate skew normal models. J. R. Stat. Soc. Ser. A Stat. Soc. 2017, 180, 1057–1088.
Fabrizi, E.; Trivisano, C. Robust linear mixed models for small area estimation. J. Stat. Plan. Inference 2010, 140, 433–443.
Chakraborty, A.; Datta, G.S.; Mandal, A. A two-component normal mixture alternative to the Fay-Herriot model. Stat. Transit. New Ser. 2016, 17, 67–90.
Diallo, M.S.; Rao, J. Small area estimation of complex parameters under unit-level models with skew-normal errors. Scand. J. Stat. 2018, 45, 1092–1116.
Tsujino, T.; Kubokawa, T. Empirical Bayes methods in nested error regression models with skew-normal errors. Jpn. J. Stat. Data Sci. 2019, 2, 375–403.
Opsomer, J.D.; Claeskens, G.; Ranalli, M.G.; Kauermann, G.; Breidt, F.J. Non-parametric small area estimation using penalized spline regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 2008, 70, 265–286.
Polettini, S. A Generalised Semiparametric Bayesian Fay–Herriot Model for Small Area Estimation Shrinking Both Means and Variances. Bayesian Anal. 2016, 12, 729–752.
Pitman, J.; Yor, M. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 1997, 25, 855–900.
Al-Labadi, L.; Hamlili, M.; Ly, A. Bayesian Estimation of Variance-Based Information Measures and Their Application to Testing Uniformity. Axioms 2023, 12, 887.
Handa, K. The two-parameter Poisson–Dirichlet point process. Bernoulli 2009, 15, 1082–1116.
Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I. Bayesian non-parametric inference for species variety with a two-parameter Poisson–Dirichlet process prior. J. R. Stat. Soc. Ser. B Stat. Methodol. 2009, 71, 993–1008.
Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2013.
Yang, L.; Wu, X. Estimation of Dirichlet process priors with monotone missing data. J. Nonparametr. Stat. 2013, 25, 787–807.
Qiu, X.; Yuan, L.; Zhou, X. MCMC sampling estimation of Poisson-Dirichlet process mixture models. Math. Probl. Eng. 2021, 2021, 6618548.
Carlton, M.A. Applications of the Two-Parameter Poisson-Dirichlet Distribution; University of California: Los Angeles, CA, USA, 1999.
Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 2000, 9, 249–265.
Molina, I.; Rao, J.N. Small area estimation of poverty indicators. Can. J. Stat. 2010, 38, 369–385.
Molina, I.; Marhuenda, Y. sae: An R Package for Small Area Estimation. R J. 2015, 7, 81–98.

Figure 1. Density estimation curve of

α

and

θ

. (a)

(α, θ) = (0.5, 10)

based on

N (0, 1)

; (b)

(α, θ) = (0.3, 5)

based on

N (0, 1)

; (c)

(α, θ) = (0.9, 3)

based on

N (0, 1)

; (d)

(α, θ) = (0.4, 7)

based on

t (5)

; (e)

(α, θ) = (0.5, 2)

based on

t (2)

.

Figure 2. Density estimation curve of regression coefficients when

(α, θ) = (0.5, 10)

based on

N (0, 1)

.

Figure 3. Total variation distance between two distributions at each iteration. (a)

(α, θ) = (0.5, 10)

based on

N (0, 1)

; (b)

(α, θ) = (0.3, 5)

based on

N (0, 1)

; (c)

(α, θ) = (0.9, 3)

based on

N (0, 1)

; (d)

(α, θ) = (0.4, 7)

based on

t (5)

; (e)

(α, θ) = (0.5, 2)

based on

t (2)

.

Figure 4. Squared residuals results of small area means. (a)

(α, θ) = (0.5, 10)

based on

N (0, 1)

; (b)

(α, θ) = (0.3, 5)

based on

N (0, 1)

; (c)

(α, θ) = (0.9, 3)

based on

N (0, 1)

; (d)

(α, θ) = (0.4, 7)

based on

t (5)

; (e)

(α, θ) = (0.5, 2)

based on

t (2)

.

Figure 5. Squared residuals results of small area means of simulated data.

Figure 6. Density of estimated base distribution of real data.

Table 1. Performance of parameters estimation.

Parameter	Base Distribution	True Value	Bias	MSE	Confidence Interval
$α$	$N (0, 1)$	0.5	−0.06534407	0.03458486	(0.427019, 0.442293)
$θ$		10	−0.9001747	22.34711	(8.874574, 9.325077)
$β_{0}$		1	0.01457439	0.005534238	(1.011374, 1.017774)
$β_{1}$		2	0.03045286	0.01246042	(2.025742, 2.035163)
$α$	$N (0, 1)$	0.3	0.08081297	0.02533613	(0.361643, 0.399983)
$θ$		5	0.4856122	7.561553	(5.105321, 5.865904)
$β_{0}$		1	0.01848704	0.002993466	(1.011289, 1.025685)
$β_{1}$		2	0.04352289	0.004664257	(2.036166, 2.050880)
$α$	$N (0, 1)$	0.9	−0.07212803	0.01389355	(0.814840, 0.840904)
$θ$		3	−0.8307739	3.068224	(1.915222, 2.423231)
$β_{0}$		1	0.08176262	0.01210896	(1.071468, 1.092058)
$β_{1}$		2	−0.08813282	0.01125358	(1.903614, 1.920121)
$α$	$t (5)$	0.4	0.05213893	0.02911436	(0.429428, 0.474850)
$θ$		7	−0.5294164	15.58317	(5.916658, 7.024509)
$β_{0}$		1	0.06114231	0.006702937	(1.053531, 1.068753)
$β_{1}$		2	0.02870615	0.004429664	(2.020312, 2.037100)
$α$	$t (2)$	0.5	0.09099937	0.01867303	(0.576749, 0.605250)
$θ$		2	−0.3801979	2.727176	(1.395155, 1.844450)
$β_{0}$		1	−0.1348095	0.02203277	(0.856507, 0.873875)
$β_{1}$		2	0.3587933	0.1333754	(2.349268, 2.368318)

Table 2. Estimated mean of regression coefficients.

Regression Coefficient	True Value	NER Model	NER Model with PDP Random Effects
$β_{0}$	1	0.9607364	1.014574
$β_{0}$	1	1.017708	1.018487
$β_{0}$	1	1.001620	1.081763
$β_{0}$	1	1.069568	1.061142
$β_{0}$	1	0.7113994	0.8651905
$β_{1}$	2	2.045244	2.030453
$β_{1}$	2	2.038286	2.043523
$β_{1}$	2	1.900959	1.911867
$β_{1}$	2	2.044987	2.028706
$β_{1}$	2	2.4582498	2.358793

Table 3. Results of small areas means when

(α, θ) = (0.5, 10)

based on

N (0, 1)

.

Area	Sample Mean	Estimate
1	2.089628	2.220492
2	1.793437	1.916604
3	1.692420	1.731447
4	2.428903	2.606485
5	1.550903	2.462545
6	2.068002	2.527393
7	2.412526	2.482989
8	1.717398	1.926363
9	2.042811	2.390698
10	2.094958	2.358032
11	2.218328	2.641435
12	1.865992	2.597315
13	1.725139	2.231473
14	2.020898	2.302906
15	1.630873	2.099237
16	2.129673	2.268741
17	1.834107	1.922384
18	2.570386	2.952974
19	2.006835	2.219573
20	1.733093	2.066331

Table 4. Performance of parameters estimation of simulated data.

$\hat{α}$	$\hat{θ}$	${\hat{β}}_{0}$	${\hat{β}}_{1}$
0.6003261	5.124903	1.003003	2.411909

Table 5. Results of small areas means of simulated data.

Area	Sample Mean	Estimate
1	2.6218905	2.342866
2	2.0078687	1.469475
3	0.9612582	1.162803
4	1.5513356	1.384326
5	2.2699821	1.809038
6	2.0796827	1.951780
7	3.0264584	2.486590
8	3.1521868	2.758094
9	1.2477416	1.554100
10	1.2796926	1.416766
11	2.5728659	2.642539
12	1.8507724	1.963746
13	2.8604333	2.081979
14	3.1778784	2.290092
15	1.7505182	1.817446
16	1.8190648	1.810704
17	1.0271540	1.127514
18	2.6706406	2.292215
19	2.5922461	2.019817
20	2.7200485	2.626496

Table 6. Performance of parameters estimation of real data.

$\hat{α}$	$\hat{θ}$	${\hat{β}}_{0}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{3}$
0.875775	3.025505	−0.007919	−0.000048	−0.000173	−0.000169

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
