2.1. Eliciting a Beta Prior
Consider first the situation where $k = 2$ and the prior on $p = p_1$ is beta$(\alpha, \beta)$. Suppose it is known with "virtual certainty" that $l \le p \le u$, where $0 \le l < u \le 1$ are known. This immediately implies that $1 - u \le p_2 \le 1 - l$ with virtual certainty. Here "virtual certainty" is interpreted to mean that the true value of $p$ is in the interval $[l, u]$ with high prior probability $\gamma$. Thus, this restricts the prior to those values of $(\alpha, \beta)$ satisfying
$$B(l, u; \alpha, \beta) = \int_l^u \frac{p^{\alpha - 1}(1 - p)^{\beta - 1}}{B(\alpha, \beta)}\, dp = \gamma.$$
Note that in general there may be several values of $(\alpha, \beta)$ that satisfy this equality. For example, if $l = 0$ and $u = 1$ with $\gamma = 1$, then $B(l, u; \alpha, \beta) = \gamma$ for all $(\alpha, \beta)$. To completely determine $(\alpha, \beta)$, another condition is added, namely, it is required that the mode of the prior be at a point $\xi \in (l, u)$, as this allows the placement of the primary amount of the prior mass at an appropriate place within $[l, u]$. For example, a natural choice of the mode in this context is $\xi = (l + u)/2$, namely, the midpoint of the interval. When $\alpha, \beta > 1$, the mode of the beta$(\alpha, \beta)$ occurs at $\xi = (\alpha - 1)/(\tau - 2)$, where $\tau = \alpha + \beta$. There is thus a 1-1 correspondence between the values $(\alpha, \beta)$ and $(\xi, \tau)$ given by $\alpha = 1 + (\tau - 2)\xi$ and $\beta = 1 + (\tau - 2)(1 - \xi)$. Hereafter, we restrict to the case $\alpha, \beta > 1$, equivalently $\tau > 2$, to avoid singularities on the boundary, as these seem difficult to justify a priori. Therefore, after specifying the mode $\xi$, only the scaling of the beta prior is required through the choice of $\tau$. Now, if $p \sim$ beta$(1 + (\tau - 2)\xi, 1 + (\tau - 2)(1 - \xi))$, then $E(p) = (1 + (\tau - 2)\xi)/\tau \to \xi$ and $\mathrm{Var}(p) = \alpha\beta/(\tau^2(\tau + 1)) \to 0$ as $\tau \to \infty$, which establishes that $B(l, u; 1 + (\tau - 2)\xi, 1 + (\tau - 2)(1 - \xi)) \to 1$ as $\tau \to \infty$. Thus, the following result has been proven, since $B(l, u; 1 + (\tau - 2)\xi, 1 + (\tau - 2)(1 - \xi))$ is a continuous function of $\tau$.
Theorem 1. For $0 \le l < u \le 1$, $\xi \in (l, u)$ and $\tau > 2$, the beta$(1 + (\tau - 2)\xi, 1 + (\tau - 2)(1 - \xi))$ distribution has its mode at ξ and, whenever $\gamma \in (u - l, 1)$, there is a value $\tau_\gamma > 2$ such that there is exactly γ of the probability in $[l, u]$.
While the theorem establishes the existence of a value $\tau_\gamma$ satisfying the requisite equation, it does not establish that this value is unique. Although uniqueness is not necessary for the methodology, based on examples and intuition, it seems very likely that $B(l, u; 1 + (\tau - 2)\xi, 1 + (\tau - 2)(1 - \xi))$ is a monotone increasing function of $\tau$, which would imply that the $\tau_\gamma$ in Theorem 1 is in fact unique. In any case, $\tau_\gamma$ can be computed by choosing a value of $\tau$ whose interval content is less than $\gamma$, finding a larger value whose interval content is at least $\gamma$, and then obtaining $\tau_\gamma$ satisfying the equality via the bisection root-finding algorithm. This procedure is guaranteed to converge by the intermediate value theorem.
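The bisection scheme just described can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are ours, and the interval content is computed by Simpson-rule quadrature, though any routine for the incomplete beta function would serve.

```python
import math

def beta_interval_content(l, u, a, b, n=2000):
    # P(l <= p <= u) under a beta(a, b) prior, via Simpson's rule;
    # for a, b > 1 the integrand is bounded on (0, 1).
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    def f(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_B)
    h = (u - l) / n
    s = f(l) + f(u) + sum((4 if i % 2 else 2) * f(l + i * h) for i in range(1, n))
    return s * h / 3.0

def elicit_beta_tau(l, u, xi, gamma, tol=1e-8):
    # Bisection for tau > 2 such that beta(1+(tau-2)xi, 1+(tau-2)(1-xi))
    # assigns probability gamma to [l, u]; returns (alpha, beta).
    def content(t):
        return beta_interval_content(l, u, 1 + (t - 2) * xi, 1 + (t - 2) * (1 - xi))
    t_lo, t_hi = 2.0, 4.0
    while content(t_hi) < gamma:       # bracket the root: content -> 1 as tau -> infinity
        t_hi *= 2.0
    while t_hi - t_lo > tol:           # bisection, justified by the intermediate value theorem
        t_mid = 0.5 * (t_lo + t_hi)
        if content(t_mid) < gamma:
            t_lo = t_mid
        else:
            t_hi = t_mid
    tau = 0.5 * (t_lo + t_hi)
    return 1 + (tau - 2) * xi, 1 + (tau - 2) * (1 - xi)
```

For instance, with $l = 0.25$, $u = 0.75$, $\xi = 0.5$ and $\gamma = 0.99$, the returned prior is symmetric and puts $0.99$ of its mass on $[0.25, 0.75]$ up to the stated tolerance.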
Example 2. Determining a beta prior.
Suppose that the bounds $l$ and $u$, the virtual-certainty level $\gamma$ and the mode $\xi = (l + u)/2$ have been specified. The solution $\tau_\gamma$ is then obtained via the iterative algorithm, where the iteration is stopped when the prior content of $[l, u]$ is within a prescribed error tolerance of $\gamma$. With a relatively coarse tolerance this took seven iterations, and the resulting beta prior gives $[l, u]$ prior content close to the nominal $\gamma$. If instead the error tolerance for stopping was set to a smaller value, then the solution was obtained after 20 iterations, with the prior content of $[l, u]$ differing negligibly from that of the first solution.
If bounds are placed on a probability, then virtual certainty for these bounds is obtained by requiring that they hold with prior probability at least $\gamma$. The concept of "virtual certainty" is interpreted as something being true "with high probability", and choosing $\gamma$ close to 1 reflects this. For example, in rolling an apparently symmetrical die, the analyst may be quite certain that the probability of observing $i$ pips is at least 1/8 for each $i$ and wants the prior to reflect this. In effect, the goal is to ensure that the prior concentrates its mass in the region satisfying these inequalities, and choosing $\gamma$ large accomplishes this. Actually, it is not necessary that exact equality be obtained to ensure virtual certainty. As long as $\gamma$ is close to 1, small changes in $\gamma$ will not lead to big changes in the prior, as in Example 2, where it is seen that a small change in the stopping criterion makes very little difference in the prior. Specifying probabilities beyond 2 or 3 decimal places seems impractical in most applications, so taking $\gamma$ in a narrow range of values close to 1 seems quite satisfactory for characterizing virtual certainty while allowing some flexibility for the analyst. Far more important than the choice of $\gamma$ is the selection of what it is felt is known, for example, the bounds $l$ and $u$ on the probabilities for the beta prior, as mistakes can be made. Protection against a misleading analysis caused by a poor choice of a prior is approached through checking for prior-data conflict and modifying the prior appropriately when this is the case, as discussed in Section 3. It is also to be noted that the methodology does not require $\gamma$ to be large, as the analyst may only be willing to say that the bounds on the probabilities for the die hold with some more moderate prior probability. However, choosing the bounds so that these are fairly weak constraints on the probabilities, and so almost certainly hold, as is reflected by choosing $\gamma$ close to 1, seems like an easy way to be weakly informative.
2.2. Eliciting a Dirichlet Prior
The approach to eliciting a beta prior allows for a great deal of flexibility as to where the prior allocates the bulk of its mass in $[0, 1]$. The question, however, is how to generalize this to the Dirichlet$(\alpha_1, \dots, \alpha_k)$ prior. As will be seen, it is necessary to be careful about how this prior is elicited. Again, we make the restriction that each $\alpha_i > 1$ to avoid singularities for the prior on the boundary.
It seems quite natural to think about putting probabilistic bounds on the $p_i$, such as requiring $l_i \le p_i \le u_i$ with high probability, for fixed constants $0 \le l_i < u_i \le 1$, to reflect what is known with virtual certainty about $p = (p_1, \dots, p_k)$. For example, it may be known that $p_1$ is very small, and so we put $l_1 = 0$, choose $u_1$ small and require that $p_1 \le u_1$ with prior probability at least $\gamma$. While placing bounds like this on the $p_i$ seems reasonable, such an approach can result in a complicated shape for the region that is to contain the true value of $p$ with virtual certainty. This complexity can make the computations associated with inference very difficult. In fact, it can be hard to determine exactly what the full region is. As such, it seems better to use an elicitation method that fits well with the geometry of the Dirichlet family. If it is felt that more is known a priori than a Dirichlet prior can express, then it is appropriate to contemplate using some other family of priors; see, for example, Elfadaly and Garthwaite [4,5]. Given the conjugacy property of Dirichlet priors and their common usage, the focus here is on devising elicitation algorithms that work well with this family. First, however, we consider elicitation approaches for this problem that have been presented in the literature.
Chaloner and Duncan [6] discuss an iterative elicitation algorithm based on specifying characteristics of the prior predictive distribution of the data, which is Dirichlet-multinomial. Regazzini and Sazonov [7] discuss an elicitation algorithm which entails partitioning the simplex, prescribing prior probabilities for each element of the partition and then selecting a mixture of Dirichlet distributions such that this prior has Prohorov distance less than some $\epsilon > 0$ from the true prior associated with de Finetti's representation theorem. Both of these approaches are complicated to implement. Closest to the method presented here is that discussed in [8], where the prior is specified by stating two prior quantiles for $p_1$ and a prior quantile for $p_i$ for each of the remaining probabilities. Thus, there are k constraints that the Dirichlet prior has to satisfy, and an algorithm is provided for computing $(\alpha_1, \dots, \alpha_k)$. Drawbacks include the fact that the $p_i$ are not treated symmetrically, as there is a need to place two constraints on one of the probabilities, and $p_k$ is treated quite differently than the other probabilities. In addition, precise quantiles need to be specified, and values $\alpha_i < 1$ can be obtained, which induce singularities in the prior. Furthermore, it is not at all clear what these constraints say about the joint prior on $p$, as this elicitation does not take into account the dependencies that necessarily occur among the $p_i$.

Zapata-Vázquez et al. [9] develop an elicitation algorithm based on eliciting beta distributions for the individual probabilities and then constructing a Dirichlet prior that represents a compromise among these marginals. Elfadaly and Garthwaite [4] determine a Dirichlet by eliciting the first quartile, median and third quartile for the conditional distribution of each $p_i$ given its predecessors, and finding the beta distribution, rescaled by the factor $1 - p_1 - \cdots - p_{i-1}$, that best fits these quantiles. This requires the prescription of precise quantiles, an order in which to elicit the conditionals and an iterative approach to reconcile the elicited conditional quantiles when these quantiles are not consistent with a Dirichlet. A notable aspect of their approach is that it also works for the Connor-Mosimann distribution, a generalization of the Dirichlet, and in that case no reconciliation is required. Similarly, Elfadaly and Garthwaite [5] base the elicitation on the three quartiles of the marginal beta distributions of the $p_i$ which, while independent of order, still requires reconciliation to ensure that the elicited marginals correspond to a Dirichlet. In addition, the elicitation procedure based on the conditionals is extended to develop an elicitation procedure for a more flexible prior based on a Gaussian copula.
The approach in this paper is based on the idea of placing bounds on the probabilities that hold with virtual certainty and that are mutually consistent for any prior on the simplex. The user need only check that the stated bounds satisfy the conditions given in the theorems to ensure consistency; these conditions are very simple to check, and the bounds are easily modified when they fail. Rather than being required to state precise quantiles or moments for the prior, all that is required are weak bounds on the probabilities. For example, we might be willing to say that we are virtually certain that $p_1$ is greater than a value $l_1$. We consider $l_1$ a weak bound because there may be some belief that the true value is much greater than $l_1$, but being precise about how to express such beliefs is more difficult and requires more refined judgements. Certainly, elicitation methodology that requires more assessment than what is being required here is even more open to concerns about robustness and other issues with the prior. As discussed in Section 3, Section 4 and Section 5, such concerns are better addressed through considerations about prior-data conflict, bias and using inference methods that are as robust to the prior as possible.
There are several versions depending on whether lower or upper bounds are placed on the $p_i$. We start with the situation where a lower bound is given for each $p_i$, as this provides the basic idea for the others. Generally, the elicitation process allows for a single lower or upper bound to be specified for each $p_i$. These bounds specify a subsimplex of the full simplex with all edges of the same length. As will be seen, this implicitly takes into account the dependencies among the $p_i$. With such a region determined, it is straightforward to find the scaling of the prior such that the subsimplex contains $\gamma$ of the prior probability for $p$. It is worth noting that the bounds determined in Theorems 2–4 can be applied to any family of priors on the simplex, and it is only in Section 2.2.4 where specific reference is made to the Dirichlet.
Note that a $(k-1)$-simplex can be specified by k distinct points in $\mathbb{R}^k$, say $x_1, \dots, x_k$, and then taking all convex combinations of these points. This simplex will be denoted as $S(x_1, \dots, x_k) = \{c_1 x_1 + \cdots + c_k x_k : c_i \ge 0, \sum_{i=1}^k c_i = 1\}$. Thus, the full simplex is $S(e_1, \dots, e_k)$, where $e_i$ is the i-th standard basis vector of $\mathbb{R}^k$, and it is clear that $S(x_1, \dots, x_k) \subset S(y_1, \dots, y_k)$ whenever $x_i \in S(y_1, \dots, y_k)$ for every $i$. The centroid of $S(x_1, \dots, x_k)$ is equal to $(x_1 + \cdots + x_k)/k$.
2.2.1. Lower Bounds on the Probabilities
For this we ask for a set of lower bounds $l_1, \dots, l_k$ such that $l_i \le p_i$ for $i = 1, \dots, k$ with virtual certainty. To make sense, there is only one additional constraint that the $l_i$ must satisfy, namely, $\sum_{i=1}^k l_i \le 1$. If $\sum_{i=1}^k l_i = 1$, then it is immediate that $p_i = l_i$ for every $i$, since otherwise $\sum_{i=1}^k p_i > 1$. Thus, the $p_i$ are completely determined when $\sum_{i=1}^k l_i = 1$. Attention is thus restricted to the case where $\sum_{i=1}^k l_i < 1$. The following result then holds.
Theorem 2. Specifying the lower bounds $l_1, \dots, l_k$ such that $l_i \le p_i$ for $i = 1, \dots, k$ and
$$\sum_{i=1}^k l_i < 1 \qquad (1)$$
prescribes the region $\{p \in S(e_1, \dots, e_k) : l_i \le p_i \text{ for } i = 1, \dots, k\} = S(x_1, \dots, x_k)$, where $x_i = l + (1 - \sum_{j=1}^k l_j)\, e_i$ with $l = (l_1, \dots, l_k)$, and
$$u_i = l_i + \Big(1 - \sum_{j=1}^k l_j\Big). \qquad (2)$$
The edges of $S(x_1, \dots, x_k)$ each have length $\sqrt{2}\,(1 - \sum_{j=1}^k l_j)$.

Proof. Note that (1) implies that $u_i = l_i + (1 - \sum_{j=1}^k l_j) < 1$, and so stating the lower bounds implies the set of upper bounds (2), and also $l_i < u_i$. Consider now the set $C = \{p \in S(e_1, \dots, e_k) : l_i \le p_i \text{ for } i = 1, \dots, k\}$ and note that $x_i \in C$ for $i = 1, \dots, k$. For $p = c_1 x_1 + \cdots + c_k x_k \in S(x_1, \dots, x_k)$ with $c_i \ge 0$ and $\sum_{i=1}^k c_i = 1$, then $p \in C$ since, for example, the first coordinate satisfies $p_1 = l_1 + c_1(1 - \sum_{j=1}^k l_j) \ge l_1$, so $S(x_1, \dots, x_k) \subset C$.
If $p \in C$, then $p = c_1 x_1 + \cdots + c_k x_k$, where $c_i = (p_i - l_i)/(1 - \sum_{j=1}^k l_j)$. Now $c_i \ge 0$, and $\sum_{i=1}^k c_i = (\sum_{i=1}^k p_i - \sum_{i=1}^k l_i)/(1 - \sum_{j=1}^k l_j) = 1$. For these $c_i$ we have $c_1 x_1 + \cdots + c_k x_k = l + (1 - \sum_{j=1}^k l_j)(c_1, \dots, c_k) = p$. This proves that $C \subset S(x_1, \dots, x_k)$, and so we have $C = S(x_1, \dots, x_k)$.
Finally, note that $\|x_i - x_j\| = (1 - \sum_{m=1}^k l_m)\,\|e_i - e_j\| = \sqrt{2}\,(1 - \sum_{m=1}^k l_m)$ for $i \ne j$, and so $S(x_1, \dots, x_k)$ has edges all of the same length. This completes the proof. ☐
It is relatively straightforward to ensure that the elicited bounds are consistent with a prior on $S(e_1, \dots, e_k)$. For, if it is determined that $\sum_{i=1}^k l_i \ge 1$, then it is simply a matter of lowering some of the bounds to ensure (1) is satisfied. For example, multiplying all the bounds by a common factor less than 1 can do this, and a lowered $l_i$ means greater conservatism, as it is a weaker bound. Furthermore, it is perfectly acceptable to set some $l_i = 0$, as this does not affect the result.
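As a concrete illustration of Theorem 2, the vertices $x_i$, the implied upper bounds and the centroid can be computed directly from the lower bounds; this is a sketch with helper names of our own choosing.

```python
def lower_bound_simplex(l):
    # Theorem 2: from lower bounds l_1, ..., l_k with sum(l) < 1, the region
    # {p : p_i >= l_i, sum(p) = 1} is the simplex with vertices
    # x_i = l + (1 - sum(l)) e_i; also returns the implied upper bounds (2)
    # and the centroid.
    k = len(l)
    slack = 1.0 - sum(l)
    assert slack > 0.0, "inequality (1) fails: lower some of the bounds"
    vertices = [[l[j] + (slack if j == i else 0.0) for j in range(k)] for i in range(k)]
    upper = [li + slack for li in l]
    centroid = [li + slack / k for li in l]
    return vertices, upper, centroid
```

For $l = (0.2, 0.3, 0.1)$, each vertex sums to 1, the implied upper bounds are the lower bounds plus the common slack $0.4$, and every edge has length $\sqrt{2} \cdot 0.4$, as in the theorem.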
2.2.2. Upper Bounds on the Probabilities
Of course, it may be that prior beliefs are instead expressed via upper bounds on the probabilities or a mixture of upper and lower bounds. The case of all upper bounds is considered first. Our goal is to specify the upper bounds in such a way that these lead unambiguously to lower bounds $l_1, \dots, l_k$ satisfying (1), and so to the simplex $S(x_1, \dots, x_k)$. Suppose then that we have the upper bounds $u_1, \dots, u_k$ such that $p_i \le u_i$ for $i = 1, \dots, k$ with virtual certainty. It is clear then that the $l_i$ must satisfy the system of linear equations given by (2), as well as $l_i < u_i$ for $i = 1, \dots, k$, and (1). Thus, the $l_i$ must satisfy
$$u = (I_k - 1_k 1_k')\, l + 1_k \qquad (3)$$
where $u = (u_1, \dots, u_k)'$, $l = (l_1, \dots, l_k)'$, $1_k$ is the k-dimensional vector of 1's and $I_k$ is the $k \times k$ identity. Noting that $(I_k - 1_k 1_k')^{-1} = I_k - 1_k 1_k'/(k - 1)$, it is immediate that
$$l = \big(I_k - 1_k 1_k'/(k - 1)\big)(u - 1_k). \qquad (4)$$
Note that this requires that $k \ge 2$, as is always the case.
Putting $v = \sum_{i=1}^k u_i$, then (4) implies $\sum_{i=1}^k l_i = (k - v)/(k - 1)$, and so $\sum_{i=1}^k l_i < 1$ provided $v$ satisfies
$$v > 1 \qquad (5)$$
and (4) gives
$$l_i = u_i - \frac{v - 1}{k - 1} \qquad (6)$$
and, for $i = 1, \dots, k$, this implies that $l_i \ge 0$ iff
$$u_i \ge \frac{v - 1}{k - 1}. \qquad (7)$$
In addition, when (5) is satisfied, then $l_i < u_i$ for $i = 1, \dots, k$. This completes the proof of the following result.
Theorem 3. Specifying upper bounds $u_1, \dots, u_k$ such that $p_i \le u_i$ for $i = 1, \dots, k$, satisfying inequalities (5) and (7), determines the lower bounds given by (6), which determine the simplex $S(x_1, \dots, x_k)$ defined in Theorem 2.

For this elicitation to be consistent with a prior on $S(e_1, \dots, e_k)$, it is necessary to make sure that the upper bounds satisfy (5) and (7). If we take the $u_i$ all equal to a common value $u > 1/k$, then (5) is satisfied, and $u \le 1$ implies that (7) is satisfied as well. If $v \le 1$, then the $u_i$ need to be increased, which is conservative, and note that $v \le k$ is true provided all the $u_i \le 1$, which is always the case. If (5) is satisfied but (7) is not for some i, then $u_i$ must be increased, which is again conservative, and (5) is still satisfied. Thus, again, making sure the elicited bounds are consistent is straightforward. In addition, the bound $u_i = 1$ is an acceptable choice.
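The conversion (6), together with the checks (5) and (7), can be sketched as follows; the function name and error messages are ours.

```python
def lower_from_upper(u):
    # Theorem 3: convert elicited upper bounds u_1, ..., u_k into the lower
    # bounds of Theorem 2 via (6), checking inequalities (5) and (7).
    k = len(u)
    v = sum(u)
    assert v > 1.0, "inequality (5) fails: increase some u_i"
    excess = (v - 1.0) / (k - 1)       # the common amount subtracted in (6)
    assert all(ui >= excess for ui in u), "inequality (7) fails: increase the offending u_i"
    return [ui - excess for ui in u]
```

For example, $u = (0.5, 0.5, 0.5)$ yields $l = (0.25, 0.25, 0.25)$, and applying (2) to these lower bounds recovers the original upper bounds, as it must.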
2.2.3. Upper and Lower Bounds on the Probabilities
Now, perhaps after relabelling the probabilities, suppose that lower bounds $l_i$ for $i = 1, \dots, m$, as well as upper bounds $u_i$ for $i = m + 1, \dots, k$, where $1 \le m \le k - 1$, have been provided. Again, it is required that $\sum_{i=1}^m l_i < 1$, and we search for conditions on the $u_i$ that complete the prescription of a full set of lower bounds $l_1, \dots, l_k$ so that Theorem 2 applies. Again, the $l$ and $u$ vectors must satisfy (3). Let $x_{r:s}$ denote the subvector of $x$ given by its consecutive r-th through s-th coordinates and $s(x_{r:s})$ the sum of these coordinates, provided $r \le s$, and be null otherwise. The following equations hold:
$$u_{(m+1):k} = l_{(m+1):k} + \big(1 - s(l_{1:m}) - s(l_{(m+1):k})\big)\, 1_{k-m} \qquad (8)$$
$$s(u_{(m+1):k}) = s(l_{(m+1):k}) + (k - m)\big(1 - s(l_{1:m}) - s(l_{(m+1):k})\big). \qquad (9)$$
Rearranging these equations so the knowns are on the left and the unknowns are on the right, it follows from (9) that
$$s(l_{(m+1):k}) = \frac{(k - m)\big(1 - s(l_{1:m})\big) - s(u_{(m+1):k})}{k - m - 1} \qquad (10)$$
and substituting this into (8) gives the solution for $l_{(m+1):k}$ as well.
Thus, it is only necessary to determine what additional conditions have to be imposed on the $u_i$ so that Theorem 2 applies. Note that it follows from (8) that $u_{(m+1):k}$ takes the correct form, as given by (2), so it is really only necessary to check that $l_{(m+1):k}$ is appropriate.
First it is noted that it is necessary that $m \le k - 1$. The case $m = k - 1$ only occurs when a single upper bound is provided, and then (8) forces $u_k = 1 - s(l_{1:(k-1)})$, which is the required value of $u_k$ for Theorem 2 to apply. Thus, when $m = k - 1$, there is no choice but to put $u_k = 1 - s(l_{1:(k-1)})$ and choose a lower bound for $p_k$, which of course could be 0, which means that Theorem 2 applies. It is assumed hereafter that $m \le k - 2$.
Now, $\sum_{i=1}^k l_i = s(l_{1:m}) + s(l_{(m+1):k})$, and the requirement (1) imposes the requirement $s(l_{(m+1):k}) < 1 - s(l_{1:m})$. Using (10) gives
$$1 - s(l_{1:m}) - s(l_{(m+1):k}) = \frac{s(u_{(m+1):k}) - \big(1 - s(l_{1:m})\big)}{k - m - 1}$$
and therefore $\sum_{i=1}^k l_i < 1$ iff
$$s(u_{(m+1):k}) > 1 - s(l_{1:m}). \qquad (11)$$
It is seen that (11) generalizes (5) on taking $m = 0$. Now, for $i = m + 1, \dots, k$,
$$l_i = u_i - \frac{s(u_{(m+1):k}) - \big(1 - s(l_{1:m})\big)}{k - m - 1}; \qquad (12)$$
thus, for $i = m + 1, \dots, k$, this implies that $l_i \ge 0$ iff
$$u_i \ge \frac{s(u_{(m+1):k}) - \big(1 - s(l_{1:m})\big)}{k - m - 1}. \qquad (13)$$
Thus, (13) generalizes (7) on taking $m = 0$. In addition, if (11) is satisfied, then $l_i < u_i$ for $i = m + 1, \dots, k$. The above argument establishes the following result.
Theorem 4. For m satisfying $1 \le m \le k - 2$, specifying the bounds
- (i) $l_i$ with $l_i \le p_i$ for $i = 1, \dots, m$ satisfying $s(l_{1:m}) < 1$; and
- (ii) $u_i$ with $p_i \le u_i$ for $i = m + 1, \dots, k$ satisfying (11) and (13),
determines the lower bounds $l_{m+1}, \dots, l_k$ given by (12), which, together with $l_1, \dots, l_m$, determine the simplex $S(x_1, \dots, x_k)$ defined in Theorem 2.
Ensuring that the elicited bounds are consistent with a prior on $S(e_1, \dots, e_k)$ can proceed as follows. First, ensuring $s(l_{1:m}) < 1$ can be accomplished conservatively by lowering some of the $l_i$ if necessary. In addition, the inequality (11) can be accomplished conservatively by raising some of the $u_i$ if necessary. If (13) fails for some $i$, then some of the $l_i$ need to be decreased, or some of the $u_i$ need to be increased, or a combination of both. Indeed, setting a $u_i = 1$ to be conservative, so that (13) is satisfied, may require lowering some of the lower bounds, but again this is conservative. Note that, if we assign the $u_i$ a common value $u$ for $i = m + 1, \dots, k$, then (13) reduces to $u \le 1 - s(l_{1:m})$, and the assignment $u = 1 - s(l_{1:m})$ ensures consistency, although an alternative assignment can be made such that $\big(1 - s(l_{1:m})\big)/(k - m) < u \le 1 - s(l_{1:m})$ holds.
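The completion of the mixed bounds via (12), with the checks (11) and (13), can be sketched as follows; again the function name and error messages are our own.

```python
def complete_lower_bounds(l_low, u_up):
    # Theorem 4: given lower bounds l_1, ..., l_m and upper bounds
    # u_{m+1}, ..., u_k with m <= k - 2, return the full set of lower
    # bounds via (12), checking inequalities (11) and (13).
    m, k = len(l_low), len(l_low) + len(u_up)
    assert k - m >= 2, "with a single upper bound, u_k = 1 - s(l) is forced"
    rem = 1.0 - sum(l_low)                 # 1 - s(l_{1:m})
    assert rem > 0.0, "the elicited lower bounds must sum to less than 1"
    assert sum(u_up) > rem, "inequality (11) fails: raise some u_i"
    gap = (sum(u_up) - rem) / (k - m - 1)  # common amount subtracted in (12)
    tail = [ui - gap for ui in u_up]
    assert all(li >= 0.0 for li in tail), "inequality (13) fails"
    return l_low + tail
```

For $k = 4$, $m = 2$, $l = (0.1, 0.2)$ and $u = (0.6, 0.5)$: here $1 - s(l_{1:2}) = 0.7 < 1.1 = s(u_{3:4})$, so (11) holds, the common gap is $0.4$, and the full set of lower bounds is $(0.1, 0.2, 0.2, 0.1)$; applying (2) then reproduces the elicited upper bounds.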
The purpose of Theorems 2–4 is to ensure that the bounds selected for the individual probabilities are consistent. It may be that an expert has a bound which they believe holds with virtual certainty but the consistency requirements are violated. The solution to this problem is to decrease a lower bound or increase an upper bound so that the requirements are satisfied. While this is not an entirely satisfactory solution to this problem, it does not violate the prescription that the bounds hold with virtual certainty. Furthermore, the lower bound of 0, or the upper bound of 1, is always available if a user feels they have absolutely no idea how to choose such a bound.
2.2.4. Determining the Elicited Dirichlet Prior
Theorems 2–4 state bounds that are consistent for a prior on $S(e_1, \dots, e_k)$. Thus, it is now necessary to determine the Dirichlet prior, denoted Dirichlet$(\alpha_1, \dots, \alpha_k)$, such that $S(x_1, \dots, x_k)$ contains $\gamma$ of the prior probability. Again, we pick a point $\xi = (\xi_1, \dots, \xi_k) \in S(x_1, \dots, x_k)$ and place the mode at $\xi$, so $\alpha_i = 1 + (\tau - k)\xi_i$ for $i = 1, \dots, k$ with $\tau = \sum_{i=1}^k \alpha_i > k$. For example, the centroid of $S(x_1, \dots, x_k)$ would often seem like a sensible choice, and then only $\tau$ needs to be determined. There is a 1-1 correspondence between $(\alpha_1, \dots, \alpha_k)$ and $(\xi, \tau)$ given by $\xi_i = (\alpha_i - 1)/(\tau - k)$ and $\tau = \sum_{i=1}^k \alpha_i$.
Again, it makes sense to proceed via an iterative algorithm to determine $\tau$. Provided the prior content of $S(x_1, \dots, x_k)$ at an initial value of $\tau$ is less than $\gamma$, take this as the lower end of a bracketing interval and find a larger value of $\tau$ for which the content is at least $\gamma$. As before, set $\tau$ equal to the midpoint, and then the algorithm proceeds via bisection. Determining the prior content of $S(x_1, \dots, x_k)$ at each step becomes problematic even for small values of $k$. In the approach adopted here, this probability content was estimated via a Monte Carlo sample from the relevant Dirichlet. This is seen to work quite well since, in the case of determining a prior, high accuracy for the computations is not required.
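The Monte Carlo bisection just described can be sketched as follows. This is a pure-Python illustration, not the paper's implementation: Dirichlet variates are generated from gamma variates, and the names, sample size and iteration count are our own choices. Because the content is only estimated, the root is located only to within Monte Carlo error, which, as noted, suffices for setting a prior.

```python
import random

def dirichlet_sample(alpha, rng):
    # A Dirichlet(alpha) variate via normalized gamma variates.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    t = sum(g)
    return [x / t for x in g]

def subsimplex_content(alpha, l, n, rng):
    # Monte Carlo estimate of P(p_i >= l_i for all i) under Dirichlet(alpha).
    hits = 0
    for _ in range(n):
        p = dirichlet_sample(alpha, rng)
        if all(pi >= li for pi, li in zip(p, l)):
            hits += 1
    return hits / n

def elicit_dirichlet(l, xi, gamma, n=5000, seed=0):
    # Bisection for tau > k such that the Dirichlet with mode xi,
    # alpha_i = 1 + (tau - k) xi_i, gives S(x_1, ..., x_k) estimated
    # prior content gamma.
    rng = random.Random(seed)
    k = len(l)
    def alpha(t):
        return [1.0 + (t - k) * x for x in xi]
    t_lo, t_hi = float(k), 2.0 * k
    while subsimplex_content(alpha(t_hi), l, n, rng) < gamma:
        t_hi *= 2.0                       # bracket: content increases toward 1
    for _ in range(30):                   # bisection with noisy evaluations
        t_mid = 0.5 * (t_lo + t_hi)
        if subsimplex_content(alpha(t_mid), l, n, rng) < gamma:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return alpha(0.5 * (t_lo + t_hi))
```

For instance, with $k = 3$, lower bounds $l = (0.2, 0.2, 0.2)$ and mode at the centroid $\xi = (1/3, 1/3, 1/3)$, the returned $\alpha_i$ give the subsimplex estimated content close to the requested $\gamma$, up to Monte Carlo error.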
Consider an example.
Example 3. Determining a Dirichlet prior.
Suppose that $k = 3$ and lower bounds are placed on the probabilities whose sum is close to 1. This results, via (2), in implied upper bounds that are reasonably tight. The mode was placed at the centroid of $S(x_1, x_2, x_3)$. For a small error tolerance $\epsilon$ and a Monte Carlo sample of size N at each step, the values of $\tau$ and the corresponding $\alpha_i$ were obtained after 13 iterations. The prior content of $S(x_1, x_2, x_3)$ was estimated to be approximately $\gamma$. If greater accuracy is required, then N can be increased and/or ϵ decreased.
This choice of lower bounds results in a fairly concentrated prior as is reflected in the plots of the marginals in Figure 1. This concentration is not a defect of the elicitation as (2) indicates that it must occur when the sum of the bounds is close to 1. Thus, the concentration is forced by the dependencies among the probabilities. Consider now another example.
Example 4. Determining a Dirichlet prior.
Suppose that $k = 9$ and lower bounds are placed on each of the probabilities. This leads, via (2), to implied upper bounds for the probabilities. The mode was placed at the centroid of $S(x_1, \dots, x_9)$. For a small error tolerance and a Monte Carlo sample of size N at each step, the values of $\tau$ and the $\alpha_i$ were obtained after seven iterations. The prior content of $S(x_1, \dots, x_9)$ was estimated to be approximately $\gamma$. Figure 2 is a plot of the nine marginal priors for the $p_i$. Again, the dependencies among the $p_i$ make the marginal priors quite concentrated.
Example 1 (continued). Choosing the prior.
Given that we wish to assess independence, it is necessary that any elicited prior include independence as a possibility, so this is not ruled out a priori. A natural elicitation is to specify valid bounds (namely, bounds that satisfy our theorems) on the row probabilities $p_{i\cdot}$ and the column probabilities $p_{\cdot j}$, and then use these to obtain bounds on the cell probabilities $p_{ij}$, which in turn leads to the prior. Thus, suppose valid bounds have been specified that lead to the lower bounds $l_{i\cdot}$ and $l_{\cdot j}$. Then it is necessary that $l_{ij} = l_{i\cdot} l_{\cdot j}$ is the lower bound on $p_{ij}$. Note that it is immediate that the $l_{ij}$ satisfy the conditions of Theorem 2, since $\sum_{i,j} l_{ij} = (\sum_i l_{i\cdot})(\sum_j l_{\cdot j}) < 1$, and, from (2), the implied upper bound on $p_{ij}$ is at least the product of the implied marginal upper bounds. As such, the region for the $p_{ij}$ contains elements satisfying independence, since any $p_{ij} = p_{i\cdot} p_{\cdot j}$ with marginals obeying their bounds obeys the cell bounds. For this example, lower bounds on the row and column probabilities were chosen, which leads to the lower bounds $l_{ij} = l_{i\cdot} l_{\cdot j}$ on the $p_{ij}$. Note that these are precisely the bounds used in Example 4, so the prior is as determined in that example, where the indexing is row-wise.
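The induced cell bounds can be sketched as follows (function name ours); flattening the table row-wise gives the vector of lower bounds to which Theorem 2 and the algorithm of Section 2.2.4 apply.

```python
def cell_lower_bounds(l_row, l_col):
    # Lower bounds l_ij = l_i. * l_.j on the cell probabilities of a
    # contingency table, induced by valid lower bounds on the row and
    # column marginals. Condition (1) of Theorem 2 holds automatically:
    # sum_ij l_ij = (sum_i l_i.) * (sum_j l_.j) < 1.
    assert sum(l_row) < 1.0 and sum(l_col) < 1.0
    return [[a * b for b in l_col] for a in l_row]
```

Note the consistency check is inherited from the marginal bounds: since each set of marginal lower bounds sums to less than 1, so does the set of cell lower bounds.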