Choosing and checking the ingredients logically comes before inference, but it is convenient to discuss these in reverse order.
  3.1. Relative Belief Inferences
Consider now a statistical problem with ingredients the data $x$, a model $\{f_\theta : \theta \in \Theta\}$ and a prior $\pi$, where interest is in making inference about $\psi = \Psi(\theta)$ for a function $\Psi$ defined on $\Theta$, where no distinction is made between the function and its range to save notation. Initially, suppose that all the probability distributions are discrete. This is not really a restriction on the discussion, however: if something works for inference in the discrete case but does not work in the continuous case, then it is our view that either the concept is not being applied correctly or the mathematical context is just too general. For us, the continuous case is always to be thought of as an approximation to a fundamentally discrete context, as measurements are always made to finite accuracy, and the approximation arises via taking limits. Some additional comments on the continuous case are made subsequently.
As discussed in Section 2, the basic object of inference is the measure of evidence, and what is wanted is a measure of the evidence that any particular value $\psi_0$ is true. Based on the ingredients specified, there are two probabilities associated with this value, namely, the prior probability $\pi_\Psi(\psi_0)$, as given here by the marginal prior density of $\psi$ evaluated at $\psi_0$, and the posterior probability $\pi_\Psi(\psi_0 \mid x)$, as given here by the marginal posterior density of $\psi$ evaluated at $\psi_0$. In certain treatments of inference, $\pi_\Psi(\psi_0 \mid x)$ is taken as a measure of the evidence that $\psi_0$ is the true value but, for a wide variety of reasons, this is not felt to be correct, and Example 1 provides a specific case where this fails. In addition, this measure suffers from the same basic problem as p-values, namely, there is no obvious dividing line between evidence for and evidence against. Moreover, it is to be noted that probabilities measure belief and not evidence. If we start with a large prior belief in $\psi_0$, then, unless there is a large amount of data, there will still be a large posterior belief even if $\psi_0$ is false and, similarly, if we started with a small amount of belief. There is agreement, however, to use the principle of conditional probability to update beliefs after receiving information or data, and this is to be regarded as the first principle of the theory of relative belief.
Thus, how is the evidence that $\psi_0$ is the true value to be measured? Basic to this is the principle of evidence: if $\pi_\Psi(\psi_0 \mid x) > \pi_\Psi(\psi_0)$, there is evidence in favor, as belief has increased; if $\pi_\Psi(\psi_0 \mid x) < \pi_\Psi(\psi_0)$, there is evidence against, as belief has decreased; and if $\pi_\Psi(\psi_0 \mid x) = \pi_\Psi(\psi_0)$, there is no evidence either way. This principle has a long history in the philosophical literature concerning evidence. The principle does not provide a specific measure of evidence, but at least it indicates clearly when there is evidence for or against, independent of the size of initial beliefs, and it does suggest that any reasonable measure of the evidence depends on the difference, in some sense, between $\pi_\Psi(\psi_0 \mid x)$ and $\pi_\Psi(\psi_0)$, namely, evidence is measured by change in belief rather than belief. A number of measures of this change have been proposed (see [1] for a discussion) but, by far, the simplest and the one that has the nicest properties is the relative belief ratio

        $RB_\Psi(\psi_0 \mid x) = \pi_\Psi(\psi_0 \mid x) / \pi_\Psi(\psi_0).$        (2)
        Thus, if $RB_\Psi(\psi_0 \mid x) > 1$, there is evidence for $\psi_0$ being the true value; if $RB_\Psi(\psi_0 \mid x) < 1$, there is evidence against $\psi_0$ being the true value; and there is no evidence either way if $RB_\Psi(\psi_0 \mid x) = 1$. The use of the relative belief ratio to measure the evidence is the third and final principle of the theory, which we call the principle of relative belief. The relative belief ratio can also be written as $RB_\Psi(\psi_0 \mid x) = m(x \mid \psi_0)/m(x)$, where $m$ is the prior predictive density of the data and $m(\cdot \mid \psi_0)$ is the conditional prior predictive density of the data given $\psi_0$.
Another natural candidate for a measure of evidence is the Bayes factor $BF_\Psi(\psi_0 \mid x)$, as this satisfies the principle of evidence, namely, $BF_\Psi(\psi_0 \mid x) > 1$ ($< 1$, $= 1$) when there is evidence for (against, neither) $\psi_0$ being the true value. The Bayes factor can be defined in terms of the relative belief ratio as $BF_\Psi(\psi_0 \mid x) = RB_\Psi(\psi_0 \mid x)/RB_\Psi(\{\psi_0\}^c \mid x)$, but not conversely. Note that the relative belief ratio of a set $A$ is $RB_\Psi(A \mid x) = \Pi_\Psi(A \mid x)/\Pi_\Psi(A)$, where $\Pi_\Psi, \Pi_\Psi(\cdot \mid x)$ are the prior and posterior probability measures of $\psi$, respectively. It might appear that $BF_\Psi(\psi_0 \mid x)$ is a comparison between the evidence for $\psi_0$ being true with the evidence for $\psi_0$ being false, but it is provable that $RB_\Psi(\psi_0 \mid x) > 1$ implies $RB_\Psi(\{\psi_0\}^c \mid x) < 1$, and conversely, so this is not the case. In addition, as subsequently discussed, in the continuous case, it is natural to take $BF_\Psi(\psi_0 \mid x) = RB_\Psi(\psi_0 \mid x)$.
The principle of relative belief leads to an ordering of the possible values of $\psi$, as $\psi_1$ is preferred to $\psi_2$ whenever $RB_\Psi(\psi_1 \mid x) > RB_\Psi(\psi_2 \mid x)$, since there is more evidence for $\psi_1$ than for $\psi_2$. When $\psi = \theta$, this agrees with the likelihood ordering, but likelihood fails to provide such an ordering for general $\psi$. It is common to use the profile likelihood ordering, even though this is not a likelihood ordering, and it does not agree with the relative belief ordering. It is noteworthy that the relative belief idea is consistent in the sense that inferences for all $\psi$ are based on a measure of the change in prior to posterior probabilities.
The relative belief ordering leads immediately to a theory of estimation. Basing inferences on the evidence requires that the relative belief estimate be a value $\psi(x)$ that maximizes $RB_\Psi(\psi \mid x)$, and typically such a value is unique, so $\psi(x) = \arg \max_\psi RB_\Psi(\psi \mid x)$. It is also necessary to say something about the accuracy of this estimate in an application. For this, a set of values containing $\psi(x)$ is quoted, and the “size” of the set is taken as the measure of accuracy. Again, following the ordering based on the evidence, it is necessary that the set take the form $C_c(x) = \{\psi : RB_\Psi(\psi \mid x) \geq c\}$ for some constant $c$ since, if $RB_\Psi(\psi_2 \mid x) \geq RB_\Psi(\psi_1 \mid x)$, then $\psi_2$ must be included whenever $\psi_1$ is. However, what $c$ should be used? It is perhaps natural to choose $c$ so that $C_c(x)$ contains some prescribed amount of posterior probability, so the set is a $\gamma$-credible region. However, there are several problems with this approach. For one, what $\gamma$ should be chosen? Even if one is content with some particular $\gamma$, say $\gamma = 0.95$, there is the problem that the set may contain values $\psi$ with $RB_\Psi(\psi \mid x) < 1$, and such a value has been ruled out, since there is evidence against such a $\psi$ being true. It is argued in [1] that the plausibility set $Pl_\Psi(x) = \{\psi : RB_\Psi(\psi \mid x) > 1\}$ be used, as $Pl_\Psi(x)$ contains all the values for which there is evidence in favor of being the true value. In general circumstances, it is provable that $\Pi_\Psi(Pl_\Psi(x) \mid x) \geq \Pi_\Psi(Pl_\Psi(x))$, so the posterior probability that the true value lies in $Pl_\Psi(x)$ is at least as large as its prior probability. There are several possible measures of size and certainly the posterior content $\Pi_\Psi(Pl_\Psi(x) \mid x)$ is one, as this measures the belief that the true value is in $Pl_\Psi(x)$, but also some measure such as length or cardinality is relevant. If $Pl_\Psi(x)$ is small and $\Pi_\Psi(Pl_\Psi(x) \mid x)$ large, then $\psi(x)$ can be judged to be an accurate estimate of $\psi$.
It is immediate that $RB_\Psi(\psi_0 \mid x)$ is the evidence concerning $H_0 : \Psi(\theta) = \psi_0$. The evidential ordering implies that the smaller $RB_\Psi(\psi_0 \mid x)$ is than 1, the stronger is the evidence against $\psi_0$, and the bigger it is than 1, the stronger is the evidence in favor of $\psi_0$, but how is one to measure this strength? In [16], it is proposed to measure the strength of the evidence via

        $\Pi_\Psi(RB_\Psi(\psi \mid x) \leq RB_\Psi(\psi_0 \mid x) \mid x),$        (3)

        which is the posterior probability that the true value of $\psi$ has evidence no greater than that obtained for the hypothesized value $\psi_0$. When $RB_\Psi(\psi_0 \mid x) < 1$ and (3) is small, then there is strong evidence against $\psi_0$, since there is a large posterior probability that the true value of $\psi$ has a larger relative belief ratio. Similarly, if $RB_\Psi(\psi_0 \mid x) > 1$ and (3) is large, then there is strong evidence that the true value of $\psi$ is given by $\psi_0$, since there is a large posterior probability that the true value is in $\{\psi : RB_\Psi(\psi \mid x) \leq RB_\Psi(\psi_0 \mid x)\}$ and $\psi_0$ maximizes the evidence in this set. Additional results concerning $RB_\Psi(\psi_0 \mid x)$ as a measure of evidence and (3) can be found in [1,16].
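To make these definitions concrete, the following minimal sketch computes the relative belief ratios (2), the relative belief estimate, the plausibility set and the strength (3) for a discrete parameter. The three-point parameter space, the uniform prior and the data are invented purely for illustration; nothing here is code from [1] or [16].

```python
from math import comb

# Invented illustration: theta takes one of three values with a uniform prior,
# and the data is s successes in n Bernoulli(theta) trials.
thetas = [0.25, 0.5, 0.75]
prior = {t: 1 / 3 for t in thetas}

def likelihood(t, s, n):
    return comb(n, s) * t ** s * (1 - t) ** (n - s)

def posterior_and_rb(s, n):
    joint = {t: prior[t] * likelihood(t, s, n) for t in thetas}
    m = sum(joint.values())                       # prior predictive of the data
    post = {t: joint[t] / m for t in thetas}      # posterior probabilities
    rb = {t: post[t] / prior[t] for t in thetas}  # relative belief ratios, as in (2)
    return post, rb

post, rb = posterior_and_rb(s=7, n=10)

# relative belief estimate: the value maximizing the relative belief ratio
estimate = max(thetas, key=lambda t: rb[t])

# plausibility set: the values with evidence in their favor (RB > 1)
plaus = [t for t in thetas if rb[t] > 1]

# strength of the evidence for a hypothesized value, as in (3): the posterior
# probability of the values with no more evidence than the hypothesized one
theta0 = 0.5
strength = sum(post[t] for t in thetas if rb[t] <= rb[theta0])

print(estimate, plaus, round(strength, 3))
```

Note that the estimate maximizes the change in belief from prior to posterior, not the posterior itself, which is what distinguishes it from MAP.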
For continuous parameters, it is natural to define $RB_\Psi(\psi \mid x) = \lim_{\delta \rightarrow 0} RB_\Psi(N_\delta(\psi) \mid x)$, where $\{N_\delta(\psi)\}$ is a sequence of sets converging nicely to $\{\psi\}$ as $\delta \rightarrow 0$. When the densities are continuous at $\psi$, then this limit equals (2), so this is a measure of evidence in general circumstances. In addition, it is natural to define the Bayes factor by $BF_\Psi(\psi \mid x) = \lim_{\delta \rightarrow 0} BF_\Psi(N_\delta(\psi) \mid x)$ and, when the densities are continuous at $\psi$, then $BF_\Psi(\psi \mid x) = RB_\Psi(\psi \mid x)$.
A variety of consistency results, as the amount of data increases, are proved in [1] concerning the estimation and hypothesis assessment procedures. In particular, when $H_0 : \Psi(\theta) = \psi_0$ is false, then (2) converges to 0, as does (3). When $H_0$ is true, then (2) converges to its largest possible value (greater than 1 and often $\infty$) and, in the discrete case, (3) converges to 1. In the continuous case, however, when $H_0$ is true, then (3) typically converges to a $U(0,1)$ random variable. This simply reflects the approximate nature of the inferences and is easily resolved by requiring that a deviation $\delta$ be specified such that, if dist$(\psi, \psi_0) \leq \delta$, where dist is some measure of distance determined by the application, then this difference is to be regarded as immaterial. This leads to redefining $H_0$ as $\{\psi : \text{dist}(\psi, \psi_0) \leq \delta\}$, and typically a natural discretization of the range of $\Psi$ exists with $H_0$ as one of its elements. With this modification, (3) converges to 1 as the amount of data increases when $H_0$ is true. Given that data is always measured to finite accuracy, the value of a typical continuous-valued parameter can only be known to a certain finite accuracy no matter how much data is collected. Thus, such a $\delta$ always exists, and it is part of an application to determine the relevant value (see Example 7 here, Al-Labadi, Baskurt and Evans [7] and Evans, Guttman and Li [17] for developments on determining $\delta$). These results establish that, as the amount of data increases, relative belief inferences will inevitably produce the correct answers to estimation and hypothesis assessment problems.
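The limiting definition for continuous parameters can be illustrated numerically. In the sketch below (an invented Bernoulli setup with a uniform prior on [0, 1]), the relative belief ratio of a shrinking cell $N_\delta(\theta_0)$, computed as the ratio of its posterior to its prior probability via a crude Riemann sum, stabilizes at the ratio of the posterior to prior densities at $\theta_0$.

```python
# Invented illustration of the limiting definition: Bernoulli(theta) model,
# uniform prior on [0, 1], s successes observed in n trials. The relative
# belief ratio of the cell N_delta(theta0) is its posterior probability
# divided by its prior probability; as delta shrinks, this approaches the
# ratio of the posterior density to the prior density at theta0.

def rb_cell(theta0, s, n, delta, grid=4000):
    lo, hi = max(0.0, theta0 - delta / 2), min(1.0, theta0 + delta / 2)
    step = 1.0 / grid
    # posterior density is proportional to theta^s (1 - theta)^(n - s)
    dens = [(i * step) ** s * (1 - i * step) ** (n - s) for i in range(1, grid)]
    norm = sum(dens) * step                     # normalizing constant (Riemann sum)
    mass = sum(d * step for i, d in enumerate(dens, start=1) if lo < i * step < hi)
    post_prob = mass / norm                     # posterior probability of the cell
    prior_prob = hi - lo                        # uniform prior probability of the cell
    return post_prob / prior_prob

# the cell relative belief ratios stabilize near the density ratio at theta0
for delta in (0.2, 0.1, 0.05):
    print(delta, round(rb_cell(0.7, s=7, n=10, delta=delta), 3))
```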
It is immediate that relative belief inferences have some excellent properties. For example, any 1-1 increasing function of $RB_\Psi(\cdot \mid x)$, such as $\log RB_\Psi(\cdot \mid x)$, can be used to measure evidence, as the inferences are invariant to this choice. In addition, $RB_\Psi(\psi \mid x)$ is invariant under smooth reparameterizations, and so all relative belief inferences possess this invariance property. By contrast, MAP (maximum a posteriori) inferences are not invariant, and this leads to considerable doubt about their validity (see also Example 1). In [1], results from a number of papers are summarized establishing optimality results for relative belief inferences within the collection of all Bayesian inferences. For example, Al-Labadi and Evans [18] establish that relative belief inferences for $\psi$ have optimal robustness-to-the-prior properties. In addition, as discussed in Section 3.2, since the inferences are based on a measure of evidence, a key criticism of Bayesian methodology can be addressed, namely, the extent to which the inferences are biased can be measured.
Relative belief prediction inferences for future data are naturally produced by using the ratio of the posterior to prior predictive densities for the quantity in question. The following example illustrates this and demonstrates significant advantages for relative belief.
Example 1. Prediction for Bernoulli sampling.
Consider an example discussed in Chapter 6 of [19], which further references [9]. A tack is flipped with $x_i = 1$ indicating the tack finishes point up and $x_i = 0$ otherwise, so $x_i \sim$ Bernoulli$(\theta)$. Suppose the prior is $\theta \sim U(0,1)$ and the goal is to predict $f$ future observations $y = (x_{n+1}, \ldots, x_{n+f})$ having observed $n$ independent tosses $x = (x_1, \ldots, x_n)$. The posterior of $\theta$ is beta$(n\bar{x} + 1, n(1 - \bar{x}) + 1)$, the prior predictive density of $y$ is $B(f\bar{y} + 1, f(1 - \bar{y}) + 1)$, where $B(\cdot, \cdot)$ denotes the beta function, and the posterior predictive density for $y$ is

        $q(y \mid x) = B(n\bar{x} + f\bar{y} + 1, n(1 - \bar{x}) + f(1 - \bar{y}) + 1)/B(n\bar{x} + 1, n(1 - \bar{x}) + 1),$        (4)

        which is constant for all $y$ with the same value of $\bar{y}$.
 Maximizing (4) gives the MAP predictor of $y$. If $\bar{x} < 1/2$, then the maximum occurs at $\bar{y} = 0$, namely, $y = (0, \ldots, 0)$. If $\bar{x} > 1/2$, then the maximum occurs at $\bar{y} = 1$, namely, $y = (1, \ldots, 1)$. If $\bar{x} = 1/2$, then a maximum occurs at both $y = (0, \ldots, 0)$ and $y = (1, \ldots, 1)$. Thus, using MAP gives the absurd result that $y$ is always predicted to be either all 0s or all 1s. Clearly, there is a problem here with using MAP.
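The claim that maximizing the posterior predictive probability of the full sample always yields a constant sequence is easy to check numerically. The sketch below (with invented values of $n$, $n\bar{x}$ and $f$) evaluates the logarithm of the posterior predictive probability of a future sequence for each possible number of future 1s and locates the maximum.

```python
from math import lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_post_pred(s, s_obs, n, f):
    # log posterior predictive probability of one particular future sequence
    # containing s ones, after observing s_obs ones in n tosses, under the
    # uniform prior on theta (the values below are invented for illustration)
    a, b = s_obs + 1, n - s_obs + 1
    return log_beta(a + s, b + f - s) - log_beta(a, b)

n, s_obs, f = 10, 4, 10   # xbar = 0.4 < 1/2
log_probs = [log_post_pred(s, s_obs, n, f) for s in range(f + 1)]
map_s = max(range(f + 1), key=lambda s: log_probs[s])
print(map_s)              # 0: the MAP predicted sample is all 0s
```

The per-sequence probability is log-convex in the number of future 1s, so its maximum always sits at one of the two endpoints.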
Now suppose $\bar{x} < 1/2$, so the prediction is all 0s, and

        $q((0, \ldots, 0) \mid x) = B(n\bar{x} + 1, n(1 - \bar{x}) + f + 1)/B(n\bar{x} + 1, n(1 - \bar{x}) + 1).$

For fixed $f$, then $q((0, \ldots, 0) \mid x) \rightarrow (1 - \bar{x})^f$ as $n \rightarrow \infty$, which is less than 1 whenever $\bar{x} > 0$ and converges to 1 when $\bar{x} = 0$. Diaconis and Skyrms [19] note, however, that when $\bar{x} = 0$, then $q((0, \ldots, 0) \mid x) = (n + 1)/(n + f + 1) \rightarrow 0$ as $f \rightarrow \infty$ and make the comment “If this is an unwelcome surprise, then perhaps the uniform prior is suspect.” They also refer to some attempts to modify the prior to avoid this phenomenon, which clearly violates an essential component of the Bayesian approach. In our view, there is nothing wrong with the uniform prior; rather, the problem lies with using posterior probabilities implicitly as measures of evidence, both to determine the predictor and to assess its reliability.
The relative belief ratio for $y$ is

        $RB(y \mid x) = \dfrac{B(n\bar{x} + f\bar{y} + 1, n(1 - \bar{x}) + f(1 - \bar{y}) + 1)}{B(n\bar{x} + 1, n(1 - \bar{x}) + 1)\, B(f\bar{y} + 1, f(1 - \bar{y}) + 1)}.$        (5)

        With $n$ and $\bar{x}$ fixed, Figure 1 gives the plot of (5) as a function of $\bar{y}$. The best relative belief predictor of $y$ is any sample with $\bar{y}$ maximizing (5), which occurs at a value of $\bar{y}$ close to $\bar{x}$, and the plausibility set $Pl(x) = \{y : RB(y \mid x) > 1\}$ has an appreciable posterior content. Thus, there is reasonable belief that the plausibility set contains the “true” future sample but certainly there are many such samples. By contrast with MAP, a sensible prediction is made using relative belief.
For the case when $\bar{x} = 0$, so the posterior of $\theta$ is beta$(1, n + 1)$, then

        $RB(y \mid x) = \dfrac{B(f\bar{y} + 1, n + f(1 - \bar{y}) + 1)}{B(1, n + 1)\, B(f\bar{y} + 1, f(1 - \bar{y}) + 1)},$

        which is decreasing in $f\bar{y}$ and so is maximized for the sample with $\bar{y} = 0$. Similarly, when $\bar{x} = 1$, the predictor is the sample with $\bar{y} = 1$. Thus, at the extremes, the predictions based on MAP and relative belief are the same, but otherwise there is a sharp disagreement. In addition,

        $Pl(x) = \{y : RB(y \mid x) > 1\}$

        always contains $y = (0, \ldots, 0)$ and, for any $y$ with $\bar{y} > 0$,

        $RB(y \mid x) \rightarrow 0$

        as $n \rightarrow \infty$. Therefore, for any $c \in (0, 1]$, there is an $n_c$ such that, for all $n \geq n_c$, $Pl(x)$ contains no $y$ having a proportion of 1s that is $c$ or greater. Thus, $Pl(x)$ is shrinking as $n$ increases, in the sense that it contains only samples with smaller and smaller proportions of 1s as $n$ increases.
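The disagreement between the MAP and relative belief predictors, and the shrinking of the plausibility set when $\bar{x} = 0$, can likewise be checked numerically. The values of $n$, $n\bar{x}$ and $f$ below are invented for illustration.

```python
from math import lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_rb(s, s_obs, n, f):
    # log relative belief ratio of a future sequence with s ones: its posterior
    # predictive probability divided by its prior predictive probability,
    # under the uniform prior on theta (invented numbers below)
    a, b = s_obs + 1, n - s_obs + 1
    log_post = log_beta(a + s, b + f - s) - log_beta(a, b)
    log_prior = log_beta(s + 1, f - s + 1)
    return log_post - log_prior

n, s_obs, f = 10, 4, 10   # xbar = 0.4
rb_pred = max(range(f + 1), key=lambda s: log_rb(s, s_obs, n, f))
print(rb_pred)            # 4: relative belief predicts ybar = xbar, not all 0s

def plausibility(n):
    # numbers of future 1s with evidence in their favor when xbar = 0
    return [s for s in range(f + 1) if log_rb(s, 0, n, f) > 0]

# the plausibility set shrinks toward the all-0s sample as n grows
print(plausibility(10), plausibility(1000))
```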
The posterior content of the plausibility region equals

        $\Pi(\{y : RB(y \mid x) > 1\} \mid x) = \sum_{\{s : RB > 1\}} \binom{f}{s} q_s(x),$        (6)

        where $q_s(x)$ denotes the common value of (4) for the samples $y$ with $f\bar{y} = s$. Since, under the uniform prior, the prior predictive probability that $f\bar{y} = s$ equals $(f + 1)^{-1}$ for every $s$, (6) equals the sum over all the summands that are greater than $(f + 1)^{-1}$, and, when $\bar{x} = 0$,

        $\binom{f}{s} q_s(x) = (n + 1)\binom{f}{s}\, s!\,(n + f - s)!\,/\,(n + f + 1)!.$

        When $s = 0$, the corresponding term equals $(n + 1)/(n + f + 1)$, which converges to 1 as $n \rightarrow \infty$. Thus, for all $n$ large enough, the sum (6) contains the term for $s = 0$. Therefore, for any $\epsilon > 0$ and all $n$ large enough, (6) is greater than $1 - \epsilon$, and the posterior content of $Pl(x)$ converges to 1.
Thus, relative belief also behaves appropriately when $\bar{x} = 0$ and $n$ increases, while MAP does not. The failure of MAP might be attributed to the requirement that the entire sample $y$ be predicted. If, instead, it was required only to predict the value $f\bar{y}$, then the prior predictive of this quantity is uniform on $\{0, 1, \ldots, f\}$, the posterior of $f\bar{y}$ equals $\binom{f}{f\bar{y}}$ times the posterior predictive probability (4) of any single sample with that value of $f\bar{y}$, and the relative belief ratio for $f\bar{y}$ equals $(f + 1)$ times this posterior probability. Thus, as is often the case when the quantity in question has a uniform prior, MAP and relative belief estimates are the same. However, even in this case, there is no natural cut-off for MAP inferences to say when there is evidence for or against a particular value. The fact that it is necessary to modify the problem in this way to get a reasonable inference is, in our view, a substantial failing of MAP. It seems reasonable to suggest that, when an inference approach is shown to perform poorly on such examples, it not be generally recommended. Additional examples of poor performance of MAP are discussed in [1].
It is notable that, while the relative belief approach to inference has been described here using statistical models and priors, in reality, everything can be cast in terms of a single probability model, however such an object arises. Thus, if $P$ is a probability measure on a sample space $\Omega$ and $A \subset \Omega$ is an event whose truth value is unknown but the event $C \subset \Omega$ is known to be true, then the evidence concerning the truth of $A$ is given by $RB(A \mid C) = P(A \mid C)/P(A)$, with this defined by the appropriate limit when either $A$ or $C$ is a null event. As discussed in [1], the relative belief approach to inference can be seen as essentially probability theory together with the principles of evidence and relative belief.
  3.2. Choosing and Checking the Ingredients
The first choice that must be made is the model, and there are a number of standard models used in practice. Not a lot has been written about this step, however, and yet it is perhaps the most important step in solving a statistical problem. It is generally accepted that the correct way to choose a prior is through elicitation. This means that a methodology is prescribed that directs an expert in the application area on how to translate their knowledge into a prior. There are various default priors in use that avoid this elicitation step, but it is far better to recommend that sufficient time and energy be allocated for the elicitation of a proper prior. Staying within the context of proper probability distributions also ensures that a variety of paradoxes and illogicalities are avoided.
Given the ingredients, the relative belief inferences may be applied correctly, but it is still reasonable to ask if these ingredients are appropriate for the particular application. If not, then the inferences drawn cannot be considered valid. There are at least two questions about the ingredients that need to be answered, namely, is there bias inherent in the choice of ingredients and are the ingredients contradicted by the data?
The concern for bias is best understood in terms of assessing the hypothesis $H_0 : \Psi(\theta) = \psi_0$. Let $M(\cdot \mid \psi_0)$ denote the prior predictive distribution of the data given that $\Psi(\theta) = \psi_0$. Bias against $H_0$ means that the ingredients are such that, with high probability, evidence will not be obtained in favor of $H_0$ even when it is true. Bias against is thus measured by

        $M(RB_\Psi(\psi_0 \mid x) \leq 1 \mid \psi_0).$        (7)

        If (7) is large, then obtaining evidence against $\psi_0$ seems like a foregone conclusion. For bias in favor of $\psi_0$, consider a value $\psi_*$ where dist$(\psi_*, \psi_0) = \delta$, so $\psi_*$ is a value that just differs from the hypothesized value by a meaningful amount. Bias in favor of $\psi_0$ is then measured by

        $M(RB_\Psi(\psi_0 \mid x) \geq 1 \mid \psi_*).$        (8)

        If (8) is large, then obtaining evidence in favor of $\psi_0$ seems like a foregone conclusion. Typically, the probability of obtaining evidence against $\psi_0$ increases as dist$(\psi_*, \psi_0)$ increases, so (8) is an appropriate measure of bias in favor of $\psi_0$. The choice of the prior can be used somewhat to control bias but, typically, a prior that makes one bias lower just results in making the other bias higher. It is established in [1] that, under quite general circumstances, both biases converge to 0 as the amount of data increases. Thus, bias can be controlled by design a priori.
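Since (7) and (8) are prior predictive probabilities, they can be estimated by straightforward simulation. The following sketch estimates the bias against (7) by Monte Carlo in an invented discrete model (θ restricted to three values with a uniform prior); the specific numbers have no significance.

```python
import random
from math import comb

random.seed(1)

# invented discrete model: theta is one of three values, uniform prior
thetas = [0.25, 0.5, 0.75]
prior = {t: 1 / 3 for t in thetas}

def rb(theta0, s, n):
    # relative belief ratio of theta0 after observing s successes in n trials
    joint = {t: prior[t] * comb(n, s) * t ** s * (1 - t) ** (n - s) for t in thetas}
    m = sum(joint.values())
    return (joint[theta0] / m) / prior[theta0]

def bias_against(theta0, n, reps=2000):
    # Monte Carlo estimate of (7): the probability, under data generated from
    # theta0, that evidence in favor of theta0 is NOT obtained
    count = 0
    for _ in range(reps):
        s = sum(random.random() < theta0 for _ in range(n))
        if rb(theta0, s, n) <= 1:
            count += 1
    return count / reps

# bias against theta0 = 0.5 decreases as the amount of data increases
for n in (5, 20, 80):
    print(n, bias_against(0.5, n))
```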
The model needs to be checked against the data for, if the observed data lies in the tails of every distribution in the model, then this suggests model failure. There are a wide variety of approaches to assessing this, and these are not reviewed here. One relevant comment is that, at this time, there do not seem to exist general methodologies for modifying a model when model failure is encountered.
The prior can also be checked for conflict with the data. A conflict means that the observed data are in the tails of all those distributions in the model where the prior primarily places its mass. For a minimal sufficient statistic $T$ for the model, Evans and Moshonov [20] used the tail probability

        $M_T(m_T(t) \leq m_T(T(x))),$        (9)

        where $m_T$ is the prior predictive density of $T$, to assess prior-data conflict, where a small value of (9) indicates prior-data conflict. In [21], it is shown that, under general circumstances, (9) converges to $\Pi(\pi(\theta) \leq \pi(\theta_{true}))$ as the amount of data increases. There are a variety of refinements of (9) that allow for looking at particular components of a prior to isolate where a problem with the prior may be. In [22], a method is developed for replacing a prior when a prior-data conflict has been detected. This does not mean simply replacing a prior by one that is more diffuse, however, as is demonstrated in Section 4.1.
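In small discrete problems, the tail probability (9) can be computed exactly by enumeration. As an invented illustration (only the tail probability itself is taken from [20]; the model and numbers are made up), consider a Bernoulli(θ) model with a beta(a, b) prior, for which $T$, the number of successes, is minimal sufficient and $m_T$ is the beta-binomial probability function.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def m_T(t, n, a, b):
    # prior predictive (beta-binomial) probability of t successes in n trials
    return comb(n, t) * exp(log_beta(a + t, b + n - t) - log_beta(a, b))

def conflict_tail_prob(t_obs, n, a, b):
    # the tail probability (9): the prior predictive probability of obtaining
    # a value of T whose prior predictive probability is no larger than that
    # of the observed value
    m_obs = m_T(t_obs, n, a, b)
    return sum(m_T(t, n, a, b) for t in range(n + 1) if m_T(t, n, a, b) <= m_obs)

# invented numbers: a prior concentrated near theta = 0.8 versus data far from it
print(round(conflict_tail_prob(2, 20, 16, 4), 4))    # small: prior-data conflict
print(round(conflict_tail_prob(16, 20, 16, 4), 4))   # large: no conflict
```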