4.1. Value of Information
4.1.1. Experiments
Making decisions when the only information-gathering tools available are noisy signal sets is an unavoidable aspect of life. Examples are not difficult to adduce. In a bank run situation, one may be inclined to withdraw one’s funds from a bank if there are rumours that the fundamentals of the bank are weakening, but how is one to ascertain whether the rumours are true? One could rely on several alternative information sources, such as statements by an official bank spokesperson, media reports, the length of queues at the bank, and additional rumours whispered by neighbours. Each source is ‘noisy’ in the sense of giving certain signals with positive probability no matter what the true state of the bank is, and it may be unclear which source is to be preferred.
Another example is given by laboratory tests for the diagnosis of an illness; such tests are not one hundred percent reliable, since they can yield false positives or false negatives. Imagine a patient exhibiting symptoms that are consistent with three possible diseases, stumping a medical staff worthy of television’s Dr House. Two medical laboratory tests are available, each of which turns a chemical into one of two colours, with probabilities that depend on which disease is the true one. These are exhibited in Figure 1; for example, in test (a), if the true disease is disease B then the chemical turns red or green with the probabilities given in the corresponding row of Figure 1a. Unfortunately, under both tests, no matter what the true state is and no matter which colour appears, some uncertainty will remain as to which of A, B, or C is the true case. Given the choice of conducting only one of these tests, which should the medical staff choose? Which one is more valuable from the perspective of the information it supplies?
To begin answering such questions rigorously, one needs to decide on a criterion for measuring the value of the information provided by ‘noisy signals’. Suppose that $\Omega$ is a (finite) set of possible states. An experiment (also called an information structure or a signal structure in the literature) over $\Omega$ is a pair $(S, \sigma)$, where $S$ is a finite set of signals and $\sigma = (\sigma(s \mid \omega))_{\omega \in \Omega,\, s \in S}$ is a stochastic matrix, with the rows representing the states and the columns the signals.
It is assumed that every signal $s$ has positive probability under at least one state. This is without loss of generality: if it does not hold, a signal that has zero probability of being received can simply be removed from the set of signals. When the true state is $\omega$, the decision maker receives signal $s$ with probability $\sigma(s \mid \omega)$. The two matrices depicted in Figure 1 satisfy the definition, with the state space being $\Omega = \{A, B, C\}$ in both cases and pairs of colours serving as the signal spaces.
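A decision maker with a prior distribution $\pi \in \Delta(\Omega)$ over the states can use an experiment $\sigma$ to arrive at a posterior distribution following receipt of a signal in $S$. In detail, given prior $\pi$, the total probability of seeing $s \in S$ under $\sigma$ is $p_\sigma(s) = \sum_{\omega \in \Omega} \pi(\omega)\, \sigma(s \mid \omega)$. Then the posterior probability distribution $\pi_s$ conditional on signal $s$ is calculated by Bayes’ Rule, so that for each state $\omega \in \Omega$:
$$\pi_s(\omega) = \frac{\pi(\omega)\, \sigma(s \mid \omega)}{p_\sigma(s)}. \qquad (2)$$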
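As a concrete check of Equation (2), the short Python sketch below computes the total signal probability and the resulting posterior for an experiment stored as a nested dictionary, with rows indexed by states and columns by signals. The matrix `test_a` is a hypothetical example chosen for illustration, not one of the matrices of Figure 1; exact fractions are used so that the results can be verified by hand.

```python
from fractions import Fraction

def signal_probability(prior, sigma, s):
    """Total probability p_sigma(s) = sum over states w of prior[w] * sigma[w][s]."""
    return sum(prior[w] * sigma[w][s] for w in prior)

def posterior(prior, sigma, s):
    """Bayes' Rule (Equation (2)): pi_s(w) = prior[w] * sigma[w][s] / p_sigma(s)."""
    p_s = signal_probability(prior, sigma, s)
    if p_s == 0:
        raise ValueError(f"signal {s!r} has zero probability under this prior")
    return {w: prior[w] * sigma[w][s] / p_s for w in prior}

# Hypothetical two-colour test over the states A, B, C (NOT the matrix of Figure 1):
# each row is a state, each column a signal, and every row sums to 1.
test_a = {
    "A": {"red": Fraction(9, 10), "green": Fraction(1, 10)},
    "B": {"red": Fraction(1, 2), "green": Fraction(1, 2)},
    "C": {"red": Fraction(1, 10), "green": Fraction(9, 10)},
}
prior = {w: Fraction(1, 3) for w in "ABC"}

p_red = signal_probability(prior, test_a, "red")  # 1/2
post_red = posterior(prior, test_a, "red")        # A: 3/5, B: 1/3, C: 1/15
```

Seeing red shifts weight towards disease A, but disease B retains a third of the posterior mass, reflecting the noise in the test.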
Denote the collection of all experiments over $\Omega$ by $\mathcal{E}$. An ordering $\succeq$ of $\mathcal{E}$ will be termed a value of information ordering. The intention is that if $\sigma \succeq \tau$ then $\sigma$ has greater value than $\tau$ as a source of information for discovering the true state. Many orderings are possible; the challenge is to identify an ordering that captures our intuition regarding when one experiment truly provides more informational value than another.
4.1.2. The Blackwell Ordering of Experiments
David Blackwell ([14,15]), in seminal work that has had a significant impact on the literature, suggested a (partial) ordering of experiments using a decision theoretic criterion. Let $D = (A, u)$ be a decision making problem over the state space $\Omega$, with $A$ a set of actions; that is, the utility function is given as $u \colon \Omega \times A \to \mathbb{R}$. We assume that all decision makers are utility maximisers. Hence if a decision maker knows that the state of the world is $\omega$, the action chosen will be $a^*(\omega) \in \arg\max_{a \in A} u(\omega, a)$.
If the decision maker cannot observe the state directly but has access to an experiment that supplies a signal $s$ conditional on the state, the best that the decision maker can do is implement a policy, by which we mean a mixed strategy conditional on the signal, that is, $\beta \colon S \to \Delta(A)$. Denote the collection of all possible policies by $\mathcal{B}$.
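The situation as described now is that there is a mapping $\sigma \colon \Omega \to \Delta(S)$ and a mapping $\beta \colon S \to \Delta(A)$. Composition of these mappings yields $\beta \circ \sigma \colon \Omega \to \Delta(A)$, where $(\beta \circ \sigma)(a \mid \omega) = \sum_{s \in S} \sigma(s \mid \omega)\, \beta(a \mid s)$ denotes the probability of choosing action $a$ conditional on the true state being $\omega$.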
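A decision maker with a prior belief $\pi$ faced with a decision making problem $D$ can calculate the expected payoff of a policy $\beta$, contingent on information structure $\sigma$, by way of
$$U_D(\pi, \sigma, \beta) = \sum_{\omega \in \Omega} \pi(\omega) \sum_{s \in S} \sigma(s \mid \omega) \sum_{a \in A} \beta(a \mid s)\, u(\omega, a),$$
and then select an optimal policy $\beta^* \in \arg\max_{\beta \in \mathcal{B}} U_D(\pi, \sigma, \beta)$. The payoff of $\sigma$ for decision maker $D$ with prior $\pi$ using $\beta^*$ is then given by
$$V_D(\pi, \sigma) = U_D(\pi, \sigma, \beta^*) = \max_{\beta \in \mathcal{B}} U_D(\pi, \sigma, \beta).$$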
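Because the expected payoff of a policy is linear in each conditional distribution $\beta(\cdot \mid s)$, an optimal policy can always be taken to be pure, and it can be found signal by signal: after signal $s$, choose an action maximising $\sum_{\omega} \pi(\omega)\, \sigma(s \mid \omega)\, u(\omega, a)$. A minimal Python sketch of this computation follows; the diagnostic payoff `guess_payoff` and the matrix `test_a` are hypothetical illustrations, not taken from Figure 1.

```python
from fractions import Fraction

def optimal_policy_and_value(prior, sigma, actions, u):
    """Return an optimal pure policy (signal -> action) and the optimal expected payoff.

    The expected payoff is linear in each beta(.|s), so it is maximised by putting all
    weight, signal by signal, on an action a maximising sum_w prior[w] * sigma[w][s] * u(w, a).
    """
    signals = next(iter(sigma.values())).keys()
    policy, value = {}, Fraction(0)
    for s in signals:
        scores = {a: sum(prior[w] * sigma[w][s] * u(w, a) for w in prior) for a in actions}
        best = max(scores, key=scores.get)  # ties go to the first action listed
        policy[s] = best
        value += scores[best]
    return policy, value

# Hypothetical decision problem: guess the disease, payoff 1 if correct and 0 otherwise.
def guess_payoff(w, a):
    return 1 if w == a else 0

# Hypothetical test matrix (NOT the matrix of Figure 1); rows are states, columns signals.
test_a = {
    "A": {"red": Fraction(9, 10), "green": Fraction(1, 10)},
    "B": {"red": Fraction(1, 2), "green": Fraction(1, 2)},
    "C": {"red": Fraction(1, 10), "green": Fraction(9, 10)},
}
prior = {w: Fraction(1, 3) for w in "ABC"}

policy, value = optimal_policy_and_value(prior, test_a, ["A", "B", "C"], guess_payoff)
# policy == {"red": "A", "green": "C"} and value == 3/5, versus 1/3 with no information.
```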
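Suppose that we wish to remove the dependence on the prior in the above reasoning. For a pair $\sigma, \tau \in \mathcal{E}$, declare that $\sigma$ is subjectively more informational for decision problem $D$ than $\tau$ if the payoff of $\sigma$ is at least that of $\tau$, that is, $V_D(\pi, \sigma) \ge V_D(\pi, \tau)$, for all priors $\pi \in \Delta(\Omega)$.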
Next, to get an interpersonally objective ordering, define $\sigma$ to be more Blackwell informative than $\tau$, denoted $\sigma \succeq_B \tau$, if $\sigma$ is subjectively more informational than $\tau$ for every possible decision problem $D$. This value-of-information ordering is the Blackwell ordering.
The attractive aspect of the Blackwell ordering is that it is ‘objective’ in the sense that if $\sigma \succeq_B \tau$ then every rational decision maker will agree that $\sigma$ has at least as much informational value as $\tau$, no matter what decision making problem is at issue and no matter what prior distribution is assumed. The drawback to this very strong requirement of ‘objective unanimity’ amongst decision makers is that we pay the heavy price of working with a very partial ordering (in fact it is only a pre-order). In other words, not all experiments can be ordered by the Blackwell ordering; given two experiments $\sigma$ and $\tau$, one may discover that they are incomparable under this ordering. In that case the Blackwell ordering is silent on the question of which experiment provides greater value of information.
To see this, let
,
, and let
be the matrix of
Figure 1a, where
and
, while
is the matrix of
Figure 1b, where
and
. Alongside these, define action set
and payoff functions
and
such that
, with payoff zero for all other combinations of states and actions for both
and
. Let
and
be priors over
.
Then it is fairly straight-forward to see that, for sufficiently small , a decision maker facing the decision making problem characterised by payoff and prior will prefer to . This is because in this situation under both and , receiving signal instructs optimally choosing action and signal instructs optimally choosing . Taken together, the optimal expected payoff from is higher than that from . By similar reasoning, exactly the reverse holds for payoff and prior : here, the optimal expected payoff from is higher than that from . Hence and are incomparable in the Blackwell ordering.
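Since the entries of Figure 1 are not reproduced in the text, the following Python sketch illustrates the same phenomenon with two hypothetical deterministic tests over the states A, B, C: `identifies_A` reveals only whether the state is A, and `identifies_C` reveals only whether it is C. Under a uniform prior, the decision problem `u1` (decide whether the state is A) is better served by the first test, while `u2` (decide whether the state is C) is better served by the second, so neither test is more Blackwell informative than the other. The tests, payoffs, and prior are illustrative assumptions.

```python
from fractions import Fraction

def value(prior, sigma, actions, u):
    """Optimal expected payoff: sum over signals of the best expected payoff given that signal."""
    signals = next(iter(sigma.values())).keys()
    return sum(
        max(sum(prior[w] * sigma[w][s] * u(w, a) for w in prior) for a in actions)
        for s in signals
    )

prior = {w: Fraction(1, 3) for w in "ABC"}

# Hypothetical deterministic tests (NOT the matrices of Figure 1).
identifies_A = {"A": {"red": 1, "green": 0}, "B": {"red": 0, "green": 1}, "C": {"red": 0, "green": 1}}
identifies_C = {"A": {"red": 1, "green": 0}, "B": {"red": 1, "green": 0}, "C": {"red": 0, "green": 1}}

def u1(w, a):
    """Payoff 1 for correctly deciding whether the state is A (actions 'A' and 'notA')."""
    return 1 if (a == "A") == (w == "A") else 0

def u2(w, a):
    """Payoff 1 for correctly deciding whether the state is C (actions 'C' and 'notC')."""
    return 1 if (a == "C") == (w == "C") else 0

v1 = (value(prior, identifies_A, ["A", "notA"], u1), value(prior, identifies_C, ["A", "notA"], u1))
v2 = (value(prior, identifies_A, ["C", "notC"], u2), value(prior, identifies_C, ["C", "notC"], u2))
# v1 == (1, 2/3): identifies_A is strictly better for u1.
# v2 == (2/3, 1): identifies_C is strictly better for u2, so the tests are incomparable.
```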
4.1.3. Entropy and the Value of Information for Investors
The partiality of the Blackwell ordering has motivated several researchers to seek an extension of the Blackwell ordering to a total ordering of all experiments. One suggested completion, due to Cabrales, Gossner, and Serrano [16], relates experiments to entropy, investments, and measures of risk.
Let $\pi \in \Delta(\Omega)$. Since $\pi$ is a distribution, one can calculate its entropy $H(\pi) = -\sum_{\omega \in \Omega} \pi(\omega) \log \pi(\omega)$. Next, recall that by Equation (2) each signal $s$ defines a posterior distribution $\pi_s$. Each such posterior distribution, in turn, has its own entropy $H(\pi_s)$. One can then define from this the weighted mean posterior entropy $\sum_{s \in S} p_\sigma(s)\, H(\pi_s)$, where, as before, $p_\sigma(s)$ is the total probability of signal $s$, given the prior $\pi$.
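Using these concepts, the authors of Reference [16] define what they term the entropy informativeness of $\sigma$ (relative to prior $\pi$) to be the difference between the entropy of the prior and the weighted mean posterior entropy:
$$I_\pi(\sigma) = H(\pi) - \sum_{s \in S} p_\sigma(s)\, H(\pi_s). \qquad (3)$$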
If we attach to entropy the interpretation that it measures uncertainty, a reduction in entropy corresponds to a reduction in uncertainty. Hence $I_\pi(\sigma)$ may be considered a measure of the information supplied by the experiment $\sigma$ over and above the baseline information in the prior $\pi$, quantified as the reduction in entropy expressed in Equation (3).
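The following Python sketch computes the entropy informativeness directly from Equation (3), using natural logarithms (the choice of base only rescales the measure). The two experiments are hypothetical: `reveals_A` discloses exactly whether the state is A, and `pure_noise` sends a signal that is independent of the state, so every posterior equals the prior and its entropy informativeness is zero.

```python
import math

def entropy(dist):
    """Shannon entropy in nats of a distribution given as a dict; 0 log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def entropy_informativeness(prior, sigma):
    """I_pi(sigma) = H(pi) - sum_s p_sigma(s) H(pi_s), as in Equation (3)."""
    signals = next(iter(sigma.values())).keys()
    mean_posterior_entropy = 0.0
    for s in signals:
        p_s = sum(prior[w] * sigma[w][s] for w in prior)
        if p_s == 0:
            continue  # signals never seen under this prior carry zero weight
        post = {w: prior[w] * sigma[w][s] / p_s for w in prior}
        mean_posterior_entropy += p_s * entropy(post)
    return entropy(prior) - mean_posterior_entropy

prior = {w: 1 / 3 for w in "ABC"}
# Hypothetical experiments (not those of Figure 1).
reveals_A = {"A": {"red": 1.0, "green": 0.0}, "B": {"red": 0.0, "green": 1.0}, "C": {"red": 0.0, "green": 1.0}}
pure_noise = {w: {"red": 0.5, "green": 0.5} for w in "ABC"}

i_reveal = entropy_informativeness(prior, reveals_A)  # log 3 - (2/3) log 2, about 0.6365
i_noise = entropy_informativeness(prior, pure_noise)  # 0 up to rounding
```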
It can readily be shown that $I_\pi(\sigma)$ is always a non-negative real number; it equals zero exactly when every posterior coincides with the prior. Since the real numbers have a natural ordering, and for fixed $\pi$ we may regard $I_\pi$ as mapping each $\sigma \in \mathcal{E}$ to a real number, it follows immediately that $I_\pi$ defines a total ordering of experiments. To see further that the entropy informativeness ordering extends the Blackwell ordering, consider the following economic interpretation of $I_\pi$ put forth in Reference [16].
Let the set of states of the world $\Omega$ be identified with a set of integers $\{1, \dots, n\}$. From here on, fix a prior $\pi$. In the model, each decision maker is identified with a concave and twice continuously differentiable utility function $u$ for money. Every decision maker is also endowed with an initial wealth level $w$. Each pairing of $u$ and $w$ determines an Arrow-Pratt coefficient of relative risk aversion, defined as
$$\rho_u(w) = -\frac{w\, u''(w)}{u'(w)}.$$
It will further be assumed that all decision makers satisfy the property of having increasing relative risk aversion (IRRA), which in this context can be taken to mean that $\rho_u(w)$ is non-decreasing as a function of
$w$. In Reference [16] it is also assumed that all decision makers are ruin averse, meaning that $\lim_{w \to 0^+} u(w) = -\infty$. The collection of utility functions satisfying all of these properties is denoted $\mathcal{U}$.
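As a numerical check of the Arrow-Pratt coefficient $-w\,u''(w)/u'(w)$, the sketch below approximates it by central finite differences (the step size `h` is an assumption of the sketch). Logarithmic utility has coefficient 1 at every wealth level, so it satisfies the non-decreasing (IRRA) requirement in the weak sense; the power family `power_utility` is a hypothetical example whose coefficient is the constant $\gamma$.

```python
import math

def relative_risk_aversion(u, w, h=1e-4):
    """Arrow-Pratt coefficient -w u''(w) / u'(w), approximated by central differences."""
    d1 = (u(w + h) - u(w - h)) / (2 * h)
    d2 = (u(w + h) - 2 * u(w) + u(w - h)) / (h * h)
    return -w * d2 / d1

def power_utility(gamma):
    """Hypothetical example family u(w) = w**(1 - gamma) / (1 - gamma), for gamma != 1."""
    return lambda w: w ** (1 - gamma) / (1 - gamma)

wealth_levels = (0.5, 1.0, 10.0)
rho_log = [relative_risk_aversion(math.log, w) for w in wealth_levels]            # each about 1
rho_pow = [relative_risk_aversion(power_utility(3.0), w) for w in wealth_levels]  # each about 3
```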
An asset is defined to be an element $b \in \mathbb{R}^n$, interpreted as meaning that at each realised state $k$ the asset $b$ pays $b_k$. An asset is a no-arbitrage asset if $\sum_{k=1}^{n} \pi(k)\, b_k \le 0$. Denote the set of no-arbitrage assets by $B$; this is the set of investment opportunities presented to the decision makers.
When a decision maker with initial wealth $w$ chooses investment $b \in B$ and state $k$ is realised, the updated wealth becomes $w + b_k$. Since $B$ includes $\mathbf{0}$, the vector in $\mathbb{R}^n$ consisting of zeros in every position, decision makers always have the option of inaction, guaranteeing that they can maintain wealth $w$. We do, however, impose the constraint that bankruptcy (the possibility of negative wealth) is not allowed. An investment $b$ is termed feasible at wealth $w$ when $w + b_k \ge 0$ in every state $k$. Denote by $B_w$ the set of investment opportunities that are feasible at wealth $w$.
Just before deciding which investment opportunity to pick, each decision maker is offered the opportunity to purchase experiment $\sigma$ at price $\mu$. The question posed here is what price the decision maker would be willing to pay for the information supplied by $\sigma$. Note first that the expected utility of $b$ for a decision maker with utility $u$, initial wealth $w$, and prior belief $\pi$ (without the benefit of the information supplied by any experiment) is
$$\sum_{k=1}^{n} \pi(k)\, u(w + b_k). \qquad (4)$$
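Since we have supposed that decision makers are risk averse, and no no-arbitrage asset has a positive expected payoff under $\pi$, Jensen’s inequality implies that the optimal choice in Equation (4), absent any additional information, is inaction (feasibility is then trivially satisfied). Hence,
$$\max_{b \in B_w} \sum_{k=1}^{n} \pi(k)\, u(w + b_k) = u(w).$$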
If $\sigma$ is made available, then the expected payoff increases to
$$V_u(w, \sigma) = \sum_{s \in S} p_\sigma(s) \max_{b \in B_w} \sum_{k=1}^{n} \pi_s(k)\, u(w + b_k),$$
where $p_\sigma(s)$ is the probability of seeing $s$ and $\pi_s$ is the posterior probability distribution over the states upon receipt of $s$; in other words, $V_u(w, \sigma)$ is the weighted average of the optimal expected utilities under the posterior distributions, with weighting given by the probabilities of seeing the signals. It follows that the gain from making use of $\sigma$ is $V_u(w, \sigma) - u(w)$, and that if $\sigma$ is offered for purchase at price $\mu$, the decision maker will accept the offer if $V_u(w - \mu, \sigma) > u(w)$ and reject it otherwise.
Say that $\sigma$ investment dominates $\tau$ if, for every wealth $w$ and price $\mu$, whenever $\sigma$ is rejected by all decision makers with utility $u \in \mathcal{U}$, $\tau$ is also rejected by all those same decision makers.
Theorem 8 ([16]).
For each prior $\pi$, experiment $\sigma$ investment dominates experiment $\tau$ if and only if $I_\pi(\sigma) \ge I_\pi(\tau)$.
A sketch of the proof idea for Theorem 8 is as follows. As a first step, Reference [16] shows that the utility function $u(w) = \ln w$ is representative for investment dominance, in the sense that $\sigma$ investment dominates $\tau$ if and only if, for every $w$ and $\mu$, a decision maker with logarithmic utility who rejects $\sigma$ also rejects $\tau$. They then show that this condition holds for a decision maker with logarithmic utility precisely when the entropy informativeness of $\sigma$ is at least that of $\tau$, as defined in Equation (3).
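The logarithmic step can be checked numerically. Under the sketch assumptions that no-arbitrage means $\sum_k \pi(k) b_k \le 0$ and feasibility means $w + b_k \ge 0$, a log-utility investor who receives signal $s$ optimally holds final wealth $w\,\pi_s(k)/\pi(k)$ in state $k$; this is the standard Kelly-type solution of the first-order conditions, not a construction taken from Reference [16]. The expected gain over inaction is then $\sum_s p_\sigma(s)\, D(\pi_s \,\|\, \pi)$, which equals $I_\pi(\sigma)$ and does not depend on wealth, so the investor accepts the offer exactly when $\mu < w\,(1 - e^{-I_\pi(\sigma)})$. The prior and experiment below are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in nats; 0 log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def posteriors(prior, sigma):
    """Yield (p_sigma(s), pi_s) for every signal s with positive probability."""
    n = len(prior)
    for s in range(len(sigma[0])):
        p_s = sum(prior[k] * sigma[k][s] for k in range(n))
        if p_s > 0:
            yield p_s, [prior[k] * sigma[k][s] / p_s for k in range(n)]

def entropy_informativeness(prior, sigma):
    """I_pi(sigma) from Equation (3)."""
    return entropy(prior) - sum(p_s * entropy(post) for p_s, post in posteriors(prior, sigma))

def log_investor_gain(prior, sigma, w):
    """Expected log utility when using sigma, minus ln(w), the log utility of inaction.

    Sketch assumption: after signal s the investor holds b_k = w * pi_s(k) / prior[k] - w,
    which satisfies sum_k prior[k] * b_k = 0 and w + b_k >= 0.
    """
    total = 0.0
    for p_s, post in posteriors(prior, sigma):
        b = [w * post[k] / prior[k] - w for k in range(len(prior))]
        total += p_s * sum(q * math.log(w + bk) for q, bk in zip(post, b) if q > 0)
    return total - math.log(w)

def reservation_price(prior, sigma, w):
    """Price at which a log investor with wealth w is indifferent about buying sigma.

    Buying at price mu yields ln(w - mu) + gain (the gain does not depend on wealth),
    so the offer is accepted iff mu < w * (1 - exp(-gain)).
    """
    return w * (1 - math.exp(-log_investor_gain(prior, sigma, w)))

# Hypothetical prior over states 1, 2, 3 and a hypothetical two-signal experiment.
prior = [0.5, 0.3, 0.2]
sigma = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
gain = log_investor_gain(prior, sigma, w=100.0)
info = entropy_informativeness(prior, sigma)  # equals gain up to rounding
mu_max = reservation_price(prior, sigma, w=100.0)
```

Because the log investor’s willingness to pay is increasing in $I_\pi(\sigma)$ alone, ranking experiments by what such an investor would pay reproduces the entropy informativeness ordering.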
It is a corollary of Theorem 8 that if $\sigma$ is more Blackwell informative than $\tau$ then $I_\pi(\sigma) \ge I_\pi(\tau)$ for all $\pi$. This is because $\sigma \succeq_B \tau$ implies that, no matter what the utility function, the prior, or the decision making problem, a decision maker using $\sigma$ can obtain at least as great an expected payoff as with $\tau$; hence certainly $\sigma$ investment dominates $\tau$.
Putting it all together, the Cabrales–Gossner–Serrano entropy ordering is a value of information ordering of experiments that extends the Blackwell informativeness ordering to a total ordering. There is, however, one drawback to the entropy ordering. By definition, $I_\pi(\sigma)$ measures the reduction in entropy from the entropy of the prior to the expected entropy of the generated posteriors. If a different prior is used, the entropy reduction measured by $I_\pi(\sigma)$ may be very different. Does this mean that the entropy ordering is unavoidably sensitive to the prior being used for this measurement?
Reference [16] answers this question in the affirmative: there are examples in which the choice of a different prior leads to different orderings of $\mathcal{E}$ (although, of course, all of those different orderings extend the Blackwell ordering). Even worse, the paper proves that there exists no index ordering information structures that is both compatible with investment dominance and independent of the agent’s prior (cf. [17]). In other words, almost every one of the infinite number of possible priors defines a different ordering.
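To see concretely how the ranking can depend on the prior, consider two hypothetical deterministic experiments over three states: `reveals_A` discloses whether the state is A, and `reveals_C` whether it is C. For a deterministic experiment, the entropy informativeness reduces to the entropy of the signal itself, so each experiment scores higher when the event it reveals has probability closer to one half. In the Python sketch below (priors chosen purely for illustration) the two priors reverse the ranking; the two experiments are Blackwell-incomparable, so this is consistent with every such ranking extending the Blackwell ordering.

```python
import math

def entropy(dist):
    """Shannon entropy in nats; 0 log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def entropy_informativeness(prior, sigma):
    """I_pi(sigma) = H(pi) - sum_s p_sigma(s) H(pi_s), Equation (3)."""
    n, mean_post = len(prior), 0.0
    for s in range(len(sigma[0])):
        p_s = sum(prior[k] * sigma[k][s] for k in range(n))
        if p_s > 0:
            mean_post += p_s * entropy([prior[k] * sigma[k][s] / p_s for k in range(n)])
    return entropy(prior) - mean_post

# Hypothetical deterministic experiments over states (A, B, C); rows are states.
reveals_A = [[1, 0], [0, 1], [0, 1]]  # signal 0 iff the state is A
reveals_C = [[1, 0], [1, 0], [0, 1]]  # signal 1 iff the state is C

prior_1 = [0.5, 0.25, 0.25]  # A is the 50-50 event
prior_2 = [0.25, 0.25, 0.5]  # C is the 50-50 event
ranking_1 = entropy_informativeness(prior_1, reveals_A) > entropy_informativeness(prior_1, reveals_C)  # True
ranking_2 = entropy_informativeness(prior_2, reveals_A) > entropy_informativeness(prior_2, reveals_C)  # False
```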
4.2. Rational Inattention
In the most idealised presentation of the classical theory of economies composed of perfectly rational agents, each of whom is a homo economicus, all relevant available economic information is immediately known and analysed by all agents, who instantaneously respond with optimal actions, so that economic equilibria are attained. The implications of such a theory are many; amongst them is that business cycles are impossible, since prices and wages react instantaneously and optimally to changes in economic conditions and in supply and demand. As a result, employment and output always remain at optimal levels with respect to objective economic fundamentals.
This view is challenged by Keynesian economic theories. One of the features of such theories is the assertion of ‘price stickiness’, which can be presented as the assertion that prices and wages react with considerable delay to changes in underlying economic fundamentals. Since delays cause prices and wages to be suboptimal, economies are slow to attain market clearing equilibria and hence undergo periods of underperformance in contrast to what would be expected by a seamless model of continuously optimising agents.
An oft-repeated critique of the price stickiness assumption in Keynesian models is that it is inserted into formal economic models ad hoc, without micro-foundational justification. To contend with this, several explanations have been proffered in the literature. These include assumptions that agents may experience signal-extraction difficulties in distinguishing movements in aggregate prices and wages from movements in the specific prices they encounter in transactions. Other theories seek explanations in models of behavioural deviations from full rationality and in bounded rationality models.
We focus here on the rational inattention model, initiated in a seminal paper by Christopher Sims [18]. (Nearly all of the material presented in this section is taken either from Reference [18] or from Reference [19].) This model begins with the observation that people react sporadically and imperfectly to the information they receive. Even individuals who actively read charts of financial data often fail to take actions based on all the information they have seen, and typically react only when data strike them as saliently unusual. In contrast, one would expect fully rational economic agents to make fine adjustments to their spending schedules, investment portfolios, and other economic activities on a continual basis in response to any received information on changes in market parameters, no matter how small the changes.
The explanation offered by the rational inattention model is that the benefits of continuous adjustment are slight relative to the attention required to implement it; people have more important things to think about from day to day and moment to moment than the tiny gains they might attain from continuous monitoring of economic variables. Rational inattention presumes that the ability of individuals to translate external data into actions is constrained by a finite Shannon channel capacity for information processing that is inherent to humans.
To begin modelling this mathematically, recall the definition of $I(X;Y)$, the mutual information between two random variables $X$ and $Y$ (not to be confused with the similar notation of Equation (3), which refers to entropy informativeness in a different context). This is the difference between the expected value of the log of the joint probability distribution of $X$ and $Y$ and the sum of the expected values of the logs of the marginal probability distributions of $X$ and $Y$:
$$I(X;Y) = \mathbb{E}\left[\log p_{XY}(X, Y)\right] - \mathbb{E}\left[\log p_X(X)\right] - \mathbb{E}\left[\log p_Y(Y)\right].$$
The mutual information measure is crucial for the development of the concepts related to Shannon channel capacity.
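The definition translates directly into code. The Python sketch below computes the mutual information in nats for a discrete joint probability table (a simplifying assumption; the text allows general distributions) and checks three illustrative cases: independent variables, an exact copy, and a copy that is corrupted with probability 0.2.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint pmf given as joint[x][y].

    Implements E[log p(X,Y)] - E[log p(X)] - E[log p(Y)], skipping zero cells.
    """
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * (math.log(pxy) - math.log(px[i]) - math.log(py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

independent = [[0.25, 0.25], [0.25, 0.25]]  # I = 0
identical = [[0.5, 0.0], [0.0, 0.5]]        # I = log 2
noisy_copy = [[0.4, 0.1], [0.1, 0.4]]       # Y equals X with probability 0.8
```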
A channel, in Shannon’s theory, is a description of possible inputs (in terms of probability distributions) and of the conditional distributions of outputs given inputs. The distribution of the inputs, however, may be chosen judiciously: if one chooses it to maximise the mutual information between input and output, the channel transmits information at its maximal rate, the channel capacity $C = \max_{p_X} I(X;Y)$.
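Channel capacity rarely has a closed form, but for discrete channels it can be computed with the Blahut–Arimoto algorithm, a standard technique that is not discussed in the source. The Python sketch below is a minimal version under the assumption that the channel is given as a row-stochastic matrix `W[x][y]` of output probabilities given inputs; the example channels are illustrative.

```python
import math

def channel_capacity(W, tol=1e-12, max_iter=100_000):
    """Capacity (nats) and an approximately capacity-achieving input distribution.

    W[x][y] = P(output y | input x).  Blahut-Arimoto iteration: reweight each input x
    by exp(D_x), where D_x is the KL divergence of W(.|x) from the current output
    distribution q.  The capacity lies between log sum_x p(x) exp(D_x) and max_x D_x,
    and the loop stops once that gap is below tol.
    """
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    for _ in range(max_iter):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        d = [sum(W[x][y] * math.log(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
             for x in range(nx)]
        weights = [p[x] * math.exp(d[x]) for x in range(nx)]
        total = sum(weights)
        lower, upper = math.log(total), max(d)
        p = [wt / total for wt in weights]
        if upper - lower < tol:
            break
    return lower, p

bsc = [[0.9, 0.1], [0.1, 0.9]]     # binary symmetric channel, crossover probability 0.1
noiseless = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
z_channel = [[1.0, 0.0], [0.5, 0.5]]  # input 1 is received as 0 half of the time
```

For the binary symmetric channel the iteration stops at once with capacity $\log 2 - h(0.1)$ nats, where $h$ is the binary entropy, because by symmetry the uniform input is already optimal; for the asymmetric channel `z_channel` the optimal input is not uniform.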
The coding theorem states that, at any rate below the channel capacity, inputs can be coded in such a way that information is transmitted through the channel with an arbitrarily low error rate, so that transmission rates arbitrarily close to capacity are achievable. This does, however, come at a price: coding almost always introduces delay.
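Suppose our objective is to minimise $\mathbb{E}[(Y - X)^2]$, where $Y$ is the action to be chosen by the economic agent and $X$ is a random variable that can only be observed through a finite-capacity channel, with capacity $\kappa$. What is the optimal choice for the conditional distribution of $Y$ given $X$, subject to the requirement that the information flow about $X$ required to generate $Y$ does not exceed this capacity? Formally, supposing that $g$ is given as the probability density function of $X$ and that we seek the optimal joint density $q(x, y)$ of $X$ and $Y$, the mathematical task is
$$\min_{q \ge 0} \iint (y - x)^2\, q(x, y)\, dx\, dy \quad \text{subject to} \quad \int q(x, y)\, dy = g(x), \qquad \iint q(x, y) \log \frac{q(x, y)}{g(x)\, h(y)}\, dx\, dy \le \kappa,$$
where $h(y) = \int q(x, y)\, dx$ is the implied marginal density of $Y$.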
Reference [18] then shows that if $X$ is normally distributed, $X \sim N(\mu, \sigma_X^2)$, then the optimal $q$ is Gaussian and $X$ and $Y$ are jointly normal. In that case, the solution is equivalent to observing $X$ with an additive error.
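In the Gaussian case the optimum has a closed form. If $X$ is normal with variance $v$ and the information flow is capped at $\kappa$ nats, the smallest attainable value of $\mathbb{E}[(Y - X)^2]$ is $v e^{-2\kappa}$ (the Gaussian rate–distortion bound). It is attained by observing $Z = X + \varepsilon$ with independent normal noise of variance $v/(e^{2\kappa} - 1)$, chosen so that $I(X;Z) = \tfrac{1}{2}\log(1 + v/\operatorname{Var}\varepsilon) = \kappa$, and acting with $Y = \mathbb{E}[X \mid Z]$; this is the sense in which the solution amounts to observing $X$ with an error. The Python sketch below checks this by Monte Carlo simulation; the parameter values and sample size are arbitrary assumptions.

```python
import math
import random

def min_mean_squared_error(var_x, kappa):
    """Lower bound var_x * exp(-2 * kappa) on E[(Y - X)^2] when I(X;Y) <= kappa nats."""
    return var_x * math.exp(-2 * kappa)

def simulate_noisy_observation(mean_x, var_x, kappa, n=100_000, seed=0):
    """Monte Carlo MSE of acting on Z = X + noise, with noise sized so that I(X;Z) = kappa."""
    rng = random.Random(seed)
    var_noise = var_x / math.expm1(2 * kappa)  # solves 0.5 * log(1 + var_x / var_noise) = kappa
    weight = var_x / (var_x + var_noise)       # E[X | Z] = mean_x + weight * (Z - mean_x)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mean_x, math.sqrt(var_x))
        z = x + rng.gauss(0.0, math.sqrt(var_noise))
        y = mean_x + weight * (z - mean_x)
        total += (y - x) ** 2
    return total / n

bound = min_mean_squared_error(var_x=4.0, kappa=0.5)  # 4 / e, about 1.4715
simulated = simulate_noisy_observation(mean_x=1.0, var_x=4.0, kappa=0.5)
```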
To take one economic example to which this can be related, consider a permanent income calculation. The permanent income hypothesis, first proposed by Milton Friedman, postulates that individuals determine their consumption at each point in time $t$ not only in relation to present wealth at time $t$ but also taking into consideration expected income in future time periods. This has strong implications for savings and consumption rates. In particular, it predicts consumption smoothing, that is, the spreading out of spending over time.
In a simple formal model of permanent income calculation, postulate an infinitely lived agent who maximises his or her lifetime utility from a stream of consumption (with consumption at time $t$ denoted by $C_t$). The utility function of consumption $u$ is presumed to be quadratic.
At the start of each period $t$, the agent has wealth $W_t$ available from which to consume $C_t$, leaving $W_t - C_t$, which then grows each period in accordance with a gross interest rate $R$. We may also suppose that the agent has labour or endowment income available at time $t$, which is denoted $Y_t$. The utility of consumption in future periods is discounted by a fixed factor $\beta$ per time period.
The objective of the agent is
$$\max_{\{C_t\}} \; \mathbb{E}_0\left[\sum_{t=0}^{\infty} \beta^t\, u(C_t)\right] \quad \text{subject to} \quad W_{t+1} = R\,(W_t - C_t) + Y_{t+1}.$$
So far this is a standard permanent income calculation. The text-book solution to the model as presented up to here is
Suppose now that distributions are Gaussian, , and i.i.d. for all t, where and stand for mean values. And to relate it all to rational inattention, suppose that denotes the information available at time t, with all variables known at time t measurable with respect to . Consistent with previous model assumptions, we suppose that updating from to occurs with upper bound channel capacity .
Sims [
18] shows that under these assumptions the agent will behave as if observing noisy state measurements; more precisely the consumption sequence
will satisfy
where
such that
is an i.i.d. normal noise factor.
Comparing Equations (
5) and (
6), the latter adds the noise term
that is not present in Equation (
5) and is not inherent to the original variables of the model. The extra noise can have significant cumulative effects on consumption and savings schedules.
Rational inattention models have been studied in much greater generality than those involving quadratic objective functions and Gaussian distributions (see References [20,21,22,23]). This literature is too broad to be fully surveyed here; we will provide only a very brief summary of one of its main implications.
The solutions to the models in the above papers often imply a discrete distribution for agent actions, even when the external uncertainty is continuously distributed. In the context of product prices, which is of particular interest because of its implications for debates regarding the ‘stickiness’ of prices, the results point to prices that jump amongst a finite set of values when they change; this is consistent with observed data but is not fully explained by models other than rational inattention models.
If price setters who are rationally inattentive adapt to available information only occasionally and select only from a finite set of possible prices, then price setters are far from the image of agents who continuously and optimally react to every revelation of economic information. In particular, their response to monetary policy changes will differ from that postulated in classical rational expectations models.