Article

Information-Theoretic Bounded Rationality and ε-Optimality

by Daniel A. Braun 1,* and Pedro A. Ortega 2,*
1 Max Planck Institute for Biological Cybernetics, Max Planck Institute for Intelligent Systems, Spemannstrasse 38, Tübingen 72076, Germany
2 GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA 19104, USA
* Authors to whom correspondence should be addressed.
Entropy 2014, 16(8), 4662-4676; https://doi.org/10.3390/e16084662
Submission received: 19 July 2014 / Revised: 11 August 2014 / Accepted: 15 August 2014 / Published: 21 August 2014

Abstract:
Bounded rationality concerns the study of decision makers with limited information processing resources. Previously, the free energy difference functional has been suggested to model bounded rational decision making, as it provides a natural trade-off between an energy or utility function that is to be optimized and information processing costs that are measured by entropic search costs. The main question of this article is how the information-theoretic free energy model relates to simple ε-optimality models of bounded rational decision making, where the decision maker is satisfied with any action in an ε-neighborhood of the optimal utility. We find that the stochastic policies that optimize the free energy trade-off comply with the notion of ε-optimality. Moreover, this optimality criterion even holds when the environment is adversarial. We conclude that the study of bounded rationality based on ε-optimality criteria that abstract away from the particulars of the information processing constraints is compatible with the information-theoretic free energy model of bounded rationality.

1. Introduction

Decision making under uncertainty is studied by means of optimal actor models in a broad spectrum of sciences with remarkably different historical roots, such as economics, artificial intelligence research, biology and sociology, and even fields like legal studies, ethics and philosophy [1–3]. Usually, when we talk about decision making, we imagine a human mind (for example, a chess player) that ponders a variety of possible options for action, deliberates about their potential outcomes and finally picks one of these actions for execution; namely, the one that is expected to have the most beneficial consequences. Recently, the same paradigm has also been extended to model sensorimotor integration and control [4–6], where the consequences of actions can be anticipated by implicit learning processes. Crucially, however, in either case, classic decision-theoretic models [3,7] ignore the details of the underlying cognitive or implicit processes preceding a decision by simply assuming that these processes optimize a performance criterion. This ignorance is both boon and bane: on the one hand, it allows the statement of many general results that do not depend on the details of the decision making process; on the other hand, the often unrealistic assumption of perfect optimization limits the applicability of classic decision theory.
Classic decision theory rests on two conceptual pillars: the notion of probability and the notion of utility. Their intertwined occurrence may be best understood on the basis of the concept of lotteries. A lottery is defined as a set of N different outcomes oj ∈ 𝒪, each of which can occur with a respective probability P(oj), where j = 1, …, N. We can imagine a lottery as a roulette wheel or a gamble where we obtain a prize oj with probability P(oj) that has a subjective utility U(oj) for the decision maker. The compound value of the lottery can then be determined by the expected utility E[U] = ∑j P(oj)U(oj), which is commonly used as the standard performance criterion in decision making. The concept of expected utility was first axiomatized by Neumann and Morgenstern [8]. In their axiomatic system, Neumann and Morgenstern [8] define a binary preference relation ≻ over the set of probability distributions ℘ defined over the set of outcomes 𝒪. If (and only if) this binary relation satisfies the axioms of completeness, transitivity, continuity and independence, then there exists a function U : 𝒪 ↦ ℝ, such that:
$$P \succ P' \quad\Longleftrightarrow\quad \sum_j P(o_j)\, U(o_j) > \sum_j P'(o_j)\, U(o_j),$$
where P, P′ ∈ ℘. This utility function U is unique up to a positive affine transform.
When designing optimal actors, most designers use the Neumann and Morgenstern [8] conception of probability and utility; see, for example, Russell and Norvig [2]. Such optimal actors are typically equipped with a probabilistic model of the world P(oj|ai), where ai ∈ 𝒜 is an action that leads to consequence oj with probability P(oj|ai). The decision maker can assess the expected utility of each action as E[U|ai] = ∑j P(oj|ai)U(oj). Thus, the probabilistic model of the world defines a set of M different lotteries indexed by ai, where i = 1, …, M. The decision maker can compare the expected utilities of all the lotteries and choose the one with the highest expected utility, such that:
$$a_{\max} = \arg\max_i\, E[U\,|\,a_i]. \tag{1}$$
This scheme, however, rests on at least two important assumptions. First, the decision maker requires an accurate probability model. Second, the decision maker requires enough computational resources to find the best lottery. What happens if one of the two assumptions is violated? This question has spurred research on bounded rationality, where decision makers have limited knowledge and bounded computational resources.
The modern study of bounded rationality began with Herbert Simon [9–11] and has since been continued in economics [12–14], game theory [15–17], industrial organization [18] and political science [19], but also in psychology [20,21], cognitive science [22–24], computer science and artificial intelligence research [25–27]. One of the fundamental questions faced by bounded rationality models is whether they should attend to the actual physical or cognitive processes underlying decision making or whether it is also possible to gain a more general understanding of bounded rational decision making by abstracting away from the details of the actual decision making process. While the first approach is taken, for example, by the new field of neuroeconomics, which relates decision making processes to anatomical structures [28,29], one of the simplest approaches in the second tradition is the concept of ε-optimality [30], where the decision maker does not search for a single best action amax, but for any action from a set of permissible actions 𝒜ε whose expected utility deviates at most by ε > 0 from the optimal expected utility of amax, such that:
$$\mathcal{A}_\varepsilon = \left\{ a_i \in \mathcal{A} \;:\; E[U\,|\,a_i] \geq E[U\,|\,a_{\max}] - \varepsilon \right\}. \tag{2}$$
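To make the setup concrete, the following Python sketch (all probabilities and utilities are made-up numbers, not taken from the paper) computes the expected utilities E[U|ai], the optimal action of Equation (1) and the ε-optimal set of Equation (2):

```python
# Hypothetical world model: rows are actions a_i, columns are outcomes o_j.
import numpy as np

P = np.array([[0.7, 0.2, 0.1],   # P(o_j | a_1)
              [0.1, 0.8, 0.1],   # P(o_j | a_2)
              [0.3, 0.3, 0.4]])  # P(o_j | a_3)
U = np.array([1.0, 2.0, 5.0])    # subjective utilities U(o_j)

EU = P @ U                       # expected utilities E[U | a_i]
a_max = np.argmax(EU)            # Equation (1)

eps = 1.0
A_eps = np.flatnonzero(EU >= EU[a_max] - eps)  # Equation (2)
print(EU, a_max, A_eps)          # EU = [1.6, 2.2, 2.9], a_max = 2, A_eps = [1, 2]
```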
The main question of this article is how to relate this simple model of bounded rationality to the information-theoretic bounded rationality model discussed in Ortega and Braun [31–34] that we recapitulate in the next section.

2. Methods

Most models of decision making ignore information processing costs and assume that the decision maker can simply handpick the action that yields the highest (expected) utility. Presupposing that there is a unique maximum, this would correspond to a deterministic strategy as in Equation (1). In general, however, a decision maker with limited information processing capabilities might be unable to handpick the best option with certainty. Such a bounded rational strategy must therefore be described by a probability distribution P(ai) reflecting this uncertainty. Information-theoretic models of bounded rational decision making quantify the cost of information processing by entropic measures of information [15–17,31–35] and are closely related to softmax choice rules that have been extensively studied in the psychological and econometric literature, but also in the literature on reinforcement learning and game theory [36–42]. In [31–34], Ortega and Braun discuss an information-theoretic model of bounded rational decision making where information processing costs are quantified by the relative entropy, with the idea that information processing costs can then be measured with respect to changes in the choice strategy P(ai).
Let us assume that the initial strategy of the decision maker can be described by a probability distribution P0(ai). This includes the uniform distribution over ai as a special case, if the decision maker has no prior preferences between different actions. Next, this decision maker is exposed to a utility function V(ai), which includes the case of V(ai) = E[U|ai], implying that the decision maker does not have to compute the expectation values; they are simply given. Ideally, the decision maker would arrive at the new distribution P(ai) = δai,amax. The underlying computation can be imagined as a search process that reduces the uncertainty over the action by DKL[P‖P0] = ∑i P(ai) log [P(ai)/P0(ai)]. In general, such a search is costly, and the decision maker might not be able to afford such a stark reduction in uncertainty. Assuming a price 1/α for one bit of information gain, we can then design a bounded optimal decision maker that trades off gains in utility resulting from changes in P(ai) against the search costs that these changes imply, such that, overall, the decision maker optimizes a free energy difference between utility gains and information costs:
$$\Delta F[\tilde{P}] = \sum_i \tilde{P}(a_i)\, V(a_i) \;-\; \frac{1}{\alpha} \sum_i \tilde{P}(a_i) \log \frac{\tilde{P}(a_i)}{P_0(a_i)}, \tag{3}$$
where the maximizing distribution P = arg maxP̃ ΔF[P̃] is the equilibrium distribution:
$$P(a_i) = \frac{1}{Z_\alpha}\, P_0(a_i)\, e^{\alpha V(a_i)}, \qquad \text{where} \quad Z_\alpha = \sum_i P_0(a_i)\, e^{\alpha V(a_i)}, \tag{4}$$
and represents the choice probabilities after deliberation. Note that the free energy difference ΔF[P̃] can be expressed as ΔF[P̃] = F1[P̃] − F0, with the free energies:
$$F_1[\tilde{P}] = \sum_i \tilde{P}(a_i)\, \Phi_1(a_i) - \frac{1}{\alpha} \sum_i \tilde{P}(a_i) \log \tilde{P}(a_i), \qquad F_0 = \sum_i P_0(a_i)\, \Phi_0(a_i) - \frac{1}{\alpha} \sum_i P_0(a_i) \log P_0(a_i), \tag{5}$$
where P0(ai) = exp (α(Φ0(ai) − F0)) and V(ai) = Φ1(ai) − Φ0(ai). Hence, the utility function V(ai) expresses changes in the value Φ, that is, gains or losses with respect to the status quo. In the case of inference, the utility function is given by a negative log-likelihood and measures informational surprise. The temperature parameter then corresponds to a precision parameter in exponential family distributions. Casting the problem of acting as an inference problem has been previously discussed in [43–48]. The certainty-equivalent value VCE under the strategy P can be determined from the same variational principle:
$$V_{CE} = \max_{\tilde{P}} \left\{ \sum_i \tilde{P}(a_i)\, V(a_i) - \frac{1}{\alpha} \sum_i \tilde{P}(a_i) \log \frac{\tilde{P}(a_i)}{P_0(a_i)} \right\} = \frac{1}{\alpha} \log \left( \sum_i P_0(a_i)\, e^{\alpha V(a_i)} \right) = \frac{1}{\alpha} \log Z_\alpha.$$
For the two different limits of α, the value and the equilibrium distribution take the asymptotic forms:
$$\begin{aligned}
\alpha \to +\infty:&\quad \frac{1}{\alpha} \log Z_\alpha \to \max_i V(a_i), &\quad P(a_i) &\to \delta_{a_i,\,a_{\max}} &&\text{(perfectly rational)}\\
\alpha \to 0:&\quad \frac{1}{\alpha} \log Z_\alpha \to \sum_i P_0(a_i)\, V(a_i), &\quad P(a_i) &\to P_0(a_i) &&\text{(irrational)}
\end{aligned}$$
It can be seen that a perfectly rational agent with α → ∞ is able to handpick the optimal action, which is a deterministic policy in the case of a unique optimum, whereas finitely rational agents have stochastic policies with a non-zero probability of picking a sub-optimal action.
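As an illustration of Equations (3) and (4), the following Python sketch (with made-up utilities) computes the equilibrium distribution for different rationality parameters α and shows the two asymptotic regimes:

```python
import numpy as np

def equilibrium(P0, V, alpha):
    """Equation (4): P(a_i) proportional to P0(a_i) * exp(alpha * V(a_i))."""
    logw = np.log(P0) + alpha * V
    logw -= logw.max()            # subtract the maximum for numerical stability
    P = np.exp(logw)
    return P / P.sum()

def free_energy_difference(P, P0, V, alpha):
    """Equation (3): expected utility minus 1/alpha times KL(P || P0)."""
    return P @ V - np.sum(P * np.log(P / P0)) / alpha

V = np.array([1.0, 2.0, 2.9])     # hypothetical (expected) utilities V(a_i)
P0 = np.ones(3) / 3               # uniform prior strategy
for alpha in [0.01, 1.0, 100.0]:
    P = equilibrium(P0, V, alpha)
    # alpha -> 0 recovers the prior, alpha -> infinity the deterministic optimum;
    # at the optimum, Delta F equals the certainty-equivalent value (1/alpha) log Z_alpha.
    print(alpha, P.round(3), free_energy_difference(P, P0, V, alpha).round(3))
```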
In the case that V(ai) are not simply given, the decision maker has to compute the expectation values herself from the prior P0(oj|ai) and the utility U(oj), such that search costs have to be considered both for ai and oj. The variational problem can then be formulated as a nested expression [32,34,49]:
$$\arg\max_{\tilde{P}} \sum_i \tilde{P}(a_i) \left[ -\frac{1}{\alpha} \log \frac{\tilde{P}(a_i)}{P_0(a_i)} + \sum_j \tilde{P}(o_j|a_i) \left[ U(o_j) - \frac{1}{\beta} \log \frac{\tilde{P}(o_j|a_i)}{P_0(o_j|a_i)} \right] \right]. \tag{6}$$
If we assume that the estimation of the expected utilities V(ai) is much cheaper than the computation of the optimal action, then the price 1/β should be much higher than the price 1/α, such that α ≫ β, implying that we can simply obtain samples from P0(oj|ai) for our computation of the expectation, but that it is much more difficult to compute ai, because we cannot simply rely on our prior P0(ai). The two-part solution to the nested variational problem is given by:
$$P(o_j|a_i) = \frac{1}{Z_\beta(a_i)}\, P_0(o_j|a_i)\, \exp\left( \beta\, U(o_j) \right) \tag{7}$$
with the normalization constant: Zβ(ai) = ∑j P0(oj|ai) exp (βU(oj)) and:
$$P(a_i) = \frac{1}{Z_{\alpha\beta}}\, P_0(a_i)\, \exp\left( \frac{\alpha}{\beta} \log Z_\beta(a_i) \right) \tag{8}$$
with the normalization constant Zαβ = ∑i P0(ai) exp ((α/β) log Zβ(ai)). The perfectly rational decision maker is obtained in the limit α → ∞ and β → 0, that is:
$$P(o_j|a_i) = P_0(o_j|a_i), \qquad P(a_i) = \delta_{a_i,\,a_{\max}}.$$
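A small Python sketch of this two-part solution (hypothetical model and utilities) may help to see how the inner and outer steps interact; with α ≫ β, the action distribution concentrates on the best action while the outcome distribution stays close to the prior:

```python
import numpy as np

def nested_solution(P0_a, P0_o, U, alpha, beta):
    """Equations (7) and (8): inner outcome distortion, outer action softmax."""
    Z_beta = P0_o @ np.exp(beta * U)                    # Z_beta(a_i)
    P_o = P0_o * np.exp(beta * U) / Z_beta[:, None]     # Equation (7)
    w = P0_a * np.exp((alpha / beta) * np.log(Z_beta))  # Equation (8), unnormalized
    return P_o, w / w.sum()

P0_a = np.ones(3) / 3
P0_o = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4]])
U = np.array([1.0, 2.0, 5.0])
P_o, P_a = nested_solution(P0_a, P0_o, U, alpha=10.0, beta=0.1)
print(P_a.round(3))   # mass concentrates on the action with the highest E[U | a_i]
```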
The computational complexity of the information-theoretic model of bounded rational decision making can also be interpreted in terms of a sampling complexity [50,51]. In particular, Equation (4) can be interpreted under a rejection sampling scheme where we want to obtain samples from P(ai), but we are only able to sample from the distribution P0(ai). In this scheme, we generate a sample ai ~ P0(ai) and then accept the sample if:
$$u \;\leq\; \frac{e^{\alpha V(a_i)}}{e^{\alpha T}}, \tag{9}$$
where u is drawn from the uniform distribution 𝒰[0; 1] and T is the acceptance target value with T ≥ maxi V(ai). Otherwise, the sample is rejected. The efficiency of the sampling process depends on how many samples we need on average from P0 to obtain one sample from P. This average number of samples from P0 needed for one sample of P is given by the mean of a geometric distribution:
$$\overline{\mathrm{Samples}} \;=\; \left( \sum_i P_0(a_i)\, \frac{e^{\alpha V(a_i)}}{e^{\alpha T}} \right)^{-1} = \frac{e^{\alpha T}}{Z_\alpha}. \tag{10}$$
It is important to note that the average number of samples increases exponentially with the rationality parameter α, such that:
$$\frac{e^{\alpha T}}{Z_\alpha} \;\xrightarrow{\;\alpha \to \infty\;}\; \frac{e^{\alpha (T - V(a_{\max}))}}{P_0(a_{\max})},$$
where amax = arg maxi V(ai) and T ≥ maxi V(ai).
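The following Python sketch (made-up utilities again) implements this rejection sampling scheme and compares the empirical number of proposals per accepted sample with the prediction of Equation (10):

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.array([1.0, 2.0, 2.9])      # hypothetical utilities V(a_i)
P0 = np.ones(3) / 3
alpha = 3.0
T = V.max()                        # acceptance target, T >= max_i V(a_i)

accepted, proposals = 0, 0
while accepted < 20000:
    a = rng.choice(3, p=P0)        # propose a_i ~ P0(a_i)
    proposals += 1
    if rng.uniform() <= np.exp(alpha * (V[a] - T)):  # Equation (9)
        accepted += 1

Z_alpha = P0 @ np.exp(alpha * V)
print(proposals / accepted, np.exp(alpha * T) / Z_alpha)  # both approx. 2.8
```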
This interpretation in terms of sampling complexity can also be extended to Equation (6), where the decision maker has to estimate the expected utilities from samples. In line with Equation (8), we should accept a sample ai ~ P0(ai) if it fulfils the criterion:
$$u \;\leq\; \frac{e^{\frac{\alpha}{\beta} \log Z_\beta(a_i)}}{e^{\alpha T}} = \left[ \frac{Z_\beta(a_i)}{e^{\beta T}} \right]^{\alpha/\beta}, \tag{11}$$
where u ~ 𝒰[0; 1] and T ≥ (1/β) log Zβ(ai) for all ai. From Equation (11), we know that the ratio Zβ(ai)/e^βT can be interpreted as an acceptance probability; in this case, the acceptance probability of a sample oj ~ P0(oj|ai). Thus, in order to accept one sample of ai, we need to accept α/β consecutive samples of oj, with the acceptance criterion:
$$u \;\leq\; \frac{e^{\beta U(o_j)}}{e^{\beta T}}, \tag{12}$$
with u ~ 𝒰[0; 1] and T as set above.
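Under the additional assumption that α/β is a positive integer, the nested scheme can be sketched in Python as follows (model and utilities are again made up):

```python
import numpy as np

rng = np.random.default_rng(1)
P0_a = np.ones(3) / 3
P0_o = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4]])
U = np.array([0.0, 0.5, 1.0])     # hypothetical outcome utilities U(o_j)
alpha, beta = 4.0, 1.0
T = U.max()                       # inner acceptance target
n = int(alpha / beta)             # consecutive inner acceptances required

def sample_action():
    while True:                   # propose actions until one is accepted
        a = rng.choice(3, p=P0_a)
        o = rng.choice(3, p=P0_o[a], size=n)          # o_j ~ P0(o_j | a_i)
        if np.all(rng.uniform(size=n) <= np.exp(beta * (U[o] - T))):  # Eq. (12)
            return a              # n consecutive acceptances realize Eq. (11)

samples = [sample_action() for _ in range(5000)]
print(np.bincount(samples, minlength=3) / 5000)  # approximates P(a_i) of Eq. (8)
```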

3. Results

Here, we investigate the question of how close a bounded rational decision maker gets to the optimal (expected) utility achieved by the perfectly rational decision maker. Since we assume that the strategy of a bounded rational decision maker is inherently stochastic and can be described by a probability distribution according to Equation (4), we can only compare some statistical measure of the performance of the bounded rational decision maker to the performance of the perfectly rational decision maker. In the following, we will consider the expected performance.

Theorem 1 (ε-Optimality).

Given a bounded rational decision maker with information cost 1/α that optimizes (3), one can bound the expected performance of this decision maker from below within an ε-neighborhood of the optimal performance Vmax = maxi E[U|ai] of the perfectly rational decision maker, such that:
$$\sum_i P(a_i)\, V(a_i) \;\geq\; V_{\max} - \underbrace{\left( -\frac{1}{\alpha} \log P_0(a_{\max}) \right)}_{=:\;\varepsilon}.$$

Proof

The certainty-equivalent value VCE under the bounded rational strategy P(ai) is given by:
$$V_{CE} = \frac{1}{\alpha} \log \sum_i P_0(a_i)\, e^{\alpha V(a_i)} = \sum_i P(a_i)\, V(a_i) - \frac{1}{\alpha} \underbrace{\sum_i P(a_i) \log \frac{P(a_i)}{P_0(a_i)}}_{\geq\, 0},$$
where $P(a_i) = \frac{1}{Z}\, P_0(a_i)\, e^{\alpha V(a_i)}$. From the non-negativity of the Kullback–Leibler divergence, it follows that:
$$\sum_i P(a_i)\, V(a_i) \;\geq\; \frac{1}{\alpha} \log \sum_i P_0(a_i)\, e^{\alpha V(a_i)} \;\geq\; \frac{1}{\alpha} \log \left( P_0(a_{\max})\, e^{\alpha V_{\max}} \right) = V_{\max} + \frac{1}{\alpha} \log P_0(a_{\max}).$$
As a corollary, we can conclude for the special case of the uniform prior P0(ai) = 1/M that the ε-bound is given by ε = (1/α) log M. Conversely, given an ε > 0, there exists an ᾱ = (log M)/ε, such that for α ≥ ᾱ, the expected utility of the decision maker lies within ε of the optimum.
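A quick numerical check of this corollary in Python (with arbitrary made-up utilities) confirms that the expected performance of the equilibrium distribution always stays above Vmax − (1/α) log M:

```python
import numpy as np

V = np.array([1.0, 2.0, 2.9])            # hypothetical utilities V(a_i)
M = len(V)
P0 = np.ones(M) / M                      # uniform prior
for alpha in [1.0, 5.0, 25.0]:
    P = P0 * np.exp(alpha * V)
    P /= P.sum()                         # Equation (4)
    eps = np.log(M) / alpha              # corollary of Theorem 1
    print(alpha, (P @ V).round(3), (V.max() - eps).round(3))  # lhs >= rhs
```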
In the case of (6), the bounded rational decision maker has to determine the expected utilities by sampling, and the above lower bound cannot be guaranteed anymore. Instead of the expected utilities V(ai) = E[U|ai], such a decision maker optimizes the “distorted” certainty-equivalent value:
$$\tilde{V}(a_i) = \frac{1}{\beta} \log Z_\beta(a_i) = \frac{1}{\beta} \log \sum_j P_0(o_j|a_i)\, e^{\beta U(o_j)},$$
with Zβ(ai) from Equation (7). Only for β → 0 is the expectation value Ṽ(ai) → E[U|ai] recovered. Since (1/β) log Zβ(ai) ≥ E[U|ai], such a decision maker with positive β will overestimate the certainty-equivalent value for sub-optimal actions ai. For small β ≪ 1, the certainty-equivalent value can be approximated by a Taylor expansion in β:
$$\frac{1}{\beta} \log \sum_j P_0(o_j|a_i)\, e^{\beta U(o_j)} = E_{P_0(o_j|a_i)}[U(o_j)] + \frac{\beta}{2}\, \mathrm{VAR}_{P_0(o_j|a_i)}[U(o_j)] + O(\beta^2),$$
where O(β2) comprises higher-order cumulant terms that can be neglected for small β. Due to Theorem 1, we have:
$$\sum_i P(a_i) \left[ \frac{1}{\beta} \log \sum_j P_0(o_j|a_i)\, e^{\beta U(o_j)} \right] \;\geq\; V_{\max} + \frac{1}{\alpha} \log P_0(a_{\max}),$$
from which we can conclude for the limit β ≪ 1 and α ≫ β that:
$$\sum_i P(a_i)\, V(a_i) \;\geq\; V_{\max} - \underbrace{\left( -\frac{1}{\alpha} \log P_0(a_{\max}) + \frac{\beta}{2}\, E_{P(a_i)}\!\left[ \mathrm{VAR}_{P_0(o_j|a_i)}[U(o_j)] \right] + O(\beta^2) \right)}_{=:\;\varepsilon}.$$
For such a bounded rational decision maker, the error bound is increased by higher order cumulants.
If all of the (expected) utilities V(ai) are very similar in magnitude, it requires a high rationality parameter α to differentiate between them. A tighter ε-bound in α can be given if we assume that there is an interval V(ai) ∈ [Vmin; Vmax] and that all the utilities are discriminable by at least one “utile”, such that for any two choices ai and ak with i ≠ k, we have |V(ai) − V(ak)| ≥ 1, which is the case, for example, when utilities reflect rank.

Theorem 2 (ε-Optimality for rank utilities).

Given a bounded rational decision maker with information cost 1/α that optimizes Equation (3), and assuming a uniform prior P0(ai) = 1/M, bounded (expected) utilities V(ai) ∈ [Vmin; Vmax] for all i and |V(ai) − V(ak)| ≥ 1 for every pair (i, k) with i ≠ k, one can bound the expected performance of this decision maker from below within an ε-neighborhood of the optimal performance Vmax = maxi E[U|ai] of the perfectly rational decision maker, such that:
$$\sum_i P(a_i)\, V(a_i) \;\geq\; V_{\max} - \underbrace{e^{-\alpha}\, (V_{\max} - V_{\min})}_{=:\;\varepsilon}.$$

Proof

We express the choice probability P(ai) derived from Equation (4) under uniform prior P0(ai) = 1/M as:
$$P(a_i) = \frac{e^{\alpha V(a_i)}}{\sum_k e^{\alpha V(a_k)}} = \frac{(1/\delta)^{V(a_i)}}{\sum_k (1/\delta)^{V(a_k)}},$$
where we have introduced the variable δ = exp(−α). We can then express the expected performance as:
$$\begin{aligned}
\sum_i P(a_i)\, V(a_i) &= \frac{1}{\sum_k (1/\delta)^{V(a_k)}} \sum_i (1/\delta)^{V(a_i)}\, V(a_i) \\
&\geq \frac{(1/\delta)^{V_{\max}}}{\sum_k (1/\delta)^{V(a_k)}}\, V_{\max} + \left( 1 - \frac{(1/\delta)^{V_{\max}}}{\sum_k (1/\delta)^{V(a_k)}} \right) V_{\min} \\
&= V_{\max} - \left( 1 - \frac{(1/\delta)^{V_{\max}}}{\sum_k (1/\delta)^{V(a_k)}} \right) (V_{\max} - V_{\min}),
\end{aligned}$$
where the inequality is obtained by taking out the summand with the largest utility and lower-bounding the remaining terms. The second summand in the last equality can be further bounded as:
$$1 - \frac{(1/\delta)^{V_{\max}}}{\sum_k (1/\delta)^{V(a_k)}} = 1 - \frac{1}{\sum_k \delta^{\,V_{\max} - V(a_k)}} \;\leq\; \delta,$$
since we can bound ∑k δ^(Vmax − V(ak)) ≤ ∑k δ^k ≤ 1/(1 − δ) from |V(ai) − V(ak)| ≥ 1 for all i ≠ k and the limit properties of the geometric series. Therefore, we have:
$$\sum_i P(a_i)\, V(a_i) \;\geq\; V_{\max} - \delta\, (V_{\max} - V_{\min}).$$
As a corollary, we can conclude in the case of the minimal interval size [Vmin; Vmax] = [Vmin; Vmin + M] that the performance bound is given by ∑i P(ai)V(ai) ≥ Vmax − e^(−α) M. Conversely, given an ε > 0, there exists an ᾱ = log ((Vmax − Vmin)/ε), such that for α ≥ ᾱ, the expected utility of the decision maker lies within ε of the optimum.
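Again, a short numerical check (hypothetical rank utilities) illustrates how quickly the bound of Theorem 2 tightens in α:

```python
import numpy as np

V = np.arange(1.0, 6.0)                  # rank utilities 1, ..., 5
P0 = np.ones(len(V)) / len(V)            # uniform prior
for alpha in [0.5, 1.0, 3.0]:
    P = P0 * np.exp(alpha * V)
    P /= P.sum()                         # Equation (4)
    bound = V.max() - np.exp(-alpha) * (V.max() - V.min())
    print(alpha, (P @ V).round(3), bound.round(3))  # lhs >= rhs
```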

4. Adversarial Environments

So far, we have considered stochasticity in action selection to arise due to limited computational power, even in the absence of any uncertainty in the environment. Naturally, in this setting, stochastic choice yields less (expected) utility than deterministic choice of the best option, but the performance decrement can be bounded by ε. If, however, the environment is potentially adversarial, stochastic action selection can also be superior in terms of utility alone, since it does not allow the opponent to perfectly predict and thwart any deterministic action plan that the decision maker might have. In the following, we discuss two different scenarios for decision making in adversarial environments, where the decision maker chooses between different actions ai ∈ 𝒜 with (expected) utility V(ai) = E[U|ai].

4.1. Unknown Action Set

In the first scenario, we assume that the decision maker starts by choosing a probability distribution P(ai) over actions ai ∈ 𝒜, and then the environment chooses a subset 𝒮 ∈ ℘(𝒜)\{∅} of permissible actions, where ℘(𝒜) denotes the powerset. All actions that are not part of the subset are eliminated. Finally, the action ai is randomly determined from the set of permissible actions with their renormalized probabilities. The problem is to find the betting probability P(ai) such that we maximize our expected return; however, the expectation has to be taken over the unknown subset 𝒮 capriciously chosen by the opponent. This models a decision maker who has to choose a generic hedging strategy by allocating resources to different alternatives, but where the rules of the game are only fully revealed after the choice is made. Formally, we want to choose the probability P(ai), such that the conditional expectation E[V(ai)|𝒮] is as large as possible. Unsurprisingly, we cannot provide a deterministic optimal solution P(ai) = δai,a*, since the environment could always eliminate a*. However, if we allow ourselves an arbitrarily small, non-zero performance loss ε > 0, then there is a way to assign probabilities P(ai), such that the conditional expectation is almost equal to the optimum, i.e., to the highest utility in the subset chosen by the opponent. This is precisely the result of the following theorem.

Theorem 3 (ε-Optimality in adversarial environments).

The expected utility achieved by a bounded rational decision maker that optimizes (3) lies within an ε-neighborhood of the optimal utility $V_{\max}^{\mathcal{S}} = \max_{a_i \in \mathcal{S}} V(a_i)$ in 𝒮 for any subset 𝒮 of possible actions selected by nature, such that:
$$\frac{1}{\sum_{a_k \in \mathcal{S}} P(a_k)} \sum_{a_i \in \mathcal{S}} P(a_i)\, V(a_i) \;\geq\; V_{\max}^{\mathcal{S}} - \left( -\frac{1}{\alpha} \log P_0\!\left(a_{\max}^{\mathcal{S}}\right) \right) =: V_{\max}^{\mathcal{S}} - \varepsilon.$$

Proof

$$\frac{1}{\sum_{a_k \in \mathcal{S}} P(a_k)} \sum_{a_i \in \mathcal{S}} P(a_i)\, V(a_i) = \sum_{a_i \in \mathcal{S}} \frac{P_0(a_i)\, e^{\alpha V(a_i)}}{\sum_{a_k \in \mathcal{S}} P_0(a_k)\, e^{\alpha V(a_k)}}\, V(a_i) = \sum_{a_i \in \mathcal{S}} \frac{\frac{P_0(a_i)}{\sum_{a_l \in \mathcal{S}} P_0(a_l)}\, e^{\alpha V(a_i)}}{\sum_{a_k \in \mathcal{S}} \frac{P_0(a_k)}{\sum_{a_l \in \mathcal{S}} P_0(a_l)}\, e^{\alpha V(a_k)}}\, V(a_i),$$
where $P(a_i) = \frac{1}{Z}\, P_0(a_i)\, e^{\alpha V(a_i)}$. We can then apply Theorem 1 to the expression in the last equality to find that:
$$\frac{1}{\sum_{a_k \in \mathcal{S}} P(a_k)} \sum_{a_i \in \mathcal{S}} P(a_i)\, V(a_i) \;\geq\; V_{\max}^{\mathcal{S}} + \frac{1}{\alpha} \log \frac{P_0\!\left(a_{\max}^{\mathcal{S}}\right)}{\sum_{a_k \in \mathcal{S}} P_0(a_k)} \;\geq\; V_{\max}^{\mathcal{S}} + \frac{1}{\alpha} \log P_0\!\left(a_{\max}^{\mathcal{S}}\right),$$
where $a_{\max}^{\mathcal{S}} = \arg\max_{a_i \in \mathcal{S}} V(a_i)$.
As a corollary, we obtain in the case P0(ai) = 1/M an ε-bound of ε = (1/α) log M.
Similarly, Theorem 2 holds for any chosen subset 𝒮, such that:
$$\frac{1}{\sum_{a_k \in \mathcal{S}} P(a_k)} \sum_{a_i \in \mathcal{S}} P(a_i)\, V(a_i) \;\geq\; V_{\max}^{\mathcal{S}} - \underbrace{e^{-\alpha}\, (V_{\max} - V_{\min})}_{=:\;\varepsilon}.$$
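The following Python sketch (made-up utilities) enumerates every non-empty subset an adversary could leave and verifies the bound of Theorem 3 for the renormalized strategy:

```python
import numpy as np
from itertools import combinations

V = np.array([1.0, 2.0, 2.9, 0.5])       # hypothetical utilities V(a_i)
M, alpha = len(V), 5.0
P = np.exp(alpha * V)
P /= P.sum()                             # Equation (4) with uniform prior
eps = np.log(M) / alpha                  # corollary bound

slack = np.inf
for r in range(1, M + 1):
    for S in map(list, combinations(range(M), r)):
        PS = P[S] / P[S].sum()           # renormalized on the permissible set
        slack = min(slack, PS @ V[S] - (V[S].max() - eps))
print(slack >= 0)                        # True: the bound holds for every subset
```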

4.2. Unknown Utility

In the second scenario of an adversarial environment, the agent chooses a distribution P0(ai) and the environment subsequently chooses V(ai) in an arbitrary fashion, such that, in general, the choice of V(ai) may depend on P0(ai). Once the V(ai) are revealed, the decision maker updates the choice strategy according to Equation (4). Importantly, the new distribution P(ai) is not used as a choice strategy to choose between the different V(ai) as in the previous theorems, but is only used in a later choice with new, yet unknown utilities. If we denote the trial number or time step by t and assume a trial-by-trial update:
$$P_{t+1}(a_i) = \frac{1}{Z_t}\, P_t(a_i)\, \exp\left( \alpha\, V_t(a_i) \right),$$
where the utilities Vt(ai) are bounded in each time step to lie within the unit interval, that is Vt(ai) ∈ [0; 1], then the expected performance of the decision maker can be bounded from below by:
$$\sum_t \sum_i P_t(a_i)\, V_t(a_i) \;\geq\; \frac{\log(1+\varepsilon)}{\varepsilon}\, V_{\max}^T - \frac{\log M}{\varepsilon},$$
where ε = exp(α) − 1. This performance bound can be derived from a hedging analysis originally proposed by Freund and Schapire for a full information game where the decision maker learns about all possible utilities Vt(ai) in each time step [52,53]. In this case, the decision maker chooses option i with probability pi(t) = wi(t)/∑j wj(t), where the weights wi(t) are updated according to:
$$w_i(t+1) = w_i(t)\, (1+\varepsilon)^{V_i(t)},$$
and where Vi(t) is the utility of option i at time t. It is straightforward to see that a bounded rational decision maker following Equation (4) is hedging when acting according to Pt(ai) before receiving the feedback Vt(ai); that is, the bounded rational decision maker has a delay of one time step, as it is the distribution Pt+1(ai) that is bounded optimal for the utility Vt(ai) under the prior Pt(ai).
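A compact Python sketch of this hedging scheme (with an arbitrary, randomly generated utility sequence standing in for the adversary) shows the equivalence of the weight update with the trial-by-trial bounded rational update and checks the performance bound:

```python
import numpy as np

rng = np.random.default_rng(2)
M, T_steps, alpha = 4, 200, 0.5
eps = np.exp(alpha) - 1.0               # eps = exp(alpha) - 1

w = np.ones(M)                          # uniform prior, w_i(0) = 1
gain, V_sum = 0.0, np.zeros(M)
for t in range(T_steps):
    p = w / w.sum()                     # play P_t(a_i) before seeing V_t
    V = rng.uniform(size=M)             # utilities V_t(a_i) in [0, 1]
    gain += p @ V
    V_sum += V
    w *= (1.0 + eps) ** V               # identical to w *= exp(alpha * V)

V_max_T = V_sum.max()                   # best single option in hindsight
print(gain >= np.log(1 + eps) / eps * V_max_T - np.log(M) / eps)  # True
```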

5. Discussion and Conclusion

Information-theoretic bounded rationality can be viewed as a prescriptive model of optimal decision making when the decision maker can only afford a certain amount of information processing. Information processing is formalized as a change in probability distribution from a prior distribution representing an a priori choice strategy to a posterior distribution over actions after information processing has taken place. Such changes in distributions can be measured by the relative entropy between prior and posterior distribution and be related to actual physical state changes in thermodynamic systems [34], where the concept of energy is analogous to the concept of utility and computational costs are analogous to entropic costs that reduce the system’s capability to do work. This interpretation builds on previous work that has related computational and physical processes; see for example [54] for an overview. As discussed in the Methods, the cost of changing distributions can also be expressed in terms of complexity of sampling processes [50,51].
In this paper, we show that we can abstract away even further from both physical and computational processes when modeling bounded rational decision making with entropic information processing constraints. We show that the performance of information-theoretic bounded rational decision makers can be ε-bounded compared to the perfectly rational decision maker and that, therefore, information-theoretic bounded rationality naturally implies ε-optimality. In this sense, bounded rational decision making is strictly inferior to perfect rationality, which deterministically selects the best action. This, however, changes in adversarial environments. We discuss two scenarios. In the first scenario, the opponent can eliminate any non-empty subset of actions from the choice set after the decision maker has specified her strategy. Here, bounded rationality allows defining an ε-optimal performance criterion under any subset. In the second scenario, the opponent can arbitrarily select utilities for each action, and the agent responds with the bounded rational strategy with respect to the previous utilities. This scenario is equivalent to hedging and also comes with performance bounds, but in contrast to the previous setting, these bounds do not correspond to ε-optimality, since the difference between optimal and actual utility also depends on a multiplicative factor.
The concept of ε-optimality has been previously discussed in the economic literature, in particular within the context of game theory and the solution concept of ε-equilibria [55,56]. Notably, Fudenberg and Levine [57] have investigated the concept of ε-universal consistency in games where players learn a smooth best response to another player from observations. They could show that learning with a softmax decision rule performs within an ε-bound of the best response with known frequencies of the opponent's play. Importantly, the concept of ε-optimality extends the usual black box approach taken in perfect rationality models of economic decision making, where the details of the reasoning process are ignored [30]. In ε-optimality models, the decision maker is assumed to make decisions that are (approximately) optimal; how these decisions are arrived at is largely ignored. The choice of the ε in such models is typically arbitrary. Here, we link the parameter ε quantitatively to the temperature parameter of information-theoretic bounded rationality, that is, a Lagrange multiplier indicating the shadow price of changing the distribution that represents the choice strategy.
Economic models of decision making are usually considered to be as if models. The fact that behavior is consistent with an optimality criterion does not imply that an actual optimization process causes this behavior. Similarly, we could consider the information-theoretic bounded rationality model as an as if model, where the decision maker behaves as if optimizing a trade-off between utility and information cost or as if optimizing utility under information processing constraints. In contrast, when engineering an optimal decision maker (for example, a planning algorithm in a robot), typically the utility function is provided by the engineer, and the action is selected by the system after an optimization process. Here, we can consider the information-theoretic bounded rationality model as an anytime search for the optimum that stops when resources run out. Most importantly, however, independent of whether one regards utility functions as causal for behavior or not, bounded rational decision making does not necessarily imply solving a constrained optimization problem that is more difficult than the original unconstrained problem. Rather, the decision maker can be regarded as optimizing utility until running out of resources, thereby implicitly solving the constrained problem.

Acknowledgments

This study was supported by the Deutsche Forschungsgemeinschaft, Emmy Noether Grant BR4164/1-1.

Author Contributions

Daniel A. Braun and Pedro A. Ortega conceived of and wrote the paper. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gintis, H. A Framework for the Unification of the Behavioral Sciences. Behav. Brain Sci 2006, 30, 1–61. [Google Scholar]
  2. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 1st ed.; Prentice-Hall: Englewood Cliffs, NJ, USA, 1995. [Google Scholar]
  3. Kreps, D.M. Notes on the Theory of Choice; Westview Press: Boulder, CO, USA, 1988. [Google Scholar]
  4. Trommershauser, J.; Maloney, L.T.; Landy, M.S. Decision making, movement planning and statistical decision theory. Trends Cogn. Sci 2008, 12, 291–297. [Google Scholar]
  5. Braun, D.A.; Nagengast, A.J.; Wolpert, D. Risk-sensitivity in sensorimotor control. Front. Hum. Neurosci 2011, 5. [Google Scholar] [CrossRef]
  6. Wolpert, D.M.; Landy, M.S. Motor control is decision-making. Curr. Opin. Neurobiol 2012, 22, 996–1003. [Google Scholar]
  7. Fishburn, P. The Foundations of Expected Utility; D. Reidel Publishing: Dordrecht, The Netherlands, 1982. [Google Scholar]
  8. Neumann, J.V.; Morgenstern, O. Theory of Games and Economic Behavior; Princeton University Press: Princeton, NJ, USA, 1944. [Google Scholar]
  9. Simon, H.A. Rational choice and the structure of the environment. Psychol. Rev 1956, 63, 129–138. [Google Scholar]
  10. Simon, H. Theories of Bounded Rationality. In Decision and Organization; McGuire, C.B., Radner, R., Eds.; North Holland Pub. Co.: Amsterdam, The Netherlands, 1972; pp. 161–176. [Google Scholar]
  11. Simon, H. Models of Bounded Rationality; MIT Press: Cambridge, MA, USA, 1984. [Google Scholar]
  12. Aumann, R.J. Rationality and Bounded Rationality. Games Econ. Behav 1997, 21, 2–14. [Google Scholar]
  13. Rubinstein, A. Modeling Bounded Rationality; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  14. Kahneman, D. Maps of Bounded Rationality: Psychology for Behavioral Economics. Am. Econ. Rev 2003, 93, 1449–1475. [Google Scholar]
  15. McKelvey, R.D.; Palfrey, T.R. Quantal Response Equilibria for Normal Form Games. Games Econ. Behav 1995, 10, 6–38. [Google Scholar]
  16. Mckelvey, R.; Palfrey, T.R. Quantal Response Equilibria for Extensive Form Games. Exp. Econ 1998, 1, 9–41. [Google Scholar]
  17. Wolpert, D.H. Information Theory—The Bridge Connecting Bounded Rational Game Theory and Statistical Physics. In Complex Engineered Systems; Braha, D., Minai, A.A., Bar-Yam, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 262–290. [Google Scholar]
  18. Spiegler, R. Bounded Rationality and Industrial Organization; Oxford University Press: Oxford, UK, 2011. [Google Scholar]
  19. Jones, B.D. Bounded Rationality Political Science: Lessons from Public Administration and Public Policy. J. Public Adm. Res. Theory 2003, 13, 395–412. [Google Scholar]
  20. Gigerenzer, G.; Selten, R. Bounded rationality: The adaptive toolbox; MIT Press: Cambridge, MA, USA, 2001. [Google Scholar]
  21. Camerer, C. Behavioral Game Theory: Experiments in Strategic Interaction; Princeton University Press: Princeton, NJ, USA, 2003. [Google Scholar]
  22. Howes, A.; Lewis, R.; Vera, A. Rational adaptation under task and processing constraints: implications for testing theories of cognition and action. Psychol. Rev 2009, 116, 717–751. [Google Scholar]
  23. Janssen, C.P.; Brumby, D.P.; Dowell, J.; Chater, N.; Howes, A. Identifying Optimum Performance Trade-Offs Using a Cognitively Bounded Rational Analysis Model of Discretionary Task Interleaving. Top. Cogn. Sci 2011, 3, 123–139. [Google Scholar]
  24. Lewis, R.; Howes, A.; Singh, S. Computational rationality: Linking mechanism and behavior through bounded utility maximization. Top. Cogn. Sci 2014, in press. [Google Scholar]
  25. Lipman, B. Information Processing and Bounded Rationality: A Survey. Can. J. Econ 1995, 28, 42–67. [Google Scholar]
  26. Russell, S. Rationality and Intelligence. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Montreal, Canada, 20–25 August 1995; Morgan Kaufmann: San Francisco, CA, USA, 1995; pp. 950–957. [Google Scholar]
  27. Russell, S.; Subramanian, D. Provably bounded-optimal agents. J. Artif. Intell. Res 1995, 3, 575–609. [Google Scholar]
  28. Glimcher, P.; Fehr, E.; Camerer, C.; Poldrack, R. Neuroeconomics: Decision Making and the Brain; Elsevier Science: Amsterdam, The Netherlands, 2008. [Google Scholar]
  29. Friston, K.; Schwartenbeck, P.; Fitzgerald, T.; Moutoussis, M.; Behrens, T.; Dolan, R.J. The anatomy of choice: Active inference and agency. Front. Hum. Neurosci 2013, 7. [Google Scholar] [CrossRef]
  30. Dixon, H. Some thoughts on economic theory and artificial intelligence. In Artificial Intelligence and Economic Analysis: Prospects and Problems; Moss, S., Rae, J., Eds.; Edward Elgar Publishing: Cheltenham, UK, 1992; pp. 131–154. [Google Scholar]
  31. Ortega, P.; Braun, D. A conversion between utility and information. Proceedings of the Third Conference on Artificial General Intelligence, Lugano, Switzerland, 5–8 March 2010; Atlantis Press: Paris, France, 2010; pp. 115–120. [Google Scholar]
  32. Ortega, P.A.; Braun, D.A. Information, utility and bounded rationality. In Artificial General Intelligence; Proceedings of the 4th International Conference on Artificial General Intelligence (AGI 2011), Mountain View, CA, USA, 3–6 August 2011, Schmidhuber, J., Thórisson, K.R., Looks, M., Eds.; Lecture Notes on Artificial Intelligence, Volume 6830; Springer: Berlin/Heidelberg, Germany, 2011; pp. 269–274. [Google Scholar]
  33. Braun, D.A.; Ortega, P.A.; Theodorou, E.; Schaal, S. Path integral control and bounded rationality. Proceedings of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Paris, France, 11–15 April 2011; pp. 202–209.
  34. Ortega, P.A.; Braun, D.A. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A 2013, 469. [Google Scholar] [CrossRef]
  35. Wolpert, D.; Harre, M.; Bertschinger, N.; Olbrich, E.; Jost, J. Hysteresis effects of changing parameters of noncooperative games. Phys. Rev. E 2012, 85, 036102. [Google Scholar]
  36. Luce, R. Individual choice behavior; Wiley: Oxford, UK, 1959. [Google Scholar]
  37. McFadden, D. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics; Zarembka, P., Ed.; Academic Press: New York, NY, USA, 1974; pp. 105–142. [Google Scholar]
  38. Meginnis, J. A new class of symmetric utility rules for gambles, subjective marginal probability functions, and a generalized Bayesian rule. In 1976 Proceedings of the American Statistical Association, Business and Economic Statistics Section; American Statistical Association: Washington, DC, USA, 1976; pp. 471–476. [Google Scholar]
  39. Fudenberg, D.; Kreps, D. Learning mixed equilibria. Games Econ. Behav 1993, 5, 320–367. [Google Scholar]
  40. Sutton, R.; Barto, A. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  41. Luce, R. Utility of gains and losses: Measurement-theoretical and experimental approaches; Erlbaum: Mahwah, NJ, USA, 2000. [Google Scholar]
  42. Train, K. Discrete Choice Methods with Simulation, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  43. Toussaint, M.; Harmeling, S.; Storkey, A. Probabilistic inference for solving (PO)MDPs; Technical Report; University of Edinburgh: Edinburgh, UK, 2006. [Google Scholar]
  44. Ortega, P.A.; Braun, D.A. A minimum relative entropy principle for learning and acting. J. Artif. Intell. Res 2010, 38, 475–511. [Google Scholar]
  45. Friston, K. The free-energy principle: A unified brain theory? Nat. Rev. Neurosci 2010, 11, 127–138. [Google Scholar]
  46. Tishby, N.; Polani, D. Information Theory of Decisions and Actions. In Perception-reason-action cycle: Models, algorithms and systems; Vassilis, H.T., Ed.; Springer: Berlin, Germany, 2011. [Google Scholar]
  47. Kappen, H.; Gómez, V.; Opper, M. Optimal control as a graphical model inference problem. Mach. Learn 2012, 1, 1–11. [Google Scholar]
  48. Vijayakumar, S.; Rawlik, K.; Toussaint, M. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference. Proceedings of Robotics: Science and Systems, Sydney, Australia, 9–13 July 2012; MIT Press: Cambridge, MA, USA, 2013. [Google Scholar]
  49. Ortega, P.A.; Braun, D.A. Free Energy and the Generalized Optimality Equations for Sequential Decision Making. Proceedings of the Tenth European Workshop on Reinforcement Learning, Edinburgh, Scotland, 30 June–1 July 2012.
  50. Ortega, P.A.; Braun, D.A. Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adap. Syst. Model 2014, 5, 269–274. [Google Scholar]
  51. Ortega, P.A.; Braun, D.A.; Tishby, N. Monte Carlo Methods for Exact & Efficient Solution of the Generalized Optimality Equations. Proceedings of IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–5 June 2014.
  52. Auer, P.; Cesa-Bianchi, N.; Freund, Y.; Schapire, R.E. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of IEEE 36th Annual Symposium on Foundations of Computer Science, Milwaukee, WI, USA, 23–25 October 1995; pp. 322–331.
  53. Freund, Y.; Schapire, R.E. A Decision-theoretic Generalization of On-line Learning and an Application to Boosting. J. Comput. Syst. Sci 1997, 55, 119–139. [Google Scholar]
  54. Feynman, R.P. The Feynman Lectures on Computation; Addison-Wesley: Boston, MA, USA, 1996. [Google Scholar]
  55. Fudenberg, D.; Levine, D. The Theory of Learning in Games; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
  56. Nisan, N.; Roughgarden, T.; Tardos, É.; Vazirani, V. Algorithmic Game Theory; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
  57. Fudenberg, D.; Levine, D.K. Consistency and cautious fictitious play. J. Econ. Dyn. Control 1995, 19, 1065–1089. [Google Scholar]
