1. Introduction
Since Samuelson’s [1] discounted utility model, Economics has modelled impatience in decision-making by assuming that agents discount future streams of utility exponentially over time. Exponential discounting means that people have the same preferences now over future behaviour as they will have when the future arrives. For example, if today we prefer to start spending less and saving more tomorrow, tomorrow we will want to start spending less and saving more immediately. The same holds if today we prefer to quit smoking, go on a diet, or start exercising tomorrow: when tomorrow comes, we will want to do so immediately.
Behavioural Economics has built on the work of [2] to explore the consequences of relaxing the standard assumption of exponential discounting. Both [3] and [4] suggest that some basic features of inter-temporal decision-making (namely, that many individuals value consumption in the present more than any delayed consumption) may be explained by a particular type of time inconsistency: quasi-hyperbolic discounting, which formalizes the idea that today’s “self” is impatient while tomorrow’s “self” is much more patient. This shape of time discounting was originally proposed by Phelps and Pollak in 1968 [5], later employed by David Laibson [6,7], and further developed in the work of O’Donoghue and Rabin [8,9,10]. In the formulation of quasi-hyperbolic discounting adopted by [7], the taste for immediate gratification, or present bias, is captured by an extra discount parameter $\beta$. Accordingly, the consumption path planned in each period for future periods may never be realized, because inter-temporal trade-offs change over time. If people have a bias towards immediate gratification, when thinking about two generic future dates they care roughly equally about well-being on those two dates, but when the first of those dates becomes “today”, they care more about well-being on the first date than about well-being on the second. So, tomorrow they will save less, smoke more, eat more, and exercise less than they plan today to do tomorrow. In other words, there is a conflict between what individuals would like for themselves today and what they would like for themselves tomorrow. The implications of such self-control problems depend on the agents’ awareness of their future preferences [8,11]. Extreme assumptions about such awareness, i.e., full awareness and full unawareness, identify the two types of individuals usually considered in the Behavioural Economics literature [2,8,12]: naïve and sophisticated. Sophisticated persons are fully aware of what their future selves’ preferences will be. Naïve persons are instead fully unaware of what their future preferences will be: they believe their future selves’ preferences will be identical to their current self’s, not realizing that their tastes will change as they get closer to implementing their decisions.
To analyse the equilibrium behaviour of individuals with different time preferences (for example, equilibrium consumption behaviour), researchers in Economics [8,9,11] have formally modelled a consumer as a sequence of temporal selves making choices in a dynamic game. Hence, a T-period consumption problem translates into a T-period game with T players, or selves, indexed by their respective periods of consumption decision. Individual behaviour is described by perception-perfect strategies, i.e., solution concepts describing the individual’s optimal action in every period given her current preferences and her perception of future behaviour. The naïve have present-biased preferences but believe that they are time consistent (TC). Therefore, the decision process of the naïve is identical to that of TCs and amounts to choosing an optimal future consumption path, so both naïve and TC equilibrium behaviour can be solved as an optimization problem [9]. At all times they maximize their expected utility given their current information

$$U_t = \mathbb{E}_t\left[u(c_t) + \beta \sum_{\tau=t+1}^{T} \delta^{\tau-t}\, u(c_\tau)\right], \tag{1}$$

where $\delta$ is the long-run discount factor, $\rho$ is the corresponding discount rate, and the extra discount parameter $\beta$ is intended to capture the essence of quasi-hyperbolic discounting; namely, that the discount factor between consecutive future periods, $\delta$, is larger than the discount factor between the current period and the next one, $\beta\delta$. If $\beta < 1$, preferences in Equation (1) are dynamically inconsistent, that is to say, preferences at date $t$ are inconsistent with preferences at date $t+1$. If, instead, $\beta = 1$, the discount factor between consecutive future periods, $\delta$, is the same as the discount factor between the current period and the next one, $\beta\delta = \delta$, and individuals are time-consistent.
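To make Equation (1) concrete, the Python sketch below (an illustration, not code from the cited works) builds the quasi-hyperbolic weights $1, \beta\delta, \beta\delta^2, \dots$ and uses them to compare a smaller-sooner with a larger-later reward. The helper names and the numbers in the usage lines are hypothetical. With $\beta < 1$ the preference flips once the earlier date becomes “today”, which is the dynamic inconsistency described above; with $\beta = 1$ it does not.

```python
def qh_weights(beta: float, delta: float, horizon: int) -> list[float]:
    """Quasi-hyperbolic weights seen from today: 1, beta*delta, beta*delta**2, ..."""
    return [1.0] + [beta * delta**k for k in range(1, horizon + 1)]


def prefers_later(beta: float, delta: float, small: float, large: float, start: int) -> bool:
    """True if `large` at period start+1 beats `small` at period `start`,
    both evaluated from period 0 with quasi-hyperbolic weights (hypothetical helper)."""
    w = qh_weights(beta, delta, start + 1)
    return w[start + 1] * large > w[start] * small


# Planned from a distance, the present-biased agent waits for the larger reward...
print(prefers_later(0.5, 0.9, small=10, large=15, start=1))   # True
# ...but when the earlier date is "today", the agent takes the smaller one.
print(prefers_later(0.5, 0.9, small=10, large=15, start=0))   # False
# A time-consistent agent (beta = 1) makes the same choice in both cases.
print(prefers_later(1.0, 0.9, small=10, large=15, start=0))   # True
```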
For sophisticates, perception-perfect strategies imply that they are playing a game against their future selves. Their behaviour partly reflects “strategic” reactions to predicted behaviour by future selves that they cannot directly control, and partly reflects attempts to induce better behaviour from their future selves.
In this strand of literature, the decision process of naïve and TC individuals amounts to just choosing an optimal future consumption path, but the equilibrium behaviour of sophisticated individuals is the solution to a dynamic game in which the decision maker is modelled as a sequence of temporal selves making choices.
The Economics literature summarized here assumes that present-biased individuals are endowed with a given, exogenous degree of present bias and with an exogenous level of awareness about their self-control problems. References [10,13] make assumptions on individuals’ awareness of their future self-control problems. Denote the perceived level of self-control by $\hat{\beta}$. People could be fully aware of their future self-control problems and correctly predict how they will behave in the future: full sophisticates have $\hat{\beta} = \beta$. People could be fully unaware of their future self-control problems and wrongly predict how they will behave in the future: fully naïve people have $\hat{\beta} = 1$. Further, people could be partially naïve, aware that they will have self-control problems but underestimating their magnitude: $\beta < \hat{\beta} < 1$. Finally, people with standard time-consistent preferences have $\hat{\beta} = \beta = 1$.
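The classification above depends only on the pair $(\beta, \hat{\beta})$. A minimal Python sketch of that mapping follows; the function name and the tolerance used to compare floating-point values are illustrative choices, not part of the cited literature.

```python
def awareness_type(beta: float, beta_hat: float, tol: float = 1e-12) -> str:
    """Classify an agent by actual present bias `beta` and perceived present bias
    `beta_hat`, assuming 0 < beta <= beta_hat <= 1 as in the text."""
    if not 0.0 < beta <= beta_hat <= 1.0:
        raise ValueError("expected 0 < beta <= beta_hat <= 1")
    if abs(beta - 1.0) < tol:
        return "time-consistent"   # beta_hat = beta = 1
    if abs(beta_hat - beta) < tol:
        return "sophisticated"     # correctly predicts own future present bias
    if abs(beta_hat - 1.0) < tol:
        return "naive"             # believes future selves are time-consistent
    return "partially naive"       # beta < beta_hat < 1: underestimates the bias


print(awareness_type(0.7, 0.7))    # sophisticated
print(awareness_type(0.7, 1.0))    # naive
print(awareness_type(0.7, 0.85))   # partially naive
print(awareness_type(1.0, 1.0))    # time-consistent
```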
In this strand of literature, the degree of individual present bias is exogenous and not affected by one’s level of awareness. However, it is not unreasonable to assume that agents have the power to affect their discount factor through various means, such as investing in education [14], and/or to affect their present bias through self-awareness. In addition, this literature defines self-awareness as the exogenously given difference between perceived and actual present bias, leaving no scope for learning.
A recent paper in psychological research proposed a Markov Decision Process model of decision-making which embeds awareness as a dynamic process, allowing the decision maker (DM) to switch from habitual to optimal behaviour [15]. In this model, the level of awareness affects individual well-being at each time period, and agents discount future streams of well-being exponentially over time.
The purpose of this paper is to offer a generalization of that model that allows for a taste for immediate gratification, i.e., present bias, in inter-temporal decision-making. The evolution of well-being, obtained by solving the model through numerical simulations, shows that, for a given magnitude of the present-bias parameter $\beta$, a higher level of individual awareness leads to higher well-being. Consider, as an example, present-biased DMs whose retirement savings are too low. By increasing their awareness level, they might decide to commit to a savings contract forcing them to save a given amount per month. Under the setup of [15], the decision to subscribe to a savings contract, a commitment device driven by higher awareness, improves decisions and well-being. We then make $\beta$ endogenous by letting the present-bias parameter vary with the awareness level, and show that even individuals suffering from severe present bias may learn over time to partly overcome their self-control problems.
This paper has neither the presumption nor the ambition to contribute to the Economics literature on present bias; rather, it aims to highlight the possible relationship between self-awareness and present bias using a simple mathematical framework proposed in psychological research. We offer the following contributions to the literature.
First, we use a mathematical framework, a Markov Decision Process (MDP), for modelling inter-temporal decision-making in situations where outcomes are partly random and partly under the control of the decision maker, and where there is scope for learning, as the probability that the process moves into a new state is influenced by previously chosen actions. Self-awareness is therefore naturally embedded in this setup. The essence of an MDP is that a decision maker’s action generates a response affecting the individual state, and consequently affects both the immediate reward obtained by the agent and the probabilities of future state transitions. The agent’s objective is to select actions that maximize a long-term measure of total reward. We add individual traits such as self-awareness and present bias to this mathematical tool. Second, we embed awareness and present bias in a dual-process model of decision-making, relying on the intuition that decisions are not made by a unique coherent entity, but by individuals who may operate in a “cold” or “hot” mode (dual-process theories of decision-making rely on the neuroscientific concept of multiple interacting brain systems, i.e., on the existence of several brain systems interacting with each other to make a variety of choices. The idea that choices are not made by a unique coherent entity has been developed both in Psychology [16,17] and in Behavioural Economics [18,19]; see [20] for a survey of dual-process theories of decision-making in Economics). Our model can thus predict heterogeneity in awareness and present bias across individuals and within individuals over their life cycle. Third, we explore how individuals can affect their present bias through self-awareness in a general setting. Specifically, we explore how a psychological factor, awareness, can affect the self-control problems implied by present bias, an additional discounting parameter. Finally, our setup is general, not centred on economic choices. The main implication is that we do not include the usual elements of choice in economic models that enter the dynamic programming algorithm, e.g., preferences, prices, and a budget constraint.
The model can find application in a variety of fields, such as strategic managerial decisions [21,22], social psychology [23], and health research [24], where the combination of intuitive–analytic reasoning propensities and present bias in inter-temporal decision-making may contribute to understanding, and promoting new views of, yet unexplained processes.
The paper proceeds as follows: the generalized Sequential Decision Process is outlined in Section 2, numerical simulations of the model are presented in Section 3, behavioural patterns emerging from the simulations are discussed in Section 4, and Section 5 concludes.
2. The Model
In this Section, we develop a generalization of the model of [15]. The details and the background of the model are reported in that companion paper. Here we introduce the basic formalism necessary to incorporate the idea of present bias introduced in the previous Section.
Using MDPs to model the mechanisms and dynamics of human decision-making seems natural for several reasons. First, MDPs describe processes that change over time, an essential element of inter-temporal choice. Second, the stochastic terms in the model allow us to embed internal and external sources of uncertainty in human decision processes. Finally, the actions obtained by maximizing the reward function correspond to the solution of an optimal control problem, so the model can shed light on the main factors to address when dealing with present bias. Moreover, the optimal policies embed the objective to be achieved by the DM, expressed in terms of the expected reward at the final time.
Definition 1. An MDP is a tuple $(S, A, P, R)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $P: S \times A \times S \to [0,1]$ is a transition probability function, and $R: S \times A \times S \to \mathbb{R}$ is a reward function.
Each individual makes a decision in the present by taking into account the outcomes realized over a time horizon of length T.
Definition 2. The time horizon is the discrete and finite set $\{1, 2, \dots, T\}$.
At each time period, the individual experiences a particular state, $s_t$, which belongs to the state space, $S$, representing the finite set of available states.
Definition 3. The state of the individual is defined as her level of awareness at time t.
The state of the individual provides information on the level of awareness they adopt when making decisions at each time period. It belongs to the state space $S$, a finite set of values in the closed interval $[0,1]$, so that the higher the value of $s_t$, the higher the level of awareness. The state and, particularly, its evolution over time are the main focus of this work. The DM’s level of awareness has an impact on their overall well-being as well as on their choices. Because the process is Markovian, the current state incorporates the entire history of the DM, so that awareness is a state embodying, to some extent, all aspects affecting individual decision-making: from personality to values and beliefs developed over a lifetime, to education and past experiences.
At each time period, the individual makes a choice $a_t$, which belongs to the action space, $A$, denoting the set of available actions.
Definition 4. The action chosen by the individual at time t is defined as their decision, and embodies the level of analyticity adopted in solving a particular decision-making problem.
In our framework, each choice is the solution to a decision problem. The individual faces a generic decision problem over a set of possible actions, representing different trade-offs between an intuitive approach (low values of $a_t$) and an analytical approach (high values of $a_t$), which are the two extremes of a continuum of values. In this way, the focus remains on the process underlying the decisions rather than on the specific decision itself.
Psychological research suggests that decision-making is characterized by a dual-process mechanism [16,17,25,26] in which the individual reasoning propensity can be hot or cold. The authors of [27] provide empirical evidence that the two systems are, in fact, complementary and necessary for optimal decision-making. We introduce a dual-process mechanism by defining the reasoning propensity as follows.
Definition 5. The reasoning propensity is an individual trait defining the DM’s attitude in processing the information used in the decision problem embedding the trade-off between two reasoning modalities: intuitive and analytical.
The reasoning propensity takes values in a continuum between two extreme attitudes [28], called intuitive and analytical, on the assumption that both are always involved, to different degrees, in any decision. The fact that the interval is open, i.e., that the extremes are not in the domain of the reasoning propensity, reflects the idea that both mechanisms are always involved with a different trade-off, and one never completely excludes the other.
The agent does not have perfect knowledge of the outcomes of the decision, owing to the interaction with an external environment that embeds uncontrollable factors. This uncertainty affects both the evolution of the state over time and the reward resulting from the decision $a_t$. Regarding the first aspect, the DM’s state, $s_t$, evolves according to the following non-deterministic dynamic rule:

$$s_{t+1} = f(s_t, a_t, w_t). \tag{2}$$

The future level of awareness of the individual depends on the current one, $s_t$, and on the choices they made, $a_t$, but it is also subject to some uncertainty, represented by a stochastic variable $w_t$ related to the state transition. We assume, for simplicity, that the state can remain the same, increase, or decrease by only one step at each time period, thus preventing “jumps” in its dynamics. This reflects the consideration that awareness-raising is a continuous process over time. By convention, given the current state $s_t$, we write $s_{t+1} = s_t + z$ to indicate that the state increases by one step $z$, and $s_{t+1} = s_t - z$ to indicate that the state decreases by one step. With these assumptions, the stochastic variable $w_t$ belongs to the set $\{z, 0, -z\}$, indicating the possibility for the state to increase, remain constant, or decrease, respectively. The probability with which $w_t$ takes on each of these values depends on the decision $a_t$ (in this case, it has no explicit dependence on the state $s_t$).
The system dynamics can equivalently be written in terms of probabilities. We assume that, for each state $s_t$, only three possibilities exist: $s_{t+1} = s_t + z$, $s_{t+1} = s_t$, and $s_{t+1} = s_t - z$, and we define a function $P$ with three components specifying the probability of each of the three cases depending on the decision $a_t$. $P$ is represented by a matrix of dimensions $|A| \times 3$, whose three columns embed the different probabilities:

$$P(a_t) = \big(p_f(a_t),\; p_s(a_t),\; p_b(a_t)\big), \qquad p_f(a_t) + p_s(a_t) + p_b(a_t) = 1,$$

specifying the probability for the state to increase (Forward probability), remain constant (Stationary probability), or decrease (Backward probability), respectively, which depend explicitly only on the action $a_t$ made at time $t$. Equivalently, $w_t$ takes the value $z$ with probability $p_f(a_t)$, $0$ with probability $p_s(a_t)$, and $-z$ with probability $p_b(a_t)$.
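The matrix $P$ and the sampling of $w_t$ can be sketched in a few lines of Python. The functional form of $p_f$, $p_s$, and $p_b$ is not given in this section, so `transition_probs` below is a hypothetical choice in which more analytical actions make an increase in awareness more likely; keeping the state inside $[0,1]$ by clipping is also an assumption.

```python
import random


def transition_probs(a: float) -> tuple[float, float, float]:
    """HYPOTHETICAL (p_f, p_s, p_b) for an action a in [0, 1]; the three values sum to one."""
    p_f = 0.6 * a            # forward: analytical choices raise awareness more often
    p_b = 0.3 * (1.0 - a)    # backward: intuitive choices lower it more often
    return p_f, 1.0 - p_f - p_b, p_b


def probability_matrix(actions: list[float]) -> list[list[float]]:
    """|A| x 3 matrix: one row per action, columns (forward, stationary, backward)."""
    return [list(transition_probs(a)) for a in actions]


def step(s: float, a: float, z: float, rng: random.Random) -> float:
    """Draw w_t from {+z, 0, -z} with probabilities P(a) and return s_{t+1} = s_t + w_t,
    clipped to [0, 1] (assumption: the boundary treatment is not stated in the text)."""
    w = rng.choices([z, 0.0, -z], weights=transition_probs(a))[0]
    return min(1.0, max(0.0, s + w))


for row in probability_matrix([0.1, 0.5, 0.9]):
    print([round(p, 3) for p in row])
rng = random.Random(42)
print(step(0.5, 0.9, 0.1, rng))
```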
The last element defining an MDP is a reward function.
Definition 6. The reward function is a stochastic function $R: S \times A \times S \to \mathbb{R}$, where $R(s, a, s')$ gives the reward the agent obtains by performing action $a$ from state $s$ and transitioning to state $s'$.
The appropriate formulation of the reward function, $R$, depends on its relationship with awareness. The authors of [29] recently proposed a theory of self-connection and showed that self-connection is an important potential contributor to a person’s well-being. Accordingly, it is reasonable to assume that awareness has a positive influence on all aspects of an individual’s life, so that living with a higher level of awareness can improve their general well-being (physical, psychological, emotional, and economic). Therefore, we assume the reward function to be positively affected by the current level of awareness: the higher the level of awareness of an individual, $s_t$, the higher their general well-being. The reward function also depends on the reasoning propensity. Rational/analytical reasoning is resource-consuming, because it requires acquiring information about the problem and its possible solutions, and time-consuming, because time is needed to sort, analyse, and elaborate the collected data. Therefore, the more analytical a decision is, the more resources it requires in terms of time, personal energy, and money. This translates into a negative dependence of the reward function on the choice $a_t$: the higher $a_t$, the more analytical the DM’s reasoning and the more of their resources are consumed. Our instantaneous reward is, therefore, a stochastic function involving both the current state and the next state the DM may reach, with a given probability, by choosing an action. Specifically, the formulation $R(s_t, a_t, s_{t+1})$ makes explicit the dependence of the reward function on the state $s_t$ and the action $a_t$, while the next state $s_{t+1}$ introduces a stochastic component.
The long-run discount factor, $\delta$, converts an expected future reward into its current value; i.e., it is the weight the individual assigns to a future reward when they have a non-null probability of transitioning to state $s'$ from state $s$ as a consequence of choice $a$. We assume that $\delta \in [0,1]$. When $\delta = 0$, the future is not considered. Otherwise, the higher the value of $\delta$, the higher the weight given to future rewards, up to the case $\delta = 1$, in which there is no distinction between present and future values.
2.1. Exogenous Present Bias
The main purpose of this paper is to generalize the above setup by introducing present bias, signalling the presence of self-control problems, and to analyse the impact of both an exogenous and a state-dependent present bias on the evolution of the state. Most experimental research aimed at quantifying present bias has focused on a single reward type, namely money. However, estimates of $\beta$ also exist in other domains, such as real effort [30] and healthy and unhealthy foods [31]. Experimental estimates of the present-bias parameter include 0.69 for healthy food and 0.71 for unhealthy food [31], 0.65 for money, and 0.88 for real effort [30]. A recent meta-analysis [32] of present-bias estimates obtained with the experimental method known as the Convex Time Budget reports an average present-bias parameter between 0.95 and 0.97, but finds that estimates differ depending on the type of reward: while monetary-reward estimates are close to one, studies with non-monetary rewards produce an average present-bias parameter of 0.88.
Definition 7. The factor $\beta$ embeds a taste for immediate gratification, i.e., a present bias affecting the DM’s evaluation of the future stream of rewards.
The objective function the DM wants to maximize is given by the sequence of expected rewards, so that at each time instant $t$ the DM’s maximization problem is

$$\max_{a_t, \dots, a_T} \; \mathbb{E}\left[ R(s_t, a_t, s_{t+1}) + \beta \sum_{\tau = t+1}^{T} \delta^{\tau - t}\, R(s_\tau, a_\tau, s_{\tau+1}) \right], \tag{4}$$

subject to the state dynamics defined in (2). The second term, weighted by $\beta$, collects the discounted future rewards. First, the expected value is needed because an external source of uncertainty makes the evolution of the state stochastic. Second, $\delta^{\tau - t}$, the weight given to future rewards, decreases exponentially (for $\delta < 1$), whereas $\beta$ is constant for all future time periods but does not apply to the current period $t$ in which the decision is made.
The maximization problem is solved with a Dynamic Programming algorithm [33] based on backward induction: starting from a fixed terminal value, the sequence of all optimal decisions is reconstructed step by step over the entire time horizon.
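A compact Python sketch of backward induction for problem (4) follows. Everything concrete in it is hypothetical: the awareness grid with step $z = 0.1$, the action grid, the transition probabilities, and the reward $R(s, a, s') = (s + s')/2 - 0.3a$, chosen only because it increases with awareness and decreases with analyticity, as the text requires. Since this section does not spell out how future selves are anticipated, the sketch assumes a sophisticated DM: today’s self weighs the continuation by $\beta\delta$, while the value of the plan actually followed by future selves is accumulated with $\delta$ alone. Passing a state-dependent `beta_of` covers the endogenous case too.

```python
# All primitives below are HYPOTHETICAL illustrations, not the paper's calibration.
Z = 0.1
STATES = [round(i * Z, 10) for i in range(11)]   # awareness grid on [0, 1]
ACTIONS = [0.1, 0.3, 0.5, 0.7, 0.9]              # reasoning levels inside (0, 1)
COST = 0.3                                        # resource cost of analytical reasoning


def transition_probs(a):
    p_f, p_b = 0.6 * a, 0.3 * (1.0 - a)
    return p_f, 1.0 - p_f - p_b, p_b


def reward(s, a, s_next):
    # Well-being rises with awareness and falls with the analyticity of the choice.
    return 0.5 * (s + s_next) - COST * a


def outcomes(i, a):
    """(probability, next-state index) pairs; the state is kept on the grid (assumption)."""
    p_f, p_s, p_b = transition_probs(a)
    last = len(STATES) - 1
    return [(p_f, min(i + 1, last)), (p_s, i), (p_b, max(i - 1, 0))]


def expected_value(i, a, weight, w_next):
    """E[R(s, a, s') + weight * w_next(s')] from grid state i under action a."""
    s = STATES[i]
    return sum(p * (reward(s, a, STATES[j]) + weight * w_next[j]) for p, j in outcomes(i, a))


def solve(T, delta, beta_of, terminal=0.0):
    """Backward induction for a sophisticated quasi-hyperbolic DM over periods 1..T.

    beta_of(s) gives the present-bias factor at awareness s (a constant for exogenous bias).
    Returns policy[t][i] (chosen action) and W[t][i] (delta-discounted value of the plan
    actually followed from period t on)."""
    n = len(STATES)
    W = [[terminal] * n for _ in range(T + 2)]       # W[T + 1] holds the terminal value
    policy = [[None] * n for _ in range(T + 1)]       # policy[0] is unused
    for t in range(T, 0, -1):
        for i, s in enumerate(STATES):
            b = beta_of(s)
            # Today's self discounts the continuation by beta * delta ...
            a_star = max(ACTIONS, key=lambda a: expected_value(i, a, b * delta, W[t + 1]))
            policy[t][i] = a_star
            # ... but future selves' value is accumulated with delta alone.
            W[t][i] = expected_value(i, a_star, delta, W[t + 1])
    return policy, W


policy_tc, _ = solve(T=30, delta=0.95, beta_of=lambda s: 1.0)
policy_pb, _ = solve(T=30, delta=0.95, beta_of=lambda s: 0.3)
policy_endo, _ = solve(T=30, delta=0.95, beta_of=lambda s: 0.9 * s + 0.2 * (1 - s))
print(policy_tc[1], policy_pb[1], policy_endo[1], sep="\n")
```

With `beta_of` returning 1, the recursion reduces to the standard exponential Bellman equation, so the time-consistent value is an upper bound on the $\delta$-value of any present-biased plan.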
2.2. Endogenous Present Bias
With exogenous present bias, the lower the value of $\beta$, the stronger the bias and the worse the state. However, it is reasonable to think of personal awareness as a mechanism to counter and mitigate individual present bias. A higher level of awareness could weaken self-control problems, leading to a higher $\beta$. In other words, the higher the DM’s level of self-awareness, the lower the negative impact of present bias on the system’s dynamics. We assume $\beta$ to be a function of the individual’s current level of self-awareness, which acts as a deterrent to self-control problems. This assumption translates into a state-dependent $\beta(s_t)$ such that

$$\beta(s_t) = \beta_H\, s_t + \beta_L\,(1 - s_t), \tag{5}$$

i.e., $\beta$ is now a linear combination of $s_t$ and its complement to one, $1 - s_t$, weighted by the two constants $\beta_H > \beta_L$. Both lie in $(0, 1]$ and correspond to weak and severe present bias, respectively (the values used in the simulations of Section 3 are reported with the other parameter settings in Appendix A). The resulting value of $\beta$ is not constant over time but evolves with the state level of the individual. From (5), at $s_t = 0$ the value of $\beta$ coincides with $\beta_L$, corresponding to the maximum level of bias, whereas at $s_t = 1$ it coincides with $\beta_H$, the minimum level of bias. In between, the higher the state, the higher the resulting value of $\beta$ (the weaker the degree of present bias). By modifying their evaluation of the present/future trade-off in terms of well-being, the agent affects the entire representation of their future rewards, which includes both a present-bias parameter and a long-run discount parameter.
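Equation (5) is a one-line mapping from awareness to present bias. The Python sketch below implements it with input checks; `beta_high` and `beta_low` stand for the weak-bias and severe-bias constants, and the values in the usage lines (0.9 and 0.2) are hypothetical, not the calibration used in the paper.

```python
def beta_endogenous(s: float, beta_high: float, beta_low: float) -> float:
    """State-dependent present bias: beta(s) = beta_high * s + beta_low * (1 - s)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("awareness s must lie in [0, 1]")
    if not 0.0 < beta_low < beta_high <= 1.0:
        raise ValueError("expected 0 < beta_low < beta_high <= 1")
    return beta_high * s + beta_low * (1.0 - s)


# Severe bias at zero awareness, weak bias at full awareness, linear in between.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(beta_endogenous(s, beta_high=0.9, beta_low=0.2), 3))
```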
The idea of an endogenous present bias is not new. In Economics, [14] argued that economic agents have the power to affect their discount factor, $\delta$, through various means, such as investing in education. More recently, [34,35] allow the agent to exert some control over the magnitude of their present bias by investing time, effort, and resources in human capital, which allows them to affect the future well-being implied by current actions. In our setup, we allow the present-bias parameter to depend on self-awareness, so that the agent has some control over its magnitude.
3. Simulations
In this Section we show, by means of numerical simulations of the model (the parameter setting is detailed in Appendix A), the impact on the state dynamics of both an exogenous and a state-dependent present-bias parameter, $\beta$. As mentioned before, the state variable represents the level of awareness of the individual and, according to [15], it displays monotonically increasing dynamics. The baseline is the simulation of the original model with no present bias, which is equivalent to fixing $\beta = 1$ in Equation (4).
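The simulations in this Section amount to propagating the awareness state forward under a decision rule and averaging over random transitions. The Python sketch below isolates that step. The decision rule passed in (a constant action here), the transition probabilities, the step $z = 0.1$, and the clipping to $[0,1]$ are all hypothetical; in the model, the decision rule would instead come from solving problem (4).

```python
import random


def transition_probs(a):
    """HYPOTHETICAL forward/stationary/backward probabilities for an action a in [0, 1]."""
    p_f, p_b = 0.6 * a, 0.3 * (1.0 - a)
    return p_f, 1.0 - p_f - p_b, p_b


def mean_path(rule, s0=0.5, T=30, z=0.1, runs=2000, seed=0):
    """Average awareness path over `runs` simulations; rule(t, s) returns the action a_t."""
    rng = random.Random(seed)
    totals = [0.0] * (T + 1)
    for _ in range(runs):
        s = s0
        totals[0] += s
        for t in range(1, T + 1):
            a = rule(t, s)
            w = rng.choices([z, 0.0, -z], weights=transition_probs(a))[0]
            s = min(1.0, max(0.0, s + w))   # keep awareness in [0, 1] (assumption)
            totals[t] += s
    return [x / runs for x in totals]


high_action = mean_path(lambda t, s: 0.9)   # hypothetical constant, fairly analytical rule
low_action = mean_path(lambda t, s: 0.1)    # hypothetical constant, fairly intuitive rule
print(round(high_action[-1], 3), round(low_action[-1], 3))
```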
We then introduce the present-bias parameter $\beta < 1$, considering an intuitive and an analytical individual in turn.
Figure 1 shows that the introduction of exogenous present bias slows down the state dynamics for both intuitive and analytical individuals. The effect is stronger for analytical individuals, for whom even the monotonically increasing behaviour of the state is lost: when $\beta$ is set below a threshold, the state eventually falls to zero. The state dynamics of the intuitive individual also display a slowdown effect, though a less prominent one, which only smooths the growth speed of the state.
With an endogenous present bias, there is, in the long run, an improvement in all cases, in terms of either growth speed or state values (Figure 2A,B, where the red lines are always above the blue ones). The unfavourable state dynamics obtained when $\beta$ is exogenous and extremely low (Figure 2A, blue line) are now overcome. The initially decreasing state dynamics in Figure 2A (red line) can be interpreted as a learning period during which the individual has not yet found ways of addressing their self-control problems.
Panels C and D in Figure 2 show the dynamics over time of the endogenous present-bias parameter, $\beta(s_t)$, compared with a constant $\beta$, at low (Panel C) and high (Panel D) initial levels of the parameters. In Panel C, after an initial decreasing phase, $\beta(s_t)$ is monotonically driven towards higher values, corresponding to weaker present bias.
Figure 3 shows the dynamics of an intuitive individual. The introduction of an endogenous present bias accelerates the state dynamics, especially when $\beta$ is extremely low (0.2). As before, Panels C and D show the evolution of $\beta(s_t)$ over time compared with a constant $\beta$.