Perspective

Information Thermodynamics: From Physics to Neuroscience

by
Jan Karbowski
Institute of Applied Mathematics and Mechanics, Department of Mathematics, Informatics and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
Entropy 2024, 26(9), 779; https://doi.org/10.3390/e26090779
Submission received: 6 July 2024 / Revised: 5 September 2024 / Accepted: 9 September 2024 / Published: 11 September 2024
(This article belongs to the Special Issue Entropy and Information in Biological Systems)

Abstract:
This paper provides a perspective on applying the concepts of information thermodynamics, developed recently in non-equilibrium statistical physics, to problems in theoretical neuroscience. Historically, information and energy in neuroscience have been treated separately, in contrast to physics approaches, where the relationship of entropy production with heat is a central idea. It is argued here that also in neural systems, information and energy can be considered within the same theoretical framework. Starting from basic ideas of thermodynamics and information theory applied to a classic Brownian particle, it is shown how noisy neural networks can infer its probabilistic motion. The decoding of the particle motion by neurons is performed with limited accuracy and at some energy cost, and both can be determined using information thermodynamics. In a similar fashion, we also discuss, from a physical point of view, how neural networks in the brain can learn the particle velocity and maintain that information in the weights of plastic synapses. Generally, it is shown how the framework of stochastic and information thermodynamics can be used practically to study neural inference, learning, and information storage.

“Earth, air, fire, and water in the end are all made of energy, but the different forms they take are determined by information. To do anything requires energy. To specify what is done requires information.”— Seth Lloyd (2006) [1]

1. Introduction: Information Is Physical, So Is the Brain

Brain computations require a certain amount of energy [2,3,4,5,6,7], and the brain is one of the most metabolically expensive organs in the body [8]. Moreover, the brain energy cost (oxygen and glucose metabolic rates) scales linearly with the number of neurons [9] and sub-linearly with brain size [10]. Every transition in neural circuits, either on a microscopic or macroscopic scale, is associated with some energy dissipation [11,12,13,14,15,16,17,18]. Despite all this, a huge majority of neuronal models used in computational (or theoretical) neuroscience completely neglect the energetic aspect of brain functioning, as if neural information processing were free and performed in some abstract “mathematical” hyperspace (e.g., [19,20,21,22,23]). One can argue that brain information processing is relatively cheap (only about 10–20 Watts for the human brain [6,8]) in comparison to computations executed by artificial neural networks on semiconductor hardware (the supercomputer involved in the Blue Brain Project uses about $4\cdot 10^5$ Watts for a “realistic” simulation [24,25]). However, this relative brain energetic efficiency cannot be a justification for dismissing the metabolic constraints. In fact, handling information in real neural circuits is energetically demanding, as transmitting 1 bit of information through a chemical synapse requires about $10^5\,k_BT$ of energy [4], and acquiring 1 bit by a synapse during synaptic learning needs a similar amount of $\sim 5\cdot 10^6\,k_BT$ [18], where $k_B$ is the Boltzmann constant, and T is the brain temperature. Importantly, both these energy figures are much larger than the minimum set by the Landauer limit ($k_BT\ln 2$; [26]). Most of the energy consumption in the mammalian brain goes for fast electric signaling, i.e., the generation of action potentials (neural activation) and synaptic transmission (each of them roughly $2\cdot 10^8\,k_BT$/min, for neuronal firing rates $\sim 5$ Hz) [5,6,7], and for fast communication (spatial traveling of action potentials along axons) [27]. In contrast, slow chemical signaling associated with synaptic plasticity (related to learning and memory) requires much less energy, about 4–11% of the energy cost expended on synaptic transmission for low firing rates [13]. These substantial costs are likely the reason for observing sparse coding in brain networks, where only a small fraction of neurons and synapses are active at any instant of time [28,29]. All this suggests that energy is a strong constraint on neural information processing and storing, and consequently, not all sorts of computations, even if theoretically possible, can be implemented by neural networks in the brain.
The first meaningful connection between physics and neuroscience was made a long time ago, in 1871, by James Maxwell in his book about heat [30]. In that book, he considered an “intelligent being” or “demon” that supposedly breaks the second law of thermodynamics by decreasing the entropy of the physical system. This thought experiment was a paradox that triggered a confusion regarding fundamental issues of thermodynamics and led to a huge amount of literature on this subject (for reviews, see [31,32]). The resolution of this paradox came with the realization that the concept of information also has to be included in the thermodynamic considerations, i.e., information has to be treated on equal footing with physical entropy and work [32].
This realization followed from a seminal observation made by Rolf Landauer that erasing information always leads to heat dissipation (erasure of 1 bit causes at least k B T ln 2 of energy released into the environment [26]). In other words, information is physical, since its storing and processing requires physical hardware, and it has to comply with the laws of physics [33,34,35,36].
It seems that one of the main goals of neural networks of any brain is to accurately estimate the outside signals [21,37,38,39,40], which are relevant for the brain, using as little energy as possible [41,42,43]. Based on these estimates, the brain tries to predict the future dynamics of these signals and to plan action. The outside signals, or inputs coming to brain circuits, are mostly of a stochastic nature, and therefore, their estimation and prediction are additionally complicated and demanding. Given this, it is perhaps not surprising that the brain has to possess some internal, stable representation of the outside world, which can be modified by learning. It is fair to say that despite many conceptual developments, we have only rudimentary knowledge (or feeling) of how this representation is created and how it works.
We can quantify the degree of correlation between outside dynamics and internal brain dynamics by mutual information, which is known from the Claude Shannon mathematical theory of communication [44]. This concept was brought to neuroscience by Horace Barlow [45] in the late 1950s. Much later, it was used by many neuroscientists, starting from Laughlin [46], Atick [37], and most notably by Bialek and colleagues [21,47,48]. These approaches aimed at the maximization of mutual information, initially ignoring energetic aspects. Levy and Baxter were likely the first to consider energetics of information encoding in neural networks [2,3]. However, even in these attempts, information and energy were treated as separate concepts that were not directly related to one another.
In contrast, stochastic thermodynamics provides a framework where information and energy are mutually related and can be considered and computed within a single formalism [36,49,50]. This is because on a micro-level, which includes molecular fluctuations, all relevant degrees of freedom have to be considered simultaneously. This work provides a perspective on a mutual connection between stochastic and information thermodynamics considered in physics and neural systems, which are intrinsically stochastic due to their small sizes and strong interactions with a fluctuating environment. This intrinsic stochasticity is a key ingredient of neurons and synapses that causes energy dissipation and influences information processing.
The paper is organized as follows. We start, in Section 2, with reviewing the fundamentals of stochastic dynamics and their relation to stochastic thermodynamics, with a simple pedagogic example of a Brownian particle moving in a gravitational field. This example is a basis and leitmotif for the next considerations, which link this stochastic mechanical system with the thermodynamics of information processing in neural networks. In Section 3, we introduce the relationship between entropy, information, and energy in general and in particular for the Brownian particle from Section 2. Next, in Section 4, we discuss information flow between two coupled subsystems, as a clear example where entropy production is directly related to information flow, and its relevance to the Maxwell demon. A neural network inferring the velocity of a Brownian particle (or a more general stochastic particle) is presented in Section 5 together with an associated energy cost. Synaptic plasticity and learning are discussed in Section 6 in the context of information gain and loss, using a stochastic version of the BCM model [51] together with its energy cost. It is shown here how the information loss after learning is related to the entropy production rate in synapses. Most of the calculations in Section 5 and Section 6 are novel; i.e., standard neural and synaptic models are analyzed in a new light. Finally, in Section 7, we briefly discuss a more general large-scale model of interacting plastic synapses during learning, using Glauber dynamics [52], in terms of information processing. We conclude with some general remarks about the relevance of information thermodynamics to neuroscience.

2. Stochastic Dynamics and Thermodynamics

2.1. Stochastic Dynamics

Small physical systems have internal degrees of freedom that are subject to fluctuations due to thermal noise (i.e., interactions with the environment or “heat bath”). These internal degrees of freedom can be described either by discrete or continuous time-dependent variables, such as position, velocity, activity, composition, etc. Let the index z denote an internal variable (or all relevant internal variables), describing the state of the system, and let p ( z , t ) denote the probability that the system is in this particular state at time t. Assuming that z follows a Markov process, one can describe the dynamics of the probability p ( z , t ) by a master equation [53,54]:
\dot{p}(z) = \sum_{z'}\left[w_{zz'}\,p(z') - w_{z'z}\,p(z)\right],
where $\dot{p}(z)$ denotes the temporal derivative of $p(z)$, and $w_{zz'}$ is the transition rate for a jump from state $z'$ to state $z$. Here, the variable z can be either discrete or continuous. In the latter case, one can expand Equation (1) to obtain the so-called Fokker–Planck equation (see below).
In the case with a single continuous internal variable z ( t ) , we can write its stochastic dynamics as the so-called Langevin equation [53,54]:
\frac{1}{\mu}\,\dot{z} = F(z,t) + \sigma\,\eta(t),
where $F(z,t)$ is the deterministic generalized force acting on the system, which can depend on z and on time t, and $\mu$ is a parameter that is inversely proportional to the time scale of the dynamics. The term $\eta(t)$ is the thermal noise acting on the variable z and can thus be described by a delta-correlated Gaussian random variable, such that $\langle\eta(t)\rangle = 0$ and $\langle\eta(t)\eta(t')\rangle = \delta(t-t')$. The parameter $\sigma$ characterizes the magnitude of the thermal noise. If z is a velocity, then the two parameters, $\mu$ and $\sigma$, are not independent. In fact, they are mutually coupled by the temperature of the system T through the relation $\sigma^2 = 2k_BT/\mu$ [53,54]. This relation is known as a fluctuation–dissipation theorem, which essentially means that in the presence of a heat bath (characterized by the temperature T), there is a balance between the level of fluctuations in the system ($\sim\sigma$) and the time over which the system approaches equilibrium ($\sim\mu^{-1}$). It should be noted that for neural systems, thermal noise is not the most important source of noise, at least on the level of the whole neuron, and thus the temperature does not play a major part in the considerations of neural activation (see also below).
The dynamics of variable z can be described equivalently by the dynamics of probability density of the state variable z in terms of the Fokker–Planck equation as [53,54].
\frac{\partial P(z,t)}{\partial t} = -\frac{\partial J(z,t)}{\partial z},
with the probability current (or flux) J ( z , t ) given by
J(z,t) = \mu F(z,t)\,P(z,t) - \frac{1}{2}(\mu\sigma)^2\,\frac{\partial P(z,t)}{\partial z},
where P ( z , t ) is the probability density for the variable z.
In many circumstances, in physical systems, one thinks about z as a generalized position or velocity. In biological systems, z can be either some structural variable, concentration of some ions or molecules, or system activity. These are the most common “state variables”, although it should be noted that there are no restrictions about what physical observable a Langevin equation may or may not describe.
For concreteness, we take a specific example of Equation (2): a small particle of mass m moving in a gravitational field with some modulating time-dependent force F 0 ( t ) in the fluctuating environment with z ( t ) being the particle velocity v ( t ) . This example will be our leitmotif for most of this paper, which is devoted to neural information processing and thermodynamics (Section 5 and Section 6). The Langevin equation of motion takes a familiar form:
m\dot{v} = -k\,v + F(t) + \sqrt{2mk}\,\sigma_v\,\eta,
where $-kv$ is the deterministic part of the resistance force of the environment, with k being the parameter corresponding to the strength of the resistance and proportional to the size of the particle. The force $F(t)$ is $F(t) = mg + F_0(t)$, with g being the gravitational acceleration, and $\sigma_v$ being the standard deviation (its steady-state value) of the particle velocity due to the thermal noise $\eta$ acting on it (random hitting by air particles). When $F_0(t) = 0$, the particle falls freely with velocity-dependent friction and stochastic environmental fluctuations. In this case, at the steady state ($t\to\infty$), we obtain the fluctuation–dissipation relation for our moving particle in the form $\sigma_v^2 = k_BT/m$. This relation indicates that the fluctuations in the kinetic energy of the particle correspond to one degree of freedom, associated with $k_BT/2$ (in 1D). It is instructive to have a sense of the magnitude of these fluctuations for real particles. For a particle with a size of 0.1 mm and a mass of 1 μg (assuming a density of 1 g/cm³), we obtain $\sigma_v \approx 2$ μm/s, which is small and cannot be detected by the naked eye, but it can be observed with a microscope. For comparison, for a particle a hundred times larger, with a size of 1 cm and a mass of 1 g, we obtain $\sigma_v \approx 0.002$ μm/s, which is extremely small.
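For readers who want to experiment with these numbers, the following is a minimal Euler–Maruyama sketch of Equation (5) (a freely falling particle with $F_0=0$), checking the fluctuation–dissipation relation $\sigma_v^2 = k_BT/m$ and the mean terminal velocity $mg/k$. All parameter values are illustrative (arbitrary units), not the physical values quoted above.

```python
import numpy as np

# Sketch of Eq. (5): m dv/dt = -k v + F(t) + sqrt(2 m k) sigma_v eta(t), with F = m g (F_0 = 0).
# Illustrative parameters in arbitrary units (assumptions, not values from the text).
rng = np.random.default_rng(0)

m, k, kBT, g = 1.0, 2.0, 0.5, 1.0      # mass, friction, thermal energy, gravity
sigma_v = np.sqrt(kBT / m)             # fluctuation-dissipation relation at steady state
dt, n_steps, n_particles = 1e-3, 20_000, 5_000

v = np.zeros(n_particles)              # ensemble of independent particles
for _ in range(n_steps):
    noise = rng.standard_normal(n_particles)
    v += dt * (-k * v + m * g) / m + (np.sqrt(2 * m * k) * sigma_v / m) * np.sqrt(dt) * noise

print("mean velocity:    ", v.mean(), " (theory m*g/k =", m * g / k, ")")
print("velocity variance:", v.var(), " (theory k_B T/m =", kBT / m, ")")
```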
We can write the Fokker–Planck equation for Equation (5), and easily solve it, yielding a Gaussian distribution P v for particle velocity [53]
P_v(v,t) = \frac{\exp\left\{-\left[v - \langle v(t)\rangle\right]^2/2\sigma_v^2(t)\right\}}{\sqrt{2\pi\sigma_v^2(t)}},
where $\langle v(t)\rangle$ is the average velocity, $\langle v(t)\rangle = \left[v(0) + \int_0^t dt'\,e^{\gamma t'}F(t')/m\right]e^{-\gamma t}$, with $\gamma = k/m$, and the time-dependent variance of the velocity is $\sigma_v^2(t) = \sigma_v^2\left(1 - e^{-2\gamma t}\right) = \langle v(t)^2\rangle - \langle v(t)\rangle^2$.
In the limit when the particle mass is very small, $m\to 0$, we can neglect the term on the left in Equation (5) and use the fact that $v = -dx/dt$, with x being the height of the particle (the height decreases as the particle falls with positive velocity). This corresponds to a standard overdamped approximation [55], and then Equation (5) transforms to
\dot{x} = -\frac{F(t)}{k} - \sqrt{2\gamma}\,\sigma_x\,\eta.
This approximation is equivalent to saying that the particle velocity is in a quasi-stationary state, since its dynamics are governed by the fast time constant $m/k$. In Equation (7), we used the rescaling $\sigma_x = \sigma_v/\gamma$, where $\sigma_x$ refers to the standard deviation of the particle position x. We can also write the Fokker–Planck equation for the temporal evolution of the distribution of the particle position, $P_x(x,t)$, and easily solve it, obtaining
P_x(x,t) = \frac{\exp\left\{-\left[x - \langle x(t)\rangle\right]^2/4\sigma_x^2\gamma t\right\}}{\sqrt{4\pi\sigma_x^2\gamma t}},
where the average position is $\langle x(t)\rangle = x(0) - \frac{1}{k}\int_0^t dt'\,F(t')$. Additionally, the variance of the particle position is $\langle x(t)^2\rangle - \langle x(t)\rangle^2 = 2\sigma_x^2\gamma t$, which indicates that it grows proportionally with time, which is characteristic of unrestricted Brownian motion. Also, in this limit, equivalent to the case $\gamma\gg 1$, we have a simple expression for the mean particle velocity (as can easily be seen from Equation (7)), $\langle v\rangle \approx F(t)/k$. Note that in contrast to the distribution for the particle velocity (Equation (6)), which has a stationary solution, the distribution for the particle position (Equation (8)) never assumes a stationary form.
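A short companion sketch of the overdamped dynamics, Equation (7), checks that the position variance indeed grows as $2\sigma_x^2\gamma t$, as in Equation (8); again, all parameter values are arbitrary assumptions.

```python
import numpy as np

# Sketch of the overdamped Eq. (7): dx/dt = -F(t)/k - sqrt(2*gamma)*sigma_x*eta(t),
# verifying Var[x(t)] ~ 2*sigma_x^2*gamma*t from Eq. (8). Illustrative parameters only.
rng = np.random.default_rng(1)

k, gamma, sigma_x = 2.0, 4.0, 0.3      # friction, gamma = k/m, rescaled noise amplitude (assumed)
F = 1.0                                # constant force m*g + F_0 (assumed)
dt, n_steps, n_particles = 1e-3, 10_000, 5_000
t = n_steps * dt

x = np.zeros(n_particles)
for _ in range(n_steps):
    x += -dt * F / k + np.sqrt(2 * gamma) * sigma_x * np.sqrt(dt) * rng.standard_normal(n_particles)

print("mean position drift:", x.mean(), " (theory -F*t/k =", -F * t / k, ")")
print("position variance:  ", x.var(), " (theory 2*sigma_x^2*gamma*t =", 2 * sigma_x**2 * gamma * t, ")")
```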

2.2. Stochastic Thermodynamics

The first law of thermodynamics is essentially the rule for energy conservation. It turns out that Equation (2) can be used to derive the first law, as was realized by Sekimoto [56,57]. The idea is to treat the state variable in Equation (2) as a generalized velocity and introduce an additional state variable u representing the generalized position, on which the generalized force also depends, i.e., $F(z,u,t)$, with $z = du/dt$. Next, we decompose the force $F(z,u,t)$ as $F(z,u,t) = -\partial V(u,t)/\partial u + f_{nc}(z)$, where $V(u,t)$ is the generalized potential (dependent on u and t), and $f_{nc}(z)$ is the generalized nonconservative force (dependent on velocity z). After the multiplication of both sides of Equation (2) by z and rearrangement, we obtain the conservation of generalized “mechanical energy” in the following form:
\frac{d}{dt}\left[\frac{1}{2}\mu^{-1}z^2 + V(u,t)\right] = \frac{\partial V(u,t)}{\partial t} + z\,f_{nc}(z) + \sigma z\,\eta(t),
where we used the differentiation rule $dV(u,t)/dt = \partial V(u,t)/\partial t + [\partial V(u,t)/\partial u]\,\dot{u}$. Note that the left-hand side of Equation (9) is the temporal rate of mechanical energy, represented by $\frac{1}{2}\mu^{-1}z^2 + V(u,t)$, which is the sum of “kinetic energy” (with $\mu^{-1}$ representing the generalized mass) and the generalized potential $V(u,t)$. Equation (9) implies that mechanical energy is lost (or gained) in three different ways: by temporal changes in the external potential V, by the action of the nonconservative force $f_{nc}$, and by the noise ($\eta$). The last two factors constitute the heat dissipated to the environment.
In the case of our Brownian particle in the gravitational field, we find the law of energy conservation as
\frac{dE_{mech}}{dt} = -kv^2 + vF_0(t) + \sqrt{2km}\,\sigma_v\,v\,\eta,
where $E_{mech}$ is the mechanical energy of the particle, $E_{mech} = \frac{1}{2}mv^2 + mgx$. Averaging this equation over the distribution of velocities, Equation (6), yields the mean balance of energy loss and gain:
\left\langle\frac{dE_{mech}}{dt}\right\rangle = -k\langle v^2\rangle + \langle v\rangle F_0(t) + k\sigma_v^2,
where we used the Novikov theorem [58] for determining the average $\langle v\eta\rangle = \sigma_v\sqrt{k/(2m)}$. According to our expectations, the mean mechanical energy is lost due to friction (the term $-k\langle v^2\rangle$), and $\langle E_{mech}\rangle$ can be either decreased or increased by the driving force depending on its sign. But interestingly, $\langle E_{mech}\rangle$ is always increased by the presence of thermal fluctuations (the term $k\sigma_v^2$).
Equation (11) in the limit $m\to 0$, equivalent to $\gamma\gg 1$ and corresponding to the unrestricted Brownian motion [Equations (7) and (8)], takes a simple form
\left\langle\frac{dE_{mech}}{dt}\right\rangle \approx -k\langle v\rangle^2 + F_0(t)\langle v\rangle \approx -\frac{mg}{k}\left[mg + F_0(t)\right].
Thus, the rate of mean mechanical energy is negative unless the driving force is negative (braking from outside) and sufficiently strong. This means that opposing the gravitational force can conserve the mean mechanical energy or even increase it. We will also come back to the mechanical energy later in the context of entropy production and flux.
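As a quick sanity check of this balance, the sketch below integrates Equation (5) numerically for a freely falling particle ($F_0 = 0$) and compares the measured rate of change of the mean mechanical energy with the limiting value $-(mg)^2/k$ from Equation (12). The parameters are the same illustrative (arbitrary-unit) choices used in the earlier sketch, not values from the text.

```python
import numpy as np

# Numerical check of the mean energy balance, Eqs. (11)-(12): for a freely falling Brownian
# particle (F_0 = 0) the mean mechanical energy decreases at a rate approx. -(m*g)^2/k.
rng = np.random.default_rng(2)

m, k, kBT, g = 1.0, 2.0, 0.5, 1.0      # illustrative values (assumptions)
sigma_v = np.sqrt(kBT / m)
dt, n_steps, n_particles = 1e-3, 20_000, 2_000

v = np.full(n_particles, m * g / k)    # start at the mean terminal velocity
x = np.zeros(n_particles)              # height (decreases as the particle falls)
E0 = 0.5 * m * v**2 + m * g * x        # initial mechanical energy

for _ in range(n_steps):
    x += -v * dt                       # dx/dt = -v (x is height, v is the falling speed)
    v += dt * (-k * v + m * g) / m + (np.sqrt(2 * m * k) * sigma_v / m) * np.sqrt(dt) * rng.standard_normal(n_particles)

E1 = 0.5 * m * v**2 + m * g * x
rate = (E1 - E0).mean() / (n_steps * dt)
print("measured <dE_mech/dt>:", rate, " (theory -(m*g)^2/k =", -(m * g) ** 2 / k, ")")
```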

3. Entropy, Information, and the Second Law of Thermodynamics

3.1. Entropy, Kullback–Leibler Divergence, and Information

For the system with probability p ( z , t ) described by Equation (1) one can define Shannon entropy S z ( t ) as [44,59]
S_z(t) = -\sum_z p(z,t)\,\ln p(z,t),
which is the measure of an average uncertainty about the state of the system or the value of the stochastic variable z. The larger the entropy, the less is known about the actual state of the system. The concept of entropy is central in thermodynamics [31,32,49], in information theory [59], and in the science of complexity [60].
It is worth noting that Shannon entropy is not the only way to define entropy. There are other definitions of entropy, such as Renyi entropy [61,62] and Tsallis entropy [63], which are also used in statistical physics and information theory [64,65,66]. Shannon entropy in Equation (13) is a special case of these more general entropies.
For two different probability distributions describing the same physical system, i.e., p ( z ) and q ( z ) , one can define a statistical distance between them (in fact, it is a pseudo-distance in probability space) called Kullback–Leibler (KL) divergence [59,67]
D_{KL}(p||q) = \sum_z p(z)\,\ln\frac{p(z)}{q(z)}.
KL divergence is also called the relative entropy; it is always non-negative and quantifies the difference between the distributions $p(z)$ and $q(z)$. Therefore, $D_{KL}(p||q)$ can also be thought of as an information gain from observing the distribution $p(z)$ relative to the baseline distribution $q(z)$. The larger the KL divergence, the more distinct the two distributions are. $D_{KL}$ has many applications in statistical physics and information theory [59,68]. We will use it in the following sections for synaptic information gain and loss.
As for the entropy, one can define also other statistical divergences, such as Renyi and Tsallis divergences [61,63]. D K L is a special case of these more general divergences. There exist numerous inequalities relating various types of statistical divergences [62,69] and inequalities relating the rates of these divergences to stochastic thermodynamics [70].
In the case of two coupled systems described by variables x and y, one can write $z = (x,y)$ and define the joint probability $p_{xy}$ as well as marginal probability distributions $p_x$ and $p_y$ for each subsystem separately. This allows us to introduce the measure of mutual dependency between the two subsystems, $\ln\frac{p_{xy}}{p_x p_y}$, which is zero if x and y are independent and nonzero otherwise. The average of this quantity over all realizations of $x, y$ is called the mutual information $I_{xy}$ between x and y [59]
I_{xy} = \sum_{x,y} p_{xy}\,\ln\frac{p_{xy}}{p_x p_y} \equiv D_{KL}(p_{xy}||p_x p_y).
Thus, the mutual information is the KL divergence between the joint probability p x y and the product of marginal probabilities p x , p y . The definition in Equation (15) ensures that mutual information is always non-negative, and the stronger the dependence between x and y, the larger I x y . This is in contrast to entropy, which can be negative for continuous probability distributions (when summation is replaced by integration).
From Equation (15), it follows that the mutual information can be also represented in terms of entropies [59]:
I_{xy} = S_x - S_{x|y} = S_y - S_{y|x},
where $S_{x|y}$ is the conditional entropy defined as $S_{x|y} = -\sum_{x,y} p_{xy}\ln p(x|y)$, with $p(x|y)$ denoting the conditional probability, $p(x|y) = p_{xy}/p_y$, and similarly for the reverse conditional entropy $S_{y|x}$ and conditional probability $p(y|x)$.
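A minimal numeric illustration of these definitions, for an arbitrary 2x2 joint distribution (the values below are assumptions, not data from the text), checks that the two routes to the mutual information, Equations (15) and (16), agree.

```python
import numpy as np

# Mutual information of a small discrete joint distribution p_xy, via Eq. (15) and Eq. (16).
p_xy = np.array([[0.3, 0.1],
                 [0.1, 0.5]])          # assumed 2x2 joint distribution, sums to 1

p_x = p_xy.sum(axis=1)                 # marginal over y
p_y = p_xy.sum(axis=0)                 # marginal over x

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Mutual information as a KL divergence, Eq. (15)
I_kl = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# Mutual information via entropies, Eq. (16): I = S_x - S_{x|y}, with S_{x|y} = S_xy - S_y
S_x = entropy(p_x)
S_x_given_y = entropy(p_xy.ravel()) - entropy(p_y)
print("I from KL divergence:", I_kl)
print("I from entropies    :", S_x - S_x_given_y)
```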
In recent years, information theory in general, and mutual information in particular, were applied to stochastic processes in different settings. For example, information theory was used to derive thermodynamic uncertainty relations [71]. Mutual information can be helpful in mapping the input trajectory to the output trajectory, which is relevant for biochemical networks [72]. Additionally, mutual information can be used to discriminate between internal information in the system and the information coming from external sources [73], which may have some relevance in neuroscience. In the latter context, mutual information was shown to be maximized for critical brain states with power law distributions of neural activity [74,75]. In a broader biological context, it has been argued that evolution acts to optimize the gathering and representation of information across many spatial scales [48].

3.2. Entropy Production and Flow, and the Second Law

The temporal derivative of the entropy from Equation (13) can be decomposed into two contributions [76,77,78]:
\frac{dS}{dt} = \dot{S}_{pr} - \dot{S}_{fl},
where S ˙ p r is the entropy production rate given by
\dot{S}_{pr} = \frac{1}{2}\sum_{z,z'}\left(w_{zz'}\,p_{z'} - w_{z'z}\,p_z\right)\ln\frac{w_{zz'}\,p_{z'}}{w_{z'z}\,p_z},
and S ˙ f l is the entropy flow rate given by
\dot{S}_{fl} = \frac{1}{2}\sum_{z,z'}\left(w_{zz'}\,p_{z'} - w_{z'z}\,p_z\right)\ln\frac{w_{zz'}}{w_{z'z}}.
The thermodynamic interpretation of $\dot{S}_{fl}$ is that it is proportional to the heat $\Delta Q$ exchanged with the surrounding medium, i.e., $\Delta Q = k_BT\,\dot{S}_{fl}\,\Delta t$, in the short time interval $\Delta t$. Moreover, the entropy flow can be of either sign, which reflects the fact that the system can either gain energy from the environment ($\dot{S}_{fl} < 0$) or dissipate energy into the environment ($\dot{S}_{fl} > 0$).
The entropy production rate, on the other hand, is always non-negative, which follows from the fact that the two factors on the right in Equation (18) always have the same sign, either both positive or both negative. Alternatively, the non-negativity of $\dot{S}_{pr}$ and its lower bound can be obtained from a well-known inequality, $\ln(1+x) \geq \frac{x}{1+x}$, which is valid for all $x > -1$. Applying this to Equation (18) leads to
\dot{S}_{pr} \geq \frac{1}{2}\sum_{z,z'}\frac{\left(w_{zz'}\,p_{z'} - w_{z'z}\,p_z\right)^2}{w_{zz'}\,p_{z'}} \geq 0.
The fact that $\dot{S}_{pr}\geq 0$ has tremendous consequences for the behavior of stochastic objects, in the form of the second law of thermodynamics. In a nutshell, the second law says that the entropy of an isolated physical system (for which $\dot{S}_{fl} = 0$) never decreases, i.e., $dS/dt = \dot{S}_{pr} \geq 0$, which means that the disorder of the isolated system tends to increase over time.
Equations (18) and (19) apply to the general case described by the master Equation (1); however, it is also possible to define S ˙ p r and S ˙ f l for continuous stochastic variables described by the Fokker–Planck Equations (3) and (4). In the latter case, we have [79]
\dot{S}_{pr} = \frac{2}{(\mu\sigma)^2}\int dz\,\frac{J(z,t)^2}{P(z,t)} \geq 0,
and
\dot{S}_{fl} = \frac{2}{\mu\sigma^2}\int dz\,J(z,t)\,F(z,t).
For the system at steady state, i.e., for $\dot{p}(z,t) = 0$, its entropy is constant with $dS/dt = 0$, which implies $\dot{S}_{pr} = \dot{S}_{fl}$. This equality can happen in two cases. In the first, the probability flux $J(z,t) = 0$ for continuous variables, and $w_{zz'}p_{z'} - w_{z'z}p_z = 0$ for discrete variables. This situation describes the so-called detailed balance (where all local probability fluxes balance each other), which corresponds to thermodynamic equilibrium with the environment. In the second case, one can have a nonzero probability flux, $J(z,t)\neq 0$, and broken detailed balance, $w_{zz'}p_{z'} - w_{z'z}p_z \neq 0$. This situation takes place in so-called driven systems, which are driven by outside factors that provide energy and materials for maintaining the steady state out of equilibrium with the environment. Such a steady state is called a non-equilibrium steady state (NESS) [49,50]. All biological systems are out of equilibrium [11,33,80], and many biological processes operate in a non-equilibrium steady state [39,49], including neural systems [13,14].
Since at steady state $\dot{S}_{pr} = \dot{S}_{fl}$, one can say roughly that, under such conditions, the entropy production rate is proportional to the amount of energy dissipated to the environment. Thus, it is useful to think about $\dot{S}_{pr}$ as a measure of the energy cost of performing a non-trivial function that requires non-equilibrium conditions.
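The sketch below illustrates this bookkeeping for a small discrete example: a three-state Markov chain driven around a cycle (rates chosen arbitrarily so that detailed balance is broken), relaxed to its non-equilibrium steady state, where $dS/dt\approx 0$ while $\dot{S}_{pr} = \dot{S}_{fl} > 0$.

```python
import numpy as np

# Entropy production (Eq. (18)) and entropy flow (Eq. (19)) for a driven 3-state Markov chain.
# W[z, zp] is the transition rate from state zp to state z (values are arbitrary assumptions).
W = np.array([[0.0, 1.0, 3.0],
              [3.0, 0.0, 1.0],
              [1.0, 3.0, 0.0]])

def rates(p):
    """Entropy rate dS/dt, production S_pr and flow S_fl for the distribution p."""
    S_pr = S_fl = 0.0
    for z in range(3):
        for zp in range(3):
            if z == zp:
                continue
            flux = W[z, zp] * p[zp] - W[zp, z] * p[z]
            S_pr += 0.5 * flux * np.log(W[z, zp] * p[zp] / (W[zp, z] * p[z]))
            S_fl += 0.5 * flux * np.log(W[z, zp] / W[zp, z])
    return S_pr - S_fl, S_pr, S_fl

# Relax the master equation (Eq. (1)) to the non-equilibrium steady state.
p = np.array([0.6, 0.3, 0.1])
dt = 1e-3
for _ in range(20_000):
    gain = W @ p                    # sum_zp W[z, zp] * p[zp]
    loss = W.sum(axis=0) * p        # p[z] * sum_zp W[zp, z]
    p += dt * (gain - loss)

dS, S_pr, S_fl = rates(p)
print("steady state p:", p)
print("dS/dt ~", dS, "  S_pr =", S_pr, "  S_fl =", S_fl)   # dS/dt ~ 0, S_pr = S_fl > 0
```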

3.3. Entropy Production and Flow for the Brownian Particle

Our Brownian particle falling in the gravitational field represented by Equations (5)–(8) has entropy (Equation (13)) corresponding to its position distribution P x ( x , t ) given by [59]
S_x(t) = \frac{1}{2}\ln\left(4\pi e\,\sigma_x^2\,\gamma t\right),
which grows logarithmically with time. This means that the uncertainty about the particle position increases weakly with time. However, the entropy rate, d S x / d t , decreases with time as
\frac{dS_x}{dt} = \frac{1}{2t}.
The entropy production rate for particle position (the main “state variable”) can be found from Equation (21). For this, we need the probability current J ( x , t ) (Equation (4)) for our particle position, which is
J(x,t) = \left(-\frac{F(t)}{k} + \frac{x - \langle x\rangle}{2t}\right)P_x(x,t).
This allows us to find the entropy production rate S ˙ p r , x in the form
\dot{S}_{pr,x} = \frac{1}{2t} + \frac{\left[mg + F_0(t)\right]^2}{\gamma k^2\sigma_x^2}.
Note that when there is no driving force ($F_0 = 0$), the entropy production rate decreases monotonically toward its asymptotic value $(mg)^2/(\gamma k^2\sigma_x^2) = m^3g^2/(k^3\sigma_x^2)$.
The entropy flux rate can be quickly found from Equations (24) and (26), using the definition (17). The result is
\dot{S}_{fl,x} = \frac{\left[mg + F_0(t)\right]^2}{\gamma k^2\sigma_x^2} \approx \frac{k\langle v\rangle^2}{k_BT},
where the second, approximate equality comes from using the fluctuation–dissipation theorem and the approximate expression for the average particle velocity, $\langle v\rangle \approx [mg + F_0(t)]/k$ (see Equation (7)). Thus, in this case, the entropy flux is always positive, suggesting energy dissipation to the environment.
The relationship between the mechanical energy loss and the entropy flux is (from Equations (12) and (27))
\left\langle\frac{dE_{mech}}{dt}\right\rangle \approx -k_BT\,\dot{S}_{fl,x} + F_0(t)\langle v\rangle.
This equation is a manifestation of the first law of thermodynamics, or equivalently the law of energy conservation. It implies that our (mechanical) system changes its energy $E_{mech}$ by dissipating heat to the environment ($-k_BT\dot{S}_{fl,x}$) and by mechanical work performed on the particle by the external force $F_0$. Equation (28) also suggests that the rate of the mean mechanical energy of the Brownian particle is related to the entropy flux rate for its position, but they are not the same. The energy loss rate $\langle dE_{mech}/dt\rangle$ and $\dot{S}_{fl,x}$ are directly proportional only if $F_0 = 0$. To conclude, the entropy flux rate is a measure of dissipated energy (heat), but it does not account for all the lost or gained energy of the system.

4. Information Flow between Two Subsystems and the Maxwell Demon

In this section, we follow closely the main ideas presented in Ref. [81]. Consider two coupled subsystems X and Y with dynamics of the joint probability p x y described by the following master equation
\dot{p}_{xy} = \sum_{x'}\left[w^y_{xx'}\,p_{x'y} - w^y_{x'x}\,p_{xy}\right] + \sum_{y'}\left[w^x_{yy'}\,p_{xy'} - w^x_{y'y}\,p_{xy}\right],
where $w^y_{xx'}$ is the transition rate in the subsystem X from state $x'$ to state $x$, which depends on the actual state y of the second subsystem Y (and similarly for $w^x_{yy'}$). The form of the master equation in Equation (29) has a bipartite structure, in which simultaneous jumps in the two subsystems are neglected as much less likely than single jumps.
For this system, we can define the rate of mutual information d I x y / d t as [81,82]
\frac{dI_{xy}}{dt} = \dot{I}_x + \dot{I}_y,
where $\dot{I}_x = \left[I(x_{t+dt}, y_t) - I(x_t, y_t)\right]/dt$, and $\dot{I}_y = \left[I(x_t, y_{t+dt}) - I(x_t, y_t)\right]/dt$, with $dt\to 0$. The explicit expressions for $\dot{I}_x$ and $\dot{I}_y$ are given by [81]
\dot{I}_x = \sum_{x>x',\,y}\left[w^y_{xx'}\,p_{x'y} - w^y_{x'x}\,p_{xy}\right]\ln\frac{p(y|x)}{p(y|x')},
and
\dot{I}_y = \sum_{y>y',\,x}\left[w^x_{yy'}\,p_{xy'} - w^x_{y'y}\,p_{xy}\right]\ln\frac{p(x|y)}{p(x|y')}.
The essence of the decomposition in Equation (30) is that it splits the total rate of mutual information into two flows of information. The first flow, I ˙ x , relates to the change in mutual information between the two subsystems that is only due to the dynamics of X. The second flow, I ˙ y , is analogous and relates to Y. When I ˙ x > 0 , then information is created in the subsystem X as it monitors the Y subsystem.
In the same manner, we can split the rate of entropy of the joint system (X,Y), i.e., d S x y / d t , as well as the joint entropy production rate S ˙ p r , x y and the joint entropy flux rate S ˙ f l , x y . We have
\frac{dS_{xy}}{dt} = \dot{S}_x + \dot{S}_y,
where $S_{xy} = -\sum_{x,y} p_{xy}\ln p_{xy}$, and the rates of entropy in each subsystem, $\dot{S}_x$ and $\dot{S}_y$, are given by
\dot{S}_x = \sum_y\sum_{x>x'}\left[w^y_{xx'}\,p_{x'y} - w^y_{x'x}\,p_{xy}\right]\ln\frac{p_{x'y}}{p_{xy}},
and
\dot{S}_y = \sum_x\sum_{y>y'}\left[w^x_{yy'}\,p_{xy'} - w^x_{y'y}\,p_{xy}\right]\ln\frac{p_{xy'}}{p_{xy}}.
Note that in the particular case of two independent subsystems, we have $w^y_{xx'}\to w_{xx'}$ ($w^x_{yy'}\to w_{yy'}$), and the subsystem entropy rates $\dot{S}_x$ and $\dot{S}_y$ reduce to $\dot{S}_x = -\sum_x\dot{p}_x\ln p_x$ and $\dot{S}_y = -\sum_y\dot{p}_y\ln p_y$, i.e., in agreement with the expectations.
Similarly, the joint entropy production S ˙ p r , x y and entropy flux S ˙ f l , x y can be decomposed as
\dot{S}_{pr,xy} = \dot{S}_{pr,x} + \dot{S}_{pr,y},
where
\dot{S}_{pr,x} = \sum_{x>x',\,y}\left[w^y_{xx'}\,p_{x'y} - w^y_{x'x}\,p_{xy}\right]\ln\frac{w^y_{xx'}\,p_{x'y}}{w^y_{x'x}\,p_{xy}}, \qquad \dot{S}_{pr,y} = \sum_{y>y',\,x}\left[w^x_{yy'}\,p_{xy'} - w^x_{y'y}\,p_{xy}\right]\ln\frac{w^x_{yy'}\,p_{xy'}}{w^x_{y'y}\,p_{xy}},
and for the entropy flux
\dot{S}_{fl,xy} = \dot{S}_{fl,x} + \dot{S}_{fl,y},
where
\dot{S}_{fl,x} = \sum_{x>x',\,y}\left[w^y_{xx'}\,p_{x'y} - w^y_{x'x}\,p_{xy}\right]\ln\frac{w^y_{xx'}}{w^y_{x'x}}, \qquad \dot{S}_{fl,y} = \sum_{y>y',\,x}\left[w^x_{yy'}\,p_{xy'} - w^x_{y'y}\,p_{xy}\right]\ln\frac{w^x_{yy'}}{w^x_{y'y}}.
The terms S ˙ p r , x , S ˙ p r , y can be interpreted as local entropy production rates, while S ˙ f l , x , S ˙ f l , y are local entropy fluxes. As before, S ˙ p r , x and S ˙ p r , y are both non-negative, which means that the second law is valid also in each of the subsystems.
The interesting thing coming from all these equations is that local entropy productions S ˙ p r , x , S ˙ p r , y are related to information flows I ˙ x and I ˙ y as [81]
\dot{S}_{pr,x} = \dot{S}_x + \dot{S}_{fl,x} - \dot{I}_x, \qquad \dot{S}_{pr,y} = \dot{S}_y + \dot{S}_{fl,y} - \dot{I}_y.
These equations imply that local entropy balance involves both energy dissipation ( S ˙ f l , x , S ˙ f l , y ) and the flow of information ( I ˙ x , I ˙ y ). Consequently, energy and information are mutually coupled, and one influences the other. Equation (40) provides an important link between information processing and its energy cost.
How do the results represented by Equation (40) relate to the Maxwell demon? Although the quantity $\dot{S}_{pr,x}$ always satisfies $\dot{S}_{pr,x}\geq 0$, the sum $\dot{S}_x + \dot{S}_{fl,x}$ can be negative if the information flow $\dot{I}_x < 0$. Thus, from the local point of view of the subsystem X, its visible “entropy production” (i.e., $\dot{S}_x + \dot{S}_{fl,x}$) can be negative if the presence of the Y subsystem is neglected. This seems like a violation of the second law (which requires a positive entropy production rate), and it is closely related to the Maxwell demon thought experiment. Obviously, the inclusion of the information flow term $\dot{I}_x$ in the local entropy production resolves the paradox.
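A small numerical sketch of this bookkeeping, for a 2x2 bipartite model with arbitrarily chosen rates (all values are assumptions): at a steady state, where the entropy of subsystem X no longer changes, Equation (40) reduces in this reading to $\dot{S}_{pr,x} = \dot{S}_{fl,x} - \dot{I}_x$, which the code checks together with $\dot{S}_{pr,x}\geq 0$.

```python
import numpy as np

# Bipartite information-flow bookkeeping of Section 4 (Eqs. (29)-(40)) for a 2x2 model.
# wx[x_new, x_old, y] is an X-transition rate at fixed y; wy[y_new, y_old, x] at fixed x.
wx = np.zeros((2, 2, 2)); wy = np.zeros((2, 2, 2))
wx[1, 0, :] = [2.0, 0.5]; wx[0, 1, :] = [0.5, 2.0]   # X jump rates depend on y (arbitrary)
wy[1, 0, :] = [0.3, 3.0]; wy[0, 1, :] = [3.0, 0.3]   # Y tends to copy the state of X (arbitrary)

# Build the 4x4 generator over joint states (x, y) and find the steady state p_xy.
idx = lambda x, y: 2 * x + y
L = np.zeros((4, 4))
for x in range(2):
    for y in range(2):
        L[idx(1 - x, y), idx(x, y)] += wx[1 - x, x, y]
        L[idx(x, 1 - y), idx(x, y)] += wy[1 - y, y, x]
L -= np.diag(L.sum(axis=0))
w_eig, v_eig = np.linalg.eig(L)
p = np.real(v_eig[:, np.argmin(np.abs(w_eig))]); p /= p.sum()
p_xy = p.reshape(2, 2)
p_x = p_xy.sum(axis=1)

def x_sums():
    """Local entropy production, entropy flow and information flow for subsystem X."""
    S_pr = S_fl = I_dot = 0.0
    for y in range(2):
        J = wx[1, 0, y] * p_xy[0, y] - wx[0, 1, y] * p_xy[1, y]    # net flux x: 0 -> 1 at fixed y
        S_pr += J * np.log(wx[1, 0, y] * p_xy[0, y] / (wx[0, 1, y] * p_xy[1, y]))
        S_fl += J * np.log(wx[1, 0, y] / wx[0, 1, y])
        I_dot += J * np.log((p_xy[1, y] / p_x[1]) / (p_xy[0, y] / p_x[0]))   # ln p(y|x)/p(y|x')
    return S_pr, S_fl, I_dot

S_pr_x, S_fl_x, I_x = x_sums()
print("S_pr_x =", S_pr_x, " (>= 0)")
print("Eq. (40) at steady state: S_pr_x =", S_pr_x, " vs  S_fl_x - I_x =", S_fl_x - I_x)
```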

5. Neural Inference

In this section, we consider a simple model of how neurons estimate an external signal. We will discuss this model in terms of information processing as well as thermodynamics.
Neurons in the visual cortex selectively respond to different velocities of a moving stimulus [21,83]. Generally, each neuron has a preferred velocity to which it responds in the form of an elevated firing rate (it is called a tuning curve, see, e.g., [19,21]). Thus, a single neuron is unable to estimate (decode) the velocity of the moving stimulus, because it reacts only to a very small range of velocities. However, a large population of neurons can do it, although with only some accuracy. Below, we consider how such a decoding can take place. In the example below, which is mostly a “thought experiment”, the moving stimulus should be a particle with a substantial size and velocity to be detectable by visual neurons. Typical Brownian particles are too small and too slow to be directly observable by the mammalian visual system. To make them observable, a magnifying instrument such as a microscope is needed. Thus, one can think about the moving stimulus below as a magnified Brownian particle from Section 2, or alternatively, as a macroscopic object moving stochastically, e.g., due to strong stochastic force F 0 not related to thermal fluctuations of the environment. The analysis below is independent of either choice.
The model we use is a stochastic version of the deterministic model called a linear recurrent network for interacting neurons (see Equation (7.17) in [19]). In this model, the activity or firing rate r i (number of action potentials or spikes per time unit) of a single neuron labeled as i in the visual cortex can be represented as
\dot{r}_i = -\frac{\left(r_i - c_i(v)\right)}{\tau_{n0}} + \frac{1}{N}\sum_j w_{ij}\,r_j + \sqrt{\frac{2\sigma_{r0}^2}{\tau_{n0}}}\,\eta_i(t),
where $w_{ij}$ is the synaptic weight (or strength) characterizing the magnitude of synaptic transmission coming from neuron j, and $i = 1, 2, \ldots, N$, with N the number of neurons in the network (here, the $w_{ij}$ are in units of inverse time). Since the majority of synapses in the cortex of mammals are excitatory (about 80–90%; [84,85]), the weights $w_{ij}$ are assumed to be positive, which implies that the steady-state average values of $r_i$ are all positive. The parameter $\tau_{n0}$ is the time constant of the single-neuron dynamics related to changes in its firing rate, and $\sigma_{r0}$ represents the standard deviation of the Gaussian noise $\eta_i$ related to firing rate fluctuations. The function $c_i(v)$ is the sensory input coming to neuron i, which is discussed below. The activity of neuron i is a compromise between this sensory input and the synaptic contributions coming from other neurons in the network. It should also be clearly stated that the noise term in Equation (41) is not of thermal origin. Microscopic thermal fluctuations present in synapses and different ion channels have only a marginal influence on neural activity, since their numbers for a typical cortical neuron are very large (small variance), although there are some exceptions (see [86]). More important are the fluctuations caused by an unreliable sensory signal and unpredictable synaptic transmission (probabilistic neurotransmitter release), the latter being caused by the low numbers of signaling molecules involved [87].
Before we go further, let us discuss the range of validity of Equation (41). First, both the linear term associated with synaptic interactions and the additive noise can occasionally make the firing rate $r_i$ negative, which is obviously wrong (even if all synaptic weights are positive). However, this can happen only transiently, especially in the limit of weak noise. Moreover, the steady-state average values of the firing rates are always positive, since on average, the term $\sum_j w_{ij}r_j$ is positive. This means that the linear approximation is a “reasonable” approximation, and we use it primarily because such a linear model can be analyzed analytically, revealing some generic features. Second, the time constant $\tau_{n0}$ in Equation (41) cannot be too small. It must be significantly larger than the time constants related to synaptic transmission (about 5 ms and 120 ms for AMPA and NMDA synaptic receptors, respectively), such that synaptic currents assume quasi-stationary values [19]. What follows, i.e., the analysis of the dynamics and information aspects of this model, is a novel calculation.
The sensory input c i ( v ) received by neuron i is in these particular settings also called the tuning curve for neuron i. It can be approximated by a Gaussian as (see Equation (3.28) in [19])
c_i(v) = r_m\exp\left(-\frac{(v - u_i)^2}{2\epsilon^2}\right),
where $r_m$ is the maximal firing rate in response to the visual stimulus (the same for all neurons in the network), $u_i$ is the preferred velocity for neuron i, and $\epsilon$ characterizes the maximal deviation from the preferred velocity for which neurons are still (weakly) activated. We take $\epsilon$ to be small, i.e., typically $\epsilon/u_i \ll 1$. Note that for $v = u_i$, we have $c_i(v) = r_m$, while for $v = u_i \pm 2\epsilon$, we have $c_i(v) \approx 0.14\,r_m$. Equation (41) indicates that the neuron adjusts dynamically to changes in the stimulus (in its sensitivity range represented by $c_i(v)$) and in the synaptic input coming from other neurons.
Since the decoding of stimulus velocity is a collective process, we define a population average of all neural activities, denoted as r ¯ , and defined as r ¯ = ( 1 / N ) i = 1 N r i . Consequently, the dynamic of the population average neural activity r ¯ can be represented as
\dot{\bar{r}} = -\frac{\left[\bar{r} - \kappa(\bar{w})\,\bar{c}(v)\right]}{\tau_n} + \sqrt{\frac{2\sigma_r^2}{N\tau_n}}\,\bar{\eta}(t),
where we made a mean-field-type approximation $(1/N^2)\sum_{i,j} w_{ij}r_j \approx \bar{w}\bar{r}$, where $\bar{w}$ is the population average synaptic weight, i.e., $\bar{w} = (1/N^2)\sum_{ij} w_{ij}$. We assume that $\bar{w} > 0$, which follows from the fact that the majority of synapses are excitatory [84,85]. The term $\bar{c}(v)$ is the population average tuning curve, $\bar{c}(v) = (1/N)\sum_{i=1}^N c_i(v)$, which is given by (see Appendix A)
\bar{c}(v) \approx \frac{r_m\epsilon}{\alpha}\exp\left(-\frac{v^2}{2\alpha^2}\right) \approx \frac{r_m\epsilon}{\alpha}\left[1 - \frac{v^2}{2\alpha^2} + O(\alpha^{-4})\right],
where $\alpha$ is the velocity range to which neurons respond. The approximate equality in Equation (44) follows from the fact that $\alpha$ is generally large, i.e., $\alpha \gg 1$, and we will use that approximation in the calculations below. The parameter $\kappa(\bar{w})$ is the network enhancement factor given by
\kappa(\bar{w}) = \frac{1}{1 - \bar{w}\tau_{n0}},
since for $\bar{w}\tau_{n0}\to 1$, the parameter $\kappa(\bar{w})\to\infty$ (obviously, we must assume that $\bar{w} < \tau_{n0}^{-1}$). The parameter $\tau_n$ is the effective time constant of the neural population dynamics, $\tau_n = \kappa(\bar{w})\tau_{n0}$, and $\bar{\eta}$ is the population-averaged noise, i.e., $\bar{\eta} = (1/N)\sum_{i=1}^N\eta_i$ with zero mean and unit variance, with $\sigma_r$ being the effective standard deviation of the noise in the network, i.e., $\sigma_r = \kappa(\bar{w})\sigma_{r0}$. Note that the main effect of the network interactions on the population dynamics, as compared to the single-neuron dynamics, is to significantly enhance the tuning curve, the time constant, and the standard deviation.
Equation (43) corresponds to the time-dependent distribution of mean neural activity conditioned on stimulus velocity v, ρ ( r ¯ | v , t ) , which is in the form [53]
\rho(\bar{r}|v,t) = \frac{\exp\left\{-N\left[\bar{r} - \langle\bar{r}(v,t)\rangle_{\rho(\bar{r}|v)}\right]^2/2\sigma_r^2(t)\right\}}{\sqrt{2\pi\sigma_r^2(t)/N}},
where $\sigma_r^2(t) = \sigma_r^2\left(1 - e^{-2t/\tau_n}\right)$, and $\langle\bar{r}(v,t)\rangle_{\rho(\bar{r}|v)}$ is the stochastic average of the population mean of neural activity over the conditional distribution $\rho(\bar{r}|v,t)$, i.e., $\langle\bar{r}(v,t)\rangle_{\rho(\bar{r}|v)} = \int d\bar{r}\,\rho(\bar{r}|v,t)\,\bar{r}$. The latter can be found quickly by averaging Equation (43) over the noise and then finding its time-dependent solution. The result is
\langle\bar{r}(v,t)\rangle_{\rho(\bar{r}|v)} = \bar{r}(0)\,e^{-t/\tau_n} + \frac{\epsilon\kappa r_m}{\alpha}\left(1 - e^{-t/\tau_n}\right) - \frac{\epsilon\kappa r_m}{2\tau_n\alpha^3}\,e^{-t/\tau_n}\int_0^t dt'\,e^{t'/\tau_n}\,v^2(t') + O(\alpha^{-5}),
where $\bar{r}(0)$ is the initial mean neural activity. This equation indicates that the outside stimulus modulates the collective neural activity only weakly (at order $\alpha^{-3}$). In addition, neurons respond to the stimulus with some delay governed by the effective time constant $\tau_n$.
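A minimal simulation sketch of Equation (43) (illustrative parameters; the stimulus velocity is modeled here as an Ornstein–Uhlenbeck process, which is an assumption not made in the text): for a slowly varying stimulus, the population rate tracks $\kappa\bar{c}(v)$ with the delay set by $\tau_n$, consistent with the delayed-average solution in Equation (47).

```python
import numpy as np

# Population-rate dynamics, Eq. (43), driven by a stochastic stimulus velocity v(t).
# All parameter values are illustrative assumptions.
rng = np.random.default_rng(3)

tau_n, kappa, sigma_r, N = 0.05, 5.0, 1.0, 1000      # effective time constant (s), gain, noise, # neurons
r_m, eps, alpha = 50.0, 0.5, 5.0                      # tuning-curve parameters
tau_c, sigma_v = 0.2, 2.0                             # stimulus correlation time and spread
dt, n_steps = 1e-3, 5000

def c_bar(v):
    # population-averaged tuning curve, Eq. (44)
    return (r_m * eps / alpha) * np.exp(-v**2 / (2 * alpha**2))

v, r = 0.0, 0.0
r_trace, v_trace = [], []
for _ in range(n_steps):
    v += -dt * v / tau_c + sigma_v * np.sqrt(2 * dt / tau_c) * rng.standard_normal()
    r += -dt * (r - kappa * c_bar(v)) / tau_n + np.sqrt(2 * sigma_r**2 * dt / (N * tau_n)) * rng.standard_normal()
    v_trace.append(v); r_trace.append(r)

print("time-averaged population rate :", np.mean(r_trace[1000:]))
print("kappa * <c_bar(v)> over the run:", kappa * np.mean([c_bar(x) for x in v_trace[1000:]]))
```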

5.1. Mutual Information between Neural Activities and the Stimulus

The degree of correlations between the neural collective activity and the stimulus velocity is quantified by mutual information I ( r ¯ , v ) as
I(\bar{r},v) = \left\langle\ln\rho(\bar{r}|v)\right\rangle_{P(\bar{r},v)} - \left\langle\ln\rho(\bar{r})\right\rangle_{\rho(\bar{r})},
where $\rho(\bar{r})$ is the distribution of neural activities, and the averaging in the first term is performed over the joint probability density $P(\bar{r},v)$ of collective neural activity $\bar{r}$ and stimulus velocity v, with $P(\bar{r},v) = \rho(\bar{r}|v)P(v)$. The distribution $\rho(\bar{r})$ is found by marginalizing the joint distribution $P(\bar{r},v)$ over velocities v. The result of this procedure is (up to order $\alpha^{-3}$)
\rho(\bar{r},t) \approx \frac{\exp\left\{-\frac{N}{2\sigma_r^2(t)}\left[\bar{r} - r_0 + \frac{r_1}{\alpha^3}\int_0^t dt'\,e^{t'/\tau_n}\,\langle v^2(t')\rangle_{P(v)}\right]^2\right\}}{\sqrt{2\pi\sigma_r^2(t)/N}},
where r 0 and r 1 are the stimulus-independent and the stimulus-dependent collective neural activities
r_0 = \bar{r}(0)\,e^{-t/\tau_n} + \frac{\epsilon\kappa r_m}{\alpha}\left(1 - e^{-t/\tau_n}\right), \qquad r_1 = \frac{\epsilon\kappa r_m}{2\tau_n}\,e^{-t/\tau_n}.
Since both distributions ρ ( r ¯ ) and ρ ( r ¯ | v ) are Gaussian, the mutual information I ( r ¯ , v ) can be calculated easily as
I(\bar{r},v) \approx \frac{N(\epsilon\kappa r_m)^2}{8\alpha^6\sigma_r^2(t)}\,e^{-2t/\tau_n}\int_0^t dt_1\int_0^t dt_2\,e^{(t_1+t_2)/\tau_n}\left[\langle v^2(t_1)\,v^2(t_2)\rangle_{P(v)} - \langle v^2(t_1)\rangle_{P(v)}\langle v^2(t_2)\rangle_{P(v)}\right] + O(\alpha^{-8}).
This equation shows that the mutual information between neural collective activity and the stimulus velocity is proportional to the averaged temporal auto-correlations of the velocity square. Moreover, the larger the number of neurons N decoding the stimulus and the larger the network enhancement factor κ , the higher the mutual information. This clearly indicates that the effect of the network is a key ingredient for the accurate decoding of information from the outside world.
It is also interesting to see the effect of the time scale associated with variability in the stimulus velocity on the mutual information $I(\bar{r},v)$ in Equation (51). Assuming that the temporal auto-correlations of the stimulus velocity square are characterized by a time constant $\tau_c$, i.e., that they decay exponentially as $\langle v^2(t_1)v^2(t_2)\rangle_{P(v)} - \langle v^2(t_1)\rangle_{P(v)}\langle v^2(t_2)\rangle_{P(v)} = C_0\,e^{-|t_1-t_2|/\tau_c}$, where $C_0$ is some constant, we find that (see Appendix B)
I(\bar{r},v)\big|_{t\to\infty} \approx \frac{N(\epsilon\kappa r_m)^2}{8\alpha^6\sigma_r^2}\,\frac{C_0\,\tau_n^2\,\tau_c}{\tau_n + \tau_c} + O(\alpha^{-8}).
This implies that for very fast variability in the stimulus velocity ($\tau_c\to 0$), the mutual information between the stimulus and the neural activity is close to 0. Consequently, neurons in this limit cannot track the particle velocity at all. However, as the stimulus variability slows down ($\tau_c$ grows), the mutual information increases and saturates for $\tau_c\gg\tau_n$. This means that in this limit, neurons can decode the stimulus optimally. In general, this result shows that time scale separation is important for the quality of neural inference, with a preference for slower stimuli, which agrees with the general results obtained in [88].
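For a quick feel for this dependence, the snippet below evaluates the long-time expression of Equation (52), as reconstructed above, over a range of $\tau_c$ (all parameter values are arbitrary assumptions); the mutual information vanishes for $\tau_c\to 0$ and saturates once $\tau_c\gg\tau_n$.

```python
import numpy as np

# Evaluate the long-time mutual information of Eq. (52) versus the stimulus correlation time.
# Parameter values are illustrative only.
N, eps, kappa, r_m = 1000, 0.5, 5.0, 50.0
alpha, sigma_r, C0, tau_n = 5.0, 1.0, 1.0, 0.05

prefactor = N * (eps * kappa * r_m) ** 2 / (8 * alpha**6 * sigma_r**2)
for tau_c in [0.001, 0.01, 0.05, 0.2, 1.0, 10.0]:
    I = prefactor * C0 * tau_n**2 * tau_c / (tau_n + tau_c)
    print(f"tau_c = {tau_c:6.3f}  ->  I ~ {I:.4f} nats")
```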

5.2. Energy Cost of Decoding the Stimulus

Guessing the actual value of the stimulus by the neural network is not free of cost. In fact, it requires some amount of energy that neurons have to use to perform that function well. The energy used by neurons can be estimated by calculating the entropy production rate, with the help of Equation (21), with the probability density represented by the conditional distribution for collective neural activity given by Equation (46). We find the conditional entropy production rate $\dot{S}_{\rho(\bar{r}|v)}$ (conditioned on the stimulus velocity v) of neural activity as
\dot{S}_{\rho(\bar{r}|v)} = \frac{1}{\tau_n}\left[\frac{e^{-2t/\tau_n}}{e^{2t/\tau_n} - 1} + \frac{N\left[\langle\bar{r}(t)\rangle_{\rho(\bar{r}|v)} - \kappa\bar{c}(v)\right]^2}{\sigma_r^2}\right].
This formula indicates that the higher the discrepancy between the population-averaged tuning curve and the averaged neural activity, the larger the entropy production rate of neurons. In other words, neurons make an energetic effort to keep track of the actual particle velocity.
A more explicit formula for the entropy production at longer times ($t\gg\tau_n$), after the transients are gone, is
\dot{S}_{\rho(\bar{r}|v)} \approx \frac{N(\epsilon\kappa r_m)^2}{4\alpha^6\sigma_r^2\tau_n}\left[v^2(t) - \frac{e^{-t/\tau_n}}{\tau_n}\int_0^t dt'\,e^{t'/\tau_n}\,\langle v^2(t')\rangle_{P(v)}\right]^2,
which implies that S ˙ ρ ( r ¯ | v ) is proportional to fluctuations of the square of velocity around its delayed average. Thus, for stationary stimulus velocity, its tracking by neurons is essentially energetically costless (neurons, however, use energy for other biophysical processes [5,6,7,13]). Note also that the prefactor in Equation (54) is the same as that in Equation (51) for the mutual information between neural activities and the stimulus velocity. This means that gaining information about the outside signals requires a proportionally large supply of energy; i.e., better prediction needs proportionally more energy.

6. Stochastic Dynamics of Synaptic Plasticity: Learning and Memory Storage

Synaptic weights are not fixed but change in the neural network, although much more slowly than neural electric activities. Synaptic plasticity is the mechanism by which synaptic weights change, and it is responsible for learning and memory formation in neural systems [19,89,90,91,92]. The model analyzed in this section is a novel extension and modification of the model analyzed in [14].

6.1. Dynamics of Synaptic Weights

One of the most influential and important models of synaptic plasticity is the so-called BCM model [51], which was used for understanding the development of the mammalian visual cortex. It is an extension of the Hebb idea that connections between simultaneously activated presynaptic and postsynaptic neurons become stronger, but the model is constructed in such a way that the synaptic weights stabilize at some level, without the catastrophic run-away that takes place for the classic Hebb rule [19]. The BCM plasticity rule, which was originally a deterministic rule, was extended to a stochastic rule by the author in [14], because synaptic plasticity is stochastic in nature [93,94]. In the case of a given postsynaptic neuron with activity r, which receives $N_s$ synaptic inputs from neurons with activities $f_i$ ($i = 1,\ldots,N_s$), the stochastic BCM rule takes the following form [14]
\frac{dw_i}{dt} = \lambda f_i\,r(r - \theta) - \frac{w_i}{\tau_w} + \sqrt{\frac{2\sigma_w^2}{\tau_w}}\,\xi_i,
\tau_\theta\frac{d\theta}{dt} = -\theta + \beta r^2,
where $w_i$ is the synaptic weight (proportional to the number of receptors on a synaptic membrane) related to the electric conductance of signals coming from presynaptic neuron i, $\lambda$ is the amplitude of synaptic plasticity controlling the rate of change of synaptic weight, $\tau_w$ is the synaptic time constant controlling the weight decay duration, $\theta$ is the homeostatic variable, the so-called sliding threshold (adaptation for plasticity) related to an interplay of LTP and LTD (respectively, long-term potentiation and long-term depression [19]) with the time constant $\tau_\theta$, and $\beta$ is the coupling intensity of $\theta$ to the postsynaptic firing rate r. The parameter $\sigma_w$ is the standard deviation of the weights due to stochastic intrinsic fluctuations in synapses, which are represented as Gaussian white noise $\xi_i$ with zero mean and delta-function correlations, i.e., $\langle\xi_i(t)\rangle = 0$ and $\langle\xi_i(t)\xi_j(t')\rangle = \delta_{ij}\delta(t-t')$ [53]. Equations (55) and (56) correspond to plastic synapses located on a single neuron. We consider this example, because it is easier to analyze than the whole network of neurons.
It is often assumed that $\tau_\theta/\tau_w\ll 1$, and then the homeostatic variable achieves a steady state on the time scale for changes in synaptic weights, i.e., $d\theta/dt\approx 0$. This means that for long times, we have approximately $\theta\approx\beta r^2$, and consequently, the BCM rule takes a simple (one-equation) form
\frac{dw_i}{dt} = \lambda f_i\,r^2(1 - \beta r) - \frac{w_i}{\tau_w} + \sqrt{\frac{2\sigma_w^2}{\tau_w}}\,\xi_i.
As we saw in the previous section, the neural network function is determined primarily by the collective dynamics of neurons and synapses. For that reason, it makes sense to consider also the dynamics of the population-averaged synaptic weight. In this case, it is not the population average over all synapses in the network but rather the population average over the synapses on a single neuron, i.e., $\bar{w} = (1/N_s)\sum_i w_i$. Summing both sides of Equation (57) and rescaling by the factor $1/N_s$, we obtain the population-averaged dynamics of $\bar{w}$
\frac{d\bar{w}}{dt} = \lambda\bar{f}\,r^2(1 - \beta r) - \frac{\bar{w}}{\tau_w} + \sqrt{\frac{2\sigma_w^2}{N_s\tau_w}}\,\bar{\xi},
where $\bar{f} = (1/N_s)\sum_i f_i$, and $\bar{\xi} = (1/N_s)\sum_i\xi_i$. Moreover, the neural activity is much faster than the synaptic dynamics (seconds vs. minutes), i.e., $\tau_{n0}/\tau_w\ll 1$. Hence, the neural dynamics also reach a quasi-stationary state on the time scale $\tau_w$, and it can be approximated by (from Equation (41))
r \approx c(v) + \tau_{n0}\,\bar{f}\,\bar{w},
where we used a mean-field expression $(1/N_s)\sum_i w_i f_i \approx \bar{w}\bar{f}$, and $c(v)$ is given by Equation (42). In the following, we treat $\bar{f}$ as a time-independent, fixed parameter characterizing the level of activity in the local network.
Inserting Equation (59) into Equation (58), we obtain an effective equation for the dynamics of population mean synaptic weight w ¯
\frac{d\bar{w}}{dt} = \lambda\bar{f}\left[c(v) + \tau_{n0}\bar{f}\bar{w}\right]^2\left\{1 - \beta\left[c(v) + \tau_{n0}\bar{f}\bar{w}\right]\right\} - \frac{\bar{w}}{\tau_w} + \sqrt{\frac{2\sigma_w^2}{N_s\tau_w}}\,\bar{\xi},
which has a general form of the Langevin equation as in Equation (2) with the generalized force acting on synapses
F_w(\bar{w}) = \lambda\bar{f}\left[c(v) + \tau_{n0}\bar{f}\bar{w}\right]^2\left\{1 - \beta\left[c(v) + \tau_{n0}\bar{f}\bar{w}\right]\right\} - \frac{\bar{w}}{\tau_w}.
That force depends nonlinearly on $\bar{w}$, which is one of the reasons for the complex dynamics of synaptic plasticity, which are additionally influenced by synaptic noise ($\sim\sigma_w$). Note, however, that the noise acting on the mean synaptic weight is much weaker than the noise in individual synapses, because its intensity is reduced by the factor $1/N_s$.
How do synaptic weights react to the sensory input represented by the tuning curve $c(v)$ (see Equation (42))? Since we consider here a single postsynaptic neuron, and it has a preferred velocity of the stimulus that is mostly different from the actual velocity of the stimulus, the value of $c(v)$ is most of the time close to zero. The stimulus $c(v)$ jumps between 0 and its maximal value $r_m$ only transiently, at precisely those times when the velocity of the outside particle matches the preferred velocity of the neuron. This is the basic setup we consider here: the input coming to synapses is transient, which nevertheless can be enough, in some circumstances, to significantly increase their mean population weight $\bar{w}$. This process of changing $\bar{w}$ is essentially the “learning” of information about the particle velocity, which can then be stored in the mean weight $\bar{w}$ for some time (“memory”). Below, we describe in more detail how these two processes, learning and memory, take place within this model.
Given the transient nature of c ( v ) , we consider it as a perturbation to the collective synaptic dynamics in Equation (60). The deterministic version of Equation (60), i.e., with σ w = 0 , for c ( v ) = 0 , can have either one fixed point at w ¯ = 0 , or three fixed points, of which two are stable, corresponding to bistability (the fixed points are the solutions of the equation F w = 0 ). The change from monostability to bistability in the dynamics takes place if the following condition is satisfied:
\lambda\,\tau_w\,\tau_{n0}\,\bar{f}^2 > 4\beta,
which happens for sufficiently large plasticity amplitude λ and/or presynaptic firing rate f ¯ . In the bistable regime, the two stable fixed points are denoted as w ¯ d (“down” state) and w ¯ u (“up” state), and they have the following values:
\bar{w}_d = 0, \qquad \bar{w}_u = \frac{1 + \sqrt{1 - \beta_f}}{2\beta\tau_{n0}\bar{f}},
where $\beta_f = 4\beta/(\lambda\tau_w\tau_{n0}\bar{f}^2)$. The unstable fixed point, denoted as $\bar{w}_m$ (middle state), is
\bar{w}_m = \frac{1 - \sqrt{1 - \beta_f}}{2\beta\tau_{n0}\bar{f}}.
Note that w ¯ u and w ¯ m are pushed toward zero for very large presynaptic firing rates f ¯ , which suggests that bistability is lost for very large presynaptic firing f ¯ .
Now, consider the stochastic version of Equation (60), i.e., with inclusion of the noise ($\sigma_w\neq 0$). In this case, the brief input $c(v)$ can cause a dynamic transition from the down state ($\bar{w}_d$) to the up state ($\bar{w}_u$) in the collective behavior of synapses, but only if two conditions are met (Figure 1). The first is the bistability condition represented by Equation (62). The second condition is that the input $c(v)$ cannot be too brief, which translates into the requirement that the stimulus velocity cannot change too quickly. The latter simply means that slow synapses are unable to react to too-fast inputs (Figure 1), which is a similar situation to the case of poor neural inference of too-fast stimuli (see the previous Section). The successful transition to the up state $\bar{w}_u$ is a form of brief learning, and maintaining the acquired information about the stimulus c for a long time represents the memory trace. Keeping the information in the synaptic weights for a prolonged time is possible even for very strong intrinsic noise ($\sigma_w\sim\bar{w}_u$), because the collective synaptic noise is suppressed by the number of synapses $N_s$ (compare Equations (57) and (60)). Ultimately, the memory will be lost, i.e., $\bar{w}$ will decay from $\bar{w}_u$ to $\bar{w}_d$, and this can happen in several ways. The most likely are a very strong downward noise fluctuation or a significant drop in the presynaptic activity $\bar{f}$ below some level. A simulation sketch of such an input-triggered transition is given below.
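The following sketch integrates the collective weight equation, Equation (60), for an illustrative parameter set satisfying the bistability condition (62) (none of these values come from the text): a brief pulse of the input c switches $\bar{w}$ from the down state to the up state, where it then stays despite the synaptic noise.

```python
import numpy as np

# Collective BCM weight dynamics, Eq. (60): a transient input c(v) > 0 switches the
# population-averaged weight from the "down" to the "up" state. Illustrative parameters.
rng = np.random.default_rng(4)

lam, f_bar, beta = 0.2, 2.0, 0.2          # plasticity amplitude, presynaptic rate, threshold coupling
tau_n0, tau_w = 0.1, 50.0                 # neural and synaptic time constants
sigma_w, N_s = 2.0, 100                   # single-synapse noise and number of synapses
assert lam * tau_w * tau_n0 * f_bar**2 > 4 * beta   # bistability condition, Eq. (62)

dt, T = 0.01, 500.0
times = np.arange(0.0, T, dt)
c_pulse = lambda t: 3.0 if 50.0 <= t < 70.0 else 0.0   # transient tuning-curve input (assumed)

w = 0.0                                    # start in the "down" state
trace = np.empty(times.size)
for i, t in enumerate(times):
    r = c_pulse(t) + tau_n0 * f_bar * w                      # quasi-stationary rate, Eq. (59)
    drift = lam * f_bar * r**2 * (1.0 - beta * r) - w / tau_w
    w += dt * drift + np.sqrt(2 * sigma_w**2 * dt / (N_s * tau_w)) * rng.standard_normal()
    trace[i] = w

w_u = (1 + np.sqrt(1 - 4 * beta / (lam * tau_w * tau_n0 * f_bar**2))) / (2 * beta * tau_n0 * f_bar)
print("mean weight before the pulse:", trace[times < 50].mean())
print("mean weight long after it   :", trace[times > 300].mean(), " (up state w_u =", w_u, ")")
```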
Instead of speaking about forces acting on synapses, we can alternatively say that the population mean of synaptic weight moves in an effective potential V ( w ¯ , c ) , given by V ( w ¯ , c ) = 0 w ¯ d x F w ( x , c ) . It can be determined explicitly, and it is composed of two contributions
$$ V(\bar{w}, c) = V_0(\bar{w}) + \Delta V(\bar{w}, c) , \qquad (65) $$
where V 0 ( w ¯ ) is the “core” potential
$$ V_0(\bar{w}) = \frac{\bar{w}^2}{2\tau_w} - \frac{1}{3}\lambda \bar{f}^3 \tau_{n0}^2\, \bar{w}^3 + \frac{1}{4}\lambda \bar{f}^4 \tau_{n0}^3 \beta\, \bar{w}^4 , \qquad (66) $$
and Δ V ( w ¯ , c ) is the perturbation to the core potential due to the transient stimulus
$$ \Delta V(\bar{w}, c) = -\lambda \bar{f} c^2 (1 - \beta c)\, \bar{w} - \frac{1}{2}\lambda \bar{f}^2 \tau_{n0}\, c\, (2 - 3\beta c)\, \bar{w}^2 + \lambda \bar{f}^3 \tau_{n0}^2\, \beta c\, \bar{w}^3 . \qquad (67) $$
The core potential V 0 ( w ¯ ) can have either one minimum (monostability) or two minima (bistability), depending on the strength of synaptic plasticity λ and/or the level of presynaptic neural activity f ¯ (Figure 2A). The “phase transition” from monostability to bistability occurs if the condition in Equation (62) is satisfied, which is the same as the condition for the appearance of the three fixed points. Thus, one can think of the plasticity amplitude λ or the presynaptic firing rate f ¯ as tuning parameters for the phase transition in this model. More interesting for information storing is the bistable regime with two minima, as is the case for storing information in electronic hardware [26,33], and we focus on this case below. The minima of V 0 are situated exactly at the two stable fixed points w ¯ d and w ¯ u determined before (Equation (63)). The maximum of V 0 appears at the middle (unstable) fixed point w ¯ m . However, note that for realistic synaptic and neural parameters, the minimum at w ¯ d is very shallow (Figure 2A), which is due to the large synaptic time constant τ w .
In the potential-like picture, the effective mean synaptic weight wanders around the two minima of the potential V 0 ( w ¯ ) , with occasional large jumps over the potential barrier (i.e., the maximum) triggered either by turning on the input c ( v ) , or by noise, or both (Figure 2B). However, the transitions from w ¯ d to w ¯ u are easier and more frequent than the reverse transitions, due to the shallowness of the potential V 0 at w ¯ d . This means that not only can sensory input trigger the learning and subsequent “memory” of that input, but the noise can also sporadically induce “learning and memory”. The latter can be thought of as false memories, which are also present in real brains.
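The wandering of w ¯ between the two minima can be visualized with a short Langevin (Euler–Maruyama) simulation. The sketch below is an illustration rather than a reproduction of Equation (60): the drift is taken as the negative gradient of the potential in Equations (65)–(67), the quasi-steady postsynaptic rate is assumed to be r = c + f ¯ τ n 0 w ¯ , and the collective noise amplitude is assumed to scale as σ w √(2/(N s τ w)), consistent with the exponent in Equation (68); the exact prefactors of the original model may differ.

import numpy as np

rng = np.random.default_rng(0)

# Parameters as in the caption of Figure 1
lam, beta, f_bar = 1.3, 1.2, 0.9
tau_n0, tau_w = 0.3, 200.0
sigma_w, N_s = 5.0, 1000
r_m, u_pref, eps = 10.0, 10.0, 0.1

def drift(w, c):
    # Negative gradient of V(w, c) from Equations (65)-(67),
    # with the assumed quasi-steady postsynaptic rate r = c + f_bar*tau_n0*w
    r = c + f_bar * tau_n0 * w
    return -w / tau_w + lam * f_bar * r**2 * (1.0 - beta * r)

def stimulus(t):
    # Tuning-curve input c(v(t)) for a slowly accelerating stimulus, as in Figure 1
    v = 0.02 * t + 7.0
    return r_m * np.exp(-(v - u_pref)**2 / (2 * eps**2))

dt, T = 0.05, 4000.0
noise_amp = sigma_w * np.sqrt(2.0 / (N_s * tau_w))   # assumed collective noise scaling
w = 0.0
for step in range(int(T / dt)):
    w += drift(w, stimulus(step * dt)) * dt + noise_amp * np.sqrt(dt) * rng.standard_normal()
    w = max(w, 0.0)                                  # weights are kept non-negative

print("collective weight after %.0f s: %.2f" % (T, w))   # typically settles near w_u ~ 3 here

Because the down-state minimum is so shallow for these parameters, noise alone can also trigger the jump, which corresponds to the “false memory” scenario described above.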
The dwelling times of the collective weight w ¯ close to the minima at w ¯ d and w ¯ u can be found from the well-known Kramers’ formula [53]. In our case, they are given by
$$ T_d = \frac{2\pi}{\sqrt{V_d^{(2)}\,|V_m^{(2)}|}}\, \exp\!\left\{ \frac{N_s \tau_w}{\sigma_w^2}\Big[ (V_{0,m} - V_{0,d}) + (\Delta V_m - \Delta V_d) \Big] \right\} , \qquad T_u = \frac{2\pi}{\sqrt{V_u^{(2)}\,|V_m^{(2)}|}}\, \exp\!\left\{ \frac{N_s \tau_w}{\sigma_w^2}\Big[ (V_{0,m} - V_{0,u}) + (\Delta V_m - \Delta V_u) \Big] \right\} , \qquad (68) $$
where V 0 , m = V 0 ( w ¯ m ) , V 0 , d = V 0 ( w ¯ d ) , and V 0 , u = V 0 ( w ¯ u ) , and analogously for Δ V . (Note that V 0 , d = Δ V d = 0 .) The quantity in the exponent of T d ( T u ) is proportional to the potential barrier between the minimum at w ¯ d ( w ¯ u ) and the maximum at w ¯ m . The symbols V d ( 2 ) , V u ( 2 ) , V m ( 2 ) denote the second derivatives of V ( w ¯ , c ) with respect to w ¯ at the points w ¯ d , w ¯ u , and w ¯ m , respectively. The formulas in Equation (68) indicate that switching on the input c causes a deformation of the potential barrier (Figure 2B). In particular, in our case Δ V m − Δ V d < 0 for c > 0 , meaning that the barrier from the down to the up state decreases, which can facilitate the transition to the up state if synapses were initially in the lower state. Moreover, while the fluctuations around the minima are much slower than neural activity ( τ n 0 ), they are more frequent ( τ w ) than the jumps over the potential barrier, which happen rarely ( T d , T u ≫ τ w ).
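For completeness, the quantities entering Equation (68) can be evaluated numerically from the potential. The snippet below is a hedged sketch: it uses the c = 0 extrema as the locations of the minima and the maximum, finite differences for the second derivatives, and an assumed effective diffusion constant D = σ w²/(N s τ w); the dwell times are reported as log10 to avoid numerical overflow.

import numpy as np

lam, beta, f_bar = 1.3, 1.2, 0.9
tau_n0, tau_w = 0.3, 200.0
sigma_w, N_s = 5.0, 1000
D = sigma_w**2 / (N_s * tau_w)        # assumed effective diffusion constant of w_bar

def V0(w):                            # core potential, Equation (66)
    return w**2/(2*tau_w) - lam*f_bar**3*tau_n0**2*w**3/3 + lam*beta*f_bar**4*tau_n0**3*w**4/4

def DeltaV(w, c):                     # stimulus-induced perturbation, Equation (67)
    return (-lam*f_bar*c**2*(1 - beta*c)*w
            - 0.5*lam*f_bar**2*tau_n0*c*(2 - 3*beta*c)*w**2
            + lam*beta*f_bar**3*tau_n0**2*c*w**3)

def d2V0(w, h=1e-4):                  # numerical second derivative of V0
    return (V0(w + h) - 2*V0(w) + V0(w - h)) / h**2

# c = 0 extrema, Equations (63)-(64)
beta_f = 4*beta / (lam*tau_w*tau_n0*f_bar**2)
w_d, w_u = 0.0, (1 + np.sqrt(1 - beta_f)) / (2*beta*tau_n0*f_bar)
w_m = (1 - np.sqrt(1 - beta_f)) / (2*beta*tau_n0*f_bar)

# Kramers dwell times for c = 0, Equation (68), as log10
log_T_d = np.log10(2*np.pi/np.sqrt(d2V0(w_d)*abs(d2V0(w_m)))) + (V0(w_m) - V0(w_d))/(D*np.log(10))
log_T_u = np.log10(2*np.pi/np.sqrt(d2V0(w_u)*abs(d2V0(w_m)))) + (V0(w_m) - V0(w_u))/(D*np.log(10))
print("log10 T_d ~ %.1f, log10 T_u ~ %.1f" % (log_T_d, log_T_u))  # shallow down state: T_d ~ 10^3 s, T_u astronomically long

# Barrier deformation by a weak input c, entering Equation (68) through Delta_V
c = 0.05
print("Delta_V_m - Delta_V_d =", DeltaV(w_m, c) - DeltaV(w_d, c))  # negative: the d -> u barrier is lowered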
The existence of bistability in the collective behavior of synapses implies that we can effectively represent the continuous stochastic dynamics of synaptic weights as the jumping dynamics of a two-state system. In this discrete effective system, we can define the probability p d that the collective state of all N s synapses has the weight w ¯ d and another probability p u corresponding to the higher population weight w ¯ u . The transition rates between the down and up states are the inverses of the dwelling times. In particular, the transition rate ω u d from the down to the up state is ω u d = 1 / T d , and that for the opposite transition, from the up to the down state, is ω d u = 1 / T u . In our case, because of the asymmetric potential, we have ω d u ≪ ω u d , i.e., the transitions to the up state are more frequent than those in the opposite direction. The master equation associated with these dynamics is
$$ \dot{p}_u = \omega_{ud}\,(1 - p_u) - \omega_{du}\, p_u , \qquad (69) $$
and p d = 1 − p u . From Equation (68), it is clear that the transition rates ω u d , ω d u are approximately products of two terms: ω u d = ω u d , 0 Γ u d ( c ) and ω d u = ω d u , 0 Γ d u ( c ) , one of which is independent of the input c ( ω u d , 0 and ω d u , 0 ) and the second of which depends on it via Δ V (the terms Γ u d ( c ) and Γ d u ( c ) ). Thus, turning on the input can modify the distribution of the probabilities p d , p u , and it can also induce transitions. The existence of bistability for the population of synapses can also be useful in terms of information storing, which we address next.

6.2. Information Gain and Maintenance, and Associated Energy Cost

Learning in our synaptic system can be thought of as gaining information about the stimulus c due to its brief switching on and off. Such a transient change modifies ω d u , ω u d , which in turn modifies the probabilities p d , p u . The information gain can be quantified by calculating the KL divergence between the probability distribution right after the brief learning phase and the final steady-state distribution. Memory in this system can be thought of as maintaining that information for a prolonged time after the stimulus c is brought back to 0.
Below, we consider in detail the maintenance of the information and its associated energy cost, which is a novel analysis. Let us assume that at time t = 0 , the collective synaptic system has a probability p u ( 0 ) larger than its steady-state value (before learning) p u , ∞ = ω u d , 0 / ω 0 , where ω 0 = ω d u , 0 + ω u d , 0 . At t = 0 , the stimulus is switched off and the transition rates suddenly jump to their steady-state values ( ω u d → ω u d , 0 and ω d u → ω d u , 0 ). Consequently, the probability p u ( t ) relaxes to its steady-state value p u , ∞ according to p u ( t ) = [ p u ( 0 ) − p u , ∞ ] e^{ − ω 0 t } + p u , ∞ . This relaxation is related to losing the information acquired during the learning phase and has a characteristic time scale, which in this case can be called the memory lifetime T m = 1 / ω 0 . Thus, the memory lifetime corresponds to the time over which information about the stimulus is retained in the population of synaptic weights.
The loss of information about the stimulus can also be quantified by the KL divergence D K L ( p ( t ) | | p ∞ ) between the actual probability distribution p ( t ) = ( p d ( t ) , p u ( t ) ) and the steady-state distribution p ∞ = ( p d , ∞ , p u , ∞ ) . We find
$$ D_{KL}(p(t)||p_{\infty}) = \left[ p_{d,\infty} - \Delta e^{-\omega_0 t} \right] \ln\!\left[ 1 - \frac{\Delta}{p_{d,\infty}}\, e^{-\omega_0 t} \right] + \left[ p_{u,\infty} + \Delta e^{-\omega_0 t} \right] \ln\!\left[ 1 + \frac{\Delta}{p_{u,\infty}}\, e^{-\omega_0 t} \right] , \qquad (70) $$
where Δ characterizes the magnitude of the initial perturbation from the steady state caused by the transient stimulus, i.e., Δ = p u ( 0 ) − p u , ∞ , and Δ > 0 .
The rate of Kullback–Leibler divergence, denoted as D ˙ K L , takes the form
$$ \dot{D}_{KL}(p(t)||p_{\infty}) = -\omega_0\, \Delta\, e^{-\omega_0 t}\, \ln\!\left[ \frac{1 + (\Delta/p_{u,\infty})\, e^{-\omega_0 t}}{1 - (\Delta/p_{d,\infty})\, e^{-\omega_0 t}} \right] , \qquad (71) $$
from which it is clear that information is lost exponentially, with a rate proportional to the inverse of the memory lifetime, ω 0 = 1 / T m .
The energy loss during the relaxation to the steady state is proportional to the entropy production rate S ˙ w in the synaptic weights. The latter is found from Equation (18) and yields
$$ \dot{S}_w = -\dot{D}_{KL}(p(t)||p_{\infty}) , \qquad (72) $$
which means that the entropy production rate increases precisely in such a way as to balance the decreasing rate of acquired information, i.e., D ˙ K L . The inverse relationship between S ˙ w and the memory lifetime ( S ˙ w ∝ ω 0 ) implies that the longer the information is retained, the smaller the rate of dissipated energy. This, in turn, suggests that the total entropy produced during the weight relaxation process, i.e., S w , t o t = ∫ 0 ∞ d t S ˙ w , should be independent of the memory lifetime. Indeed, we find
$$ S_{w,tot} = p_d(0) \ln\frac{p_d(0)}{p_{d,\infty}} + p_u(0) \ln\frac{p_u(0)}{p_{u,\infty}} , \qquad (73) $$
which means that the total entropy produced is related in a simple way to the KL divergence between p ( 0 ) and p ∞ , namely
$$ S_{w,tot} = D_{KL}(p(0)||p_{\infty}) . \qquad (74) $$
This equation can be interpreted in the following way: the energy cost associated with storing information in synapses is proportional to the discrepancy between the distribution of initially perturbed synaptic weights and their steady-state distribution. In general, Equations (72) and (74) indicate that the information-like quantity, which is D K L , is closely related to the energy-like quantity S ˙ w . This is in line with the considerations in the previous sections about stochastic thermodynamics.
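Equations (70)–(74) can also be checked numerically for a two-state system. The snippet below is a minimal sketch with arbitrary illustrative rates (the values of ω u d , 0 and ω d u , 0 are invented for the example, not taken from the model): it relaxes p u ( t ) from a perturbed value, evaluates the KL divergence directly, and verifies that integrating the entropy production rate of Equation (72) reproduces Equation (74).

import numpy as np

# Illustrative steady-state transition rates (arbitrary values, only for this consistency check)
w_ud0, w_du0 = 1.0/200.0, 1.0/5000.0     # down->up and up->down rates (1/s)
w0 = w_ud0 + w_du0
p_u_inf = w_ud0 / w0
p_d_inf = 1.0 - p_u_inf

p_u0 = 0.999                              # perturbed "up" probability right after learning
Delta = p_u0 - p_u_inf                    # positive, as assumed in Equation (70)

def DKL(pd, pu, qd, qu):                  # two-state Kullback-Leibler divergence
    return pd*np.log(pd/qd) + pu*np.log(pu/qu)

t = np.linspace(0.0, 10.0/w0, 200001)
p_u = (p_u0 - p_u_inf)*np.exp(-w0*t) + p_u_inf
p_d = 1.0 - p_u

# Equation (72): the entropy production rate equals minus the rate of D_KL
S_dot = -np.gradient(DKL(p_d, p_u, p_d_inf, p_u_inf), t)
S_tot = np.sum(0.5*(S_dot[1:] + S_dot[:-1])*np.diff(t))      # trapezoidal integration

print("integrated entropy production:", S_tot)
print("D_KL(p(0)||p_inf)            :", DKL(1.0 - p_u0, p_u0, p_d_inf, p_u_inf))  # Equation (74): the two agree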

7. More General Framework for Synaptic Learning and Memory

The above approach for synaptic plasticity and learning may seem too simplistic. After all, representing different patterns of synaptic weights by a single collective variable w ¯ is probably too drastic, since by doing that, we throw out a lot of information about different synaptic states. An alternative approach is possible, and it is briefly described below. The details can be found in [18].
Here, we consider N s mutually coupled excitatory synapses on a single neuron (we assume that the neuron has a single dendrite along which the synapses are linearly arranged). Each synapse can be in one of K discrete states s i = 1 , … , K , where i denotes the synapse number. These states correspond to the different shapes and sizes of the postsynaptic part of a synapse, called the dendritic spine, which can be regarded as well-defined mesoscopic morphological synaptic states in which microscopic (molecular) details are neglected [95,96]. It is hypothesized in the neuroscience community that these morphological states have functional roles, e.g., large synapses (spines) are slow and involved in storing long-term information (memory), while smaller synapses are fast and take part in acquiring information (learning) [90,97]. Moreover, the states with small values of s i correspond to weaker synaptic weights (a smaller number of molecular receptors on the synaptic membrane), and larger values of s i correspond to stronger synaptic weights.
Let P ( s ) be the probability that these synapses are in the global state described by the vector s = ( s 1 , s 2 , … , s N s ) . The most general form of the master equation for the stochastic dynamics of P ( s ) is
$$ \frac{dP(\mathbf{s})}{dt} = \sum_{i=1}^{N_s} \sum_{s_i'} \Big[ w_{s_i, s_i'}(s_{i-1}, s_{i+1})\, P(\mathbf{s}_i') - w_{s_i', s_i}(s_{i-1}, s_{i+1})\, P(\mathbf{s}) \Big] , \qquad (75) $$
where s_i' = ( s 1 , … , s i − 1 , s i ′ , s i + 1 , … , s N s ) , and w_{s_i, s_i'}( s i − 1 , s i + 1 ) is the transition rate for jumps inside synapse i from state s i ′ to state s i . In agreement with the experimental data, these jumps also depend on the states of the neighboring synapses s i − 1 and s i + 1 [97], and such synaptic cooperativity can also be useful for long-term memory stability [98,99,100]. The transition rates w_{s_i, s_i'}( s i − 1 , s i + 1 ) can be composed of several different terms, each representing a different type of synaptic plasticity (e.g., Hebbian, homeostatic) [101]. Additionally, each term can depend in a complicated manner on presynaptic and postsynaptic neural activities. It is also useful to note that Equation (75) is structurally similar to the Glauber dynamics for a time-dependent Ising model, which is known from statistical physics [52].
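To make the structure of Equation (75) concrete, the sketch below builds the full generator for a very small chain (K = 2, N s = 6, i.e., 64 global states) with invented nearest-neighbor-dependent rates (the cooperative potentiation/depression rates used here are purely illustrative and are not those of [18]), computes the steady state, and evaluates the standard Schnakenberg entropy production rate. Already for modest sizes this exact approach becomes unwieldy, which motivates the pair approximation discussed next.

import itertools
import numpy as np

K, N_s = 2, 6                              # two states per synapse, six synapses
states = list(itertools.product(range(K), repeat=N_s))
index = {s: n for n, s in enumerate(states)}

def rate(s, i, target):
    # Illustrative rate for synapse i to jump from s[i] to 'target';
    # potentiation is enhanced when neighbors are in the high state (cooperativity)
    neigh = sum(s[j] for j in (i - 1, i + 1) if 0 <= j < N_s)
    return 0.02*(1.0 + neigh) if target > s[i] else 0.05

M = len(states)
W = np.zeros((M, M))                       # W[b, a] = rate from global state a to b
for a, s in enumerate(states):
    for i in range(N_s):
        for target in range(K):
            if target != s[i]:
                s2 = list(s); s2[i] = target
                W[index[tuple(s2)], a] += rate(s, i, target)
W -= np.diag(W.sum(axis=0))                # generator: each column sums to zero

# Steady state = null vector of the generator
vals, vecs = np.linalg.eig(W)
P_ss = np.abs(np.real(vecs[:, np.argmin(np.abs(vals))]))
P_ss /= P_ss.sum()

# Entropy production rate in the steady state (Schnakenberg form)
S_dot = 0.0
for a in range(M):
    for b in range(M):
        if a != b and W[b, a] > 0.0 and W[a, b] > 0.0:
            J = W[b, a]*P_ss[a] - W[a, b]*P_ss[b]
            S_dot += 0.5*J*np.log((W[b, a]*P_ss[a])/(W[a, b]*P_ss[b]))

print("number of states:", M, " steady-state entropy production rate (k_B/s):", S_dot)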
Unfortunately, Equation (75) is practically unsolvable for a large number of synapses N s , because we have K^{N_s} coupled differential equations to solve. For example, for K = 2 and N s = 1000 , we have 2^{1000} (more than 10^{100}) equations, which is impossible to handle on any existing computer (more equations than the number of protons in the visible universe!). A useful approximation to these types of problems is provided by the so-called “pair approximation” [18]. The essence of this method lies in reducing the effective dimensionality of the synaptic system by considering only the dynamics of the single-synapse probabilities P ( s i ) and the two-synapse probabilities P ( s i , s i + 1 ) . This means that three-synapse correlations as well as higher-order correlations are neglected, which agrees with intuition, since the coupling between synapses acts only between nearest neighbors. In the pair approximation, the joint probability P ( s ) is approximated as [18]
$$ P(\mathbf{s}) \approx \frac{P(s_1, s_2)\, P(s_2, s_3) \cdots P(s_{N_s - 1}, s_{N_s})}{P(s_2)\, P(s_3) \cdots P(s_{N_s - 1})} . \qquad (76) $$
This allows us to write the dynamics of probabilities P ( s i ) and P ( s i , s i + 1 ) , which we obtain by marginalization of the joint probability P ( s ) , in the form
$$ \frac{dP(s_i)}{dt} \approx \sum_{s_{i-1}} \sum_{s_{i+1}} \sum_{s_i'} \left[ w_{s_i, s_i'}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i')\, P(s_i', s_{i+1})}{P(s_i')} - w_{s_i', s_i}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i)\, P(s_i, s_{i+1})}{P(s_i)} \right] \qquad (77) $$
for i = 2 , … , N s − 1 , and
$$ \frac{dP(s_i, s_{i+1})}{dt} \approx \sum_{s_{i-1}} \sum_{s_i'} \left[ w_{s_i, s_i'}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i')\, P(s_i', s_{i+1})}{P(s_i')} - w_{s_i', s_i}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i)\, P(s_i, s_{i+1})}{P(s_i)} \right] + \sum_{s_{i+2}} \sum_{s_{i+1}'} \left[ w_{s_{i+1}, s_{i+1}'}(s_i, s_{i+2})\, \frac{P(s_i, s_{i+1}')\, P(s_{i+1}', s_{i+2})}{P(s_{i+1}')} - w_{s_{i+1}', s_{i+1}}(s_i, s_{i+2})\, \frac{P(s_i, s_{i+1})\, P(s_{i+1}, s_{i+2})}{P(s_{i+1})} \right] , \qquad (78) $$
for i = 2 , … , N s − 2 . Similar expressions can be written for the boundary probabilities with i = 1 and i = N s .
Equations (77) and (78) form a closed system of differential equations. Most importantly, we now have only K ( K + 1 ) N s / 2 equations to solve instead of K^{N_s}. This means that after applying the pair approximation, the computational complexity of the problem grows only linearly with the number of synapses N s , not exponentially. The solution of the system given by Equations (77) and (78) allows us to determine the information gain and its energy cost during synaptic learning (during the LTP phase).
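The reduction in the number of equations can be appreciated with a trivial count (a small sketch; K = 2 is just an example):

K = 2
for N_s in (10, 100, 1000):
    exact_digits = len(str(K**N_s)) - 1          # K**N_s has roughly 10**exact_digits states
    pair = K*(K + 1)*N_s//2
    print("N_s = %4d: full master equation ~ 10^%d ODEs, pair approximation: %d ODEs" % (N_s, exact_digits, pair))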

Information Gain and Loss, and Associated Energy Cost

Let us assume that before learning, synapses have a steady-state distribution P s s ( s ) . Learning causes modifications in synaptic structures, which are associated with modified transition rates and non-equilibrium jumps between different states. As before, KL divergence can be used to quantify information gain during the learning phase (Equation (14)), which in our case takes the form
$$ D_{KL}(P(\mathbf{s})||P_{ss}(\mathbf{s})) = \sum_{\mathbf{s}} P(\mathbf{s})\, \ln\frac{P(\mathbf{s})}{P_{ss}(\mathbf{s})} . \qquad (79) $$
The temporal rate of gaining information during LTP can be found with the help of the above pair approximation as
$$ \dot{D}_{KL}(P(\mathbf{s})||P_{ss}(\mathbf{s})) \approx \sum_{s_1, s_1'} \sum_{s_2} \Big[ w_{s_1, s_1'}(s_2)\, P(s_1', s_2) - w_{s_1', s_1}(s_2)\, P(s_1, s_2) \Big] \ln\frac{P(s_1, s_2)}{P_{ss}(s_1, s_2)} + \sum_{s_{N_s}, s_{N_s}'} \sum_{s_{N_s - 1}} \Big[ w_{s_{N_s}, s_{N_s}'}(s_{N_s - 1})\, P(s_{N_s - 1}, s_{N_s}') - w_{s_{N_s}', s_{N_s}}(s_{N_s - 1})\, P(s_{N_s - 1}, s_{N_s}) \Big] \ln\frac{P(s_{N_s - 1}, s_{N_s})}{P_{ss}(s_{N_s - 1}, s_{N_s})} + \sum_{i=2}^{N_s - 1} \sum_{s_i, s_i'} \sum_{s_{i-1}, s_{i+1}} \left[ w_{s_i, s_i'}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i')\, P(s_i', s_{i+1})}{P(s_i')} - w_{s_i', s_i}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i)\, P(s_i, s_{i+1})}{P(s_i)} \right] \ln\frac{P(s_{i-1}, s_i)\, P(s_i, s_{i+1})\, P_{ss}(s_i)}{P_{ss}(s_{i-1}, s_i)\, P_{ss}(s_i, s_{i+1})\, P(s_i)} . \qquad (80) $$
Thus, D ˙ K L depends on the transition rates between synaptic states, which is similar to the entropy production rate related to the energy cost of synaptic plasticity.
The entropy production rate of synaptic transitions in this approximation is
$$ \dot{S}_{pr}(\mathbf{s}) = \sum_{i=1}^{N_s} \dot{S}_{pr,i} , \qquad (81) $$
where S ˙ p r , i is the individual entropy production rate of synapse i, given by
$$ \dot{S}_{pr,i} \approx \frac{1}{2} \sum_{s_{i-1}, s_{i+1}} \sum_{s_i, s_i'} \left[ w_{s_i, s_i'}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i')\, P(s_i', s_{i+1})}{P(s_i')} - w_{s_i', s_i}(s_{i-1}, s_{i+1})\, \frac{P(s_{i-1}, s_i)\, P(s_i, s_{i+1})}{P(s_i)} \right] \times \ln\frac{ w_{s_i, s_i'}(s_{i-1}, s_{i+1})\, P(s_{i-1}, s_i')\, P(s_i', s_{i+1}) / P(s_i') }{ w_{s_i', s_i}(s_{i-1}, s_{i+1})\, P(s_{i-1}, s_i)\, P(s_i, s_{i+1}) / P(s_i) } . \qquad (82) $$
The physical energy cost of synaptic plasticity is E 0 S ˙ p r ( s ) , where E 0 is the energy scale associated with plasticity processes in a single synapse (for details, see [14,18]). In general, E 0 ∼ 10^5 k B T , since a synapse, although small, is a composite object consisting of many different molecular degrees of freedom [14].
As can be seen, Equations (80) and (82) have a similar structure, suggesting that the information gain rate and its energy requirement depend similarly on time and are generally proportional to one another. This means that acquiring more information during learning incurs a higher energy cost, mainly because of the large prefactor E 0 . Again, information is physical and costly. Moreover, the cooperativity between neighboring synapses (reflected in the transition rates w_{s_i, s_i'}( s i − 1 , s i + 1 ) ) can have a positive effect on the energy efficiency of information gain if synapses are positively correlated [18].

8. Concluding Remarks

Basic components of the brain, i.e., neurons and synapses, exhibit probabilistic behavior because they are affected by noisy internal and external signals [87]. In this paper, the goal was to show that the concepts of information thermodynamics can be useful in neuroscience problems, in which there is inherent stochasticity. Such problems involve neural inference as well as synaptic learning and memory. In all these neurobiological examples, neurons and synapses handle information, and since information is physical, the brain has to use some amount of energy while executing its computations [4,5,13,14,18]. If we assume that the brain uses information economically (e.g., [29]), then not all of these computations are equally likely. Consequently, knowing the probability of a given neural or synaptic activity (for a given task) should be a crucial element in deciphering the rules governing brain computations. Thus, taking the economical point of view for cerebral information processing might inspire theorists in efforts to construct more thermodynamically realistic models of neural and synaptic computations. These models would embrace relevant physics, rather than ignoring it, as advocated by William Bialek in a more general context of “biological physics” [102]. One such proposition, of a broad nature and generality, could be the principle of entropy maximization, which can be used to explain many types of data not only in neural systems but also in molecular biology [43,48,103]. However, its weakness is that it is based on equilibrium statistical mechanics, where time does not explicitly appear. Therefore, it is difficult to imagine (at least for the author) how this principle could be conceptually justified when applied to driven systems with stochastic dynamics, such as neurons and synapses in the non-stationary regime.
The examples described here were relatively simple, and they neglected some detailed features of real neurons and synapses. They were chosen because they can be treated analytically, in a pedagogic way, with explicit relationships between different quantities. Even for more complex models of neurons and synapses, the basic relationships between information and energy still hold, as described above; however, revealing them requires heavy numerical calculations.
In the examples related to neural inference and synaptic plasticity, we used the idea of time-scale separation to derive analytical formulas. The dynamics of neurons and synapses can be quite complicated even for the relatively simple models we used, because of the several time scales involved: from the neural firing rate time constants τ n 0 , τ n of the order of 1 s [19] to the synaptic plasticity time constants τ θ (∼10–20 s) and τ w (∼100–600 s) [93,104]. Real brains obviously have many more intrinsic time scales, from milliseconds for some molecular processes (channels and receptors) [19], to seconds for neuromodulators, to hours or days for homeostatic processes [101], to months or years for developmental processes. This diversity of time scales is one of the main reasons for brain complexity, as many processes overlap and interact with one another [105,106,107]. In both of our examples, we observed that the stimulus variability, i.e., the external time scale, should be sufficiently slow to have any noticeable influence on neural and synaptic dynamics and on their information processing capability. Indeed, it seems that the slowness of the external stimulus can be a very important requirement for efficient computation not only in neural systems but generally in all biological systems with many interacting layers [88]. This is also the case for the efficiency of information propagation in the so-called critical regime of brain dynamics [108,109]. In this context, brain dynamics can be close to the critical point with long neural avalanches exhibiting power laws, but only if the stimulus variability is slower than the duration of an avalanche [110].
In this perspective, the focus was on activities and information processing in individual neurons and synapses in small networks. Such an approach is similar in spirit to the physical approaches employed by others [12,17], where the authors analyzed energy constraints on the amount of learnt information. In these cases, the concepts of information and entropy production have a clear physical interpretation. However, in recent years, there have also been other, more global approaches, in which whole-brain dynamics are analyzed from a thermodynamic point of view [15,111]. In such attempts, it is often difficult to interpret entropic quantities in terms of physical observables, because so many degrees of freedom, of different natures, are involved. In these global approaches, the goal seems to be different from the “physicality” of neurons and synapses. The authors rather focus on quantifying the irreversibility of global brain dynamics, as described by the extent of broken detailed balance at the level of whole macroscopic brain networks [15,16].
Despite many successes of computational and theoretical neuroscience (partly and briefly described in [112]), many traditional neurobiologists still neither understand nor appreciate it. Even theoretical neuroscientists use models that often are not well grounded in neuronal reality, neglecting many physical aspects, e.g., energy, as irrelevant [19,20,23]. Theoretical neuroscience still needs a consistent and general theory to put diverse models and different theoretical pieces together in a unified way. I do hope that information thermodynamics, as developed in recent years by physicists, is a step in this ambitious direction. In this respect, the most promising approaches, in my opinion, would be the ones explicitly exploring information and energy simultaneously within stochastic thermodynamics, by identifying the most important mechanisms on the micro- and mesoscopic levels, mainly in synapses, as they are important for learning and memory storing. Such approaches were initiated in [13,14,18]. However, constructing a general and powerful theory capable of making quantitative predictions requires much more, and it is not easy. A good starting point is the idea that the presence of nonpredictive information leads to energetic inefficiency [42]. Only retaining predictive (relevant) information in memory makes sense from a thermodynamic point of view [113]. Making these ideas more concrete for “realistic” synapses could enhance our mechanistic understanding of synaptic plasticity in the context of acquiring and storing information.

Funding

The work was supported by the Polish National Science Centre (NCN) grant number 2021/41/B/ST3/04300.

Acknowledgments

The author thanks the reviewers for their useful comments on the manuscript.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

In this Appendix, we derive the population-averaged tuning curve in Equation (44). The summation in the expression $\bar{c}(v) = (1/N)\sum_{i=1}^{N} c_i(v)$ can be replaced by an integration, i.e.,
$$ \bar{c}(v) \approx \int du\, \rho(u)\, c(v, u) , \qquad (A1) $$
where c ( v , u ) is the tuning curve of a given neuron as in Equation (42), i.e., $c(v, u) = r_m \exp\!\left[ -(v - u)^2/(2\epsilon^2) \right]$, and ρ ( u ) is the distribution of neuronal preferences. We take ρ ( u ) to be a Gaussian,
$$ \rho(u) = \frac{1}{\sqrt{2\pi\alpha^2}} \exp\!\left( -\frac{u^2}{2\alpha^2} \right) , \qquad (A2) $$
where α corresponds to the most likely range of stimulus velocities to which neurons respond. A straightforward calculation yields
$$ \bar{c}(v) = \frac{\epsilon\, r_m}{\sqrt{\alpha^2 + \epsilon^2}} \exp\!\left( -\frac{v^2}{2(\alpha^2 + \epsilon^2)} \right) . \qquad (A3) $$
This equation can be further approximated by noting that the most likely range of velocities α is much greater than the “window of selectivity” of a typical neuron ϵ , i.e., α / ϵ ≫ 1 . After this, we find Equation (44) in the main text.
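The Gaussian integral leading to Equation (A3) can be verified numerically; the parameter values below are arbitrary illustrative choices obeying α ≫ ϵ.

import numpy as np
from scipy.integrate import quad

r_m, eps, alpha = 10.0, 0.1, 2.0       # illustrative values with alpha >> eps
v = 0.7                                # test stimulus velocity (mm/s)

rho = lambda u: np.exp(-u**2/(2*alpha**2)) / np.sqrt(2*np.pi*alpha**2)
c   = lambda u: r_m*np.exp(-(v - u)**2/(2*eps**2))

# Integrate only around u ~ v, where the narrow tuning curve is non-negligible
numeric, _ = quad(lambda u: rho(u)*c(u), v - 8*eps, v + 8*eps)
closed = eps*r_m/np.sqrt(alpha**2 + eps**2) * np.exp(-v**2/(2*(alpha**2 + eps**2)))
print(numeric, closed)                 # the two values agree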

Appendix B

In this Appendix, we briefly show how to derive the steady-state mutual information I ( r ¯ , v ) t in Equation (52).
The mutual information in Equation (51), with exponentially decaying correlation for velocity of the stimulus, is proportional to the following integral I:
$$ I = \int_0^t dt_1 \int_0^t dt_2\; e^{-(t_1 + t_2)/\tau_n}\, e^{-|t_1 - t_2|/\tau_c} . \qquad (A4) $$
We decompose the integral I into two integrals I 1 and I 2 , such that I = I 1 + I 2 , where
$$ I_1 = \int_0^t dt_1 \int_0^{t_1} dt_2\; e^{-(t_1 + t_2)/\tau_n}\, e^{-(t_1 - t_2)/\tau_c} , \qquad (A5) $$
and
$$ I_2 = \int_0^t dt_1 \int_{t_1}^t dt_2\; e^{-(t_1 + t_2)/\tau_n}\, e^{-(t_2 - t_1)/\tau_c} . \qquad (A6) $$
A straightforward integration of I 1 and I 2 yields
$$ I = \frac{\tau_n^2 \tau_c}{(\tau_c - \tau_n)}\, e^{-2t/\tau_n} + \frac{2\tau_n^2 \tau_c^2}{(\tau_n^2 - \tau_c^2)}\, e^{-t\left( \frac{1}{\tau_n} + \frac{1}{\tau_c} \right)} + \frac{\tau_n^2 \tau_c}{(\tau_n + \tau_c)} , \qquad (A7) $$
after which we obtain Equation (52) in the main text.
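A direct numerical check of the closed form (A7), with arbitrary illustrative values of τ n , τ c , and t:

import numpy as np
from scipy.integrate import dblquad

tau_n, tau_c, t = 1.0, 0.4, 2.0        # illustrative values (s)

numeric, _ = dblquad(lambda t2, t1: np.exp(-(t1 + t2)/tau_n)*np.exp(-abs(t1 - t2)/tau_c),
                     0.0, t, lambda t1: 0.0, lambda t1: t)

closed = (tau_n**2*tau_c/(tau_c - tau_n)*np.exp(-2*t/tau_n)
          + 2*tau_n**2*tau_c**2/(tau_n**2 - tau_c**2)*np.exp(-t*(1/tau_n + 1/tau_c))
          + tau_n**2*tau_c/(tau_n + tau_c))
print(numeric, closed)                 # the two values agree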

References

  1. Lloyd, S. Programming the Universe; Knopf: New York, NY, USA, 2006. [Google Scholar]
  2. Levy, W.B.; Baxter, R.A. Energy efficient neural codes. Neural Comput. 1996, 8, 531–543. [Google Scholar] [CrossRef] [PubMed]
  3. Levy, W.B.; Baxter, R.A. Energy-efficient neuronal computation via quantal synaptic failures. J. Neurosci. 2002, 22, 4746–4755. [Google Scholar] [CrossRef] [PubMed]
  4. Laughlin, S.B.; de Ruyter van Steveninck, R.R.; Anderson, J.C. The metabolic cost of neural information. Nat. Neurosci. 1998, 1, 36–40. [Google Scholar] [CrossRef] [PubMed]
  5. Attwell, D.; Laughlin, S.B. An energy budget for signaling in the gray matter of the brain. J. Cereb. Blood Flow Metabol. 2001, 21, 1133–1145. [Google Scholar] [CrossRef]
  6. Karbowski, J. Thermodynamic constraints on neural dimensions, firing rates, brain temperature and size. J. Comput. Neurosci. 2009, 27, 415–436. [Google Scholar] [CrossRef]
  7. Karbowski, J. Approximate invariance of metabolic energy per synapse during development in mammalian brains. PLoS ONE 2012, 7, e33425. [Google Scholar] [CrossRef]
  8. Aiello, L.C.; Wheeler, P. The expensive-tissue hypothesis: The brain and the digestive-system in human and primate evolution. Curr. Anthropol. 1995, 36, 199–221. [Google Scholar] [CrossRef]
  9. Herculano-Houzel, S. Scaling of brain metabolism with a fixed energy budget per neuron: Implications for neuronal activity, plasticity, and evolution. PLoS ONE 2011, 6, e17514. [Google Scholar] [CrossRef]
  10. Karbowski, J. Global and regional brain metabolic scaling and its functional consequences. BMC Biol. 2007, 5, 18. [Google Scholar] [CrossRef]
  11. Nicolis, G.; Prigogine, I. Self-Organization in Nonequilibrium Systems; Wiley: New York, NY, USA, 1977. [Google Scholar]
  12. Goldt, S.; Seifert, U. Stochastic thermodynamics of learning. Phys. Rev. Lett. 2017, 118, 010601. [Google Scholar] [CrossRef]
  13. Karbowski, J. Metabolic constraints on synaptic learning and memory. J. Neurophysiol. 2019, 122, 1473–1490. [Google Scholar] [CrossRef] [PubMed]
  14. Karbowski, J. Energetics of stochastic BCM type synaptic plasticity and storing of accurate information. J. Comput. Neurosci. 2021, 49, 71–106. [Google Scholar] [CrossRef] [PubMed]
  15. Lynn, C.W.; Cornblath, E.J.; Papadopoulos, L.; Bertolero, M.A.; Bassett, D.S. Broken detailed balance and entropy production in the human brain. Proc. Natl. Acad. Sci. USA 2021, 118, e2109889118. [Google Scholar] [CrossRef]
  16. Deco, G.; Lynn, C.W.; Sanz Perl, Y.; Kringelbach, M.L. Violations of the fluctuation-dissipation theorem reveal distinct non-equilibrium dynamics of brain states. Phys. Rev E 2023, 108, 064410. [Google Scholar] [CrossRef]
  17. Lefebvre, B.; Maes, C. Frenetic steering in a nonequilibrium graph. J. Stat. Phys. 2023, 190, 90. [Google Scholar] [CrossRef]
  18. Karbowski, J.; Urban, P. Cooperativity, information gain, and energy cost during early LTP in dendritic spines. Neural Comput. 2024, 36, 271–311. [Google Scholar] [CrossRef] [PubMed]
  19. Dayan, P.; Abbott, L.F. Theoretical Neuroscience; MIT Press: Cambridge, MA, USA, 2000. [Google Scholar]
  20. Ermentrout, G.B.; Terman, D.H. Mathematical Foundations of Neuroscience; Springer: New York, NY, USA, 2010. [Google Scholar]
  21. Rieke, F.; Warl, D.; de Ruyter, R.; Bialek, W. Spikes: Exploring the Neural Code; MIT Press: Cambridge, MA, USA, 1999. [Google Scholar]
  22. Chaudhuri, R.; Fiete, I. Computational principles of memory. Nat. Neurosci. 2016, 19, 394–403. [Google Scholar] [CrossRef] [PubMed]
  23. Marblestone, A.H.; Wayne, G.; Kording, K.P. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 2016, 10, 94. [Google Scholar] [CrossRef]
  24. Markram, H.; Muller, E.; Ramaswamy, S.; Reimann, M.W.; Abdellah, M.; Sanchez, C.A.; Ailamaki, A.; Alonso-Nanclares, L.; Antille, N.; Arsever, S.; et al. Reconstruction and simulation of neocortical microcircuitry. Cell 2015, 163, 456–492. [Google Scholar] [CrossRef]
  25. Stiefel, K.M.; Coggan, J.S. A hard energy use limit on artificial superintelligence. TechRxiv 2023. [Google Scholar] [CrossRef]
  26. Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 1961, 5, 183–191. [Google Scholar] [CrossRef]
  27. Levy, W.B.; Calvert, V.G. Communication consumes 35 times more energy than computation in the human cortex, but both costs are needed to predict synapse number. Proc. Natl. Acad. Sci. USA 2021, 118, e2008173118. [Google Scholar] [CrossRef] [PubMed]
  28. Balasubramanian, V.; Kimber, D.; Berry, M.J. Metabolically efficient information processing. Neural. Comput. 2001, 13, 799–815. [Google Scholar] [CrossRef]
  29. Niven, B.; Laughlin, S.B. Energy limitation as a selective pressure on the evolution of sensory systems. J. Exp. Biol. 2008, 211, 1792–1804. [Google Scholar] [CrossRef] [PubMed]
  30. Maxwell, J.C. Theory of Heat; Appleton: London, UK, 1871. [Google Scholar]
  31. Leff, H.S.; Rex, A.F. Maxwell’s Demon: Entropy, Information, Computing; Princeton University Press: Princeton, NJ, USA, 1990. [Google Scholar]
  32. Maruyama, K.; Nori, F.; Vedral, V. Colloquium: The physics of Maxwell’s demon and information. Rev. Mod. Phys. 2009, 81, 1–23. [Google Scholar] [CrossRef]
  33. Bennett, C.H. The thermodynamics of computation—A review. Int. J. Theor. Phys. 1982, 21, 905–940. [Google Scholar] [CrossRef]
  34. Berut, A.; Arakelyan, A.; Petrosyan, A.; Ciliberto, S.; Dillenschneider, R.; Lutz, E. Experimental verification of Landauer’s principle linking information and thermodynamics. Nature 2012, 483, 187–190. [Google Scholar] [CrossRef] [PubMed]
  35. Landauer, R. Information is physical. Phys. Today 1991, 44, 23–29. [Google Scholar] [CrossRef]
  36. Parrondo, J.M.; Horowitz, J.M.; Sagawa, T. Thermodynamics of information. Nat. Phys. 2015, 11, 131. [Google Scholar] [CrossRef]
  37. Atick, J.J.; Redlich, A.N. Toward a theory of early visual processing. Neural Comput. 1990, 2, 308. [Google Scholar] [CrossRef]
  38. Bialek, W.; Nemenman, I.; Tishby, N. Predictability, complexity, and learning. Neural Comput. 2001, 13, 2409–2463. [Google Scholar] [CrossRef] [PubMed]
  39. Lang, A.H.; Fisher, C.K.; Mora, T.; Mehta, P. Thermodynamics of statistical inference by cells. Phys. Rev. Lett. 2014, 113, 148103. [Google Scholar] [CrossRef] [PubMed]
  40. Palmer, S.E.; Marre, O.; Berry, M.J.; Bialek, W. Predictive information in a sensory population. Proc. Natl. Acad. Sci. USA 2015, 112, 6908–6913. [Google Scholar] [CrossRef]
  41. Sterling, P.; Laughlin, S. Principles of Neural Design; MIT Press: Cambridge, MA, USA, 2015. [Google Scholar]
  42. Still, S.; Sivak, D.A.; Bell, A.J.; Crooks, G.E. Thermodynamics of prediction. Phys. Rev. Lett. 2012, 109, 120604. [Google Scholar] [CrossRef]
  43. Karbowski, J.; Urban, P. Information encoded in volumes and areas of dendritic spines is nearly maximal across mammalian brains. Sci. Rep. 2023, 13, 22207. [Google Scholar] [CrossRef] [PubMed]
  44. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  45. Barlow, H.B. Sensory mechanisms, the reduction of redundancy, and intelligence. In Symposium on the Mechanization of Thought Processes, Volume II; Blake, D.V., Uttley, A.M., Eds.; HM Stationery Office: London, UK, 1959; pp. 537–574. [Google Scholar]
  46. Laughlin, S.B. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C 1981, 36C, 910–912. [Google Scholar] [CrossRef]
  47. Bialek, W.; Rieke, F.; van Steveninck, R.; Warland, D. Reading a neural code. Science 1991, 252, 1854. [Google Scholar] [CrossRef]
  48. Tkacik, G.; Bialek, W. Information processing in living systems. Annu. Rev. Condens. Matter Phys. 2016, 7, 12.1–12.29. [Google Scholar] [CrossRef]
  49. Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 2012, 75, 126001. [Google Scholar] [CrossRef]
  50. Peliti, L.; Pigolotti, S. Stochastic Thermodynamics: An Introduction; Princeton University Press: Princeton, NJ, USA, 2021. [Google Scholar]
  51. Bienenstock, E.L.; Cooper, L.N.; Munro, P.W. Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. J. Neurosci. 1982, 2, 32–48. [Google Scholar] [CrossRef] [PubMed]
  52. Glauber, R.J. Time-dependent statistics of the Ising model. J. Math. Phys. 1963, 4, 294–307. [Google Scholar] [CrossRef]
  53. Van Kampen, N.G. Stochastic Processes in Physics and Chemistry; Elsevier: Amsterdam, The Netherlands, 2007. [Google Scholar]
  54. Gardiner, C.W. Handbook of Stochastic Methods; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  55. Majumdar, S.N.; Orl, H. Effective Langevin equations for constrained stochastic processes. J. Stat. Mech. 2015, 2015, P06039. [Google Scholar] [CrossRef]
  56. Sekimoto, K. Langevin equation and thermodynamics. Prog. Theor. Phys. Suppl. 1998, 130, 17–27. [Google Scholar] [CrossRef]
  57. Sekimoto, K. Stochastic Energetics; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  58. Novikov, E.A. Functionals and the random-force method in turbulence theory. Sov. Phys. JETP 1965, 20, 1290–1294. [Google Scholar]
  59. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley and Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  60. Gell-Mann, M.; Lloyd, S. Information measures, effective complexity, and total information. Complexity 1996, 2, 44–52. [Google Scholar] [CrossRef]
  61. Renyi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, Berkeley, CA, USA, 20–30 June 1961; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561. [Google Scholar]
  62. Csiszar, I. Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 1967, 2, 299–318. [Google Scholar]
  63. Tsallis, C. Generalized entropy-based criterion for consistent testing. Phys. Rev. E 1998, 58, 1442–1445. [Google Scholar] [CrossRef]
  64. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry; Oxford University Press: Oxford, UK, 2000. [Google Scholar]
  65. Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inform. Theory 2006, 52, 4394–4412. [Google Scholar] [CrossRef]
  66. Gorban, A.N. General H-theorem and entropies that violate the second law. Entropy 2014, 16, 2408–2432. [Google Scholar] [CrossRef]
  67. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  68. Kawai, R.; Parrondo, J.M.R.; Van den Broeck, C. Dissipation: The phase-space perspective. Phys. Rev. Lett. 2007, 98, 080602. [Google Scholar] [CrossRef] [PubMed]
  69. Sason, I.; Verdu, S. f-Divergence inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006. [Google Scholar] [CrossRef]
  70. Karbowski, J. Bounds on the rates of statistical divergences and mutual information via stochastic thermodynamics. Phys. Rev. E 2024, 109, 054126. [Google Scholar] [CrossRef]
  71. Hasegawa, Y.; Vu, T.V. Uncertainty relations in stochastic processes: An information equality approach. Phys. Rev. E 2019, 99, 062126. [Google Scholar] [CrossRef]
  72. Tostevin, F.; Ten Wolde, P.R. Mutual information between input and output trajectories of biochemical networks. Phys. Rev. Lett. 2009, 102, 218101. [Google Scholar] [CrossRef]
  73. Nicoletti, G.; Busiello, D.M. Mutual information disentangles interactions from changing environments. Phys. Rev. Lett. 2021, 127, 228301. [Google Scholar] [CrossRef]
  74. Fagerholm, E.D.; Scott, G.; Shew, W.L.; Song, C.; Leech, R.; Knöpfel, T.; Sharp, D.J. Cortical entropy, mutual information and scale-free dynamics in waking mice. Cereb. Cortex 2016, 26, 3945–3952. [Google Scholar] [CrossRef]
  75. Shriki, O.; Yellin, D. Optimal information representation and criticality in an adaptive sensory recurrent neuronal network. PLoS Comput. Biol. 2016, 12, e1004698. [Google Scholar] [CrossRef]
  76. Schnakenberg, J. Network theory of microscopic and macroscopic behavior of master equation systems. Rev. Mod. Phys. 1976, 48, 571–585. [Google Scholar] [CrossRef]
  77. Maes, C.; Netocny, K. Time-reversal and entropy. J. Stat. Phys. 2003, 110, 269–310. [Google Scholar] [CrossRef]
  78. Esposito, M.; Van den Broeck, C. Three faces of the second law. I. Master equation formulation. Phys. Rev. E 2010, 82, 011143. [Google Scholar] [CrossRef]
  79. Tome, T. Entropy production in nonequilibrium systems described by a Fokker-Planck equation. Braz. J. Phys. 2006, 36, 1285–1289. [Google Scholar] [CrossRef]
  80. Mehta, P.; Schwab, D.J. Energetic cost of cellular computation. Proc. Natl. Acad. Sci. USA 2012, 109, 17978–17982. [Google Scholar] [CrossRef] [PubMed]
  81. Horowitz, J.M.; Esposito, M. Thermodynamics with continuous information flow. Phys. Rev. X 2014, 4, 031015. [Google Scholar] [CrossRef]
  82. Allahverdyan, A.E.; Janzing, D.; Mahler, G. Thermodynamic efficiency of information and heat flow. J. Stat. Mech. 2009, 2009, P09011. [Google Scholar] [CrossRef]
  83. Rodman, H.R.; Albright, T.D. Coding of visual stimulus velocity in area MT of the macaque. Vis. Res. 1987, 27, 2035–2048. [Google Scholar] [CrossRef]
  84. Braitenberg, V.; Schuz, A. Cortex: Statistics and Geometry of Neuronal Connectivity; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
  85. Karbowski, J. Constancy and trade-offs in the neuroanatomical and metabolic design of the cerebral cortex. Front. Neural Circuits 2017, 8, 9. [Google Scholar] [CrossRef]
  86. Faisal, A.A.; White, J.A.; Laughlin, S.B. Ion-channel noise places limits on the miniaturization of the brain’s wiring. Curr. Biol. 2005, 15, 1143–1149. [Google Scholar]
  87. Renart, A.; Machens, C.K. Variability in neural activity and behavior. Curr. Opin. Neurobiol. 2014, 25, 211–220. [Google Scholar]
  88. Nicoletti, G.; Busiello, D.M. Information propagation in multilayer systems with higher-order interactions across timescales. Phys. Rev. X 2024, 14, 021007. [Google Scholar] [CrossRef]
  89. Kandel, E.R.; Dudai, Y.; Mayford, M.R. The molecular and systems biology of memory. Cell 2014, 157, 163–186. [Google Scholar] [CrossRef] [PubMed]
  90. Bourne, J.N.; Harris, K.M. Balancing structure and function at hippocampal dendritic spines. Annu. Rev. Neurosci. 2008, 31, 47–67. [Google Scholar] [CrossRef] [PubMed]
  91. Takeuchi, T.; Duszkiewicz, A.J.; Morris, R.G.M. The synaptic plasticity and memory hypothesis: Encoding, storage and persistence. Phil. Trans. R. Soc. B 2014, 369, 20130288. [Google Scholar] [CrossRef]
  92. Poo, M.M.; Pignatelli, M.; Ryan, T.J.; Tonegawa, S.; Bonhoeffer, T.; Martin, K.C.; Rudenko, A.; Tsai, L.H.; Tsien, R.W.; Fishell, G.; et al. What is memory? The present state of the engram. BMC Biol. 2016, 14, 40. [Google Scholar] [CrossRef]
  93. Meyer, D.; Bonhoeffer, T.; Scheuss, V. Balance and stability of synaptic structures during synaptic plasticity. Neuron 2014, 82, 430–443. [Google Scholar] [CrossRef]
  94. Statman, A.; Kaufman, M.; Minerbi, A.; Ziv, N.E.; Brenner, N. Synaptic size dynamics as an effective stochastic process. PLoS Comput. Biol. 2014, 10, e1003846. [Google Scholar] [CrossRef]
  95. Petersen, C.C.; Malenka, R.C.; Nicoll, R.A.; Hopfield, J.J. All-or-none potentiation at CA3-CA1 synapses. Proc. Natl. Acad. Sci. USA 1998, 95, 4732–4737. [Google Scholar] [CrossRef]
  96. Montgomery, J.M.; Madison, D.V. Discrete synaptic states define a major mechanism of synaptic plasticity. Trends Neurosci. 2004, 27, 744–750. [Google Scholar] [CrossRef]
  97. Kasai, H.; Matsuzaki, M.; Noguchi, J.; Yasumatsu, N.; Nakahara, H. Structure-stability-function relationships of dendritic spines. Trends Neurosci. 2003, 26, 360–368. [Google Scholar] [CrossRef]
  98. Govindarajan, A.; Kelleher, R.J.; Tonegawa, S. A clustered plasticity model of long-term memory engrams. Nat. Rev. Neurosci. 2006, 7, 575–583. [Google Scholar] [CrossRef] [PubMed]
  99. Winnubst, J.; Lohmann, C.; Jontes, J.; Wang, H.; Niell, C. Synaptic clustering during development and learning: The why, when, and how. Front. Mol. Neurosci. 2012, 5, 70. [Google Scholar] [CrossRef] [PubMed]
  100. Yadav, A.; Gao, Y.Z.; Rodriguez, A.; Dickstein, D.L.; Wearne, S.L.; Luebke, J.I.; Hof, P.R.; Weaver, C.M. Morphologic evidence for spatially clustered spines in apical dendrites of monkey neocortical pyramidal cells. J. Comp. Neurol. 2012, 520, 2888–2902. [Google Scholar] [CrossRef] [PubMed]
  101. Turrigiano, G.G.; Nelson, S.B. Homeostatic plasticity in the developing nervous system. Nat. Rev. Neurosci. 2004, 5, 97–107. [Google Scholar] [CrossRef] [PubMed]
  102. Bialek, W. Ambitions for theory in the physics of life. SciPost Phys. Lect. Notes 2024, 84, 1–79. [Google Scholar] [CrossRef]
  103. Tkacik, G.; Mora, T.; Marre, O.; Amodei, D.; Palmer, S.E.; Berry, M.J.; Bialek, W. Thermodynamics and signatures of criticality in a network of neurons. Proc. Natl. Acad. Sci. USA 2015, 112, 11508–11513. [Google Scholar] [CrossRef]
  104. Holtmaat, A.J.; Trachtenberg, J.T.; Wilbrecht, L.; Shepherd, G.M.; Zhang, X.; Knott, G.W.; Svoboda, K. Transient and persistent dendritic spines in the neocortex in vivo. Neuron 2005, 45, 279–291. [Google Scholar] [CrossRef]
  105. Golesorkhi, M.; Gomez-Pilar, J.; Tumati, S.; Fraser, M.; Northoff, G. Temporal hierarchy of intrinsic neural timescales converges with spatial core-periphery organization. Commun. Biol. 2021, 4, 277. [Google Scholar] [CrossRef]
  106. Zeraati, R.; Shi, Y.L.; Steinmetz, N.A.; Gieselmann, M.A.; Thiele, A.; Moore, T.; Levina, A.; Engel, T.A. Intrinsic timescales in the visual cortex change with selective attention and reflect spatial connectivity. Nat. Commun. 2023, 14, 1858. [Google Scholar] [CrossRef]
  107. Honey, C.J.; Newman, E.L.; Schapiro, A.C. Switching between internal and external modes: A multiscale learning principle. Netw. Neurosci. 2017, 1, 339–356. [Google Scholar] [CrossRef]
  108. Beggs, J.M.; Plenz, D. Neuronal avalanches in neocortical circuits. J. Neurosci. 2003, 23, 11167–11177. [Google Scholar] [CrossRef] [PubMed]
  109. Chialvo, D.R. Emergent complex neural dynamics. Nat. Phys. 2010, 6, 744–750. [Google Scholar] [CrossRef]
  110. Das, A.; Levina, A. Critical neuronal models with relaxed timescale separation. Phys. Rev. X 2019, 9, 021062. [Google Scholar] [CrossRef]
  111. Kringelbach, M.L.; Perl, Y.S.; Deco, G. The thermodynamics of mind. Trends Cogn. Sci. 2024, 28, 568–581. [Google Scholar] [CrossRef] [PubMed]
  112. Abbott, L.F. Theoretical neuroscience rising. Neuron 2008, 60, 489–495. [Google Scholar] [CrossRef]
  113. Still, S. Thermodynamic cost and benefit of memory. Phys. Rev. Lett. 2020, 124, 050601. [Google Scholar] [CrossRef]
Figure 1. Stimulus-induced transition from weak to strong synapses. Transient input c ( v ) to the neuron can induce a transition in the collective weight of synapses w ¯ (upper panel). Transitions from weak ( w ¯ w ¯ d ) to strong ( w ¯ w ¯ u ) synapses take place only when the amplitude of synaptic plasticity λ or firing rate of presynaptic neurons f ¯ are sufficiently large (middle and lower panels). Note that w ¯ can maintain the value w ¯ u for a very long time, much larger than the synaptic time constant τ w = 200 s (synaptic memory trace about c), because collective stochastic fluctuations are rescaled by the number of synapses 1 / N s . The middle and lower panels look almost identical despite different parameters, because the noise term in Equation (60) dominates for most of the time in this regime. The nominal parameters used are λ = 1.3 , β = 1.2 , f ¯ = 0.9 Hz, τ n = 0.3 s, τ w = 200 s, σ w = 5.0 , N s = 1000 , r m = 10 Hz, u = 10 mm/s, ϵ = 0.1 mm/s. In this example, the stimulus moves with the linearly increasing velocity v = 0.02 t + 7 (mm/s), with a small acceleration of 0.02 mm/s^2. Too large an acceleration prevents the synaptic transition to the state with w ¯ u .
Figure 2. Effective potential V ( w ¯ , c ) for the collective synaptic weights and bistability. (A) The core potential V 0 ( w ¯ ) has either one minimum, for sufficiently weak plasticity amplitude λ , or two minima for stronger λ . The latter corresponds to bistability in the collective behavior of synapses. Note that the minimum at w ¯ = 0 is very shallow (inset). (B) The bistability regime. The presence of even a weak stimulus c ( v ) lowers the potential barrier in V ( w ¯ , c ) between the shallow and the deep minima, which can facilitate a transition from weak to strong synapses ( w ¯ can change from w ¯ d to w ¯ u ). The parameters used are the same as in Figure 1.
