1. Introduction
Almost all processes—highly correlated, weakly correlated, or not correlated at all—exhibit statistical fluctuations. Often physical laws, such as the Second Law of Thermodynamics, address only typical realizations—those identified by Shannon's asymptotic equipartition property [1] and that emerge in the thermodynamic limit of an infinite number of degrees of freedom and infinite time [2]. Indeed, our interpretations of how macroscopic thermodynamic cycles function are focused on just such typical behaviors. What happens, though, during atypical behaviors, during fluctuations?
The limitation to typical behaviors is a particular concern when it comes to information processing in thermodynamic systems or in biological processes, since fluctuations translate into errors in performing designed computing tasks or in completing the operations required for maintenance and survival, respectively. As a consequence, one realizes that the information processing second law (IPSL) only identifies thermodynamic functioning supported by a system's typical realizations [3]. Now, since observing typical realizations is highly probable over long periods and has probability one in the thermodynamic limit, a definition of system functionality based on typicality is quite useful. However, this renders the IPSL substantially incomplete and practically inapplicable, since it ignores fluctuations over finite periods and in microscopic systems. This is unfortunate. For example, while a system's typical realizations may operate as an engine—converting thermal fluctuations to useful work—even "nearby" fluctuations (atypical, but probable realizations) behave differently, as Landauer erasers—consuming available stored energy to erase stored information. How do we account for functioning during fluctuations? And, over long periods, how, in fact, does a fluctuating system operate at all?
The following answers these questions by introducing constructive methods that identify thermodynamic functioning during any system fluctuation. It shows how to use the IPSL to determine functionality for atypical realizations and how to calculate the probability of distinct modalities occurring via the large-deviation rate function. The lesson is that, falling short of the thermodynamic limit, one cannot attribute a unique functional modality to a thermodynamic system.
To begin, the next section motivates our approach, reviewing its historical background and basic set-up. The development then reviews thermodynamic functioning in information engines and fluctuation theory proper, before bringing the two threads together to analyze functional fluctuations in a prototype information engine.
2. From Szilard to Functional Information Engines
Arguably, Szilard's Engine [4] is the simplest thermodynamic device—a controller leverages knowledge of a single molecule's position to extract work from a single thermal reservoir. As one of the few Maxwellian Demons [5] that can be completely analyzed [6], it exposes the balance between entropic costs dictated by the second law and thermodynamic functionality during the operation of an information-gathering physical system. The net work extracted exactly balances the entropic cost. As Szilard emphasized: while his single-molecule engine was not very functional, it was wholly consistent with the second law, only episodically extracting useful work from a thermal reservoir.
Presaging Shannon's communication theory [7] by two decades, Szilard's major contribution was to recognize the importance of the Demon's information acquisition and storage in resolving Maxwell's paradox [5]. The Demon's informational manipulations had an irreducible entropic cost that balanced any gain in work. The role of information in physics [8] has been actively debated ever since, culminating in a recent spate of experimental tests of the physical limits of information processing [9,10,11,12,13,14,15] and the realization that the degree of the control system's dynamical instability determines the rate of converting thermal energy to work [6].
Many years ago, Maxwell [5] and then Szilard [4] were among the first to draw out the consequences of an "intelligent being" taking advantage of thermal fluctuations [16]. Szilard's Engine, however, and ultimately Maxwell's Demon are not very functional: proper energy and entropy book-keeping during their operation shows that their net operation is consistent with the second law. As much energy is dissipated by the Demon as it extracts from the heat bath [4]. There is no net thermodynamic benefit. Are there Demons that are functional?
Only rather recently was an exactly solvable Maxwellian engine proposed that exhibited functionality, extracting net work each cycle by decreasing physical entropy at the expense of a positive change in a reservoir's Shannon information [17]. There, the Demon generated directed rotation by leveraging the statistical bias in a memoryless information reservoir to compensate for the transfer of high-entropy energy in a thermal reservoir to low-entropy energy that performed the rotational work. Since then, an extensive suite of studies has analyzed more complex information engines [3,18,19,20,21,22,23,24,25,26,27,28]. Here, and in contrast with several of these studies, we emphasize engines that leverage information reservoirs with large, unrestricted memories while interacting with complex, correlated environments.
Figure 1 illustrates the general design for an information engine. The Demon, now denoted "State Machine", is in contact with three reservoirs: thermal, work, and information. Each reservoir provides a distinct thermodynamic resource which the engine transforms. The thermal reservoir stores high-entropy energy; the work reservoir, low-entropy energy; and the information reservoir, zero-energy Shannon information. The information reservoir consists of input and output tapes with cells storing discrete symbols.
The State Machine functions step by step. To process information on the tapes, it reads a symbol from an input cell, writes a symbol to an output tape cell, and changes its internal state. The tapes then shift one cell, presenting new input and output cells to the State Machine. In terms of the energetics, in the first step, a controller couples the symbol read from the input tape cell to the Machine. The controller may require positive or negative work from the work reservoir. The heat transfer is zero since, for our purposes here, we assume this step is relatively fast. In the second step, the state of the coupled cell–system transitions as a result of being in contact with the thermal reservoir. That is, the thermal reservoir induces a Markovian dynamics over the coupled cell–system joint states. This step is driven entirely by the thermal reservoir and, as a result, there is heat transfer between the machine and the thermal reservoir. The controller is absent, and so the work carried out in this step is zero. In the third step, the controller decouples the output state from the machine state. Again, the work here can be nonzero, but the heat flow is zero.
There are three types of functioning. In the first, the state machine extracts heat from the thermal reservoir and performs work on the work reservoir by producing output symbol sequences with higher entropy than the input sequences. In this case, we say the machine functions as an engine. In the second, the machine decreases the output sequence entropy below that of the input by extracting work from the work reservoir and dumping that energy to the thermal reservoir. In this way, the machine acts as an information eraser. Finally, the third (non)functionality occurs when the machine uses (wastes) work energy to randomize output. Since the randomization of the input can happen spontaneously without wasting work—similar to the engine mode—we say the machine functions as a dud; it is a wasteful randomizer.
3. Environment and Engine Representations
There are two technical points that need to be called out here. First, we imagine the engine interacts with a complex environment. This means that we allow the input sequence to be highly correlated, with a very long memory. Formally, the input sequence considered as a stochastic process is not necessarily Markovian. Denote the probability distribution over the input's bi-infinite chain of random variables by $\Pr(X_{-\infty:\infty})$, where $X_t$ is the random variable at time $t$. Then, the input sequence's Markov order $R$ is:
$$R = \min \left\{ n : \Pr\!\left(X_t \mid X_{t-n:t}\right) = \Pr\!\left(X_t \mid X_{-\infty:t}\right) \right\} .$$
And so, by complex environment we mean that input sequences to the machine have large $R$—the environment remembers long histories. Second, even though the machine has a finite number of states, we allow it, too, to have a long memory. This simply means that, via its states, the machine can remember a perhaps large number of past inputs.
One concludes from the first point about complex environments that Markov chains are not powerful enough to represent correlated inputs, especially for the general case we analyze. We need a less restrictive representation and so use hidden Markov models (HMMs), which are known to be more powerful in the sense that, using only a finite number of internal states, they can represent infinite Markov-order processes. We use HMMs to represent the mechanisms generating both input sequences and output sequences.
A process $\mathcal{P}$'s HMM is given as a pair $\big(\mathcal{S}, \{T^{(x)} : x \in \mathcal{X}\}\big)$. $\mathcal{S}$ is the HMM's set of hidden states. $T^{(x)}$, for any particular $x$, is a substochastic state-to-state transition matrix for the transitions that generate symbol $x$. $\mathcal{X}$ is the alphabet of generated symbols.
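To fix the representation concretely, the following is a minimal sketch (ours, in Python; the names and the one-state Biased Coin example are illustrative, not from the original) of an HMM stored as symbol-labeled substochastic matrices and used to generate a sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Biased Coin Process as an HMM: one hidden state, alphabet {0, 1}.
# T[x] is the substochastic state-to-state matrix for transitions emitting x.
b = 0.9  # Pr(symbol 0); illustrative value
T = {0: np.array([[b]]), 1: np.array([[1.0 - b]])}

def generate(T, length, rng):
    """Generate a symbol sequence from an HMM given by substochastic matrices T[x]."""
    n_states = next(iter(T.values())).shape[0]
    state = rng.integers(n_states)          # start from an arbitrary state (a sketch-level choice)
    symbols = sorted(T)
    seq = []
    for _ in range(length):
        # Probability of emitting x from the current state, summed over next states.
        probs = np.array([T[x][state].sum() for x in symbols])
        x = rng.choice(symbols, p=probs / probs.sum())
        next_probs = T[x][state]            # distribution over next states given the emitted x
        state = rng.choice(n_states, p=next_probs / next_probs.sum())
        seq.append(x)
    return seq

print(generate(T, 20, rng))
```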
Similarly, we conclude from the second point that more powerful machinery is needed to handle general stochastic mappings with a long memory. We use stochastic finite-state transducers [29], as they are powerful enough to represent the mappings we use in the following. (Several of the technical contributions stem directly from showing how to work directly with these powerful representations.)
A transducer representation is a pair $\big(\mathcal{S}, \{T^{(y|x)} : x \in \mathcal{X}, y \in \mathcal{Y}\}\big)$. $\mathcal{S}$ is the transducer's set of states. $T^{(y|x)}$, for any particular $x$ and $y$, is a substochastic state-to-state transition matrix for the transitions that, on input $x$, generate output symbol $y$. $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets, respectively.
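Analogously, a stochastic finite-state transducer can be stored as matrices indexed by input–output pairs. The toy two-state transducer below (ours; it is not the ratchet of Ref. [3]) maps an input sequence to an output sequence while tracking its hidden state, so the mapping can carry memory of past inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy 2-state transducer over input/output alphabets {0, 1}.
# M[(x, y)][i, j] = Pr(output y, next state j | input x, current state i)
p = 0.3
M = {
    (0, 0): np.array([[0.0, 1.0], [1.0 - p, 0.0]]),  # read 0: A emits 0 and moves to B; B emits 0 w.p. 1-p and moves to A
    (0, 1): np.array([[0.0, 0.0], [p, 0.0]]),        # read 0: B emits 1 w.p. p and moves to A
    (1, 0): np.zeros((2, 2)),                        # read 1: never emit 0 (toy choice)
    (1, 1): np.eye(2),                               # read 1: echo the 1 and keep the current state
}

def transduce(M, inputs, rng, state=0):
    """Map an input sequence to an output sequence with a stochastic finite-state transducer."""
    n_states = next(iter(M.values())).shape[0]
    outputs = []
    for x in inputs:
        # Joint distribution over (output symbol, next state) given input x and current state.
        choices, weights = [], []
        for (xx, y), mat in M.items():
            if xx != x:
                continue
            for j in range(n_states):
                choices.append((y, j))
                weights.append(mat[state, j])
        weights = np.array(weights)
        y, state = choices[rng.choice(len(choices), p=weights / weights.sum())]
        outputs.append(y)
    return outputs

print(transduce(M, [0, 0, 0, 1, 0, 0], rng))
```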
The following will demonstrate how these choices of representation greatly facilitate analyzing the dynamics and thermodynamics of information engines.
4. Thermodynamic Functioning: When Is an Engine a Refrigerator?
Thermodynamic functionality is defined in terms of the recently introduced information processing second law (IPSL) [3], which bounds the thermodynamic resources required, such as work, to perform a certain amount of information processing:
$$\langle W \rangle \leq k_{\mathrm B} T \ln 2 \, \big( h'_\mu - h_\mu \big) ,$$
where $k_{\mathrm B}$ is Boltzmann's constant and $T$ is the environment's temperature. The IPSL relates three macroscopic system measures: the input's Shannon entropy rate $h_\mu$, the output's entropy rate $h'_\mu$, and the average work $\langle W \rangle$ done on the work reservoir per engine cycle. The entropy rate of a process over the block random variables $X_{0:\ell} = X_0 X_1 \cdots X_{\ell-1}$ is:
$$h_\mu = \lim_{\ell \to \infty} \frac{H[X_{0:\ell}]}{\ell} .$$
Here, $H[\cdot]$ is the Shannon entropy of the specified random variables.
The average work $\langle W \rangle$ is defined as follows. Since the machine stochastically maps inputs to outputs, a given input sequence $w$ typically maps to many distinct output sequences. Then, $\langle W \rangle_w$ denotes the average work carried out by feeding word $w$ to the machine, averaging over all the possible mappings from $w$; $\langle W \rangle$ is, in turn, the average of $\langle W \rangle_w$ over input words $w$; see Figure 2.
That is, thermodynamic functioning is determined by the signs of $\langle W \rangle$ and $h'_\mu - h_\mu$. Since there are two possible signs for each, there are four distinct cases. However, the IPSL forbids the case in which $\langle W \rangle > 0$ and $h'_\mu - h_\mu < 0$. And so, there are three thermodynamically functional modes: engine, eraser, and ineffective randomizer; see Table 1 [3]. When operating as an engine, the machine absorbs heat from the thermal reservoir and converts it to work by mapping the input sequence to a higher entropy-rate output sequence. Thus, the net effect is to randomize the input. When operating as an eraser, the machine reduces the input entropy by consuming work from the work reservoir and dumping it as high-entropy energy to the heat reservoir. In the third case, the machine does not function usefully at all. It is an ineffective randomizer, consuming work to randomize the input string. It wastes work, low-entropy energy.
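The sign-based classification just described is easy to mechanize. The helper below is a minimal sketch of ours (not from Ref. [3]); it takes the average work per cycle and the input and output entropy rates, flags the IPSL-forbidden combination, and otherwise returns the functional mode.

```python
import math

def ipsl_mode(avg_work, h_in, h_out, temperature, k_B=1.380649e-23):
    """Classify thermodynamic function from <W> (work done on the work reservoir,
    joules per cycle) and the input/output Shannon entropy rates (bits per symbol)."""
    dh = h_out - h_in
    bound = k_B * temperature * math.log(2) * dh   # IPSL: <W> <= k_B T ln2 (h'_mu - h_mu)
    if avg_work > bound:
        return "forbidden by the IPSL"
    if avg_work > 0 and dh > 0:
        return "engine"            # converts heat to work while randomizing the input
    if avg_work < 0 and dh < 0:
        return "eraser"            # consumes work to reduce entropy (erase information)
    if avg_work < 0 and dh > 0:
        return "dud"               # wastes work to randomize
    return "marginal (a boundary case)"

print(ipsl_mode(avg_work=1e-22, h_in=0.5, h_out=0.9, temperature=300.0))
```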
5. A Functional Information Engine
To ground these ideas, consider a prototype information engine—the information ratchet introduced in Ref. [3]. The engine, Figure 3, specifies the distribution of inputs and the states and transition structure of the engine's state machine. The inputs come from flipping a coin with bias $b$ for heads ("0"). That is, the input is a memoryless, independent, and identically distributed (IID) stochastic process. Its generating mechanism is depicted as the hidden Markov model in Figure 3a with two states, A and B. Together, the current state and the transition taken determine the statistics of the emitted symbol. Similarly, the engine's mechanism is represented by the finite-state transducer in Figure 3b. Transducer transitions are labeled with the input read, the output emitted, and the transition probability. For example, if the machine is in state B and the input is 0, then with probability $p$ the output emitted is 1 and the machine state changes to A. This is shown by the corresponding edge from state B to state A.
At this point, only the engine’s information processing has been specified. To design a physical system that implements the transducer, we first define the energetics for inputs and for machine states and transitions:
where
is a parameter. Second, we define the energetics for joint symbol-states:
The energies
are further constrained:
Third, we specify Markovian detailed-balanced dynamics over the coupled system (input + state machine) that is induced by the thermal reservoir; see
Figure 4. To guarantee that this dynamic generates the same stochastic mapping as the transducer in
Figure 3b, we must relate the energetics to stochastic-transition parameters
p and
q:
The average work carried out on the work reservoir is then as follows:
See Ref. [
3] for calculation details.
The Shannon entropy rates of input and output sequences can also be calculated directly:
Thus, the energies and the control $b$ are the only free parameters. They control the engine's behavior and, through the IPSL modalities in Table 1, its functionality. Reference [3] gives a complete analysis of this information engine's thermodynamic functioning.
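For the IID biased-coin input just specified, the input-side entropy rate reduces to the binary entropy of the bias, as the small helper below (ours) computes; the output entropy rate and the work require the ratchet-specific expressions referenced above and are not reproduced here.

```python
import math

def binary_entropy(b):
    """Shannon entropy rate (bits per symbol) of an IID binary process with Pr(0) = b."""
    if b in (0.0, 1.0):
        return 0.0
    return -b * math.log2(b) - (1 - b) * math.log2(1 - b)

print(binary_entropy(0.9))   # h_mu of the biased-coin input, ~0.469 bits per symbol
```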
Summarizing for general information engines, one specifies the input process (via its HMM), the engine's state machine (via its stochastic transducer), and the energetics and induced detailed-balanced dynamics that implement the machine physically. This prepares us to analyze fluctuations in an information engine interacting with the complex environment specified by the input process.
6. Engines in Fluctuating Environments: The Strategy
Hidden in this, often unstated but obvious once realized, is the fact that Maxwellian Demons cannot operate unless there are statistical fluctuations. Szilard's Engine cleverly uses and skirts this issue since it contains only a single molecule whose behaviors, by definition, are nothing but fluctuations—single realizations. There is no large ensemble over which to average. The information gleaned by the engine's control system (Demon/Machine) is all about the "fluctuation" in the molecule's position. And, that information allows the engine to temporarily extract energy from a heat reservoir. In short, fluctuations are deeply implicated in the functioning of thermodynamic systems. The following isolates the underlying statistical mechanisms.
The distinct types of thermodynamic functioning—engine, eraser, or dud—are based on three average quantities: the average work produced $\langle W \rangle$, the input sequences' Shannon entropy rate $h_\mu$, and the output sequences' Shannon entropy rate $h'_\mu$ [3,18,19,20,21,22,23,24,25,26,27,28]. As a result, their definitions concern the thermodynamic limit of infinitely long sequences being fed into the machine. Of course, the situation is practically quite different: the engine works with and operates due to finite-length sequences.
To overcome this—and so to develop a theory of functional fluctuations—the following takes care to precisely delineate the limitations inherent in the infinite-length definitions above. It shows that, for any finite length, the functionality definitions are limited to describing properties of only a unique subset of events—the so-called typical set of realizations as identified by the asymptotic equipartition property of information theory [1]. To do this, first we redefine the three quantities—work and entropy rates—as averages over all the possible input sequences of a given length. Second, we define three new unweighted-average quantities, this time explicitly limited to typical realizations. Third, we demonstrate that the differences between the first three averages and the second three can be made arbitrarily small. Since the second kind of averages are unweighted, this closeness result tells us that the average quantities are features of the typical set and not of any other subset of the input sequences. In point of fact, they do not describe atypical behaviors (statistical fluctuations) and so cannot be used to define thermodynamic functions arising from fluctuations.
One technical reason behind this result is that, for the three averages, the functions being averaged are linearly bounded from above by the input-sequence length. The conclusion is that the original quantities can give information only about system functionality for the specific subset of typical realizations. Of course, since observing realizations in this subset is highly probable for long sequences and has probability one in the thermodynamic limit of infinite length, the original functionality definition is quite useful. Our goal, though, is to show just how incomplete it is and in important ways that must be overcome to analyze fluctuations in functioning.
In short, the following consistently extends the original definitions to other realization subsets—the fluctuations or atypical sets. The net result is a theory that covers every realization subset at any finite length. Given that, we introduce a method to calculate the new functionality for these different fluctuation subsets. This completes the picture of functional fluctuations for finite, but long, lengths. We go on to find the large deviation rate for the new definition of functionality. An important contribution here is that all of the results also apply to input sequences and machines with long memories, given that the latter are stochastic finite-state machines. This should be contrasted with developments, cited above, that assume memoryless or order-1 Markov systems. We return to discuss related work at the end, once the results are presented.
7. Functioning Supported by Typical Realizations
A picture of a system's behavioral fluctuations can be developed in terms of (and deviations from) asymptotic equipartition. Let us review. Consider a given process $\mathcal{P}$ and let $\mathcal{X}^\ell$ denote the set of its possible length-$\ell$ realizations. Then, for an arbitrary $\epsilon > 0$, the process' typical set is:
$$A_\epsilon^\ell = \left\{ w \in \mathcal{X}^\ell : \left| -\tfrac{1}{\ell} \log_2 \Pr(w) - h_\mu \right| \leq \epsilon \right\} . \qquad (5)$$
This set consists of realizations whose probability scales with the process' entropy rate [1,30,31]. Moreover, the Shannon–McMillan–Breiman theorem [7,32,33] gives the probability of observing one of these realizations. That is, for a given $\epsilon > 0$ and $\delta > 0$ and for sufficiently large $\ell$,
$$\Pr\!\left( A_\epsilon^\ell \right) > 1 - \delta , \qquad (6)$$
with each $w \in A_\epsilon^\ell$ having probability close to $2^{-\ell h_\mu}$. There are three lessons: (i) typical realizations all have essentially the same probability, decaying as $2^{-\ell h_\mu}$; (ii) at large $\ell$, an observed realization is almost certainly typical; and (iii) there are approximately $2^{\ell h_\mu}$ typical realizations.
As a result, sequences generated by a stationary ergodic process fall into one of three partitions, as depicted in Figure 5. The first contains sequences that are never generated; they fall in the forbidden set. The second is the typical set. And, the last contains sequences in a family of atypical sets—realizations that are rare to different degrees. Appendix A illustrates these for a Biased Coin Process.
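To make the partitions concrete, the following sketch (ours; the bias and tolerance are illustrative) enumerates all length-$\ell$ words of a Biased Coin Process and tests each against the typical-set definition above. For this process no words are forbidden, since every word has nonzero probability.

```python
import itertools
import math

b, eps, ell = 0.9, 0.1, 12
h_mu = -b * math.log2(b) - (1 - b) * math.log2(1 - b)   # entropy rate of the biased coin

def decay_rate(word):
    """u(w) = -(1/l) log2 Pr(w) for an IID biased coin."""
    n0 = word.count(0)
    log_p = n0 * math.log2(b) + (len(word) - n0) * math.log2(1 - b)
    return -log_p / len(word)

typical, atypical = [], []
for word in itertools.product((0, 1), repeat=ell):
    (typical if abs(decay_rate(word) - h_mu) <= eps else atypical).append(word)

print(f"h_mu = {h_mu:.3f} bits/symbol")
print(f"{len(typical)} typical words, {len(atypical)} atypical words out of {2**ell}")
print("probability of the typical set:",
      sum(b ** w.count(0) * (1 - b) ** (ell - w.count(0)) for w in typical))
```

At such short lengths the typical set need not yet carry most of the probability; it does so only as $\ell$ grows.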
What does this partitioning say about fluctuations in thermodynamic functioning? Recall the functionings identified by the IPSL, as laid out in Table 1. That is, for a given input process, transducer, and temperature, thermodynamic functionality is controlled by three quantities: the average work $\langle W \rangle$ generated by the transducer when it operates on the input process, the Shannon entropy rate $h_\mu$ of the input process, and the Shannon entropy rate $h'_\mu$ of the output process.
Appendix B proves that the difference between the average work $\langle W \rangle$ over all sequences and the average work defined over the typical set alone is small for sufficiently large $\ell$. For all practical purposes, they are equal. This, together with recalling that the latter is an unweighted average of the works $\langle W \rangle_w$ for $w \in A_\epsilon^\ell$, provides an operational interpretation of the works used in typical-set-defined functionality.
Similarly, Appendix C proves that the average generated information, when the transducer is fed the whole set, is essentially equal to the average information generated when the transducer is fed the typical set without probability weights.
From Equation (5), it is also clear that the Shannon entropy rate of the input process is likewise a property of the typical set. This demonstrates that all three quantities—$\langle W \rangle$, $h_\mu$, and $h'_\mu$—effectively measure properties of the typical set and not of the other (atypical) partitions. Recalling that these three quantities also determine the thermodynamics via the IPSL functionality highlights that the previously defined functionality is limited. Next, we remove this limitation, extending the thermodynamic functionality to the whole set of partitions.
8. Functioning Outside Typical Realizations
The last section established that the average work and the input and output entropy rates can be used, for sufficiently large $\ell$, to identify the system functionality for typical realizations. At last, "typical" has a precise operational meaning. Moreover, as $\ell \to \infty$, the fraction of information available about the functionality of realizations outside the typical set vanishes. Since the probability of observing realizations in the typical set at large $\ell$ approaches one, the definition of functionality based on $\langle W \rangle$ and the entropy rates is very useful.
However, one should not forget that this definition is limited, applying only to one particular subset of realizations. As a result, the associated definition of functionality gives an incomplete picture. How incomplete? Note that the size of the typical set grows like $2^{\ell h_\mu}$, while the size of the whole set, excluding forbidden realizations, grows as $2^{\ell h}$, where $h$ is the input process' topological entropy [34]. Generally, $h_\mu < h$ (except for the special class of maximum-entropy processes, which we do not consider directly). And so, the relative size of the typical set shrinks exponentially with $\ell$ as $2^{-\ell (h - h_\mu)}$, even though the probability of observing typical realizations converges to one. The lesson is that, at finite $\ell$, only considering the typical set misses exponentially many—on the order of $2^{\ell h}$—possibly functional, observable realizations. With this as motivation, we are ready to define functionality for all realizations—typical and atypical—allowing one to describe "nearby" functionalities that arise during fluctuations out of the typical set. The goal is a complete picture of functional fluctuations for finite, but long, realizations.
What engine functionalities do atypical realizations support? The very first step is to partition the set of all possible realizations into the subsets of interest. How? We must find a suitable, physically relevant parametrization of realization subsets. We call the collections a process’ atypical sets, using degrees of typicality as a parameter.
A key step in the last section was to realize that functionality is defined for unweighted sets of realizations. Recalling Equation (5)'s definition of the typical set, the normalized minus logarithm of the probabilities—effectively a decay rate—of all the words in the typical set is sandwiched within small deviations ($\epsilon$) of the Shannon entropy rate:
$$h_\mu - \epsilon \;\leq\; -\tfrac{1}{\ell} \log_2 \Pr(w) \;\leq\; h_\mu + \epsilon .$$
This is the main reason why $\langle W \rangle$ is approximately the unweighted average work and, consequently, why functionality is operationally defined for an unweighted set—the typical set. This provides an essential clue as to how to partition the set $\mathcal{X}^\ell$ of all possible realizations at fixed length $\ell$.
We collect all the realizations with the same probability into the same subset, labeling it with a decay rate denoted $u$:
$$\Lambda_u^\ell = \left\{ w \in \mathcal{X}^\ell : -\tfrac{1}{\ell} \log_2 \Pr(w) = u \right\} . \qquad (7)$$
Defining $U^\ell = \{ u : \Lambda_u^\ell \neq \emptyset \}$, it is easy to show that the subsets $\{ \Lambda_u^\ell : u \in U^\ell \}$ are disjoint and partition $\mathcal{X}^\ell$.
Technically, this definition for the (parametrized) subsets of interest is necessary to guarantee consistency with the previously defined typical-set notion of functionality.
The parameter $u$, considered as a random variable, is sometimes called a self process [35]. Figure 6 depicts these subsets as "bubbles" of equal decay rate. Equation (5) says the typical set is the bubble with decay rate equal to the process' Shannon entropy rate: $u = h_\mu$ (to within $\epsilon$). All the other bubbles contain rare events, some rarer than others, in the sense that they exhibit faster or slower probability decay rates.
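The same toy process makes the decay-rate "bubbles" explicit. The sketch below (ours, with illustrative parameters) groups all length-$\ell$ words of the biased coin by their decay rate $u$, checks that the groups partition the whole set, and reports each subset's size and total probability.

```python
import itertools
import math
from collections import defaultdict

b, ell = 0.9, 10

def decay_rate(word):
    n0 = word.count(0)
    return -(n0 * math.log2(b) + (len(word) - n0) * math.log2(1 - b)) / len(word)

subsets = defaultdict(list)                       # u -> list of words with that decay rate
for word in itertools.product((0, 1), repeat=ell):
    subsets[round(decay_rate(word), 12)].append(word)

assert sum(len(v) for v in subsets.values()) == 2 ** ell   # the subsets partition the whole set
for u in sorted(subsets):
    size = len(subsets[u])
    prob = size * 2.0 ** (-ell * u)               # every word in the subset has probability 2^(-l u)
    print(f"u = {u:.4f}  |subset| = {size:5d}  Pr(subset) = {prob:.4f}")
```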
The previous section showed that, for large $\ell$, the averaging operator $\langle \cdot \rangle$ yields a statistic essentially about the typical set. Now, consider the situation in which we are interested in the functionality of another subset $\Lambda_u^\ell$ with decay rate $u \neq h_\mu$. How can we use the same operator to find the functionality arising from this subset?
If someone presents us with another process whose typical set is $\Lambda_u^\ell$ and we feed this new process into the system, instead of the original input process, then the operator can be used to identify the functionality of realizations in $\Lambda_u^\ell$. Now, the question comes up as to whether this process exists at all and, if so, whether we can find it.
The answer to the first question is positive, since we made certain to define the atypical subsets in a way consistent with the definition of the typical set. And, by definition, all the sequences in the subset have the same decay rate.
The answer to the second question is also positive. As argued earlier, we use hidden Markov models (HMMs) as our choice of process representation. Denote process $\mathcal{P}$'s HMM by $M = \big(\mathcal{S}, \{T^{(x)}\}\big)$. The question is now framed: What is the HMM of the process whose typical set is $\Lambda_u^\ell$?
To answer, define a new process $\mathcal{P}_\beta$ with HMM $M_\beta$. Notice that both $M$ and $M_\beta$ have the same states $\mathcal{S}$ and the same alphabet $\mathcal{X}$. The substochastic matrices of $M_\beta$ are related to the substochastic matrices of $M$ via the following construction [36,37]:
Pick a $\beta \in \mathbb{R}$.
For each $x \in \mathcal{X}$, construct a new matrix $T_\beta^{(x)}$ for which $\big(T_\beta^{(x)}\big)_{ij} = \big(T^{(x)}_{ij}\big)^\beta$.
Form the matrix $T_\beta = \sum_{x \in \mathcal{X}} T_\beta^{(x)}$.
Calculate $T_\beta$'s maximum eigenvalue $\lambda_\beta$ and corresponding right eigenvector $r_\beta$.
For each $x \in \mathcal{X}$, construct the new matrices $\widehat{T}_\beta^{(x)}$ for which $\big(\widehat{T}_\beta^{(x)}\big)_{ij} = \big(T^{(x)}_{ij}\big)^\beta \, (r_\beta)_j \big/ \big(\lambda_\beta \, (r_\beta)_i\big)$.
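The construction is a few lines of linear algebra. The following sketch (ours, using numpy; zero transitions are kept at zero) implements the steps above for an HMM given as symbol-labeled substochastic matrices and, as a check, tilts the biased coin.

```python
import numpy as np

def tilt_hmm(T, beta):
    """Exponentially tilt an HMM given as symbol-labeled substochastic matrices T[x].

    Returns the tilted matrices S[x], the maximum eigenvalue lambda_beta, and the
    right eigenvector r_beta used in the rescaling."""
    # Steps 1-2: element-wise power of each symbol-labeled matrix (zero entries stay zero).
    T_beta = {}
    for x, mat in T.items():
        powered = np.zeros_like(mat, dtype=float)
        nz = mat > 0
        powered[nz] = mat[nz] ** beta
        T_beta[x] = powered
    # Step 3: total transition matrix.
    total = sum(T_beta.values())
    # Step 4: maximum eigenvalue and its right eigenvector (taken real and positive).
    eigvals, eigvecs = np.linalg.eig(total)
    k = np.argmax(eigvals.real)
    lam = eigvals[k].real
    r = np.abs(eigvecs[:, k].real)
    # Step 5: rescale so that the new matrices sum to a proper stochastic matrix.
    S = {x: mat * r[np.newaxis, :] / (lam * r[:, np.newaxis]) for x, mat in T_beta.items()}
    return S, lam, r

# Example: biased coin (one hidden state); beta = -1 re-weights toward the rarer, 1-heavy words.
b = 0.9
T = {0: np.array([[b]]), 1: np.array([[1.0 - b]])}
S, lam, _ = tilt_hmm(T, beta=-1.0)
print({x: float(m[0, 0]) for x, m in S.items()}, "lambda_beta =", lam)
```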
We defined the new process by constructing its HMM. We now use the latter to produce an atypical set of interest, say, $\Lambda_u^\ell$.
Theorem 1. Within the new process $\mathcal{P}_\beta$, in the limit $\ell \to \infty$, the probability of generating realizations from the set $\Lambda_{u_\beta}^\ell$ converges to one:
$$\lim_{\ell \to \infty} \Pr_\beta\!\left( \Lambda_{u_\beta}^\ell \right) = 1 ,$$
where the energy density is:
$$u_\beta = - \frac{d \log_2 \lambda_\beta}{d \beta} . \qquad (8)$$
Additionally, in the same limit, the process $\mathcal{P}_\beta$ assigns equal energy densities (probability decay rates) to all $w \in \Lambda_{u_\beta}^\ell$.
In this way, for large $\ell$ the process $\mathcal{P}_\beta$ typically generates realizations in the set $\Lambda_{u_\beta}^\ell$ and with the specified energy $u$. The process $\mathcal{P}_\beta$ is variously called the auxiliary, driven, or effective process [39,40,41].
Using Equation (8), one can show that for any $u$ there exists a unique and distinct $\beta$ and, moreover, that $u$ is a decreasing function of $\beta$. And so, we can equivalently denote the process $\mathcal{P}_\beta$ by $\mathcal{P}_u$. More formally, with probability measure one, every word in $\Lambda_u^\ell$ is in the typical set of process $\mathcal{P}_u$. Thus, sweeping $\beta$ controls which subsets (atypical sets) outside the typical set we focus on. And, applying the operator $\langle \cdot \rangle$ determines the engine functionality for realizations in that subset, as we now show.
9. Functional Fluctuations
Let us draw out the consequences and applications of this theory of functional fluctuations. First, we ground the results by identifying the range of functionality that arises as an information ratchet (introduced earlier) operates. Then, we turn to showing how to calculate the probability of its fluctuating functionalities.
9.1. An Information Ratchet Fluctuates
Recall the information ratchet introduced in Section 5, but now fix its Markov dynamic parameters $p$ and $q$ at definite values and put it in contact with an information reservoir that generates IID symbol sequences with a fixed bias $b$. Operating the input reservoir for a sufficiently long period, with high probability, we observe a sequence that has nearly $b\ell$ 0 s in it. Using Equations (3) and (4), we see positive work $\langle W \rangle > 0$ and positive entropy production $h'_\mu - h_\mu > 0$. Then, according to the IPSL functionalities in Table 1, the ratchet typically operates as an engine.
What thermodynamic functionalities occur when the input fluctuates outside the typical set? Sweeping $\beta$ controls which subsets outside the typical set are expressed and, consequently, which fluctuation subsets are accessible. Recall that the input process is specified by the unifilar HMM in Figure 3a. For this input, as a result of the ratchet design, the HMM of $\mathcal{P}_\beta$ is the same as that of the original input, except that the bias $b$ is shifted to a new value $b_\beta$. The different sequence–probability decay rates $u$ are calculated from Equation (8). Then, feeding the new process to the ratchet, the work is calculated from Equation (3), again by changing $b$ to $b_\beta$. Denote this work quantity $\langle W \rangle (u)$.
Figure 7 shows the dissipated work $\langle W \rangle (u)$ and the difference between the output and input Shannon entropy rates versus the fluctuating decay rate $u$. There are several observations to make before associating the thermodynamic function.
First, let us locate the input typical set. This occurs at the decay rate $u = h_\mu$, the input process' entropy rate. The figure identifies it with a vertical line, so labeled.
Second, the input process' ground states occur as $\beta \to \infty$, since $u$ is a decreasing function of $\beta$. As a consequence of Equation (7), this subset corresponds to the sequence with the highest probability. In this case, this is the all-0 s sequence, with $u = -\log_2 b$. The other extreme is at $\beta \to -\infty$, corresponding to the lowest-probability allowed sequence. This is the all-1 s sequence, with $u = -\log_2 (1 - b)$. Note that there is only a single sequence associated with $\beta \to \infty$ and only one with $\beta \to -\infty$.
Third, to complete the task of identifying function, we must determine the average work $\langle W \rangle (u)$ as a function of the energy $u$. From the figure, we see that the dissipated work $\langle W \rangle (u)$ is linear in the decay rate $u$. Appendix D derives this and also shows that the maximum work over all subsets—all $\beta$ or, equivalently, all allowed decay rates $u$—is independent of the input process bias. This is perhaps puzzling, as the bias clearly controls the ratchet's thermodynamic behavior. Thus, assuming an IID input, the maximum work is a property of the ratchet itself and not of the input—the maximum work playing a role rather analogous to how Shannon's channel capacity is a channel property.
To better understand how the ratchet operates thermodynamically, consider the ground state of the input process, which, as just noted, has only a single member, the all-0 sequence with zero entropy rate $h_\mu = 0$. If we feed this sequence into the ratchet, the ratchet adds stochasticity, which appears in the output sequence. The first 0 fed to the ratchet leads to a 0 on the output. For the next 0 fed in, with probability $p$ the ratchet outputs 1 and with probability $1 - p$ it outputs 0. The entropy rate $h'_\mu$ of the output sequence is then strictly positive.
To generate this sequence, we simply use the $\epsilon$-machine in Figure 3 with $b = 1$. With this biased process as input, using Equation (3), we find positive work $\langle W \rangle > 0$. Table 1 then tells us that if we feed the ground state of the input process to the ratchet, it functions as an engine. At the other extreme, $\beta \to -\infty$, the only fluctuation subset member is the all-1 s sequence, with $u = -\log_2 (1 - b)$. Again, the ratchet adds stochasticity and the output has $h'_\mu > 0$. To generate this input sequence, we simply use the $\epsilon$-machine in Figure 3 with $b = 0$. With this process as an input, we use Equation (3) again and find negative work $\langle W \rangle < 0$. Table 1 now tells us that, feeding in this extreme sequence (input fluctuation), the ratchet functions as a dud.
Overall, Table 1 allows one to identify the regimes of $u$ associated with distinct thermodynamic functionality. These are indicated in Figure 7, with the green region corresponding to engine functioning, red to eraser functioning, and yellow to dud. We conclude that the ratchet's thermodynamic functioning depends substantially on fluctuations and so will itself fluctuate over time. In particular, engine functionality occurs only at relatively low input fluctuation energies, seen on Figure 7's left side, and encompasses the typical set, as a consequence of our design. Rather near the engine regime, though, is a narrow one of no functioning at all—a dud. In fact, though the ratchet was designed as an engine, we see that, over most of the range of fluctuations, with the given parameter setting, the ratchet operates as an eraser.
9.2. Probable Functional Fluctuations
In this way, we see that typical-set functionality can be extended to all input realizations—that is, to all fluctuation subsets. The results give insight into the variability in thermodynamic function and a direct sense of its robustness or lack thereof. Now, we answer two questions that are particularly pertinent in the present setting of events (sequences) whose probabilities decay exponentially fast and so may practically never be observed. How probable are fluctuations in thermodynamic functioning? And, relatedly, how probable is each of the fluctuation subsets? Exploring one example, we will show that the functional fluctuations are, in fact, quite observable not only with short sequences, perhaps expectedly, but also over relatively long sequences, such as $\ell = 100$.
The second question calls for determining $\Pr\!\big(\Lambda_u^\ell\big)$. However, in the large-$\ell$ limit, this quantity vanishes. So, it is rather more natural to ask how it converges to zero. Since we are considering ergodic stationary processes, we can apply the large deviation principle: the probability of every subset $\Lambda_u^\ell$ vanishes exponentially with $\ell$. However, each subset $\Lambda_u^\ell$ has a different exponent, which is the subset's large deviation rate [35]:
$$I_u = - \lim_{\ell \to \infty} \frac{1}{\ell} \log_2 \Pr\!\big(\Lambda_u^\ell\big) .$$
Since all these $w \in \Lambda_u^\ell$ have the same probability decay rate $u$, $\Pr\!\big(\Lambda_u^\ell\big)$ decomposes into two components. The first gives the number $\big|\Lambda_u^\ell\big|$ of sequences in the subset and the second the probability $2^{-\ell u}$ of the individual sequences. That is,
$$\Pr\!\big(\Lambda_u^\ell\big) = \big|\Lambda_u^\ell\big| \, 2^{-\ell u} .$$
The size of the subsets also grows exponentially with $\ell$, each subset with a different exponent. To monitor this, we define a new function:
$$s(u) = \lim_{\ell \to \infty} \frac{1}{\ell} \log_2 \big|\Lambda_u^\ell\big| .$$
Previously, we showed that $s(u_\beta) = h_\mu(\beta)$, where $h_\mu(\beta)$ is the Shannon entropy rate of the process $\mathcal{P}_\beta$ and $u_\beta$ comes from Equation (8) [38]. These results allow one to calculate $I_u$ for any subset using the following expression:
$$I_{u_\beta} = u_\beta - s(u_\beta) = u_\beta - h_\mu(\beta) .$$
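For the biased-coin input, these quantities have closed forms, since the tilted transition matrix is a scalar: $\log_2 \lambda_\beta = \log_2\!\big(b^\beta + (1-b)^\beta\big)$. The sketch below (ours, for illustration) sweeps $\beta$, obtains $u_\beta$ by numerically differentiating $\log_2 \lambda_\beta$, takes the subset-size exponent $s(u_\beta)$ as the entropy rate of the tilted coin, and assembles the large deviation rate $I_{u_\beta} = u_\beta - s(u_\beta)$.

```python
import numpy as np

b = 0.9                                   # input bias (illustrative)

def log2_lambda(beta):
    return np.log2(b ** beta + (1 - b) ** beta)

def binary_entropy(p):
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

betas = np.linspace(-6.0, 6.0, 241)
u = -np.gradient(log2_lambda(betas), betas)            # u_beta = -d log2(lambda_beta)/d beta
b_beta = b ** betas / (b ** betas + (1 - b) ** betas)  # bias of the tilted (auxiliary) coin
s = binary_entropy(b_beta)                             # subset-size exponent s(u_beta)
rate = u - s                                           # large deviation rate I_u

for i in range(0, len(betas), 60):
    print(f"beta = {betas[i]:+.2f}  u = {u[i]:.3f}  s(u) = {s[i]:.3f}  I(u) = {rate[i]:.4f}")
```

At $\beta = 1$ the tilted coin reduces to the original one, $u = h_\mu$, and the rate $I_u$ vanishes, consistent with the typical set carrying probability one in the long-length limit.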
Figure 8 plots $I_u$ for our example information ratchet. As with the previous figure, when realizations from the typical set are fed in, the transducer functions as an engine. We now see that the typical set has a zero large deviation rate. That is, in the limit of infinite length, the probability of observing realizations in the typical set goes to one. In terms of thermodynamic functioning, the transducer operates as an engine over long periods with probability one. Complementarily, in the infinite-length limit, the probability of the other "fluctuation" subsets vanishes.
In reality, though, one only observes finite-length sequences. And so, the operative question here is, are functional fluctuations observable at finite lengths? As we alluded to earlier, the expectation is that short sequences should enhance their observation.
Consider the input process in Figure 3a and assume the input's realization length is $\ell = 100$. We have $2^{100}$ distinct input sequences that are partitioned into 101 fluctuation subsets with different energy densities—subsets of sequences with $\ell_0$ 0 s and $100 - \ell_0$ 1 s, for $\ell_0 = 0, 1, \ldots, 100$. Let us calculate the probability of each of these fluctuation subsets occurring analytically. The probability of each versus its energy is shown in Figure 8 as the blue dotted line. To distinguish it from the energy density of fluctuation subsets at infinite length, we label the energy density of each of these sets with $u_{100}$; the index 100 reminds us that we are examining input sequences of length $\ell = 100$. There are 101 blue points on the figure, each representing one of the fluctuation subsets. (Most are obscured by other tokens, though.) If we feed the first 13 of the 101 fluctuation subsets (the first 13 blue points on the left of the figure) to the transducer, it functions as an engine. Summing the probabilities of these engine subsets, we see that the transducer functions as an engine a substantial fraction of the time, which is quite probable, even though it operates on sequences of length 100 that are individually highly improbable.
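The $\ell = 100$ subset probabilities are simple binomial expressions. A sketch of the computation follows (ours; the bias value is illustrative, and which subsets count as engine subsets must come from $\langle W \rangle (u)$ via Equation (3), so the cutoff index is passed in rather than derived).

```python
import math

b, ell = 0.9, 100   # illustrative bias; sequence length from the text

def subset(n_zeros):
    """Energy density and probability of the fluctuation subset with n_zeros 0 s at length ell."""
    log2_p = n_zeros * math.log2(b) + (ell - n_zeros) * math.log2(1 - b)
    u_100 = -log2_p / ell
    prob = math.comb(ell, n_zeros) * (b ** n_zeros) * ((1 - b) ** (ell - n_zeros))
    return u_100, prob

subsets = [subset(n) for n in range(ell, -1, -1)]       # ordered from lowest to highest energy
assert abs(sum(p for _, p in subsets) - 1.0) < 1e-12    # the 101 subsets exhaust all 2**100 words

def engine_probability(n_engine_subsets):
    """Total probability of the lowest-energy subsets assumed to act as engine subsets."""
    return sum(p for _, p in subsets[:n_engine_subsets])

print("near-typical subset (u, Pr):", subsets[10])      # around 10 ones when b = 0.9
print("Pr(engine) if the first 13 subsets act as an engine:", engine_probability(13))
```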
To verify the analytical results, we also performed extensive numerical simulations that drove the ratchet with a single long input sequence. We divided the input sequence into time intervals of length 100 and estimated the generated work and other observables, such as the energy, during each interval. The star tokens in Figure 7 show the estimated average work in the intervals with decay rate $u$ versus the decay rate itself. The numerical estimates agree closely with the analytical result. Figure 8 also shows the probabilities of each of these atypical subsets estimated from the simulations, further validating the analytical results.
Let us return to the remaining question: how probable are fluctuations in thermodynamic functioning? The answer is given by the large deviation rate for $\langle W \rangle (u)$. Since $\langle W \rangle$ is a function of $u$, one can use the contraction principle [35] and express the large deviation rate of $\langle W \rangle$ in terms of the large deviation rate of $u$ via:
$$I_{\langle W \rangle}(W) = \min_{u \,:\, \langle W \rangle (u) = W} I_u .$$
Since $\langle W \rangle (u)$ is a one-to-one function, the minimization above may be removed.
10. Discussion
10.1. Related Work
The new results here on memoryful information engines are also complementary to previous studies of fluctuations in the efficiency of a nanoscale heat engine [42,43,44], a particular form of information engine.
10.2. Relation to Fluctuation Theorems
To head off confusion, and to anticipate a key theme, note that the "statistical fluctuation" above differs importantly from the sense used to describe variations in mesoscopic quantities when controlling small-scale thermodynamic systems. This latter sense is found in the recently famous fluctuation theorem for the probabilities of positive and negative entropy production $\Sigma$ during macroscopic thermodynamic manipulations [45,46,47,48,49,50,51]:
$$\frac{\Pr(+\Sigma)}{\Pr(-\Sigma)} = e^{\Sigma} ,$$
with $\Sigma$ measured in units of $k_{\mathrm B}$.
Both kinds of fluctuation are ubiquitous, often dominating equilibrium finite-size systems and finite and infinite nonequilibrium steady-state systems. Differences acknowledged, there are important connections between statistical fluctuations in microstates observed in steady state and fluctuations in thermodynamic variables encountered during general control: for one, they are deeply implicated in expressed thermodynamic function. Is a system operating as an engine—converting thermal fluctuations to useful work—or as an eraser—depleting energy reservoirs to reduce entropy—or not functioning at all?
11. Conclusions
We synthesized statistical fluctuations—as entailed in Shannon's Asymptotic Equipartition Property [1] and large deviation theory [35,52,53]—and functional thermodynamics—as determined using the new informational second law [3]—to predict spontaneous variations in thermodynamic functioning. In short, there is simultaneous, inherently parallel thermodynamic processing that is functionally distinct and possibly in competition. This strongly suggests that, even when in a nonequilibrium steady state, a single nanoscale device or biomolecule can be both an engine and an eraser. And, we showed that these functional fluctuations need not be rare. This complements similar previous results on fluctuations in small-scale engine efficiency [42,43,54]. The conclusion is that functional fluctuations should be readily observable and the prediction experimentally testable.
A main point motivating this effort was to call into question the widespread habit of ascribing a single functionality to a given system and, once that veil is lifted, to appreciate the broad consequences. To drive them home: since biomolecular systems are rather like the information ratchet here, they should exhibit measurably different thermodynamic functions as they operate. If this prediction holds, then the biological world is vastly richer than we thought, and it will demand of us a greatly refined vocabulary and greatly improved theoretical and experimental tools to adequately probe and analyze this new modality of parallel functioning.
That said, thoroughness forces us to return to our earlier caveat (Section 9) concerning not conflating various "temperatures". If we give the input information reservoir and the output information reservoir physical implementations, then the fluctuation indices $\beta$ for the input and its analog for the output take on thermal physical meaning and so can be related to the ratchet's thermodynamic temperature $T$. Doing so, however, would take us too far afield here, but it will be necessary for a complete understanding.
Looking forward, there are many challenges. First, note that, technically speaking, we introduced a fluctuation theory for memoryful stochastic transducers, but by way of the example of Ref. [3]'s information ratchet. A thoroughgoing development must be carried out in much more generality using the tools of Refs. [29,38], if we are to fully understand the functionality of thermodynamic processes that transform inputs to outputs, environmental stimulus to environmental action.
Second, the role of the Jarzynski–Crooks theory for fluctuations in thermodynamic observables needs to be made explicit and directly related to statistical fluctuations, in the sense emphasized here. One reason is that their theory bears directly on controlling thermodynamic systems and the resulting macroscopic fluctuations. To draw the parallel more closely, following the fluctuation theory for transitions between nonequilibrium steady states [55], we could drive the ratchet parameters $p$ and $q$ and the input bias $b$ between different functional regimes and monitor the entropy production fluctuations to test how the theory fares for memoryful processes. In any case, efficacy in control will also be modulated by statistical fluctuations.
Not surprisingly, there is much to do. Let us turn to a larger motivation and perhaps larger consequences to motivate future efforts.
As just noted, fluctuations are key to nanoscale physics and molecular biology. We showed that fluctuations are deeply implicated both in identifying thermodynamic function and in the very operation of small-scale systems. In fact, fluctuations are critical to life—its proper and robust functioning. The perspective arising from parallel thermodynamic function is that, rather than fluctuations standing in contradiction to life processes, potentially corrupting them, there may be a positive role for fluctuations and parallel thermodynamic functioning. Once that is acknowledged, it is a short step to realize that biological evolution may have already harnessed them to good thermodynamic effect. Manifestations are clearly worth looking for.
It now seems highly likely that fluctuations engender more than mere health and homeostasis. It is a commonplace that biological evolution is nothing if not opportunistic. If so, then it would evolve cellular biological thermodynamic processes that actively leverage fluctuations. Mirroring Maxwell's Demon's need for fluctuations to operate, biological evolution itself advances only when there are fluctuations. For example, biomolecular mutation processes engender a distribution of phenotypes and fitnesses; fodder for driving selection and so evolutionary innovation. This, then, is Darwin's Demon—a mechanism that ratchets in favorable fluctuations for a positive thermodynamic and then a positive survival benefit. The generality of the results and methods here gives new insight into thermodynamic functioning in the presence of fluctuations that should apply at many different scales of life, including its emergence and evolution.