1. Introduction
The reconstruction of stochastic evolution equations from time-series data in terms of the Langevin equation and the corresponding Fokker–Planck equation is often challenged by the inevitably finite temporal sampling of time-series data. Moreover, the Fokker–Planck equation is restricted to continuous stochastic processes, i.e., diffusion, and thus cannot adequately describe discontinuous transitions in time-series data. A more general description of continuous and discontinuous stochastic processes can be constructed using the Kramers–Moyal equation [1,2] given by
$$\frac{\partial}{\partial t}p(x,t|x',t') = \sum_{m=1}^{\infty}\left(-\frac{\partial}{\partial x}\right)^{m} D_m(x)\,p(x,t|x',t'),$$
of which the Fokker–Planck equation is a particular case ($D_m(x)=0$ for $m>2$). The Kramers–Moyal equation serves as a stepping stone to adequately describe time-series data with both diffusive and discontinuous characteristics, but it is nevertheless challenged by finite-time sampling in real-world data. Recent applications of the Kramers–Moyal equation include brain [3,4] and heart dynamics [5], stochastic harmonic oscillators [6], renewable-energy generation [7], solar irradiance [8], turbulence [9], nano-scale friction [10], and X-ray imaging [11,12].
Previous work has demonstrated that a finite sampling interval $\tau$ not only influences the first- and second-order Kramers–Moyal (KM) coefficients [13] but also causes non-vanishing, higher-order (>2) coefficients [4,14,15,16,17,18,19,20]. A more recent example is the jump-diffusion process discussed in reference [3]
$$\mathrm{d}x(t) = a(x,t)\,\mathrm{d}t + b(x,t)\,\mathrm{d}W(t) + \xi\,\mathrm{d}J(t), \qquad (1)$$
where $a(x,t)$ is a drift function, $b(x,t)$ is the diffusion associated with an uncorrelated Brownian motion or Wiener process $W(t)$, $J(t)$ is a Poisson process with jump rate $\lambda$, independent of $W(t)$, and $\xi$ is Gaussian-distributed with zero mean and variance
s. For such jump-diffusion processes, additional influences of the finite temporal sampling need to be taken into account. As shown in reference [
8], jump events produce terms of order
in the KM coefficients of even orders and the jump rate and amplitude induce terms of order
in all coefficients. These influences are heightened for bivariate jump-diffusion processes [
21], since terms of order
impact higher-order (≥4) coefficients [
22].
Although most of the aforementioned studies reported on finite-time corrections for KM coefficients and/or conditional moments of various orders, we still lack an explicit arbitrary-order correction or a closed-form solution in which the conditional moments are represented as functions of the KM coefficients and vice versa. In this article, we derive a full expansion of the generator of the Kramers–Moyal operator in exponential form for one-dimensional Markovian processes. This is equivalent to van Kampen’s system-size expansion, which is taken over a finite time interval $\tau$ [23,24]. The derivations presented henceforth are generally applicable to Markovian diffusion as well as jump-diffusion processes.
On a more general level, our solution is an explicit approximate solution of the Kramers–Moyal equation [1,2], which generalises the Fokker–Planck equation to discontinuous processes [13,23,25]. Our approximation of the Kramers–Moyal operator can be taken to arbitrary order. In particular, we focus on the solution of this partial differential equation by representing the Kramers–Moyal operator in an exponential form and equating the conditional moments with the KM coefficients after representing the exponential operator as a power series. This representation of the exponential operator can similarly be used in other problems with an equivalent formulation [26,27,28] or similar discontinuous stochastic processes with different jump distributions, e.g., the Gamma distribution [29,30].
2. Mathematical Background
The Fokker–Planck(–Kolmogorov) equation (also known as the Kolmogorov forward equation or Smoluchowski equation) for the conditional probability density $p(x,t|x',t')$ is well known within physics and mathematics. It describes the propagation in time and space of any diffusion (thus continuous) process and is given by [31]
$$\frac{\partial}{\partial t}p(x,t|x',t') = \left[-\frac{\partial}{\partial x}D_1(x) + \frac{\partial^2}{\partial x^2}D_2(x)\right]p(x,t|x',t'). \qquad (2)$$
We restrict our investigation to stationary processes, hence $D_m(x,t) = D_m(x)$. Equation (2) describes the evolution of, for instance, a Brownian particle (for the case $D_1(x)=0$ and constant $D_2$), which results in the known heat equation, or more complicated Markovian motions with drift. Here, one recognises the function $D_1(x)$, the first KM coefficient, commonly denoted as drift, and the function $D_2(x)$, the second KM coefficient, commonly denoted as diffusion or volatility. The Fokker–Planck equation is, nevertheless, only valid for continuous motions and thus cannot describe jump-diffusion processes, as is the case in Equation (1), or other stochastic motions with discontinuous paths.
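A more general equation, the so-called Kramers–Moyal equation, takes higher-order KM coefficients $D_m(x)$ into account
$$\frac{\partial}{\partial t}p(x,t|x',t') = \mathcal{L}_{\mathrm{KM}}\,p(x,t|x',t'), \qquad (3)$$
where $\mathcal{L}_{\mathrm{KM}}$ denotes the Kramers–Moyal operator defined as the power series [1,2]
$$\mathcal{L}_{\mathrm{KM}} = \sum_{m=1}^{\infty}\left(-\frac{\partial}{\partial x}\right)^{m}D_m(x),$$
which we will subsequently solve for $p(x,t|x',t')$ and an appropriate starting condition by exponentiating the Kramers–Moyal operator $\mathcal{L}_{\mathrm{KM}}$.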
When examining a stochastic process in terms of time-series data, there is no direct access to the KM coefficients $D_m(x)$ but rather to the conditional moments of the data. The $n$th-order conditional moment $M_n(x',\tau)$ is given by
$$M_n(x',\tau) = \int (x-x')^{n}\,p(x,t+\tau|x',t)\,\mathrm{d}x. \qquad (4)$$
The KM coefficients $D_n(x')$ can be retrieved from the conditional moments $M_n(x',\tau)$ via
$$D_n(x') = \frac{1}{n!}\lim_{\tau\to 0}\frac{M_n(x',\tau)}{\tau}.$$
When dealing with real-world data, we do not have access to infinite temporal resolution, meaning that the above limit $\tau\to 0$ cannot be taken. A best-case scenario is to analyse the smallest possible temporal differences. If the data are sampled at $\Delta t$ time steps, take $\tau = \Delta t$.
In order to non-parametrically retrieve the conditional moments from data, a set of histogram or Nadaraya–Watson estimators can be utilised (see Refs. [29,32] for details). Here, we will focus not on how to estimate the conditional moments but rather on how to derive a set of finite-time corrections to estimate the KM coefficients from conditional moments. The conditional moments can be retrieved from data with software packages like kramersmoyal [33] or JumpDiff [34] in Python or Langevin [35] in R.
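As a minimal illustration of the histogram estimator mentioned above, the following NumPy sketch bins the state space and averages powers of the increments within each bin. The function name `conditional_moments`, its arguments, and the binning choices are assumptions made here for illustration; they do not reproduce the interface of kramersmoyal, JumpDiff, or Langevin.

```python
import numpy as np

def conditional_moments(x, orders=(1, 2), bins=20, lag=1):
    """Histogram estimate of <(X(t+tau) - X(t))^n | X(t) = x'>.

    `lag` is the sampling interval tau measured in samples (assumed convention).
    Returns the bin centres and an array of shape (len(orders), bins);
    empty bins are filled with NaN.
    """
    x = np.asarray(x, dtype=float)
    start, incr = x[:-lag], x[lag:] - x[:-lag]
    edges = np.linspace(start.min(), start.max(), bins + 1)
    # Bin index of every starting point; the right-most edge joins the last bin.
    idx = np.clip(np.digitize(start, edges) - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    moments = np.full((len(orders), bins), np.nan)
    filled = counts > 0
    for k, n in enumerate(orders):
        sums = np.bincount(idx, weights=incr**n, minlength=bins)
        moments[k, filled] = sums[filled] / counts[filled]
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, moments
```

A fixed-width histogram is the simplest choice; a kernel (Nadaraya–Watson) estimator would replace the hard bin assignment with smooth weights.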
3. The Formal Solution of the Kramers–Moyal Equation and Its Approximations
First, we explicitly derive the corrective terms and subsequently link these to the results in reference [36], connecting them to the relation between statistical cumulants and moments [13].
Let us assume that a well-defined initial state of the Kramers–Moyal equation is given by
. The formal solution of the time-dependent Kramers–Moyal equation (
3) is given by
where
is a normalisable function, such that
. We will now proceed to show the first-, second-, third-order, and arbitrary-order approximation to the solution of this partial differential equation with this particular initial condition.
3.1. The First- and Second-Order Approximations
The first-order approximation of the formal solution of Equation (
5) is given by
yielding for the conditional moments
in Equation (
4)
where the large square brackets indicate that the differentiation acts only on the terms within the brackets. The superscript
indicates the order of approximation.
The second-order approximation is obtained in a similar fashion, now including the quadratic term from the exponential representation Equation (
5), i.e.,
To simplify the notation, we write the KM coefficients without their explicit state dependence, i.e.,
. The second-order approximation
of the n-th conditional moment in Equation (
4) reads
The first integral is only non-vanishing if
and the second integral is only non-vanishing if
, with
. Hence,
Separating the terms between those with explicit derivatives of the KM coefficients and those without, it is immediately clear that the second-order approximation follows a structure given by the partial ordinary Bell polynomials
[
37]
where the summation is taken over
such that
The first- and second-order approximations can be written with the help of the partial ordinary Bell polynomials with
and
, respectively,
where
incorporates all derivatives of the KM coefficients from the 2nd-order corrections, and is given by
To simplify the description, we introduce a short-hand notation and take the superscript
in the KM coefficients:
.
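The partial ordinary Bell polynomials used above can be evaluated numerically. The sketch below relies on the standard generating-function identity that $\hat{B}_{n,k}(x_1, x_2, \dots)$ is the coefficient of $z^n$ in $(x_1 z + x_2 z^2 + \dots)^k$, which gives the recursion $\hat{B}_{n,k} = \sum_i x_i \hat{B}_{n-i,k-1}$. The function name and the argument layout are illustrative assumptions.

```python
from functools import lru_cache

def ordinary_bell(n, k, x):
    """Partial ordinary Bell polynomial of order (n, k).

    `x` is a sequence with x[0] = x_1, x[1] = x_2, ... (assumed layout).
    """
    x = tuple(x)

    @lru_cache(maxsize=None)
    def bell(n, k):
        if n == 0 and k == 0:
            return 1
        if n == 0 or k == 0 or k > n:
            return 0
        # Peel off one factor x_i z^i from the k-fold product.
        return sum(x[i - 1] * bell(n - i, k - 1)
                   for i in range(1, n - k + 2) if i <= len(x))

    return bell(n, k)
```

For example, `ordinary_bell(4, 2, x)` evaluates $2x_1x_3 + x_2^2$, the two ways of splitting order four into two parts.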
These results are in line with those reported for diffusion-type processes [16,17,18,19,38], where the Kramers–Moyal operator reduces to the Fokker–Planck operator and we are solely left with the first two KM coefficients, as in Equation (2). In particular, applying the second-order approximation in Equation (8) to the first two KM coefficients results in
and truncating the sums at second order yields the expressions in reference [
16].
3.2. The Third-Order Approximation
Before we introduce the general formalism for the arbitrary-order approximation, we explicitly derive the third-order approximation
which leads to
Notice that the first integral is only non-vanishing for the combination
, which can again be expressed via the partial ordinary Bell polynomial
, where
, for the third-order approximation. The second expression requires
as well as
. Separating these again into two expressions, one with and another without derivatives, we can express the third-order approximation as
where
incorporates all derivatives of the KM coefficients from the third-order corrections
Here, we compare our derivation to the derivation of the third-order approximation in Gottschall and Peinke [16]. We note that our derivation uses the general form of the Kramers–Moyal operator, of which the Fokker–Planck operator is a special case. From Equation (9), we recover the expression for the Fokker–Planck operator reported in reference [16]. Since the Fokker–Planck operator is limited to second-order terms, i.e., $D_m = 0$ for $m > 2$, the sum in Equation (10) can be expressed in full. For the first conditional moment
, we obtain the corrective terms
given by
which is identical to Equation (A1) in the Appendix of reference [
16]. Similarly, for the second conditional moment
, we obtain the corrective terms
which is in agreement with Equation (A2) in the Appendix of reference [
16]. A similar derivation can be found in Appendix B of reference [
8], which also yields congruent findings for the first two conditional moments of jump-diffusion processes. However, no explicit expression for all terms is given in either publication.
As a simple rule of thumb, one can check whether a result is correct as follows: the sum of the orders of the KM coefficients in each term, minus the total number of derivatives acting in that term, must equal n, the order of the conditional moment being calculated. In the notation used in this work, the sum of subscripts minus the sum of superscripts must equal the order n of the conditional moment under investigation.
3.3. Arbitrary-Order Approximation
We now derive the arbitrary-order corrections of the Kramers–Moyal operator. This is done by induction from the previous derivations, whilst disregarding any emerging terms with derivatives of the KM coefficients
with
a partition of a set of
obeying Equation (
7). This, in turn, is the same as a collection of partial Bell polynomials, namely
where we combine terms with derivatives in
. If we disregard the derivative terms, the summation has an upper bound, namely
. This is directly seen as the Bell polynomials are similarly bounded, and thus we arrive at
neglecting the derivative terms
.
From the perspective of estimation, the aim is to determine the KM coefficients; however, what we have expressed so far are the conditional moments as functions of the KM coefficients. As we now have an explicit relation in terms of partial Bell polynomials, we will invert the relation and express the KM coefficients as functions of the conditional moments.
Note that the first conditional moment
is solely a function of the first KM coefficient
. The second conditional moment
is a function of the second KM coefficient
, and by substitution, a function of the first conditional moment
, given by Equation (
11). Subsequently
is a function of
,
, and
. Thus, by recursively substituting the
KM coefficients by their expressions via the conditional moments, we obtain a relation of
as a function of the
conditional moments.
To this end, we rewrite Equation (
11) in terms of the partial exponential Bell polynomials
where the summation terms obey the constraints of the Bell polynomials given in Equation (
7). This can be expressed through the partial ordinary Bell polynomials in Equation (
6) as
Thus, Equation (
11) reads
We can then utilise the reciprocal relations of the partial exponential Bell polynomials: for a set of variables
, defined as functions of
n other variables
given by
the inverse relation holds
With this, we can finally express any KM coefficients
from the nth-order power series expansion, neglecting the derivative terms
, as
We note here that these relations are equivalent to the relation between cumulants and (non-central) moments of a probability distribution [
13,
36]. Let
be the moment-generating function, such that
with
the (non-central) moments and
the cumulant-generating function. For
, the cumulants
and the central moments are the same (e.g., the mean and variance). This is not the same for higher cumulants and moments. The relation between the cumulants
and the (non-central) moments
is given by the reciprocal relation of the Bell polynomials, as in Equations (
12) and (
13). This is in line with our exponential representation of the Kramers–Moyal operator. Here, the KM coefficients are the cumulants (with the exception of the
term).
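The moment-to-cumulant inversion described above can be sketched in a few lines of Python using the standard recursion $\kappa_n = m_n - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\kappa_k m_{n-k}$. Mapping cumulants to KM coefficients via $D_n \approx \kappa_n/(n!\,\tau)$ assumes a $1/n!$ normalisation of the KM coefficients and neglects all derivative terms; the function names and this normalisation are assumptions of the sketch.

```python
from math import comb, factorial

def cumulants_from_moments(m):
    """Cumulants kappa_1..kappa_N from non-central moments m = [m_1, ..., m_N]."""
    kappa = []
    for n in range(1, len(m) + 1):
        correction = sum(comb(n - 1, k - 1) * kappa[k - 1] * m[n - k - 1]
                         for k in range(1, n))
        kappa.append(m[n - 1] - correction)
    return kappa

def km_from_conditional_moments(M, tau):
    """Finite-time corrected KM coefficients at one state (derivative terms neglected).

    `M` holds the conditional moments [M_1, ..., M_N] at that state and
    `tau` is the sampling interval; assumes D_n = kappa_n / (n! * tau).
    """
    return [kap / (factorial(n) * tau)
            for n, kap in enumerate(cumulants_from_moments(M), start=1)]
```

Applied bin by bin to estimated conditional moments, the second entry reduces to the familiar $(M_2 - M_1^2)/(2\tau)$.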
4. Exemplary Cases with Constant Diffusion and Constant Jumps
Here, we present two illustrative examples: first, a constant diffusion process, the Ornstein–Uhlenbeck process; secondly, we augment this process with jumps to obtain a jump-diffusion process. We implement the corrective terms derived thus far to show the impact of the finite-time corrections. This choice of parameters, i.e., constant diffusion and constant jumps, considerably simplifies Equation (
1) to
where
is the state-dependent linear drift function, with
, also denoted mean-reverting strength,
a constant diffusion,
a Brownian motion or Wiener process,
a state-independent and normally distributed jump amplitude with zero mean and variance
s, and
a Poisson process with jump rate
. Note that the conventional Ornstein–Uhlenbeck process is recovered if we omit the jump process.
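A process of this type can be simulated with an Euler–Maruyama scheme, as in the following minimal sketch. The parameter names (`theta` for the mean-reverting strength, `sigma` for the diffusion, `lam` for the jump rate, `s` for the jump-amplitude variance) and the treatment of jumps (a Poisson number of jumps per step, with summed Gaussian amplitudes) are assumptions made for illustration and need not match the implementation used for the figures.

```python
import numpy as np

def simulate_jump_ou(n, dt, theta, sigma, lam=0.0, s=0.0, x0=0.0, seed=None):
    """Euler-Maruyama path of dX = -theta*X dt + sigma dW + xi dJ.

    xi ~ N(0, s) (s is a variance) and J is a Poisson process with rate lam.
    All parameter names are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    dw = rng.normal(0.0, np.sqrt(dt), size=n - 1)    # Wiener increments
    n_jumps = rng.poisson(lam * dt, size=n - 1)      # number of jumps per step
    # The sum of k independent N(0, s) amplitudes is N(0, k*s).
    jumps = rng.normal(0.0, 1.0, size=n - 1) * np.sqrt(s * n_jumps)
    x = np.empty(n)
    x[0] = x0
    for i in range(n - 1):
        x[i + 1] = x[i] - theta * x[i] * dt + sigma * dw[i] + jumps[i]
    return x
```

Setting `lam=0` recovers the plain Ornstein–Uhlenbeck process.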
We have derived an expression for the conditional moments
as a function of the KM coefficients
, given by Equation (
11), which is valid for any Markovian diffusion or jump-diffusion process. For our particular application to the Poissonian jump-diffusion process in Equation (
1) we require at least the first six KM coefficients/first six moments. These are given by
We invert this expression explicitly using Equation (
14) and report on the KM coefficients as functions of the conditional moments, which are given by
We again note that these expressions are valid for any case of diffusion and jump-diffusion processes. In the first case, where there are no jump terms in Equation (
15), i.e., the Ornstein–Uhlenbeck process, we know that all KM coefficients
with
are zero. However, this is not the case when estimating the coefficients from time-series data, i.e., from one realisation of the stochastic process sampled at finite resolution. It is common to find that these terms do not vanish due to finite-time effects. In our second case with a jump-diffusion process, the KM coefficients
with
can be related directly to the jump parameters. These relations were derived in reference [
3], and are given by
where
, for Gaussian distributions with zero mean and variance
s.
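To illustrate how jump parameters follow from higher-order coefficients, recall that a compound Poisson process with rate $\lambda$ contributes $\lambda\langle\xi^n\rangle$ per unit time to the $n$th cumulant, and that a zero-mean Gaussian with variance $s$ has $\langle\xi^4\rangle = 3s^2$ and $\langle\xi^6\rangle = 15s^3$. The sketch below assumes a $1/n!$ normalisation of the KM coefficients, so that $D_4 = \lambda s^2/8$ and $D_6 = \lambda s^3/48$; with a different normalisation the prefactors change. The function name is hypothetical.

```python
def jump_parameters(d4, d6):
    """Jump-amplitude variance s and jump rate lam from D_4 and D_6.

    Assumes D_n = lam * <xi^n> / n! with xi ~ N(0, s), i.e.
    D_4 = lam * s**2 / 8 and D_6 = lam * s**3 / 48.
    """
    s = 6.0 * d6 / d4
    lam = 8.0 * d4 / s**2
    return s, lam
```

In practice the inputs would be state-averaged, finite-time corrected estimates of the fourth- and sixth-order coefficients.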
We will now compare the derived theoretical corrections to KM coefficients estimated from numerically generated time-series data. In Figure 1 and Figure 2, we display the second-, fourth-, and sixth-order KM coefficients $D_2$, $D_4$, and $D_6$ estimated with the first-order, second-order, and full-order approximations given by Equation (16) (or in general Equation (14)). The full-order approximations have the same order as the KM coefficients, i.e., second-, fourth-, and sixth-order approximations for $D_2$, $D_4$, and $D_6$, respectively. For the data shown in Figure 1, we use an Euler–Maruyama scheme to numerically integrate an Ornstein–Uhlenbeck process, Equation (
15) (without the jump terms) with parameters: drift
and diffusion
(
). We numerically integrate this process with a coarse time-step
to deliberately emphasise the finite-time effects on the aforementioned KM coefficients. For example, the estimated second-order KM coefficient
takes a quadratic form, despite the fact that the diffusion term is constant. The KM coefficients
and
are not truly zero, as would be expected for purely diffusive processes [
39,
40], due to the finite-time effects, but the full-order finite-time correction approximates the theoretical values with far greater accuracy.
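The quadratic shape of the uncorrected second-order coefficient can be reproduced without simulation, because the conditional moments of the Ornstein–Uhlenbeck process are known in closed form. The sketch assumes $\mathrm{d}X = -\theta X\,\mathrm{d}t + \sigma\,\mathrm{d}W$ and the normalisation $D_2 = \sigma^2/2$; the parameter values are arbitrary illustrations, not those used for Figure 1.

```python
import numpy as np

theta, sigma, tau = 1.0, 1.0, 0.1          # illustrative values (assumed)
x = np.linspace(-2.0, 2.0, 9)

# Exact conditional moments of the increment X(t+tau) - x given X(t) = x.
decay = np.exp(-theta * tau)
m1 = x * (decay - 1.0)
var = sigma**2 * (1.0 - decay**2) / (2.0 * theta)
m2 = m1**2 + var

d2_first_order = m2 / (2.0 * tau)            # uncorrected: parabola in x
d2_corrected = (m2 - m1**2) / (2.0 * tau)    # second cumulant: flat in x
```

The corrected estimate is constant in x, as expected for constant diffusion. Its remaining offset from $\sigma^2/2$ stems from terms involving the derivative of the drift, which this cumulant-type correction neglects.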
For the data shown in
Figure 2 we follow a similar approach, now augmenting the Ornstein–Uhlenbeck process with Poissonian jumps, i.e., as given in Equation (
15). The parameters are as follows: drift
, diffusion
, jump amplitude with a Gaussian distribution with variance
and zero mean, a Poissonian jump rate
, and a time step
. For this process, we know the higher-order KM coefficients
and
reflect the presence of discontinuous paths. For our particular case of the Poissonian jump-Ornstein–Uhlenbeck process, their explicit relation to the jump parameters is given in Equation (
17) (cf. reference [
3]). For the chosen coarse time step, we notice that the estimations do not correspond exactly with the theoretical values, regardless of the order of finite-time correction chosen. This can likely be traced back to the limitations of the Kramers–Moyal equation to fully capture discontinuous stochastic processes (cf. reference [
41]). Nevertheless, the higher-order finite-time corrections approximate the theoretical values with greater accuracy.
We note here that the parameter estimation from data heavily depends on the number of data points and the sampling rate of numerically simulated or real-world time-series data. Real-world time-series data can often be sampled at higher sampling rates, but not always with a comparably large number of data points. A closer inspection of the limitations of both the sampling rate and the number of data points in parameter estimation is necessary, but falls outside the scope of this publication. Moreover, it should be emphasised that, prior to any examination of time-series data within the purview of either the Fokker–Planck or the Kramers–Moyal equation, the Markov property of the data must be accounted for, i.e., a vanishing memory of the increments of the data. This can be examined, for example, via the Chapman–Kolmogorov equality [
13].
Summarising our findings, we conclude that our proposed arbitrary-order finite-time corrections considerably help in differentiating one-dimensional purely diffusive processes and jump-diffusion processes, as these accurately show that the higher-order KM coefficients $D_4$ and $D_6$ vanish for purely diffusive processes. These arbitrary-order finite-time corrections should now also be considered for N-dimensional stochastic processes. A first examination of the second-order finite-time corrections for two-dimensional processes was recently addressed in reference [22]. Note that the one-dimensional second-order finite-time correction for these KM coefficients was recently addressed in another publication [34]. Here, it is extended to arbitrary order.