$${}_a^{LC}\mathcal{D}_t^{\nu} X(t) = \sigma X(t), \quad \nu \in (0, 1), \qquad X(0) = X_0. \tag{19}$$

Without loss of generality, let us assume that $\sigma < 0$. The numerical scheme for (19) is to find $\mathcal{Z}_0, \mathcal{Z}_1 \in V(r)$ such that

$$\begin{cases} \mathcal{Z}_0(t_l^-)\, w_0(t_l^-) - \mathcal{Z}_0(t_{l-1}^-)\, w_0(t_{l-1}^+) - \left(\mathcal{Z}_1(t),\, w_0(t)\right)_l - \left(\mathcal{Z}_0(t),\, w_0'(t)\right)_l = 0, \\ \left({}_0\mathcal{I}_t^{(1-\nu)}\, \mathcal{Z}_1(t),\, w_1(t)\right)_l = \sigma\left(\mathcal{Z}_0(t),\, w_1(t)\right)_l, \\ \mathcal{Z}_0(t_0^-) - X_0 = 0, \end{cases} \tag{20}$$

for all $w_0, w_1 \in V(r)$ and $l = 1, 2, \ldots, J$. Let us state the next lemma, which is based on the semigroup properties of fractional integral operators and will be used below; a proof can be found in Reference [38].

**Lemma 1.** *Suppose that* $\nu \in (0, 1)$*; then we have*

$$\left\langle {}_0\mathcal{I}_t^{1-\nu} u,\, u \right\rangle_l = \left\langle {}_0\mathcal{I}_t^{\frac{1-\nu}{2}} u,\; {}_t\mathcal{I}_{t_l}^{\frac{1-\nu}{2}} u \right\rangle_l = \cos\!\left(\frac{(1-\nu)\pi}{2}\right) \left\| u \right\|_{H^{\frac{1-\nu}{2}}([0, t_l])}^2.$$

Let us assume that $\widetilde{\mathcal{Z}}_0, \widetilde{\mathcal{Z}}_1 \in V(r)$ are the approximate solutions of $\mathcal{Z}_0, \mathcal{Z}_1$, respectively. Now, the numerical errors are defined as $E_{X_i} := \widetilde{\mathcal{Z}}_i - \mathcal{Z}_i$ for $i = 0, 1$. It can be seen that $\widetilde{\mathcal{Z}}_0$ and $\widetilde{\mathcal{Z}}_1$ both satisfy (20). Subtracting Equation (20) from the same equations written for $\widetilde{\mathcal{Z}}_0$ and $\widetilde{\mathcal{Z}}_1$, we obtain the error equations

$$\begin{cases} E_{X_0}(t_l^-)\, w_0(t_l^-) - E_{X_0}(t_{l-1}^-)\, w_0(t_{l-1}^+) - \left(E_{X_1}(t),\, w_0(t)\right)_l - \left(E_{X_0}(t),\, w_0'(t)\right)_l = 0, \\ -\dfrac{1}{\sigma} \left({}_0\mathcal{I}_t^{(1-\nu)} E_{X_1}(t),\, w_1(t)\right)_l = -\left(E_{X_0}(t),\, w_1(t)\right)_l, \end{cases} \tag{21}$$

which hold for all $w_0, w_1 \in V(r)$. Taking $w_0 = E_{X_0}$ and $w_1 = E_{X_1}$ in (21) and adding the two equations (the cross terms cancel), we conclude that

$$E_{X_0}^2(t_l^-) - E_{X_0}(t_{l-1}^-)\, E_{X_0}(t_{l-1}^+) - \left(E_{X_0}(t),\, E_{X_0}'(t)\right)_l - \frac{1}{\sigma} \left({}_0\mathcal{I}_t^{(1-\nu)} E_{X_1}(t),\, E_{X_1}(t)\right)_l = 0.$$

To deal with the third term, we utilize the identity $\left(u, \frac{du}{dt}\right)_l = \left(u^2(t_l^-) - u^2(t_{l-1}^+)\right)/2$ with $u = E_{X_0}$. Hence, we multiply the preceding equation by two. Adding and subtracting $E_{X_0}^2(t_{l-1}^-)$ in the modified equation and rearranging the terms, we obtain

$$\left(E_{X_0}(t_{l-1}^+) - E_{X_0}(t_{l-1}^-)\right)^2 + \left(E_{X_0}^2(t_l^-) - E_{X_0}^2(t_{l-1}^-)\right) - \frac{2}{\sigma} \left({}_0\mathcal{I}_t^{(1-\nu)} E_{X_1}(t),\, E_{X_1}(t)\right)_l = 0.$$
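The cell identity $\left(u, u'\right)_l = \left(u^2(t_l^-) - u^2(t_{l-1}^+)\right)/2$ used in this step is the fundamental theorem of calculus on a single element; a quick numerical check (illustrative only, with an arbitrarily chosen cubic) confirms it:

```python
import numpy as np

# Verify (u, du/dt)_l = (u^2(t_l^-) - u^2(t_{l-1}^+)) / 2 on one element,
# using an arbitrary cubic polynomial u (coefficients in increasing degree).
u = np.polynomial.Polynomial([0.3, -1.2, 0.5, 2.0])
du = u.deriv()

a, b = 0.25, 0.75              # element endpoints t_{l-1}, t_l
inner = (u * du).integ()       # antiderivative of u * u'
lhs = inner(b) - inner(a)      # (u, u')_l
rhs = (u(b) ** 2 - u(a) ** 2) / 2.0

assert abs(lhs - rhs) < 1e-12
```

Within an element the DG approximation is a polynomial, so the integration here is exact and the two sides agree to machine precision.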

By summing over the elements $l = 1, \ldots, J$, we get

$$E_{X_0}^2(t_J^-) - E_{X_0}^2(t_0^-) + \sum_{l=1}^{J} \left( E_{X_0}(t_{l-1}^+) - E_{X_0}(t_{l-1}^-) \right)^2 - \frac{2}{\sigma} \left\langle {}_0\mathcal{I}_t^{(1-\nu)} E_{X_1}(t),\, E_{X_1}(t) \right\rangle_{\mathbb{J}} = 0.$$

Using Lemma 1, we establish the following stability of the LDG scheme in the $L^\infty$ norm for (20) (see also References [38,40]):

**Lemma 2.** *The following $L^\infty$ stability of the LDG scheme* (20) *holds for the numerical errors:*

$$E_{X_0}^2(t_J^-) = E_{X_0}^2(t_0^-) - \sum_{l=1}^{J} \left( E_{X_0}(t_{l-1}^+) - E_{X_0}(t_{l-1}^-) \right)^2 + \frac{2}{\sigma} \cos\!\left( \frac{(1-\nu)\pi}{2} \right) \left\| E_{X_1} \right\|_{H^{\frac{1-\nu}{2}}([0, t_J])}^2. \tag{22}$$

Since $\sigma < 0$ and $\nu \in (0, 1)$, the last two terms on the right-hand side of (22) are non-positive, so $E_{X_0}^2(t_J^-) \leq E_{X_0}^2(t_0^-)$, which yields the stability.

We close this section by pointing out some facts about the order of convergence of the proposed LDG scheme. In Reference [38] it is shown that the solution can be computed with the optimal order of convergence $(r+1)$ in the $L^2$ norm. The mechanism of superconvergence is also discussed there: the authors observed superconvergence of order $(r+1) + \min\{r, \nu\}$ at the downwind point of each element.

#### **5. Numerical Results and Discussions**

In this section, we present some results of computations using the proposed LDG scheme to test its accuracy and efficiency when applied to the logistic equation. To assess the accuracy of the numerical algorithms, we calculate the difference between the exact and numerical solutions whenever the exact solution is available. For this purpose, we first consider a linear fractional population model and then solve the fractional logistic equation numerically.

In order to assess the numerical scheme more quantitatively, we denote by EOC the estimated order of convergence, defined as

$$\mathrm{EOC} := \log_2\left(\frac{E_a(h)}{E_a(h/2)}\right),$$

where $E_a(h)$ is the absolute error corresponding to the step size $h$. Moreover, to test the validity and accuracy of the proposed LDG method and to compare our numerical results with those of other existing methods, we employ the predictor-corrector (PECE) method of Adams-Bashforth-Moulton type considered in Reference [43], as well as the implicit product integration of trapezoidal type described in Reference [24]. All computations have been carried out in MATLAB R2017a.
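As an illustration of this definition (a small Python sketch with hypothetical error values, not the authors' MATLAB code), the EOC follows directly from errors at successive halvings of $h$:

```python
import math

def eoc(err_h: float, err_h2: float) -> float:
    """Estimated order of convergence from absolute errors at steps h and h/2."""
    return math.log2(err_h / err_h2)

# Hypothetical absolute errors of a second-order scheme at h, h/2, h/4, h/8.
errors = [1.6e-2, 4.1e-3, 1.0e-3, 2.6e-4]
orders = [eoc(e1, e2) for e1, e2 in zip(errors, errors[1:])]
# Each entry should be close to 2 for a second-order method.
```

For a method of order $p$ the error behaves like $C h^p$, so each halving of $h$ reduces it by roughly $2^p$ and the logarithm recovers $p$.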

#### *5.1. Linear Model*

In this section, we consider a linear test problem to show the effectiveness of the proposed LDG approach. For this purpose, we consider the fractional population growth model

$$\begin{cases} {}_a^{LC}\mathcal{D}_t^{\nu} X(t) = \sigma^{\nu} X(t), \quad t > 0, \\ X(0) = X_0, \end{cases} \tag{23}$$

where $0 < \nu \leq 1$ and $\sigma > 0$. This model problem was previously studied in Reference [22] and can be considered as a generalization of the Malthusian model (1) to the fractional-order derivative. With the aid of the Laplace transform, the exact analytical solution of the initial-value problem can be obtained in terms of the well-known Mittag-Leffler function [10]

$$X(t) = X_0\, E_{\nu}(\sigma^{\nu} t^{\nu}), \qquad E_{\nu}(z) = \sum_{k=0}^{\infty} \frac{z^k}{\Gamma(k \nu + 1)}.$$

Note that by taking $\nu = 1$ the exact solution becomes $X(t) = X_0\, e^{\sigma t}$.
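For moderate arguments the Mittag-Leffler series converges rapidly, so the exact solution of (23) can be evaluated by simple truncation; a sketch (the 100-term cutoff is an ad hoc choice, not from the paper):

```python
import math

def mittag_leffler(nu: float, z: float, terms: int = 100) -> float:
    """Truncated series E_nu(z) = sum_k z^k / Gamma(k*nu + 1)."""
    return sum(z ** k / math.gamma(k * nu + 1) for k in range(terms))

# Exact solution X(t) = X0 * E_nu(sigma^nu * t^nu) of the linear model (23).
X0, sigma = 0.75, 1.0

# For nu = 1 the series reduces to the exponential: X(t) = X0 * exp(sigma * t).
X2 = X0 * mittag_leffler(1.0, sigma * 2.0)
assert abs(X2 - 0.75 * math.exp(2.0)) < 1e-12
```

This reduction to the exponential at $\nu = 1$ is exactly the sanity check used in the experiments below.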

To start the computation, we take $\sigma = 1$ for simplicity and set $X_0 = 3/4$. Considering $\nu = 1$ and $J = 1$, the approximate solutions for $r = 3, 6$, and $9$ are computed on the interval $0 \leq t \leq 2$.


*Entropy* **2020**, *22*, 1328

These approximations together with the corresponding absolute errors are depicted in Figure 1. Clearly, as $r$ increases, more accurate results are obtained. Note that, in all cases, the step size is taken as $h = 2$. Moreover, we emphasize that numerical solutions for this model problem based on the fractional spline collocation scheme were proposed in Reference [22], with achieved absolute errors larger than $1 \times 10^{-4}$ (see Figure 2 of that paper). The parameters used in that approach for $\nu = 1$ were $M_1 = 26, 27, 28$ and $N_1 = 37, 69, 133$, which are considerably larger than the parameters used here.

**Figure 1.** The LDG approximations with the exact solutions (**left**) and the corresponding absolute errors (**right**) for $J = 1$, $\nu = 1$, $\sigma = 1$, $X_0 = 0.75$, and different $r = 3, 6, 9$.

Additionally, to justify our numerical results, a comparison is performed in Table 1 between our method and previous work on PECE [15,43] in terms of the number of (sub)intervals $J$ used in the computation. In this comparison, we compute the numerical approximations of $X(2)$ as well as the absolute errors $|X(2) - \mathcal{Z}_0(2)|$ for the different values $J = 2^i$, $i = 0, 1, \ldots, 7$. For our LDG method we take $r = 2$ and $\nu = 1$. The last column for each method reports the corresponding EOC. The exact value of $X(2)$ to 30 digits is

$$X(2) = 5.54179207419798736111715697916\ldots$$


**Table 1.** Comparison of absolute errors in LDG with $r = 2$ and PECE for different numbers of intervals $J$ and $\nu = 1$. Numbers in bold show the correct digits obtained by the LDG.

The EOC observed for PECE in Table 1 is approximately 2, as was proved in Reference [43]. However, the superconvergence EOC of about 5 ($\approx 2r + 1$) is clearly achieved in our results. This comparison indicates the accuracy of the proposed method.

The numerical solutions for various values of $\nu = 0.65, 0.75, 0.85, 0.95$ using $r = 5$ and $J = 1$ are depicted in Figure 2, left plot. In all plots, the exact solutions are indicated by a solid line while their numerical counterparts are visualized by (coloured) dotted, dashed, and dash-dotted curves. Note that the computational domain is $[0, 1]$, which implies that the time step is $h = 1$. It can be seen from Figure 2 that the numerical solution obtained by the present LDG scheme has good accuracy even with a relatively large time step and a low degree of the approximating polynomials. Furthermore, an appropriate choice of these computational parameters can improve the approximation accuracy.

**Figure 2.** The LDG approximations with the exact solutions (**left**) and the corresponding absolute errors (**right**) for $J = 1$, $r = 5$, $\sigma = 1$, $X_0 = 0.75$, and various values of $\nu = 0.65, 0.75, 0.85, 0.95$.

Finally, for the linear model problem (23), we investigate the standard L1 approximation method [44] and its variant known as the fast L1 method [45]. To implement these approaches, we use a uniform mesh with the step size $h = 1/1000$ on the interval $[0, 1]$. In the LDG scheme, we utilize $J = 1$ (i.e., $h = 1$) and $r = 5$, as in the results shown in Figure 2. The numerical results are presented in Table 2 for $\nu = 0.75$ and $\nu = 0.5$. For each $\nu$, the corresponding exact solutions are also reported in the last column.

**Table 2.** Comparison of numerical solutions in LDG with $r = 5$, $h = 1$ and L1/fast L1 schemes with $h = 10^{-3}$ for some $t \in [0, 1]$ and $\nu = 0.75, 0.5$.


#### *5.2. Nonlinear Model*

We now consider the FLE (3) on $[0, 1]$ with the initial condition given by $X_0 = 1/2$ and the parameter $\sigma = 1/2$. For $\nu = 1$, the analytical exact solution of the logistic equation is given by

$$X(t) = \frac{1}{1 + e^{-t/2}}.$$
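A quick sanity check (a Python snippet for illustration, not part of the original computations) confirms that this $X(t)$ satisfies the logistic equation $X' = \sigma X(1 - X)$ for $\nu = 1$ and reproduces the reference value $X(1)$ used in the tables below:

```python
import math

sigma, X0 = 0.5, 0.5

def X(t: float) -> float:
    # Exact solution of the classical (nu = 1) logistic equation with X0 = 1/2.
    return 1.0 / (1.0 + math.exp(-t / 2.0))

# Check X'(t) = sigma * X * (1 - X) with a central difference.
t, h = 0.7, 1e-6
dXdt = (X(t + h) - X(t - h)) / (2.0 * h)
assert abs(dXdt - sigma * X(t) * (1.0 - X(t))) < 1e-8

# Reference value quoted in the comparison tables.
assert abs(X(1.0) - 0.622459331201855) < 1e-14
```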

The simulation results for this example can be found in Figures 3 and 4 for $J = 5$ elements and polynomial degree $r = 2$. In Figure 3, we take $\nu = 1$ to compare the numerical results with the exact solution. Furthermore, we use different approaches to treat the nonlinear term in the weak formulation, namely the D.C. and P.A. techniques, which are utilized to compute $n_j^{DC}$ and $n_{i,j}^{PA}$ in (15). As one can see from Figure 3, a slightly more accurate result is obtained by direct computation rather than product approximation; however, as mentioned, the former is more time-consuming. In order to observe the behaviour of the numerical solutions more closely, a magnification of these solutions at $t = 0.4$ is shown in Figure 3. The exact solution is depicted by a solid line.
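The difference between the two treatments of the nonlinearity can be illustrated on a single element (a simplified sketch, not the authors' implementation): D.C. integrates $f(\mathcal{Z}(t)) = \mathcal{Z}(1-\mathcal{Z})$ directly by quadrature, while P.A. first interpolates the nodal values $f(\mathcal{Z}(t_j))$ by a degree-$r$ polynomial and integrates the interpolant; the node locations and the quadratic approximant below are arbitrary choices for the demonstration.

```python
import numpy as np

# One element [0, 1]; Z is a quadratic (degree r = 2) polynomial approximant.
Z = np.polynomial.Polynomial([0.5, 0.1, 0.05])
f = lambda z: z * (1.0 - z)                  # logistic nonlinearity

# Direct computation (D.C.): integrate f(Z(t)) with 10-point Gauss quadrature
# (exact here, since f(Z) has degree 4).
x, w = np.polynomial.legendre.leggauss(10)
t = 0.5 * (x + 1.0)                          # map nodes from [-1, 1] to [0, 1]
dc = 0.5 * np.sum(w * f(Z(t)))

# Product approximation (P.A.): interpolate f(Z) at r + 1 = 3 nodes, integrate.
nodes = np.array([0.0, 0.5, 1.0])
interp = np.polynomial.Polynomial(np.polyfit(nodes, f(Z(nodes)), 2)[::-1])
pa = interp.integ()(1.0) - interp.integ()(0.0)

# P.A. commits an extra interpolation error: small, but nonzero.
assert 1e-6 < abs(dc - pa) < 1e-4
```

This extra interpolation error is what degrades the EOC of P.A. relative to D.C. in the tables that follow, at the benefit of a cheaper assembly of the nonlinear term.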

**Figure 3.** Numerical solutions of the LDG scheme using P.A. and D.C. approaches with $h = 0.2$, $\sigma = 0.5$, $X_0 = 0.5$, and $\nu = 1.0$. The magnification of the solutions at time $t = 0.4$ is plotted in the box. The exact solution is displayed by a solid line.

In the next experiment, we plot the absolute errors obtained with the two approaches, D.C. and P.A., as shown in Figure 4. The computational parameters are the same as those applied in Figure 3. In Figure 4, the left plot corresponds to D.C. and the right plot to the P.A. technique. Note that in all plots we have further divided each interval $L_l$ uniformly into ten subintervals to see the behaviour of the corresponding curves more precisely.

**Figure 4.** Absolute errors of LDG versus time using D.C. (**left**) and P.A. (**right**) approaches with $h = 0.2$, $\sigma = 0.5$, $X_0 = 0.5$, $\nu = 1.0$, and $r = 2$. In the left and right plots, the upwind and downwind points are highlighted by black pentagons.

Let us interpret the numerical errors depicted in Figure 4. In the right picture, in which the P.A. technique is used, the smallest errors are obtained at upwind points, and errors of almost the same magnitude are achieved at downwind points. On the contrary, in the left picture, without the P.A., this behaviour is reversed: the minimum absolute errors are achieved at downwind points, and there is a considerable difference between them and the errors obtained at upwind points in each $L_l$. In the next experiments, we compare the numerical errors achieved at the final point $T = 1.0$, which is clearly a downwind point.

In Tables 3 and 4, we summarize the numerical results for $X(1)$ and its numerical approximation $\mathcal{Z}_0(1)$ obtained by the LDG procedure (9). Here, we use $r = 1, 2$, and different numbers of grid points $J = 1, 2, 4, 8$, and $16$ are utilized. In these tables, we further compare the performance of the two approaches, D.C. and P.A. All calculations are shown with 10 decimal places of accuracy. In the last column of each table, the estimated order of convergence (EOC) is given. The exact value is $X(1) = 0.622459331201855$.

**Table 3.** Comparison of absolute errors in LDG with $r = 1$ using P.A. and D.C. for different numbers of intervals $J$ and $\nu = 1$. Numbers in bold show the correct digits obtained by the LDG.


**Table 4.** Comparison of absolute errors in LDG with $r = 2$ using P.A. and D.C. for different numbers of intervals $J$ and $\nu = 1$. Numbers in bold show the correct digits obtained by the LDG.


It can be seen from Tables 3 and 4 that using $r = 1$ and $r = 2$ in the D.C. approach, the results are accurate to 6 and 10 decimal places, respectively, with only $J = 4$ intervals. In other words, orders of accuracy equal to 3 and 5 are attainable if one uses the LDG scheme with polynomials of degree $r = 1, 2$ and a small number of elements. These EOCs also confirm the superconvergence order at downwind points previously reported in Reference [38]. Note that by utilizing the P.A. technique, the obtained EOC is equal to 2. We emphasize also that using the PECE scheme for the nonlinear logistic equation, an EOC of at most 2 is achieved, and of course a larger number of intervals $J$ is required. In the next plot, we examine the behaviour of the absolute errors on a log scale for various polynomial degrees as well as with respect to the number of elements $J$; see Figure 5.

**Figure 5.** Absolute errors versus polynomial degree $r$ for $J = 1, 2, 4$ (**left**) and against the number of elements $J$ for $r = 0, 1, 2, 3$ (**right**), evaluated at $T = 1.0$ and for $\nu = 1$.

In the next experiment, we show the impact of the fractional derivative on the computed solutions. In Figure 6 we present the approximate solutions for $J = 4$, $r = 3$ with different values of the fractional order $\nu = 0.65, 0.75, 0.85, 0.95$ as well as $\nu = 1.0$. In these plots, we also compare the performance of the P.A. and D.C. approaches for these values of $\nu$. In each case, for $\nu = 1.0$ the exact solution is also shown by a solid line. To justify our computed results, the implicit product-integration of trapezoidal (IPIT) rule with the step size $h = 1/256$ is used [24].

From both depictions in Figure 6, one can observe that the numerical solutions for $\nu \in (0, 1)$ approach the solution corresponding to $\nu = 1$, for which the exact solution is known. Of course, more reliable results are obtained through the D.C., as previously tested for $\nu = 1$ in Tables 3 and 4.

**Figure 6.** The approximated LDG solutions versus time using P.A. (**left**) and D.C. (**right**) approaches with $J = 4$, $r = 3$, $\sigma = 0.5$, $X_0 = 0.5$, and various values of $\nu = 0.65, 0.75, 0.85, 0.95, 1.0$.

#### **6. Conclusions**

In this work, an approximation algorithm based on the LDG scheme is developed for the fractional-order logistic equation occurring in many biological and chemical phenomena. To be more precise, our numerical scheme, based on the discontinuous Galerkin finite element concept with Legendre basis functions, yields a set of nonlinear equations to be solved on each subinterval. The numerical stability in the linear case is proved and the order of convergence is also discussed. Besides the direct computation of the nonlinear term, the technique of product approximation is also utilized, and their performance is compared for various $J$, $r$, and $\nu$. We have tested the performance of the LDG scheme on linear as well as nonlinear growth and logistic differential equations of fractional order. Comparing our numerical results with the PECE indicates that the present approaches produce accurate approximations for the underlying model problems.

**Author Contributions:** Conceptualization, M.I. and H.M.S.; Methodology, H.M.S.; Software, M.I.; Validation, H.M.S.; Writing—original draft, M.I.; Writing—review & editing, M.I. and H.M.S. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Telegraphic Transport Processes and Their Fractional Generalization: A Review and Some Extensions**

**Jaume Masoliver**

Department of Condensed Matter Physics and Complex Systems Institute (UBICS), University of Barcelona, 08007 Barcelona, Catalonia, Spain; jaume.masoliver@ub.edu

**Abstract:** We address the problem of telegraphic transport in several dimensions. We review the derivation of the two- and three-dimensional telegrapher's equations, as well as their fractional generalizations, from microscopic random walk models for transport (normal and anomalous). We also present new results on solutions of the higher dimensional fractional equations.

**Keywords:** telegrapher's equations; fractional telegrapher's equation; continuous time random walk; transport problems

**PACS:** 02.50.Ey; 05.40.Fb; 05.40.Jc; 05.60.Cd

#### **1. Introduction**

In many physical situations, particle transport through continuous media is described by transport equations which are typically derived from general physical principles, for instance, the conservation of energy and momentum [1]. Classical cases are provided by the transport of neutrons in a reactor or photon transport in a highly scattering medium [2]. In their most general form, transport equations (one of the first and most paradigmatic examples is the Boltzmann equation) are nonlinear integro-differential equations, often with an incompletely known scattering kernel [1,2]. It is therefore very difficult, not to say impossible, to attain exact analytical solutions of the problem, and even obtaining numerical solutions is not an easy task. Moreover, numerical solutions might not reproduce, or even detect, important qualitative characteristics of the transport process [2].

These difficulties have traditionally led to the search for simpler and easier-to-handle approximations. One of the most universal approximations is modeling the transport process by diffusion processes. Such an approximation greatly simplifies the description of the transport process because, in the absence of any field driving the particle, the usually complicated transport equation reduces to the much simpler diffusion equation:

$$\frac{\partial p}{\partial t} = D \nabla^2 p, \tag{1}$$

where *p*(**r**, *t*) is the probability density function (PDF) for the diffusing particle to be at **r** at time *t* and *D* is the diffusion coefficient.

Diffusion processes have two major characteristics: (i) the mean square deviation grows linearly with time,

$$
\langle |\Delta \mathbf{r}(t)|^2 \rangle = Dt,\tag{2}
$$

where Δ**r**(*t*) = **r**(*t*) − **r**(0); (ii) the PDF is Gaussian. Indeed, the solution to Equation (1), assuming the particle is initially at the origin, *p*(**r**, 0) = *δ*(**r**), is

$$p(\mathbf{r},t) = \frac{1}{(4\pi Dt)^{3/2}} e^{-r^2/4Dt}.\tag{3}$$
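Both characteristics can be checked directly from (3); the following snippet (an illustration, not from the review) integrates $r^2$ against the Gaussian PDF and recovers a mean square displacement linear in $t$, here $6Dt$ in three dimensions (Equation (2) absorbs such dimensional prefactors into the linear-in-$t$ statement):

```python
import numpy as np

D, t = 1.0, 0.5

# Radial check of the 3D Gaussian solution (3):
# <r^2> = int_0^inf r^2 p(r, t) 4 pi r^2 dr, which equals 6 D t.
r = np.linspace(0.0, 20.0, 200001)
p = (4.0 * np.pi * D * t) ** -1.5 * np.exp(-(r ** 2) / (4.0 * D * t))
integrand = r ** 2 * p * 4.0 * np.pi * r ** 2

# Trapezoidal rule over the radial grid (the tail beyond r = 20 is negligible).
msd = float(np.sum((integrand[:-1] + integrand[1:]) / 2.0) * (r[1] - r[0]))

assert abs(msd - 6.0 * D * t) < 1e-6
```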

**Citation:** Masoliver, J. Telegraphic Transport Processes and Their Fractional Generalization: A Review and Some Extensions. *Entropy* **2021**, *23*, 364. https://doi.org/10.3390/ e23030364

Academic Editor: Bruce J. West

Received: 22 January 2021 Accepted: 12 March 2021 Published: 18 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Despite its simplicity and its wide range of applications in countless areas of the physical sciences, the diffusion approximation has, however, several limitations. We will here point out two of them. First, diffusion processes present an infinite velocity of propagation. This can be easily seen from Equation (3), which shows that the solution *p*(**r**, *t*) > 0 never vanishes for any finite time and distance *r* = |**r**|. There is, therefore, a nonzero probability (albeit small) of finding the diffusive particle, at any instant of time, arbitrarily far away from the initial position. In consequence, diffusion models allow for arbitrary velocities, even larger than the speed of light in vacuum. This is contrary to the principles of relativity and certainly unsatisfactory from a conceptual point of view [3].

On the other hand, diffusion processes are also unable to account for ballistic motion and are rather useless in describing early-time effects, when ballistic motion may be important, as well as near interfaces and in thin samples. This is certainly the case when modeling transport phenomena for which thermalization due to random collisions takes a finite time and the flux of ballistic particles might not be negligible, all of it resulting in anisotropic scattering along the forward direction. A particular but significant case is that of photon migration through turbid media, in which diffusion models are unable to account for ballistic photons and are inaccurate near boundaries [2,4–6]. A similar situation may arise in transport across membranes [7].

Telegraphic processes are a generalized form of diffusion processes in these two aspects. Thus (i) they allow for a finite velocity of propagation and (ii) they are nearly deterministic (i.e., ballistic) at short times while they are diffusive at long times when random collisions have been able to thermalize the motion. As a first approximation, the transport equation for telegraphic processes is the telegrapher's equation (TE):

$$\frac{\partial^2 p}{\partial t^2} + \frac{1}{\tau} \frac{\partial p}{\partial t} = v^2 \nabla^2 p, \tag{4}$$

where *τ* > 0 is a characteristic time, and *v* > 0 is a characteristic speed. From a mathematical point of view, this is a hyperbolic equation which, as *τ* → ∞ with *v* fixed, becomes the wave equation,

$$\frac{\partial^2 p}{\partial t^2} = v^2 \nabla^2 p, \tag{5}$$

while as *τ* → 0 and *v* → ∞ such that *v*<sup>2</sup>*τ* → *D* is finite, it reduces to the diffusion Equation (1). The telegrapher's equation thus possesses wave and diffusion features, describing "diffusion with finite propagation velocity" but also "wave motion with damping" [8]. Moreover, the limits to the diffusion and wave equations are also achieved as time progresses. By scaling time with *τ* we can easily see that initially, as *t* → 0 (i.e., *t* ≪ *τ*), the TE approaches the wave equation, while asymptotically, as *t* → ∞ (*t* ≫ *τ*), it moves toward the diffusion equation. As a consequence [2,8]

$$
\langle |\Delta \mathbf{r}(t)|^2 \rangle \sim t^2, \ (t \to 0) \qquad \text{and} \qquad \langle |\Delta \mathbf{r}(t)|^2 \rangle \sim t, \ (t \to \infty),
$$

which heightens the duality of the TE and shows the transition from ballistic motion to diffusive motion as time progresses.
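This crossover can be seen in a Monte Carlo simulation of the one-dimensional persistent random walk (a sketch with arbitrarily chosen parameters; the TE with characteristic time *τ* corresponds to a velocity-flip rate 1/2*τ*, giving the exact mean square displacement 2*v*²[*τt* − *τ*²(1 − e<sup>−*t*/*τ*</sup>)], which is ~*v*²*t*² for *t* ≪ *τ* and ~2*v*²*τt* for *t* ≫ *τ*):

```python
import numpy as np

rng = np.random.default_rng(0)
v, tau = 1.0, 1.0
n, dt, steps = 20000, 0.01, 1000             # particles, time step, steps (t = 10)

vel = v * rng.choice([-1.0, 1.0], size=n)    # dichotomous velocities +-v
x = np.zeros(n)
for _ in range(steps):
    x += vel * dt
    flip = rng.random(n) < dt / (2.0 * tau)  # velocity-flip rate 1/(2 tau)
    vel[flip] = -vel[flip]

# Compare the simulated MSD with the exact telegraphic expression.
t = steps * dt
exact = 2.0 * v ** 2 * (tau * t - tau ** 2 * (1.0 - np.exp(-t / tau)))
assert abs(x.var() / exact - 1.0) < 0.1
```

At *t* = 10*τ* the walk is already close to the diffusive regime with effective coefficient *D* = *v*²*τ*, consistent with the fluid limit of the TE mentioned above.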

The TE appeared in the nineteenth century in the works of Kelvin and Heaviside related to the analysis of transmission of electromagnetic waves in telegraphic wires. In this context, the three dimensional telegrapher's equation can be derived by combining Maxwell's equations for homogeneous media [2,8]. TE can also be phenomenologically derived from thermodynamics by using Cattaneo's equation, a nonlocal generalization of Fick's law accounting for non instantaneous diffusions [9–11], and also from random walk theory where the one-dimensional TE is the master equation of the persistent random walk [12–15].

From a mesoscopic point of view (somewhere between the microscopic view of random walk models and the macroscopic approach of thermodynamics), telegraphic processes are closely related to Brownian motion. As was studied some years ago in Ref. [16], the telegrapher's equation, like the diffusion equation, can also be derived from the Chapman-Kolmogorov equation, which is the master equation for Markovian processes [17]. It is worth noticing that such a derivation is obtained by retaining quadratic terms in the time expansion of the Chapman-Kolmogorov equation, which sets a characteristic time scale and a characteristic velocity. The Markovian character of the process is assured for times greater than the characteristic time, while a possible non-Markovian character for smaller times is still an unsettled question [16].

In the context of transport theory, the three-dimensional TE is the so-called P1 approximation to the full transport equation for which the basic assumption is that the change in the direction of motion due to a single scattering event is small [1,2,18,19]. In a more recent approach [20] a three-dimensional TE model is obtained by a modification of the continuity equation for the probability current. The model is, however, limited to a discrete number of transport directions, which restricts possible applications. Other approaches suppose phenomenological generalizations, where a three dimensional TE is postulated for uniform isotropic media by assuming the same form as the one-dimensional TE, but with numerical corrections in the coefficients which guarantee correct ballistic (*t* → 0) and diffusive (*t* → ∞) behaviors in three dimensions [4–6]. The more fundamental and less phenomenological way of describing telegraphic processes is, however, based on random walk models since they try to reproduce the microscopic mechanism of transport.

Random walk models for describing telegraphic processes are modifications of the ordinary random walk because the latter, for long times and large distances (i.e., the so-called "fluid limit" [21]) leads to the diffusion equation but not to the telegrapher's equation [2,8,22]. However, and contrary to one dimension where the TE is readily obtained from the persistent random walk on the line [2,12,14], in higher dimensions obtaining the TE from microscopic models encounters serious difficulties. The main reason lies in the difficulty of generalizing persistence in dimensions greater than one [23–29].

We have recently solved this problem by obtaining the three-dimensional TE [30] and the two-dimensional TE [31] from random walk models (as we had done previously for the one-dimensional case [32]). These models consist of a continuous version of two- and three-dimensional random walks with a continuum of states [33]. I will here review and extend these works.

For more than two decades, the so-called "anomalous transport" and "anomalous diffusion" have been the object of intense research with countless applications in many areas of physics, chemistry and natural and socio-economic sciences. There is an immense literature on the subject with many complete reports. As a necessarily short sample we may cite from early reviews in [34–39] to more recent reports [40–42] among many others. It is also worthwhile mentioning a less technical but excellent introduction in [43]. The concept first appeared from the theory of random processes, specifically within continuous time random walks, a powerful technique developed by Montroll and Weiss more than 50 years ago [22,44,45] (see a recent and updated review in Ref. [46]) and it was first applied to diffusion of charge carriers in organic semiconductors by Scher and Montroll in the 1970's [47,48].

Anomalous transport arises in motion through extremely disordered systems such as random media and fractal structures [49] and its most distinctive characteristic is that the mean square deviation follows the asymptotic law [35,36,50]

$$
\langle |\Delta \mathbf{r}(t)|^2 \rangle \sim t^{\alpha},\tag{6}
$$

(*t* → ∞), where *α* > 0 is any positive real number. When 0 < *α* < 1 the transport regime is subdiffusive, *α* = 1 corresponds to diffusive transport while *α* > 1 describes superdiffusion. Within the diffusive approximation and in the force-free case, the anomalous transport process is described by a fractional diffusion equation,

$$\frac{\partial^{\alpha} p}{\partial t^{\alpha}} = D \nabla^{2\gamma} p \tag{7}$$

(0 < *α* ≤ 1, 0 < *γ* ≤ 1), where $\partial^{\alpha}/\partial t^{\alpha}$ is the fractional Caputo derivative and $\nabla^{2\gamma}$ is the Riesz–Feller fractional Laplacian (see Section 5.2 for a definition of these operators). In the case of particles diffusing under the influence of an external field of force, Equation (7) is replaced by a fractional Fokker–Planck equation [36,38].

The mathematical properties of the solutions to the fractional diffusion Equation (7) have been thoroughly studied and very clearly exposed by Mainardi, Gorenflo and collaborators [51–53]. One of these properties is the scaling relation [21,35,53]

$$p(\mathbf{r}, t) = t^{-\alpha/2\gamma} f\left(\frac{\mathbf{r}}{t^{\alpha/2\gamma}}\right),\tag{8}$$

resulting in the mean square displacement [21]:

$$
\langle r^2(t) \rangle = M t^{\alpha/\gamma}.\tag{9}
$$
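In one dimension, the step from the scaling form (8) to the moment law (9) is a single change of variables $u = r/t^{\alpha/2\gamma}$ (a sketch; in $d$ dimensions the prefactor in (8) becomes $t^{-d\alpha/2\gamma}$ and the same power of $t$ results):

```latex
\langle r^2(t)\rangle
 = \int_{-\infty}^{\infty} r^2\, t^{-\alpha/2\gamma}
   f\!\left(\frac{r}{t^{\alpha/2\gamma}}\right) dr
 = t^{\alpha/\gamma} \underbrace{\int_{-\infty}^{\infty} u^2 f(u)\, du}_{=:\,M}
 = M\, t^{\alpha/\gamma}.
```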

When *γ* = 1 but *α* is not an integer, we have "time-fractional diffusion"; the case 0 < *α* < 1 corresponds to subdiffusion while *α* > 1 to superdiffusion. When *α* = 1 but *γ* is not an integer, the fractional diffusion Equation (7) describes a Lévy process; this case is always associated with superdiffusion and is termed "space-fractional diffusion" [21,38].

As mentioned above, the original motivation for fractional transport came from the continuous time random walk formalism [47,48]. As a result, derivations of the fractional diffusion equation are mostly based on this formalism, although alternative approaches exist based on master equations or (fractional) Chapman–Kolmogorov expansions [36].

The fractional Equation (7) ignores changes in the dynamics of the diffusing particle as time increases. These changes account for ballistic motion and anisotropic scattering (among others) that are relevant in a number of experimental settings [54]. The TE captures some of these characteristics of transport, which imply the transition from ballistic to diffusive motion asymptotically in time.

In a recent work [32] we have presented a derivation of the fractional telegrapher's equation (FTE) in one dimension based on a fractional generalization of the persistent random walk on the line. The continuous multistate model mentioned above allows for a fractional treatment which finally leads to fractional TEs in higher dimensions [30,31].

In this paper we review all these questions and present some new results. The paper is organized as follows. In Section 2 we present the continuous multistate random walk in three dimensions, which in homogeneous and isotropic cases, allows us to derive the three-dimensional telegrapher's equation (Section 3). In Section 4 we adapt the model to two dimensions and derive the corresponding telegrapher's equation. The rest of the paper is devoted to the fractional generalization of these matters. In Section 5 we set the general model for fractional telegraphic transport and obtain the space-time fractional telegrapher's equation in two and three dimensions along with the exact expression for the characteristic function. In Section 6 we study in detail the time-fractional telegrapher's equation, analyze its solution for any dimensionality, and obtain asymptotic results for the probability distribution and the moments of the distance travelled. Concluding remarks are presented in Section 7.

#### **2. Continuous Multistate Random Walk in Three Dimensions**

We review the microscopic model introduced in Ref. [30] for the transport of particles in continuous media. The model is based on a generalization of multistate random walks and assumes a continuum in the number of states [33]. In the traditional formulation of multistate random walks (see [15] for a recent review on multistate walks on the line) the walker can be in a discrete (but not necessarily finite) number of internal states. The transition between states is determined by a transition matrix with random Markovian elements. In order to model particle transport we will generalize the multistate random walk in two key features: (i) we assume that the walker (i.e., the particle) moves in three dimensions, and (ii) the model has internal states defined on a continuous set of values.

#### *2.1. General Setting*

Suppose a particle moving in three dimensional space along a straight line determined by the bidimensional quantity **Ω** = (*θ*, *ϕ*), where *θ* is the polar angle and *ϕ* is the azimuthal angle. The particular direction along which the particle is moving constitutes the "internal state" and, since all possible directions form a continuum, the motion of the particle is thus described by a *continuous multistate random walk*.

At random instants of time the particle shifts direction and, hence, the duration of the motion along a given direction **Ω** (which is called a *sojourn*) is a random variable determined by a PDF denoted by *ψ*(*t*|**Ω**). The cumulative distribution

$$\Psi(t|\Omega) = \int\_{t}^{\infty} \psi(t'|\Omega)dt',\tag{10}$$

gives the probability that the duration of a given sojourn is greater than *t*.

Let us denote by *h*(**r**, *t*|**Ω**) the joint PDF for the displacement in a single sojourn along direction **Ω** to be equal to **r** and the sojourn duration to equal *t*. Let us also define *H*(**r**, *t*|**Ω**) as the probability density for the displacement to be **r** when the duration is greater than *t*. Note that the duration PDF *ψ*(*t*|**Ω**) is the time marginal density of *h*(**r**, *t*|**Ω**),

$$\int\_{\mathbb{R}^3} h(\mathbf{r}, t | \Omega) d^3 \mathbf{r} = \psi(t | \Omega),\tag{11}$$

while Ψ(*t*|**Ω**) is the marginal probability arising from *H*(**r**, *t*|**Ω**),

$$\int\_{\mathbb{R}^3} H(\mathbf{r}, t | \Omega) d^3 \mathbf{r} = \Psi(t | \Omega). \tag{12}$$

At the end of a given sojourn, the particle moving along direction **Ω**′ switches to direction **Ω**. We denote by *β*(**Ω**|**Ω**′) the PDF of this transition **Ω**′ → **Ω** (note that *β*(**Ω**|**Ω**′) is the "scattering kernel" of the transport problem). In other words, the probability that a single scattering changes the direction of the particle from **Ω**′ to a direction falling somewhere inside the angular region (**Ω**, **Ω** + *d***Ω**) is given by

$$\text{Prob}\{\boldsymbol{\Omega}^{\prime} \to (\boldsymbol{\Omega}, \boldsymbol{\Omega} + d\boldsymbol{\Omega})\} = \beta(\boldsymbol{\Omega}|\boldsymbol{\Omega}^{\prime})d^{2}\boldsymbol{\Omega},\tag{13}$$

where *d***Ω** = (*dθ*, *dϕ*) and

$$d^2\Omega = \sin\theta d\theta d\varphi \tag{14}$$

is the surface element on the sphere of unit radius.

Note that in this model there is a nonvanishing probability of traveling along the same direction and in those cases where this probability is greater than 1/2 the particle tends to persist in moving along the same direction. In this way the model can be seen as a higher dimensional generalization of the persistent random walk on the line [22].

Let us denote by *p*(**r**, **Ω**, *t*) the joint PDF for the walker to be at **r** at time *t* while moving in direction **Ω**. Our final objective is, however, to know the density *p*(**r**, *t*) for the random walker to be at **r** at time *t* regardless of the direction. The latter is the marginal density of the former,

$$p(\mathbf{r},t) = \int p(\mathbf{r},\Omega,t)d^2\Omega. \tag{15}$$

In order to evaluate *p*(**r**, **Ω**, *t*) we define the auxiliary density *ρ*(**r**, **Ω**, *t*) as

$$\rho(\mathbf{r}, \mathbf{\Omega}, t)\, d^3\mathbf{r}\, dt = \text{Prob}\{\text{a sojourn in direction } \mathbf{\Omega} \text{ ends in the region } (\mathbf{r}, \mathbf{r} + d\mathbf{r}) \text{ at } (t, t + dt)\}.$$

This joint density describes the state of the process at the scattering points where the direction of the particle changes. Thus, if a scattering event happens at time *t*, it must either be the first one (assuming the initial one occurred at *t* = 0) or else an earlier change of direction **Ω**′ → **Ω** [governed by *β*(**Ω**|**Ω**′)] happened at an earlier time *t*′ < *t* with the random walker at some position **r**′. It is not difficult to convince oneself that this renewal argument leads to the following integral equation for the auxiliary density:

$$\begin{split} \rho(\mathbf{r}, \mathbf{\Omega}, t) &= \beta(\mathbf{\Omega})\, h(\mathbf{r}, t | \mathbf{\Omega}) \\ &\quad + \int \beta(\mathbf{\Omega} | \mathbf{\Omega}') d^2 \mathbf{\Omega}' \int\_0^t dt' \int\_{\mathbb{R}^3} h(\mathbf{r} - \mathbf{r}', t - t' | \mathbf{\Omega}) \rho(\mathbf{r}', \mathbf{\Omega}', t') d^3 \mathbf{r}', \end{split} \tag{16}$$

where *β*(**Ω**) is the probability that the process starts moving in direction **Ω**.

In terms of the auxiliary density *ρ*(**r**, **Ω**, *t*), the PDF *p*(**r**, **Ω**, *t*) for the walker to be at **r** at time *t* while moving in direction **Ω** is

$$\begin{split} p(\mathbf{r},\boldsymbol{\Omega},t) &= \beta(\boldsymbol{\Omega})\, H(\mathbf{r}, t | \boldsymbol{\Omega}) \\ &\quad + \int \beta(\boldsymbol{\Omega}|\boldsymbol{\Omega}')d^2\boldsymbol{\Omega}' \int\_0^t dt' \int\_{\mathbb{R}^3} H(\mathbf{r}-\mathbf{r}',t-t'|\boldsymbol{\Omega})\rho(\mathbf{r}',\boldsymbol{\Omega}',t')d^3\mathbf{r}'. \end{split} \tag{17}$$

The reasoning behind this equation is similar to the one given for obtaining Equation (16). Indeed, the displacement of the walker either occurs within the first sojourn, which is given by *βH*, or else an earlier change of direction occurred at time *t*′ < *t* while the walker was at position **r**′ and the time interval to the next scattering exceeded *t* − *t*′.

We thus see that in the most general case the solution to the problem, that is, the PDF *p*(**r**, *t*) (cf. Equation (15)), is obtained by first solving the integral Equation (16) for the auxiliary function *ρ*, then substituting this solution into Equation (17) and the result into Equation (15). In the most general case, for arbitrary forms of *β*(**Ω**|**Ω**′), *h*(**r**, *t*|**Ω**) and *H*(**r**, *t*|**Ω**), obtaining analytical expressions is out of reach, and one has to resort to numerical work.

#### *2.2. Independent Scattering*

In order to proceed further we assume that after each scattering the direction is randomized independently of the previous direction of the particle leading to the scattering kernel:

$$
\beta(\Omega|\Omega') = \beta(\Omega). \tag{18}
$$

The scattering process is thus an independent random process in the change of direction. In the context of fluctuations in laser fields this model corresponds to the so-called Burshtein model [55,56].

When the scattering kernel has the form given by Equation (18), Equations (16) and (17) reduce to

$$\rho(\mathbf{r}, \Omega, t) = \beta(\Omega) \left[ h(\mathbf{r}, t | \Omega) + \int\_0^t dt' \int\_{\mathbb{R}^3} h(\mathbf{r} - \mathbf{r}', t - t' | \Omega) d^3 \mathbf{r}' \int \rho(\mathbf{r}', \Omega', t') d^2 \Omega' \right], \tag{19}$$

and

$$p(\mathbf{r}, \mathbf{\Omega}, t) = \beta(\mathbf{\Omega}) \left[ H(\mathbf{r}, t | \mathbf{\Omega}) + \int\_0^t dt' \int\_{\mathbb{R}^3} H(\mathbf{r} - \mathbf{r}', t - t' | \mathbf{\Omega}) d^3 \mathbf{r}' \int \rho(\mathbf{r}', \mathbf{\Omega}', t') d^2 \mathbf{\Omega}' \right]. \tag{20}$$

Integrating Equations (19) and (20) with respect to all possible directions **Ω**, defining the direction-free densities (cf. Equation (15))

$$p(\mathbf{r},t) = \int p(\mathbf{r},\Omega,t)d^2\Omega,\qquad \rho(\mathbf{r},t) = \int \rho(\mathbf{r},\Omega,t)d^2\Omega,\tag{21}$$

and the averages

$$h(\mathbf{r},t) = \int \beta(\mathbf{\Omega})h(\mathbf{r}, t|\mathbf{\Omega})d^2\mathbf{\Omega}, \qquad H(\mathbf{r},t) = \int \beta(\mathbf{\Omega})H(\mathbf{r}, t|\mathbf{\Omega})d^2\mathbf{\Omega}, \tag{22}$$

we get a simpler integral equation for *ρ*(**r**, *t*):

$$
\rho(\mathbf{r},t) = h(\mathbf{r},t) + \int\_0^t dt' \int\_{\mathbb{R}^3} h(\mathbf{r}-\mathbf{r}',t-t')\rho(\mathbf{r}',t')d^3\mathbf{r}',\tag{23}
$$

and the PDF *p*(**r**, *t*) will be given by

$$p(\mathbf{r},t) = H(\mathbf{r},t) + \int\_0^t dt' \int\_{\mathbb{R}^3} H(\mathbf{r}-\mathbf{r}',t-t')\rho(\mathbf{r}',t')d^3\mathbf{r}'.\tag{24}$$

The problem can now be solved in Fourier–Laplace space. Thus, defining the joint Fourier and Laplace transform,

$$\hat{\tilde{\rho}}(\omega, s) = \int\_0^\infty e^{-st} dt \int\_{\mathbb{R}^3} e^{i\omega \cdot \mathbf{r}} \rho(\mathbf{r}, t) d^3 \mathbf{r},$$

the integral Equation (23) turns into a simple algebraic equation for $\hat{\tilde{\rho}}$ whose solution can be readily obtained and reads

$$\hat{\tilde{\rho}}(\omega, s) = \frac{\hat{\tilde{h}}(\omega, s)}{1 - \hat{\tilde{h}}(\omega, s)}. \tag{25}$$

On the other hand, by transforming Equation (24) we get

$$
\hat{\tilde{p}}(\omega, s) = \hat{\tilde{H}}(\omega, s) \left[ 1 + \hat{\tilde{\rho}}(\omega, s) \right],
$$

which after substituting for (25) yields

$$
\hat{\tilde{p}}(\omega, s) = \frac{\hat{\tilde{H}}(\omega, s)}{1 - \hat{\tilde{h}}(\omega, s)}.
\tag{26}
$$

The form of Equation (26) can be considered a generalization of the Montroll–Weiss equation [44,45] for higher dimensional continuous time random walks with independent directions.
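The algebra leading from the transformed Equations (23) and (24) to the Montroll–Weiss form (26) can be verified symbolically. A minimal sketch, treating the transformed densities as scalar symbols (which is exactly what the Fourier–Laplace transform achieves, since convolutions become products):

```python
import sympy as sp

h, H, rho = sp.symbols('h H rho')   # stand-ins for h-hat, H-hat, rho-hat

# Equation (23) in Fourier-Laplace space: rho = h + h*rho
rho_sol = sp.solve(sp.Eq(rho, h + h*rho), rho)[0]
assert sp.simplify(rho_sol - h/(1 - h)) == 0          # Equation (25)

# Equation (24) in transform space: p = H*(1 + rho)
p_sol = sp.simplify(H * (1 + rho_sol))
assert sp.simplify(p_sol - H/(1 - h)) == 0            # Equation (26)
```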

#### *2.3. The Isotropic and Uniform Random Walk*

Equation (26) furnishes the formal solution to the transport problem for independent scattering in Fourier–Laplace space and it is valid for any form of the conditional densities *h*(**r**, *t*|**Ω**) and *H*(**r**, *t*|**Ω**) which describe the displacement inside a given sojourn in direction **Ω**. In other words, Equation (26) applies to any kind of motion inside a given sojourn and to any distribution of sojourn times. In order to proceed further and solve the problem in a specific way by obtaining the explicit expression for *p*(**r**, *t*) in real time and space, we first assume that the particle moves in an isotropic medium so that the pausing time density and its cumulative probability are independent of the direction,

$$
\psi(t|\Omega) = \psi(t), \qquad \Psi(t|\Omega) = \Psi(t).
$$

We next assume that inside any sojourn the motion is uniform with a constant speed *c*, so that after each sojourn the velocity of the particle takes a different direction but with the same modulus and, hence, the kinetic energy is conserved. Despite its simplicity, the model describes the motion of non-interacting particles, such as, for instance, photons undergoing elastic scattering off fixed centers randomly distributed in the medium. The assumption of uniform motion leads to the conclusion that the conditional densities for the displacement inside a given sojourn have the form

$$h(\mathbf{r}, t | \Omega) = \delta(\mathbf{r} - ct\mathbf{u})\psi(t), \qquad H(\mathbf{r}, t | \Omega) = \delta(\mathbf{r} - ct\mathbf{u})\Psi(t), \tag{27}$$

where **u** is the unit vector pointing in direction **Ω** = (*θ*, *ϕ*), that is

$$\mathbf{u} = (\sin \theta \cos \varphi, \sin \theta \sin \varphi, \cos \theta). \tag{28}$$

The Fourier transforms of these densities read

$$
\tilde{h}(\omega, t | \Omega) = \psi(t) e^{i(\omega \cdot \mathbf{u})ct}, \qquad \tilde{H}(\omega, t | \Omega) = \Psi(t) e^{i(\omega \cdot \mathbf{u})ct}. \tag{29}
$$

In addition to the assumption that after each collision the new direction of the particle is randomized independently of the previous direction (cf. Equation (18)), we also suppose *complete isotropy* in the sense that all outgoing directions are equally likely. For the three dimensional motion this implies

$$
\beta(\Omega|\Omega') = \beta(\Omega) = \frac{1}{4\pi}.\tag{30}
$$

The characteristic function of the displacement inside any sojourn independent of the direction is given by the average

$$
\tilde{h}(\omega, t) = \int \tilde{h}(\omega, t|\Omega) \beta(\Omega) d^2 \Omega.
$$

In the isotropic case and for uniform motion (cf. Equations (14), (29) and (30)) we have

$$
\tilde{h}(\omega, t) = \frac{1}{4\pi} \psi(t) \int e^{i(\omega \cdot \mathbf{u})ct} d^2 \Omega.
$$

That is,

$$\tilde{h}(\omega, t) = \frac{1}{2} \psi(t) \int\_0^\pi e^{i|\omega|ct\cos\theta} \sin\theta\, d\theta,$$

which after integrating yields

$$
\tilde{h}(\omega, t) = \psi(t) \frac{\sin|\omega|ct}{|\omega|ct}. \tag{31}
$$

Analogously

$$
\tilde{H}(\omega, t) = \Psi(t) \frac{\sin|\omega|ct}{|\omega|ct}. \tag{32}
$$
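Equations (31) and (32) follow from the elementary integral (1/2)∫<sub>0</sub><sup>π</sup> e<sup>*ia* cos *θ*</sup> sin *θ* *dθ* = sin *a*/*a*. A quick numerical check (the value *a* = |*ω*|*ct* = 1.7 is arbitrary):

```python
import numpy as np

a = 1.7                          # a = |omega| c t, arbitrary test value
n = 200000
dth = np.pi / n
theta = (np.arange(n) + 0.5) * dth            # midpoint grid on [0, pi]
avg = 0.5 * np.sum(np.exp(1j*a*np.cos(theta)) * np.sin(theta)) * dth

print(abs(avg - np.sin(a)/a))    # ~ 0 (real part sin(a)/a, imaginary part ~ 0)
```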

In order to obtain the Fourier–Laplace transform of the PDF of the particle to be at position **r** at time *t* by means of Equation (26), we have to specify the form of the pausing time density *ψ*(*t*). One of the most natural and universal assumptions consists in taking the random instants of time at which the scattering process occurs to be a Poissonian set of events which implies that time intervals inside any sojourn are exponentially distributed [57]. Thus

$$
\psi(t) = \lambda e^{-\lambda t} \quad \Rightarrow \quad \Psi(t) = e^{-\lambda t},
$$

where *λ*−<sup>1</sup> is the average time interval between two consecutive scattering events (i.e., the mean sojourn duration). We have

$$
\tilde{h}(\omega, t) = \lambda e^{-\lambda t} \frac{\sin|\omega|ct}{|\omega|ct}, \qquad \tilde{H}(\omega, t) = \frac{1}{\lambda} \tilde{h}(\omega, t).
$$

We next take the Laplace transform of these expressions. Recalling that [58]

$$
\mathcal{L}\left\{\frac{\sin|\omega|ct}{t}\right\} = \arctan\left(\frac{|\omega|c}{s}\right),
$$

and the property $\mathcal{L}\{e^{-\lambda t} f(t)\} = \hat{f}(\lambda + s)$, we get

$$\hat{\tilde{h}}(\omega, s) = \frac{\lambda}{|\omega|c} \arctan\left(\frac{|\omega|c}{\lambda + s}\right) \tag{33}$$

and

$$\hat{\tilde{H}}(\omega, s) = \frac{1}{|\omega|c} \arctan\left(\frac{|\omega|c}{\lambda + s}\right). \tag{34}$$
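Equations (33) and (34) can also be checked numerically, computing the Laplace integral of the damped sinc with `scipy.integrate.quad` and comparing with the arctan expression (all parameter values below are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

lam, c, w, s = 1.0, 1.0, 2.0, 0.5      # arbitrary test values; w stands for |omega|

# integrand: e^{-st} * lambda e^{-lambda t} * sin(wct)/(wct)
f = lambda t: np.exp(-s*t) * lam * np.exp(-lam*t) * np.sinc(w*c*t/np.pi)
num, _ = quad(f, 0.0, np.inf, limit=200)

ref = (lam/(w*c)) * np.arctan(w*c/(lam + s))   # Equation (33)
print(num, ref)
```

Here `np.sinc(x)` is the normalized sinc sin(π*x*)/(π*x*), so `np.sinc(w*c*t/np.pi)` equals sin(*wct*)/(*wct*) and avoids the 0/0 at *t* = 0.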

Substituting Equations (33) and (34) into Equation (26) we finally get

$$\hat{\tilde{p}}(\omega, s) = \frac{\arctan[|\omega|c/(\lambda + s)]}{|\omega|c - \lambda \arctan[|\omega|c/(\lambda + s)]},\tag{35}$$

which constitutes the exact solution of the homogeneous and isotropic model and the starting point for deriving the three dimensional telegrapher's equation as we will see next. It is worth mentioning that a similar expression was obtained some years ago by Claes and Van den Broeck [59] in the context of modeling the end-to-end distance of polymer chains, although they used a different approach.

#### **3. Telegrapher's Equation**

The homogeneous and isotropic random walk reviewed above is a microscopic model of particle transport. We can construct the TE from this model by coarse graining the dynamics via the fluid limit approximation.

#### *3.1. Fluid Limit Approximation*

The fluid limit approximation consists in rewriting the model for large times and distances [21,51]. Because of Tauberian theorems [60,61], large times and distances, *t* → ∞ and |**r**| → ∞, correspond to small Laplace and Fourier variables, *s* → 0 and |*ω*| → 0. Note that to achieve such a limit, i.e., to get an approximate expression for the transformed PDF $\hat{\tilde{p}}(\omega, s)$ for small values of *s* and |*ω*|, we have two different and equivalent ways of proceeding. We can thus proceed either through the direct expansion of $\hat{\tilde{p}}$ given by Equation (35) or else through the expansions of $\hat{\tilde{h}}$ and $\hat{\tilde{H}}$ (cf. Equations (33) and (34)) as *s* → 0 and |*ω*| → 0 and their subsequent substitution in Equation (26). Obviously both procedures yield the same result but, albeit longer, we follow the second approach since it turns out to be instrumental for the fractional generalization of the random walk.

We thus start off with Equation (33) and first perform the long-distance limit (|*ω*| → 0) and postpone for a moment the long-time limit (*s* → 0). As |*ω*| → 0 we have the following expansion

$$\begin{split} \arctan\left(\frac{|\omega|c}{\lambda+s}\right) &= \frac{|\omega|c}{\lambda+s} - \frac{1}{3} \left(\frac{|\omega|c}{\lambda+s}\right)^3 + O(|\omega|^5) \\ &= \frac{|\omega|c}{(\lambda+s)^3} \left[ (\lambda+s)^2 - \frac{1}{3} (|\omega|c)^2 + O(|\omega|^4) \right]. \end{split} \tag{36}$$

From Equations (33), (34) and (36) we write

$$\hat{\tilde{h}}(\omega, s) = \frac{\lambda}{(\lambda + s)^3} \left[ (\lambda + s)^2 - \frac{1}{3} |\omega|^2 c^2 + O(|\omega|^4) \right],\tag{37}$$

and

$$\hat{\tilde{H}}(\omega, s) = \frac{1}{(\lambda + s)^3} \left[ (\lambda + s)^2 - \frac{1}{3} |\omega|^2 c^2 + O(|\omega|^4) \right]. \tag{38}$$

Hence

$$\begin{aligned} 1 - \hat{\tilde{h}}(\omega, s) &= 1 - \frac{\lambda}{(\lambda + s)^3} \left[ (\lambda + s)^2 - \frac{1}{3} |\omega|^2 c^2 + O(|\omega|^4) \right] \\ &= \frac{1}{(\lambda + s)^3} \left[ (\lambda + s)^3 - \lambda (\lambda + s)^2 + \frac{\lambda}{3} |\omega|^2 c^2 + O(|\omega|^4) \right] \\ &= \frac{1}{(\lambda + s)^3} \left[ s(\lambda + s)^2 + \frac{\lambda}{3} |\omega|^2 c^2 + O(|\omega|^4) \right], \end{aligned}$$

and as *s* → 0, we may write

$$1 - \hat{\tilde{h}}(\omega, s) = \frac{1}{(\lambda + s)^{3}} \left[ s(\lambda^{2} + 2\lambda s) + \frac{\lambda}{3} |\omega|^{2} c^{2} + O(s^{3}, |\omega|^{4}) \right]. \tag{39}$$

Substituting Equations (38) and (39) into Equation (26) yields

$$\hat{\tilde{p}}(\omega, s) = \frac{(\lambda + s)^2 - (c|\omega|)^2/3 + O(|\omega|^4)}{s(\lambda^2 + 2\lambda s) + \lambda(c|\omega|)^2/3 + O(s^3, |\omega|^4)}. \tag{40}$$

In order to ensure the stability of Equation (40) under Fourier-Laplace inversion [and, hence, for the existence of a valid approximation for *p*(**r**, *t*)], it is necessary that the powers of *s* and |*ω*| which appear in the numerator of Equation (40) be less than the corresponding powers of the denominator [60]. We, therefore, write

$$
\hat{\tilde{p}}(\omega, s) = \frac{\lambda^2 + 2\lambda s + O(s^2, |\omega|^2)}{s(\lambda^2 + 2\lambda s) + \lambda (c|\omega|)^2/3 + O(s^3, |\omega|^4)},
$$

that is,

$$\hat{\tilde{p}}(\omega, s) = \frac{\lambda/2 + s + O(s^2, |\omega|^2)}{s(\lambda/2 + s) + c^2 |\omega|^2/6 + O(s^3, |\omega|^4)},$$

and take as a fluid limit approximation of the PDF the expression

$$\hat{\tilde{p}}(\omega, s) = \frac{s + \lambda/2}{s(s + \lambda/2) + c^2 |\omega|^2 / 6}. \tag{41}$$
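As a sanity check of the fluid limit, the approximation (41) can be compared numerically with the exact solution (35) deep in the regime *s*, |*ω*| ≪ *λ* (the particular values below are arbitrary):

```python
import numpy as np

lam, c = 1.0, 1.0
s, w = 1e-3, 1e-3                      # deep in the fluid limit: s, |omega| << lambda

at = np.arctan(w*c/(lam + s))
p_exact = at / (w*c - lam*at)                            # Equation (35)
p_fluid = (s + lam/2) / (s*(s + lam/2) + (c*w)**2/6)     # Equation (41)

print(p_exact / p_fluid)               # ~ 1
```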

#### *3.2. The Three-Dimensional Telegrapher's Equation*

Equation (41) is the starting point for deriving the three-dimensional TE. We next obtain the associated partial differential equation for *p*(**r**, *t*) whose solution, in Fourier–Laplace space and with appropriate initial conditions, is precisely given by Equation (41). To this end we multiply both sides of Equation (41) by the denominator and rewrite the result as

$$s^2 \hat{\tilde{p}}(\omega, s) - s + \frac{\lambda}{2} \left[ s \hat{\tilde{p}}(\omega, s) - 1 \right] = -\frac{c^2}{6} |\omega|^2 \hat{\tilde{p}}(\omega, s).$$

We now proceed to Fourier inversion. Taking into account

$$\mathcal{F}^{-1}\{ |\omega|^2 \hat{\tilde{p}}(\omega, s) \} = -\nabla^2 \hat{p}(\mathbf{r}, s), \qquad \mathcal{F}^{-1}\{ 1 \} = \delta(\mathbf{r}),$$

the Fourier inversion yields

$$
s^2 \hat{p}(\mathbf{r}, s) - s\delta(\mathbf{r}) + \frac{\lambda}{2} \left[ s\hat{p}(\mathbf{r}, s) - \delta(\mathbf{r}) \right] = \frac{c^2}{6} \nabla^2 \hat{p}(\mathbf{r}, s).
$$

Let us next address Laplace inversion. With the standard initial conditions [62]

$$p(\mathbf{r},0) = \delta(\mathbf{r}), \qquad \frac{\partial p(\mathbf{r},t)}{\partial t}\bigg|\_{t=0} = 0,\tag{42}$$

and the Laplace inversion formulas [58]

$$
\mathcal{L}^{-1}\left\{s^2 \hat{p}(\mathbf{r}, s) - s \delta(\mathbf{r})\right\} = \frac{\partial^2 p(\mathbf{r}, t)}{\partial t^2},
$$

$$
\mathcal{L}^{-1}\left\{s \hat{p}(\mathbf{r}, s) - \delta(\mathbf{r})\right\} = \frac{\partial p(\mathbf{r}, t)}{\partial t},
$$

we find that *p*(**r**, *t*) satisfies the three-dimensional TE

$$\frac{\partial^2 p}{\partial t^2} + \frac{1}{\tau} \frac{\partial p}{\partial t} = v^2 \nabla^2 p, \tag{43}$$

with

$$
\tau = 2/\lambda \quad \text{and} \quad v = c/\sqrt{6}, \tag{44}
$$

as characteristic time and velocity respectively.

TE (43) enjoys both wave and diffusion characteristics, and this duality unfolds as time progresses: as *t* → 0 Equation (43) reduces to the wave equation, while as *t* → ∞ it goes over to the diffusion equation. Indeed, scaling time with *τ* one can easily see that [15,30]

$$\frac{\partial^2 p}{\partial t^2} \simeq v^2 \nabla^2 p \quad (t \to 0), \qquad \qquad \frac{\partial p}{\partial t} \simeq D \nabla^2 p \quad (t \to \infty).$$

(where *D* = *v*<sup>2</sup>*τ* is the diffusion coefficient), which leads to

$$
\langle |\mathbf{r}(t)|^2 \rangle \sim t^2 \quad (t \to 0), \qquad \langle |\mathbf{r}(t)|^2 \rangle \sim t \quad (t \to \infty),
$$

showing the transition from ballistic motion to diffusive motion as time increases.
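The ballistic-to-diffusive transition can be checked by direct Monte Carlo simulation of the microscopic walk of Section 2.3. The sketch below (parameters, sample sizes and tolerances are arbitrary choices, not from the original) compares the short-time MSD with the ballistic law *c*<sup>2</sup>*t*<sup>2</sup> and the long-time MSD with 6*Dt*, using *D* = *c*<sup>2</sup>/(3*λ*), the diffusion coefficient that follows from the *s* → 0 limit of Equation (41):

```python
import numpy as np

def msd_3d(T, n_walkers, lam=1.0, c=1.0, seed=0):
    """Mean square displacement at time T for the homogeneous, isotropic
    walk: Poissonian sojourns (rate lam), constant speed c, direction
    redrawn uniformly on the sphere after each sojourn."""
    rng = np.random.default_rng(seed)
    r2 = 0.0
    for _ in range(n_walkers):
        t = x = y = z = 0.0
        while t < T:
            dt = min(rng.exponential(1.0/lam), T - t)   # truncate last sojourn at T
            mu = rng.uniform(-1.0, 1.0)                 # cos(theta): uniform on sphere
            phi = rng.uniform(0.0, 2.0*np.pi)
            s = np.sqrt(1.0 - mu*mu)
            x += c*dt*s*np.cos(phi); y += c*dt*s*np.sin(phi); z += c*dt*mu
            t += dt
        r2 += x*x + y*y + z*z
    return r2 / n_walkers

lam = c = 1.0
D = c**2 / (3.0*lam)                    # long-time diffusion coefficient of the walk

short = msd_3d(0.01, 4000)              # t << 1/lam: ballistic, ~ c^2 t^2
long_ = msd_3d(50.0, 10000)             # t >> 1/lam: diffusive, ~ 6 D t
print(short/(c*0.01)**2, long_/(6*D*50.0))   # both ratios close to 1
```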

#### **4. The Two Dimensional Case**

Up to this point we have developed the telegraphic approximation to transport in three dimensions. We will now briefly report on how to treat the problem in lower dimensions.

In one dimension the standard derivation of the TE is based on the persistent random walk on the line [14]. In this model there is only one possible direction and the walker has two possible states since it can move either to the left or to the right with equal probability which is the isotropic case for the one dimensional motion. We do not present the details for the one-dimensional case here, but instead refer the interested reader to Ref. [15] for a recent and rather complete report. Note that the TE obtained is

$$\frac{\partial^2 p}{\partial t^2} + \frac{1}{\tau} \frac{\partial p}{\partial t} = v^2 \frac{\partial^2 p}{\partial x^2},\tag{45}$$

where in this case *v* = *c* coincides with the velocity of the moving particle and *τ* = (2*λ*)<sup>−1</sup> (recall that *λ*<sup>−1</sup> is the mean sojourn time when switching times are Poissonian).

#### *4.1. General Model*

The two-dimensional case has been recently developed in Ref. [31]. This microscopic model for transport in planar media has many similarities (but some particular differences) with the three dimensional model presented above. Let us note that now the direction of the particle is not given by the solid angle **Ω** = (*θ*, *ϕ*) but by the planar angle *ϕ*. Therefore, the equations for the continuous multistate random walk in two dimensions will be the same as those in three dimensions (cf. Section 2) with the replacements

$$\Omega \longrightarrow \varphi, \qquad \int d^2 \mathbf{\Omega} \longrightarrow \int\_0^{2\pi} d\varphi, \qquad \int\_{\mathbb{R}^3} d^3 \mathbf{r} \longrightarrow \int\_{\mathbb{R}^2} d^2 \mathbf{r}. \tag{46}$$

Thus, in the most general case the bidimensional model will be described by Equations (16) and (17) with these replacements in which the change of direction is governed by the transition density

$$\beta(\varphi|\varphi')\,d\varphi = \text{Prob}\{\varphi' \to (\varphi, \varphi + d\varphi)\},$$

with similar definitions as those of the three dimensional walk for the densities *ψ*(*t*|*ϕ*), Ψ(*t*|*ϕ*), *h*(**r**, *t*|*ϕ*) and *H*(**r**, *t*|*ϕ*) (cf. Section 2.1).

For the case of independent scattering (Section 2.2)

$$
\beta(\varphi|\varphi') = \beta(\varphi).
$$

As in three dimensions, we can now also define direction-free densities by means of Equations (21) and (22) through replacements (46). Finally, the Fourier–Laplace transform of the PDF,

$$\hat{\tilde{p}}(\omega, s) = \int\_0^\infty e^{-st} dt \int\_{\mathbb{R}^2} e^{i\omega \cdot \mathbf{r}} p(\mathbf{r}, t) d^2 \mathbf{r},$$

is explicitly given by the generalization of Montroll–Weiss Equation (26),

$$
\hat{\tilde{p}}(\omega, s) = \frac{\hat{\tilde{H}}(\omega, s)}{1 - \hat{\tilde{h}}(\omega, s)},
\tag{47}
$$

where $\hat{\tilde{h}}(\omega, s)$ is given by the average over all possible directions *ϕ* (cf. Equation (22))

$$
\hat{\tilde{h}}(\omega, s) = \int\_0^{2\pi} \beta(\varphi) \hat{\tilde{h}}(\omega, s | \varphi) d\varphi,
$$

and a similar expression holds for $\hat{\tilde{H}}(\omega, s)$.

#### *4.2. The Isotropic and Uniform Case*

In an isotropic medium (cf. Section 2.3) the pausing time densities are independent of the direction taken by the particle, *ψ*(*t*|*ϕ*) = *ψ*(*t*) and Ψ(*t*|*ϕ*) = Ψ(*t*), and for uniform motion we have [cf. Equations (27)–(29)]

$$h(\mathbf{r}, t | \varphi) = \delta(\mathbf{r} - ct\mathbf{u})\psi(t), \qquad H(\mathbf{r}, t | \varphi) = \delta(\mathbf{r} - ct\mathbf{u})\Psi(t), \tag{48}$$

and the Fourier transforms are

$$
\tilde{h}(\omega, t | \varphi) = \psi(t) e^{i(\omega \cdot \mathbf{u})ct}, \qquad \tilde{H}(\omega, t | \varphi) = \Psi(t) e^{i(\omega \cdot \mathbf{u})ct}, \tag{49}
$$

where **u** is the unit vector pointing in direction *ϕ*,

$$\mathbf{u} = (\cos \varphi, \sin \varphi).$$

Assuming that all directions are equally likely (i.e., complete isotropy), we have

$$\beta(\varphi) = \frac{1}{2\pi}$$

and

$$\begin{aligned} \tilde{h}(\omega, t) &= \int\_0^{2\pi} \beta(\varphi) \tilde{h}(\omega, t | \varphi) d\varphi = \frac{\psi(t)}{2\pi} \int\_0^{2\pi} e^{ict(\omega \cdot \mathbf{u})} d\varphi \\ &= \frac{\psi(t)}{2\pi} \int\_0^{2\pi} e^{ict|\omega| \cos \varphi} d\varphi = \frac{\psi(t)}{\pi} \int\_0^{\pi} \cos(ct |\omega| \cos \varphi) d\varphi. \end{aligned}$$

From the integral representation of the Bessel function *J*0(*z*) [63],

$$J\_0(ct|\omega|) = \frac{1}{\pi} \int\_0^\pi \cos(ct|\omega|\cos\varphi)d\varphi,\tag{50}$$

we get

$$
\tilde{h}(\omega, t) = \psi(t) J\_0(ct|\omega|),
\tag{51}
$$

and analogously

$$
\tilde{H}(\omega, t) = \Psi(t) J\_0(ct|\omega|). \tag{52}
$$
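The integral representation (50), and hence Equations (51) and (52), is easy to verify against `scipy.special.j0` (the value of *ct*|*ω*| below is arbitrary):

```python
import numpy as np
from scipy.special import j0

a = 2.3                          # a = c t |omega|, arbitrary test value
n = 100000
dphi = np.pi / n
phi = (np.arange(n) + 0.5) * dphi             # midpoint grid on [0, pi]
val = np.sum(np.cos(a*np.cos(phi))) * dphi / np.pi    # Equation (50)

print(val, j0(a))                # the two values coincide
```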

For exponentially distributed sojourn intervals *ψ*(*t*) = *λe*−*λ<sup>t</sup>* and Ψ(*t*) = *e*−*λ<sup>t</sup>* , we write

$$
\tilde{h}(\omega, t) = \lambda e^{-\lambda t} J\_0(ct|\omega|), \qquad \tilde{H}(\omega, t) = \frac{1}{\lambda} \tilde{h}(\omega, t).
$$

Using the Laplace transformation formula [58]

$$\mathcal{L}\left\{J\_0(ct|\omega|)\right\} = \frac{1}{\sqrt{s^2 + c^2 |\omega|^2}},$$

and the standard property

$$
\mathcal{L}\{e^{-\lambda t}f(t)\} = \hat{f}(\lambda + s),
$$

we get

$$
\hat{\tilde{h}}(\omega, s) = \frac{\lambda}{\sqrt{(\lambda + s)^2 + c^2 |\omega|^2}},
$$

and $\hat{\tilde{H}}(\omega, s) = \hat{\tilde{h}}(\omega, s)/\lambda$. Finally, from the Montroll–Weiss Equation (47) we obtain the exact solution to the homogeneous and isotropic random walk on the plane,

$$\hat{\tilde{p}}(\omega, s) = \frac{1}{\sqrt{(\lambda + s)^2 + c^2 |\omega|^2} - \lambda}. \tag{53}$$

Notice the completely different form for the exact PDF of the planar model compared to that of the three dimensional case given by Equation (35).
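The step from the transformed densities to Equation (53) is a one-line application of the Montroll–Weiss formula (47); in symbols (a minimal sketch with `sympy`):

```python
import sympy as sp

lam, s, w, c = sp.symbols('lambda s w c', positive=True)
R = sp.sqrt((lam + s)**2 + c**2*w**2)

h_hat = lam / R          # Fourier-Laplace transform of h (2D case, w = |omega|)
H_hat = h_hat / lam      # H-hat = h-hat / lambda

p_hat = sp.simplify(H_hat / (1 - h_hat))          # Montroll-Weiss, Equation (47)
assert sp.simplify(p_hat - 1/(R - lam)) == 0      # Equation (53)
```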

#### *4.3. Fluid Limit Approximation and Telegrapher's Equation*

As we did for three dimensional transport, in order to get the two-dimensional equation we first make the fluid limit approximation of the planar model, that is, the long-distance and long-time limits of the exact PDF (53). By mimicking the steps done in Section 3.1 to obtain the fluid limit approximation in the three dimensional case, we can easily see that in two dimensions we obtain the same result but with the replacement

$$
c^2/6 \longrightarrow c^2/4.\tag{54}
$$

Thus, the approximation for the PDF reads (cf. Equation (41))

$$\hat{\tilde{p}}(\omega, s) = \frac{s + \lambda/2}{s(s + \lambda/2) + c^2 |\omega|^2 / 4} \tag{55}$$

and similar expressions hold for the quantities $\hat{\tilde{h}}$ and $\hat{\tilde{H}}$. Assuming the initial conditions given in Equation (42), inverting Equation (55) and following the same procedure as in the three dimensional case, we finally obtain the two-dimensional TE,

$$\frac{\partial^2 p}{\partial t^2} + \frac{1}{\tau} \frac{\partial p}{\partial t} = v^2 \nabla^2 p, \tag{56}$$

where, as in three dimensions, *τ* = 2/*λ* but now

$$v = c/2.$$

Before proceeding further and explaining the fractional generalizations of telegraphic transport, let us point out the significant issue of boundary conditions, which are instrumental in first-passage, escape and survival problems. The question is far from simple, especially for telegraphic processes, and in one and higher dimensions it has, to my knowledge, not been settled yet. In the transport of particles the problem of survival is closely related to the question of when the particle is absorbed (and, hence, disappears) if it reaches a certain critical region of boundary *Sc*. For diffusion processes, absorption at *Sc* corresponds to *p*(**r**, *t*|**r**0) = 0 when **r** ∈ *Sc* (or *p*(**r**, *t*|**r**0) = 0 when **r**<sup>0</sup> ∈ *Sc*). That is, if the particle reaches *Sc* (or starts at *Sc*) it disappears. For telegraphic processes (and in the context of particle transport, at least for one-dimensional processes) the situation is more complex because of the property of persistence inherent in the telegrapher's equation [14]. In this context persistence, which is analogous to the physical property of momentum, makes it necessary, when deriving boundary conditions for absorption, to take into account the direction in which the particle is traveling. For if the particle starts at *Sc*, or reaches *Sc* at time *t*, it will disappear (that is, it will be absorbed) only if the direction of the velocity is the appropriate one; otherwise the particle will escape.

For one dimensional processes we studied this situation some years ago [64,65] and refer the interested reader to these works for more information. In higher dimensions the situation may be even more involved. There are, however, problems which are not related to the escape out of some region (which implies absorption at the boundary of the region) but only to the first arrival at some region *Sc*. It can be shown that in these cases the boundary condition is *p*(**r**, *t*|**r**0) = 0 (if **r** ∈ *Sc* or **r**<sup>0</sup> ∈ *Sc*), regardless of the direction of the velocity at this particular location (see [66] for a problem of this sort in one dimension).

#### **5. Fractional Transport**

As in the one-dimensional case, the two- and three-dimensional telegraphic transport processes described above are ordinary (i.e., non-fractional) processes in the sense that at small times (*t* ≪ *τ*) they behave like an ordinary wave front, while at long times (*t* ≫ *τ*) they act like an ordinary diffusion process,

$$
\langle |\mathbf{r}(t)|^2 \rangle \sim t^2 \qquad (t \to 0), \qquad \qquad \langle |\mathbf{r}(t)|^2 \rangle \sim t \qquad (t \to \infty).
$$

However, in transport through highly disordered systems as, for instance, random media or fractal structures, ordinary diffusion becomes anomalous, that is

$$
\langle |\mathbf{r}(t)|^2 \rangle \sim t^{\alpha},
$$

where *α* is a positive real number (anomalous diffusion corresponds to *α* ≠ 1). Two questions arise: (i) How does this circumstance affect telegraphic transport? and (ii) What is the fractional TE governing such processes? In one dimension we addressed these questions by setting up a fractional version of the persistent random walk on the line [32]. In higher dimensions we followed the same path and generalized the continuous and isotropic walks described in previous sections [30,31]. Let us review these findings.

#### *5.1. The Fractional Isotropic Walk*

We will first work out the three-dimensional case and obtain a fractional version of the homogeneous and isotropic random walk in the fluid limit approximation. To this end we generalize the expressions of $\hat{\tilde{h}}(\omega,s)$ and $\hat{\tilde{H}}(\omega,s)$—given in Section 3.1 in the fluid limit approximation—to include fractional behavior.

Let us start with Equation (39) which when *s* → 0 yields

$$1 - \hat{\tilde{h}}(\omega, s) = \frac{1}{(\lambda + s)^3} \left[ \lambda^2 s + 2\lambda s^2 + \frac{\lambda}{3} |\omega|^2 c^2 + O(s^3, |\omega|^4) \right],$$

we further approximate the denominator by (*λ* + *s*)<sup>3</sup> = *λ*<sup>3</sup> + *O*(*s*), so that

$$1 - \hat{\tilde{h}}(\omega, s) \simeq \frac{s}{\lambda} + 2\left(\frac{s}{\lambda}\right)^2 + \frac{1}{3\lambda^2}|\omega|^2 c^2 + \cdots.$$

We thus take as a fluid limit approximation for the sojourn density $\hat{\tilde{h}}$ the expression

$$\hat{\tilde{h}}(\omega, s) \simeq 1 - \frac{s}{\lambda} - 2\left(\frac{s}{\lambda}\right)^2 - \frac{1}{3\lambda^2} |\omega|^2 c^2 + \cdots \tag{57}$$

We next obtain an appropriate fluid limit approximation for the sojourn probability $\hat{\tilde{H}}$. Starting from Equation (34) and following the same approximation scheme we get

$$\begin{aligned} \hat{\tilde{H}}(\omega,s) &= \frac{1}{(\lambda+s)^3} \Big[ (\lambda+s)^2 - \frac{1}{3} |\omega|^2 c^2 + O(|\omega|^4) \Big] \\ &= \frac{1}{(\lambda+s)^3} \Big[ \lambda^2 + 2\lambda s - \frac{1}{3} |\omega|^2 c^2 + O(s^2, |\omega|^4) \Big] \\ &\simeq \frac{1}{\lambda^3} \Big( \lambda^2 + 2\lambda s \Big) + \cdots. \end{aligned}$$

That is,

$$
\hat{\tilde{H}}(\omega, s) \simeq \frac{1}{\lambda} \left( 1 + \frac{2s}{\lambda} \right) + \cdots \tag{58}
$$

Let us incidentally note that substituting Equations (57) and (58) into the Montroll–Weiss Equation (26) yields the fluid limit solution (41) for the PDF $\hat{\tilde{p}}(\omega,s)$, which has been the starting point of the derivation of the TE.

We are now ready to construct a fractional generalization of the three-dimensional isotropic random walk. Thus, and looking at Equation (57), we propose the following expansion for the sojourn density in the fluid limit:

$$\hat{\tilde{h}}(\omega, s) \simeq 1 - (Ts)^{\alpha} - 2(Ts)^{2\alpha} - \frac{1}{3}(L|\omega|)^{2\gamma} + \cdots \tag{59}$$

(*s*, |*ω*| → 0), where 0 < *α* ≤ 1, 0 < *γ* ≤ 1, and *T* > 0 and *L* > 0 are free parameters: *T* defines a characteristic time and *L* a characteristic length.

In addition to the fractional approximation for $\hat{\tilde{h}}(\omega,s)$ we also assume a fractional expansion for the function $\hat{\tilde{H}}(\omega,s)$ consistent with Equation (59). To this end, we return to Section 2 and average Equations (11) and (12) over all directions **Ω**, with the result (in Laplace space)

$$
\int\_{\mathbb{R}^3} \hat{h}(\mathbf{r}, s)\, d^3 \mathbf{r} = \hat{\psi}(s), \qquad \int\_{\mathbb{R}^3} \hat{H}(\mathbf{r}, s)\, d^3 \mathbf{r} = \hat{\Psi}(s), \tag{60}
$$

where the sojourn PDF's independent of direction are

$$
\hat{h}(\mathbf{r},s) = \int \hat{h}(\mathbf{r},s|\mathbf{\Omega})\,\beta(\mathbf{\Omega})\,d^2\mathbf{\Omega},\qquad\text{and}\qquad\hat{\psi}(s) = \int \hat{\psi}(s|\mathbf{\Omega})\,\beta(\mathbf{\Omega})\,d^2\mathbf{\Omega},
$$

and similar expressions hold for $\hat{H}(\mathbf{r},s)$ and $\hat{\Psi}(s)$. Note that in terms of the Fourier transform we may write

$$
\hat{\tilde{h}}(\omega=0,s) = \hat{\psi}(s), \quad \text{and} \quad \hat{\tilde{H}}(\omega=0,s) = \hat{\Psi}(s).
$$

However, Laplace transforming Equation (10) we see that $\hat{\Psi}(s) = [1 - \hat{\psi}(s)]/s$; consequently,

$$
\hat{\tilde{H}}(\omega=0,s) = \frac{1}{s}\left[1 - \hat{\tilde{h}}(\omega=0,s)\right].
$$

Inserting Equation (59) into this expression yields

$$\hat{\tilde{H}}(\omega = 0, s) = T^{\alpha}s^{\alpha - 1} + 2T^{2\alpha}s^{2\alpha - 1},$$

which leads us to conjecture the following fluid limit approximation:

$$\hat{\tilde{H}}(\omega, s) \simeq T^{\alpha} s^{\alpha - 1} + 2T^{2\alpha} s^{2\alpha - 1} + \cdots \tag{61}$$

(*s* → 0, |*ω*| → 0). Let us stress that this is simply a conjecture because the approximation given by Equation (61) might depend on |*ω*| as well [32].

Substituting Equations (59) and (61) into Montroll–Weiss Equation (26) and reorganizing terms yields

$$\hat{\tilde{p}}(\omega, s) = \frac{2T^{2\alpha}s^{\alpha - 1}\left[s^{\alpha} + 1/(2T^{\alpha})\right]}{2T^{2\alpha}\left[s^{2\alpha} + s^{\alpha}/(2T^{\alpha}) + |\omega|^{2\gamma}L^{2\gamma}/(6T^{2\alpha})\right]},$$

that is,

$$\hat{\tilde{p}}(\omega, s) = \frac{s^{\alpha-1}(s^{\alpha} + 1/\tau)}{s^{2\alpha} + s^{\alpha}/\tau + v^2|\omega|^{2\gamma}},\tag{62}$$

where

$$
\tau = 2T^{\alpha}, \qquad v = \frac{1}{\sqrt{6}}\,\frac{L^{\gamma}}{T^{\alpha}}, \tag{63}
$$

(0 < *α* ≤ 1, 0 < *γ* ≤ 1). The parameters *τ* and *v* can be considered as a fractional time and a fractional characteristic velocity, respectively.

#### *5.2. Fractional Telegrapher's Equation in Three Dimensions*

To derive the fractional telegrapher's equation (FTE) in three dimensions for the fractional isotropic model we first need to introduce some mathematical formalism concerning fractional derivatives.

The Caputo fractional derivative of order *β* > 0 of a function *φ*(*t*) is defined by the functional [21,52,53,67,68]

$$\frac{\partial^{\beta}\phi(t)}{\partial t^{\beta}} = \begin{cases} \dfrac{1}{\Gamma(n-\beta)} \displaystyle\int\_{0}^{t} \frac{\phi^{(n)}(t')\,dt'}{(t-t')^{1+\beta-n}}, & n-1 < \beta < n, \\ \phi^{(n)}(t), & \beta = n, \end{cases} \tag{64}$$

(*n* = 1, 2, 3, ...). Using this definition we can readily obtain the Laplace transform of the Caputo derivative. Indeed, Laplace transforming Equation (64) and using the convolution theorem we obtain

$$\mathcal{L}\left\{\frac{\partial^{\beta}\phi(t)}{\partial t^{\beta}}\right\} = \frac{1}{\Gamma(n-\beta)}\,\mathcal{L}\left\{\phi^{(n)}(t)\right\}\mathcal{L}\left\{t^{n-\beta-1}\right\},$$

where L{·} stands for the Laplace transform. With the explicit forms [58]

$$\mathcal{L}\left\{\phi^{(n)}(t)\right\} = s^n \hat{\phi}(s) - \sum\_{k=0}^{n-1} s^{n-1-k} \phi^{(k)}(0),$$


and

$$\mathcal{L}\left\{t^{n-\beta-1}\right\} = \Gamma(n-\beta)s^{\beta-n},$$

the Laplace transform of the Caputo derivative is found to be

$$\mathcal{L}\left\{\frac{\partial^{\boldsymbol{\beta}}\phi(\boldsymbol{t})}{\partial t^{\boldsymbol{\beta}}}\right\} = s^{\boldsymbol{\beta}}\hat{\phi}(\boldsymbol{s}) - s^{\boldsymbol{\beta}-1}\phi(\boldsymbol{0}) - \sum\_{k=1}^{n-1} s^{\boldsymbol{\beta}-1-k} \phi^{(k)}(\boldsymbol{0}).\tag{65}$$
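As a sanity check of definition (64) and formula (65), the sketch below (illustrative, not part of the source) computes the Caputo derivative of the monomial φ(*t*) = *t*² by direct quadrature and compares it with the known closed form Γ(3) *t*^(2−β)/Γ(3 − *β*):

```python
import math

def caputo_t2(t, beta, n=100_000):
    # Caputo derivative (64) of phi(t) = t^2 for 0 < beta < 1 (so n = 1 and
    # phi'(t') = 2t'); after the substitution u = t - t' the integrand is
    # 2 (t - u) u^{-beta}, with an integrable singularity at u = 0.
    h = t / n
    acc = 0.0
    for k in range(n):
        u = (k + 0.5) * h                  # midpoint rule
        acc += 2.0 * (t - u) * u ** (-beta)
    return acc * h / math.gamma(1.0 - beta)

beta, t = 0.5, 2.0
numeric = caputo_t2(t, beta)
exact = math.gamma(3.0) / math.gamma(3.0 - beta) * t ** (2.0 - beta)
print(numeric, exact)                      # agree to better than 1%
```

Equation (65) with *n* = 1 then gives L{∂^β *t*²/∂*t*^β} = *s*^β · (2/*s*³) = 2*s*^(β−3), which is indeed the Laplace transform of 2*t*^(2−β)/Γ(3 − *β*).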

The second kind of fractional operator we need is the Riesz–Feller fractional Laplacian of order *β* (0 < *β* ≤ 2) of a function *g*(**r**) vanishing at infinity. There are several equivalent ways to define it [68], although one of the simplest and most operative definitions is obtained using Fourier analysis. We thus define [21]:

$$\nabla^{\beta} g(\mathbf{r}) = \mathcal{F}^{-1}\left\{-|\omega|^{\beta}\,\tilde{g}(\omega)\right\},\tag{66}$$

(0 < *β* ≤ 2), where $\mathcal{F}^{-1}\{\cdot\}$ stands for the inverse Fourier transform, and

$$\tilde{g}(\omega) = \int\_{\mathbb{R}^3} e^{i\omega \cdot \mathbf{r}}\, g(\mathbf{r})\, d^3 \mathbf{r}$$

is the direct transform.
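Definition (66) is directly computable: transform, multiply by −|*ω*|^β, transform back. The following sketch (an illustration under the paper's sign convention *e*^(+iωx); the grid size and span are arbitrary choices) does this on a one-dimensional grid with a plain O(N²) discrete Fourier transform. For *β* = 2 the operator must reduce to the ordinary Laplacian, so for a Gaussian *g*(*x*) = *e*^(−x²) we expect *g*''(0) = −2.

```python
import math, cmath

def riesz_feller_1d(g, beta, span=20.0, N=256):
    """Apply Eq. (66) on a grid: g -> F^{-1}{ -|w|^beta g~(w) }."""
    dx = span / N
    xs = [-span / 2 + j * dx for j in range(N)]
    dw = 2 * math.pi / span
    ws = [dw * (k - N // 2) for k in range(N)]
    # direct transform g~(w) = ∫ e^{iwx} g(x) dx, as a Riemann sum
    gt = [sum(cmath.exp(1j * w * x) * g(x) for x in xs) * dx for w in ws]
    # inverse transform of -|w|^beta g~(w)
    out = [(sum(-abs(w) ** beta * gt[k] * cmath.exp(-1j * w * x)
                for k, w in enumerate(ws)) * dw / (2 * math.pi)).real
           for x in xs]
    return xs, out

xs, lap = riesz_feller_1d(lambda x: math.exp(-x * x), beta=2.0)
mid = len(xs) // 2                 # grid point x = 0
print(lap[mid])                    # ≈ g''(0) = -2
```

For non-integer *β* the same routine gives the fractional Laplacian, at the cost of slower spatial decay of the result.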

We are now ready to derive the three-dimensional FTE. We begin with Equation (62) which we rewrite as

$$\left(s^{2\alpha} + \frac{1}{\tau}s^{\alpha} + v^2|\omega|^{2\gamma}\right)\hat{\tilde{p}}(\omega, s) = s^{2\alpha-1} + \frac{1}{\tau}s^{\alpha-1}.$$

Taking into account the definition of the Riesz–Feller Laplacian, Equation (66), and recalling that <sup>F</sup> <sup>−</sup>1{1} <sup>=</sup> *<sup>δ</sup>*(**r**), the Fourier inversion yields

$$\left(s^{2\alpha} + \frac{1}{\tau}s^{\alpha} - v^2 \nabla^{2\gamma} \right)\hat{p}(\mathbf{r}, s) = \left(s^{2\alpha-1} + \frac{1}{\tau}s^{\alpha-1} \right)\delta(\mathbf{r}),$$

and, after reorganizing terms, we have

$$s^{2\alpha}\hat{p}(\mathbf{r},s) - s^{2\alpha-1}\delta(\mathbf{r}) + \frac{1}{\tau}\left[s^{\alpha}\hat{p}(\mathbf{r},s) - s^{\alpha-1}\delta(\mathbf{r})\right] = v^2\nabla^{2\gamma}\hat{p}.\tag{67}$$

In order to Laplace invert this equation, and thus obtain an equation for *p*(**r**, *t*), we first evaluate the Laplace transforms of the fractional derivatives *∂<sup>α</sup>p*/*∂t<sup>α</sup>* and *∂*<sup>2*α*</sup>*p*/*∂t*<sup>2*α*</sup> using Equation (65). We must distinguish the cases *β* = *α* and *β* = 2*α*.

(i) Set *β* = *α* in Equation (65). Since 0 < *α* ≤ 1, we see that *n* = 1. Hence

$$
\mathcal{L}\left\{\frac{\partial^{\alpha}p(\mathbf{r},t)}{\partial t^{\alpha}}\right\} = s^{\alpha}\hat{p}(\mathbf{r},s) - s^{\alpha-1}p(\mathbf{r},0).
$$

However, *p*(**r**, 0) = *δ*(**r**) (cf. Equation (42)). Therefore

$$\frac{\partial^{\alpha} p(\mathbf{r}, t)}{\partial t^{\alpha}} = \mathcal{L}^{-1}\left\{s^{\alpha}\hat{p}(\mathbf{r}, s) - s^{\alpha-1}\delta(\mathbf{r})\right\}. \tag{68}$$

(ii) When *β* = 2*α* (0 < *α* ≤ 1) we need to distinguish the cases (a) 0 < *α* ≤ 1/2 and (b) 1/2 < *α* ≤ 1. In case (a) we have 0 < 2*α* ≤ 1, which reproduces the conditions leading to Equation (68). That is,

$$
\mathcal{L}\left\{\frac{\partial^{2\alpha}p(\mathbf{r},t)}{\partial t^{2\alpha}}\right\} = s^{2\alpha}\widehat{p}(\mathbf{r},s) - s^{2\alpha-1}\delta(\mathbf{r}).
$$

In case (b) we have 1 < 2*α* ≤ 2 and from Equation (65) with *n* = 2 we write

$$\mathcal{L}\left\{\frac{\partial^{2\alpha}p(\mathbf{r},t)}{\partial t^{2\alpha}}\right\} = s^{2\alpha}\hat{p}(\mathbf{r},s) - s^{2\alpha-1}\delta(\mathbf{r}) - s^{2\alpha-2}\left.\frac{\partial p(\mathbf{r},t)}{\partial t}\right|\_{t=0}.$$

Since *∂p*/*∂t*|*t*=<sup>0</sup> = 0 (cf. Equation (42)) we see that this case coincides with case (a) above. Therefore,

$$\frac{\partial^{2\alpha}p}{\partial t^{2\alpha}} = \mathcal{L}^{-1}\left\{s^{2\alpha}\hat{p}(\mathbf{r},s) - s^{2\alpha-1}\delta(\mathbf{r})\right\},\tag{69}$$

(0 < *α* ≤ 1). Returning to Equation (67) and taking the inverse transform we find

$$\mathcal{L}^{-1}\left\{s^{2\alpha}\hat{p}(\mathbf{r},s) - s^{2\alpha-1}\delta(\mathbf{r})\right\} + \frac{1}{\tau}\,\mathcal{L}^{-1}\left\{s^{\alpha}\hat{p}(\mathbf{r},s) - s^{\alpha-1}\delta(\mathbf{r})\right\} = v^{2}\nabla^{2\gamma}p.$$

Using Equations (68) and (69), we readily obtain

$$\frac{\partial^{2\alpha}p}{\partial t^{2\alpha}} + \frac{1}{\tau}\frac{\partial^{\alpha}p}{\partial t^{\alpha}} = v^2 \nabla^{2\gamma} p, \tag{70}$$

which is the fractional telegrapher's equation in three dimensions where *τ* is the fractional time and *v* the fractional velocity (cf. Equation (63)).

As is well known, and as we have remarked in previous sections, the ordinary TE enjoys both wave and diffusion characteristics. We now extend this duality to the fractional TE. To this end we take the limit *τ* → 0 in Equation (70) while letting *v* → ∞ such that *τv*<sup>2</sup> → *D* remains finite. This results in the fractional diffusion equation (cf. Equation (7))

$$\frac{\partial^{\alpha}p}{\partial t^{\alpha}} = D\nabla^{2\gamma}p.\tag{71}$$

Let us show that for any values of *τ* and *v* the fractional diffusion equation is also the asymptotic (in time) limit of the fractional TE (recall that a similar situation occurs with the ordinary TE). Indeed, by passing to the limit *s* → 0 in the fluid-limit expression of the PDF (cf. Equation (62)) the small-*s* approximation for $\hat{\tilde{p}}(\omega,s)$ is readily found to be

$$\hat{\tilde{p}}(\omega, s) \simeq \frac{s^{\alpha - 1}}{s^{\alpha} + \tau v^2 |\omega|^{2\gamma}},$$

which after Fourier–Laplace inversion yields Equation (71) with *D* = *τv*<sup>2</sup>. Therefore, by virtue of Tauberian theorems the fractional diffusion Equation (71) is the long-time approximation of the fractional TE.
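The Tauberian statement can be checked directly on Equation (62): with illustrative parameter values (the defaults below are not from the source), the ratio of the full fluid-limit transform to the diffusion form *s*^(α−1)/(*s*^α + *τv*²|*ω*|^(2γ)) tends to one as *s* → 0.

```python
def p_hat_full(w, s, alpha=0.75, gamma=1.0, tau=1.0, v=1.0):
    # Eq. (62): fluid-limit Fourier-Laplace PDF of the fractional walk
    num = s ** (alpha - 1) * (s ** alpha + 1 / tau)
    den = s ** (2 * alpha) + s ** alpha / tau + v ** 2 * abs(w) ** (2 * gamma)
    return num / den

def p_hat_diffusive(w, s, alpha=0.75, gamma=1.0, tau=1.0, v=1.0):
    # small-s (long-time) approximation: fractional diffusion with D = tau*v^2
    return s ** (alpha - 1) / (s ** alpha + tau * v ** 2 * abs(w) ** (2 * gamma))

w = 0.3
for s in (1e-2, 1e-4, 1e-6):
    print(s, p_hat_full(w, s) / p_hat_diffusive(w, s))  # ratio -> 1 as s -> 0
```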

The fractional TE also contains the fractional wave equation as a special case. Thus letting *τ* → ∞ with *v* finite in Equation (70) we get

$$\frac{\partial^{2\alpha}p}{\partial t^{2\alpha}} = v^2 \nabla^{2\gamma} p. \tag{72}$$

Note that when *α* = 1/2 and *γ* = 1 this equation reduces to the ordinary diffusion equation. In this regard Mainardi's terminology "fractional diffusion-wave equation" [52] is more precise than "fractional wave equation". Let us finally observe that the fractional diffusion-wave equation is the small-time limit of the fractional TE. Indeed, the limit *s* → ∞ in Equation (62) yields

$$
\hat{\tilde{p}}(\omega, s) \simeq \frac{s^{2\alpha - 1}}{s^{2\alpha} + v^2 |\omega|^{2\gamma}},
$$

and the Fourier–Laplace inversion results in Equation (72). Again, due to Tauberian theorems we see that the fractional diffusion-wave equation is the short-time limit of the fractional TE.

We thus see from the preceding discussion that the fractional TE embraces two different dynamics: one of them, at small times, representing fractional wave-like behavior, and another one which, at long times, represents fractional diffusion-like behavior. This constitutes the fractional generalization of the dual character between waves and diffusion shown by the ordinary TE.

#### *5.3. Lower Dimensional Cases*

We next address lower dimensional problems and will see that in one and two dimensions the fractional TE has formally the same form as in three dimensions.

#### 5.3.1. One Dimension

As we know, the one-dimensional fractional case is based on the fractional generalization of the continuous-time persistent random walk on the line, which is a discrete two-state model; its derivation from the continuous multistate model described above, although similar in many aspects, is not straightforward. We will here state just the main result and refer the interested reader to [32] or the review [15] for details. Thus, the one-dimensional fractional TE has formally the same appearance as the three-dimensional Equation (70):

$$\frac{\partial^{2\alpha}p}{\partial t^{2\alpha}} + \frac{1}{\tau} \frac{\partial^{\alpha}p}{\partial t^{\alpha}} = v^{2} \frac{\partial^{2\gamma}p}{\partial x^{2\gamma}}\tag{73}$$

where the Caputo derivatives with respect to time are equal to those of the three dimensional Equation (70) and

$$\frac{\partial^{2\gamma}p}{\partial x^{2\gamma}} = \mathcal{F}^{-1}\left\{-|\omega|^{2\gamma}\,\tilde{p}(\omega,t)\right\},$$

is the Riesz–Feller fractional derivative (cf. Equation (66)), where

$$
\tilde{p}(\omega, t) = \int\_{-\infty}^{\infty} e^{i\omega x}\, p(x, t)\, dx,
$$

is the Fourier transform of the one-dimensional PDF *p*(*x*, *t*) and $\mathcal{F}^{-1}\{\cdot\}$ is the inverse transform.

As the reader can easily check, the solution of Equation (73) with the standard initial conditions:

$$p(x,0) = \delta(x), \qquad \left.\frac{\partial p(x,t)}{\partial t}\right|\_{t=0} = 0,$$

in Fourier–Laplace space reads

$$\hat{\tilde{p}}(\omega, s) = \frac{s^{\alpha - 1}(s^{\alpha} + 1/\tau)}{s^{2\alpha} + s^{\alpha}/\tau + v^{2}|\omega|^{2\gamma}}, \tag{74}$$

which is the one-dimensional version of Equation (62).

#### 5.3.2. Two Dimensions

As we have discussed in Section 4.3, most expressions of the two-dimensional case are the same as in three dimensions after the replacement *c*<sup>2</sup>/6 → *c*<sup>2</sup>/4 (cf. Equation (54)). Thus, for instance, the sojourn densities $\hat{\tilde{h}}$ and $\hat{\tilde{H}}$—which in three dimensions and in the fluid limit approximation are given by Equations (57) and (58)—now read

$$\hat{\tilde{h}}(\omega, s) \simeq 1 - \frac{s}{\lambda} - 2\left(\frac{s}{\lambda}\right)^2 - \frac{1}{2\lambda^2}|\omega|^2 c^2 + \cdots, \qquad \hat{\tilde{H}}(\omega, s) \simeq \frac{1}{\lambda} \left(1 + \frac{2s}{\lambda}\right) + \cdots \tag{75}$$

(*s* → 0, |*ω*| → 0), which, mimicking the discussion of Section 5.1, leads to the following fractional generalization of these functions [see Equations (59) and (61)]:

$$\hat{\tilde{h}}(\omega, s) \simeq 1 - (Ts)^{\alpha} - 2(Ts)^{2\alpha} - \frac{1}{2}(L|\omega|)^{2\gamma} + \cdots, \qquad \hat{\tilde{H}}(\omega, s) \simeq T^{\alpha} s^{\alpha-1} + 2T^{2\alpha} s^{2\alpha-1} + \cdots \tag{76}$$

Substituting Equation (76) into Montroll–Weiss Equation (26) and reorganizing terms yields

$$\hat{\tilde{p}}(\omega, s) = \frac{s^{\alpha-1}(s^{\alpha} + 1/\tau)}{s^{2\alpha} + s^{\alpha}/\tau + v^2 |\omega|^{2\gamma}},\tag{77}$$

where

$$
\tau = 2T^{\alpha}, \qquad v = \frac{1}{\sqrt{2}}\,\frac{L^{\gamma}}{T^{\alpha}}. \tag{78}
$$

Equation (77) gives the transformed PDF of the fractional two-dimensional isotropic random walk in the fluid limit approximation. Let us note that this expression has the same form as that of the three-dimensional case (cf. Equation (62)) with the same time parameter *τ* but a different velocity parameter *v* (cf. Equation (63)). Therefore, following exactly the same procedure detailed in the previous section we find the two-dimensional TE

$$\frac{\partial^{2\alpha}p}{\partial t^{2\alpha}} + \frac{1}{\tau}\frac{\partial^{\alpha}p}{\partial t^{\alpha}} = v^2 \nabla^{2\gamma} p, \tag{79}$$

which has the same form as the fractional TE (70) of the three-dimensional case, with the same limiting wave-like and diffusion-like behaviors.

#### *5.4. Characteristic Function*

Solving the fractional telegrapher's equation and thus obtaining the exact analytical form of the PDF *p*(**r**, *t*) seems to be out of reach in any dimension. It is, however, possible to obtain, regardless of dimensionality, a closed and exact expression for the characteristic function *p*˜(*ω*, *t*) (i.e., the Fourier transform of the PDF *p*(**r**, *t*)) of the space-time fractional telegrapher's Equation (70). To this end we will perform the Laplace inversion of the joint Fourier–Laplace transform $\hat{\tilde{p}}(\omega,s)$. Since this function has formally the same form in one, two and three dimensions (cf. Equations (74), (77) and (62), respectively), the differences between dimensions only arise when Fourier inverting, and the characteristic function will be formally the same in all cases. This similarity also shows that the time structure of any average will be the same regardless of the number of spatial dimensions (we will see this fact explicitly in our discussion of the moments of time-fractional processes in the next section).

We start off with Equation (77). Taking into account the factorization

$$s^{2\alpha} + s^{\alpha}/\tau + v^2|\omega|^{2\gamma} = \left[s^{\alpha} + \frac{1}{2\tau} - \eta(\omega)\right] \left[s^{\alpha} + \frac{1}{2\tau} + \eta(\omega)\right],$$

where

$$
\eta(\omega) = \sqrt{1/4\tau^2 - v^2|\omega|^{2\gamma}},\tag{80}
$$

Equation (77) can be written as

$$\hat{\tilde{p}}(\omega, s) = \frac{s^{\alpha-1}}{2\eta(\omega)} \left[ \frac{1/2\tau + \eta(\omega)}{s^{\alpha} + 1/2\tau - \eta(\omega)} - \frac{1/2\tau - \eta(\omega)}{s^{\alpha} + 1/2\tau + \eta(\omega)} \right]. \tag{81}$$

Further manipulations yield

$$\frac{s^{\alpha-1}}{s^{\alpha} + 1/2\tau \pm \eta(\omega)} = \frac{s^{-1}}{1 + [1/2\tau \pm \eta(\omega)]s^{-\alpha}} = \sum\_{n=0}^{\infty} (-1)^n \left[1/2\tau \pm \eta(\omega)\right]^n s^{-1-n\alpha}.$$

We next proceed to Laplace inversion. Since [58]

$$
\mathcal{L}^{-1}\left\{s^{-1-n\alpha}\right\} = \frac{t^{n\alpha}}{\Gamma(1+n\alpha)},
$$

we have

$$\mathcal{L}^{-1}\left\{\frac{s^{\alpha-1}}{s^{\alpha} + 1/2\tau \pm \eta(\omega)}\right\} = \sum\_{n=0}^{\infty} (-1)^n \frac{\left(\left[1/2\tau \pm \eta(\omega)\right]t^{\alpha}\right)^n}{\Gamma(1+n\alpha)} = E\_{\alpha}\left(-\left[1/2\tau \pm \eta(\omega)\right]t^{\alpha}\right),$$

where E*α*(·) is the Mittag–Leffler function [69]

$$E\_{\alpha}(z) = \sum\_{n=0}^{\infty} \frac{z^n}{\Gamma(1+n\alpha)}.\tag{82}$$

For *α* not an integer, the Mittag–Leffler function E*α*(*z*) can be regarded as a kind of "fractional generalization" of the exponential function. Indeed, when *α* = 1, since Γ(1 + *n*) = *n*!, we immediately see from Equation (82) that $E\_1(z) = e^z$.
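The series (82) is easy to evaluate numerically for moderate arguments (the truncation length below is an ad hoc choice, not from the source). Two quick checks: *α* = 1 recovers the exponential, and *α* = 1/2 recovers the classical identity *E*₁/₂(*z*) = *e*^(z²) erfc(−*z*).

```python
import math

def mittag_leffler(alpha, z, terms=120):
    # truncated series (82); fine for moderate |z| since Gamma grows fast
    return sum(z ** n / math.gamma(1 + n * alpha) for n in range(terms))

print(mittag_leffler(1.0, -1.3), math.exp(-1.3))                  # E_1(z) = e^z
print(mittag_leffler(0.5, -1.0), math.exp(1.0) * math.erfc(1.0))  # E_{1/2}(z) = e^{z^2} erfc(-z)
```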

Taking the inverse Laplace transform of Equation (81) and using the above intermediate results we finally obtain the characteristic function of the space-time fractional transport process

$$\tilde{p}(\omega, t) = \frac{1}{2\eta(\omega)}\Big\{\left[1/2\tau + \eta(\omega)\right]E\_{\alpha}\left(-\left[1/2\tau - \eta(\omega)\right]t^{\alpha}\right) - \left[1/2\tau - \eta(\omega)\right]E\_{\alpha}\left(-\left[1/2\tau + \eta(\omega)\right]t^{\alpha}\right)\Big\}, \tag{83}$$

which we recall is valid for any dimension of the underlying space.

In the wave-like limit when *v* is finite but *τ* → ∞ the fractional TE (70) reduces to the fractional wave-diffusion Equation (72). In this case (cf. Equation (80))

$$\eta(\omega) = iv|\omega|^{\gamma},$$

and the characteristic function reads

$$\tilde{p}(\omega, t) = \frac{1}{2}\left[E\_{\alpha}\left(-iv|\omega|^{\gamma}t^{\alpha}\right) + E\_{\alpha}\left(iv|\omega|^{\gamma}t^{\alpha}\right)\right], \tag{84}$$

a solution already obtained by Mainardi for the wave-diffusion equation [52].

In the diffusion-like limit *τ* → 0 and *v* → ∞ such that *τv*<sup>2</sup> = *D* (finite), from Equation (80) we see that

$$2\tau\eta(\omega) = \sqrt{1 - 4\tau^2 v^2|\omega|^{2\gamma}} = \sqrt{1 - 4\tau D|\omega|^{2\gamma}} \longrightarrow 1,$$

and

$$\frac{1}{2\tau} - \eta(\omega) = \frac{1}{2\tau}\left(1 - \sqrt{1 - 4\tau D|\omega|^{2\gamma}}\right) = \frac{2D|\omega|^{2\gamma}}{1 + \sqrt{1 - 4\tau D|\omega|^{2\gamma}}} \longrightarrow D|\omega|^{2\gamma}.$$

From Equation (83) we get

$$\tilde{p}(\omega, t) = E\_{\alpha}\left(-Dt^{\alpha}|\omega|^{2\gamma}\right),\tag{85}$$

a well known result which corresponds to a Lévy density with fractional time [36,38]. When *α* = 1 this result reduces to the ordinary Lévy distribution with zero mean,

$$
\tilde{p}(\omega, t) = e^{-Dt|\omega|^{2\gamma}}. \tag{86}
$$
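To close this section, the characteristic function (83) can be evaluated numerically once a truncated Mittag–Leffler series is at hand (parameter values below are illustrative, not from the source). Two built-in consistency checks: *p*˜(0, *t*) = 1 for all *t* (normalization), and *p*˜(*ω*, 0) = 1 (the initial *δ* condition), the latter because [1/2*τ* + *η*] − [1/2*τ* − *η*] = 2*η*.

```python
import math, cmath

def ml(alpha, z, terms=150):
    # truncated Mittag-Leffler series (82); complex arguments allowed
    return sum(z ** n / math.gamma(1 + n * alpha) for n in range(terms))

def char_fn(w, t, alpha=0.8, gamma=1.0, tau=1.0, v=1.0):
    # characteristic function (83); eta from (80) may be imaginary,
    # so everything is done in complex arithmetic
    eta = cmath.sqrt(1.0 / (4 * tau ** 2) - v ** 2 * abs(w) ** (2 * gamma))
    a = 1.0 / (2 * tau) + eta
    b = 1.0 / (2 * tau) - eta
    val = (a * ml(alpha, -b * t ** alpha) - b * ml(alpha, -a * t ** alpha)) / (2 * eta)
    return val.real

print(char_fn(0.0, 2.0))   # normalization: -> 1.0
print(char_fn(0.7, 0.0))   # initial condition: -> 1.0
```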

#### **6. Time-Fractional Telegraphic Transport**

In the last section we developed fractional telegraphic transport in its most general form, assuming that both time and space are fractional. This leads, as the master equation of the process, to the space-time fractional telegrapher's equation, which is formally the same in one, two and three dimensions. We have also seen that the general space-time fractional TE reduces to the space-time fractional wave equation at short times and to the space-time fractional diffusion equation at long times. This dual character is even more manifest for the time-fractional equation, when the spatial exponent is *γ* = 1 and only time is fractional. For fractional diffusion this particular case has been extensively studied in the literature and has many applications [35–37,39,41–43].

In any dimension the time-fractional TE is

$$\frac{\partial^{2\alpha}p}{\partial t^{2\alpha}} + \frac{1}{\tau} \frac{\partial^{\alpha}p}{\partial t^{\alpha}} = v^2 \nabla^2 p \tag{87}$$

(0 < *α* ≤ 1), where *∇*<sup>2</sup> is either the two- or the three-dimensional Laplacian and, in one dimension, the second spatial derivative.

For the time-fractional TE we can obtain more analytical results than for the space-time TE. These results turn out to be very useful because they clearly mark the similarities and dissimilarities between telegraphic transport processes in different dimensions. For one dimension we had already obtained some of the results presented here in [32], but not in higher dimensions. We now fill this gap, in a setting where the analogies and differences among dimensions are clearly manifested.

Our starting point is again the Fourier–Laplace solution of the fractional TE (cf. Equations (62), (74) or (77) with *γ* = 1)

$$\hat{\tilde{p}}(\omega, s) = \frac{s^{\alpha-1}(s^{\alpha} + 1/\tau)}{s^{2\alpha} + s^{\alpha}/\tau + v^2|\omega|^2}. \tag{88}$$

The basic idea is the following: since time is now the only fractional variable, it is possible to Fourier invert $\hat{\tilde{p}}(\omega,s)$ and obtain a closed expression for the Laplace transform *p*ˆ(**r**,*s*), which compels us to treat different dimensions separately. Once we get the expression for *p*ˆ(**r**,*s*), the use of Tauberian theorems will allow us to obtain asymptotic expressions of the PDF *p*(**r**, *t*) at long and short times. Even though the one-dimensional problem was fully addressed in [32], we present here all three dimensions and compare the results.

#### *6.1. Laplace Transform of the PDF*

Let us proceed to Fourier invert the expression (88) for $\hat{\tilde{p}}(\omega,s)$. To this end we need to treat each dimension separately.

#### 6.1.1. One Dimension

Recall that in one dimension the expression (88) of the transformed density $\hat{\tilde{p}}$ remains valid, although in this case |*ω*| is not the modulus of a vector but the absolute value of a single variable. By virtue of the symmetry of $\hat{\tilde{p}}$ with respect to *ω*, the Fourier inversion will be given by

$$\hat{p}(x,s) = \frac{1}{2\pi} \int\_{-\infty}^{\infty} e^{-i\omega x}\,\hat{\tilde{p}}(\omega,s)\, d\omega = \frac{1}{\pi} \int\_{0}^{\infty} \hat{\tilde{p}}(\omega,s) \cos\omega x\, d\omega.$$

Substituting for Equation (88) yields

$$\hat{p}(x, s) = \frac{1}{\pi s} \left(s^{2\alpha} + s^{\alpha}/\tau\right) \int\_0^\infty \frac{\cos \omega x}{s^{2\alpha} + s^{\alpha}/\tau + v^2 \omega^2}\, d\omega,$$

and recalling the integral [70]

$$\int\_0^\infty \frac{\cos \omega x}{a^2 + b^2 \omega^2}\, d\omega = \frac{\pi}{2ab}\, e^{-a|x|/b},$$

we get

$$\hat{p}(x, s) = \frac{1}{2vs} \sqrt{s^{2\alpha} + s^{\alpha}/\tau}\, \exp\left\{-\frac{|x|}{v} \sqrt{s^{2\alpha} + s^{\alpha}/\tau}\right\}.\tag{89}$$

For *α* = 1 the fractional TE (87) reduces to the ordinary TE, and in this one-dimensional case the Laplace transform (89) can be inverted, yielding the exact PDF *p*(*x*, *t*) in terms of modified Bessel functions. We refer the interested reader to [32] for more details. For the fractional case *α* ≠ 1, the exact analytical inversion of Equation (89) seems to be out of reach. However, as we will see below, we can obtain approximate solutions for large values of time.

#### 6.1.2. Two Dimensions

In this case the Fourier inversion formula yields for the PDF in two dimensions

$$\begin{split} \hat{p}(\mathbf{r},s) &= \frac{1}{(2\pi)^{2}} \int\_{\mathbb{R}^{2}} e^{-i\omega\cdot\mathbf{r}}\,\hat{\tilde{p}}(\omega,s)\, d^{2}\omega = \frac{1}{(2\pi)^{2}} \int\_{0}^{\infty} \int\_{0}^{2\pi} e^{-i\omega r \cos\varphi}\,\hat{\tilde{p}}(\omega,s)\, \omega\, d\omega\, d\varphi \\ &= \frac{1}{(2\pi)^{2}} \int\_{0}^{\infty} \omega\, \hat{\tilde{p}}(\omega,s)\, d\omega \int\_{0}^{2\pi} e^{-i\omega r \cos\varphi}\, d\varphi, \end{split}$$

where *ω* = |*ω*| and we have taken into account that $\hat{\tilde{p}}(\omega,s)$ depends only on the modulus |*ω*| = *ω* [see Equation (88)].

From the integral representation of the Bessel function of zero order [63],

$$J\_0(\omega r) = \frac{1}{2\pi} \int\_0^{2\pi} e^{-i\omega r \cos\varphi} d\varphi,$$

we write

$$
\hat{p}(r,s) = \frac{1}{2\pi} \int\_0^\infty \omega\, J\_0(\omega r)\,\hat{\tilde{p}}(\omega, s)\, d\omega.
$$

Substituting for (88) we have

$$\hat{p}(r, s) = \frac{1}{2\pi s} \left(s^{2\alpha} + s^{\alpha}/\tau\right) \int\_0^\infty \frac{\omega J\_0(\omega r)}{s^{2\alpha} + s^{\alpha}/\tau + v^2 \omega^2}\, d\omega,$$

and taking into account the integral [70]

$$\int\_0^\infty \frac{\omega J\_0(a\omega)}{b^2 + \omega^2}\, d\omega = K\_0(ab),$$

(*a* > 0, Re *b* > 0), where *K*0(·) is a modified Bessel function, we finally obtain

$$\hat{p}(r,s) = \frac{1}{2\pi v^2 s} \left(s^{2\alpha} + s^{\alpha}/\tau\right) K\_0\left(\frac{r}{v}\sqrt{s^{2\alpha} + s^{\alpha}/\tau}\right). \tag{90}$$

#### 6.1.3. Three Dimensions

Bearing in mind that $\hat{\tilde{p}}(\omega,s)$ depends only on the modulus |*ω*| = *ω* (cf. Equation (88)), the Fourier inversion of $\hat{\tilde{p}}$ is

$$\begin{split} \hat{p}(r,s) &= \frac{1}{(2\pi)^{3}} \int\_{\mathbb{R}^{3}} e^{-i\omega \cdot \mathbf{r}}\,\hat{\tilde{p}}(\omega,s)\, d^{3}\omega \\ &= \frac{1}{(2\pi)^{3}} \int\_{0}^{\infty} \int\_{0}^{\pi} \int\_{0}^{2\pi} e^{-i\omega r \cos\theta}\,\hat{\tilde{p}}(\omega,s)\, \omega^{2} \sin\theta\, d\omega\, d\theta\, d\varphi \\ &= \frac{1}{(2\pi)^{2}} \int\_{0}^{\infty} \omega^{2}\, \hat{\tilde{p}}(\omega,s)\, d\omega \int\_{0}^{\pi} e^{-i\omega r \cos\theta} \sin\theta\, d\theta. \end{split}$$

Since

$$\int\_0^\pi e^{-i\omega r \cos \theta} \sin \theta\, d\theta = \frac{2}{\omega r} \sin \omega r,$$

we have

$$
\hat{p}(r,s) = \frac{1}{2\pi^2 r} \int\_0^\infty \omega \sin\omega r\,\hat{\tilde{p}}(\omega,s)\, d\omega.
$$

Substituting for Equation (88) yields

$$\hat{p}(r, s) = \frac{1}{2\pi^2 r s} \left( s^{2\alpha} + s^{\alpha}/\tau \right) \int\_0^\infty \frac{\omega \sin \omega r}{s^{2\alpha} + s^{\alpha}/\tau + v^2 \omega^2}\, d\omega,$$

and taking into account the integral [70]

$$\int\_0^\infty \frac{\omega \sin a\omega}{b^2 + \omega^2}\, d\omega = \frac{\pi}{2}\, e^{-ab},$$

(*a* ≥ 0, Re *b* > 0), we obtain

$$\hat{p}(r,s) = \frac{1}{4\pi r v^2 s} \left(s^{2\alpha} + s^{\alpha}/\tau\right) \exp\left\{-\frac{r}{v}\sqrt{s^{2\alpha} + s^{\alpha}/\tau}\right\},\tag{91}$$

which is the exact Laplace-transformed PDF in three dimensions. Notice the different forms taken by *p*ˆ(*r*,*s*) in one, two and three dimensions (cf. Equations (89)–(91), respectively).
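A quick numerical check of Equation (91), with illustrative parameter values (not from the source), is probability conservation: since *p*ˆ(*r*, *s*) is the Laplace transform of a PDF, the radial integral ∫₀^∞ 4π*r*² *p*ˆ(*r*, *s*) *dr* must equal 1/*s*, the Laplace transform of 1.

```python
import math

def p_hat_3d(r, s, alpha=0.6, tau=1.0, v=1.0):
    # Laplace-transformed PDF (91) in three dimensions
    S = s ** (2 * alpha) + s ** alpha / tau
    return S / (4 * math.pi * r * v ** 2 * s) * math.exp(-r * math.sqrt(S) / v)

s, n, R = 0.5, 100_000, 50.0       # midpoint rule on [0, R]
h = R / n
total = sum(4 * math.pi * ((k + 0.5) * h) ** 2 * p_hat_3d((k + 0.5) * h, s) * h
            for k in range(n))
print(total, 1 / s)                 # both ≈ 2.0
```

The agreement is exact up to quadrature error, because the radial integral of *r* e^(−r√S/v) gives precisely *v*²/(*s*^(2α) + *s*^α/τ).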

Let us also observe that, as in the general space-time fractional case, for the time-fractional case (*α* ≠ 1, *γ* = 1) the expressions for *p*ˆ given by Equations (89)–(91) are very difficult, not to say impossible, to invert analytically. Thus, obtaining the exact analytical form of the PDF *p*(*r*, *t*) in real time seems to be beyond reach. We can, however, obtain approximate solutions, valid for large values of time, using Tauberian theorems, which relate the small-*s* behavior of *p*ˆ(*r*,*s*) to the large-*t* behavior of *p*(*r*, *t*) [61,71].

#### *6.2. Long-Time Asymptotic Expressions*

For the asymptotic analysis we rely on Tauberian theorems, which allow us to infer the behavior of *p*(*r*, *t*) for long times from the expression for *p*ˆ(*r*,*s*) for small values of the Laplace variable *s* [61,71]. We again treat each dimension separately.

#### 6.2.1. One Dimension

We briefly summarize only the main results in one dimension and refer the reader to [32] for details. For long times such that *t* ≫ *τ*<sup>1/*α*</sup> we have shown that [32]

$$p(x,t) \simeq \frac{t^{-\alpha/2}}{2v\sqrt{\tau}}\, M\_{\alpha/2} \left(\frac{|x|\,t^{-\alpha/2}}{v\sqrt{\tau}}\right), \qquad (t \gg \tau^{1/\alpha}), \tag{92}$$

where *Mα*/2(·) is the Mainardi function defined by the power series [52,72]

$$M\_{\beta}(z) = \sum\_{n=0}^{\infty} \frac{(-1)^{n} z^{n}}{n! \Gamma(-\beta n + 1 - \beta)}.\tag{93}$$

The function *Mβ*(*z*) is an entire function for 0 < *β* < 1 [52], being a special case of the Wright function [69,72] (see below), which is, in turn, closely related to the Fox function frequently used in the anomalous diffusion literature [36]. We incidentally note that after the replacement *v*<sup>√</sup>*τ* → <sup>√</sup>*D*, the asymptotic expression (92) becomes the exact solution to the time fractional diffusion equation (cf. Equation (71) with *γ* = 1), a solution obtained by Mainardi some years ago [52].
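Since the series (93) converges rapidly for any fixed *z*, it can be summed directly. The following numerical sketch (an editorial check, not part of the original derivation; the truncation length is arbitrary) only needs to handle the poles of the gamma function, where the reciprocal vanishes; the identity *M*<sub>1/2</sub>(*z*) = e<sup>−*z*²/4</sup>/<sup>√</sup>π provides a convenient test case.

```python
import math

def rgamma(x):
    """Reciprocal gamma function 1/Gamma(x); it vanishes at the poles x = 0, -1, -2, ..."""
    if x <= 0 and x == int(x):
        return 0.0
    return 1.0 / math.gamma(x)

def mainardi(beta, z, terms=60):
    """Truncated power series (93) for the Mainardi function M_beta(z), 0 < beta < 1."""
    return sum((-z) ** n / math.factorial(n) * rgamma(-beta * n + 1 - beta)
               for n in range(terms))
```

For *β* = 1/2 the series collapses to the Gaussian quoted in the text, and *Mβ*(0) = 1/Γ(1 − *β*) follows from the *n* = 0 term.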

From Equation (93) we see that *Mα*/2(*z*) → 1/Γ(1 − *α*/2) as *z* → 0 and Equation (92) yields the asymptotic power law

$$p(x,t) \sim t^{-\alpha/2}, \qquad (t \to \infty). \tag{94}$$

#### 6.2.2. Two Dimensions

Noticing that as *s* → 0 (specifically, if *s* ≪ *τ*<sup>−1/*α*</sup>)

$$s^{2\alpha} + s^{\alpha}/\tau = (s^{\alpha}/\tau)(\tau s^{\alpha} + 1) \simeq s^{\alpha}/\tau,\tag{95}$$

we write for the two dimensional density (90)

$$
\hat{p}(r,s) \simeq \frac{s^{\alpha-1}}{2\pi v^2\tau} K\_0\left(\frac{r s^{\alpha/2}}{v\sqrt{\tau}}\right), \qquad (s \ll \tau^{-1/\alpha}).\tag{96}
$$

On the other hand [63]

$$K\_0(z) = -\left[\gamma + \ln(z/2)\right] I\_0(z) + 2\sum\_{n=1}^{\infty} \frac{1}{n} I\_{2n}(z),$$

(*γ* = 0.5772 ··· is the Euler constant and *Iν*(*z*) are modified Bessel functions), but [63]

$$I\_{\nu}(z) = \sum\_{n=0}^{\infty} \frac{(z/2)^{\nu+2n}}{n!\,\Gamma(\nu+n+1)} = O(z^{\nu}) \qquad (z \to 0),$$

thus

$$K\_0(z) = -\left[\gamma + \ln(z/2)\right] + O(z^2 \ln z). \tag{97}$$

Hence

$$K\_0\left(\frac{r s^{\alpha/2}}{v\sqrt{\tau}}\right) = -\left[\gamma + \ln\left(\frac{r}{2v\sqrt{\tau}}\right)\right] - \frac{\alpha}{2}\ln s + O(s^{\alpha} \ln s),$$

which, substituted into Equation (96), yields the approximate expression valid for small values of *s* (i.e., when *s* ≪ *τ*<sup>−1/*α*</sup>)

$$\hat{p}(r,s) \simeq \frac{-1}{2\pi v^2 \tau} \left\{ \left[ \gamma + \ln \left( \frac{r}{2v\sqrt{\tau}} \right) \right] \frac{1}{s^{1-\alpha}} + \frac{\alpha/2}{s^{1-\alpha}} \ln s \right\}.\tag{98}$$

We next proceed to Laplace inverting this small *s* expression for *p*ˆ(*r*,*s*) which by virtue of Tauberian theorems will provide an approximate expression of *p*(*r*, *t*) suitable for long times. Taking into account the Laplace inversion formulae [32,58]

$$\mathcal{L}^{-1}\left\{\frac{1}{s^{\beta}}\right\} = \frac{t^{\beta - 1}}{\Gamma(\beta)} \qquad \text{and} \qquad \mathcal{L}^{-1}\left\{\frac{\ln s}{s^{\beta}}\right\} = \frac{t^{\beta - 1}}{\Gamma(\beta)} \left[\psi(\beta) - \ln t\right],$$

[*β* > 0 and *ψ*(*z*) = Γ′(*z*)/Γ(*z*) is the digamma function] we have

$$\begin{split} p(r,t) &\simeq \frac{-1}{2\pi v^{2}\tau} \left\{ \left[ \gamma + \ln \left( \frac{r}{2v\sqrt{\tau}} \right) \right] \frac{t^{-\alpha}}{\Gamma(1-\alpha)} + \frac{\alpha}{2} \frac{t^{-\alpha}}{\Gamma(1-\alpha)} \left[ \psi(1-\alpha) - \ln t \right] \right\} \\ &= \frac{1}{2\pi v^{2}\tau} \frac{t^{-\alpha}}{\Gamma(1-\alpha)} \left[ \frac{\alpha}{2} \ln t - \ln \left( \frac{r}{2v\sqrt{\tau}} \right) - \gamma - \frac{\alpha}{2}\psi(1-\alpha) \right] \\ &= \frac{1}{2\pi v^{2}\tau} \frac{t^{-\alpha}}{\Gamma(1-\alpha)} \left[ \ln \left( \frac{2v t^{\alpha/2}\sqrt{\tau}}{r} \right) - \gamma - \frac{\alpha}{2}\psi(1-\alpha) \right], \end{split}$$

and neglecting constant terms we finally get

$$p(r,t) \simeq \frac{1}{2\pi v^2 \tau} \frac{t^{-\alpha}}{\Gamma(1-\alpha)} \ln\left(\frac{2v t^{\alpha/2}\sqrt{\tau}}{r}\right),\tag{99}$$

(*t* ≫ *τ*<sup>1/*α*</sup>). Therefore, in two dimensions the PDF of the time fractional TE obeys the asymptotic logarithmic power law

$$p(r, t) \sim t^{-\alpha} \ln t \qquad (t \to \infty). \tag{100}$$
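The inversion pairs used in this derivation can be verified by transforming forward numerically. The sketch below (an editorial check using only the standard library; the logarithmic grid and its bounds are arbitrary choices) confirms both formulae at the sample values *β* = 0.7, *s* = 2.

```python
import math

def laplace(f, s, lo=-30.0, hi=8.0, steps=100000):
    """Forward transform int_0^inf e^{-st} f(t) dt on a log grid t = e^x,
    which tames the integrable t^(beta-1) singularity at the origin."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = math.exp(lo + k * h)
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(-s * t) * f(t) * t   # dt = t dx on the log grid
    return total * h

def digamma(x, h=1e-5):
    """psi(x) = Gamma'(x)/Gamma(x) by central differences of lgamma."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

beta, s = 0.7, 2.0
f1 = lambda t: t ** (beta - 1) / math.gamma(beta)
f2 = lambda t: t ** (beta - 1) / math.gamma(beta) * (digamma(beta) - math.log(t))
```

The forward transforms of `f1` and `f2` reproduce 1/*s*<sup>*β*</sup> and ln *s*/*s*<sup>*β*</sup>, respectively, as the Tauberian argument requires.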

#### 6.2.3. Three Dimensions

In three dimensions the starting point of our asymptotic analysis is Equation (91) which using the small *s* approximation given in Equation (95) yields

$$
\hat{p}(r,s) \simeq \frac{s^{\alpha-1}}{4\pi r v^2 \tau}\, e^{-r s^{\alpha/2}/v\sqrt{\tau}} \qquad (s \ll \tau^{-1/\alpha}), \tag{101}
$$

and expanding the exponential we write

$$\hat{p}(r,s) \simeq \frac{s^{\alpha-1}}{4\pi r v^2 \tau} \sum\_{n=0}^{\infty} \frac{1}{n!} \left(\frac{-r}{v\sqrt{\tau}}\right)^n s^{-1 + (1+n/2)\alpha} \qquad (s \ll \tau^{-1/\alpha}).$$

Recall again that because of Tauberian theorems the inversion of this expression for *p*ˆ(*r*,*s*), valid for small values of *s*, will provide an asymptotic expression for *p*(*r*, *t*) suitable for large values of *t*. Thus, taking into account the Laplace inversion formula [32,52]

$$\mathcal{L}^{-1}\left\{s^{\delta}\right\} = \frac{t^{-1-\delta}}{\Gamma(-\delta)}\tag{102}$$

(where *δ* is not a non-negative integer) we obtain for *t* ≫ *τ*<sup>1/*α*</sup>

$$p(r,t) \simeq \frac{1}{4\pi r v^2 \tau} \sum\_{n=0}^{\infty} \frac{1}{n!} \left(\frac{-r}{v\sqrt{\tau}}\right)^n \frac{t^{-(1+n/2)\alpha}}{\Gamma[1-(1+n/2)\alpha]},$$

that is

$$p(r,t) \simeq \frac{t^{-\alpha}}{4\pi r v^2 \tau} \sum\_{n=0}^{\infty} \frac{1}{n!} \frac{1}{\Gamma[1 - (1 + n/2)\alpha]} \left(\frac{-r t^{-\alpha/2}}{v\sqrt{\tau}}\right)^n, \qquad (t \gg \tau^{1/\alpha}).\tag{103}$$

This asymptotic expression for *p*(*r*, *t*) can be written in a more compact form by using the Wright function defined by [69,72]

$$\mathcal{W}\_{\lambda,\mu}(z) = \sum\_{n=0}^{\infty} \frac{z^n}{n!\Gamma(\mu + \lambda n)},\tag{104}$$

(*λ* > −1; *μ* and *z* arbitrary complex numbers). It is an entire function originally proposed by Wright in the 1930s for the asymptotic theory of partitions [69]. When *λ* = 1 the Wright function can be written in terms of the Bessel function of order *μ* − 1 [69,72]. Moreover, the Mainardi function *Mβ*(*z*) defined in Equation (93) is a particular case of the Wright function. Indeed,

$$
M\_{\beta}(z) = \mathcal{W}\_{-\beta,\,1-\beta}(-z).
$$

From Equations (103) and (104) we see that the asymptotic PDF for the three dimensional case can be written as

$$p(r,t) \simeq \frac{t^{-\alpha}}{4\pi r v^2 \tau}\, \mathcal{W}\_{-\alpha/2,\, 1-\alpha} \left(\frac{-r t^{-\alpha/2}}{v\sqrt{\tau}}\right), \qquad (t \gg \tau^{1/\alpha}).\tag{105}$$

Finally, from Equation (104) we see that *Wλ*,*μ*(*z*) → 1/Γ(*μ*) as *z* → 0 and Equation (105) yields the asymptotic power law

$$p(r,t) \sim t^{-\alpha}, \qquad (t \to \infty). \tag{106}$$
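The Wright series (104) is also straightforward to sum numerically, which gives a quick consistency check (editorial, not from the original) of the relation to the Mainardi function: *Mβ*(*z*) = *W*<sub>−*β*,1−*β*</sub>(−*z*), tested here through the *β* = 1/2 Gaussian identity quoted earlier.

```python
import math

def rgamma(x):
    """Reciprocal gamma function; zero at the poles of Gamma."""
    if x <= 0 and x == int(x):
        return 0.0
    return 1.0 / math.gamma(x)

def wright(lam, mu, z, terms=60):
    """Truncated series (104) for the Wright function W_{lam,mu}(z), lam > -1."""
    return sum(z ** n / math.factorial(n) * rgamma(mu + lam * n)
               for n in range(terms))
```

For *λ* = 0 the series reduces to the exponential, and *W*<sub>−1/2,1/2</sub>(−*z*) reproduces *M*<sub>1/2</sub>(*z*) = e<sup>−*z*²/4</sup>/<sup>√</sup>π.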

#### *6.3. Moments of the Effective Distance Travelled*

One of the magnitudes of greatest interest in transport problems is the distance covered by the particle from its starting point. The evaluation of the actual distance is very involved due to the random turnarounds of the trajectory. We will take as an estimate of it the effective distance travelled (recalling that the transport process starts at the origin of the coordinate system), which is the quantity |*x*(*t*)| in one dimension, and |**r**(*t*)| = *r*(*t*) in two and three dimensions. We will thus work each dimension separately, although, as stated in Section 5.4, moments are essentially the same for each dimension.

Let us note that for space-time fractional processes, the moments of the distance travelled may be infinite (as in the case of Lévy processes). However, for time-fractional processes these moments are finite and we can obtain analytical expressions for them using the forms of the PDF obtained above. Moments also make explicit the dual character of fractional telegraphic transport, between fractional wave transport and fractional diffusion transport, which generalizes the same duality presented by the ordinary TE.

#### 6.3.1. One Dimension

Moments are defined by

$$
\langle |x(t)|^n \rangle = \int\_{-\infty}^{\infty} |x|^n\, p(x, t)\, dx, \qquad (n = 1, 2, \dots),
$$

and recalling that *p*(*x*, *t*) is an even function of *x*, the Laplace transform can be written as

$$\mathcal{L}\{\langle|\mathbf{x}(t)|^{n}\rangle\} = 2\int\_{0}^{\infty} \mathbf{x}^{n}\hat{p}(\mathbf{x},\mathbf{s})d\mathbf{x}.$$

Substituting for Equation (89) yields

$$\mathcal{L}\{\langle|\mathbf{x}(t)|^{n}\rangle\} = \frac{\sqrt{\beta(s)}}{\upsilon s} \int\_{0}^{\infty} \mathbf{x}^{n} e^{-\mathbf{x}\sqrt{\beta(s)}/\upsilon} d\mathbf{x} = \frac{\upsilon^{n}}{\mathrm{s}[\beta(s)]^{n/2}} \int\_{0}^{\infty} z^{n} e^{-z} dz = \frac{\upsilon^{n} n!}{\mathrm{s}[\beta(s)]^{n/2}},$$

where *β*(*s*) = *s*<sup>2*α*</sup> + *s*<sup>*α*</sup>/*τ*. Hence

$$\mathcal{L}\{ \langle |x(t)|^n \rangle \} = \frac{n!\, v^n}{s\,(s^{2\alpha} + s^{\alpha}/\tau)^{n/2}}, \qquad (n = 1, 2, \dots). \tag{107}$$
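As a consistency check (an editorial addition, with arbitrary parameter values), for *α* = 1 Equation (107) with *n* = 2 gives L{⟨*x*²⟩} = 2*v*²/(*s*²(*s* + 1/*τ*)), whose partial fractions invert to the classical telegrapher's result ⟨*x*²(*t*)⟩ = 2*v*²*τ*²(*t*/*τ* − 1 + e<sup>−*t*/*τ*</sup>); the sketch below confirms the pair by numerical forward transformation.

```python
import math

v, tau, s = 1.3, 0.8, 0.5  # arbitrary test values

def laplace(f, s, T=100.0, steps=200000):
    """Trapezoidal approximation of int_0^T e^{-st} f(t) dt (tail beyond T negligible here)."""
    h = T / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(-s * t) * f(t)
    return total * h

# Classical TE second moment: ballistic ~ v^2 t^2 for t << tau, diffusive ~ 2 v^2 tau t for t >> tau.
m2 = lambda t: 2 * v ** 2 * tau ** 2 * (t / tau - 1 + math.exp(-t / tau))

lhs = laplace(m2, s)
rhs = 2 * v ** 2 / (s ** 2 * (s + 1 / tau))  # Eq. (107) with n = 2, alpha = 1
```

The two evaluations agree, and the small-*t* expansion of `m2` makes the ballistic-to-diffusive crossover of the ordinary TE explicit.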

Recall that when *τ* → ∞ we recover the fractional wave equation. In this case from Equation (107) we have the "wave limit"

$$\mathcal{L}\{ \langle |x(t)|^{n} \rangle \} = \frac{n!\, v^{n}}{s^{1+n\alpha}} \qquad \Rightarrow \qquad \langle |x(t)|^{n} \rangle = \frac{n!\, v^{n}}{\Gamma(1+n\alpha)}\, t^{n\alpha}.\tag{108}$$

On the other hand, when *τ* → 0 but *v* → ∞ such that *v*<sup>2</sup>*τ* = *D* (finite), we recover the fractional diffusion equation and have the "diffusion limit"

$$\mathcal{L}\{ \langle |\mathbf{x}(t)|^{n} \rangle \} = \frac{n! D^{n/2}}{\mathbf{s}^{1+n\alpha/2}} \qquad \Rightarrow \qquad \langle |\mathbf{x}(t)|^{n} \rangle = \frac{n! D^{n/2}}{\Gamma(1+n\alpha/2)} t^{n\alpha/2}.\tag{109}$$

Let us also recall that the wave limit is the one we recover from TE as *t* → 0, whereas the diffusion limit corresponds to the long time limit. Indeed, taking into account that

$$(s^{2\alpha} + s^{\alpha}/\tau)^{n/2} \simeq s^{n\alpha} \quad (s \to \infty) \qquad \text{and} \qquad (s^{2\alpha} + s^{\alpha}/\tau)^{n/2} \simeq (s^{\alpha}/\tau)^{n/2} \quad (s \to 0),$$

and bearing in mind Tauberian theorems, we see from Equation (107)

$$\mathcal{L}\{\langle|x(t)|^{n}\rangle\} \simeq \frac{n!\, v^{n}}{s^{1+n\alpha}} \quad (s \to \infty) \qquad \Rightarrow \qquad \langle|x(t)|^{n}\rangle \simeq \frac{n!\, v^{n}}{\Gamma(1+n\alpha)}\, t^{n\alpha} \quad (t \to 0),\tag{110}$$

and

$$\mathcal{L}\{\langle|x(t)|^{n}\rangle\} \simeq \frac{n!\,(v\sqrt{\tau})^{n}}{s^{1+n\alpha/2}} \quad (s \to 0) \qquad \Rightarrow \qquad \langle|x(t)|^{n}\rangle \simeq \frac{n!\,(v\sqrt{\tau})^{n}}{\Gamma(1+n\alpha/2)}\, t^{n\alpha/2} \quad (t \to \infty). \tag{111}$$

#### 6.3.2. Two Dimensions

We now have

$$\mathcal{L}\{\langle|\mathbf{r}(t)|^{n}\rangle\} = \int\_{\mathbb{R}^{2}} |\mathbf{r}|^{n}\, \hat{p}(\mathbf{r},s)\, d^{2}\mathbf{r} = \int\_{0}^{\infty} \int\_{0}^{2\pi} r^{n}\, \hat{p}(r,s)\, r\, dr\, d\varphi,$$

that is,

$$\mathcal{L}\{\langle r^n(t)\rangle\} = 2\pi \int\_0^\infty r^{n+1}\, \hat{p}(r,s)\, dr.$$

Substituting for Equation (90) and a simple change of variables yields

$$\mathcal{L}\{\langle r^n(t)\rangle\} = \frac{v^n}{s\,(s^{2\alpha} + s^{\alpha}/\tau)^{n/2}} \int\_0^\infty z^{n+1} K\_0(z)\, dz,$$

but [70]

$$\int\_0^\infty z^{n+1} K\_0(z) dz = 2^n \Gamma^2(1 + n/2),$$

so that

$$\mathcal{L}\{\langle r^n(t)\rangle\} = \frac{2^n\, \Gamma^2(1+n/2)\, v^n}{s\,(s^{2\alpha}+s^{\alpha}/\tau)^{n/2}}, \qquad (n = 1, 2, \dots). \tag{112}$$

Observe that this two-dimensional expression equals the one-dimensional moment (107) up to a numerical factor and, therefore, all two-dimensional expressions can be recovered from the one-dimensional ones under the replacement *n*! → 2<sup>*n*</sup>Γ<sup>2</sup>(1 + *n*/2). Thus, in particular, we see from Equations (110) and (111) that

$$\langle r^n(t)\rangle \sim t^{n\alpha} \qquad (t \to 0) \qquad \text{and} \qquad \langle r^n(t)\rangle \sim t^{n\alpha/2} \qquad (t \to \infty). \tag{113}$$
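The tabulated Bessel integral used above can itself be verified without special-function libraries (an editorial sketch; the grid parameters are arbitrary): with the representation *K*<sub>0</sub>(*z*) = ∫<sub>0</sub><sup>∞</sup> e<sup>−*z* cosh *u*</sup> d*u*, integrating over *z* first reduces the double integral to Γ(*n* + 2) ∫<sub>0</sub><sup>∞</sup> sech<sup>*n*+2</sup>*u* d*u*.

```python
import math

def bessel_moment(n, hi=40.0, steps=200000):
    """Numerical value of int_0^inf z^(n+1) K_0(z) dz.

    Swapping the order of integration in K_0(z) = int_0^inf exp(-z cosh u) du
    leaves Gamma(n+2) * int_0^inf sech(u)^(n+2) du, evaluated by the
    trapezoidal rule (the integrand decays like e^(-(n+2)u))."""
    h = hi / steps
    total = 0.0
    for k in range(steps + 1):
        w = 0.5 if k in (0, steps) else 1.0
        total += w / math.cosh(k * h) ** (n + 2)
    return math.gamma(n + 2) * total * h

def bessel_moment_exact(n):
    """Closed form from the tables [70]: 2^n * Gamma(1 + n/2)^2."""
    return 2 ** n * math.gamma(1 + n / 2) ** 2
```

For *n* = 1 both sides equal π/2, and the agreement persists for the first few moments.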

#### 6.3.3. Three Dimensions

In the three dimensional case we have

$$\mathcal{L}\{\langle|\mathbf{r}(t)|^{n}\rangle\} = \int\_{\mathbb{R}^{3}} |\mathbf{r}|^{n}\, \hat{p}(\mathbf{r}, s)\, d^{3}\mathbf{r} = \int\_{0}^{\infty} \int\_{0}^{\pi} \int\_{0}^{2\pi} r^{n}\, \hat{p}(r, s)\, r^{2} \sin\theta\, dr\, d\theta\, d\varphi,$$

that is,

$$\mathcal{L}\{\langle r^n(t)\rangle\} = 4\pi \int\_0^\infty r^{n+2}\, \hat{p}(r,s)\, dr.$$

Substituting for Equation (91) and elementary integration yields

$$\mathcal{L}\{\langle r^n(t)\rangle\} = \frac{(n+1)!\, v^n}{s\,(s^{2\alpha} + s^{\alpha}/\tau)^{n/2}}, \qquad (n = 1, 2, \dots), \tag{114}$$

which has the same structure as the one and two dimensional cases (cf. Equations (107) and (112)) and, as a consequence, the asymptotic expressions for moments will also be given by Equation (113).

#### **7. Concluding Remarks**

We have reviewed the main aspects of telegraphic transport processes which account for "diffusion with finite velocity" [8] and whose master equation is the telegrapher's equation instead of the diffusion equation. The main part of this report is a comprehensive account of our previous works [30–32] on the derivation, out of random walk models, of the telegrapher's equation (ordinary as well as fractional) in one, two and three dimensions.

We have mostly focussed on two and three dimensions because, on the one hand, early attempts to derive higher dimensional TE's from random walk models had been fruitless and, on the other hand, higher dimensional models are usually more relevant for transport problems than any one-dimensional model. We thus present models that are two and three dimensional generalizations of the persistent random walk on the line. The models are based on multistate random walks with a continuous number of states representing the different directions the particle can take. We set the general integral equations for the probability density function of the particle evolution on the plane or in space. When at every point all possible directions are independent and do not depend on orientation and position (isotropy and homogeneity), the general equations can be exactly solved in Fourier-Laplace space. The isotropic and homogeneous models are suitable for addressing the transport of particles experiencing elastic collisions with randomly distributed fixed centers, such as photons moving in turbid media [2].

These continuous models constitute a microscopic description for transport in which we statistically count the (elastic) collisions of the particles. If we zoom out this microscopic description by implementing the fluid-limit approximation of large times and distances—and, thus, going to a mesoscopic description of the process—we end up with the telegrapher's equation as the master equation of the transport processes.

We have also generalized the telegrapher's equation to account for anomalous transport in several dimensions. To this end the isotropic and homogeneous random walk has been extended to allow for fractional behavior both in time and space variables. The dual character of the ordinary TE between wave and diffusion behaviors is also manifest in the space-time fractional TE where at small times this equation reduces to the fractional diffusion-wave equation while at long times it does to the anomalous diffusion equation.

The two different dynamics governing the fractional transport (fractional wave behavior ruling transport at small times, fractional diffusion behavior dictating it at large times) are even more apparent for the time-fractional transport, when only the time variable is fractional. In this case all moments of the distance to the initial position (the effective distance travelled by the particle) exist and have analytical expressions in terms of their Laplace transforms. For small and large times these moments are approximated by

$$
\langle r^n(t) \rangle \sim t^{n\alpha} \quad (t \to 0), \qquad \langle r^n(t) \rangle \sim t^{n\alpha/2} \quad (t \to \infty),
$$

(*n* = 1, 2, ... ). When 0 < *α* < 1/2 there is a transition between two different subdiffusive regimes, while if 1/2 < *α* < 1 the transition is from superdiffusion at small times to subdiffusion at large times. This fact generalizes the passage from ballistic motion to normal diffusion shown by the ordinary telegrapher's equation.

The exact solution for the characteristic function *p*˜(*ω*,*t*) of the fractional transport has also been obtained regardless of the dimensionality of the process, and approximate expressions for the wave and diffusion regimes are attained as well. This variety of expressions has been explored by Mainardi and collaborators [52,53,67,72] in solutions of fractional diffusion and fractional wave-diffusion equations (see also Mainardi's recent and useful survey in this special issue [73]). Additionally, Orsingher and collaborators [74–78] have proposed several kinds of solutions to the fractional TE and explored their properties.

For the time-fractional transport we have been able to go one step further and obtain the exact form of the Laplace transform *p*ˆ(*r*,*s*) in one, two and three dimensions. From these expressions it is possible to get analytical forms of the PDF *p*(*r*, *t*) valid for sufficiently long times which, in one and three dimensions, are written in terms of Wright functions (cf. Equations (92) and (103)), while in two dimensions the PDF involves a logarithmic function (cf. Equation (99)). From these expressions we have obtained, as *t* → ∞, the asymptotic power laws

$$p(x,t) \sim t^{-\alpha/2} \qquad \text{(one dimension)}; \qquad p(r,t) \sim t^{-\alpha} \qquad \text{(three dimensions)};$$

and the logarithmic power law

$$p(r, t) \sim t^{-\alpha} \ln t \qquad \text{(two dimensions)}.$$

Let us finish by recalling that a substantial part of this paper is a review of previous works but a significant part is, to my knowledge, new. This is the case of the higher dimensional extension of the characteristic function for the space-time fractional TE (cf. Section 5.4), as well as the whole Section 6 on higher dimensional time-fractional telegraphic processes.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** J.M. acknowledges partial financial support from MINECO under Contract No. FIS2016-78904-C3-2-P and from AGAUR under Contract No. 2017SGR1064.

**Conflicts of Interest:** The author declares no conflict of interest.


## *Review* **Why the Mittag-Leffler Function Can Be Considered the Queen Function of the Fractional Calculus?**

#### **Francesco Mainardi**

Dipartimento di Fisica e Astronomia, Università di Bologna, Via Irnerio 46, I-40126 Bologna, Italy; francesco.mainardi@bo.infn.it Received: 14 November 2020; Accepted: 26 November 2020; Published: 30 November 2020

**Abstract:** In this survey we stress the importance of the higher transcendental Mittag-Leffler function in the framework of the Fractional Calculus. We first start with the analytical properties of the classical Mittag-Leffler function as derived from its being the solution of the simplest fractional differential equation governing relaxation processes. Through the sections of the text we plan to guide the reader along this pathway towards the main applications of the Mittag-Leffler function, which have induced us in the past to define it as the *Queen Function of the Fractional Calculus*. These applications concern some noteworthy stochastic processes and the time fractional diffusion-wave equation. We expect that in the future this function will gain more credit in the science of complex systems. Finally, in an appendix we sketch some historical aspects related to the author's acquaintance with this function.

**Keywords:** fractional calculus; Mittag-Leffler functions; Wright functions; fractional relaxation; diffusion-wave equation; Laplace and Fourier transform; fractional Poisson process; complex systems

#### **1. Introduction**

For a few decades, the special transcendental function known as the Mittag-Leffler function has attracted increasing attention from researchers because of its key role in treating problems related to integral and differential equations of fractional order.

This function was introduced in 1903–1905 by the Swedish mathematician Mittag-Leffler and, from the beginning of the last century up to the 1990s, it was seldom considered by mathematicians and applied scientists.

Before the 1990s, from a mathematical point of view, we recall the 1930 paper by Hille and Tamarkin [1] on the solutions of the Abel integral equation of the second kind, and the books by Davis [2], Sansone & Gerretsen [3], Dzherbashyan [4] (unfortunately in Russian), and finally Samko et al. [5]. Particular mention is due to the 1955 Handbook of High Transcendental Functions of the Bateman project [6], where this function was treated in Volume 3, in the chapter devoted to miscellaneous functions. For former applications we recall an interesting note by Davis [2] reporting previous research by Dr. Kenneth S. Cole in connection with nerve conduction, and the papers by Cole & Cole [7], Gross [8] and Caputo & Mainardi [9,10], where the Mittag-Leffler function was adopted to represent the responses in dielectric and viscoelastic media. More information is found in the Appendix of this survey.

In the 1960s the Mittag-Leffler function started to emerge from the realm of miscellaneous functions because it was considered a special case of the general class of Fox *H* functions, which can exhibit an arbitrary number of parameters in their integral Mellin-Barnes representation. However, in our opinion, this classification within too general a framework has, to some extent, obscured the relevance and the applicability of this function in applied sciences. In fact, most mathematical models are based on a small number of parameters, say 1, 2 or 3, so that a general theory may be confusing, whereas the adoption of a generalized Mittag-Leffler function with 2 or 3 indices may be sufficient.

Nowadays it is well recognized that the Mittag-Leffler function plays a fundamental role in Fractional Calculus, even with a single parameter (as originally introduced by Mittag-Leffler), so as to be worthy of being referred to as the *Queen Function of Fractional Calculus*, see Mainardi & Gorenflo [11]. We find some information on the Mittag-Leffler functions in any treatise on Fractional Calculus, but for more details we refer the reader to the surveys by Haubold, Mathai and Saxena [12] and by Van Mieghem [13] and to the treatise by Gorenflo et al. [14], devoted entirely to Mittag-Leffler functions, related topics and applications.

The plan of this survey is the following. We start by giving in Section 2 the main definitions and properties of the Mittag-Leffler function in one parameter, with related Laplace transforms. Then in Section 3 we describe its use in the simplest fractional relaxation equation, pointing out its complete monotonicity. The asymptotic properties are briefly discussed in Section 4. In Section 5 we briefly discuss the so-called generalized Mittag-Leffler function, that is, the 2-parameter Mittag-Leffler function. Of course, further generalizations to 3 and more parameters are referred to specialized papers and books. Then in the following sections we discuss the application of the Mittag-Leffler function in some noteworthy stochastic processes. We start in Section 6 with the fractional Poisson process, and continue in Section 7 with its application to the thinning of renewal processes. The main applications are dealt with in Section 8, where we discuss continuous time random walks (CTRW), and then in Section 9 we point out the asymptotic universality. In Section 10 we discuss time fractional diffusion-wave processes, pointing out the role of the Mittag-Leffler functions in two parameters and their connection with the basic Wright functions. In Appendix A we find it worthwhile to report the acquaintance of the author with the Mittag-Leffler functions, which started in the late 1960s and has continued up to the present day.

We recall that Sections 3–10 are taken from several papers by the author, published alone and with colleagues and former students. Furthermore, we have not considered other applications of the Mittag-Leffler functions including, for example, anomalous diffusion theory in terms of fractional and generalized Langevin equations. In this respect we refer the reader to the articles of the author, see References [15,16], and to the recent book by Sandev and Tomovski [17] and references therein. For many items related to the Mittag-Leffler functions we refer again to the treatise by Gorenflo et al. [14].

#### **2. The Mittag-Leffler Functions: Definitions and Laplace Transforms**

The Mittag-Leffler function is defined by the following power series, convergent in the whole complex plane,

$$E\_{\alpha}(z) := \sum\_{n=0}^{\infty} \frac{z^n}{\Gamma(\alpha n+1)}, \quad \alpha > 0, \quad z \in \mathbb{C}\,. \tag{1}$$

We recognize that it is an entire function of order 1/*α* providing a simple generalization of the exponential function exp(*z*) to which it reduces for *α* = 1. For detailed information on the Mittag-Leffler-type functions and their Laplace transforms the reader may consult e.g., [6,18,19] and the recent treatise by Gorenflo et al. [14].
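A direct summation of the series (1) is often all that is needed for moderate |*z*|; the following sketch (an editorial check, with an arbitrary truncation length) verifies the reduction to exp(*z*) for *α* = 1, as well as *E*<sub>2</sub>(*z*) = cosh(<sup>√</sup>*z*) and *E*<sub>1/2</sub>(−*x*) = e<sup>*x*²</sup>erfc(*x*), two classical special cases.

```python
import math

def mittag_leffler(alpha, z, terms=80):
    """Truncated power series (1) for E_alpha(z); adequate for moderate |z|
    (for large |z| one should switch to asymptotic expansions instead)."""
    return sum(z ** n / math.gamma(alpha * n + 1) for n in range(terms))
```

The rapid growth of Γ(*αn* + 1) guarantees convergence in the whole complex plane, which is why a fixed truncation suffices here.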

We also note that for the convergence of the power series in (1) the parameter *α* may be complex provided that Re(*α*) > 0. The most interesting properties of the Mittag-Leffler function are associated with its asymptotic expansions as *z* → ∞ in various sectors of the complex plane.

In this paper we mainly consider the Mittag-Leffler function of order *α* ∈ (0, 1) on the negative real semi-axis, where it is known to be completely monotone (CM) due to a classical result by Pollard [20]; see also Feller [21].

Let us recall that a function *<sup>φ</sup>*(*t*) with *<sup>t</sup>* <sup>∈</sup> IR<sup>+</sup> is called a completely monotone (CM) function if it is non-negative, of class *<sup>C</sup>*∞, and (−1)*nφ*(*n*)(*t*) <sup>≥</sup> 0 for all *<sup>n</sup>* <sup>∈</sup> IN. Then a function *<sup>ψ</sup>*(*t*) with *<sup>t</sup>* <sup>∈</sup> IR<sup>+</sup> is called a Bernstein function if it is non-negative, of class *C*∞, with a CM first derivative. These functions play fundamental roles in linear hereditary mechanics to represent relaxation and creep processes, see, for example, Mainardi [22]. For mathematical details we refer the interested reader to the survey paper by Miller and Samko [23] and to the treatise by Schilling et al. [24].

In particular we are interested in the function

$$e\_{\alpha}(t) := E\_{\alpha}(-t^{\alpha}) = \sum\_{n=0}^{\infty} (-1)^n \frac{t^{\alpha n}}{\Gamma(\alpha n+1)}, \quad t > 0, \quad 0 < \alpha \le 1,\tag{2}$$

whose Laplace transform pair reads

$$
e\_{\alpha}(t) \div \frac{s^{\alpha-1}}{s^{\alpha}+1}, \quad \alpha > 0\,. \tag{3}
$$

Here we have used the notation ÷ to denote the juxtaposition of a function of time *f*(*t*) with its Laplace transform

$$
\widetilde{f}(s) = \int\_0^\infty \mathrm{e}^{-st} f(t)\, dt\,.
$$

The pair (3) can be proved by transforming term by term the power series representation of *eα*(*t*) in the R.H.S of (2). Similarly we can prove the following Laplace transform pair for its time derivative

$$e\_{\alpha}'(t) = \frac{d}{dt} E\_{\alpha}(-t^{\alpha}) \; \div \; -\frac{1}{s^{\alpha} + 1}\,, \quad \alpha > 0\,. \tag{4}$$

For this Laplace transform pair we can simply apply the usual rule for the Laplace transform for the first derivative of a function, that reads

$$\frac{d}{dt}f(t) \; \div \; s\,\widetilde{f}(s) - f(0^+)\,.$$
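The pair (3) can also be confirmed numerically (an editorial sketch; grid parameters arbitrary): for *α* = 1/2 the closed form *e*<sub>1/2</sub>(*t*) = *E*<sub>1/2</sub>(−<sup>√</sup>*t*) = e<sup>*t*</sup> erfc(<sup>√</sup>*t*) is available, so the forward transform can be computed by quadrature and compared with *s*<sup>*α*−1</sup>/(*s*<sup>*α*</sup> + 1).

```python
import math

def laplace(f, s, T=60.0, steps=300000):
    """Trapezoidal approximation of int_0^T e^{-st} f(t) dt (tail beyond T negligible here)."""
    h = T / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(-s * t) * f(t)
    return total * h

# closed form e_{1/2}(t) = E_{1/2}(-sqrt(t)) = e^t erfc(sqrt(t))
e_half = lambda t: math.exp(t) * math.erfc(math.sqrt(t))

s = 2.0
lhs = laplace(e_half, s)
rhs = s ** (0.5 - 1) / (s ** 0.5 + 1)  # pair (3) with alpha = 1/2
```

The agreement of `lhs` and `rhs` is a useful guard against sign or exponent slips when manipulating these pairs.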

#### **3. The Mittag-Leffler Function in Fractional Relaxation Processes**

For readers' convenience let us briefly outline the topic concerning the generalization via fractional calculus of the first-order differential equation governing the phenomenon of (exponential) relaxation. Recalling (in non-dimensional units) the initial value problem

$$\frac{du}{dt} = -u(t), \quad t \ge 0, \quad \text{with} \quad u(0^+) = 1 \tag{5}$$

whose solution is

$$u(t) = \exp(-t) \, , \tag{6}$$

the following two alternatives with *α* ∈ (0, 1) are offered in the literature:

$$(a) \quad \frac{du}{dt} = -D\_t^{1-\alpha}\, u(t), \quad t \ge 0, \quad \text{with} \quad u(0^+) = 1,\tag{7}$$

$$(b) \quad {}\_\*D\_t^{\alpha}\, u(t) = -u(t)\,, \quad t \ge 0\,, \quad u(0^+) = 1\,, \tag{8}$$

where *<sup>D</sup>*1−*<sup>α</sup> <sup>t</sup>* and <sup>∗</sup>*D<sup>α</sup> <sup>t</sup>* denote the fractional derivative of order 1 − *α* in the Riemann-Liouville sense and the fractional derivative of order *α* in the Caputo sense, respectively.

For a generic order *<sup>μ</sup>* <sup>∈</sup> (0, 1) and for a sufficiently well-behaved function *<sup>f</sup>*(*t*) with *<sup>t</sup>* <sup>∈</sup> IR<sup>+</sup> the above derivatives are defined as follows, see for example, Gorenflo and Mainardi [18], Podlubny [19],

$$(a) \quad D\_t^{\mu} f(t) = \frac{1}{\Gamma(1-\mu)} \frac{d}{dt} \left[ \int\_0^t \frac{f(\tau)}{(t-\tau)^{\mu}}\, d\tau \right],\tag{9}$$

$$(b) \quad {}\_\*D\_t^\mu f(t) = \frac{1}{\Gamma(1-\mu)} \int\_0^t \frac{f'(\tau)}{(t-\tau)^\mu}\, d\tau\,. \tag{10}$$

Between the two derivatives we have the relationship

$${}\_\*D\_t^\mu f(t) = D\_t^\mu f(t) - f(0^+) \frac{t^{-\mu}}{\Gamma(1-\mu)} = D\_t^\mu \left[ f(t) - f(0^+) \right].\tag{11}$$

Both derivatives in the limit *<sup>μ</sup>* <sup>→</sup> <sup>1</sup><sup>−</sup> reduce to the standard first derivative but for *<sup>μ</sup>* <sup>→</sup> <sup>0</sup><sup>+</sup> we have

$$D\_t^\mu f(t) \to f(t), \quad {}\_\*D\_t^\mu f(t) \to f(t) - f(0^+), \quad \mu \to 0^+. \tag{12}$$

In analogy to the standard problem (5), we solve the problems (7) and (8) with the Laplace transform technique, using the rules pertinent to the corresponding fractional derivatives, that we recall hereafter for a generic order *μ* ∈ (0, 1),

$$(a) \quad D\_t^{\mu} f(t) \; \div \; s^{\mu} \widetilde{f}(s) - g(0^{+}), \quad g(0^{+}) = \frac{1}{\Gamma(1-\mu)} \lim\_{t \to 0^{+}} \int\_{0}^{t} (t-\tau)^{-\mu} f(\tau)\, d\tau, \tag{13}$$

$$(b) \quad {}\_\*D\_t^{\mu} f(t) \; \div \; s^{\mu} \widetilde{f}(s) - f(0^+)\,. \tag{14}$$

We note that it is generally more cumbersome to use the Laplace transform pair for the Riemann-Liouville derivative (13) than for the Caputo derivative (14). Indeed, the rule (13) requires the initial value of the fractional integral of *f*(*t*), whereas the rule (14) simply requires the initial value of *f*(*t*). For this reason the Caputo derivative is mostly used in physical problems, where finite initial values are given.

Then we recognize that the problems (a) and (b) are equivalent since the Laplace transform of the solution in both cases comes out as

$$
\widetilde{u}(s) = \frac{s^{\alpha - 1}}{s^{\alpha} + 1}\,, \tag{15}
$$

which yields, by virtue of the Laplace transform pair (3),

$$u(t) = e\_{\alpha}(t) := E\_{\alpha}(-t^{\alpha})\,. \tag{16}$$

We thus recognize that the Mittag-Leffler function provides the solution to the fractional relaxation equation, as outlined, for example, by Gorenflo and Mainardi [18], Mainardi and Gorenflo [11], and Mainardi [22].

Furthermore, by anti-transforming the R.H.S. of (3) by using the complex Bromwich formula, and taking into account, for 0 < *α* < 1, the contribution from the branch cut on the negative real semi-axis (the denominator *s<sup>α</sup>* + 1 does not vanish in the cut plane −*π* ≤ arg *s* ≤ *π*), we get, see the survey by Gorenflo and Mainardi [18],

$$e\_{\alpha}(t) = \int\_{0}^{\infty} e^{-rt} K\_{\alpha}(r) \, dr \,, \tag{17}$$

where

$$K\_{\alpha}(r) = \mp \frac{1}{\pi} \operatorname{Im} \left\{ \frac{s^{\alpha-1}}{s^{\alpha} + 1} \Big|\_{s=r \, e^{\pm i\pi}} \right\} = \frac{1}{\pi} \frac{r^{\alpha-1} \sin(\alpha\pi)}{r^{2\alpha} + 2 \, r^{\alpha} \cos(\alpha\pi) + 1} \ge 0. \tag{18}$$

We note that this formula was obtained as a simple exercise in complex analysis, without awareness of the Titchmarsh formula for the inversion of Laplace transforms [25], later revised by Gross and Levi [26] and by Gross [27]. This formula is rarely outlined in books on Laplace transforms, so we refer the reader, for example, to Apelblat's book [28]. Since *Kα*(*r*) is non-negative for all *r* in the integral, the above formula proves that *eα*(*t*) is a CM function in view of the Bernstein theorem, which provides a necessary and sufficient condition for a function to be CM, namely that it be the real Laplace transform of a non-negative measure.
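As a quick numerical check of the spectral representation (17) and (18), one may compare the integral with the defining power series. The following Python sketch is our own illustration, not part of the original analysis; the truncation and quadrature parameters are arbitrary choices, and only the standard library is used.

```python
import math

def ml(alpha, z, nmax=120):
    """Truncated Mittag-Leffler power series: E_alpha(z) = sum z^n / Gamma(alpha*n + 1)."""
    return sum(z**n / math.gamma(alpha*n + 1.0) for n in range(nmax))

def K(alpha, r):
    """Spectral function of Equation (18)."""
    return (r**(alpha - 1.0) * math.sin(alpha*math.pi) / math.pi
            / (r**(2.0*alpha) + 2.0*r**alpha*math.cos(alpha*math.pi) + 1.0))

def e_spectral(alpha, t, n=200000):
    """Midpoint-rule evaluation of Equation (17); the substitution r = u**2
    removes the integrable singularity of K_alpha at r = 0."""
    umax = math.sqrt(50.0/t)      # beyond r = 50/t the factor exp(-r*t) is negligible
    h = umax/n
    total = 0.0
    for i in range(n):
        u = (i + 0.5)*h
        total += math.exp(-u*u*t) * K(alpha, u*u) * 2.0*u*h
    return total

alpha, t = 0.5, 1.0
series = ml(alpha, -t**alpha)     # e_alpha(t) from the power series
spectral = e_spectral(alpha, t)   # e_alpha(t) from the spectral integral (17)
```

For *α* = 1/2 both evaluations agree with the known closed form *E*<sub>1/2</sub>(−1) = e · erfc(1) ≈ 0.4276.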

However, the CM property of *eα*(*t*) can also be seen as a consequence of the result by Pollard [20], because the transformation *x* = *t<sup>α</sup>* is a Bernstein function for *α* ∈ (0, 1). In fact, it is known that a CM function can be obtained by composing a CM function with a Bernstein function, based on the following theorem: *Let φ*(*t*) *be a CM function and let ψ*(*t*) *be a Bernstein function; then φ*[*ψ*(*t*)] *is a CM function.*

As a matter of fact, *Kα*(*r*) provides an interesting spectral representation of *eα*(*t*) in frequencies. With the change of variable *τ* = 1/*r* we get the corresponding spectral representation in relaxation times, namely

$$e\_{\alpha}(t) = \int\_{0}^{\infty} e^{-t/\tau} H\_{\alpha}(\tau) \, d\tau \,, \quad H\_{\alpha}(\tau) = \tau^{-2} K\_{\alpha}(1/\tau) \,, \tag{19}$$

that can be interpreted as a continuous distribution of elementary (i.e., exponential) relaxation processes. As a consequence we get the identity between the two spectral distributions, that is

$$K\_{\alpha}(r) = H\_{\alpha}(\tau) = \frac{1}{\pi} \frac{\tau^{\alpha-1} \sin\left(\alpha\pi\right)}{\tau^{2\alpha} + 2\tau^{\alpha} \cos\left(\alpha\pi\right) + 1},\tag{20}$$

a surprising fact pointed out for linear viscoelasticity by the author in his book [22]. This kind of universal/scaling property seems peculiar to our Mittag-Leffler function *eα*(*t*).

In Figure 1, we show *Kα*(*r*) for some values of the parameter *α*. Of course for *α* = 1 the Mittag-Leffler function reduces to the exponential function exp(−*t*) and the corresponding spectral distribution is the Dirac delta generalized function centred at *r* = 1, namely *δ*(*r* − 1).

**Figure 1.** The spectral function *Kα*(*r*) for *α* = 0.25, 0.50, 0.75, 0.90 in the frequency range 0 ≤ *r* ≤ 2.

In Figure 2, we show some plots of *eα*(*t*) for some values of the parameter *α*. It is worthwhile to note the different rates of decay of *eα*(*t*) for small and large times. In fact the decay is very fast as *t* → 0<sup>+</sup> and very slow as *t* → +∞.

**Figure 2.** The Mittag-Leffler function *eα*(*t*) for *α* = 0.25, 0.50, 0.75, 0.90, 1 in the time range 0 ≤ *t* ≤ 15.

The Mittag-Leffler function turns out to be the basic function in relaxation processes of physical interest occurring in viscoelastic and dielectric materials. For viscoelasticity we refer the reader to the contributions of the author, including References [22,29,30], whereas for dielectric materials we refer to the survey by Garrappa et al. [31]. For the pioneers who pointed out the role of the Mittag-Leffler function in mechanical and dielectric relaxation processes we refer to the recent survey by Mainardi and Consiglio [32].

#### **4. Asymptotic Approximations to the Mittag-Leffler Function**

We now report the two common asymptotic approximations of our Mittag-Leffler function. Indeed, it is common to point out that the function *eα*(*t*) matches for *t* → 0<sup>+</sup> a stretched exponential with an infinite negative derivative, whereas as *t* → ∞ it matches a negative power law. The short-time approximation is derived from the convergent power series representation (2). In fact,

$$e\_{\alpha}(t) = 1 - \frac{t^{\alpha}}{\Gamma(1+\alpha)} + \dots \sim \exp\left[-\frac{t^{\alpha}}{\Gamma(1+\alpha)}\right], \quad t \to 0 \,. \tag{21}$$

The long time approximation is derived from the asymptotic power series representation of *eα*(*t*) that turns out to be, see [6]

$$
e\_{\alpha}(t) \sim \sum\_{n=1}^{\infty} (-1)^{n-1} \frac{t^{-\alpha n}}{\Gamma(1-\alpha n)}, \quad t \to \infty,\tag{22}
$$

so that, at the first order,

$$
e\_{\alpha}(t) \sim \frac{t^{-\alpha}}{\Gamma(1-\alpha)}, \quad t \to \infty. \tag{23}
$$

As a consequence the function *eα*(*t*) interpolates for intermediate time *t* between the stretched exponential and the negative power law. The stretched exponential models the very fast decay for small time *t*, whereas the asymptotic power law is due to the very slow decay for large time *t*. In fact, we have the two commonly stated asymptotic representations:

$$e\_{\alpha}(t) \sim \begin{cases} e\_{\alpha}^0(t) := \exp\left[-\frac{t^{\alpha}}{\Gamma(1+\alpha)}\right], & t \to 0; \\\\ e\_{\alpha}^{\infty}(t) := \frac{t^{-\alpha}}{\Gamma(1-\alpha)} = \frac{\sin(\alpha\pi)}{\pi} \frac{\Gamma(\alpha)}{t^{\alpha}}, & t \to \infty. \end{cases} \tag{24}$$

The stretched exponential replaces the rapidly decreasing expression 1 − *t <sup>α</sup>*/Γ(1 + *α*) from (21). Of course, *for sufficiently small and for sufficiently large values of t* we have the inequality

$$e\_{\alpha}^0(t) \le e\_{\alpha}^{\infty}(t), \quad 0 < \alpha < 1. \tag{25}$$
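The interpolation property described by (24) can be checked numerically. The sketch below is our own illustration; it evaluates *eα*(*t*) through the spectral representation (17) and (18), since the power series is impractical in floating point for large *t*, and the test points 0.01 and 1000 are arbitrary choices.

```python
import math

def K(alpha, r):
    # spectral function, Equation (18)
    return (r**(alpha - 1.0)*math.sin(alpha*math.pi)/math.pi
            / (r**(2.0*alpha) + 2.0*r**alpha*math.cos(alpha*math.pi) + 1.0))

def e_alpha(alpha, t, n=200000):
    # e_alpha(t) via the spectral integral (17); r = u**2 tames the r = 0 singularity
    umax = math.sqrt(50.0/t)
    h = umax/n
    total = 0.0
    for i in range(n):
        u = (i + 0.5)*h
        total += math.exp(-u*u*t)*K(alpha, u*u)*2.0*u*h
    return total

def e0(alpha, t):
    # stretched-exponential approximation of (24), valid for t -> 0
    return math.exp(-t**alpha/math.gamma(1.0 + alpha))

def einf(alpha, t):
    # power-law approximation of (24), valid for t -> infinity
    return t**(-alpha)/math.gamma(1.0 - alpha)

alpha = 0.5
err_small = abs(e0(alpha, 0.01)/e_alpha(alpha, 0.01) - 1.0)
err_large = abs(einf(alpha, 1000.0)/e_alpha(alpha, 1000.0) - 1.0)
```

For *α* = 1/2 both relative errors come out well below 1%, consistent with the behaviour shown in Figures 3 and 4.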

In Figures 3 and 4, we compare, for *α* = 0.25, 0.50, 0.75, 0.90 in logarithmic scales, the function *eα*(*t*) (continuous line) with its asymptotic representations: the stretched exponential *e*<sup>0</sup><sub>*α*</sub>(*t*) valid for *t* → 0 (dashed line) and the power law *e*<sup>∞</sup><sub>*α*</sub>(*t*) valid for *t* → ∞ (dotted line). We have chosen the time range 10<sup>−5</sup> ≤ *t* ≤ 10<sup>+5</sup>.

We note from Figures 3 and 4 that, whereas the plots of *e*<sup>0</sup><sub>*α*</sub>(*t*) always remain under the corresponding ones of *eα*(*t*), the plots of *e*<sup>∞</sup><sub>*α*</sub>(*t*) start above those of *eα*(*t*) but, at a certain point, an intersection may occur, so changing the sign of the relative errors. The interested reader may consult the plots of the relative errors in the 2014 paper by the author [33], from which, in particular, Figures 1–4 have been extracted.

**Figure 3.** Approximations *e*<sup>0</sup><sub>*α*</sub>(*t*) (dashed line) and *e*<sup>∞</sup><sub>*α*</sub>(*t*) (dotted line) to *eα*(*t*) in 10<sup>−5</sup> ≤ *t* ≤ 10<sup>+5</sup> for *α* = 0.25 (LEFT) and for *α* = 0.50 (RIGHT).

**Figure 4.** Approximations *e*<sup>0</sup><sub>*α*</sub>(*t*) (dashed line) and *e*<sup>∞</sup><sub>*α*</sub>(*t*) (dotted line) to *eα*(*t*) in 10<sup>−5</sup> ≤ *t* ≤ 10<sup>+5</sup> for *α* = 0.75 (LEFT) and for *α* = 0.90 (RIGHT).

#### **5. The Generalized Mittag-Leffler Function**

In this survey we will devote our attention mainly to the classical Mittag-Leffler function in one parameter *α* as introduced by Mittag-Leffler in 1903 and defined by the power series in (1). We have just learned from the instructive E-print by Van Mieghem [13] that the series (1) was discussed by Hadamard in 1893, that is 10 years earlier than Mittag-Leffler himself.

As a matter of fact, a straightforward generalization of the classical Mittag-Leffler function is obtained by replacing the additive constant 1 in the argument of the Gamma function in (1) by an arbitrary complex parameter *β*. It was formerly considered in 1905 in Reference [34] and soon later by Mittag-Leffler himself, almost incidentally, in one of his notes. Later, in the 1950s, such a generalization was investigated by Humbert and Agarwal with respect to the Laplace transformation, see References [35–37]. Usually, when dealing with Laplace transform pairs, the parameter *β* is required to be real and positive like *α*.

For this function we agree to use the notation

$$E\_{\alpha,\beta}(z) := \sum\_{n=0}^{\infty} \frac{z^n}{\Gamma(\alpha n+\beta)}, \quad \Re(\alpha) > 0, \ \beta \in \mathbb{C}, \quad z \in \mathbb{C} \,. \tag{26}$$

Of course *Eα*,1(*z*) ≡ *Eα*(*z*). The series is still convergent in the whole complex plane C, so the function (26) is still entire for ℜ(*α*) > 0 and any *β* ∈ C, with order 1/ℜ(*α*); hence the additional parameter does not play any role in this respect. However, the Laplace transform pairs concerning the Mittag-Leffler function (26) and its derivatives are known for *α*, *β* > 0 and ℜ(*s*) > |*λ*|<sup>1/*α*</sup>, see, for example, Refs. [14,19,22],

$$t^{\beta -1}E\_{\alpha,\beta}\left(-\lambda \, t^{\alpha}\right) \doteq \frac{s^{\alpha-\beta}}{s^{\alpha} + \lambda} = \frac{s^{-\beta}}{1 + \lambda s^{-\alpha}}\,, \tag{27}$$

and

$$t^{\alpha k+\beta-1}E\_{\alpha,\beta}^{(k)}(\lambda t^{\alpha}) \doteq \frac{k! \, s^{\alpha-\beta}}{(s^{\alpha}-\lambda)^{k+1}}, \quad k = 0, 1, 2, \ldots \tag{28}$$

We also note the following relation, usually overlooked by several authors but easily proved, connecting the first derivative of the classical Mittag-Leffler function with the two-parameter Mittag-Leffler function:

$$\phi\_{\alpha}(t) := t^{-(1-\alpha)} \, E\_{\alpha,\alpha} \left( -t^{\alpha} \right) = -\frac{d}{dt} \, E\_{\alpha} \left( -t^{\alpha} \right), \quad t \ge 0, \quad 0 < \alpha < 1. \tag{29}$$
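Relation (29) is easy to verify numerically, comparing the left-hand side, computed from the two-parameter series (26), with a central finite difference of *Eα*(−*t<sup>α</sup>*). A minimal sketch of our own follows; the test values *α* = 0.6, *t* = 1.5 and the truncation are arbitrary choices.

```python
import math

def ml2(alpha, beta, z, nmax=120):
    """Two-parameter Mittag-Leffler series, Equation (26), truncated."""
    return sum(z**n / math.gamma(alpha*n + beta) for n in range(nmax))

def phi(alpha, t):
    """Left-hand side of (29): t^(alpha-1) * E_{alpha,alpha}(-t^alpha)."""
    return t**(alpha - 1.0) * ml2(alpha, alpha, -t**alpha)

# right-hand side of (29) via a central finite difference of E_alpha(-t^alpha)
alpha, t, h = 0.6, 1.5, 1e-6
E = lambda t: ml2(alpha, 1.0, -t**alpha)
numeric = -(E(t + h) - E(t - h)) / (2.0*h)
```

For *α* = 1 the relation reduces to the familiar −(d/dt) e<sup>−t</sup> = e<sup>−t</sup>.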

We report the plot of the function *φα*(*t*) herewith in Figure 5.

**Figure 5.** Plots of *φα*(*t*) with *α* = 1/4, 1/2, 3/4, 1 versus *t*; for 0 ≤ *t* ≤ 5.

We note that Mittag-Leffler functions with more than two parameters have also been dealt with by several authors, as pointed out in [14]. In particular, for the 3-parameter Mittag-Leffler function (known as the Prabhakar function) and related operators we refer the reader to the recent survey by Giusti et al. [38] and references therein. Kiryakova has dealt in a number of papers with the multi-index Mittag-Leffler functions; see, for example, [39].

#### **6. The Fractional Poisson Process and the Mittag-Leffler Function**

Hereafter we describe how the Mittag-Leffler function enters into the so-called fractional Poisson process. We are following the original approach by Mainardi et al. in [40] where the fractional Poisson process is referred to as the renewal process of the Mittag-Leffler type. However, an independent approach to the fractional Poisson process was given for example, by Laskin in [41].

#### *6.1. Essentials of Renewal Theory*

The concept of *renewal process* has been developed as a stochastic model for describing the class of counting processes for which the times between successive events are independent identically distributed (*iid*) non-negative random variables, obeying a given probability law. These times are referred to as waiting times or inter-arrival times. For more details see, for example, the classical treatises by Cox [42] and Feller [21].

For a renewal process having waiting times *T*1, *T*2, . . . , let

$$t\_0 = 0 \,, \quad t\_k = \sum\_{j=1}^k T\_j \,, \quad k \ge 1 \,. \tag{30}$$

That is *t*<sup>1</sup> = *T*<sup>1</sup> is the time of the first renewal, *t*<sup>2</sup> = *T*<sup>1</sup> + *T*<sup>2</sup> is the time of the second renewal and so on. In general *tk* denotes the *k*th renewal.

The process is specified if we know the probability law for the waiting times. In this respect we introduce the *probability density function* (*pdf*) *φ*(*t*) and the (cumulative) distribution function Φ(*t*) so defined:

$$\phi(t) := \frac{d}{dt}\Phi(t) \,, \quad \Phi(t) := P\left(T \le t\right) = \int\_0^t \phi(t') \, dt'. \tag{31}$$

When the non-negative random variable represents the lifetime of technical systems, it is common to refer to Φ(*t*) as to the *failure probability* and to

$$\Psi(t) := P\left(T > t\right) = \int\_{t}^{\infty} \phi(t') \, dt' = 1 - \Phi(t) \,, \tag{32}$$

as to the *survival probability*, because Φ(*t*) and Ψ(*t*) are the respective probabilities that the system does or does not fail in (0, *t*]. A relevant quantity is the *counting function N*(*t*) defined as

$$N(t) := \max\left\{ k|t\_k \le t, \ k = 0, 1, 2, \dots \right\},\tag{33}$$

that represents the effective number of events before or at instant *t*. In particular we have Ψ(*t*) = *P* (*N*(*t*) = 0) . Continuing in the general theory we set *F*1(*t*) = Φ(*t*), *f*1(*t*) = *φ*(*t*), and in general

$$F\_k(t) := P\left(t\_k = T\_1 + \dots + T\_k \le t\right), \ f\_k(t) = \frac{d}{dt} F\_k(t), \ k \ge 1,\tag{34}$$

thus *Fk*(*t*) represents the probability that the sum of the first *k* waiting times is less than or equal to *t*, and *fk*(*t*) is its density. Then, for any fixed *k* ≥ 1, the normalization condition for *Fk*(*t*) is fulfilled because

$$\lim\_{t \to \infty} F\_k(t) = P\left(t\_k = T\_1 + \dots + T\_k < \infty\right) = 1\,\text{.}\tag{35}$$

In fact, the sum of *k* random variables each of which is finite with probability 1 is finite with probability 1 itself. By setting for consistency *F*0(*t*) ≡ 1 and *f*0(*t*) = *δ*(*t*), where for the Dirac delta generalized function in IR<sup>+</sup> we assume the *formal representation*

$$\delta(t) := \frac{t^{-1}}{\Gamma(0)}, \quad t \ge 0,$$

we also note that for *k* ≥ 0 we have

$$P\left(N(t) = k\right) := P\left(t\_k \le t \,, \ t\_{k+1} > t\right) = \int\_0^t f\_k(t') \, \Psi\left(t - t'\right) \, dt'.\tag{36}$$

We now find it convenient to introduce the simplified ∗ notation for the Laplace convolution between two causal well-behaved (generalized) functions *f*(*t*) and *g*(*t*)

$$\int\_0^t f(t') \, g(t - t') \, dt' = \left( f \ast g \right)(t) = \left( g \ast f \right)(t) = \int\_0^t f(t - t') \, g(t') \, dt' \,.$$

Since *fk*(*t*) is the *pdf* of the sum of the *k iid* random variables *T*1, ... , *Tk*, each with *pdf φ*(*t*), we easily recognize that *fk*(*t*) turns out to be the *k*-fold convolution of *φ*(*t*) with itself,

$$f\_k(t) = \left(\phi^{\*k}\right)(t) \,. \tag{37}$$

so Equation (36) simply reads:

$$P\left(N(t) = k\right) = \left(\phi^{\*k} \, \* \, \Psi\right)(t) \,. \tag{38}$$

Because of the presence of Laplace convolutions a renewal process is suited for the Laplace transform method. Throughout this paper we will denote by *f* (*s*) the Laplace transform of a sufficiently well-behaved (generalized) function *f*(*t*) according to

$$\mathcal{L}\left\{f(t);s\right\} = \tilde{f}(s) = \int\_0^{+\infty} e^{-st} f(t) \,dt,\quad s > s\_0,$$

and for *δ*(*t*) consistently we will have *δ̃*(*s*) ≡ 1. Note that for our purposes we agree to take *s* real. We recognize that (38) reads in the Laplace domain

$$\mathcal{L}\{P(N(t)=k);s\} = \left[\widetilde{\boldsymbol{\phi}}(s)\right]^k \,\,\widetilde{\boldsymbol{\Psi}}(s) \,,\tag{39}$$

where, using (32),

$$
\widetilde{\Psi}(s) = \frac{1 - \widetilde{\phi}(s)}{s}.\tag{40}
$$

#### *6.2. The Classical Poisson Process as a Renewal Process*

The most celebrated renewal process is the Poisson process characterized by a waiting time *pdf* of exponential type,

$$\boldsymbol{\phi}(t) = \lambda \,\mathrm{e}^{-\lambda t}, \quad \lambda > 0, \quad t \ge 0. \tag{41}$$

The process has *no memory*. Its moments turn out to be

$$
\langle T \rangle = \frac{1}{\lambda} \,, \quad \langle T^2 \rangle = \frac{2}{\lambda^2} \,, \quad \dots \,, \quad \langle T^n \rangle = \frac{n!}{\lambda^n} \,, \quad \dots \,, \tag{42}
$$

and the *survival probability* is

$$\Psi(t) := P\left(T > t\right) = \mathbf{e}^{-\lambda t}, \quad t \ge 0. \tag{43}$$

We know that the probability that *k* events occur in the interval of length *t* is

$$P\left(N(t) = k\right) = \frac{(\lambda t)^k}{k!} \mathbf{e}^{-\lambda t}, \quad t \ge 0, \quad k = 0, 1, 2, \dots \tag{44}$$

The probability distribution related to the sum of *k iid* exponential random variables is known to be the so-called *Erlang distribution* (of order *k*). The corresponding density (the *Erlang pdf*) is thus

$$f\_k(t) = \lambda \frac{(\lambda t)^{k-1}}{(k-1)!} \mathbf{e}^{-\lambda t}, \quad t \ge 0, \quad k = 1, 2, \dots, \tag{45}$$

so that the Erlang distribution function of order *k* turns out to be

$$F\_k(t) = \int\_0^t f\_k(t') \, dt' = 1 - \sum\_{n=0}^{k-1} \frac{(\lambda t)^n}{n!} \mathbf{e}^{-\lambda t} = \sum\_{n=k}^{\infty} \frac{(\lambda t)^n}{n!} \mathbf{e}^{-\lambda t}, \quad t \ge 0. \tag{46}$$

In the limiting case *k* = 0 we recover *f*0(*t*) = *δ*(*t*), *F*0(*t*) ≡ 1, *t* ≥ 0.
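The equality of the two representations of *Fk*(*t*) in (46), and the normalization of the counting probabilities (44), can be checked numerically. A brief sketch of our own follows; the test values *λ* = 1.3, *k* = 3, *t* = 2 are arbitrary choices.

```python
import math

def f_k(lam, k, t):
    """Erlang pdf of order k, Equation (45)."""
    return lam * (lam*t)**(k - 1) / math.factorial(k - 1) * math.exp(-lam*t)

def F_sum(lam, k, t):
    """Erlang distribution function via the finite sum in (46)."""
    return 1.0 - sum((lam*t)**n / math.factorial(n) * math.exp(-lam*t)
                     for n in range(k))

def F_quad(lam, k, t, n=20000):
    """Same quantity by direct midpoint-rule integration of the pdf."""
    h = t/n
    return sum(f_k(lam, k, (i + 0.5)*h)*h for i in range(n))

lam, k, t = 1.3, 3, 2.0
poisson_total = sum((lam*t)**j / math.factorial(j) * math.exp(-lam*t)
                    for j in range(60))   # normalization of the Poisson law (44)
```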

The results (44)–(46) can easily be obtained by using the technique of the Laplace transform sketched in the previous section, noting that for the Poisson process we have:

$$
\tilde{\phi}(s) = \frac{\lambda}{\lambda + s}, \quad \tilde{\Psi}(s) = \frac{1}{\lambda + s}, \tag{47}
$$

and for the Erlang distribution:

$$\widetilde{f}\_k(s) = [\widetilde{\phi}(s)]^k = \frac{\lambda^k}{(\lambda + s)^k}, \quad \widetilde{F}\_k(s) = \frac{[\widetilde{\phi}(s)]^k}{s} = \frac{\lambda^k}{s(\lambda + s)^k}.\tag{48}$$

We also recall that the survival probability for the Poisson renewal process obeys the ordinary differential equation (of relaxation type)

$$\frac{d}{dt}\Psi(t) = -\lambda\Psi(t), \quad t \ge 0; \quad \Psi(0^+) = 1. \tag{49}$$

#### *6.3. The Renewal Process of Mittag-Leffler Type*

A "fractional" generalization of the Poisson renewal process is simply obtained by generalizing the differential Equation (49), replacing there the first derivative with the integro-differential operator ∗*D*<sup>*β*</sup><sub>*t*</sub>, which is interpreted as the fractional derivative of order *β* in Caputo's sense, see Section 2. We write, taking for simplicity *λ* = 1,

$$\_\*D\_t^\beta \Psi(t) = -\Psi(t) \,, \quad t > 0 \,, \quad 0 < \beta \le 1 \,, \quad \Psi(0^+) = 1 \,. \tag{50}$$

We also allow the limiting case *β* = 1 where all the results of the previous section (with *λ* = 1) are expected to be recovered.

For our purpose we need to recall the Mittag-Leffler function as the natural "fractional" generalization of the exponential function, that characterizes the Poisson process. We again recall that the Mittag-Leffler function of parameter *β* is defined in the complex plane by the power series

$$E\_{\beta}(z) := \sum\_{n=0}^{\infty} \frac{z^n}{\Gamma(\beta n + 1)}, \quad \beta > 0, \quad z \in \mathbb{C} \tag{51}$$

as stated in Section 2 where the parameter was denoted by *α*.

The solution of Equation (50) is known to be, see Section 3

$$\Psi(t) = E\_{\beta}(-t^{\beta}), \quad t \ge 0, \quad 0 < \beta \le 1,\tag{52}$$

so

$$\phi(t) := -\frac{d}{dt}\Psi(t) = -\frac{d}{dt}E\_{\beta}(-t^{\beta}), \quad t \ge 0, \quad 0 < \beta \le 1. \tag{53}$$

Then, the corresponding Laplace transforms read

$$\widetilde{\Psi}(s) = \frac{s^{\beta - 1}}{1 + s^{\beta}}, \quad \tilde{\phi}(s) = \frac{1}{1 + s^{\beta}}, \quad 0 < \beta \le 1. \tag{54}$$

Hereafter, we find it convenient to summarize the most relevant features of the functions Ψ(*t*) and *φ*(*t*) when 0 < *β* < 1. We begin by quoting their series expansions, convergent in all of IR and suitable for *t* → 0<sup>+</sup>, and their asymptotic representations for *t* → ∞,

$$\Psi(t) = \sum\_{n=0}^{\infty} (-1)^n \frac{t^{\beta n}}{\Gamma(\beta n + 1)} \sim \frac{\sin \left(\beta \pi\right)}{\pi} \frac{\Gamma(\beta)}{t^{\beta}} \,, \tag{55}$$

and

$$\phi(t) = \frac{1}{t^{1-\beta}} \sum\_{n=0}^{\infty} (-1)^n \frac{t^{\beta n}}{\Gamma(\beta n + \beta)} \sim \frac{\sin \left(\beta \pi\right)}{\pi} \frac{\Gamma(\beta + 1)}{t^{\beta + 1}} \,. \tag{56}$$

In contrast to the Poissonian case *β* = 1, in the case 0 < *β* < 1 for large *t* the functions Ψ(*t*) and *φ*(*t*) no longer decay exponentially but algebraically. As a consequence of the power-law asymptotics, the process turns out to be no longer Markovian but of long-memory type. However, we recognize that for 0 < *β* < 1 both functions Ψ(*t*), *φ*(*t*) keep the "completely monotonic" character of the Poissonian case, as can be simply derived from Section 2. We recall that *complete monotonicity* of our functions Ψ(*t*) and *φ*(*t*) means

$$(-1)^{n}\frac{d^{n}}{dt^{n}}\Psi(t)\geq 0,\quad(-1)^{n}\frac{d^{n}}{dt^{n}}\phi(t)\geq 0,\quad n=0,1,2,\ldots,\quad t\geq 0,\tag{57}$$

or equivalently, their representability as real Laplace transforms of non-negative generalized functions (or measures).
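The sign pattern (57) can be probed numerically by differentiating the series in (55) term by term. The sketch below is our own illustration; the choice *β* = 0.6, the sample points, and the truncation are arbitrary, and only the first few derivatives are checked.

```python
import math

def psi_deriv(beta, t, m, nmax=150):
    """m-th derivative of Psi(t) = E_beta(-t^beta), obtained by term-wise
    differentiation of the series in (55) (valid for t > 0)."""
    total = 0.0
    for n in range(nmax):
        e = beta*n
        c = 1.0
        for j in range(m):
            c *= (e - j)          # d^m/dt^m t^e = e(e-1)...(e-m+1) t^(e-m)
        if c == 0.0:
            continue
        total += (-1.0)**n * c * t**(e - m) / math.gamma(e + 1.0)
    return total

# sign pattern (57): (-1)^m * d^m/dt^m Psi(t) >= 0
beta = 0.6
cm_ok = all((-1.0)**m * psi_deriv(beta, t, m) >= 0.0
            for m in range(4) for t in (0.5, 1.0, 2.0, 5.0))
```

For *β* = 1 the routine reproduces the exponential, whose complete monotonicity is classical.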

For the generalizations of Equations (44)–(46), characteristic of the Poisson and Erlang distributions respectively, we must point out the Laplace transform pair

$$t^{\beta \cdot k} E\_{\beta}^{(k)}(-t^{\beta}) \doteqneq \frac{k! \, s^{\beta - 1}}{(1 + s^{\beta})^{k + 1}}, \quad \beta > 0, \quad k = 0, 1, 2, \ldots, \tag{58}$$

with *E*<sup>(*k*)</sup><sub>*β*</sub>(*z*) := *d<sup>k</sup>*/*dz<sup>k</sup>* *E<sub>β</sub>*(*z*), which can be deduced from the book by Podlubny, see Equation (1.80) in Reference [19]. Then, by using the Laplace transform pairs (25) and Equations (52), (53) and (58) in Equations (37) and (38), we obtain the *generalized Poisson distribution*,

$$P\left(N(t) = k\right) = \frac{t^{k\beta}}{k!} E\_{\beta}^{(k)}\left(-t^{\beta}\right), \quad k = 0, 1, 2, \dots \tag{59}$$

and the *generalized Erlang pd f* 's (of order *k* ≥ 1),

$$f\_k(t) = \beta \frac{t^{k\beta - 1}}{(k - 1)!} E\_{\beta}^{(k)}(-t^{\beta}) \,. \tag{60}$$

The *generalized Erlang distribution functions* turn out to be

$$F\_k(t) = \int\_0^t f\_k(t') \, dt' = 1 - \sum\_{n=0}^{k-1} \frac{t^{n\beta}}{n!} E\_{\beta}^{(n)}(-t^{\beta}) = \sum\_{n=k}^{\infty} \frac{t^{n\beta}}{n!} E\_{\beta}^{(n)}(-t^{\beta}) \,. \tag{61}$$
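Like its classical counterpart (44), the generalized Poisson distribution (59) must sum to 1 over *k*. The following sketch of our own checks this numerically, and that *β* = 1 recovers the classical distribution; the truncations and test values are arbitrary choices.

```python
import math

def ml_deriv(beta, k, z, nmax=150):
    """k-th derivative of E_beta at z, by term-wise differentiation of (51)."""
    total = 0.0
    for n in range(k, nmax):
        # d^k/dz^k z^n = n!/(n-k)! * z^(n-k); math.perm(n, k) = n!/(n-k)!
        total += math.perm(n, k) * z**(n - k) / math.gamma(beta*n + 1.0)
    return total

def p_frac(beta, t, k):
    """Generalized Poisson distribution, Equation (59)."""
    return t**(k*beta) / math.factorial(k) * ml_deriv(beta, k, -t**beta)

beta, t = 0.7, 1.0
total = sum(p_frac(beta, t, k) for k in range(40))   # should be close to 1
```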

#### **7. The Gnedenko-Kovalenko Theory of Thinning and the Mittag-Leffler Function**

The *thinning* theory for a renewal process has been considered in detail by Gnedenko and Kovalenko [43] in the first edition of their book on queueing theory of 1968. However, the connection with the Laplace transform of the Mittag-Leffler function, outlined at the end of this section in Equations (71) and (72), see also [44] and [45], is surprisingly not present in the second edition of the book by Gnedenko & Kovalenko of 1989.

We must note that other authors, like Szántai [46,47] speak of *rarefaction* in place of thinning.

Let us sketch here the essentials of this theory: in the interest of transparency and readability we avoid the possible decoration of the relevant power law by multiplying it with a *slowly varying function*.

Denoting by *tn*, *n* = 1, 2, 3, ... the time instants of events of a renewal process, assuming 0 = *t*<sup>0</sup> < *t*<sup>1</sup> < *t*<sup>2</sup> < *t*<sup>3</sup> < ... , with *iid* waiting times *T*<sup>1</sup> = *t*<sup>1</sup>, *Tk* = *tk* − *tk*−<sup>1</sup> for *k* ≥ 2 (generically denoted by *T*), *thinning* (or *rarefaction*) means that for each positive index *k* a decision is made: the event happening at the instant *tk* is deleted with probability *p* or it is maintained with probability *q* = 1 − *p*, 0 < *q* < 1. This procedure produces a *thinned* or *rarefied* renewal process with fewer events (very few events if *q* is near zero, the case of particular interest) in a moderate span of time.

To compensate for this loss we change the unit of time so that we still have not very few but still a moderate number of events in a moderate span of time. Such change of the unit of time is equivalent to rescaling the waiting time, multiplying it with a positive factor *τ* so that we have waiting times *τT*1, *τT*2, *τT*3, ... , and instants *τt*1, *τt*2, *τt*3, ... , in the rescaled process. Our intention is, vaguely speaking, to dispose on *τ* in relation to the rarefaction parameter *q* in such a way that for *q* near zero in some sense the "average" number of events per unit of time remains unchanged. In an asymptotic sense we will make these considerations precise.

Denoting by *F*(*t*) = *P*(*T* ≤ *t*) the probability distribution function of the (original) waiting time *T*, by *f*(*t*) its density (*f*(*t*) is a generalized function generating a probability measure), so that *F*(*t*) = ∫<sub>0</sub><sup>*t*</sup> *f*(*t*′) *dt*′, and analogously by *Fk*(*t*) and *fk*(*t*) the distribution and density, respectively, of the sum of *k* waiting times, we have recursively

$$f\_k(t) = \int\_0^t f\_{k-1}(t - t') \, dF(t'), \text{ for } k \ge 2 \,. \tag{62}$$

Observing that after a maintained event the next one of the original process is kept with probability *q* but dropped in favour of the second-next with probability *p q* and, generally, *n* − 1 events are dropped in favour of the *n*-th-next with probability *pn*−<sup>1</sup> *q*, we get for the waiting time density of the thinned process the formula

$$g\_{q}(t) = \sum\_{n=1}^{\infty} q \, p^{n-1} \, f\_n(t) \,. \tag{63}$$

With the modified waiting time *τ T* we have

$$P(\tau T \le t) = P(T \le t/\tau) = F(t/\tau) \,,$$

hence the density *f*(*t*/*τ*)/*τ*, and analogously for the density of the sum of *n* waiting times *fn*(*t*/*τ*)/*τ*. The density of the waiting time of the rescaled (and thinned) process now turns out as

$$g\_{q,\tau}(t) = \sum\_{n=1}^{\infty} q \, p^{n-1} \, f\_n(t/\tau) / \tau \,. \tag{64}$$

In the Laplace domain we have *f̃<sub>n</sub>*(*s*) = (*f̃*(*s*))<sup>*n*</sup>, hence (using *p* = 1 − *q*)

$$\widetilde{g}\_q(s) = \sum\_{n=1}^{\infty} q \, p^{n-1} \left( \widetilde{f}(s) \right)^n = \frac{q \, \widetilde{f}(s)}{1 - (1 - q) \, \widetilde{f}(s)} \, \tag{65}$$

from which by Laplace inversion we can, in principle, construct the waiting time density of the thinned process. By rescaling we get

$$\widetilde{g}\_{q,\tau}(s) = \sum\_{n=1}^{\infty} q \, p^{n-1} \left( \widetilde{f}(\tau s) \right)^{n} = \frac{q \, \widetilde{f}(\tau s)}{1 - (1 - q) \, \widetilde{f}(\tau s)} \,. \tag{66}$$

Being interested in stronger and stronger thinning (*infinite thinning*) let us now consider a scale of processes with the parameters *τ* (of *rescaling*) and *q* (of *thinning*), with *q* tending to zero *under a scaling relation q* = *q*(*τ*) *yet to be specified*.

We have essentially two cases for the waiting time distribution: its expectation value is finite or infinite. In the first case we put

$$
\lambda = \int\_0^\infty t' f(t') \, dt' < \infty \,. \tag{67}
$$

In the second case we assume a queue of power law type (dispensing with a possible decoration by a function slowly varying at infinity)

$$\Psi(t) := \int\_{t}^{\infty} f(t') \, dt' \sim \frac{c}{\beta} t^{-\beta}, \; t \to \infty \quad \text{if} \quad 0 < \beta < 1. \tag{68}$$

Then, by the Karamata theory (see References [21,48]) the above conditions mean in the Laplace domain

$$
\widetilde{f}(s) = 1 - \lambda \, s^{\beta} + o\left(s^{\beta}\right), \quad \text{for} \quad s \to 0^{+}, \tag{69}
$$

with a positive coefficient *λ* and 0 < *β* ≤ 1. The case *β* = 1 obviously corresponds to the situation with finite first moment (67), whereas the case 0 < *β* < 1 is related to a power-law queue with *c* = *λ* Γ(*β* + 1) sin(*βπ*)/*π*.

Now, passing to the limit of *q* → 0 of infinite thinning under the scaling relation

$$
q = \lambda \,\,\tau^{\beta}, \quad 0 < \beta \le 1,\tag{70}
$$

between the positive parameters *q* and *τ*, the Laplace transform of the rescaled density *g̃<sub>q,τ</sub>*(*s*) in (66) of the thinned process tends, for fixed *s*, to

$$
\widetilde{g}(s) = \frac{1}{1 + s^{\beta}} \,,\tag{71}
$$

which corresponds to the Mittag-Leffler density

$$g(t) = -\frac{d}{dt}E\_{\beta}(-t^{\beta}) = \phi^{\text{ML}}(t) \,. \tag{72}$$

Let us remark that Gnedenko and Kovalenko obtained (71) as the Laplace transform of the limiting density but did not identify it as the Laplace transform of a Mittag-Leffler type function. Observe that in the special case *λ* < ∞ we have *β* = 1, hence as the limiting process the Poisson process, as formerly shown in 1956 by Rényi [49].
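The thinning limit (71) and (72) can be illustrated by simulation. The sketch below is our own illustration, not Gnedenko and Kovalenko's construction: it assumes Pareto waiting times with Ψ(*t*) = *t*<sup>−*β*</sup> for *t* ≥ 1, for which (68) and (69) give *λ* = Γ(1 − *β*), and applies the scaling (70) with a small but finite *q*, so the match is only approximate.

```python
import math, random

def thinned_waiting_time(beta, q, rng):
    """One waiting time of the thinned, rescaled renewal process: sum original
    (Pareto) waiting times until an event is kept (probability q), then rescale
    by tau chosen from the scaling relation (70) with lambda = Gamma(1-beta)."""
    tau = (q / math.gamma(1.0 - beta)) ** (1.0 / beta)
    total = 0.0
    while True:
        u = 1.0 - rng.random()                 # u in (0, 1]
        total += u ** (-1.0 / beta)            # Pareto draw: Psi(t) = t^(-beta), t >= 1
        if rng.random() < q:                   # the event is kept with probability q
            return tau * total

rng = random.Random(12345)
beta, q, trials = 0.5, 0.02, 20000
survived = sum(thinned_waiting_time(beta, q, rng) > 1.0
               for _ in range(trials)) / trials
# as q -> 0 this should approach E_beta(-1), which for beta = 1/2 is
# e * erfc(1), approximately 0.4276
```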

#### **8. The Continuous Time Random Walk (CTRW) and the Mittag-Leffler Function**

The name *continuous time random walk* (CTRW) became popular in physics after Montroll and Weiss (just to cite the pioneers) published a celebrated series of papers on random walks for modelling diffusion processes on lattices, see, for example, Reference [50], and the book by Weiss [51] with references therein. CTRWs are rather good and general phenomenological models for diffusion, including processes of anomalous transport, that can be understood in the framework of the classical renewal theory. In fact a CTRW can be considered as a compound renewal process (a simple renewal process with reward) or a random walk *subordinated* to a simple renewal process. Hereafter we will mainly follow the approach by Gorenflo & Mainardi, see, for example, Reference [52].

A spatially one-dimensional CTRW is generated by a sequence of independent identically distributed (*iid*) positive random waiting times *T*1, *T*2, *T*3, ... , each having the same probability density function *φ*(*t*), *t* > 0 , and a sequence of *iid* random jumps *X*1, *X*2, *X*3, ... , in IR , each having the same probability density *w*(*x*), *x* ∈ IR .

Let us remark that, for ease of language, we use the word density also for generalized functions in the sense of Gel'fand & Shilov [53], which can be interpreted as probability measures. Usually the *probability density functions* are abbreviated by *pdf*. We recall that *φ*(*t*) ≥ 0 with ∫<sub>0</sub><sup>∞</sup> *φ*(*t*) *dt* = 1 and *w*(*x*) ≥ 0 with ∫<sub>−∞</sub><sup>+∞</sup> *w*(*x*) *dx* = 1.

Setting *t*<sup>0</sup> = 0 , *tn* = *T*<sup>1</sup> + *T*<sup>2</sup> + ... *Tn* for *n* ∈ IN , the wandering particle makes a jump of length *Xn* at instant *tn*, so that its position is *x*<sup>0</sup> = 0 for 0 ≤ *t* < *T*<sup>1</sup> = *t*<sup>1</sup> , and *xn* = *X*<sup>1</sup> + *X*<sup>2</sup> + ... *Xn* , for *tn* ≤ *t* < *tn*+<sup>1</sup> . We require the distribution of the waiting times and that of the jumps to be independent of each other. So, we have a compound renewal process (a renewal process with reward), compare Reference [42].

By natural probabilistic arguments we arrive at the *integral equation* for the probability density *p*(*x*, *t*) (a density with respect to the variable *x*) of the particle being in point *x* at instant *t* ,

$$p(\mathbf{x},t) = \delta(\mathbf{x})\,\Psi(t) + \int\_0^t \phi(t-t') \left[ \int\_{-\infty}^{+\infty} w(\mathbf{x}-\mathbf{x}') \, p(\mathbf{x}',t') \, d\mathbf{x}' \right] \, dt',\tag{73}$$

in which *δ*(*x*) denotes the Dirac generalized function, and the *survival function*

$$
\Psi(t) = \int\_t^\infty \phi(t') \, dt'\tag{74}
$$

denotes the probability that at instant *t* the particle is still sitting in its starting position *x* = 0 . Clearly, Equation (73) satisfies the initial condition *p*(*x*, 0+) = *δ*(*x*).

Note that the *special choice*

$$
w(x) = \delta(x - 1)\tag{75}
$$

gives the *pure renewal process*, with position *x*(*t*) = *N*(*t*), where *N*(*t*) denotes the *counting function*, the jumps all having length 1 in the positive direction and happening at the renewal instants.

For many purposes the integral Equation (73) of CTRW can be easily treated by using the Laplace and Fourier transforms. Writing these as

$$\mathcal{L}\left\{f(t); s\right\} = \tilde{f}(s) := \int_0^\infty \mathrm{e}^{-st}\, f(t)\, dt, \qquad \mathcal{F}\left\{g(x); \kappa\right\} = \widehat{g}(\kappa) := \int_{-\infty}^{+\infty} \mathrm{e}^{+i\kappa x}\, g(x)\, dx,$$

then in the Laplace-Fourier domain Equation (73) reads

$$
\widehat{\tilde{p}}(\kappa, s) = \frac{1 - \tilde{\varphi}(s)}{s} + \tilde{\varphi}(s)\, \widehat{w}(\kappa)\, \widehat{\tilde{p}}(\kappa, s)\,. \tag{76}
$$

Introducing formally in the Laplace domain the auxiliary function

$$\tilde{H}(s) = \frac{1 - \tilde{\phi}(s)}{s\,\tilde{\phi}(s)} = \frac{\tilde{\Psi}(s)}{\tilde{\phi}(s)}, \quad \text{hence} \quad \tilde{\phi}(s) = \frac{1}{1 + s\tilde{H}(s)},\tag{77}$$

and assuming that its Laplace inverse *H*(*t*) exists, we get, following Mainardi et al. [54], in the Laplace-Fourier domain the equation

$$\tilde{H}(s)\left[s\,\widehat{\tilde{p}}(\kappa, s) - 1\right] = \left[\widehat{w}(\kappa) - 1\right]\widehat{\tilde{p}}(\kappa, s)\,,\tag{78}$$

and in the space-time domain the generalized Kolmogorov-Feller equation

$$\int\_{0}^{t} H(t - t') \, \frac{\partial}{\partial t'} p(\mathbf{x}, t') \, dt' = -p(\mathbf{x}, t) + \int\_{-\infty}^{+\infty} w(\mathbf{x} - \mathbf{x}') \, p(\mathbf{x}', t) \, d\mathbf{x}',\tag{79}$$

with *p*(*x*, 0) = *δ*(*x*), where *H*(*t*) acts as a *memory function*.

If the Laplace inverse *H*(*t*) of the formally introduced function $\tilde{H}(s)$ does not exist, we can formally set $\tilde{K}(s) = 1/\tilde{H}(s)$ and multiply (78) by $\tilde{K}(s)$. Then, if *K*(*t*) exists, we get in place of (79) the alternative form of the generalized Kolmogorov-Feller equation

$$\frac{\partial}{\partial t}p(\mathbf{x},t) \ = \int\_0^t \mathbf{K}(t-t') \left[ -p(\mathbf{x},t') + \int\_{-\infty}^{+\infty} w(\mathbf{x}-\mathbf{x}') \ p(\mathbf{x}',t') \,d\mathbf{x}' \right] \,dt' \,\tag{80}$$

with *p*(*x*, 0) = *δ*(*x*), where *K*(*t*) acts as a *memory function*.

Special choices of the memory function *H*(*t*) are (**i**) and (**ii**), see Equations (81) and (85):

$$(\mathbf{i}) \quad H(t) = \delta(t) \quad \text{corresponding to} \quad \tilde{H}(s) = 1,\tag{81}$$

giving the *exponential waiting time* with

$$
\tilde{\varphi}(s) = \frac{1}{1+s}\,, \quad \varphi(t) = \Psi(t) = \mathrm{e}^{-t}. \tag{82}
$$

In this case we obtain in the Fourier-Laplace domain

$$s\,\widehat{\tilde{p}}(\kappa, s) - 1 = \left[\widehat{w}(\kappa) - 1\right]\widehat{\tilde{p}}(\kappa, s)\,, \tag{83}$$

and in the space-time domain the *classical Kolmogorov-Feller equation*

$$\frac{\partial}{\partial t}p(\mathbf{x},t) = -p(\mathbf{x},t) + \int\_{-\infty}^{+\infty} w(\mathbf{x}-\mathbf{x}') \, p(\mathbf{x}',t) \, d\mathbf{x}', \quad p(\mathbf{x},0) = \delta(\mathbf{x}) \,. \tag{84}$$

$$(\mathbf{ii}) \quad H(t) = \frac{t^{-\beta}}{\Gamma(1-\beta)}, \ 0 < \beta < 1, \text{ corresponding to } \tilde{H}(s) = s^{\beta-1}, \tag{85}$$

giving the *Mittag-Leffler waiting time* with

$$\tilde{\varphi}(s) = \frac{1}{1 + s^{\beta}}, \quad \varphi(t) = -\frac{d}{dt} E_{\beta}(-t^{\beta}) = \varphi^{ML}(t), \quad \Psi(t) = E_{\beta}(-t^{\beta}) \,. \tag{86}$$

In this case we obtain in the Fourier-Laplace domain

$$s^{\beta-1} \left[ s\,\widehat{\tilde{p}}(\kappa, s) - 1 \right] = \left[ \widehat{w}(\kappa) - 1 \right] \widehat{\tilde{p}}(\kappa, s) \,, \tag{87}$$

and in the space-time domain the *time fractional Kolmogorov-Feller equation*

$${}_{*}D_t^{\beta}\, p(x, t) = -p(x, t) + \int_{-\infty}^{+\infty} w(x - x')\, p(x', t)\, dx', \quad p(x, 0^+) = \delta(x) \,, \tag{88}$$

where ${}_{*}D_t^{\beta}$ denotes the fractional derivative of order *β* in the Caputo sense, see Section 3.

The time fractional Kolmogorov-Feller equation can also be expressed via the Riemann-Liouville fractional derivative $D_t^{1-\beta}$, see again Section 3, that is

$$\frac{\partial}{\partial t}p(\mathbf{x},t) = D\_t^{1-\beta} \left[ -p(\mathbf{x},t) + \int\_{-\infty}^{+\infty} w(\mathbf{x}-\mathbf{x}') \, p(\mathbf{x}',t) \, d\mathbf{x}' \right],\tag{89}$$

with *p*(*x*, 0+) = *δ*(*x*). The equivalence of the two forms (88) and (89) is easily proved in the Fourier-Laplace domain by multiplying both sides of Equation (87) with the factor *s*1−*β*.

We note that the choice (**i**) may be considered as the limit of the choice (**ii**) as *β* → 1. In fact, in this limit we find $\tilde{H}(s) \equiv 1$, so $H(t) = t^{-1}/\Gamma(0) \equiv \delta(t)$, and Equations (78) and (79) reduce to Equations (83) and (84), respectively. In this case the order of the Caputo derivative reduces to 1 and that of the R-L derivative to 0, whereas the Mittag-Leffler waiting time law reduces to the exponential.

In the sequel we will formally unite the choices (**i**) and (**ii**) by defining what we call the Mittag-Leffler memory function

$$H^{ML}(t) = \begin{cases} \dfrac{t^{-\beta}}{\Gamma(1-\beta)}, & \text{if} \quad 0 < \beta < 1, \\ \delta(t), & \text{if} \quad \beta = 1, \end{cases} \tag{90}$$

whose Laplace transform is

$$
\tilde{H}^{ML}(\mathbf{s}) = \mathbf{s}^{\beta - 1}, \quad 0 < \beta \le 1. \tag{91}
$$

Thus we will consider the whole range 0 < *β* ≤ 1 by extending the Mittag-Leffler waiting time law in (86) to include the exponential law (82).
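
The Mittag-Leffler survival function $\Psi(t) = E_\beta(-t^\beta)$ of (86) can be evaluated directly from the series definition of $E_\beta$. A minimal sketch (our helper names; the truncated series is adequate only for moderate arguments), which for *β* = 1 reproduces the exponential law (82):

```python
import math

def mittag_leffler(z, beta, terms=80):
    """Truncated series E_beta(z) = sum_{n>=0} z^n / Gamma(beta*n + 1);
    adequate for moderate |z| only."""
    return sum(z**n / math.gamma(beta * n + 1.0) for n in range(terms))

def survival_ML(t, beta):
    """Mittag-Leffler survival function Psi(t) = E_beta(-t^beta), Eq. (86)."""
    return mittag_leffler(-(t**beta), beta)

# beta = 1 reduces to the exponential survival function of Eq. (82)
err = abs(survival_ML(2.0, 1.0) - math.exp(-2.0))
```

Note that $\Psi(0) = 1$ for every *β*, as required of a survival function.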

**Remark 1.** *Equation (79) clearly may be supplemented by an arbitrary initial probability density p*(*x*, 0) = *f*(*x*)*. The corresponding replacement of δ*(*x*) *by f*(*x*) *in (73) then requires in (76) multiplication of the term* $(1 - \tilde{\varphi}(s))/s$ *by* $\widehat{f}(\kappa)$ *and in (78) replacement of the LHS by* $\tilde{H}(s)\left[s\,\widehat{\tilde{p}}(\kappa,s) - \widehat{f}(\kappa)\right]$*. With p*(*x*, 0) = *δ*(*x*) *we obtain in p*(*x*, *t*) *the fundamental solution of Equation (79).*

**Note:** The probability density function for the waiting time distribution in terms of the Mittag-Leffler function was first given in 1995 by Hilfer [55–57]. In those papers the waiting time density was given with the Mittag-Leffler function in two parameters, without noting the relation with the first derivative of the classical Mittag-Leffler function as stated in Equation (29). We also note that 10 years earlier Balakrishnan [58] had derived a similar expression without recognizing the Mittag-Leffler function. As in the case of the thinning process dealt with by Gnedenko-Kowalenko (see Section 7), once again the Mittag-Leffler function was unknown to the authors.

#### *Manipulations: Rescaling and Respeeding*

We now consider two types of manipulations on the CTRW by acting on its governing Equation (79) in its Laplace-Fourier representation (78).

(**A**): rescaling the waiting time, hence the whole time axis;

(**B**): respeeding the process.

(**A**) means a change of the unit of time (measurement). We replace the random waiting time *T* by a waiting time *τT*, with the positive *rescaling factor τ*. Our idea is to take 0 < *τ* ≪ 1 in order to bring into near sight the distant future, so that in a moderate span of time we will have a large number of jump events. For *τ* > 0 we get the rescaled waiting time density

$$
\widetilde{\boldsymbol{\phi}}\_{\tau}(\mathbf{s}) = \widetilde{\boldsymbol{\phi}}(\tau \mathbf{s}) \,. \tag{92}
$$

By decorating also the density *p* with an index *τ* we obtain the rescaled integral equation of the CTRW in the Laplace-Fourier domain as

$$
\tilde{H}_{\tau}(s) \left[ s\,\widehat{\tilde{p}}_{\tau}(\kappa, s) - 1 \right] = \left[ \widehat{w}(\kappa) - 1 \right] \widehat{\tilde{p}}_{\tau}(\kappa, s) \,, \tag{93}
$$

where, in analogy to (77),

$$
\tilde{H}_{\tau}(s) = \frac{1 - \tilde{\varphi}(\tau s)}{s\,\tilde{\varphi}(\tau s)}.\tag{94}
$$

(**B**) means multiplying the quantity representing $\frac{\partial}{\partial t} p(x, t)$ by a factor 1/*a*, where *a* > 0 is the *respeeding factor*: *a* > 1 means *acceleration*, 0 < *a* < 1 means *deceleration*. In the Laplace-Fourier representation this means multiplying the RHS of Equation (78) by the factor *a*, since the expression $s\,\widehat{\tilde{p}}(\kappa, s) - 1$ corresponds to $\frac{\partial}{\partial t} p(x, t)$.

We now choose to consider the procedures of rescaling and respeeding in their combination, so that the equation in the transformed domain of the rescaled and respeeded process has the form

$$
\tilde{H}_{\tau}(s)\left[s\,\widehat{\tilde{p}}_{\tau,a}(\kappa, s) - 1\right] = a\left[\widehat{w}(\kappa) - 1\right]\widehat{\tilde{p}}_{\tau,a}(\kappa, s)\,.\tag{95}
$$

Clearly, the two manipulations can be discussed separately: the choice {*τ* > 0, *a* = 1} means *pure rescaling*, the choice {*τ* = 1, *a* > 0} means *pure respeeding* of the original process. In the special case *τ* = 1 we only respeed the original system; if 0 < *τ* ≪ 1 we can counteract the compression effected by rescaling, to again obtain a moderate number of events in a moderate span of time, by respeeding (decelerating) with 0 < *a* ≪ 1. These vague notions will become clear as soon as we consider power law waiting times.

Defining

$$
\tilde{H}_{\tau,a}(s) := \frac{\tilde{H}_{\tau}(s)}{a} = \frac{1 - \tilde{\varphi}(\tau s)}{a\,s\,\tilde{\varphi}(\tau s)}\,,\tag{96}
$$

we finally get, in analogy to (78), the equation

$$\tilde{H}_{\tau,a}(s)\left[s\,\widehat{\tilde{p}}_{\tau,a}(\kappa, s) - 1\right] = \left[\widehat{w}(\kappa) - 1\right]\widehat{\tilde{p}}_{\tau,a}(\kappa, s)\,. \tag{97}$$

What is the combined effect of rescaling and respeeding on the waiting time density? In analogy to (77) and taking account of (96) we find

$$\tilde{\varphi}_{\tau,a}(s) = \frac{1}{1 + s\tilde{H}_{\tau,a}(s)} = \frac{1}{1 + s\,\dfrac{1 - \tilde{\varphi}(\tau s)}{a\,s\,\tilde{\varphi}(\tau s)}}\,,\tag{98}$$

and so, for the deformation of the waiting time density, the *essential formula*

$$\tilde{\varphi}_{\tau,a}(s) = \frac{a\,\tilde{\varphi}(\tau s)}{1 - (1 - a)\,\tilde{\varphi}(\tau s)}\,.\tag{99}$$
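
As a quick sanity check of the essential formula: inserting the exponential law $\tilde{\varphi}(s) = 1/(1+s)$ of (82) gives $\tilde{\varphi}_{\tau,a}(s) = a/(a + \tau s)$, again an exponential law (with mean *τ*/*a*), and normalization $\tilde{\varphi}_{\tau,a}(0) = 1$ is preserved. A minimal numeric sketch (the helper names `phi_exp`, `phi_deformed` are ours):

```python
def phi_exp(s):
    """Laplace transform of the exponential waiting time density e^{-t}, Eq. (82)."""
    return 1.0 / (1.0 + s)

def phi_deformed(s, tau, a, phi):
    """Essential formula (99) for the rescaled-respeeded waiting time density."""
    p = phi(tau * s)
    return a * p / (1.0 - (1.0 - a) * p)

# for the exponential law the deformation is again exponential, with rate a/tau
tau, a = 0.5, 2.0
for s in (0.1, 1.0, 10.0):
    assert abs(phi_deformed(s, tau, a, phi_exp) - a / (a + tau * s)) < 1e-12
```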

**Remark 2.** *The formula (99) has the same structure as the thinning formula (66) in Section 5 (just devoted to the thinning theory) by identification of a with q. In both problems we have a rescaled process defined by a time scale τ, and we send the relevant factors τ, a and q to zero under a proper relationship. However in the thinning theory the relevant independent parameter going to 0 is that of thinning (actually respeeding) whereas in the present problem it is the rescaling parameter τ.*

#### **9. Power Laws and Asymptotic Universality of the Mittag-Leffler Waiting Time Density**

We have essentially two different situations for the waiting time distribution according to its first moment (the expectation value) being finite or infinite. In other words we assume for the waiting time *pdf φ*(*t*) either

$$\rho := \int\_0^\infty t' \,\phi(t') \, dt' < \infty, \quad \text{labelled as } \beta = 1,\tag{100}$$

or

$$
\varphi(t) \sim c\, t^{-(\beta+1)} \ \text{ for } t \to \infty \quad \text{hence} \quad \Psi(t) \sim \frac{c}{\beta}\, t^{-\beta}, \ 0 < \beta < 1, \ c > 0. \tag{101}
$$

For convenience we have dispensed in (101) with decorating the power law by a function slowly varying at infinity. Then, by the standard Tauberian theory (see References [21,48]), the above conditions (100) and (101) mean in the Laplace domain the (comprehensive) asymptotic form

$$
\widetilde{\phi}(s) = 1 - \lambda s^{\beta} + o(s^{\beta}) \quad \text{for} \quad s \to 0^{+}, \quad 0 < \beta \le 1,\tag{102}
$$

where we have

$$
\lambda = \rho \quad \text{if} \quad \beta = 1; \qquad \lambda = c\,|\Gamma(-\beta)| = \frac{c}{\Gamma(\beta + 1)}\, \frac{\pi}{\sin(\beta \pi)}\,, \ \text{ if } 0 < \beta < 1. \tag{103}
$$

*Entropy* **2020**, *22*, 1359

Then, *fixing s* as required by the continuity theorem of probability theory for Laplace transforms, taking

$$a = \lambda \,\tau^{\beta}\,,\tag{104}$$

and *sending τ to zero*, we obtain in the limit the Mittag-Leffler waiting time law. In fact, Equations (99) and (102) imply as *τ* → 0 with 0 < *β* ≤ 1,

$$\tilde{\varphi}_{\tau,\lambda\tau^{\beta}}(s) = \frac{\lambda\tau^{\beta}\left[1 - \lambda\tau^{\beta}s^{\beta} + o(\tau^{\beta}s^{\beta})\right]}{1 - (1 - \lambda\tau^{\beta})\left[1 - \lambda\tau^{\beta}s^{\beta} + o(\tau^{\beta}s^{\beta})\right]} \to \frac{1}{1 + s^{\beta}}\,,\tag{105}$$

the Laplace transform of *φML*(*t*). This formula expresses **the asymptotic universality of the Mittag-Leffler waiting time law** that includes the exponential law for *β* = 1. It can easily be generalized to the case of power laws decorated with slowly varying functions, thereby using the Tauberian theory by Karamata (see again References [21,48]).

**Comment:** The formula (105) says that our general power law waiting time density is gradually deformed into the Mittag-Leffler waiting time density as *τ* tends to zero.
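
This gradual deformation can be seen numerically. Below we use a *toy* small-*s* model transform $1 - \lambda s^\beta + s^{2\beta}$, which merely stands in for the transform of any power-law waiting time obeying (102) (only the small-argument behaviour matters as *τ* → 0); the helper names are ours.

```python
def phi_model(s, lam=2.0, beta=0.6):
    """Toy small-s expansion 1 - lam*s^beta + s^(2*beta), standing in for the
    Laplace transform of any power-law waiting time density obeying Eq. (102)."""
    return 1.0 - lam * s**beta + s**(2 * beta)

def phi_deformed(s, tau, a, phi):
    """Essential formula (99) for the rescaled-respeeded density."""
    p = phi(tau * s)
    return a * p / (1.0 - (1.0 - a) * p)

# fix s, set a = lam * tau^beta as in (104), and send tau to zero: Eq. (105)
lam, beta, s = 2.0, 0.6, 1.3
target = 1.0 / (1.0 + s**beta)       # Laplace transform of the ML density
errors = [abs(phi_deformed(s, tau, lam * tau**beta, phi_model) - target)
          for tau in (1e-2, 1e-4, 1e-6)]
```

The errors shrink like $O(\tau^\beta)$, illustrating the attraction toward the Mittag-Leffler law.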

**Remark 3.** *Let us stress here the distinguished character of the Mittag-Leffler waiting time density* $\varphi^{ML}(t) = -\frac{d}{dt}E_{\beta}(-t^{\beta})$. *Considering its Laplace transform*

$$
\tilde{\varphi}^{ML}(s) = \frac{1}{1 + s^{\beta}}, \quad \varphi^{ML}(t) = -\frac{d}{dt} E_{\beta}(-t^{\beta})\,, \quad 0 < \beta \le 1\,,\tag{106}
$$

*we can easily prove the identity*

$$
\widetilde{\phi}^{\rm ML}\_{\tau,a}(s) = \widetilde{\phi}^{\rm ML}(\tau s / a^{1/\beta}) \quad \text{for all} \quad \tau > 0, \quad a > 0. \tag{107}
$$

Note that Equation (107) states the *self-similarity* of the combined operation *rescaling-respeeding* for the Mittag-Leffler waiting time density. In fact, (107) implies $\varphi^{ML}_{\tau,a}(t) = \varphi^{ML}(t/c)/c$ with $c = \tau/a^{1/\beta}$, which means replacing the random waiting time $T^{ML}$ by $c\,T^{ML}$. As a consequence, choosing $a = \tau^{\beta}$ we have

$$
\widetilde{\phi}^{ML}\_{\tau,\tau^{\beta}}(s) = \widetilde{\phi}^{ML}(s) \quad \text{for all} \quad \tau > 0. \tag{108}
$$

Hence *the Mittag-Leffler waiting time density is invariant against combined rescaling with τ and respeeding with a* = *τβ*.
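
Both (107) and (108) follow by plugging $\tilde{\varphi}^{ML}(s) = 1/(1+s^\beta)$ into (99), which yields $a/((\tau s)^\beta + a) = \tilde{\varphi}^{ML}(\tau s/a^{1/\beta})$. A short numeric confirmation (our helper names):

```python
def phi_ML(s, beta):
    """Laplace transform of the Mittag-Leffler waiting time density, Eq. (106)."""
    return 1.0 / (1.0 + s**beta)

def phi_deformed(s, tau, a, phi):
    """Essential formula (99) for the rescaled-respeeded density."""
    p = phi(tau * s)
    return a * p / (1.0 - (1.0 - a) * p)

beta = 0.6
for s in (0.2, 1.0, 5.0):
    # self-similarity (107): deformation = rescaling of s by tau / a^(1/beta)
    for tau, a in ((0.3, 0.8), (2.0, 0.1)):
        lhs = phi_deformed(s, tau, a, lambda u: phi_ML(u, beta))
        assert abs(lhs - phi_ML(tau * s / a**(1.0 / beta), beta)) < 1e-12
    # invariance (108): the choice a = tau^beta leaves the ML law unchanged
    tau = 1.7
    assert abs(phi_deformed(s, tau, tau**beta, lambda u: phi_ML(u, beta))
               - phi_ML(s, beta)) < 1e-12
```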

Observing (105) we can say that *<sup>φ</sup>ML*(*t*) is a *<sup>τ</sup>* <sup>→</sup> 0 attractor for any power law waiting time (101) under simultaneous rescaling with *τ* and respeeding with *a* = *λτβ*. In other words, this attraction property of the Mittag-Leffler probability distribution with respect to power law waiting times (with 0 < *β* ≤ 1) is a kind of analogy to the attraction of sums of power law jump distributions by stable distributions.

#### **10. The Mittag-Leffler Functions W.R.T the Time Fractional Diffusion-Wave Equations and the Wright Functions**

In this section we show the relations of the Mittag-Leffler function with the Wright function via Laplace and Fourier transformations, in order to provide further arguments outlining the role of the Mittag-Leffler function in Fractional Calculus. For this purpose, because of the necessity to work with two independent parameters, we first recall the proper definitions of the Mittag-Leffler and the Wright function. Then we will consider the time fractional diffusion-wave equation with the fundamental solutions of its basic boundary value problems, which turn out to be expressed in terms of special cases of the Wright functions, the so-called *F* and *M* functions. Finally we pay attention to some noteworthy formulas for the *M*-Wright function, including its connections with the Mittag-Leffler function.

#### *10.1. Definitions and Main Properties of the Wright Functions*

The classical *Wright function*, that we denote by *Wλ*,*μ*(*z*), is defined by the series representation convergent in the whole complex plane,

$$\mathcal{W}\_{\lambda,\mu}(z) := \sum\_{n=0}^{\infty} \frac{z^n}{n!\Gamma(\lambda n + \mu)}, \quad \lambda > -1, \quad \mu \in \mathbb{C}, \tag{109}$$

As a consequence *Wλ*,*μ*(*z*) is an *entire function* for all *λ* ∈ (−1, +∞). Originally Wright assumed *λ* ≥ 0 in connection with his investigations on the asymptotic theory of partitions [59,60], and only in 1940 did he consider −1 < *λ* < 0 [61]. We note that in Vol. 3, Chapter 18 of the handbook of the Bateman Project [6], presumably because of a misprint, the parameter *λ* is restricted to be non-negative, whereas the Wright functions remained practically ignored in other handbooks. In 1993 the present author, being aware only of the Bateman handbook, proved that the Wright function is entire also for −1 < *λ* < 0 in his approaches to the time fractional diffusion equation, as outlined in his papers published from 1994 to 1997 [62–66]. For other earlier treatments of this function we refer to the 1999 paper by Gorenflo, Luchko and Mainardi [67].

In view of the asymptotic representation in the complex domain and of the Laplace transform the Wright functions were distinguished by the author in *first kind* (*λ* ≥ 0) and *second kind* (−1 < *λ* < 0) as outlined e.g., in the Appendix F of his book [22].

We note that the Wright functions are entire of order 1/(1 + *λ*) hence only the first kind functions (*λ* ≥ 0) are of exponential order whereas the second kind functions (−1 < *λ* < 0) are not of exponential order. The case *λ* = 0 is trivial since *W*0,*μ*(*z*) = e*z*/Γ(*μ*).
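
The series (109) is straightforward to evaluate for moderate arguments, provided one returns 0 for $1/\Gamma$ at the poles of the Gamma function (which occur for the second kind). A minimal sketch (our helper names), checked against the trivial case $W_{0,\mu}(z) = \mathrm{e}^z/\Gamma(\mu)$ and against the Gaussian case of the *M* function below:

```python
import math

def rgamma(x):
    """1/Gamma(x), with the value 0 at the poles x = 0, -1, -2, ..."""
    if x <= 0 and x == int(x):
        return 0.0
    return 1.0 / math.gamma(x)

def wright(z, lam, mu, terms=60):
    """Truncated series (109): W_{lam,mu}(z) = sum_n z^n / (n! Gamma(lam*n + mu)),
    lam > -1; adequate for moderate |z| only."""
    return sum(z**n / math.factorial(n) * rgamma(lam * n + mu)
               for n in range(terms))
```

For instance, $W_{-1/2,\,1/2}(-x)$ equals the Gaussian $\mathrm{e}^{-x^2/4}/\sqrt{\pi}$ (compare Equations (113) and (138)).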

Following the proofs in Appendix F of Reference [22] we get the following Laplace transform pairs of the Wright functions in terms of the Mittag-Leffler functions in two parameters, where *r* can be the time variable *t* > 0 or the space variable *x* > 0:

*for the first kind* (*λ* ≥ 0)

$$\mathcal{W}_{\lambda,\mu}(\pm r) \div \frac{1}{s}\, E_{\lambda,\mu}\left(\pm\frac{1}{s}\right), \quad \lambda > 0\,,\tag{110}$$

*for the second kind* (*λ* = −*ν*, 0 < *ν* < 1)

$$\mathcal{W}\_{-\nu,\mu}(-r) \div E\_{\nu,\mu+\nu}(-s) \,, \quad 0 < \nu < 1 \,. \tag{111}$$

The Wright functions of the first kind are useful to find the solutions of some (linear and non-linear) differential equations of fractional order as recently shown by Garra and Mainardi, [68].

Since the pioneering works in the 1990's by the author, noteworthy cases of Wright functions of the second kind, known as *auxiliary functions F* and *M*, play fundamental roles in solving the Signalling problem and the Cauchy value problem, respectively, for the time fractional diffusion-wave equation.

We first recall hereafter these auxiliary functions in terms of the Wright functions of the second kind, following their power series representations. They read

$$F\_{\nu}(z) := \mathcal{W}\_{-\nu,0}(-z) \,, \quad 0 < \nu < 1 \,\,\,\,\,\tag{112}$$

and

$$M\_{\nu}(z) := \mathcal{W}\_{-\nu, 1-\nu}(-z) \,, \quad 0 < \nu < 1 \,\,\,\,\tag{113}$$

interrelated through

$$F\_{\mathcal{V}}(z) = \nu \, z \, M\_{\mathcal{V}}(z) \,. \tag{114}$$


The *series representations* of our auxiliary functions are derived from those of *Wλ*,*μ*(*z*) in (109). We have:

$$F\_{\nu}(z) = \sum\_{n=1}^{\infty} \frac{(-z)^n}{n! \, \Gamma(-\nu n)} = -\frac{1}{\pi} \sum\_{n=1}^{\infty} \frac{(-z)^n}{n!} \, \Gamma(\nu n + 1) \sin(\pi \nu n) \, , \tag{115}$$

and

$$M\_{\nu}(z) = \sum\_{n=0}^{\infty} \frac{(-z)^n}{n! \, \Gamma[-\nu n + (1-\nu)]} = \frac{1}{\pi} \sum\_{n=1}^{\infty} \frac{(-z)^{n-1}}{(n-1)!} \Gamma(\nu n) \sin(\pi \nu n) \, , \tag{116}$$

where we have used the well-known reflection formula for the Gamma function,

$$
\Gamma(\zeta)\,\Gamma(1-\zeta) = \pi/\sin\pi\zeta\,.
$$
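
The second series in (116) is convenient numerically, since it involves only Gamma values at positive arguments (no poles). A minimal sketch (our helper name `M`), checked against the Gaussian case *ν* = 1/2 of Equation (138) below:

```python
import math

def M(nu, z, terms=60):
    """M-Wright function via the pole-free series in Eq. (116):
    M_nu(z) = (1/pi) * sum_{n>=1} (-z)^(n-1)/(n-1)! * Gamma(nu*n) * sin(pi*nu*n)."""
    total = 0.0
    for n in range(1, terms + 1):
        total += ((-z)**(n - 1) / math.factorial(n - 1)
                  * math.gamma(nu * n) * math.sin(math.pi * nu * n))
    return total / math.pi
```

For *ν* = 1/2 this reproduces $M_{1/2}(z) = \mathrm{e}^{-z^2/4}/\sqrt{\pi}$; the truncated series is adequate only for moderate |*z*|.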

#### *10.2. The Time-Fractional Diffusion-Wave Equation and the Related Green Functions*

For the reader's convenience let us recall the main formulas for the time fractional diffusion equations and their fundamental solutions (also referred to as the Green functions) for the Cauchy and Signalling problems. For more details we refer to References [69,70].

Denoting as usual *x*, *t* the space and time variables, and *r* = *r*(*x*, *t*) the response variable, the family of these evolution equations reads

$$\frac{\partial^{\beta}r}{\partial t^{\beta}} = a \frac{\partial^{2}r}{\partial x^{2}}, \quad 0 < \beta \le 2,\tag{117}$$

where *the time derivative of order β is intended in the Caputo sense*, namely it is the operator ${}_{*}D_t^{\beta}$ introduced in Section 3 (for order less than 1, see Equation (10)), and *a* is a positive constant of dimension $L^2\,T^{-\beta}$. Thus we must distinguish the cases 0 < *β* ≤ 1 and 1 < *β* ≤ 2. We have

$$\begin{aligned} \frac{\partial^{\beta}r}{\partial t^{\beta}} &:= \begin{cases} \frac{1}{\Gamma(1-\beta)} \int\_{0}^{t} \left[ \frac{\partial}{\partial \tau} r(\mathbf{x}, \tau) \right] \frac{d\tau}{(t-\tau)^{\beta}}, & 0 < \beta < 1, \\\\ \frac{\partial r}{\partial t}, & \beta = 1; \end{cases} \tag{118} \\\\ \frac{\partial^{\beta}r}{\partial t^{\beta}} &:= \begin{cases} \frac{1}{\Gamma(2-\beta)} \int\_{0}^{t} \left[ \frac{\partial^{2}}{\partial \tau^{2}} r(\mathbf{x}, \tau) \right] \frac{d\tau}{(t-\tau)^{\beta-1}}, & 1 < \beta < 2, \\\\ \frac{\partial^{2}r}{\partial t^{2}}, & \beta = 2. \end{cases} \end{aligned} \tag{119}$$

It should be noted that in both cases 0 < *β* ≤ 1, 1 < *β* ≤ 2, the time fractional derivative in the L.H.S. of Equation (117) can be removed by a suitable fractional integration, leading to alternative forms where the necessary initial conditions at *t* = 0<sup>+</sup> explicitly appear.

For this purpose we apply to Equation (117) the fractional integral operator of order *β*, namely

$$J\_t^{\beta}f(t) := \frac{1}{\Gamma(\beta)} \int\_0^t (t-\tau)^{\beta-1} f(\tau) \,d\tau.$$

For *β* ∈ (0, 1] we have:

$$J_t^{\beta} \circ {}_{*}D_t^{\beta}\, r(x, t) = J_t^{\beta} \circ J_t^{1-\beta} D_t^{1}\, r(x, t) = J_t^{1} D_t^{1}\, r(x, t) = r(x, t) - r(x, 0^+)\,.$$

For *β* ∈ (1, 2] we have:

$$J_t^{\beta} \circ {}_{*}D_t^{\beta}\, r(x, t) = J_t^{\beta} \circ J_t^{2-\beta} D_t^{2}\, r(x, t) = J_t^{2}\, D_t^{2}\, r(x, t) = r(x, t) - r(x, 0^+) - t\, r_t(x, 0^+)\,.$$

Then, as a matter of fact, we get the integro-differential equations: if 0 < *β* ≤ 1 :

$$r(\mathbf{x},t) = r(\mathbf{x},0^{+}) + \frac{a}{\Gamma(\beta)} \int\_{0}^{t} \left(\frac{\partial^2 r}{\partial \mathbf{x}^2}\right) (t-\tau)^{\beta-1} \,d\tau;\tag{120}$$

if 1 < *β* ≤ 2 :

$$r(x, t) = r(x, 0^{+}) + t\,\frac{\partial}{\partial t}\, r(x, t)\Big|_{t=0^{+}} + \frac{a}{\Gamma(\beta)}\int_{0}^{t} \left(\frac{\partial^{2}r}{\partial x^{2}}\right)(t-\tau)^{\beta-1}\, d\tau. \tag{121}$$
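
The fractional integral $J_t^\beta$ used above is easy to approximate numerically. One convenient way (our choice, not from the text) is the substitution $u = v^{1/\beta}$, which removes the weak singularity $(t-\tau)^{\beta-1}$; the sketch below (hypothetical helper `frac_integral`) is checked against the exact rule $J^\beta\, 1 = t^\beta/\Gamma(\beta+1)$:

```python
import math

def frac_integral(f, t, beta, n=20000):
    """Riemann-Liouville fractional integral J_t^beta f(t) by the midpoint rule.
    Substituting u = v^(1/beta) gives the pole-free form
    J^beta f(t) = 1/(beta*Gamma(beta)) * int_0^(t^beta) f(t - v^(1/beta)) dv."""
    top = t**beta
    h = top / n
    acc = sum(f(t - ((k + 0.5) * h)**(1.0 / beta)) for k in range(n))
    return acc * h / (beta * math.gamma(beta))

# check against the exact rule J^beta 1 = t^beta / Gamma(beta + 1)
beta, t = 0.5, 2.0
err = abs(frac_integral(lambda u: 1.0, t, beta) - t**beta / math.gamma(beta + 1.0))
```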

Denoting by *f*(*x*), *x* ∈ IR and *h*(*t*), *t* ∈ IR<sup>+</sup> sufficiently well-behaved functions, the basic boundary-value problems are thus formulated as follows, assuming 0 < *β* ≤ 1,

*(a) Cauchy problem*

$$r(\mathbf{x},0^{+}) = f(\mathbf{x}), \ -\infty < \mathbf{x} < +\infty; \ r(\mp\infty, t) = 0, \ t > 0;\tag{122}$$

*(b) Signalling problem*

$$r(\mathbf{x},0^+)=0,\ \mathbf{x}>0;\ r(0^+,t)=h(t),\ r(+\infty,t)=0,\ t>0.\tag{123}$$

If 1 < *β* < 2 , we must add into (122) and (123) the initial values of the first time derivative of the field variable, *rt*(*x*, 0+), since in this case the corresponding fractional derivative is expressed in terms of the second order time derivative. To ensure the continuous dependence of our solution with respect to the parameter *β* also in the transition from *β* = 1<sup>−</sup> to *β* = 1<sup>+</sup> , we agree to assume

$$\frac{\partial}{\partial t} \left. r(x, t) \right|\_{t=0^+} = 0, \text{ for } 1 < \beta \le 2,\tag{124}$$

as it turns out from the integral forms (120)–(121).

In view of our subsequent analysis we find it convenient to set

$$\nu := \beta/2, \quad \text{so} \quad \begin{cases} 0 < \nu \le 1/2, \iff 0 < \beta \le 1, \\ 1/2 < \nu \le 1, \iff 1 < \beta \le 2, \end{cases} \tag{125}$$

and from now on to add the parameter *ν* to the independent space-time variables *x* , *t* in the solutions, writing *r* = *r*(*x*, *t*; *ν*).

For the Cauchy and Signalling problems we introduce the so-called *Green functions* G*c*(*x*, *t*; *ν*) and G*s*(*x*, *t*; *ν*), which represent the respective fundamental solutions, obtained when *f*(*x*) = *δ*(*x*) and *h*(*t*) = *δ*(*t*). As a consequence, the solutions of the two basic problems are obtained by a space or time convolution according to

$$r(x, t; \nu) = \int_{-\infty}^{+\infty} \mathcal{G}_{c}(x - \xi, t; \nu)\, f(\xi)\, d\xi\,, \tag{126}$$

$$r(x, t; \nu) = \int_{0^-}^{t^+} \mathcal{G}_{s}(x, t - \tau; \nu)\, h(\tau)\, d\tau\,. \tag{127}$$

It should be noted that in (126) G*c*(*x*, *t*; *ν*) = G*c*(|*x*|, *t*; *ν*) because the Green function of the Cauchy problem turns out to be an even function of *x*. According to a usual convention, in (127) the limits of integration are extended to take into account the possibility of impulse functions centred at the extremes.

Now we recall the results obtained in the 1990's by the author that allow us to express the two Green functions in terms of the auxiliary functions *Fν*(*ξ*) and *Mν*(*ξ*) where, for *x* > 0, *t* > 0,

$$\xi := x / (\sqrt{a}\, t^{\nu}) > 0 \tag{128}$$

acts as *similarity variable*. Then we obtain the Green functions in the space-time domain in the form

$$\mathcal{G}_{c}(x, t; \nu) = \frac{1}{2\,\nu\,x} F_{\nu}(\xi) = \frac{1}{2\sqrt{a}\,t^{\nu}}\, M_{\nu}(\xi)\,,\tag{129}$$

$$\mathcal{G}_{s}(x, t; \nu) = \frac{1}{t} F_{\nu}(\xi) = \frac{\nu\, x}{\sqrt{a}\, t^{1+\nu}}\, M_{\nu}(\xi)\,. \tag{130}$$

We also recognize the following *reciprocity relation* for the original Green functions,

$$2\,\nu\, x\, \mathcal{G}_{c}(x, t; \nu) = t\, \mathcal{G}_{s}(x, t; \nu) = F_{\nu}(\xi) = \nu\,\xi\, M_{\nu}(\xi)\,. \tag{131}$$

Now *Fν*(*ξ*), *Mν*(*ξ*) are the *auxiliary functions* for the general case 0 < *ν* ≤ 1, which generalize those well known for the standard (Fourier) diffusion equation and for the standard (D'Alembert) wave equation, derived for *ν* = 1/2 and for *ν* = 1, respectively.
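
In particular, for *ν* = 1/2 (i.e., *β* = 1), Equation (129) with the Gaussian form of $M_{1/2}$ (Equation (138) below) reduces $\mathcal{G}_c$ to the familiar heat kernel of $r_t = a\, r_{xx}$. The sketch below (hypothetical helper `green_cauchy_half`) evaluates it and checks the unit total probability numerically:

```python
import math

def green_cauchy_half(x, t, a=1.0):
    """G_c(x, t; 1/2) from Eq. (129) with M_{1/2}(z) = exp(-z^2/4)/sqrt(pi);
    for nu = 1/2 this is the heat kernel of r_t = a r_xx."""
    xi = abs(x) / (math.sqrt(a) * t**0.5)            # similarity variable (128)
    return math.exp(-xi * xi / 4.0) / (2.0 * math.sqrt(math.pi * a) * t**0.5)

# total probability: the kernel integrates to 1 over the real line
t, a, L, n = 1.0, 2.0, 30.0, 60000
h = 2.0 * L / n
mass = h * sum(green_cauchy_half(-L + (k + 0.5) * h, t, a) for k in range(n))
```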

#### *10.3. Some Noteworthy Results for the Mν Wright Function*

In this survey we find it worthwhile to concentrate our attention on a single auxiliary function, the *M*-Wright function, sometimes referred to as the *Mainardi function*. Indeed this function is referred to with this name in the 1999 book by Podlubny [19], which is one of the most cited treatises on fractional calculus. The name is then found in several later papers and books related to fractional diffusion and wave processes, see for example the relevant 2015 paper by Sandev et al. [71].

Let us now recall some interesting analytic results related to the so-called Mainardi function. One reason for the major attention is its straightforward generalization of the Gaussian probability density obtained for *ν* = 1/2, that is, the fundamental solution of the Cauchy problem for the standard diffusion equation. Furthermore it allows an impressive visualization of the evolution, with the order *ν* ∈ (0, 1), of the Green function of the Cauchy problem of the fractional diffusion-wave Equation (129), as shown in the next figures with *a* = 1 and taking *t* = 1.

The readers are invited to look at the YouTube video by my former student Armando Consiglio, entitled "Simulation of the *M*−Wright function", in which the author has shown the evolution of this function as the parameter *ν* varies between 0 and 0.85 on the interval −5 < *x* < +5 of IR centered at *x* = 0, represented herewith in Figures 6 and 7 at fixed time *t* = 1.

**Figure 6.** Plot of the symmetric *M*−Wright function *Mν*(|*x*|) for 0 ≤ *ν* ≤ 1/2. Note that the *M*-Wright function becomes a Gaussian density for *ν* = 1/2.

**Figure 7.** Plot of the symmetric *M*−Wright type function *Mν*(|*x*|) for 1/2 ≤ *ν* ≤ 1. Note that the *M*-Wright function becomes a sum of two delta functions centered in *x* = ±1 for *ν* = 1.

The readers interested to have more details on the classical Wright functions should consult the recent survey by Luchko [72] and references therein.

In view of time-fractional diffusion processes related to time-fractional diffusion equations it is worthwhile to introduce the function in two variables

$$\mathbb{M}\_{\nu}(\mathbf{x},t) := t^{-\nu} M\_{\nu}(\mathbf{x}t^{-\nu}), \quad 0 < \nu < 1, \quad \mathbf{x}, t \in \mathbb{R}^{+}, \tag{132}$$

which defines a spatial probability density in *x* evolving in time *t* with self-similarity exponent *H* = *ν*. Of course for *x* ∈ IR we have to consider the symmetric version of the *M*-Wright function, obtained from (132) by multiplying by 1/2 and replacing *x* by |*x*|.

Hereafter we provide a list of the main properties of this function, which can be derived from the Laplace and Fourier transforms for the corresponding Wright *M*-function in one variable presented in papers by Mainardi and recalled in the Appendix F of Reference [22].

For the Laplace transform of M*ν*(*x*, *t*) with respect to *t* > 0 and *x* > 0 we get respectively:

$$\mathcal{L}\left\{\mathbb{M}_{\nu}(x, t); t \to s\right\} := \int_{0}^{\infty} \mathrm{e}^{-st}\, t^{-\nu}\, M_{\nu}(x\, t^{-\nu})\, dt = s^{\nu-1}\, \mathrm{e}^{-x s^{\nu}}\,;\tag{133}$$

$$\mathcal{L}\left\{\mathbb{M}_{\nu}(x, t); x \to s\right\} := \int_{0}^{\infty} \mathrm{e}^{-sx}\, t^{-\nu}\, M_{\nu}(x\, t^{-\nu})\, dx = E_{\nu,1}\left(-s t^{\nu}\right)\,. \tag{134}$$
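
For *ν* = 1/2 Equation (134) can be checked against classical special functions: $M_{1/2}$ is the Gaussian of Equation (138) below, and $E_{1/2,1}(-s) = \mathrm{e}^{s^2}\,\mathrm{erfc}(s)$. A numeric sketch at *t* = 1 (hypothetical helper `lhs`, midpoint quadrature):

```python
import math

def lhs(s, n=100000, upper=40.0):
    """Numeric Laplace transform int_0^inf e^{-s x} M_{1/2}(x) dx at t = 1,
    using the Gaussian form (138): M_{1/2}(x) = exp(-x^2/4)/sqrt(pi)."""
    h = upper / n
    return h * sum(math.exp(-s * (k + 0.5) * h - ((k + 0.5) * h)**2 / 4.0)
                   for k in range(n)) / math.sqrt(math.pi)

# Eq. (134) at nu = 1/2, t = 1 predicts E_{1/2,1}(-s) = exp(s^2) * erfc(s)
for s in (0.3, 1.0, 2.0):
    assert abs(lhs(s) - math.exp(s * s) * math.erfc(s)) < 1e-5
```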

For the Fourier transforms with respect to the spatial variable *<sup>x</sup>* we have for <sup>M</sup>*ν*(*x*, *<sup>t</sup>*) with *<sup>x</sup>* <sup>∈</sup> IR+,

$$\begin{aligned} \mathcal{F}_{c}\left\{\mathbb{M}_{\nu}(x, t); x \to \kappa\right\} &:= \int_{0}^{\infty} \cos(\kappa x)\, t^{-\nu}\, M_{\nu}(x\, t^{-\nu})\, dx = E_{2\nu, 1}(-\kappa^{2} t^{2\nu}), \\ \mathcal{F}_{s}\left\{\mathbb{M}_{\nu}(x, t); x \to \kappa\right\} &:= \int_{0}^{\infty} \sin(\kappa x)\, t^{-\nu}\, M_{\nu}(x\, t^{-\nu})\, dx = \kappa\, t^{\nu}\, E_{2\nu, \nu + 1}(-\kappa^{2} t^{2\nu}), \end{aligned} \tag{135}$$

so that for the symmetric function <sup>M</sup>*ν*(|*x*|, *<sup>t</sup>*) we get

$$\mathcal{F}\left\{\mathbb{M}_{\nu}(|x|, t); x \to \kappa\right\} = 2\int_{0}^{\infty}\cos(\kappa x)\, t^{-\nu}\, M_{\nu}(x\, t^{-\nu})\, dx = 2E_{2\nu,1}\left(-\kappa^{2}t^{2\nu}\right).\tag{136}$$

Restricting our attention to the known analytic expressions of the *Mν* functions versus *x* at the fixed time *t* = 1, we recall the following results for some special rational values of the parameter *ν*: *ν* = 1/3 (see Reference [22])

$$M_{1/3}(x) = 3^{2/3}\, \mathrm{Ai}\!\left(x/3^{1/3}\right),\tag{137}$$

*ν* = 1/2 (see Reference [22])

$$M_{1/2}(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^{2}/4},\tag{138}$$

*ν* = 2/3 (see Reference [73])

$$M_{2/3}(x) = 3^{-2/3} \left[ 3^{1/3}\, x\, \mathrm{Ai}\!\left(x^{2}/3^{4/3}\right) - 3\, \mathrm{Ai}'\!\left(x^{2}/3^{4/3}\right) \right] e^{-2x^{3}/27}.\tag{139}$$

In the above equations, Ai and Ai′ denote the *Airy function* and its first derivative.
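The transform pair of Equations (133) and (138) can be checked against each other numerically. The sketch below is an illustration, not part of the original derivation; the quadrature parameters (step count, truncation point) are illustrative choices.

```python
import math

# Numerical check of Equation (133) at nu = 1/2, using the closed form of
# Equation (138), M_{1/2}(z) = exp(-z**2/4)/sqrt(pi): the Laplace transform
# in t of t**(-1/2) * M_{1/2}(x * t**(-1/2)) should equal
# s**(-1/2) * exp(-x * sqrt(s)).
def m_half(z):
    return math.exp(-z * z / 4.0) / math.sqrt(math.pi)

def laplace_lhs(x, s, n=100_000, T=10.0):
    # midpoint rule in tau after substituting t = tau**2, which removes
    # the t**(-1/2) singularity at t = 0
    h = T / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * h
        t = tau * tau
        total += math.exp(-s * t) * t ** -0.5 * m_half(x * t ** -0.5) * 2.0 * tau * h
    return total

x, s = 1.0, 2.0
rhs = s ** -0.5 * math.exp(-x * math.sqrt(s))
assert abs(laplace_lhs(x, s) - rhs) < 1e-4
```

The substitution *t* = *τ*² is the standard trick for integrating through the weak singularity at the origin.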

**Funding:** This research received no external funding.

**Acknowledgments:** The work of the author has been carried out in the framework of the activities of the National Group of Mathematical Physics (GNFM, INdAM). The author would like to thank the anonymous reviewers for their helpful and constructive comments and Bruce West for the invitation.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A. My Acquaintance with the Mittag-Leffler Function Since the Late 1960's**

I first became acquainted with the Mittag-Leffler function through the pioneering 1947 paper by Gross on creep and relaxation in linear viscoelasticity. It was during my PhD studies at the University of Bologna under the supervision of Prof. Caputo in the year 1969. Indeed, I was asked to apply, in the framework of anelastic materials, the derivative of non-integer order introduced by Prof. Caputo in [74,75]. More recently this fractional derivative was named after him thanks to the suggestions of Gorenflo and Mainardi [18] and Podlubny [19]. I understood that the Mittag-Leffler function proposed by Gross both in creep and relaxation processes could be used in the corresponding processes of the fractional Zener model. Because Gross had computed and plotted only the spectra, see Figure 1 in this article, I was interested in plotting the Mittag-Leffler function, for which I was directed to the third volume of the Handbook of the Bateman Project published in 1955 [6]. Carrying out the plot of the Mittag-Leffler function *Eα*(−*t<sup>α</sup>*) with a Fortran program was not easy for me using its power series representation, so I limited the time interval to [0.5] with the ordinate in logarithmic scale. As far as I know this was the first plot of this function; see References [9,10], where the results of my PhD thesis were published in 1971 jointly with my supervisor. Later I became acquainted with the viscoelastic model by Rabotnov of 1948 [76] and with the Russian school of Meshkov and Rossikhin, who used the so-called Rabotnov function, indeed related to the Mittag-Leffler function, and consequently obtained results similar to some extent to those in References [9,10]. However, our work was totally independent of that of the Russian school (incidentally published in Russian), as outlined in the notes to Chapter 3 of my 2010 book; see pp. 74–76 in [22]. 
Later, in the 1980s, I became acquainted with the results of Bagley–Torvik and of Koeller, which confirmed the relevant role of the Mittag-Leffler functions in linear viscoelastic models governed by constitutive laws of fractional order. Once again their results crossed with those in References [9,10]. However, I have to confess that, when in conferences of those years I dealt with fractional derivatives in rheology, the audience remained indifferent, if not hostile or derisive, so I left this topic, preferring to transfer my research interests to wave phenomena, in particular the effects of dissipation on linear dispersive waves.

Incidentally, in the 1980s, I also became aware of the fine treatise by Harold T. Davis on the theory of linear operators, published in 1936 [2], where the author gave information about the fractional calculus and the Mittag-Leffler function. It was my honor to publish a recent survey on the contributions by Davis and Gross (already recalled in the Introduction), whom I consider the pioneers of fractional relaxation processes in viscoelastic and dielectric materials [32]. In the first years of the 1990s, under the push of fractals, the relevance of fractional derivatives (not always used in a correct way) was outlined in several papers. For this reason I was induced to come back to fractional calculus. It was just the occasion for me to devote my research interests to the application of fractional calculus in relaxation and oscillation phenomena governed by fractional ODEs, and in diffusion and wave phenomena governed by fractional PDEs. Once again I understood the relevance of the Mittag-Leffler functions, but also that of the Wright functions, both of them classified as miscellaneous functions in the handbook of the Bateman Project. I must note that, as far as I know, the Bateman handbook was the only one published in English to deal with these special functions, and it was therefore accessible to me.

The year 1994 was a golden year for me as far as my acquaintance with fractional calculus and related special functions is concerned. Indeed, I profited from meeting, at three different conferences, the late Prof. Gorenflo and Prof. Nigmatullin (in Bordeaux, France), Prof. Podlubny and Prof. Caputo (in Atlanta, USA), and Prof. Virginia Kiryakova and the late Prof. Stankovic (in Sofia, Bulgaria), among other authorities of the fractional calculus. But it was with Prof. Gorenflo that I started a collaboration lasting more than 20 years (1995–2015), motivated by our common interest in the potential of the Mittag-Leffler functions in the applications of the fractional calculus.

Then, since 1997, I became interested in the emerging science of econophysics, thanks mainly to my younger colleague Enrico Scalas. With Gorenflo, Scalas and his student Raberto, we published some papers on the advent of fractional calculus in econophysics; see, e.g., [54] and my historical survey in Mathematics [77]. In 2007, on the occasion of the 80th birthday of Prof. Caputo, I published with Gorenflo a survey in Fractional Calculus and Applied Analysis [11], where I took the liberty of proposing for the Mittag-Leffler function the (successful) title of *the Queen Function of the Fractional Calculus*. Some years earlier, Gorenflo had contacted the American Mathematical Society to give a specific number to the Mittag-Leffler function, namely 33E12, in the MSC classification.

Gorenflo and I promoted the Mittag-Leffler functions in several conferences and workshops all over the world. In particular, I would like to recall my lectures in India (at the invitation of Prof. Mathai, director of the Center of Mathematical Sciences), in Brazil (at the invitation of Prof. Edmundo Capelas de Oliveira, Campinas University) and in the USA (at the invitation of Prof. Karniadakis, Brown University; see Reference [78]).

I would like to express my gratitude to Professors Michele Caputo (b. 1927) and Rudolf Gorenflo (1930–2017) for having provided me with useful advice in earlier and later times, respectively. It is my pleasure to enclose a photo showing the author between them, taken in Bologna in April 2002.

**Figure A1.** F. Mainardi between R. Gorenflo (left) and M. Caputo (right).

Unfortunately, I lost Gorenflo's guidance and collaboration in 2015, when he suffered serious health troubles that led to his death on 20 October 2017 at the age of 87. He had been Emeritus Professor of Mathematics at the Free University of Berlin since his retirement in 1998.

Nowadays I am quite interested in promoting the special functions of the Mittag-Leffler and Wright type through the second edition of the treatise by Gorenflo et al. [14] and my surveys [79,80], including the present review.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Why Do Big Data and Machine Learning Entail the Fractional Dynamics?**

**Haoyu Niu <sup>1,†</sup>, YangQuan Chen <sup>2,\*,†</sup> and Bruce J. West <sup>3</sup>**


**Abstract:** Fractional-order calculus is about the differentiation and integration of non-integer orders. Fractional calculus (FC) is based on fractional-order thinking (FOT) and has been shown to help us to understand complex systems better, improve the processing of complex signals, enhance the control of complex systems, increase the performance of optimization, and even extend the potential for creativity. In this article, the authors discuss fractional dynamics, FOT and rich fractional stochastic models. First, the use of fractional dynamics in big data analytics for quantifying big data variability stemming from the generation of complex systems is justified. Second, we show why fractional dynamics is needed in machine learning and optimal randomness when asking: "is there a more optimal way to optimize?". Third, an optimal randomness case study for a stochastic configuration network (SCN) machine-learning method with heavy-tailed distributions is discussed. Finally, views on big data and (physics-informed) machine learning with fractional dynamics for future research are presented with concluding remarks.

**Keywords:** fractional calculus; fractional dynamics; fractional-order thinking; heavy-tailedness; big data; machine learning; variability; diversity

#### **1. Fractional Calculus (FC) and Fractional-Order Thinking (FOT)**

Fractional calculus (FC) is the quantitative analysis of functions using non-integer-order integration and differentiation, where the order can be a real number, a complex number or even a function of a variable. The first recorded query regarding the meaning of a non-integer-order differentiation appeared in a letter written in 1695 by Guillaume de l'Hôpital to Gottfried Wilhelm Leibniz, who at the same time as Isaac Newton, but independently of him, co-invented the infinitesimal calculus [1]. Numerous contributors have provided definitions for fractional derivatives and integrals [2] since then, and the theory along with the applications of FC have been expanded greatly over the centuries [3–5]. In more recent decades, the concept of **fractional dynamics** has emerged and gained followers in the statistical and chemical physics communities [6–8]. For example, optimal image processing has improved through the use of fractional-order differentiation and fractional-order partial differential equations as summarized in Chen et al. [9–11]. Anomalous diffusion was described using fractional-diffusion equations in [12,13], and Metzler et al. used fractional Langevin equations to model viscoelastic materials [14].

Today, big data and machine learning (ML) are two of the hottest topics of applied scientific research, and they are closely related to one another. To better understand them, we also need fractional dynamics, as well as fractional-order thinking (FOT). Section 2 is devoted to the discussion of the relationships between big data, variability, and fractional dynamics, as well as to fractional-order data analytics (FODA) [15]. The topics touched on in this section include the Hurst parameter [16,17], fractional Gaussian noise (fGn), fractional Brownian motion (fBm), the fractional autoregressive integrated moving average (FARIMA) [18], the formalism of the continuous time random walk (CTRW) [19], unmanned aerial vehicles (UAVs) and precision agriculture (PA) [20]. In Section 3, how to learn efficiently (optimally) with ML algorithms is investigated. The key to developing an efficient learning process is the method of optimization. Thus, it is important to design an efficient or perhaps optimal optimization method. The derivative-free methods and the gradient-based methods, such as the Nesterov accelerated gradient descent (NAGD) [21], are both discussed. Furthermore, the authors propose designing and analyzing the ML algorithms in an S or Z transform domain in Section 3.3. FC is used in optimal randomness in the methods of stochastic gradient descent (SGD) [22] and random search, and in implementing the internal model principle (IMP) [23].

**Citation:** Niu, H.; Chen, Y.; West, B.J. Why Do Big Data and Machine Learning Entail the Fractional Dynamics? *Entropy* **2021**, *23*, 297. https://doi.org/10.3390/e23030297

Academic Editor: Jose A. Tenreiro Machado

Received: 2 February 2021; Accepted: 24 February 2021; Published: 28 February 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

FOT is a way of thinking using FC. For example, there are non-integers between the integers; between logic 0 and logic 1, there is fuzzy logic [24]; alongside integer-order splines, there are fractional-order splines [25]; between the high-order integer moments, there are non-integer-order moments, etc. FOT is entailed by many research areas, for example, self-similarity [26,27], scale-free or scale-invariant behavior, power laws, long-range dependence (LRD) [28,29], and 1/*f*<sup>α</sup> noise [30,31]. The terms porous media, particulate, granular, lossy, anomaly, disorder, soil, tissue, electrodes, biology [32], nano, network, transport, diffusion, and soft matter are also intimately related to FOT. However, in the present section, we mainly discuss **complexity and inverse power laws (IPLs)**.

#### *1.1. Complexity and Inverse Power Laws*

When studying complexity, it is fair to ask, what does it mean to be complex? When do investigators begin identifying a system, network or phenomenon as complex [33,34]? There is an agreement among a significant fraction of the scientific community that when the distribution of the data associated with the process of interest obeys an IPL, the phenomenon is complex; see Figure 1. On the left side of the figure, the complexity "bow tie" [35–38] is the phenomenon of interest, thought to be a complex system. On the right side of the figure is the spectrum of system properties associated with IPL probability density functions (PDFs): the system has one or more of the properties of being scale-free, having a heavy tail, having a long-range dependence, and/or having a long memory [39,40]. In the book by West and Grigolini [41], there is a table listing a sample of the empirical power laws and IPLs uncovered in the past two centuries. For example, in scale-free networks, the degree distributions follow an IPL in connectivity [42,43]; in the processing of signals containing pink noise, the power spectrum follows an IPL [29]. Other examples, such as the probability density function (PDF), the autocorrelation function (ACF) [44], allometry (*Y* = *aX<sup>b</sup>*) [45], anomalous relaxation (evolving over time) [46], anomalous diffusion (mean squared displacement versus time) [13], and self-similarity, can all be described by the IPL "bow tie" depicted in Figure 1.

The power law is usually described as:

$$f(x) = ax^{k}. \tag{1}$$

When *k* is negative, *f*(*x*) is an IPL. One important characteristic of this power law is the scale invariance [47] demonstrated by:

$$f(cx) = a(cx)^{k} = c^{k} f(x) \propto f(x).\tag{2}$$

Note that when *x* is the time, the scaling depicts a property of the system dynamics. However, when the system is stochastic, the scaling is a property of the PDF (or correlation structure) and is a constraint on the collective properties of the system.
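Equation (2) is easy to verify numerically: rescaling the argument of a power law only multiplies the function by a constant. A minimal sketch, with illustrative values of *a*, *k*, and *c*:

```python
# Numerical check of Equation (2): for the power law f(x) = a * x**k of
# Equation (1), f(c*x) equals c**k * f(x) exactly, so rescaling x does not
# change the shape of f (scale invariance). Parameter values are illustrative.
a, k, c = 2.0, -1.5, 10.0

def f(x):
    return a * x ** k

for x in (1.0, 3.0, 7.0):
    assert abs(f(c * x) - c ** k * f(x)) < 1e-12
```

No exponential function passes this test; the property is specific to power laws.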

**Figure 1.** Inverse power law (complexity "bow tie"): On the left are the systems of interest that are thought to be complex. In the center panel, an aspect of the empirical data is characterized by an inverse power law (IPL). The right panel lists the potential properties associated with systems with data that have been processed and yield an IPL property. See text for more details.

FC is entailed by complexity, since an observable phenomenon represented by a fractal function has integer-order derivatives that diverge. Consequently, for the characterization and regulation of complexity, we ought to adopt the fractional dynamics point of view, because the fractional derivative of a fractal function is finite. Thus, complex phenomena, whether natural or carefully engineered, ought to be described by fractional dynamics. Phenomena in complex systems should in many cases be analyzed using FC-based models, where, mathematically, the IPL is actually the "Mittag–Leffler law" (MLL), which asymptotically becomes an IPL (Figure 2) and is known to have heavy-tail behavior.
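The claim that the Mittag–Leffler law asymptotically becomes an IPL can be checked in the one case with a classical closed form, *α* = 1/2, where *E*<sub>1/2</sub>(−*z*) = e<sup>*z*²</sup> erfc(*z*). A sketch under that assumption:

```python
import math

# For alpha = 1/2 the Mittag-Leffler function has the classical closed form
# E_{1/2}(-z) = exp(z**2) * erfc(z), so E_alpha(-t**alpha) at alpha = 1/2 is
# exp(t) * erfc(sqrt(t)). We check the asymptotic inverse power law
# E_alpha(-t**alpha) ~ t**(-alpha) / Gamma(1 - alpha) for large t.
def ml_half(t):
    return math.exp(t) * math.erfc(math.sqrt(t))

def ipl_tail(t, alpha=0.5):
    return t ** (-alpha) / math.gamma(1.0 - alpha)

for t in (50.0, 100.0):
    # the ratio approaches 1: a power-law tail, far heavier than exponential
    assert abs(ml_half(t) / ipl_tail(t) - 1.0) < 0.02
```

By *t* = 100 the exact value and the IPL asymptote agree to better than 1%, while an exponential e<sup>−*t*</sup> would be smaller by dozens of orders of magnitude.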

**Figure 2.** Complex signals (IPL): Here, the signal generated by a complex system is depicted. Exemplars of the systems are given as are the potential properties arising from the systems' complexity.

When an IPL results from processing data, one should think about how the phenomena can be connected to the FC. In [48], Gorenflo et al. explained the role of the FC in generating stable PDFs by generalizing the diffusion equation to one of fractional order. For the Cauchy problem, they considered the space-fractional diffusion equation:

$$\frac{\partial u}{\partial t} = D(\alpha) \frac{\partial^{\alpha} u}{\partial |x|^{\alpha}},\tag{3}$$

where −∞ < *x* < ∞, *t* ≥ 0, with *u*(*x*, 0) = *δ*(*x*), 0 < *α* ≤ 2, and *D*(*α*) is a suitable diffusion coefficient. The fractional derivative in the diffusion variable is of the Riesz–Feller form, defined by its Fourier symbol −|*κ*|<sup>α</sup>. For the signalling problem, they considered the so-called time-fractional diffusion equation [49]:

$$\frac{\partial^{2\beta} u}{\partial t^{2\beta}} = D(\beta) \frac{\partial^{2} u}{\partial x^{2}}\,, \tag{4}$$

where *x* ≥ 0, *t* ≥ 0 with *u*(0, *t*) = *δ*(*t*), 0 < *β* < 1, and *D*(*β*) is a suitable diffusion coefficient. Equation (4) has also been investigated in [50–52]. Here, the Caputo fractional derivative in time is used.

There are rich forms of stochasticity [22], for example, heavy-tailedness, which corresponds to fractional-order master equations [53]. In Section 1.2, heavy-tailed distributions are discussed.

#### *1.2. Heavy-Tailed Distributions*

In probability theory, heavy-tailed distributions are PDFs whose tails do not decay exponentially [54]. Consequently, they have more weight in their tails than does an exponential distribution. In many applications, it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy. Heavy-tailed distributions are widely used for modeling in different disciplines, such as finance [55], insurance [56], and medicine [57]. The distribution of a real-valued random variable *X* is said to have a heavy right tail if the tail probabilities *P*(*X* > *x*) decay more slowly than those of any exponential distribution:

$$\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty,\tag{5}$$

for every *λ* > 0 [58]. For the heavy left tail, an analogous definition can be constructed [59]. Typically, there are three important subclasses of heavy-tailed distributions: fat-tailed, long-tailed and subexponential distributions.
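The divergence in Equation (5) can be seen directly with a concrete tail. A minimal sketch, using an illustrative Pareto-type tail and an illustrative *λ*:

```python
import math

# Illustration of Equation (5): for a Pareto-type tail P(X > x) = x**(-2),
# the ratio P(X > x) / exp(-lam * x) grows without bound for any lam > 0,
# so the distribution is heavy (right-)tailed. lam and the grid of x values
# are illustrative.
def pareto_tail(x, a=2.0):
    return x ** (-a)

lam = 0.1
ratios = [pareto_tail(x) / math.exp(-lam * x) for x in (10.0, 100.0, 500.0)]
assert ratios[0] < ratios[1] < ratios[2]  # the ratio diverges with x
```

The same check with an exponential or Gaussian tail in the numerator would give a ratio that stays bounded (for small enough *λ*) rather than diverging for every *λ*.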

#### 1.2.1. Lévy Distribution

A Lévy distribution, named after the French mathematician Paul Lévy, can be generated by a random walk whose steps have a probability of having a length determined by a heavy-tailed distribution [60]. As a fractional-order stochastic process with heavy-tailed distributions, a Lévy distribution has better computational characteristics [61]. A Lévy distribution is stable and has a PDF that can be expressed analytically, although not always in closed form. The PDF of Lévy flight [62] is:

$$p(x, \mu, \gamma) = \begin{cases} \sqrt{\frac{\gamma}{2\pi}}\, \frac{e^{-\gamma/(2(x-\mu))}}{(x-\mu)^{3/2}}, & x > \mu,\\ 0, & x \le \mu, \end{cases} \tag{6}$$

where *μ* is the location parameter and *γ* is the scale parameter. In practice, Lévy-distributed steps are generated by

$$\mathrm{L\acute{e}vy}(\beta) = \frac{u}{|v|^{1/\beta}},\tag{7}$$

where *u* and *v* are random numbers generated from a normal distribution with a mean of 0 and a standard deviation of 1 [63]. The stability index *β* ranges from 0 to 2. Moreover, it is interesting to point out that the well-known Gaussian and Cauchy distributions are special cases of the Lévy PDF when the stability index is set to 2 and 1, respectively.
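Equation (7) translates into a few lines of code. A sketch, with an illustrative seed and *β* value:

```python
import math
import random

# Sketch of Equation (7): u and v are drawn from N(0, 1) as stated in the
# text [63], and the Lévy-distributed step is u / |v|**(1/beta). The seed
# and the value of beta are illustrative.
def levy_step(beta, rng):
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)

rng = random.Random(42)
steps = [levy_step(1.5, rng) for _ in range(10_000)]
assert all(math.isfinite(s) for s in steps)
# the heavy tail shows up as occasional very large jumps
assert max(abs(s) for s in steps) > 5.0
```

Most steps are of order one, but the draws with |*v*| near zero produce the rare large jumps that characterize a Lévy flight.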

#### 1.2.2. Mittag–Leffler PDF

The Mittag–Leffler PDF [64] for the time interval between events can be written as a mixture of exponentials with a known PDF for the exponential rates:

$$E_{\theta}(-t^{\theta}) = \int_{0}^{\infty} \exp(-\mu t)\, g(\mu)\, d\mu,\tag{8}$$

with a weight for the rates given by:

$$g(\mu) = \frac{1}{\pi} \frac{\sin(\theta \pi)}{\mu^{1+\theta} + 2\cos(\theta \pi)\mu + \mu^{1-\theta}}.\tag{9}$$

The most convenient expression for the random time interval was proposed by [65]:

$$\tau_{\theta} = -\gamma_{t} \ln u \left(\frac{\sin(\theta \pi)}{\tan(\theta \pi v)} - \cos(\theta \pi)\right)^{1/\theta},\tag{10}$$

where *u*, *v* ∈ (0, 1) are independent uniform random numbers, *γ<sub>t</sub>* is the scale parameter, and *τ<sub>θ</sub>* is the Mittag–Leffler random number. In [66], Wei et al. used the Mittag–Leffler distribution to improve the Cuckoo Search algorithm, which did show an improved performance.
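A direct transcription of Equation (10) is sketched below; the seed and the parameter values are illustrative. Note that for *θ* = 1 the bracket reduces to 1, so the generator collapses to the exponential −*γ<sub>t</sub>* ln *u*, which is a convenient sanity check:

```python
import math
import random

# Sketch of Equation (10): u, v are independent Uniform(0, 1) numbers,
# gamma_t is the scale parameter, and the return value is a Mittag-Leffler
# distributed random number. Parameter values below are illustrative.
def ml_random(theta, gamma_t, rng):
    u, v = rng.random(), rng.random()
    return -gamma_t * math.log(u) * (
        math.sin(theta * math.pi) / math.tan(theta * math.pi * v)
        - math.cos(theta * math.pi)
    ) ** (1.0 / theta)

rng = random.Random(0)
samples = [ml_random(0.8, 1.0, rng) for _ in range(1000)]
assert all(s > 0.0 and math.isfinite(s) for s in samples)

# sanity check: for theta = 1 the formula reduces to -gamma_t * ln(u),
# an exponential waiting time with mean gamma_t
expo = [ml_random(1.0, 1.0, rng) for _ in range(2000)]
assert abs(sum(expo) / len(expo) - 1.0) < 0.25
```

For *θ* < 1 the generated waiting times keep the exponential-like bulk but acquire the IPL tail of the Mittag–Leffler PDF.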

#### 1.2.3. Weibull Distribution

A random variable is described by a Weibull distribution function *F*:

$$F(x) = e^{-(x/k)^{\lambda_w}},\tag{11}$$

where *k* > 0 is the scale parameter, and *λ<sup>w</sup>* > 0 is the shape parameter [67]. If the shape parameter is *λ<sup>w</sup>* < 1, the Weibull distribution is determined to be heavy tailed.

#### 1.2.4. Cauchy Distribution

A random variable is described by a Cauchy PDF if its cumulative distribution is [68,69]:

$$F(x) = \frac{1}{\pi} \arctan\!\left(\frac{2(x - \mu_c)}{\sigma}\right) + \frac{1}{2},\tag{12}$$

where *μ<sup>c</sup>* is the location parameter and *σ* is the scale parameter. Cauchy distributions are examples of fat-tailed distributions, which have been empirically encountered in a variety of areas including physics, earth sciences, economics and political science [70]. Fat-tailed distributions include those whose tails decay like an IPL, which is a common point of reference in their use in the scientific literature [71].

#### 1.2.5. Pareto Distribution

A random variable is said to be described by a Pareto PDF if its cumulative distribution function is

$$F(x) = \begin{cases} 1 - \left(\frac{b}{x}\right)^{a}, & x \ge b, \\ 0, & x < b, \end{cases} \tag{13}$$

where *b* > 0 is the scale parameter and *a* > 0 is the shape parameter (Pareto's index of inequality) [72] (Figure 3).

**Figure 3.** Pareto distributions are examples of fat-tailed distributions. The parameter *a* is the shape parameter; the parameter *b* is the scale parameter.
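Because the Pareto CDF of Equation (13) inverts in closed form, it can be sampled by inverse transform and its shape index recovered by the standard maximum-likelihood estimator. A sketch, with illustrative parameters and seed:

```python
import math
import random

# Inverse-transform sampling from the Pareto CDF of Equation (13),
# x = b * (1 - U)**(-1/a) for U ~ Uniform(0, 1), followed by the standard
# maximum-likelihood estimate of the shape (tail) index,
# a_hat = n / sum(ln(x_i / b)). True parameters are illustrative.
def sample_pareto(a, b, n, rng):
    return [b * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]

a_true, b = 2.5, 1.0
rng = random.Random(7)
xs = sample_pareto(a_true, b, 100_000, rng)
assert min(xs) >= b                      # support is x >= b
a_hat = len(xs) / sum(math.log(x / b) for x in xs)
assert abs(a_hat - a_true) < 0.1         # the tail index is recovered
```

This MLE is the parametric counterpart of the IPL tail-index estimation discussed in Section 1.4.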

#### 1.2.6. The *α*-Stable Distribution

A PDF is said to be stable if a linear combination of two independent random variables, each with the same distribution, has the same distribution, up to location and scale parameters. Such a PDF is also called the Lévy *α*-stable distribution [73,74]. Since the normal, Cauchy and Lévy distributions all have this property, one can consider them to be special cases of stable distributions. Stable distributions have 0 < *α* ≤ 2, with the upper bound corresponding to the normal distribution, and *α* = 1 to the Cauchy distribution (Figure 4). The PDFs have undefined variances for *α* < 2 and undefined means for *α* ≤ 1. Although their PDFs do not admit a closed-form formula in general, except in special cases, they decay with an IPL tail, and the IPL index determines the behavior of the PDF. As the IPL index gets smaller, the PDF acquires a heavier tail. An example of an IPL index analysis is given in Section 1.4.

**Figure 4.** Symmetric *α*-stable distributions with unit scale factor. The most narrow PDF shown has the smallest IPL index and, consequently, the most weight in the tail regions.
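Symmetric *α*-stable variates like those in Figure 4 can be generated with the Chambers–Mallows–Stuck construction. This is a standard sampler, not taken from this article; the seed and *α* values are illustrative:

```python
import math
import random

# Chambers-Mallows-Stuck sampler for a symmetric alpha-stable variate with
# unit scale: V ~ Uniform(-pi/2, pi/2) and W ~ Exp(1). For alpha = 1 it
# reduces to tan(V), i.e., a standard Cauchy variate.
def stable(alpha, rng):
    V = (rng.random() - 0.5) * math.pi
    W = -math.log(1.0 - rng.random())
    if alpha == 1.0:
        return math.tan(V)  # the Cauchy special case
    return (math.sin(alpha * V) / math.cos(V) ** (1.0 / alpha)) * (
        math.cos(V - alpha * V) / W
    ) ** ((1.0 - alpha) / alpha)

rng = random.Random(3)
heavy = [stable(1.1, rng) for _ in range(10_000)]   # close to Cauchy
light = [stable(1.9, rng) for _ in range(10_000)]   # close to Gaussian
tail_count = lambda xs: sum(1 for x in xs if abs(x) > 10.0)
# the smaller the stability index, the heavier the IPL tail
assert tail_count(heavy) > tail_count(light)
```

The tail-count comparison mirrors the statement in the text: as the IPL index *α* gets smaller, more probability mass sits in the tail regions.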

#### *1.3. Mixture Distributions*

A mixture distribution is derived from a collection of other random variables: first, a random variable is selected by chance from the collection according to given probabilities of selection; then, the value of the selected random variable is realized. Although mixture PDFs are complicated in terms of the simpler PDFs from which they are built, they provide a good model for certain datasets, in which different subsets of the data exhibit different characteristics. Therefore, mixture PDFs can effectively characterize the complex PDFs of certain real-world datasets. In [75], a robust stochastic configuration network (SCN) based on a mixture of Gaussian and Laplace PDFs was proposed. Thus, Gaussian and Laplace distributions are mentioned in this section for comparison purposes.

#### 1.3.1. Gaussian Distribution

A random variable *X* has a Gaussian distribution with the mean *μ<sup>G</sup>* and variance *σ*2 *<sup>G</sup>* (−∞ < *μ<sup>G</sup>* < ∞ and *σ<sup>G</sup>* > 0) if *X* has a continuous distribution for which the PDF is as follows [76]:

$$f(x|\mu_G, \sigma_G^2) = \frac{1}{(2\pi)^{1/2}\sigma_G}\, e^{-\frac{1}{2}\left(\frac{x-\mu_G}{\sigma_G}\right)^2}, \quad -\infty < x < \infty. \tag{14}$$

#### 1.3.2. Laplace Distribution

The PDF of the Laplace distribution can be written as follows [75]:

$$f(x|\mu_l, \eta) = \frac{1}{(2\eta^2)^{1/2}}\, e^{-\frac{\sqrt{2}\,|x - \mu_l|}{\eta}},\tag{15}$$

where *μ<sup>l</sup>* and *η* represent the location and scale parameters, respectively.
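A two-component Gaussian–Laplace mixture in the spirit of the robust SCN of [75] can be sketched directly from Equations (14) and (15). The mixing weight 0.7 and the unit parameters below are illustrative assumptions, not values from [75]:

```python
import math

# Sketch of a two-component Gaussian-Laplace mixture using the PDFs of
# Equations (14) and (15). The mixing weight and parameters are illustrative.
def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (
        math.sqrt(2.0 * math.pi) * sigma
    )

def laplace_pdf(x, mu, eta):
    return math.exp(-math.sqrt(2.0) * abs(x - mu) / eta) / math.sqrt(2.0 * eta**2)

def mixture_pdf(x, w=0.7):
    return w * gauss_pdf(x, 0.0, 1.0) + (1.0 - w) * laplace_pdf(x, 0.0, 1.0)

# Riemann-sum check that the convex combination still integrates to one
h = 0.01
total = sum(mixture_pdf(i * h) for i in range(-2000, 2001)) * h
assert abs(total - 1.0) < 1e-3
```

Since each component integrates to one, any convex combination does too; the numerical check simply confirms the two PDFs were transcribed with the right normalizations.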

#### *1.4. IPL Tail-Index Analysis*

There are two approaches to the problem of the IPL tail-index estimation: the parametric [77] and the nonparametric [78]. To estimate the tail index using the parametric approach, some researchers employ a generalized extreme value (GEV) distribution [79] or Pareto distribution, and they may apply the maximum-likelihood estimator (MLE).

The stochastic gradient descent (SGD) has been widely used in deep learning with great success because of its computational efficiency [80,81]. The gradient noise (GN) in the SGD algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. Machine-learning tasks are usually considered as solving the following optimization problem:

$$w^\* = \operatorname\*{argmin}{\{f(w) \triangleq \frac{1}{n} \sum\_{i=1}^n f^{(i)}(w)\}},\tag{16}$$

where *w* denotes the weights of the neural network, *f* denotes the loss function, and *n* denotes the total number of instances. Then, the SGD is calculated based on the following iterative scheme:

$$w_{k+1} = w_k - \eta \nabla f_k(w_k),\tag{17}$$

where *k* means the iteration number, and ∇ *fk*(*wk*) denotes the stochastic gradient at iteration *k*.
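Equations (16) and (17) can be sketched on a toy problem whose minimizer is known in closed form. The dataset, step size *η*, and iteration count below are illustrative:

```python
import random

# Minimal sketch of Equations (16) and (17): minimize the empirical loss
# f(w) = (1/n) * sum_i (w - x_i)**2 by SGD, where each stochastic gradient
# grad f_k uses one randomly chosen instance. The exact minimizer of this
# loss is the sample mean of the data.
rng = random.Random(1)
data = [rng.gauss(3.0, 1.0) for _ in range(1000)]

w, eta = 0.0, 0.01
for k in range(3000):
    x = rng.choice(data)
    grad = 2.0 * (w - x)  # stochastic gradient of (w - x)**2
    w -= eta * grad

mean = sum(data) / len(data)
assert abs(w - mean) < 0.5  # SGD iterates fluctuate around the minimizer
```

With a constant step size the iterates do not converge to a point; they hover around the minimizer with a spread set by *η* and the gradient-noise variance, which is exactly the noise whose distribution is at issue in what follows.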

Since the gradient noise might not be Gaussian, the use of Brownian motion would not be appropriate to represent its behavior. Therefore, Şimşekli et al. replaced the gradient noise with the *α*-stable Lévy motion [82], whose increments have an *α*-stable distribution [83]. Because of the heavy-tailed nature of the *α*-stable distribution, the Lévy motion might incur large, discontinuous jumps [84], and therefore, it would exhibit a fundamentally different behavior than would Brownian motion (Figure 5).

Figure 6 shows that there are two distinct phases of SGD (in this configuration, before and after iteration 1000). At first, the loss decreases very slowly, the accuracy slightly increases, and more interestingly, *α* rapidly decreases. When *α* reaches its lowest level, which means a longer tail distribution, there is a significant jump, which causes a sudden decrease in accuracy. Beyond this point, the process recovers again, and we see stationary behavior in *α* and an increasing behavior in the accuracy.

**Figure 5.** (**a**) Brownian motion; (**b**) Lévy motion. Note that both figures are at the same size scale.

**Figure 6.** (**a**) The behavior of tail-index *α* during the iterations; (**b**) The training and testing accuracy. At first, the *α* decreases very slowly; when *α* reaches its lowest level, which means longer tail distribution, there is a significant jump, which causes a sudden decrease in accuracy. Beyond this point, the process recovers again, and we see stationary behavior in *α* and an increasing behavior in the accuracy.

#### **2. Big Data, Variability and FC**

The term "big data" started showing up in the early 1990s. The world's technological per capita capacity to store information has roughly doubled every 40 months since the 1980s [85]. Since 2012, there have been 2.5 exabytes (2.5 × 2<sup>60</sup> bytes) of data generated every day [86]. According to data report predictions, there will be 163 zettabytes of data by 2025 [87]. Firican proposed, in [88], ten characteristics (properties) of big data to prepare for both the challenges and advantages of big data initiatives (Table 1). In this article, **variability** is the most important characteristic being discussed. Variability refers to several properties of big data. First, it refers to the number of inconsistencies in the data, which need to be understood by using anomaly- and outlier-detection methods for any meaningful analytics to be performed. Second, variability can also refer to diversity [89,90], resulting from disparate data types and sources, for example, healthy or unhealthy [91,92]. Finally, variability can refer to multiple research topics (Table 2).

Considering variability, Xunzi (312 BC–230 BC), who was a Confucian philosopher, made a useful observation: "Throughout a thousand acts and ten thousand changes, his way remains one and the same" [93]. Therefore, we ask: what is the "one and the same" for big data? This is the **variability**, which refers to the behavior of the dynamic system. The ancient Greek philosopher Heraclitus (535 BC–475 BC) also realized the importance of variability, prompting him to say: "The only thing that is constant is change"; "It is in changing that we find purpose"; "Nothing endures but change"; "No man ever steps in the same river twice, for it is not the same river and he is not the same man".

Heraclitus actually recognized the (fractional-order) dynamics of the river without modern scientific knowledge of nature. Two thousand years later, the integer-order calculus was invented by Sir Isaac Newton and Gottfried Wilhelm Leibniz, whose main purpose was to quantify that change [94,95]. From then on, scientists used integer-order calculus to describe dynamic systems, differential equations, modelling, etc. In the 1950s, Scott Blair, who first introduced the FC into rheology, pointed out that the integer-order dynamic view of change is only for our own "convenience" (a little bit selfish). In other words, denying fractional calculus is equivalent to denying the existence of non-integers between the integers!


**Table 1.** The 10 Vs of big data.

**Table 2.** Variability in multiple research topics.


Blair said [96]: "We may express our concepts in Newtonian terms if we find this convenient but, if we do so, we must realize that we have made a translation into a language which is foreign to the system which we are studying (1950)".

Therefore, variability exists in big data. However, how do we realize the modeling, analysis and design (MAD) for the variability in big data within complex systems? We need fractional calculus! In other words, big data are at the nexus of complexity and FC. Thus, we first proposed fractional-order data analytics (FODA) in 2015. Metrics based on fractional-order signal processing techniques should be used for quantifying the generating dynamics of observed or perceived variability [15].

#### *2.1. Hurst Parameter, fGn, and fBm*

The Hurst parameter or Hurst exponent (*H*) was proposed for the analysis of the long-term memory of time series. It was originally developed more than half a century ago to quantify the long-term storage capacity of reservoirs under the Nile river's volatile rain and drought conditions [16,17]. To date, the Hurst parameter has also been used to measure the intensity of long-range dependence (LRD) in time series [97], which requires accurate modeling and forecasting. Self-similarity and the estimation of the statistical parameters of LRD processes have been widely investigated in recent years [98]. The Hurst parameter has also been used for characterizing LRD processes [97,99]. An LRD time series is defined as a stationary process that has long-range correlations if its covariance function *C*(*n*) decays slowly as:

$$\lim\_{n \to \infty} \frac{C(n)}{n^{-\alpha}} = c,\tag{18}$$

where 0 < *α* < 1, which relates to the Hurst parameter according to *α* = 2 − 2*H* [100,101]. The parameter *c* is a finite, positive constant. When the value of *n* is large, *C*(*n*) behaves as the IPL *c*/*n<sup>α</sup>* [102]. Another definition for an LRD process is that the weakly stationary time-series *X*(*t*) is said to be LRD if its power spectral density (PSD) follows:

$$f(\lambda) \sim \mathcal{C}\_f |\lambda|^{-\beta},\tag{19}$$

as *λ* → 0, for a given *Cf* > 0 and a given real parameter *β* ∈ (0, 1), which corresponds to *H* = (1 + *β*)/2 [103]. When 0 < *H* < 0.5, the time intervals constitute a negatively correlated process; when 0.5 < *H* < 1, they constitute a positively correlated process; and when *H* = 0.5, the process is uncorrelated.

Two of the most common LRD processes are fBm [104] and fGn [105]. The fBm process with *H* (0 < *H* < 1) is defined as:

$$B\_H(t) = \frac{1}{\Gamma(H + 1/2)} \left\{ \int\_{-\infty}^0 \left[ (t - s)^{H - 1/2} - (-s)^{H - 1/2} \right] dW(s) + \int\_0^t (t - s)^{H - 1/2} \, dW(s) \right\},\tag{20}$$

where *W* denotes a Wiener process defined on (−∞, ∞) [106]. The fGn process is the increment sequences of the fBm process, defined as:

$$X\_k = Y(k+1) - Y(k),\tag{21}$$

where *Y*(*k*) is a fBm process [107].
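The IPL tail behavior and the relation *α* = 2 − 2*H* can be illustrated numerically. The sketch below assumes the standard fGn autocovariance formula (not stated in the text) and checks that *C*(*k*)/*k*<sup>−*α*</sup> flattens to a constant at large lags:

```python
import numpy as np

def fgn_autocovariance(k, H, sigma2=1.0):
    """Standard fGn autocovariance at lag k (assumed formula, not stated in the text)."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * sigma2 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

H = 0.8                       # persistent LRD case, 0.5 < H < 1
alpha = 2 - 2 * H             # the IPL exponent from the text, alpha = 2 - 2H
lags = np.array([100, 200, 400, 800])
c_hat = fgn_autocovariance(lags, H) / lags ** (-alpha)
print(c_hat)                  # flattens to the constant H(2H - 1) = 0.48
```

For *H* = 0.8, the ratio settles near *H*(2*H* − 1) = 0.48, consistent with the IPL decay *c*/*n*<sup>*α*</sup> described above.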

#### *2.2. Fractional Lower-Order Moments (FLOMs)*

The FLOM is based on *α*-stable PDFs. The PDFs of an *α*-stable distribution decay in the tails more slowly than a Gaussian PDF does. Therefore, for signals with sharp spikes or occasional bursts, an *α*-stable PDF can often characterize the signals better than a Gaussian PDF can [108]. Thus, the FLOM plays an important role in impulsive processes [109], equivalent to the role played by the mean and variance in Gaussian processes. When 0 < *α* ≤ 1, the *α*-stable processes have no finite first- or higher-order moments; when 1 < *α* < 2, the *α*-stable processes have a finite first-order moment, and all their moments of fractional order less than *α* are finite. The correlation between the FC and the FLOM was investigated in [110,111]. For the Fourier-transform pair *p*(*x*) and *φ*(*μ*), where the latter is the characteristic function, i.e., the Fourier transform of the PDF, a complex FLOM can have complex fractional lower orders [110,111]. A FLOM-based fractional power spectrum includes a covariation spectrum and a fractional lower-order covariance spectrum [112]. FLOM-based fractional power spectrum techniques have been successfully used in time-delay estimation [112].

#### *2.3. Fractional Autoregressive Integrated Moving Average (FARIMA) and Gegenbauer Autoregressive Moving Average (GARMA)*

A discrete-time linear time-invariant (LTI) system can be characterized using a linear difference equation, known as the autoregressive moving average (ARMA) model [113,114]. The process *Xt* of ARMA(*p*, *q*) is defined as:

$$
\Phi(B)X\_t = \Theta(B)\varepsilon\_t, \tag{22}
$$

where *εt* is white Gaussian noise (wGn) and *B* is the backshift operator. However, the ARMA model can only describe the short-range dependence (SRD) property. Therefore, based on the Hurst parameter analysis, more suitable models, such as FARIMA [115,116] and the fractional integral generalized autoregressive conditional heteroscedasticity (FIGARCH) model [117], were designed to more accurately analyze LRD processes. The most important feature of these models is the long-memory characteristic. The FARIMA and FIGARCH models can capture both the short- and the long-memory nature of time series. For example, the FARIMA process *Xt* is usually defined as [118]:

$$
\Phi(B)(1-B)^d X\_t = \Theta(B)\varepsilon\_t, \tag{23}
$$

where *d* ∈ (−0.5, 0.5), and (1 − *B*)*<sup>d</sup>* is a fractional-order difference operator. The locally stationary long-memory FARIMA model has the same equation as Equation (23), except that *d* becomes *dt*, a time-varying parameter [119]. The locally stationary long-memory FARIMA model captures the local self-similarity of the system.
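As a concrete sketch, the fractional difference operator (1 − *B*)*<sup>d</sup>* in Equation (23) can be applied through its binomial expansion; the weight recursion is the standard one, while the truncation length here is an arbitrary choice:

```python
import numpy as np

def frac_diff_weights(d, n_weights):
    """Binomial weights of (1 - B)^d: pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    w = np.empty(n_weights)
    w[0] = 1.0
    for j in range(1, n_weights):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def frac_diff(x, d):
    """Apply the truncated fractional difference (1 - B)^d to a series x."""
    w = frac_diff_weights(d, len(x))
    return np.array([np.dot(w[: t + 1], x[t::-1]) for t in range(len(x))])

w = frac_diff_weights(0.4, 6)      # 1, -d, -d(1 - d)/2, ...
w1 = frac_diff_weights(1.0, 6)     # d = 1 recovers the first difference (1 - B)
print(w)
```

Setting *d* = 1 recovers the ordinary first difference, a quick sanity check on the recursion.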

The generalized locally stationary long-memory process FARIMA model was investigated in [119]. For example, a generalized FARIMA model, which is called the Gegenbauer autoregressive moving average (GARMA), was introduced in [120]. The GARMA model is defined as:

$$
\Phi(B)(1 - 2uB + B^2)^d X\_t = \Theta(B)\varepsilon\_t, \tag{24}
$$

where *u* ∈ [−1, 1] is a parameter that controls the frequency at which the long memory occurs. The parameter *d* controls the rate of decay of the autocovariance function. The GARMA model can also be extended to the so-called "*k*-factor GARMA model", which allows long-memory behaviors to be associated with each of *k* frequencies (Gegenbauer frequencies) in the interval [0, 0.5] [121].

#### *2.4. Continuous Time Random Walk (CTRW)*

The CTRW model was proposed by Montroll and Weiss as a generalization of diffusion processes to describe the phenomenon of anomalous diffusion [19]. The basic idea is to calculate the PDF for the diffusion process by replacing the discrete steps with continuous time, along with a PDF for step lengths and a waiting-time PDF for the time intervals between steps. Montroll and Weiss applied random intervals between the successive steps in the walking process to account for local structure in the environment, such as traps [122]. The CTRW has been used for modeling multiple complex phenomena, such as chaotic dynamic networks [123]. The correlation between CTRW and diffusion equations with fractional time derivatives has also been established [124]. Meanwhile, time-space fractional diffusion equations can be treated as CTRWs with continuously distributed jumps or continuum approximations of CTRWs on lattices [125].
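The subdiffusive behavior can be sketched with a direct simulation (all parameter values here are illustrative, not from [19]): Gaussian jump lengths combined with Pareto-type waiting times of tail index 0.7 should give an MSD growing roughly as *t*<sup>0.7</sup>:

```python
import numpy as np

rng = np.random.default_rng(0)

# CTRW sketch: finite-variance Gaussian jump lengths and heavy-tailed waiting
# times with tail index alpha_w < 1, a regime known to give subdiffusion,
# MSD(t) ~ t^alpha_w.
alpha_w, n_walkers, n_jumps = 0.7, 2000, 800

waits = rng.pareto(alpha_w, size=(n_walkers, n_jumps)) + 1.0   # waiting times >= 1
jumps = rng.standard_normal((n_walkers, n_jumps))
arrival = np.cumsum(waits, axis=1)        # clock time of each jump
pos = np.cumsum(jumps, axis=1)            # position right after each jump

times = np.array([50.0, 200.0, 1000.0])
msd = np.empty_like(times)
for i, t in enumerate(times):
    n_done = np.sum(arrival <= t, axis=1)           # jumps completed by time t
    x_t = np.where(n_done > 0, pos[np.arange(n_walkers), np.maximum(n_done - 1, 0)], 0.0)
    msd[i] = np.mean(x_t ** 2)

slope = np.polyfit(np.log(times), np.log(msd), 1)[0]
print(f"MSD exponent estimate: {slope:.2f} (subdiffusive, < 1)")
```

The fitted exponent comes out well below 1, the signature of subdiffusion that the fractional-time diffusion equations mentioned above reproduce analytically.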

#### *2.5. Unmanned Aerial Vehicles (UAVs) and Precision Agriculture*

As a new remote-sensing platform, small UAVs have drawn growing research interest for precision agriculture [126–136], especially for heterogeneous crops, such as vineyards and orchards [137,138]. Mounted on UAVs, lightweight sensors, such as RGB cameras, multispectral cameras and thermal infrared cameras, can be used to collect high-resolution images. The higher temporal and spatial resolutions of the images, relatively low operational costs, and nearly real-time image acquisition make UAVs an ideal platform for mapping and monitoring the variability of crops and trees. UAVs create big data and demand the FODA because of the "complexity", and thus the variability, inherent in the life process. For example, Figure 7 shows the normalized difference vegetation index (NDVI) mapping of a pomegranate orchard at a USDA ARS experimental field. Under different irrigation levels, the individual trees can show strong variability during the analysis of water stress. Life is complex! Thus, it entails variability, which, as discussed above, in turn entails fractional calculus. UAVs can then become "Tractor 2.0" for farmers in precision agriculture.

**Figure 7.** Normalized difference vegetation index (NDVI) mapping of pomegranate trees.

#### **3. Optimal Machine Learning and Optimal Randomness**

**Machine learning (ML)** is the science (and art) of programming computers so they can learn from data [139]. A more engineering-oriented definition was given by Tom Mitchell in 1997 [140], "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E".

Most ML algorithms perform training by solving optimization problems that rely on first-order derivatives (Jacobians), which decide whether to increase or decrease weights. For huge speed boosts, faster optimizers are used instead of the regular gradient descent optimizer. For example, the most popular are momentum optimization [141], the Nesterov accelerated gradient [21], AdaGrad [142], RMSProp [143] and Adam optimization [144]. The second-order (Hessian) optimization methods usually find solutions with faster rates of convergence but with higher computational costs. Therefore, the answers to the following questions are important: what is a more optimal ML algorithm? What if the derivative is of fractional order instead of integer order? In this section, we discuss some applications of fractional-order gradients to optimization methods in machine-learning algorithms and investigate the accuracy and convergence rates.

As mentioned in the big data section, there is a huge amount of data in human society and nature. During the learning process of ML, we care not only about the speed but also about the accuracy of what the machine learns (Figure 8). The learning algorithm is important; otherwise, the data labeling and other labor costs will overwhelm people. When applying the much-acclaimed artificial intelligence (AI) to an algorithm, the emphasis is often strongly on the *artificial* and only weakly on the *intelligence*. Therefore, the key to ML is which optimization methods are being applied. The convergence rate and global searching are two important aspects of an optimization method.

**Figure 8.** Data analysis in nature.

**Reflection:** ML is, today, a hot research topic and will probably remain so into the near future. How a machine can learn efficiently (optimally) is always important. The key to the learning process is the optimization method. Thus, in designing an efficient optimization method, it is necessary to answer the following three questions:


**Optimal randomness:** As discussed in the section on the Lévy PDF, Lévy flight is the food-search strategy that the albatross has developed over millions of years of evolution. Admittedly, this is a slow optimization procedure [84]. From this perspective, we should call the "Lévy distribution" an optimized, or learned, randomness used by albatrosses when searching for food. Therefore, we pose the question: "can the search strategy be more optimal than Lévy flight?" The answer is yes if one adopts the FC [145]! Optimization is a very complex area of study. However, only a few studies have investigated using FC to obtain a better optimization strategy.

Theoretically, there are two broad optimization categories: derivative-free and gradient-based. Among the derivative-free methods, there are the direct-search methods, including particle swarm optimization (PSO) [146,147], etc. Among the gradient-based methods, there are gradient descent and its variants. Both categories have shown better performance when using the FC, as demonstrated below.

#### *3.1. Derivative-Free Methods*

For derivative-free methods, there are single-agent search and swarm-based search methods (Figure 9). Exploration is often achieved by randomness or random numbers drawn from some predefined PDFs. Exploitation uses local information, such as gradients, to search local regions more intensively, and such intensification can enhance the rate of convergence. Thus, a question was posed: what is the optimal randomness? Wei et al. [148] investigated the optimal randomness in a swarm-based search. Four heavy-tailed PDFs were used for sample path analysis (Figure 10). Based on the experimental results, the randomness-enhanced cuckoo search (CS) algorithms [66,149,150] can identify the unknown parameters of a fractional-order system with better effectiveness and robustness. The randomness-enhanced CS algorithms can be considered a promising tool for solving real-world complex optimization problems. The reason is that optimal randomness, in the form of fractional-order noise, is applied during the exploration, which makes them more optimal than CS, the "optimized PSO". The fractional-order noise refers to the stable PDFs [48]. In other words, when we are discussing optimal randomness, we are discussing fractional calculus!

**Figure 9.** The 2-D Alpine function for derivative-free methods; there are (**a**) single agent search and (**b**) swarm-based search methods.

**Figure 10.** Sample paths. Wei et al. [148] investigated the optimal randomness in a swarm-based search. Four heavy-tailed PDFs were used for sample path analysis: (**a**) Mittag-Leffler distribution, (**b**) Weibull distribution, (**c**) Pareto distribution and (**d**) Cauchy distribution. Long steps, referring to the jump length, frequently occurred for all distributions, showing strong heavy-tailed behavior. For more details, please refer to [148].

#### *3.2. The Gradient-Based Methods*

The gradient descent (GD) is a very common optimization algorithm, which finds optimal solutions by iteratively tweaking parameters to minimize the cost function. The stochastic gradient descent (SGD) randomly selects training samples at each step. Therefore, the cost function bounces up and down, decreasing on average, which helps the iterates escape from local optima. Sometimes, noise is added to the GD method, and in the literature, such noise usually follows a Gaussian PDF. We ask, "why not heavy-tailed PDFs"? The answer to this question could lead to interesting future research.

#### Nesterov Accelerated Gradient Descent (NAGD)

There are many variants of GD analysis as suggested in Figure 11. One of the most popular methods is the NAGD [21]:

$$\begin{cases} y\_{k+1} = ay\_k - \mu \nabla f(\mathbf{x}\_k), \\ \mathbf{x}\_{k+1} = \mathbf{x}\_k + y\_{k+1} + by\_k, \end{cases} \tag{25}$$

where by setting *b* = −*a*/(1 + *a*), one can derive the NAGD. When *b* = 0, one can derive the momentum GD. The NAGD can also be formulated as:

$$\begin{cases} \mathbf{x}\_k = y\_{k-1} - \mu \nabla f(y\_{k-1}), \\ y\_k = \mathbf{x}\_k + \frac{k-1}{k+2} (\mathbf{x}\_k - \mathbf{x}\_{k-1}). \end{cases} \tag{26}$$

Setting *t* = *k*√*μ*, one can, in the continuous limit, derive the corresponding differential equation:

$$
\ddot{X} + \frac{3}{t}\dot{X} + \nabla f(X) = 0.\tag{27}
$$

The main idea of Jordan's work [151] is to analyze the iteration algorithm in the continuous-time domain. For differential equations, one can use Lyapunov or variational methods to analyze the properties; for example, the convergence rate is *O*(1/*t*<sup>2</sup>). One can also use the variational method to derive the master differential equation for an optimization method, such as the least action principle [152], Hamilton's variational principle [153] and the quantum-mechanical path-integral approach [154]. Wilson et al. [151] built a Euler–Lagrange function to derive the following equation:

$$
\ddot{X}\_t + 2\gamma \dot{X}\_t + \frac{\gamma^2}{\mu} \nabla f(X\_t) = 0. \tag{28}
$$

This equation has the same form as the master differential equation of the NAGD.
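As a quick numerical companion to the *O*(1/*t*<sup>2</sup>) discussion, the sketch below runs the NAGD recursion (26) on a toy ill-conditioned quadratic (the problem data are our own illustrative choices, not from the cited work) and checks the classical discrete *O*(1/*k*<sup>2</sup>) guarantee:

```python
import numpy as np

# NAGD recursion (26) on f(x) = 0.5 * x^T A x (minimizer x* = 0).
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x

mu = 1.0 / 100.0                         # step size 1/L, L = largest eigenvalue
x = y = x_gd = np.array([1.0, 1.0])
for k in range(1, 201):
    x_gd = x_gd - mu * (A @ x_gd)        # plain GD (29), for reference
    x_new = y - mu * (A @ y)             # NAGD gradient step, Equation (26)
    y = x_new + (k - 1) / (k + 2) * (x_new - x)
    x = x_new

# Classical guarantee: f(x_k) <= 2 * L * ||x0 - x*||^2 / (k + 1)^2 with mu = 1/L.
bound = 2 * 2.0 / (mu * 201 ** 2)
print(f(x_gd), f(x), bound)
```

The accelerated iterate satisfies the *O*(1/*k*<sup>2</sup>) bound, mirroring the *O*(1/*t*<sup>2</sup>) rate of the continuous-time analysis.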

**Figure 11.** Gradient descent and its variants.

Jordan's work revealed that one can transform an iterative (optimization) algorithm to its continuous-time limit case, which can simplify the analysis (Lyapunov methods). One can also directly design a differential equation of motion (EOM) and then discretize it to derive an iterative algorithm (variational method). The key is to find a suitable Lyapunov functional to analyze the stability and convergence rate. The new exciting fact established by Jordan is that optimization algorithms can be systematically synthesized using Lagrangian mechanics (Euler–Lagrange) through EOMs.

Thus, is there an optimal way to optimize using optimization algorithms stemming from Equation (28)? Obviously, why not an equation such as Equation (28) of fractional order? Considering *Ẋ<sub>t</sub>* as the fractional derivative *X*<sup>(*α*)</sup><sub>*t*</sub> provides us with more research possibilities, such as the fractional-order calculus of variations (FOCV) and the fractional-order Euler–Lagrange (FOEL) equation. For the SGD, optimal randomness using fractional-order noise can also offer improved performance, as similarly shown by Wei et al. [148].

#### *3.3. What Can the Control Community Offer to ML?*

In the IFAC 2020 World Congress Pre-conference Workshop, Eric Kerrigan proposed "The Three Musketeers" that the control community can contribute to ML [155]. These three are the IMP [23], the Nu-Gap metric [156] and model discrimination [157]. Herein, we focused on the IMP. Kashima et al. [158] transferred the convergence problem of numerical algorithms into a stability problem of a discrete-time system. An et al. [159] explained that the commonly used SGD-momentum algorithm in ML is a PI controller and designed a PID algorithm. Motivated by [159] but differently from M. Jordan's work, we proposed designing and analyzing the algorithms in the *S* or *Z* domain. Remember that GD is a first-order algorithm:

$$\mathbf{x}\_{k+1} = \mathbf{x}\_k - \mu \nabla f(\mathbf{x}\_k), \tag{29}$$

where *μ* > 0 is the step size (or learning rate). Using the Z transform, one obtains:

$$X(z) = \frac{\mu}{z - 1} [-\nabla f(\mathbf{x}\_k)]\_z. \tag{30}$$

Approximate the gradient around the extreme point *x*∗, and one can obtain:

$$
\nabla f(\mathbf{x}\_k) \approx A(\mathbf{x}\_k - \mathbf{x}^\*), \text{ with } A = \nabla^2 f(\mathbf{x}^\*). \tag{31}
$$
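Under the linearization (31), the loop of Figure 12 makes the stability of plain GD transparent: the single closed-loop pole is 1 − *μA*, so the iteration converges exactly when 0 < *μ* < 2/*A*. A 1-D sketch with illustrative numbers:

```python
# Plain GD on a 1-D quadratic under the linearization (31):
# x_{k+1} = x_k - mu * A * (x_k - x_star), closed-loop pole at z = 1 - mu * A.
A, x_star = 4.0, 3.0

def run_gd(mu, n=100, x0=0.0):
    x = x0
    for _ in range(n):
        x = x - mu * A * (x - x_star)
    return x

pole = lambda mu: 1 - mu * A
print(pole(0.1), run_gd(0.1))    # pole 0.6, inside the unit disc -> converges to 3
print(pole(0.6), run_gd(0.6))    # pole -1.4, outside the unit disc -> diverges
```

The stability boundary *μ* = 2/*A* is exactly where the pole leaves the unit disc, which is the loop-based reading of the usual step-size condition.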

**Figure 12.** The integrator model (embedded in *G*(*z*)). The integrator in the forward loop eliminates the tracking steady-state error for a constant reference signal (internal model principle (IMP)).

For the plain GD in Figure 12, we have *G*(*z*) = 1/(*z* − 1), which is an integrator. For fractional-order GD (FOGD), the updating term of *xk* in Equation (29) can be treated as a filtered gradient signal. In [160], Fan et al. shared similar thoughts: "Accelerating the convergence of the moment method for the Boltzmann equation using filters". The integrator in the forward loop eliminates the tracking error for a constant reference signal according to the internal model principle (IMP). Similarly, the GD momentum (GDM), which was designed to accelerate the conventional GD and is popularly used in ML, can be analyzed using Figure 12 by:

$$\begin{cases} y\_{k+1} = \alpha y\_k - \mu \nabla f(\mathbf{x}\_k), \\ \mathbf{x}\_{k+1} = \mathbf{x}\_k + y\_{k+1}, \end{cases} \tag{32}$$

where *yk* is the accumulation of the history gradient and *α* ∈ (0, 1) is the rate of the moving average decay. Using the Z transform for the update rule, one can derive:

$$\begin{cases} zY(z) = \alpha Y(z) - \mu[\nabla f(\mathbf{x}\_k)]\_z, \\ zX(z) = X(z) + zY(z). \end{cases} \tag{33}$$

Then, after some algebra, one obtains the following equation:

$$X(z) = \frac{\mu z}{(z-1)(z-\alpha)} [-\nabla f(\mathbf{x}\_k)]\_z. \tag{34}$$

For the GD momentum, we have *G*(*z*) = *z*/((*z* − 1)(*z* − *α*)) in Figure 12, with an integrator in the forward loop. The GD momentum is a second-order (*G*(*z*)) algorithm with an additional pole at *z* = *α* and one zero at *z* = 0. Here, "second-order" refers to the order of *G*(*z*), which makes it different from algorithms using the *Hessian* matrix information. Moreover, the NAGD can be simplified as:

$$\begin{cases} y\_{k+1} = \mathbf{x}\_k - \mu \nabla f(\mathbf{x}\_k), \\ \mathbf{x}\_{k+1} = (1 - \lambda) y\_{k+1} + \lambda y\_k, \end{cases} \tag{35}$$

where *μ* is the step size and *λ* is a weighting coefficient. Using the Z transform for the update rule, one can derive:

$$\begin{cases} zY(z) = X(z) - \mu[\nabla f(\mathbf{x}\_k)]\_z, \\ zX(z) = (1-\lambda)zY(z) + \lambda Y(z). \end{cases} \tag{36}$$

Different from the GD momentum, and after some algebra, one obtains:

$$X(z) = \frac{-(1 - \lambda)z - \lambda}{(z - 1)(z + \lambda)} \mu [\nabla f(\mathbf{x}\_k)]\_z = \frac{z + \frac{\lambda}{1 - \lambda}}{(z - 1)(z + \lambda)} \mu (1 - \lambda) [-\nabla f(\mathbf{x}\_k)]\_z. \tag{37}$$

For the NAGD, we have *G*(*z*) = (*z* + *λ*/(1 − *λ*))/((*z* − 1)(*z* + *λ*)), again with an integrator in the forward loop (Figure 12). The NAGD is a second-order algorithm with an additional pole at *z* = −*λ* and a zero at *z* = −*λ*/(1 − *λ*).
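These pole locations are easy to verify numerically. For the GD momentum (32), the closed-loop characteristic polynomial under the linearization (31) is (*z* − 1)(*z* − *α*) + *ρz* with *ρ* = *μA*; the sketch below (illustrative values, not from the text) computes the poles and cross-checks them against a direct simulation:

```python
import numpy as np

# Closed-loop poles of the GD momentum (32) under the linearization (31):
# characteristic polynomial (z - 1)(z - alpha) + rho * z, with rho = mu * A.
alpha, mu, A, x_star = 0.9, 0.05, 4.0, 2.0
rho = mu * A
poles = np.roots([1.0, -(1 + alpha - rho), alpha])   # z^2 - (1+alpha-rho) z + alpha
print("poles:", poles, "| max modulus:", np.max(np.abs(poles)))

# Cross-check: simulate (32) on f(x) = 0.5 * A * (x - x_star)^2.
y, x = 0.0, 0.0
for _ in range(2000):
    y = alpha * y - mu * A * (x - x_star)
    x = x + y
print("x after 2000 steps:", x)
```

Both poles lie inside the unit disc, and the simulated iterate settles on *x*\* with zero steady-state error, exactly as the IMP view of Figure 12 predicts.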

"Can *G*(*z*) be of higher order or fractional order"? Of course it can! As shown in Figure 12, a necessary condition for the stability of an algorithm is that all the poles of the closed-loop system are within the unit disc. If the Lipschitz continuous gradient constant *L* is given, one can replace *A* with *L*, and then, the condition is sufficient. For each *G*(*z*), there is a corresponding iterative optimization algorithm. *G*(*z*) can be a third- or higher-order system. Apparently, *G*(*z*) can also be a fractional-order system. Considering a general second-order discrete system:

$$G(z) = \frac{z+b}{(z-1)(z-a)},\tag{38}$$

the corresponding iterative algorithm is Equation (25). As mentioned earlier, when setting *b* = −*a*/(1 + *a*), one can derive the NAGD. When *b* = 0, one can derive the momentum GD. The iterative algorithm can be viewed as a state-space realization of the corresponding system. Thus, it may have many different realizations (all are equivalent). Since two parameters *a* and *b* are introduced for a general second-order algorithm design, we used the integral squared error (ISE) as the criterion to optimize the parameters. This is because for different target functions *f*(*x*), the Lipschitz continuous gradient constant is different. Thus, the loop forward gain is defined as *ρ* := *μA*.
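The realization claim can be checked directly: simulating (25) in closed loop with the linearized gradient (31) must reproduce the difference equation obtained from (38). A 1-D sketch with the Nesterov-type choice *b* = −*a*/(1 + *a*) (the other numbers are illustrative):

```python
import numpy as np

# Iteration (25) as a state-space realization of G(z) = (z + b)/((z - 1)(z - a)),
# closed with the linearized gradient (31) on a 1-D quadratic.
a = 0.5
b = -a / (1 + a)                # Nesterov-type zero placement
mu, A, c = 0.1, 2.0, 1.0        # step size, curvature, target x* = c (illustrative)
rho = mu * A                    # loop forward gain rho := mu * A

# State-space realization (25).
xs, x, y = [0.0], 0.0, 0.0
for _ in range(50):
    y_new = a * y - mu * A * (x - c)
    x = x + y_new + b * y
    y = y_new
    xs.append(x)

# Closed-loop difference equation from (z - 1)(z - a) X + rho (z + b) X = rho (z + b) X*.
zs = [0.0, rho * c]             # x_0 = 0 and x_1 = rho * c match the loop above
for _ in range(49):
    zs.append((1 + a - rho) * zs[-1] - (a + rho * b) * zs[-2] + rho * (1 + b) * c)

print(max(abs(u - v) for u, v in zip(xs, zs)))   # the two trajectories coincide
```

The two trajectories match to machine precision, and the integrator drives the iterate to the target *x*\* = 1, illustrating both the realization equivalence and the IMP.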


**Table 3.** General second-order algorithm design. The parameter *ρ* is the loop forward gain; see text for more details.

According to the experimental results (Table 3), interestingly, it is found that the optimal *a* and *b* satisfy *b* = −*a*/(1 + *a*), which is the same design as the NAGD. Other criteria, such as the IAE and ITAE, were used to find other optimal parameters, but the results are the same as for the ISE. Differently from the NAGD, the parameters were determined by search optimization rather than by mathematical design, which can be extended to more general cases. The algorithms were then tested using the MNIST dataset (Figure 13). It is obvious that for different zeros and poles, the performance of the algorithms is different. One finds that both the *b* = −0.25 and *b* = −0.5 cases perform better than does the SGD momentum. Additionally, both *b* = 0.25 and *b* = 0.5 perform worse. It is also shown that an additional zero can improve the performance, if adjusted properly. It is interesting to observe that both the search-based design and the Nesterov method give an optimal choice of the zero, which is closely related to the pole (*b* = −*a*/(1 + *a*)).

**Figure 13.** Training loss (**left**); test accuracy (**right**). It is obvious that for different zeros and poles, the performance of the algorithms is different. One finds that both the *b* = −0.25 and *b* = −0.5 cases perform better than does the stochastic gradient descent (SGD) momentum. Additionally, both *b* = 0.25 and *b* = 0.5 perform worse. It is also shown that an additional zero can improve the performance, if adjusted carefully.

Now, let us consider a general third-order discrete system:

$$G(z) = \frac{z^2 + cz + d}{(z - 1)(z^2 + az + b)}.\tag{39}$$

Setting *b* = *d* = 0 reduces it to the second-order algorithm discussed above. Compared with the second-order case, the poles can now be complex numbers. More generally, a higher-order system can contain more internal models. If all the poles are real, then:

$$G(z) = \frac{1}{(z-1)} \frac{(z-c)}{(z-a)} \frac{(z-d)}{(z-b)},\tag{40}$$

whose corresponding iterative optimization algorithm is

$$\begin{cases} y\_{k+1} = y\_k - \mu \nabla f(\mathbf{x}\_k), \\ z\_{k+1} = az\_k + y\_{k+1} - cy\_k, \\ \mathbf{x}\_{k+1} = b\mathbf{x}\_k + z\_{k+1} - dz\_k. \end{cases} \tag{41}$$

**Table 4.** General third-order algorithm design, with parameters defined by Equation (41).


After some experiments (Table 4), it was found that, since the ISE was used for tracking a step signal (which is quite simple), the optimal poles and zeros are the same as in the second-order case, with a pole-zero cancellation. This is an interesting discovery. In this optimization result, all the poles and zeros are real, and the resulting performance is, as expected, not very good. Compared with the second-order case, the only difference is that complex poles can now possibly appear. Thus, the question arises: "how do complex poles play a role in the design?" The answer is obvious: by fractional calculus!
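The pole-zero-cancellation finding can be illustrated directly: setting *c* = *a* and *d* = *b* in (41), with equal initial states, collapses the cascade to plain GD, so the extra real poles buy nothing once the optimizer cancels them. A sketch with illustrative numbers:

```python
# Third-order iteration (41) on f(x) = 0.5 * A * (x - x_star)^2. With the
# pole-zero cancellations c = a and d = b (and zero initial states), the
# cascade collapses to plain GD, matching the optimization result in the text.
A, x_star, mu = 2.0, 1.5, 0.25
a, b, c, d = 0.6, 0.3, 0.6, 0.3       # c = a and d = b: both stages cancel

y = z = x = 0.0
gd = 0.0                              # plain GD reference
for _ in range(200):
    y_new = y - mu * A * (x - x_star)
    z_new = a * z + y_new - c * y
    x = b * x + z_new - d * z
    y, z = y_new, z_new
    gd = gd - mu * A * (gd - x_star)

print(x, gd)                          # both converge to x* = 1.5
```

The cancelled cascade and plain GD trace out the same trajectory, which is exactly why the real-pole optimum offers no improvement.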

Inspired by M. Jordan's idea in the frequency domain, a continuous-time fractional-order system was designed:

$$G(s) = \frac{1}{s(s^\alpha + \beta)},\tag{42}$$

where, initially, *α* ∈ (0, 2) and *β* ∈ (0, 20]. The optimal parameters were then obtained by searching using the ISE criterion (Table 5).


**Table 5.** The continuous time fractional-order system.

Equation (42) encapsulates the continuous-time design, and one can use the numerical inverse Laplace transform (NILP) [161] and Matlab command **stmcb( )** [162] to derive its discrete form. After the complex poles are included, one can have:

$$G(z) = \frac{(z+c)}{(z-1)}\left(\frac{1}{z-a+jb} + \frac{1}{z-a-jb}\right),\tag{43}$$

whose corresponding iterative algorithm is:

$$\begin{cases} y\_{k+1} = ay\_k - bz\_k - \mu \nabla f(\mathbf{x}\_k), \\ z\_{k+1} = az\_k + by\_k, \\ \mathbf{x}\_{k+1} = \mathbf{x}\_k + y\_{k+1} + cy\_k. \end{cases} \tag{44}$$
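A minimal 1-D run of iteration (44) with the complex-pole parameters quoted in the text (*a* = 0.6786, *b* = 0.1354, *c* = 0.283); the step size here is an illustrative small value chosen for stability, not the one used in the experiments:

```python
import numpy as np

# Iteration (44) on f(x) = 0.5 * (x - x_star)^2; the (y, z) pair realizes
# the complex pole pair a +/- jb, and the x-update is the integrator state.
a, b, c = 0.6786, 0.1354, 0.283
mu, x_star = 0.05, 2.0

y = z = x = 0.0
for _ in range(1500):
    y_new = a * y - b * z - mu * (x - x_star)
    z_new = a * z + b * y
    x = x + y_new + c * y
    y, z = y_new, z_new

print(np.hypot(a, b))    # modulus of the complex pole pair, < 1
print(x)                 # steady-state error is zero: x -> x* (IMP)
```

The complex pole pair has modulus well inside the unit disc, and the integrator in the loop drives the steady-state error to zero, as the IMP predicts.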

Then, the algorithms were tested again using the MNIST dataset, and the results were compared with the SGD's. For the fractional-order design, *ρ* = 0.9, *a* = 0.6786 and *b* = 0.1354 were used, with different values for the zero *c*. When *c* = 0, the result was similar to that of the second-order SGD. When *c* was not equal to 0, the result was similar to that of the second-order NAGD. For the SGD, *α* was set to 0.9, and the learning rate was 0.1 (Figure 14). Both *c* = 0 and *c* = 0.283 perform better than the SGD momentum; generally, with appropriate values of *c*, better performance can be achieved than in the second-order cases. The simulation results demonstrate that fractional calculus (complex poles) can potentially improve the performance, which is closely related to the learning rate.

**Figure 14.** Training loss (**left**); test accuracy (**right**).

In general, M. Jordan asked the question: "is there an optimal way to optimize?" Our answer is a resounding yes: through continuous-limit dynamics analysis and discretization, and through SGD with other randomness, such as Langevin dynamics. Herein, the question posed was: "is there a more optimal way to optimize?" Again, the answer is yes, but it requires fractional calculus to optimize the randomness in SGD, random search and the IMP. There is more potential for further investigations along this line of ideas.

#### **4. A Case Study of Machine Learning with Fractional Calculus: A Stochastic Configuration Network with Heavy-Tailedness**

#### *4.1. Stochastic Configuration Network (SCN)*

The SCN model is generated incrementally by using stochastic configuration (SC) algorithms [163]. Compared with the existing randomized learning algorithms for single-layer feed-forward neural networks (SLFNNs) [164], the SCN randomly assigns the input weights (w) and biases (b) of the hidden nodes under a supervisory mechanism, which selects the random parameters subject to an inequality constraint and adaptively assigns their scope. This ensures that the built randomized learner models have the universal approximation property. Then, the output weights are analytically evaluated in either a constructive or selective manner [163]. In contrast with the known randomized learning algorithms, such as the randomized radial basis function (RRBF) networks [165] and the random vector functional link (RVFL) [166], the SCN can provide good generalization performance at a faster speed. Concretely, there are three types of SCN algorithms, which are labeled for convenience as SC-I, SC-II and SC-III.

The SC-I algorithm uses a constructive scheme to evaluate the output weights only for the newly added hidden node [167]. All of the previously obtained output weights are kept the same. The SC-II algorithm recalculates part of the current output weights by analyzing a local-least-squares problem with a user-defined shifting window size. The SC-III algorithm finds all the output weights together by solving a global-least-squares problem. The SCN has better performance than other randomized neural networks in terms of fast learning, the scope of the random parameters, and the required human intervention. Therefore, it has already been used in many data-processing projects, such as [134,168,169].

#### *4.2. SCN with Heavy-Tailed PDFs*

For the original SCN algorithms, the weights and biases are randomly generated using a uniform PDF. Randomness plays a significant role in both exploration and exploitation. A good neural network architecture with randomly assigned weights can easily outperform a more deficient architecture with finely tuned weights [170]. Therefore, it is critical to discuss the optimal randomness for the weights and biases in the SCN algorithms. Heavy-tailed PDFs have shown optimal randomness for finding targets [171,172], which plays a significant role in exploration and exploitation [148]. Therefore, herein, heavy-tailed PDFs were used to randomly update the weights and biases in the hidden layers to determine whether the SCN models display improved performance. Some of the key parameters of the SCN models are listed in Table 6. For example, the maximum number of random configurations, T*max*, was set to 200. The scale factor *λ* in the activation function, which directly determines the range of the random parameters, was examined using different settings (0.5–200). The tolerance was set to 0.05. Most of the parameters of the SCN with heavy-tailed PDFs were kept the same as in the original SCN algorithms for comparison purposes. For more details, please refer to [163] and Appendix A.


**Table 6.** Stochastic configuration networks (SCNs) with key parameters.

*4.3. A Regression Model and Parameter Tuning*

The dataset of the regression model was generated by a real-valued function [173]:

$$f(\mathbf{x}) = 0.2e^{-(10\mathbf{x} - 4)^2} + 0.5e^{-(80\mathbf{x} - 40)^2} + 0.3e^{-(80\mathbf{x} - 20)^2},\tag{45}$$

where x ∈ [0, 1]. The training dataset contained 1000 points randomly generated from the uniform distribution on the unit interval [0, 1]. The test set had 300 points generated from a regularly spaced grid on [0, 1]. The input and output attributes were normalized into [0, 1], and all the results reported in this research represent averages over 1000 independent trials. The settings of the parameters were similar to those of the SCN in [163].
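The dataset of Equation (45) is easy to regenerate; the sketch below follows the stated recipe (the RNG seed is an arbitrary choice of ours):

```python
import numpy as np

# Regression dataset of Equation (45): 1000 uniform training points and
# 300 regularly spaced test points on [0, 1].
rng = np.random.default_rng(42)

def f(x):
    return (0.2 * np.exp(-((10 * x - 4) ** 2))
            + 0.5 * np.exp(-((80 * x - 40) ** 2))
            + 0.3 * np.exp(-((80 * x - 20) ** 2)))

x_train = rng.uniform(0.0, 1.0, 1000)
y_train = f(x_train)
x_test = np.linspace(0.0, 1.0, 300)
y_test = f(x_test)
print(x_train.shape, x_test.shape, round(float(y_test.max()), 3))
```

The target is a sum of three narrow Gaussian bumps, which is what makes the task a useful stress test for the number of hidden nodes an SCN needs.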

Heavy-tailed PDF algorithms have user-defined parameters, for example, the power-law index for SCN-Lévy, and the location and scale parameters for SCN-Cauchy and SCN-Weibull. Thus, to illustrate the effect of the parameters on the optimization results and to offer reference values for the proposed SCN algorithms, a parameter analysis was conducted, and corresponding experiments were performed. Based on the experimental results, for the SCN-Lévy algorithm, the optimal power-law index for achieving the minimum number of hidden nodes is 1.1. For the SCN-Weibull algorithm, the optimal location parameter *α* and scale parameter *β* for the minimum number of hidden nodes are 1.9 and 0.2, respectively. For the SCN-Cauchy algorithm, the optimal location parameter *α* and scale parameter *β* for the minimum number of hidden nodes are 0.9 and 0.1, respectively.

#### Performance Comparison among SCNs with Heavy-Tailed PDFs

In Table 7, the performance of the SCN, SCN-Lévy, SCN-Cauchy, SCN-Weibull and SCN-Mixture models is shown; mean values are reported based on 1000 independent trials. Wang et al. [163] used time cost to evaluate the SCN algorithms' performance. In the present study, the authors used the mean number of hidden nodes to evaluate performance. The number of hidden nodes is associated with modeling accuracy. Therefore, herein, the analysis determined whether an SCN with heavy-tailed PDFs used fewer hidden nodes to generate high performance, which would make the NNs less complex. According to the numerical results, the SCN-Cauchy used the lowest mean number of hidden nodes, 59, with a root mean squared error (RMSE) of 0.0057. The SCN-Weibull had a mean number of 63 hidden nodes, with an RMSE of 0.0037. The SCN-Mixture had a mean number of 70 hidden nodes, with an RMSE of 0.0020. The mean number of hidden nodes for the SCN-Lévy was also 70. The original SCN model had a mean number of 75 hidden nodes. A more detailed training process is shown in Figure 15. With fewer hidden nodes, the SCN models with heavy-tailed PDFs can be faster than the original SCN model, and the resulting network structure is less complicated. Our numerical results for the regression task demonstrate remarkable improvements in modeling performance compared with the current SCN model results.

**Table 7.** Performance comparison of SCN models for regression problem.


#### *4.4. MNIST Handwritten Digit Classification*

The handwritten digit dataset contains 4000 training examples and 1000 testing examples, a subset of the MNIST handwritten digit dataset. Each image is a 20-by-20-pixel grayscale image of the digit (Figure 16). Each pixel is represented by a number indicating the grayscale intensity at that location. The 20-by-20 grid of pixels is "unrolled" into a 400-dimensional vector.

**Figure 15.** Performance of SCN, SCN-Lévy, SCN-Weibull, SCN-Cauchy and SCN-Mixture. The parameter L is the hidden node number.

**Figure 16.** The handwritten digit dataset example.

Similar to the parameter tuning for the regression model, a parameter analysis was conducted to illustrate the impact of the parameters on the optimization results and to offer reference values for the MNIST handwritten digit classification SCN algorithms. Corresponding experiments were performed. According to the experimental results, for the SCN-Lévy algorithm, the optimal power-law index is 1.6 for achieving the best RMSE performance. For the SCN-Cauchy algorithm, the optimal location parameter *α* and scale parameter *β* for the lowest RMSE are 0.2 and 0.3, respectively.

#### Performance Comparison among SCNs on MNIST

The performance of the SCN, SCN-Lévy, SCN-Cauchy and SCN-Mixture models is shown in Table 8. Based on the experimental results, the SCN-Cauchy, SCN-Lévy and SCN-Mixture have better training and test accuracy than the original SCN model. A detailed training process is shown in Figure 17. Within around 100 hidden nodes, the SCN models with heavy-tailed PDFs perform similarly to the original SCN model. When the number of hidden nodes is greater than 100, the SCN models with heavy-tailed PDFs have lower RMSEs. Since the weight and bias parameters are initialized from heavy-tailed PDFs, this may cause an SCN with heavy-tailed PDFs to converge to the optimal values at a faster speed. The experimental results for the MNIST handwritten classification problem demonstrate improvements in modeling performance. They also show that SCN models with heavy-tailed PDFs have a better search ability for achieving lower RMSEs.

**Table 8.** Performance comparison of SCNs.


**Figure 17.** Classification performance of SCNs.

#### **5. Take-Home Messages and Looking into the Future: Fractional Calculus Is Physics Informed**

Big data and machine learning (ML) are two of the hottest topics of applied scientific research, and they are closely related to one another. To better understand them, in this article, we advocate fractional calculus (FC), as well as fractional-order thinking (FOT), for big data and ML analysis and applications. In Section 2, we discussed the relationships between big data, variability and FC, as well as why fractional-order data analytics (FODA) should be used and what it is. The topics included the Hurst parameter, fractional Gaussian noise (fGn), fractional Brownian motion (fBm), the fractional autoregressive integrated moving average (FARIMA), the formalism of continuous time random walk (CTRW), unmanned aerial vehicles (UAVs) and precision agriculture (PA).

In Section 3, how to learn efficiently (optimally) in ML algorithms is discussed. The key to developing an efficient learning process is the method of optimization. Thus, it is important to design an efficient optimization method. The derivative-free methods, as well as the gradient-based methods, such as the Nesterov accelerated gradient descent (NAGD), are discussed. Furthermore, it is shown to be possible, following the internal model principle (IMP), to design and analyze ML algorithms in the S or Z transform domain in Section 3.3. FC is used to introduce optimal randomness in the methods of stochastic gradient descent (SGD) and random search. Nonlocal models have commonly been used to describe physical systems and/or processes that cannot be accurately described by classical approaches [174]. For example, fractional nonlocal Maxwell's equations and the corresponding fractional wave equations were applied in [175] for fractional vector calculus [176]. The nonlocal differential operators [177], including nonlocal analogs of the gradient/Hessian, are the key to these nonlocal models, which could lead to very interesting research with FC in the near future.

Fractional dynamics is a response to the need for a more advanced characterization of our complex world, capturing structure at very small or very large scales that had previously been smoothed over. If one wishes to obtain results that are better than the best possible using integer-order calculus-based methods, or are "more optimal", we advocate applying FOT and going fractional! In this era of big data, decision and control need FC, such as fractional-order signals, systems and controls. The future of ML should be physics-informed, scientific (cause–effect embedded or cause–effect discovery) and involve the use of FC, where the modeling is closer to nature. Laozi (unknown, around the 6th to 4th century BC), the ancient Chinese philosopher, is said to have written a short book, *Dao De Jing* (*Tao Te Ching*), in which he observed: "The Tao that can be told is not the eternal Tao" [178]. People over thousands of years have shared different understandings of the meaning of the Tao. Our best understanding of the Tao is nature, whose rules of complexity can be explained in a non-normal way. Fractional dynamics, FC and heavy-tailedness may well be that non-normal way (Figure 18), at least for the not-too-distant future.

**Figure 18.** Timeline of FC (courtesy of Professor Igor Podlubny).

**Author Contributions:** H.N. drafted the original manuscript based on numerous talks/discussions with Y.C. in the past several years plus a seminar (http://mechatronics.ucmerced.edu/news/2020/why-big-data-and-machine-learning-must-meet-fractional-calculus) (accessed on 2 February 2021). Y.C. and B.J.W. contributed to the result interpretation, discussions and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** Thanks go to Jiamin Wei, Yuquan Chen, Guoxiang Zhang, Tiebiao Zhao, Lihong Guo, Zhenlong Wu, Yanan Wang, Panpan Gu, Jairo Viola, Jie Yuan, etc., for walks, chats and tea/coffee breaks at Castle, Atwater, CA, before the COVID-19 era. In particular, Yuquan Chen performed computation in various IMP-based GD schemes, and Jiamin Wei performed the computation in cuckoo searches using four different heavy-tailed randomnesses. YangQuan Chen would like to thank Justin Dianhui Wang for many fruitful discussions in the past years on SCN, in particular, and machine learning in general. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. Last but not least, we thank the helpful reviewers for constructive comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:



#### **Appendix A. SCN Codes**

The Matlab and Python codes can be found at https://github.com/niuhaoyu16/StochasticConfigurationNetwork (accessed on 2 February 2021).

#### **References**


## *Article* **Skellam Type Processes of Order** *k* **and Beyond**

#### **Neha Gupta 1, Arun Kumar <sup>1</sup> and Nikolai Leonenko 2,\***


Received: 31 August 2020; Accepted: 19 October 2020; Published: 22 October 2020

**Abstract:** In this article, we introduce the Skellam process of order *k* and its running average. We also discuss the time-changed Skellam process of order *k*. In particular, we discuss the space-fractional Skellam process and tempered space-fractional Skellam process via time changes in Skellam process by independent stable subordinator and tempered stable subordinator, respectively. We derive the marginal probabilities, Lévy measures, governing difference-differential equations of the introduced processes. Our results generalize the Skellam process and running average of Poisson process in several directions.

**Keywords:** Skellam process; subordination; Lévy measure; Poisson process of order *k*; running average

#### **1. Introduction**

The Skellam distribution is obtained by taking the difference between two independent Poisson distributed random variables; it was introduced for the case of different intensities *λ*1, *λ*2 in [1] and for equal means in [2]. For large values of *λ*1 + *λ*2, the distribution can be approximated by the normal distribution. If *λ*2 is very close to 0, the distribution tends to a Poisson distribution with intensity *λ*1; similarly, if *λ*1 tends to 0, it tends to the distribution of the negative of a Poisson random variable with intensity *λ*2, supported on the non-positive integers. The Skellam random variable is infinitely divisible, since it is the difference of two infinitely divisible random variables (see Proposition 2.1 in [3]). Therefore, one can define a continuous time Lévy process for the Skellam distribution, which is called the Skellam process.

The Skellam process is an integer-valued Lévy process and can also be obtained by taking the difference of two independent Poisson processes. Its marginal probability mass function (pmf) involves the modified Bessel function of the first kind. The Skellam process has various applications in different areas, such as modeling the intensity difference of pixels in cameras (see [4]) and modeling the difference in the number of goals of two competing teams in a football game [5]. Models based on the difference of two point processes are proposed in [6–9].

Recently, the time-fractional Skellam process has been studied in [10]; it is obtained by time-changing the Skellam process with an inverse stable subordinator. Further, the authors of [10] applied the time-fractional Skellam process to the modeling of arrivals of jumps in high-frequency trading data, showing that the inter-arrival times between the positive and negative jumps follow a Mittag–Leffler distribution rather than the exponential distribution. Similar behavior is observed in the case of Danish fire insurance data (see [11]). Buchak and Sakhno [12] have also proposed governing equations for time-fractional Skellam processes. Recently, [13] introduced the time-changed Poisson process of order *k*, which is obtained by time-changing the Poisson process of order *k* (see [14]) with general subordinators.

In this paper, we introduce the Skellam process of order *k* and its running average. We also discuss the time-changed Skellam process of order *k*. In particular, we discuss the space-fractional Skellam process and the tempered space-fractional Skellam process, obtained via time changes in the Skellam process by an independent stable subordinator and tempered stable subordinator, respectively. We obtain closed form expressions for the marginal distributions of the considered processes and other important properties. The Skellam process is used to model the difference between the number of goals of two teams in a football match. At the beginning, both teams have a score of 0, and at time *t* the score of team 1 is *N*1(*t*), the cumulative sum of arrivals (goals) of size 1 until time *t* with exponential inter-arrival times. Similarly, the score of team 2 at time *t* is *N*2(*t*). The difference between the numbers of goals at time *t* can then be modeled by *N*1(*t*) − *N*2(*t*). Similarly, a Skellam process of order *k* with *k* = 3 can be used to model the difference between the numbers of points scored by two competing teams in a basketball match. Note that, in a basketball game, a free throw counts as one point, a basket from a shot taken inside the three-point line counts for two points, and a basket from a shot taken outside the three-point line counts for three points. Thus, a jump in the score of either team may be of size one, two, or three, and hence a Skellam process of order 3 can be used to model the difference between the points scored.

In [10], it is shown that the fractional Skellam process is a better model than the Skellam process for the arrivals of the up and down jumps in tick-by-tick financial data. Equivalently, it is shown that the Mittag–Leffler distribution is a better model than the exponential distribution for the inter-arrival times between the up and down jumps. However, it is evident from Figure 3 of [10] that the fractional Skellam process also does not perfectly fit the arrivals of positive and negative jumps. We hope that a more flexible class of processes, such as the time-changed Skellam process of order *k* (see Section 6) and the introduced tempered space-fractional Skellam process (see Section 7), would be a better model for the arrivals of jumps. Additionally, see [8] for applications of integer-valued Lévy processes in financial econometrics. Moreover, distributions of order *k* are interesting for reliability theory [15]. The Fisher dispersion index is a widely used measure for quantifying the departure of any univariate count distribution from the equi-dispersed Poisson model [16–18]. The processes introduced in this article can be useful in the modeling of over-dispersed and under-dispersed data. Further, in (49), we present probabilistic solutions of some fractional equations.

The remainder of this paper proceeds as follows: in Section 2, we introduce all the relevant definitions and results. We also derive the Lévy densities for the space- and tempered space-fractional Poisson processes. In Section 3, we introduce and study the running average of the Poisson process of order *k*. Section 4 is dedicated to the Skellam process of order *k*. Section 5 deals with the running average of the Skellam process of order *k*. In Section 6, we discuss the time-changed Skellam process of order *k*. In Section 7, we determine the marginal pmfs, governing equations for the marginal pmfs, Lévy densities, and moment generating functions of the space-fractional Skellam process and the tempered space-fractional Skellam process.

#### **2. Preliminaries**

In this section, we collect relevant definitions and some results on Skellam process, subordinators, space-fractional Poisson process, and tempered space-fractional Poisson process. These results will be used to define the space-fractional Skellam processes and tempered space-fractional Skellam processes.

#### *2.1. Skellam Process*

In this section, we revisit the Skellam process and also provide a characterization of it. Let *S*(*t*) be a Skellam process, such that

$$S(t) = N\_1(t) - N\_2(t), \quad t \ge 0,$$

where *N*1(*t*) and *N*2(*t*) are two independent homogeneous Poisson processes with intensities *λ*1 > 0 and *λ*2 > 0, respectively. The Skellam process is defined in [8], and the distribution was introduced and studied in [1]; see also [2]. This process is symmetric only when *λ*1 = *λ*2. The pmf *sk*(*t*) = P(*S*(*t*) = *k*) of *S*(*t*) is given by (see e.g., [1,10])

$$s\_k(t) = e^{-t(\lambda\_1 + \lambda\_2)} \left(\frac{\lambda\_1}{\lambda\_2}\right)^{k/2} I\_{|k|}(2t\sqrt{\lambda\_1\lambda\_2}), \ k \in \mathbb{Z},\tag{1}$$

where *Ik* is the modified Bessel function of the first kind (see [19], p. 375),

$$I\_k(z) = \sum\_{n=0}^{\infty} \frac{(z/2)^{2n+k}}{n!(n+k)!}.\tag{2}$$
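As a quick numerical sanity check of (1) and (2) (a stdlib-only sketch; the helper names are ours), one can verify that the pmf sums to one and has mean (*λ*1 − *λ*2)*t*:

```python
import math

def bessel_i(k: int, z: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind, the series (2), for k >= 0."""
    return sum((z / 2.0) ** (2 * n + k) / (math.factorial(n) * math.factorial(n + k))
               for n in range(terms))

def skellam_pmf(k: int, t: float, lam1: float, lam2: float) -> float:
    """Marginal pmf of the Skellam process, Equation (1)."""
    return (math.exp(-t * (lam1 + lam2)) * (lam1 / lam2) ** (k / 2.0)
            * bessel_i(abs(k), 2.0 * t * math.sqrt(lam1 * lam2)))

t, lam1, lam2 = 1.0, 2.0, 1.0
# Truncating to |k| <= 30 leaves only a negligible tail for these rates.
probs = {k: skellam_pmf(k, t, lam1, lam2) for k in range(-30, 31)}
total = sum(probs.values())                  # should be ~1
mean = sum(k * p for k, p in probs.items())  # should be ~(lam1 - lam2) * t
```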

The pmf *sk*(*t*) satisfies the following differential difference equation (see [10])

$$\frac{d}{dt}s\_k(t) = \lambda\_1(s\_{k-1}(t) - s\_k(t)) - \lambda\_2(s\_k(t) - s\_{k+1}(t)), \ k \in \mathbb{Z},\tag{3}$$

with initial conditions *s*0(0) = 1 and *sk*(0) = 0 for *k* ≠ 0. For a real-valued Lévy process *Z*(*t*), the characteristic function admits the form

$$\mathbb{E}(e^{\mathrm{i}uZ(t)}) = e^{t\psi\_{Z}(u)},\tag{4}$$

where the function *ψZ* is called the characteristic exponent; it admits the following Lévy–Khintchine representation (see [20])

$$\psi\_{\mathcal{Z}}(u) = iau - bu^2 + \int\_{\mathbb{R}\backslash\{0\}} (e^{iux} - 1 - iux \mathbf{1}\_{\{|x| \le 1\}}) \pi\_{\mathcal{Z}}(dx). \tag{5}$$

Here, $a \in \mathbb{R}$, $b \ge 0$, and $\pi_Z$ is a Lévy measure. If $\pi_Z(dx) = \nu_Z(x)dx$ for some function $\nu_Z$, then $\nu_Z$ is called the Lévy density of the process $Z$. The Skellam process is a Lévy process; its Lévy density $\nu_S$ is a linear combination of two Dirac delta functions, $\nu_S(y) = \lambda_1\delta_1(y) + \lambda_2\delta_{-1}(y)$, and the corresponding characteristic exponent is given by

$$
\psi\_{S(1)}(u) = \int\_{-\infty}^{\infty} (e^{iuy} - 1)\, \nu\_S(y)\, dy = \lambda\_1(e^{iu} - 1) + \lambda\_2(e^{-iu} - 1).
$$

The moment generating function (mgf) of Skellam process is

$$\mathbb{E}[e^{\theta S(t)}] = e^{-t\left(\lambda\_1 + \lambda\_2 - \lambda\_1 e^{\theta} - \lambda\_2 e^{-\theta}\right)}, \ \theta \in \mathbb{R}.\tag{6}$$

With the help of the mgf, one can easily find the moments of the Skellam process. In the next result, we give a characterization of the Skellam process which, to our knowledge, is not available in the literature. For a function $h$, we write $h(\delta) = o(\delta)$ if $\lim_{\delta \to 0} h(\delta)/\delta = 0$.

**Theorem 1.** *Suppose that an arrival process has independent and stationary increments and satisfies the following incremental condition; then the process is a Skellam process:*

$$\mathbb{P}(S(t+\delta) = m \mid S(t) = n) = \begin{cases} \lambda\_1 \delta + o(\delta), & m = n + 1; \\ \lambda\_2 \delta + o(\delta), & m = n - 1; \\ 1 - \lambda\_1 \delta - \lambda\_2 \delta + o(\delta), & m = n; \\ o(\delta), & \text{otherwise.} \end{cases}$$

**Proof.** Consider the interval [0, *t*], discretized into *n* sub-intervals of size *δ* each, such that *nδ* = *t*. For *k* ≥ 0, we have

$$\begin{split} \mathbb{P}(S(t) = k) &= \sum\_{m=0}^{\left[\frac{n-k}{2}\right]} \frac{n!}{m!\,(m+k)!\,(n-2m-k)!} (\lambda\_1 \delta)^{m+k} (\lambda\_2 \delta)^m (1 - \lambda\_1 \delta - \lambda\_2 \delta)^{n-2m-k} + o(\delta) \\ &= \sum\_{m=0}^{\left[\frac{n-k}{2}\right]} \frac{n!}{m!\,(m+k)!\,(n-2m-k)!} \left(\frac{\lambda\_1 t}{n}\right)^{m+k} \left(\frac{\lambda\_2 t}{n}\right)^m \left(1 - \frac{\lambda\_1 t}{n} - \frac{\lambda\_2 t}{n}\right)^{n-2m-k} + o(\delta) \\ &= \sum\_{m=0}^{\left[\frac{n-k}{2}\right]} \frac{(\lambda\_1 t)^{m+k} (\lambda\_2 t)^m}{m!\,(m+k)!} \frac{n!}{(n-2m-k)!\,n^{2m+k}} \left(1 - \frac{\lambda\_1 t}{n} - \frac{\lambda\_2 t}{n}\right)^{n-2m-k} + o(\delta) \\ &= e^{-(\lambda\_1 + \lambda\_2)t} \sum\_{m=0}^{\infty} \frac{(\lambda\_1 t)^{m+k} (\lambda\_2 t)^m}{m!\,(m+k)!}, \end{split}$$

by taking *n* → ∞. The result now follows by using the definition of the modified Bessel function of the first kind, *Ik*. The case *k* < 0 is proved similarly.
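The limiting construction in the proof can also be checked by direct simulation of the discretized scheme (a rough Monte Carlo sketch; the function names are ours and the tolerance is statistical):

```python
import random

def discretized_path(t: float, lam1: float, lam2: float, delta: float) -> int:
    """One terminal value of the birth-death scheme of Theorem 1 on a mesh delta."""
    s = 0
    for _ in range(int(t / delta)):
        u = random.random()
        if u < lam1 * delta:
            s += 1                      # up-jump with probability lam1 * delta
        elif u < (lam1 + lam2) * delta:
            s -= 1                      # down-jump with probability lam2 * delta
    return s

random.seed(1)
t, lam1, lam2, delta = 1.0, 2.0, 1.0, 1e-3
samples = [discretized_path(t, lam1, lam2, delta) for _ in range(2000)]
emp_mean = sum(samples) / len(samples)  # should approach (lam1 - lam2) * t = 1
```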

#### *2.2. Poisson Process of Order k*

In this section, we recall the definition and some important properties of the Poisson process of order *k* (PPoK), introduced and studied by Kostadinova and Minkova (see [14]). Let *x*1, *x*2, ··· , *xk* be non-negative integers and let *ζk* = *x*1 + *x*2 + ··· + *xk*, Π*k*! = *x*1!*x*2! ··· *xk*!, and

$$\Omega(k, n) = \{X = (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_k) | \mathbf{x}\_1 + 2\mathbf{x}\_2 + \dots + k\mathbf{x}\_k = n\}. \tag{7}$$

Additionally, let {*Nk*(*t*)}*t*≥0 represent the PPoK with rate parameter *λ*; then its probability mass function (pmf) is given by

$$p\_n^{N^k}(t) = \mathbb{P}(N^k(t) = n) = \sum\_{X \in \Omega(k, n)} e^{-k\lambda t} \frac{(\lambda t)^{\zeta\_k}}{\Pi\_k!}.\tag{8}$$

The pmf of *Nk*(*t*) satisfies the following differential-difference equations (see [14])

$$\frac{d}{dt}p\_n^{N^k}(t) = -k\lambda p\_n^{N^k}(t) + \lambda \sum\_{j=1}^{n \wedge k} p\_{n-j}^{N^k}(t), \ \ n = 1, 2, \dots$$

$$\frac{d}{dt}p\_0^{N^k}(t) = -k\lambda p\_0^{N^k}(t), \tag{9}$$

with initial conditions $p_0^{N^k}(0) = 1$ and $p_n^{N^k}(0) = 0$ for $n \ge 1$, where $n \wedge k = \min\{k, n\}$. The characteristic function of the PPoK $N^k(t)$ is

$$\phi\_{N^k(t)}(u) = \mathbb{E}[e^{iuN^k(t)}] = e^{-\lambda t (k - \sum\_{j=1}^k e^{iuj})},\tag{10}$$

where $i = \sqrt{-1}$. The PPoK is a Lévy process, hence infinitely divisible, i.e., $\phi_{N^k(t)}(u) = \left(\phi_{N^k(1)}(u)\right)^t$. The Lévy density of the PPoK is easy to derive and is given by

$$\nu\_{N^k}(x) = \lambda \sum\_{j=1}^k \delta\_j(x),$$

where $\delta_j$ is the Dirac delta function concentrated at $j$. The transition probabilities of the PPoK $\{N^k(t)\}_{t \ge 0}$ are also given by Kostadinova and Minkova [14]:

$$\mathbb{P}(N^k(t+\delta) = m \mid N^k(t) = n) = \begin{cases} 1 - k\lambda\delta + o(\delta), & m = n; \\ \lambda\delta + o(\delta), & m = n + i,\ i = 1, 2, \dots, k; \\ o(\delta), & \text{otherwise}. \end{cases} \tag{11}$$

The probability generating function (pgf) $G^{N^k}(s, t)$ is given by (see [14])

$$G^{N^k}(s,t) = e^{-\lambda t \left(k - \sum\_{j=1}^k s^j\right)}. \tag{12}$$

The mean, variance and covariance function of the PPoK are given by

$$\begin{aligned} \mathbb{E}[N^k(t)] &= \frac{k(k+1)}{2}\lambda t; \\ \text{Var}[N^k(t)] &= \frac{k(k+1)(2k+1)}{6}\lambda t; \\ \text{Cov}[N^k(t), N^k(s)] &= \frac{k(k+1)(2k+1)}{6}\lambda (t \wedge s). \end{aligned} \tag{13}$$
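The Lévy density $\lambda \sum_j \delta_j$ (equivalently, the factorization of the pgf (12) as $\prod_{j=1}^k e^{-\lambda t(1-s^j)}$) shows that $N^k(t)$ has the same law as $\sum_{j=1}^k j\,N_j(t)$ with $N_1, \ldots, N_k$ iid Poisson($\lambda t$). The following sketch (names ours) exploits this compound-Poisson form to compute the pmf by convolution and check the first two moments in (13):

```python
import math

def poisson_pmf(lam: float, n_max: int) -> list:
    """Poisson(lam) probabilities p_0 .. p_{n_max}, computed iteratively."""
    probs = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * lam / n)
    return probs

def ppok_pmf(k: int, lam: float, t: float, n_max: int = 200) -> list:
    """pmf of N^k(t) via the representation sum_j j * N_j(t),
    N_1, ..., N_k iid Poisson(lam * t), consistent with the pgf (12)."""
    pmf = [1.0] + [0.0] * n_max          # start from the unit mass at 0
    base = poisson_pmf(lam * t, n_max)
    for j in range(1, k + 1):            # convolve with j-scaled Poisson jumps
        new = [0.0] * (n_max + 1)
        for n, p in enumerate(pmf):
            if p == 0.0:
                continue
            for m, q in enumerate(base):
                if n + j * m > n_max:
                    break
                new[n + j * m] += p * q
        pmf = new
    return pmf

k, lam, t = 3, 1.0, 1.0
pmf = ppok_pmf(k, lam, t)
mean = sum(n * p for n, p in enumerate(pmf))                 # k(k+1)/2 * lam * t = 6
var = sum(n * n * p for n, p in enumerate(pmf)) - mean ** 2  # k(k+1)(2k+1)/6 * lam * t = 14
```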

#### *2.3. Subordinators*

Let *Df*(*t*) be a real-valued Lévy process with non-decreasing sample paths, whose Laplace transform has the form

$$\mathbb{E}[e^{-sD\_f(t)}] = e^{-tf(s)},$$

where

$$f(s) = bs + \int\_0^\infty (1 - e^{-xs}) \pi(dx), \quad s > 0, \ b \ge 0,$$

is the integral representation of a Bernstein function (see [21]). Bernstein functions are *C*∞ and non-negative, with $(-1)^m \frac{d^m}{dx^m} f(x) \le 0$ for $m \ge 1$ [21]. Here, *π* denotes the non-negative Lévy measure on the positive half-line, such that

$$\int\_0^\infty (\mathfrak{x} \wedge 1)\pi(d\mathfrak{x}) < \infty, \ \pi([0,\infty)) = \infty,$$

and *b* is the drift coefficient. The right-continuous inverse $E_f(t) = \inf\{u \ge 0 : D_f(u) > t\}$ is the inverse and first exit time of $D_f(t)$; it is non-Markovian, with non-stationary and non-independent increments. Next, we consider some special cases of Lévy subordinators with drift coefficient $b = 0$:

$$f(s) = \begin{cases} p\log\left(1+\frac{s}{a}\right), \; p>0, \; a>0, & \text{(gamma subordinator)};\\ (s+\mu)^{\alpha} - \mu^{\alpha}, \; \mu>0, \; 0<\alpha<1, & \text{(tempered }\alpha\text{-stable subordinator)};\\ s^{\alpha}, \; 0<\alpha<1, & \text{(}\alpha\text{-stable subordinator)}. \end{cases} \tag{14}$$

It is worth noting that, among the subordinators given in (14), all integer-order moments of the *α*-stable subordinator are infinite, whereas the other subordinators have all moments finite.
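A small sketch evaluating the three Laplace exponents in (14) (function names ours); it also checks that the tempered *α*-stable exponent collapses to the *α*-stable one as *μ* → 0, i.e., that tempering vanishes in the limit:

```python
import math

def f_gamma(s: float, p: float, a: float) -> float:
    """Laplace exponent of the gamma subordinator in (14)."""
    return p * math.log(1.0 + s / a)

def f_tempered_stable(s: float, alpha: float, mu: float) -> float:
    """Laplace exponent of the tempered alpha-stable subordinator in (14)."""
    return (s + mu) ** alpha - mu ** alpha

def f_stable(s: float, alpha: float) -> float:
    """Laplace exponent of the alpha-stable subordinator in (14)."""
    return s ** alpha

s, alpha = 2.0, 0.7
gap = abs(f_tempered_stable(s, alpha, 1e-12) - f_stable(s, alpha))
```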

#### *2.4. The Space-Fractional Poisson Process*

In this section, we discuss the main properties of the space-fractional Poisson process (SFPP). We also provide the Lévy density for the SFPP, which is not discussed in the literature. The SFPP *Nα*(*t*) was introduced in [22] as follows:

$$N\_{\alpha}(t) = \begin{cases} N(D\_{\alpha}(t)), \ t \ge 0, & 0 < \alpha < 1, \\ N(t), \ t \ge 0, & \alpha = 1, \end{cases} \tag{15}$$

where *Dα*(*t*) is an *α*-stable subordinator, which is independent of the homogeneous Poisson process *N*(*t*).

The probability generating function (pgf) of this process is

$$G^{N\_{\alpha}}(s, t) = \mathbb{E}[s^{N\_{\alpha}(t)}] = e^{-\lambda^{\alpha} (1 - s)^{\alpha} t}, \ |s| \le 1, \ \alpha \in (0, 1). \tag{16}$$

The pmf of SFPP is

$$P^{\alpha}(k,t) = \mathbb{P}\{N\_{\alpha}(t) = k\} = \frac{(-1)^k}{k!} \sum\_{r=0}^{\infty} \frac{(-\lambda^{\alpha})^r t^r}{r!} \frac{\Gamma(r\alpha+1)}{\Gamma(r\alpha-k+1)} = \frac{(-1)^k}{k!} \, {}\_1\psi\_1 \left[ \begin{matrix} (1,\alpha);\\(1-k,\alpha); \end{matrix} \; (-\lambda^{\alpha} t) \right],\tag{17}$$

where ${}_h\psi_i(z)$ is the Fox–Wright function (see formula (1.11.14) in [23]). It was shown in [22] that the pmf of the SFPP satisfies the following fractional differential-difference equations:

$$\frac{d}{dt}P^{\alpha}(k,t) = -\lambda^{\alpha}(1-B)^{\alpha}P^{\alpha}(k,t), \quad \alpha \in (0,1], \ k = 1,2,\ldots \tag{18}$$

$$\frac{d}{dt}P^{\alpha}(0,t) = -\lambda^{\alpha} P^{\alpha}(0,t),\tag{19}$$

with initial conditions

$$P^{\alpha}(k,0) = \delta\_{k,0},\tag{20}$$

where *δk*,0 is the Kronecker delta function, given by

$$\delta\_{k,0} = \begin{cases} 0, & k \ge 1, \\ 1, & k = 0. \end{cases} \tag{21}$$

The fractional difference operator

$$(1 - B)^{\alpha} = \sum\_{j=0}^{\infty} \binom{\alpha}{j} (-1)^{j} B^{j} \tag{22}$$

is defined in [24], where *B* is the backward shift operator. The characteristic function of SFPP is

$$\mathbb{E}[e^{iuN\_{\alpha}(t)}] = e^{-\lambda^{\alpha}(1-e^{iu})^{\alpha}t}.\tag{23}$$
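The series (17) can be cross-checked against the pgf (16) by extracting power-series coefficients on the unit circle (a stdlib-only sketch; the names are ours, and our `recip_gamma` convention of setting 1/Γ to zero at the poles is exactly how the terms with non-positive integer Γ-arguments vanish):

```python
import cmath
import math

def recip_gamma(x: float) -> float:
    """1/Gamma(x); returns 0.0 at the poles (non-positive integers)."""
    try:
        return 1.0 / math.gamma(x)
    except ValueError:
        return 0.0

def sfpp_pmf_series(k: int, t: float, lam: float, alpha: float, terms: int = 80) -> float:
    """pmf of the SFPP from the series in Equation (17)."""
    s = 0.0
    for r in range(terms):
        s += ((-(lam ** alpha)) ** r * t ** r / math.factorial(r)
              * math.gamma(r * alpha + 1) * recip_gamma(r * alpha + 1 - k))
    return (-1) ** k / math.factorial(k) * s

def sfpp_pmf_from_pgf(k: int, t: float, lam: float, alpha: float, n: int = 4096) -> float:
    """Coefficient of s^k in the pgf (16), extracted on the unit circle."""
    total = 0.0 + 0.0j
    for m in range(n):
        w = cmath.exp(2j * math.pi * m / n)
        G = cmath.exp(-(lam ** alpha) * (1 - w) ** alpha * t)
        total += G * cmath.exp(-2j * math.pi * m * k / n)
    return (total / n).real

t, lam, alpha = 1.0, 1.0, 0.9
p_series = [sfpp_pmf_series(k, t, lam, alpha) for k in range(6)]
p_circle = [sfpp_pmf_from_pgf(k, t, lam, alpha) for k in range(6)]
```

Note that the $k = 0$ term reduces to $P^{\alpha}(0,t) = e^{-\lambda^{\alpha} t}$, consistent with (19).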

**Proposition 1.** *The Lévy density* $\nu_{N_{\alpha}}(x)$ *of the SFPP is given by*

$$\nu\_{N\_{\alpha}}(x) = \lambda^{\alpha} \sum\_{n=1}^{\infty} (-1)^{n+1} \binom{\alpha}{n} \delta\_{n}(x). \tag{24}$$

**Proof.** We use the Lévy–Khintchine formula (see [20]):

$$\begin{aligned} &\int\_{\mathbb{R}\backslash\{0\}} (e^{iux} - 1)\lambda^{\alpha} \sum\_{n=1}^{\infty} (-1)^{n+1} \binom{\alpha}{n} \delta\_{n}(x) dx \\ & \qquad = \lambda^{\alpha} \left[ \sum\_{n=1}^{\infty} (-1)^{n+1} \binom{\alpha}{n} e^{iun} + \sum\_{n=0}^{\infty} (-1)^{n} \binom{\alpha}{n} - 1 \right] \\ & \qquad = \lambda^{\alpha} \sum\_{n=0}^{\infty} (-1)^{n+1} \binom{\alpha}{n} e^{iun} = -\lambda^{\alpha} (1 - e^{iu})^{\alpha}, \end{aligned}$$

which is the characteristic exponent of SFPP from Equation (23).
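The identity used in this proof can also be verified numerically: truncating the sum over the Lévy density (24) reproduces $-\lambda^{\alpha}(1-e^{iu})^{\alpha}$. Convergence is slow, since the binomial coefficients decay like $n^{-1-\alpha}$, so the sketch below (names ours) uses a loose tolerance:

```python
import cmath

def levy_density_exponent(u: float, lam: float, alpha: float, terms: int = 20000) -> complex:
    """lam^alpha * sum_{n>=1} (-1)^{n+1} C(alpha, n) (e^{iun} - 1), per (24)."""
    total = 0.0 + 0.0j
    c = 1.0  # running generalized binomial coefficient C(alpha, n)
    for n in range(1, terms):
        c *= (alpha - n + 1) / n  # C(alpha, n) from C(alpha, n - 1)
        total += (-1) ** (n + 1) * c * (cmath.exp(1j * u * n) - 1)
    return lam ** alpha * total

u, lam, alpha = 1.3, 2.0, 0.95
lhs = levy_density_exponent(u, lam, alpha)
rhs = -(lam ** alpha) * (1 - cmath.exp(1j * u)) ** alpha
```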

#### *2.5. Tempered Space-Fractional Poisson Process*

The tempered space-fractional Poisson process (TSFPP) can be obtained by subordinating the homogeneous Poisson process *N*(*t*) with the independent tempered stable subordinator *Dα*,*μ*(*t*) (see [25])

$$N\_{\alpha,\mu}(t) = N(D\_{\alpha,\mu}(t)),\ \alpha \in (0,1),\ \mu > 0. \tag{25}$$

This process has finite integer order moments due to the tempered *α*-stable subordinator. The pmf of TSFPP is given by (see [25])

$$\begin{split} P^{\alpha,\mu}(k,t) &= (-1)^k e^{t\mu^{\alpha}} \sum\_{m=0}^\infty \mu^m \sum\_{r=0}^\infty \frac{(-t)^r}{r!} \lambda^{\alpha r-m} \binom{\alpha r}{m} \binom{\alpha r-m}{k} \\ &= e^{t\mu^{\alpha}} \frac{(-1)^k}{k!} \sum\_{m=0}^\infty \frac{\mu^m \lambda^{-m}}{m!} \, {}\_1\Psi\_1 \left[ \begin{matrix} (1,\alpha);\\(1-k-m,\alpha); \end{matrix} \; (-\lambda^{\alpha} t) \right], \; k = 0, 1, \ldots \end{split} \tag{26}$$

The governing difference-differential equation is given by

$$\frac{d}{dt}P^{\alpha,\mu}(k,t) = -\left( (\mu + \lambda(1-B))^{\alpha} - \mu^{\alpha} \right)P^{\alpha,\mu}(k,t), \; k > 0. \tag{27}$$

The characteristic function of the TSFPP is

$$\mathbb{E}\left[e^{iuN\_{\alpha,\mu}(t)}\right] = e^{-t\left((\mu+\lambda(1-e^{iu}))^{\alpha}-\mu^{\alpha}\right)}.\tag{28}$$

Using a standard conditioning argument, the mean and variance of the TSFPP are given by

$$\mathbb{E}[N\_{\alpha,\mu}(t)] = \lambda \alpha \mu^{\alpha-1} t, \quad \text{Var}[N\_{\alpha,\mu}(t)] = \lambda \alpha \mu^{\alpha-1} t + \lambda^2 \alpha (1-\alpha) \mu^{\alpha-2} t. \tag{29}$$

**Proposition 2.** *The Lévy density* $\nu_{N_{\alpha,\mu}}(x)$ *of the TSFPP is*

$$\nu\_{N\_{\alpha,\mu}}(x) = \sum\_{n=1}^{\infty} \mu^{\alpha-n} \binom{\alpha}{n} \lambda^n \sum\_{l=1}^n \binom{n}{l} (-1)^{l+1} \delta\_l(x), \ \mu > 0. \tag{30}$$

*Entropy* **2020**, *22*, 1193

**Proof.** Using (28), the characteristic exponent of the TSFPP is given by $\psi_{N_{\alpha,\mu}}(u) = -\left((\mu + \lambda(1 - e^{iu}))^{\alpha} - \mu^{\alpha}\right)$. We find the Lévy density with the help of the Lévy–Khintchine formula (see [20]):

$$\begin{split} &\int\_{\mathbb{R}\backslash\{0\}} (e^{iux} - 1) \sum\_{n=1}^{\infty} \mu^{\alpha-n} \binom{\alpha}{n} \lambda^{n} \sum\_{l=1}^{n} \binom{n}{l} (-1)^{l+1} \delta\_{l}(x) dx \\ & \qquad = \sum\_{n=1}^{\infty} \mu^{\alpha-n} \binom{\alpha}{n} \lambda^{n} \left( \sum\_{l=1}^{n} \binom{n}{l} (-1)^{l+1} e^{iul} - \sum\_{l=1}^{n} \binom{n}{l} (-1)^{l+1} \right) \\ & \qquad = -\sum\_{n=1}^{\infty} \mu^{\alpha-n} \binom{\alpha}{n} \lambda^{n} (1 - e^{iu})^{n} \\ & \qquad = -\left((\mu + \lambda(1 - e^{iu}))^{\alpha} - \mu^{\alpha}\right), \end{split}$$

which completes the proof.

**Definition 1.** *A stochastic process X*(*t*) *is over-dispersed, equi-dispersed or under-dispersed [18], if the Fisher index of dispersion, given by (see e.g., [17])*

$$\text{FI}[X(t)] = \frac{\text{Var}[X(t)]}{\text{E}[X(t)]}$$

*is more than 1, equal to 1, or smaller than 1, respectively, for all t* > 0*.*

**Remark 1.** *Using* (29)*, we have* $\mathrm{FI}[N_{\alpha,\mu}(t)] = 1 + \frac{\lambda(1-\alpha)}{\mu} > 1$, *i.e., the TSFPP* $N_{\alpha,\mu}(t)$ *is over-dispersed.*
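Remark 1 follows directly from the moments in (29); a short sketch (function names ours) evaluating the Fisher index:

```python
def tsfpp_mean(lam: float, alpha: float, mu: float, t: float) -> float:
    """E[N_{alpha,mu}(t)] from Equation (29)."""
    return lam * alpha * mu ** (alpha - 1) * t

def tsfpp_var(lam: float, alpha: float, mu: float, t: float) -> float:
    """Var[N_{alpha,mu}(t)] from Equation (29)."""
    return (lam * alpha * mu ** (alpha - 1) * t
            + lam ** 2 * alpha * (1 - alpha) * mu ** (alpha - 2) * t)

def fisher_index(lam: float, alpha: float, mu: float, t: float = 1.0) -> float:
    """FI = Var/E; algebraically 1 + lam * (1 - alpha) / mu, independent of t."""
    return tsfpp_var(lam, alpha, mu, t) / tsfpp_mean(lam, alpha, mu, t)

fi = fisher_index(lam=2.0, alpha=0.7, mu=0.5)  # 1 + 2 * 0.3 / 0.5 = 2.2
```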

#### **3. Running Average of PPoK**

In this section, we first introduce the running average of the PPoK and its main properties. These results will be used later to discuss the running average of the Skellam process of order *k*.

**Definition 2** (Running average of PPoK)**.** *We define the running average process* $N_A^k(t)$, $t \ge 0$, *by taking the time-scaled integral of the path of the PPoK (see [26]):*

$$N\_A^k(t) = \frac{1}{t} \int\_0^t N^k(s)ds. \tag{31}$$

We can write the differential equation with initial condition $N_A^k(0) = 0$,

$$\frac{d}{dt}(N\_A^k(t)) = \frac{1}{t}N^k(t) - \frac{1}{t^2} \int\_0^t N^k(s)ds.$$

This shows that the process has continuous sample paths of bounded total variation. We explore the compound Poisson representation and distributional properties of the running average of PPoK. The characteristic function of $N_A^k(t)$ is obtained using Lemma 1 of [26], which we recall for ease of reference.

**Lemma 1.** *If $X_t$ is a Lévy process and $Y_t$ is its Riemann integral, defined by*

$$Y_t = \int_0^t X_s\, ds,$$

*then the characteristic function of $Y_t$ satisfies*

$$\Phi_{Y(t)}(u) = \mathbb{E}[e^{iuY(t)}] = e^{t\left(\int_0^1 \log \Phi_{X(1)}(tuz)\,dz\right)}, \ u \in \mathbb{R}.\tag{32}$$

*Entropy* **2020**, *22*, 1193

**Proposition 3.** *The characteristic function of $N_A^k(t)$ is given by*

$$\phi_{N_A^k(t)}(u) = e^{-t\lambda \left(k - \sum_{j=1}^k \frac{(e^{iuj}-1)}{iuj}\right)}.\tag{33}$$

**Proof.** Using Equation (10), we have

$$\int_0^1 \log \phi_{N^k(1)}(uz)\, dz = -\lambda \left( k - \sum_{j=1}^k \frac{(e^{iuj} - 1)}{iuj} \right).$$

Since $N_A^k(t)$ is the integral in (31) scaled by $1/t$, applying (32) at the point $u/t$ gives

$$\phi_{N_A^k(t)}(u) = e^{t \int_0^1 \log \phi_{N^k(1)}(uz)\, dz} = e^{-t\lambda \left( k - \sum_{j=1}^k \frac{(e^{iuj} - 1)}{iuj} \right)}.$$

**Proposition 4.** *The running average process has a compound Poisson representation, such that*

$$Y(t) = \sum_{i=1}^{N(t)} X_i, \tag{34}$$

*where $X_i$, $i = 1, 2, \ldots$, are independent, identically distributed (iid) copies of a random variable $X$, independent of $N(t)$, and $N(t)$ is a Poisson process with intensity $k\lambda$. Subsequently,*

$$\mathcal{Y}(t) \stackrel{law}{=} \mathcal{N}\_A^k(t).$$

*Further, the random variable X has the following pdf*

$$f_X(x) = \sum_{i=1}^k p_{V_i}(x) f_{U_i}(x) = \frac{1}{k} \sum_{i=1}^k f_{U_i}(x),\tag{35}$$

*where $V_i$ follows a discrete uniform distribution over* $(0, k)$ *and $U_i$ follows a continuous uniform distribution over* $(0, i)$, $i = 1, 2, \ldots, k$.

**Proof.** The pdf of $U_i$ is $f_{U_i}(x) = \frac{1}{i}$, $0 \le x \le i$. Using (35), the characteristic function of $X$ is given by

$$\phi\_X(u) = \frac{1}{k} \sum\_{j=1}^k \frac{(e^{iuj} - 1)}{iuj}.$$

For fixed *t*, the characteristic function of *Y*(*t*) is

$$\phi_{Y(t)}(u) = e^{-k\lambda t (1 - \phi_X(u))} = e^{-t\lambda \left(k - \sum_{j=1}^{k} \frac{(e^{iuj} - 1)}{iuj}\right)}, \tag{36}$$

which is equal to the characteristic function of the running average of PPoK given in (33). Hence, by the uniqueness of characteristic functions, the result follows.

Using the definition

$$m_r = \mathbb{E}[X^r] = (-i)^r \left.\frac{d^r \phi_X(u)}{du^r}\right|_{u=0}, \tag{37}$$

the first two moments of the random variable $X$ given in Proposition 4 are $m_1 = \frac{k+1}{4}$ and $m_2 = \frac{1}{18}(k + 1)(2k + 1)$. Further, using the mean, variance, and covariance of the compound Poisson process, we have

$$\begin{split} \mathbb{E}[N_A^k(t)] &= \mathbb{E}[N(t)]\,\mathbb{E}[X] = \frac{k(k+1)}{4}\lambda t; \\ \mathrm{Var}[N_A^k(t)] &= \mathbb{E}[N(t)]\,\mathbb{E}[X^2] = \frac{1}{18}[k(k+1)(2k+1)]\lambda t; \\ \mathrm{Cov}[N_A^k(t), N_A^k(s)] &= \mathbb{E}[N_A^k(t)\, N_A^k(s)] - \mathbb{E}[N_A^k(t)]\,\mathbb{E}[N_A^k(s)] \\ &= \frac{1}{18}[k(k+1)(2k+1)]\lambda s - \frac{k^2(k+1)^2}{16}\lambda^2 s^2, \quad s < t. \end{split}$$
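As a quick Monte Carlo check of these formulas (a sketch of ours, not the authors' code), one can simulate $N_A^k(t)$ through the compound Poisson representation of Proposition 4, noting that, conditionally on the number of arrivals, the jump epochs $T_i$ of a Poisson process are iid uniform on $(0, t)$ and $\int_0^t N^k(s)\,ds = \sum_i J_i(t - T_i)$:

```python
import math
import random

def poisson(rng, mean):
    # Knuth's method; adequate for the small means used here
    L, p, n = math.exp(-mean), 1.0, -1
    while p > L:
        n += 1
        p *= rng.random()
    return n

def running_average_ppok(k, lam, t, rng):
    # Compound-Poisson view of PPoK: jumps J ~ Uniform{1,...,k} at the epochs
    # of a rate k*lam Poisson process; given the number of arrivals, the
    # arrival times are iid Uniform(0, t), so
    # (1/t) * integral_0^t N^k(s) ds = (1/t) * sum_i J_i * (t - T_i).
    n = poisson(rng, k * lam * t)
    total = 0.0
    for _ in range(n):
        T = rng.random() * t        # arrival time
        J = rng.randint(1, k)       # jump size
        total += J * (t - T)
    return total / t

rng = random.Random(42)
k, lam, t, reps = 3, 2.0, 1.0, 20000
samples = [running_average_ppok(k, lam, t, rng) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean, var)  # close to k(k+1)*lam*t/4 = 6 and k(k+1)(2k+1)*lam*t/18 = 9.33
```

The empirical mean and variance agree with the displayed formulas to within Monte Carlo error.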

**Corollary 1.** *Putting $k = 1$, the running average of PPoK $N_A^k(t)$ reduces to the running average of the standard Poisson process $N_A(t)$ (see the Appendix in [26]).*

**Corollary 2.** *The mean and variance of PPoK and the running average of PPoK satisfy* $\mathbb{E}[N_A^k(t)]/\mathbb{E}[N^k(t)] = \frac{1}{2}$ *and* $\mathrm{Var}[N_A^k(t)]/\mathrm{Var}[N^k(t)] = \frac{1}{3}$.

**Remark 2.** *The Fisher index of dispersion for the running average of PPoK $N_A^k(t)$ is given by* $\mathrm{FI}[N_A^k(t)] = \frac{2}{9}(2k + 1)$. *If $k = 1$ the process is under-dispersed, and for $k > 1$ it is over-dispersed.*

Next we discuss the long-range dependence (LRD) property of running average of PPoK. We recall the definition of LRD for a non-stationary process.

**Definition 3** (Long range dependence (LRD))**.** *Let $X(t)$ be a stochastic process whose correlation function, for fixed $s$ and $t \ge s$, satisfies*

$$c\_1(s)t^{-d} \le \text{Cor}(X(t), X(s)) \le c\_2(s)t^{-d},$$

*for large $t$, where $d > 0$, $c_1(s) > 0$, and $c_2(s) > 0$. In the particular case $c_1(s) = c_2(s) = c(s)$, the above equation reduces to*

$$\lim\_{t \to \infty} \frac{\text{Cor}(X(t), X(s))}{t^{-d}} = \mathfrak{c}(s).$$

*We say that $X(t)$ has the LRD property if $d \in (0, 1)$, and the short-range dependence (SRD) property if $d \in (1, 2)$ [27].*

**Proposition 5.** *The running average of PPoK has LRD property.*

**Proof.** Let $0 \le s < t < \infty$. The correlation function of the running average of PPoK $N_A^k(t)$ is

$$\operatorname{Corr}[N\_A^k(t), N\_A^k(s)] = \frac{(8(2k+1) - 9(k+1)k\lambda s)s^{1/2}t^{-1/2}}{8(2k+1)}.$$

Subsequently, for *d* = 1/2, it follows

$$\lim_{t \to \infty} \frac{\mathrm{Cor}[N_A^k(t), N_A^k(s)]}{t^{-d}} = \frac{(8(2k+1) - 9(k+1)k\lambda s)s^{1/2}}{8(2k+1)} = c(s).$$

Since $d = 1/2 \in (0, 1)$, the running average of PPoK has the LRD property.
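The closed form above can be cross-checked against the variance and covariance formulas of Section 3; a small numerical sketch of ours, with arbitrary parameter values:

```python
import math

# Arbitrary parameters; s < t as in the proof.
k, lam, s, t = 3, 0.5, 1.0, 4.0

# Moment formulas for the running average of PPoK:
A = k * (k + 1) * (2 * k + 1) / 18.0          # Var[N_A^k(t)] = A*lam*t
var_t, var_s = A * lam * t, A * lam * s
cov = A * lam * s - (k**2 * (k + 1)**2 / 16.0) * lam**2 * s**2

# Correlation from the definition Cov / sqrt(Var * Var):
corr = cov / math.sqrt(var_t * var_s)

# Closed-form correlation used in the LRD proof:
closed = (8 * (2 * k + 1) - 9 * k * (k + 1) * lam * s) * math.sqrt(s) \
         / (8 * (2 * k + 1) * math.sqrt(t))

print(corr)  # matches `closed`
```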


#### **4. Skellam Process of Order** *k* **(SPoK)**

In this section, we introduce and study the Skellam process of order *k* (SPoK).

**Definition 4** (SPoK)**.** *Let $N_1^k(t)$ and $N_2^k(t)$ be two independent PPoK with intensities $\lambda_1 > 0$ and $\lambda_2 > 0$. The stochastic process*

$$\mathcal{S}^k(t) = \mathcal{N}\_1^k(t) - \mathcal{N}\_2^k(t)$$

*is called a Skellam process of order k (SPoK).*

**Proposition 6.** *The marginal distribution $R_m(t) = \mathbb{P}(S^k(t) = m)$ of SPoK $S^k(t)$ is given by*

$$R_m(t) = e^{-kt(\lambda_1 + \lambda_2)} \left(\frac{\lambda_1}{\lambda_2}\right)^{m/2} I_{|m|}(2tk\sqrt{\lambda_1 \lambda_2}), \ m \in \mathbb{Z}. \tag{38}$$

**Proof.** For *m* ≥ 0, using the pmf of PPoK that is given in (8), it follows

$$\begin{split} R_{m}(t) &= \sum_{n=0}^{\infty} \mathbb{P}(N_{1}^{k}(t) = n+m)\, \mathbb{P}(N_{2}^{k}(t) = n) \\ &= \sum_{n=0}^{\infty} \left( \sum_{X \in \Omega(k, n+m)} e^{-k\lambda_{1}t} \prod_{j=1}^{k} \frac{(\lambda_{1}t)^{x_{j}}}{x_{j}!} \right) \left( \sum_{X \in \Omega(k, n)} e^{-k\lambda_{2}t} \prod_{j=1}^{k} \frac{(\lambda_{2}t)^{x_{j}}}{x_{j}!} \right). \end{split}$$

Setting $x_i = n_i$ and $n = x + \sum_{i=1}^{k}(i-1)n_i$, we have

$$\begin{split} R_m(t) &= e^{-kt(\lambda_1+\lambda_2)} \sum_{x=0}^{\infty} \frac{(\lambda_2 t)^x}{x!} \frac{(\lambda_1 t)^{m+x}}{(m+x)!} \left( \sum_{n_1+n_2+\cdots+n_k=m+x} \frac{(m+x)!}{n_1!\, n_2! \cdots n_k!} \right) \left( \sum_{n_1+n_2+\cdots+n_k=x} \frac{x!}{n_1!\, n_2! \cdots n_k!} \right) \\ &= e^{-kt(\lambda_1+\lambda_2)} \sum_{x=0}^{\infty} \frac{(\lambda_2 t)^x}{x!} \frac{(\lambda_1 t)^{m+x}}{(m+x)!}\, k^{m+x} k^x, \end{split}$$

using the multinomial theorem and the modified Bessel function given in (2). The case $m < 0$ follows similarly.
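Equation (38) can be checked numerically: the sketch below (ours, with arbitrary parameters) implements $I_{|m|}$ by its series and verifies that the probabilities sum to one.

```python
import math

def bessel_i(nu, w, terms=60):
    # modified Bessel function of the first kind by its series:
    # I_nu(w) = sum_{x>=0} (w/2)^(2x+nu) / (x! * (x+nu)!)
    half = w / 2.0
    term = half**nu / math.factorial(nu)
    total = term
    for x in range(1, terms):
        term *= half * half / (x * (x + nu))
        total += term
    return total

def spok_pmf(m, t, k, lam1, lam2):
    # R_m(t) = exp(-kt(l1+l2)) * (l1/l2)^(m/2) * I_|m|(2tk*sqrt(l1*l2)), Eq. (38)
    return (math.exp(-k * t * (lam1 + lam2)) * (lam1 / lam2) ** (m / 2.0)
            * bessel_i(abs(m), 2 * t * k * math.sqrt(lam1 * lam2)))

k, t, lam1, lam2 = 2, 1.0, 0.5, 0.3
total = sum(spok_pmf(m, t, k, lam1, lam2) for m in range(-60, 61))
print(total)  # ~ 1.0 (tail mass beyond |m| = 60 is negligible here)
```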

**Proposition 7.** *The Lévy density for SPoK is*

$$\nu\_{S^k}(\mathbf{x}) = \lambda\_1 \sum\_{j=1}^k \delta\_j(\mathbf{x}) + \lambda\_2 \sum\_{j=1}^k \delta\_{-j}(\mathbf{x}).$$

**Proof.** The proof follows from the independence of the two PPoK in the definition of SPoK.

**Remark 3.** *Using* (12)*, the pgf of SPoK is given by*

$$G^{S^{k}}(s,t) = \sum_{m=-\infty}^{\infty} s^{m} R_{m}(t) = e^{-t\left(k(\lambda_{1}+\lambda_{2})-\lambda_{1}\sum_{j=1}^{k} s^{j} - \lambda_{2}\sum_{j=1}^{k} s^{-j}\right)}.\tag{39}$$

*Further, the characteristic function of SPoK is given by*

$$\phi_{S^k(t)}(u) = e^{-t\left[k(\lambda_1 + \lambda_2) - \lambda_1 \sum_{j=1}^k e^{iju} - \lambda_2 \sum_{j=1}^k e^{-iju}\right]}.\tag{40}$$

#### *SPoK as a Pure Birth and Death Process*

In this section, we provide the transition probabilities of SPoK at time $t + \delta$, given the state at time $t$. Over a short interval of length $\delta \to 0$, it is nearly impossible to observe more than $k$ events; in fact, the probability of seeing more than $k$ events is $o(\delta)$.


**Proposition 8.** *The transition probabilities of SPoK are given by*

$$\mathbb{P}(S^{k}(t+\delta)=m \mid S^{k}(t)=n) = \begin{cases} \lambda_{1}\delta + o(\delta), & m = n+i, \ i=1,2,\ldots,k;\\ \lambda_{2}\delta + o(\delta), & m = n-i, \ i=1,2,\ldots,k;\\ 1 - k(\lambda_{1}+\lambda_{2})\delta + o(\delta), & m = n;\\ o(\delta), & \text{otherwise}. \end{cases}$$

*Essentially, at most k events can occur in a very small interval of time δ; the probability of more than k events is non-zero but negligible.*

**Proof.** Note that $S^k(t) = N_1^k(t) - N_2^k(t)$. We call $N_1^k(t)$ the first process and $N_2^k(t)$ the second process. For $i = 1, 2, \ldots, k$, we have

$$\begin{split} \mathbb{P}(S^{k}(t+\delta) = n+i \mid S^{k}(t) = n) &= \sum_{j=1}^{k-i} \mathbb{P}(\text{first process has } i+j \text{ arrivals, second process has } j \text{ arrivals}) \\ &\quad + \mathbb{P}(\text{first process has } i \text{ arrivals, second process has } 0 \text{ arrivals}) + o(\delta) \\ &= \sum_{j=1}^{k-i} (\lambda_1\delta + o(\delta))(\lambda_2\delta + o(\delta)) + (\lambda_1\delta + o(\delta))(1 - k\lambda_2\delta + o(\delta)) + o(\delta) \\ &= \lambda_1\delta + o(\delta). \end{split}$$

Similarly, for *i* = 1, 2, ··· , *k*, we have

$$\begin{split} \mathbb{P}(S^{k}(t+\delta) = n-i \mid S^{k}(t) = n) &= \sum_{j=1}^{k-i} \mathbb{P}(\text{first process has } j \text{ arrivals, second process has } i+j \text{ arrivals}) \\ &\quad + \mathbb{P}(\text{first process has } 0 \text{ arrivals, second process has } i \text{ arrivals}) + o(\delta) \\ &= \sum_{j=1}^{k-i} (\lambda_1\delta + o(\delta))(\lambda_2\delta + o(\delta)) + (1 - k\lambda_1\delta + o(\delta))(\lambda_2\delta + o(\delta)) + o(\delta) \\ &= \lambda_2\delta + o(\delta). \end{split}$$

Further,

$$\begin{split} \mathbb{P}(S^{k}(t+\delta) = n \mid S^{k}(t) = n) &= \sum_{j=1}^{k} \mathbb{P}(\text{first process has } j \text{ arrivals, second process has } j \text{ arrivals}) \\ &\quad + \mathbb{P}(\text{first process has } 0 \text{ arrivals, second process has } 0 \text{ arrivals}) + o(\delta) \\ &= \sum_{j=1}^{k} (\lambda_1\delta + o(\delta))(\lambda_2\delta + o(\delta)) + (1 - k\lambda_1\delta + o(\delta))(1 - k\lambda_2\delta + o(\delta)) + o(\delta) \\ &= 1 - k\lambda_1\delta - k\lambda_2\delta + o(\delta). \end{split}$$

**Remark 4.** *The pmf $R_m(t)$ of SPoK satisfies the following difference-differential equation*

$$\begin{aligned} \frac{d}{dt}R_m(t) &= -k(\lambda_1 + \lambda_2)R_m(t) + \lambda_1 \sum_{j=1}^k R_{m-j}(t) + \lambda_2 \sum_{j=1}^k R_{m+j}(t) \\ &= -\lambda_1 \sum_{j=1}^k (1 - B^j)R_m(t) - \lambda_2 \sum_{j=1}^k (1 - F^j)R_m(t), \quad m \in \mathbb{Z}, \end{aligned}$$

*with initial conditions $R_0(0) = 1$ and $R_m(0) = 0$ for $m \neq 0$. Here $B$ is the backward shift operator defined in* (22) *and $F$ is the forward shift operator, $F^j R_m(t) = R_{m+j}(t)$, such that* $(1 - F)^{\alpha} = \sum_{j=0}^{\infty} (-1)^j \binom{\alpha}{j} F^j$. *Multiplying the above equation by $s^m$ and summing over all $m$, we obtain the following differential equation for the pgf*

$$\frac{d}{dt}G^{S^k}(s,t) = \left(-k(\lambda\_1 + \lambda\_2) + \lambda\_1 \sum\_{j=1}^k s^j + \lambda\_2 \sum\_{j=1}^k s^{-j}\right) G^{S^k}(s,t).$$

The mean, variance and covariance of SPoK can be easily calculated by using the pgf,

$$\begin{aligned} \mathbb{E}[S^k(t)] &= \frac{k(k+1)}{2}(\lambda\_1 - \lambda\_2)t; \\ \mathrm{Var}[S^k(t)] &= \frac{1}{6} \left[k(k+1)(2k+1)\right](\lambda\_1 + \lambda\_2)t; \\ \mathrm{Cov}[S^k(t), S^k(s)] &= \frac{1}{6} \left[k(k+1)(2k+1)\right](\lambda\_1 + \lambda\_2)s, \ s < t. \end{aligned}$$
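The mean formula can be sanity-checked against the pgf (39) by numerical differentiation at $s = 1$; a small sketch of ours, with arbitrary parameters:

```python
import math

def pgf_spok(s, t, k, lam1, lam2):
    # pgf of SPoK, Equation (39)
    return math.exp(-t * (k * (lam1 + lam2)
                          - lam1 * sum(s**j for j in range(1, k + 1))
                          - lam2 * sum(s**(-j) for j in range(1, k + 1))))

k, t, lam1, lam2, h = 2, 1.0, 1.0, 0.5, 1e-5
# E[S^k(t)] = dG/ds at s = 1, approximated here by a central difference
mean_numeric = (pgf_spok(1 + h, t, k, lam1, lam2)
                - pgf_spok(1 - h, t, k, lam1, lam2)) / (2 * h)
mean_formula = k * (k + 1) / 2 * (lam1 - lam2) * t
print(mean_numeric, mean_formula)  # both ~ 1.5 for these parameters
```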

**Remark 5.** *For the SPoK, when $\lambda_1 > \lambda_2$,* $\mathrm{Var}[S^k(t)] - \mathbb{E}[S^k(t)] = \frac{k(k+1)}{3}[(k - 1)\lambda_1 + (k + 2)\lambda_2]t > 0$, *which implies that* $\mathrm{FI}[S^k(t)] > 1$, *and hence $S^k(t)$ exhibits over-dispersion. For $\lambda_1 < \lambda_2$, the process is under-dispersed.*

Next, we show the LRD property for SPoK.

**Proposition 9.** *The SPoK has LRD property defined in Definition 3.*

**Proof.** Using the variance and covariance of SPoK given above, the correlation function is

$$\mathrm{Cor}(S^k(t), S^k(s)) = \frac{\frac{1}{6}[k(k+1)(2k+1)](\lambda_1+\lambda_2)s}{\sqrt{\frac{1}{6}[k(k+1)(2k+1)](\lambda_1+\lambda_2)t}\,\sqrt{\frac{1}{6}[k(k+1)(2k+1)](\lambda_1+\lambda_2)s}} = s^{1/2}t^{-1/2}, \ s < t.$$

Hence, for $d = 1/2$,

$$\lim_{t \to \infty} \frac{\mathrm{Cor}(S^k(t), S^k(s))}{t^{-d}} = s^{1/2} = c(s),$$

and since $d = 1/2 \in (0, 1)$, SPoK exhibits the LRD property.

#### **5. Running Average of SPoK**

In this section, we introduce and study a new stochastic process, the running average of SPoK.

**Definition 5.** *The following stochastic process defined by taking the time-scaled integral of the path of the SPoK,*

$$S\_A^k(t) = \frac{1}{t} \int\_0^t S^k(s) ds,\tag{42}$$

*is called the running average of SPoK.*

Next, we provide the compound Poisson representation of running average of SPoK.

**Proposition 10.** *The characteristic function $\phi_{S_A^k(t)}(u) = \mathbb{E}[e^{iuS_A^k(t)}]$ of $S_A^k(t)$ is given by*

$$\phi_{S_A^k(t)}(u) = e^{-kt\left\{\lambda_1 \left(1 - \frac{1}{k} \sum_{j=1}^k \frac{(e^{iuj} - 1)}{iuj} \right) + \lambda_2 \left(1 - \frac{1}{k} \sum_{j=1}^k \frac{(1 - e^{-iuj})}{iuj} \right) \right\}}, \ u \in \mathbb{R}. \tag{43}$$

**Proof.** The proof follows by applying Lemma 1 to Equation (40), after scaling by $1/t$.

**Remark 6.** *Equation* (43) *has a removable singularity at $u = 0$. To remove that singularity, we define $\phi_{S_A^k(t)}(0) = 1$.*
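A small numerical sketch of ours for (43), handling the singularity at $u = 0$ explicitly; any valid characteristic function must satisfy $\phi(0) = 1$ and $|\phi(u)| \le 1$:

```python
import cmath

def cf_running_avg_spok(u, t, k, lam1, lam2):
    # Equation (43); phi(0) := 1 removes the singularity (Remark 6)
    if u == 0:
        return 1.0
    s1 = sum((cmath.exp(1j * u * j) - 1) / (1j * u * j) for j in range(1, k + 1))
    s2 = sum((1 - cmath.exp(-1j * u * j)) / (1j * u * j) for j in range(1, k + 1))
    return cmath.exp(-k * t * (lam1 * (1 - s1 / k) + lam2 * (1 - s2 / k)))

t, k, lam1, lam2 = 1.0, 3, 1.0, 0.5
print(cf_running_avg_spok(0.0, t, k, lam1, lam2))       # 1.0
print(abs(cf_running_avg_spok(1.0, t, k, lam1, lam2)))  # modulus at most 1
```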


**Proposition 11.** *Let Y*(*t*) *be a compound Poisson process*

$$Y(t) = \sum_{n=1}^{N(t)} J_n, \tag{44}$$

*where $N(t)$ is a Poisson process with rate parameter $k(\lambda_1 + \lambda_2) > 0$ and $\{J_n\}_{n\ge 1}$ are iid random variables with the mixed double uniform density $p_{J_1}$, independent of $N(t)$. Subsequently,*

$$\mathcal{Y}(t) \stackrel{law}{=} \mathcal{S}\_A^k(t).$$

**Proof.** Rearranging $\phi_{S_A^k(t)}(u)$,

$$\phi_{S_A^k(t)}(u) = e^{k(\lambda_1 + \lambda_2)t \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \frac{1}{k}\sum_{j=1}^{k} \frac{(e^{iuj} - 1)}{iuj} + \frac{\lambda_2}{\lambda_1 + \lambda_2} \frac{1}{k}\sum_{j=1}^{k} \frac{(1 - e^{-iuj})}{iuj} - 1 \right)}.$$

The random variable $J_1$, being mixed double uniformly distributed, has density

$$p_{J_1}(x) = \sum_{i=1}^{k} p_{V_i}(x) f_{U_i}(x) = \frac{1}{k} \sum_{i=1}^{k} f_{U_i}(x),\tag{45}$$

where $V_j$ follows a discrete uniform distribution over $(0, k)$ with pmf $p_{V_j}(x) = \mathbb{P}(V_j = x) = \frac{1}{k}$, $j = 1, 2, \ldots, k$, and the $U_i$ are double uniform random variables with density

$$f_{U_i}(x) = \frac{1 - w}{i}\mathbf{1}_{[-i,0]}(x) + \frac{w}{i}\mathbf{1}_{[0,i]}(x), \ -i \le x \le i.$$

Further, $0 < w < 1$ is a weight parameter (here $w = \lambda_1/(\lambda_1 + \lambda_2)$) and $\mathbf{1}(\cdot)$ is the indicator function. We obtain the characteristic function of $J_1$ using the Fourier transform of (45),

$$\phi_{J_1}(u) = \frac{\lambda_1}{\lambda_1 + \lambda_2} \frac{1}{k} \sum_{j=1}^k \frac{(e^{iuj} - 1)}{iuj} + \frac{\lambda_2}{\lambda_1 + \lambda_2} \frac{1}{k} \sum_{j=1}^k \frac{(1 - e^{-iuj})}{iuj}.$$

The characteristic function of *Y*(*t*) is

$$\phi_{Y(t)}(u) = e^{-k(\lambda_1 + \lambda_2)t(1 - \phi_{J_1}(u))}; \tag{46}$$

substituting the characteristic function $\phi_{J_1}(u)$ into the above expression yields the characteristic function of $S_A^k(t)$, which completes the proof.
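A numeric sketch of ours for the mixture density (45), assuming each double uniform component carries the $1/i$ normalisation above and the weight $w = \lambda_1/(\lambda_1+\lambda_2)$ (so that its Fourier transform reproduces $\phi_{J_1}$); midpoint integration checks the normalisation and the first moment:

```python
k, lam1, lam2 = 3, 1.0, 0.5
w = lam1 / (lam1 + lam2)   # assumed weight matching phi_J1 above

def f_U(i, x):
    # double uniform density on [-i, i], weight w on the positive side
    if -i <= x < 0:
        return (1 - w) / i
    if 0 <= x <= i:
        return w / i
    return 0.0

def p_J1(x):
    # mixture density (45)
    return sum(f_U(i, x) for i in range(1, k + 1)) / k

# midpoint rule on [-k, k]; every jump of the piecewise-constant density
# falls on a cell boundary, so the rule is exact up to rounding
n = 2000 * k
h = 2.0 * k / n
xs = [-k + (j + 0.5) * h for j in range(n)]
mass = sum(p_J1(x) for x in xs) * h
mean = sum(x * p_J1(x) for x in xs) * h
print(mass, mean)  # mass ~ 1; mean ~ (2w-1)(k+1)/4
```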

**Remark 7.** *The q-th order moments of $J_1$ can be calculated using* (37), *or from the Taylor series expansion of the characteristic function $\phi_{J_1}(u)$ around* 0*, using*

$$\frac{(e^{iuj}-1)}{iuj} = 1 + \sum\_{r=1}^{\infty} \frac{(iuj)^r}{(r+1)!} \quad \& \quad \frac{(1-e^{-iuj})}{iuj} = 1 + \sum\_{r=1}^{\infty} \frac{(-iuj)^r}{(r+1)!}.$$


*We have $m_1 = \frac{(k+1)(\lambda_1-\lambda_2)}{4(\lambda_1+\lambda_2)}$ and $m_2 = \frac{1}{18}(k + 1)(2k + 1)$. Further, the mean, variance, and covariance of the running average of SPoK are*

$$\mathbb{E}[S_A^k(t)] = \mathbb{E}[N(t)]\mathbb{E}[J_1] = \frac{k(k+1)}{4}(\lambda_1 - \lambda_2)t;$$

$$\mathrm{Var}[S_A^k(t)] = \mathbb{E}[N(t)]\mathbb{E}[J_1^2] = \frac{1}{18}[k(k+1)(2k+1)](\lambda_1 + \lambda_2)t;$$

$$\mathrm{Cov}[S_A^k(t), S_A^k(s)] = \frac{1}{18}[k(k+1)(2k+1)](\lambda_1 + \lambda_2)s - \frac{k^2(k+1)^2}{16}(\lambda_1 - \lambda_2)^2s^2, \ s < t.$$

**Corollary 3.** *For $\lambda_2 = 0$, the running average of SPoK is the same as the running average of PPoK, i.e.,*

$$
\phi\_{S\_A^k(t)}(u) = \phi\_{N\_A^k(t)}(u).
$$

**Corollary 4.** *For $k = 1$, this process behaves like the running average of the Skellam process.*

**Corollary 5.** *The ratios of the mean and the variance of the running average of SPoK to those of SPoK are* 1/2 *and* 1/3*, respectively.*

**Remark 8.** *For running average of SPoK, when λ*<sup>1</sup> > *λ*<sup>2</sup> *and k* > 1*, the process is over-dispersed. Otherwise, it exhibits under-dispersion.*

#### **6. Time-Changed Skellam Process of Order** *k*

We consider the time-changed SPoK, obtained by subordinating SPoK $S^k(t)$ with an independent Lévy subordinator $D_f(t)$ satisfying $\mathbb{E}[D_f(t)^c] < \infty$ for all $c > 0$. The time-changed SPoK is defined by

$$Z\_f(t) = \mathcal{S}^k(D\_f(t)), \ t \ge 0.$$

Note that the stable subordinator does not satisfy the condition $\mathbb{E}[D_f(t)^c] < \infty$. The mgf of the time-changed SPoK $Z_f(t)$ is given by

$$\mathbb{E}[e^{\theta Z_f(t)}] = e^{-tf\left(k(\lambda_1 + \lambda_2) - \lambda_1 \sum_{j=1}^k e^{\theta j} - \lambda_2 \sum_{j=1}^k e^{-\theta j}\right)},$$

where $f$ is the Laplace exponent of the subordinator $D_f(t)$.

**Theorem 2.** *The pmf $H_m^f(t) = \mathbb{P}(Z_f(t) = m)$ of the time-changed SPoK is given by*

$$H_m^f(t) = \sum_{x = \max(0, -m)}^{\infty} \frac{(k\lambda_1)^{m+x} (k\lambda_2)^x}{(m+x)!\, x!}\, \mathbb{E}[e^{-k(\lambda_1 + \lambda_2)D_f(t)} D_f^{m+2x}(t)], \ m \in \mathbb{Z}. \tag{47}$$

**Proof.** Let $h_f(x, t)$ be the probability density function of the Lévy subordinator $D_f(t)$. Using a conditioning argument,

$$\begin{split} H_m^f(t) &= \int_{0}^{\infty} R_{m}(y)\, h_{f}(y, t)\, dy \\ &= \int_{0}^{\infty} e^{-ky(\lambda_{1} + \lambda_{2})} \left( \frac{\lambda_{1}}{\lambda_{2}} \right)^{m/2} I_{|m|}(2yk\sqrt{\lambda_{1}\lambda_{2}})\, h_{f}(y, t)\, dy \\ &= \sum_{x=\max(0, -m)}^{\infty} \frac{(k\lambda_{1})^{m+x} (k\lambda_{2})^{x}}{(m+x)!\, x!} \int_{0}^{\infty} e^{-k(\lambda_{1} + \lambda_{2})y} y^{m+2x}\, h_{f}(y, t)\, dy \\ &= \sum_{x=\max(0, -m)}^{\infty} \frac{(k\lambda_{1})^{m+x} (k\lambda_{2})^{x}}{(m+x)!\, x!}\, \mathbb{E}[e^{-k(\lambda_{1} + \lambda_{2})D_{f}(t)} D_{f}^{m+2x}(t)]. \end{split}$$

The mean and covariance of the time-changed SPoK are given by

$$\mathbb{E}[Z\_f(t)] = \frac{k(k+1)}{2}(\lambda\_1 - \lambda\_2)\mathbb{E}[D\_f(t)].$$

$$\mathrm{Cov}[Z_f(t), Z_f(s)] = \frac{1}{6}[k(k+1)(2k+1)](\lambda_1 + \lambda_2)\mathbb{E}[D_f(s)] + \frac{k^2(k+1)^2}{4}(\lambda_1 - \lambda_2)^2 \mathrm{Var}[D_f(s)], \ s \le t.$$

#### **7. Space Fractional Skellam Process and Tempered Space Fractional Skellam Process**

In this section, we introduce time-changed Skellam processes, where the time changes are a stable subordinator and a tempered stable subordinator. These processes give space-fractional versions of the Skellam process, analogous to the time-fractional version of the Skellam process introduced in [10].

#### *7.1. The Space-Fractional Skellam Process*

In this section, we introduce the space-fractional Skellam process (SFSP). Further, we study its main properties, such as the state probabilities and the governing difference-differential equations of the marginal pmf.

**Definition 6** (SFSP)**.** *Let $N_1(t)$ and $N_2(t)$ be two independent homogeneous Poisson processes with intensities $\lambda_1 > 0$ and $\lambda_2 > 0$, respectively. Let $D_{\alpha_1}(t)$ and $D_{\alpha_2}(t)$ be two independent stable subordinators with indices $\alpha_1 \in (0, 1)$ and $\alpha_2 \in (0, 1)$, respectively, independent of the Poisson processes $N_1(t)$ and $N_2(t)$. The subordinated stochastic process*

$$S\_{\mathfrak{a}\_1,\mathfrak{a}\_2}(t) = N\_1(D\_{\mathfrak{a}\_1}(t)) - N\_2(D\_{\mathfrak{a}\_2}(t))$$

*is called a SFSP.*

Next, we derive the mgf of the SFSP. We use the expression for the marginal pmf of the SFPP given in (17) to obtain the marginal pmf of the SFSP.

$$M_{\theta}(t) = \mathbb{E}[e^{\theta S_{\alpha_1, \alpha_2}(t)}] = \mathbb{E}[e^{\theta (N_1(D_{\alpha_1}(t)) - N_2(D_{\alpha_2}(t)))}] = e^{-t[\lambda_1^{\alpha_1}(1-e^{\theta})^{\alpha_1} + \lambda_2^{\alpha_2}(1-e^{-\theta})^{\alpha_2}]}, \ \theta \in \mathbb{R}.$$

In the next result, we obtain the state probabilities of the SFSP.

**Theorem 3.** *The pmf $H_k(t) = \mathbb{P}(S_{\alpha_1,\alpha_2}(t) = k)$ of the SFSP is given by*

$$\begin{split} H_{k}(t) &= \sum_{n=0}^{\infty} \frac{(-1)^{k}}{n!(n+k)!} \left( {}_{1}\Psi_{1} \left[ \begin{matrix} (1,\alpha_{1}); \\ (1-n-k,\alpha_{1}); \end{matrix}\; -\lambda_{1}^{\alpha_{1}}t \right] \right) \left( {}_{1}\Psi_{1} \left[ \begin{matrix} (1,\alpha_{2}); \\ (1-n,\alpha_{2}); \end{matrix}\; -\lambda_{2}^{\alpha_{2}}t \right] \right) \mathbb{I}_{k\geq 0} \\ &\quad + \sum_{n=0}^{\infty} \frac{(-1)^{|k|}}{n!(n+|k|)!} \left( {}_{1}\Psi_{1} \left[ \begin{matrix} (1,\alpha_{1}); \\ (1-n,\alpha_{1}); \end{matrix}\; -\lambda_{1}^{\alpha_{1}}t \right] \right) \left( {}_{1}\Psi_{1} \left[ \begin{matrix} (1,\alpha_{2}); \\ (1-n-|k|,\alpha_{2}); \end{matrix}\; -\lambda_{2}^{\alpha_{2}}t \right] \right) \mathbb{I}_{k<0}, \end{split}$$

*for k* <sup>∈</sup> <sup>Z</sup>*.*

**Proof.** Note that $N_1(D_{\alpha_1}(t))$ and $N_2(D_{\alpha_2}(t))$ are independent; hence

$$\begin{aligned} \mathbb{P}(S_{\alpha_1,\alpha_2}(t) = k) &= \sum_{n=0}^{\infty} \mathbb{P}(N_1(D_{\alpha_1}(t)) = n + k)\, \mathbb{P}(N_2(D_{\alpha_2}(t)) = n)\, \mathbb{I}_{k \ge 0} \\ &\quad + \sum_{n=0}^{\infty} \mathbb{P}(N_1(D_{\alpha_1}(t)) = n)\, \mathbb{P}(N_2(D_{\alpha_2}(t)) = n + |k|)\, \mathbb{I}_{k < 0}. \end{aligned}$$

Using (17), the result follows.


In the next theorem, we discuss the governing differential-difference equation of the marginal pmf of SFSP.

**Theorem 4.** *The marginal distribution $H_k(t) = \mathbb{P}(S_{\alpha_1,\alpha_2}(t) = k)$ of the SFSP satisfies the following difference-differential equations*

$$\frac{d}{dt}H\_k(t) = -\lambda\_1^{a\_1}(1-B)^{a\_1}H\_k(t) - \lambda\_2^{a\_2}(1-F)^{a\_2}H\_k(t), \; k \in \mathbb{Z} \tag{49}$$

$$\frac{d}{dt}H\_0(t) = -\lambda\_1^{a\_1}H\_0(t) - \lambda\_2^{a\_2}H\_1(t),\tag{50}$$

*with initial conditions $H_0(0) = 1$ and $H_k(0) = 0$ for $k \neq 0$.*

**Proof.** The proof follows by using the pgf.

**Remark 9.** *The mgf of the SFSP solves the differential equation*

$$\frac{dM_{\theta}(t)}{dt} = -M_{\theta}(t)\left(\lambda_1^{\alpha_1}(1-e^{\theta})^{\alpha_1} + \lambda_2^{\alpha_2}(1-e^{-\theta})^{\alpha_2}\right).\tag{51}$$

**Proposition 12.** *The Lévy density $\nu_{S_{\alpha_1,\alpha_2}}(x)$ of the SFSP is given by*

$$\nu_{S_{\alpha_1,\alpha_2}}(x) = \lambda_1^{\alpha_1} \sum_{n_1=1}^{\infty} (-1)^{n_1+1} \binom{\alpha_1}{n_1} \delta_{n_1}(x) + \lambda_2^{\alpha_2} \sum_{n_2=1}^{\infty} (-1)^{n_2+1} \binom{\alpha_2}{n_2} \delta_{-n_2}(x).$$

**Proof.** Substituting the Lévy densities $\nu_{N_{\alpha_1}}(x)$ and $\nu_{N_{\alpha_2}}(x)$ of $N_1(D_{\alpha_1}(t))$ and $N_2(D_{\alpha_2}(t))$, respectively, from Equation (24), we obtain

$$\nu\_{\mathcal{S}\_{a\_1 a\_2}}(\mathbf{x}) = \nu\_{\mathcal{N}\_{a\_1}}(\mathbf{x}) + \nu\_{\mathcal{N}\_{a\_2}}(\mathbf{x}),$$

which gives the desired result.

#### *7.2. Tempered Space-Fractional Skellam Process (TSFSP)*

In this section, we present the tempered space-fractional Skellam process (TSFSP). We discuss the corresponding fractional difference-differential equations, marginal pmfs, and moments of this process.

**Definition 7** (TSFSP)**.** *The TSFSP is obtained by taking the difference of two independent tempered space fractional Poisson processes. Let Dα*1,*μ*<sup>1</sup> (*t*)*, Dα*2,*μ*<sup>2</sup> (*t*) *be two independent TSS (see [28]) and N*1(*t*), *N*2(*t*) *be two independent Poisson processes that are independent of TSS. Subsequently, the stochastic process*

$$S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(t) = N_1(D_{\alpha_1,\mu_1}(t)) - N_2(D_{\alpha_2,\mu_2}(t))$$

*is called the TSFSP.*

**Theorem 5.** *The pmf $H_k^{\mu_1,\mu_2}(t) = \mathbb{P}(S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(t) = k)$ is given by*

$$\begin{split} H_k^{\mu_1,\mu_2}(t) &= \sum_{n=0}^{\infty} \frac{(-1)^k}{n!(n+k)!}\, e^{t(\mu_1^{\alpha_1} + \mu_2^{\alpha_2})} \left( \sum_{m=0}^{\infty} \frac{\mu_1^m \lambda_1^{-m}}{m!}\, {}_1\Psi_1 \left[ \begin{matrix} (1, \alpha_1); \\ (1-n-k-m, \alpha_1); \end{matrix}\; -\lambda_1^{\alpha_1} t \right] \right) \\ &\quad \times \left( \sum_{l=0}^{\infty} \frac{\mu_2^{l} \lambda_2^{-l}}{l!}\, {}_1\Psi_1 \left[ \begin{matrix} (1, \alpha_2); \\ (1-n-l, \alpha_2); \end{matrix}\; -\lambda_2^{\alpha_2} t \right] \right) \end{split} \tag{52}$$

*when $k \ge 0$, and similarly for $k < 0$,*

$$\begin{split} H_k^{\mu_1,\mu_2}(t) &= \sum_{n=0}^{\infty} \frac{(-1)^{|k|}}{n!(n+|k|)!}\, e^{t(\mu_1^{\alpha_1} + \mu_2^{\alpha_2})} \left( \sum_{m=0}^{\infty} \frac{\mu_1^m \lambda_1^{-m}}{m!}\, {}_1\Psi_1 \left[ \begin{matrix} (1, \alpha_1); \\ (1-n-m, \alpha_1); \end{matrix}\; -\lambda_1^{\alpha_1} t \right] \right) \\ &\quad \times \left( \sum_{l=0}^{\infty} \frac{\mu_2^{l} \lambda_2^{-l}}{l!}\, {}_1\Psi_1 \left[ \begin{matrix} (1, \alpha_2); \\ (1-n-l-|k|, \alpha_2); \end{matrix}\; -\lambda_2^{\alpha_2} t \right] \right). \end{split} \tag{53}$$

**Proof.** Because $N_1(D_{\alpha_1,\mu_1}(t))$ and $N_2(D_{\alpha_2,\mu_2}(t))$ are independent,

$$\begin{aligned} \mathbb{P}\left(S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(t) = k\right) &= \sum_{n=0}^{\infty} \mathbb{P}(N_1(D_{\alpha_1,\mu_1}(t)) = n+k)\, \mathbb{P}(N_2(D_{\alpha_2,\mu_2}(t)) = n)\, \mathbb{I}_{k \ge 0} \\ &\quad + \sum_{n=0}^{\infty} \mathbb{P}(N_1(D_{\alpha_1,\mu_1}(t)) = n)\, \mathbb{P}(N_2(D_{\alpha_2,\mu_2}(t)) = n+|k|)\, \mathbb{I}_{k < 0}, \end{aligned}$$

which gives the marginal pmf of the TSFSP upon substituting the marginal pmf of the TSFPP given in (26).

**Remark 10.** *We used the pmf of the TSFPP to calculate the marginal distribution of the TSFSP. The mgf is obtained using a conditioning argument. Let $f_{\alpha,\mu}(x, t)$ be the density function of $D_{\alpha,\mu}(t)$. Subsequently,*

$$\mathbb{E}[e^{\theta N(D\_{a,\mu}(t))}] = \int\_0^\infty \mathbb{E}[e^{\theta N(u)}] f\_{a,\mu}(u,t) du = e^{-t\left\{(\lambda(1-e^{\theta})+\mu)^a - \mu^a\right\}}.\tag{54}$$

*Using* (54)*, the mgf of TSFSP is*

$$\begin{split} \mathbb{E}\left[e^{\theta S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(t)}\right] &= \mathbb{E}\left[e^{\theta N_1(D_{\alpha_1,\mu_1}(t))}\right] \mathbb{E}\left[e^{-\theta N_2(D_{\alpha_2,\mu_2}(t))}\right] \\ &= e^{-t\left[\{(\lambda_1(1-e^{\theta})+\mu_1)^{\alpha_1}-\mu_1^{\alpha_1}\}+\{(\lambda_2(1-e^{-\theta})+\mu_2)^{\alpha_2}-\mu_2^{\alpha_2}\}\right]}. \end{split}$$

**Remark 11.** *We have* $\mathbb{E}[S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(t)] = t(\lambda_1\alpha_1\mu_1^{\alpha_1-1} - \lambda_2\alpha_2\mu_2^{\alpha_2-1})$. *Further, the covariance of the TSFSP can be obtained by using* (29) *and*

$$\begin{aligned} \mathrm{Cov}\left[S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(t), S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}(s)\right] &= \mathrm{Cov}\left[N_{1}(D_{\alpha_1,\mu_1}(t)), N_{1}(D_{\alpha_1,\mu_1}(s))\right] + \mathrm{Cov}\left[N_{2}(D_{\alpha_2,\mu_2}(t)), N_{2}(D_{\alpha_2,\mu_2}(s))\right] \\ &= \mathrm{Var}\left(N_{1}(D_{\alpha_1,\mu_1}(\min(t,s)))\right) + \mathrm{Var}\left(N_{2}(D_{\alpha_2,\mu_2}(\min(t,s)))\right). \end{aligned}$$

**Proposition 13.** *The Lévy density $\nu_{S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}}(x)$ of the TSFSP is given by*

$$\begin{split} \nu_{S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}}(x) &= \sum_{n_1=1}^{\infty} \mu_1^{\alpha_1-n_1} \binom{\alpha_1}{n_1} \lambda_1^{n_1} \sum_{l_1=1}^{n_1} \binom{n_1}{l_1} (-1)^{l_1+1} \delta_{l_1}(x) \\ &\quad + \sum_{n_2=1}^{\infty} \mu_2^{\alpha_2-n_2} \binom{\alpha_2}{n_2} \lambda_2^{n_2} \sum_{l_2=1}^{n_2} \binom{n_2}{l_2} (-1)^{l_2+1} \delta_{-l_2}(x), \ \mu_1, \mu_2 > 0. \end{split}$$

**Proof.** The proof follows by adding the Lévy densities $\nu_{N_{\alpha_1,\mu_1}}(x)$ and $\nu_{N_{\alpha_2,\mu_2}}(x)$ of $N_1(D_{\alpha_1,\mu_1}(t))$ and $N_2(D_{\alpha_2,\mu_2}(t))$, respectively, from Equation (30), which leads to

$$\nu_{S_{\alpha_1,\alpha_2}^{\mu_1,\mu_2}}(x) = \nu_{N_{\alpha_1,\mu_1}}(x) + \nu_{N_{\alpha_2,\mu_2}}(x).$$

#### *7.3. Simulation of SFSP and TSFSP*

We present algorithms to simulate sample trajectories of the SFSP and the TSFSP. We use *Python 3.7* and its libraries *Numpy* and *Matplotlib* for simulation. It is worth mentioning that Python is open source and freely available.

**Simulation of SFSP:** fix the values of the parameters *α*1, *α*2, *λ*<sup>1</sup> and *λ*2;

**Step-1:** generate independent and uniformly distributed random vectors *U*, *V* of size 1000 each in the interval [0, 1];

**Step-2:** generate the increments of the $\alpha_1$-stable subordinator $D_{\alpha_1}(t)$ (see [29]) with pdf $f_{\alpha_1}(x, t)$, using the relationship $D_{\alpha_1}(t + dt) - D_{\alpha_1}(t) \stackrel{d}{=} D_{\alpha_1}(dt) \stackrel{d}{=} (dt)^{1/\alpha_1} D_{\alpha_1}(1)$, where

$$D\_{\mathfrak{a}\_1}(1) = \frac{\sin(\mathfrak{a}\_1 \pi \mathcal{U}) [\sin((1-\mathfrak{a}\_1)\pi \mathcal{U})]^{1/\mathfrak{a}\_1 - 1}}{[\sin(\pi \mathcal{U})]^{1/\mathfrak{a}\_1} |\log V|^{1/\mathfrak{a}\_1 - 1}};$$

**Step-3:** generate the increments of the Poisson distributed rvs *N*1(*Dα*<sup>1</sup> (*dt*)) with parameter $\lambda_1 (dt)^{1/\alpha_1} D_{\alpha_1}(1)$;

**Step-4:** cumulative sum of increments gives the space fractional Poisson process *N*1(*Dα*<sup>1</sup> (*t*)) sample trajectories; and,

**Step-5:** similarly generate *N*2(*Dα*<sup>2</sup> (*t*)) and subtract these to obtain the SFSP *Sα*1,*α*<sup>2</sup> (*t*).
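The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code; the function names and the grid parameters (`dt`, `n`, the seed) are our own choices.

```python
import numpy as np

def stable_increments(alpha, dt, n, rng):
    """Increments of an alpha-stable subordinator on a grid of step dt,
    using D(dt) =d (dt)^(1/alpha) * D(1) and the formula for D(1)."""
    U = rng.uniform(size=n)
    V = rng.uniform(size=n)
    D1 = (np.sin(alpha * np.pi * U)
          * np.sin((1 - alpha) * np.pi * U) ** (1 / alpha - 1)
          / (np.sin(np.pi * U) ** (1 / alpha)
             * np.abs(np.log(V)) ** (1 / alpha - 1)))
    return dt ** (1 / alpha) * D1

def sfsp_path(alpha1, alpha2, lam1, lam2, dt=0.01, n=1000, seed=1):
    """Sample path of the SFSP S_{alpha1,alpha2}(t) on n grid points."""
    rng = np.random.default_rng(seed)
    # Steps 1-4: subordinator increments -> Poisson increments -> cumulative sum
    N1 = np.cumsum(rng.poisson(lam1 * stable_increments(alpha1, dt, n, rng)))
    N2 = np.cumsum(rng.poisson(lam2 * stable_increments(alpha2, dt, n, rng)))
    # Step 5: difference of the two space fractional Poisson processes
    return N1 - N2
```

A call such as `sfsp_path(0.6, 0.9, 6, 10)` returns an integer-valued trajectory that can be plotted with Matplotlib as in Figure 1.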

We next present the algorithm for generating the sample trajectories of the TSFSP.

**Simulation of TSFSP:** fix the values of the parameters *α*1, *α*2, *λ*1, *λ*2, *μ*<sup>1</sup> and *μ*2.

Use the first two steps of the previous algorithm to generate the increments of the *α*1-stable subordinator *Dα*<sup>1</sup> (*t*).

**Step-3**: to generate the increments of the TSS *Dα*1,*μ*<sup>1</sup> (*t*) with pdf *fα*1,*μ*<sup>1</sup> (*x*, *t*), we use the "acceptance-rejection method": propose an increment *x* of the *α*1-stable subordinator (as in Step-2) and accept it with probability $f_{\alpha_1,\mu_1}(x, dt)/\big(c\, f_{\alpha_1}(x, dt)\big) = e^{-\mu_1 x}$ for $c = e^{\mu_1^{\alpha_1} dt}$; this ratio is bounded between 0 and 1;

**Step-4**: generate the Poisson distributed rv *N*1(*Dα*1,*μ*<sup>1</sup> (*dt*)) with parameter *λ*1*Dα*1,*μ*<sup>1</sup> (*dt*);

**Step-5**: cumulative sum of increments gives the tempered space fractional Poisson process *N*1(*Dα*1,*μ*<sup>1</sup> (*t*)) sample trajectories; and,

**Step-6**: similarly generate *N*2(*Dα*2,*μ*<sup>2</sup> (*t*)), then take the difference of these to get the sample paths of the TSFSP $S^{\mu_1,\mu_2}_{\alpha_1,\alpha_2}(t)$.
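The acceptance-rejection step can be sketched as follows. This is our own illustrative helper (name and parameters are our choices), using the stable proposal of Step-2 and the acceptance probability $e^{-\mu_1 x}$ from Step-3.

```python
import numpy as np

def tss_increment(alpha, mu, dt, rng):
    """One increment of the tempered stable subordinator D_{alpha,mu}(dt):
    propose an alpha-stable increment x and accept it with probability
    exp(-mu * x) (acceptance-rejection, cf. Step-3)."""
    while True:
        U, V = rng.uniform(size=2)
        x = dt ** (1 / alpha) * (
            np.sin(alpha * np.pi * U)
            * np.sin((1 - alpha) * np.pi * U) ** (1 / alpha - 1)
            / (np.sin(np.pi * U) ** (1 / alpha)
               * np.abs(np.log(V)) ** (1 / alpha - 1)))
        if rng.uniform() <= np.exp(-mu * x):
            return x
```

The acceptance rate is $e^{-\mu^{\alpha} dt}$, close to one for small *dt*, so the loop rarely repeats.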

The tail probability of the *α*-stable subordinator behaves asymptotically as (see, e.g., [30])

$$\mathbb{P}(D\_{\alpha}(t) > x) \sim \frac{t}{\Gamma(1-\alpha)} x^{-\alpha}, \text{ as } x \to \infty.$$

For *α*<sup>1</sup> = 0.6, *α*<sup>2</sup> = 0.9 and fixed *t*, the rv *Dα*<sup>1</sup> (*t*) is more likely to take large values than the rv *Dα*<sup>2</sup> (*t*). Thus, for the same intensity parameter *λ* of the Poisson process, the process *N*(*Dα*<sup>1</sup> (*t*)) will generally have more arrivals than the process *N*(*Dα*<sup>2</sup> (*t*)) up to time *t*. This is evident from the trajectories of the SFSP in Figure 1, which are biased towards the positive side. The TSFPP is a finite-mean process, whereas the SFPP is an infinite-mean process; hence, the SFSP paths are expected to have large jumps, since there can be a large number of arrivals in any interval.

**Figure 1.** The left hand figure shows the sample trajectories of SFSP with parameters *α*<sup>1</sup> = 0.6, *α*<sup>2</sup> = 0.9, *λ*<sup>1</sup> = 6 and *λ*<sup>2</sup> = 10. The sample trajectories of TSFSP are shown in the right figure with parameters *α*<sup>1</sup> = 0.6, *α*<sup>2</sup> = 0.9, *λ*<sup>1</sup> = 6, *λ*<sup>2</sup> = 10, *μ*<sup>1</sup> = 0.2 and *μ*<sup>2</sup> = 0.5.

**Author Contributions:** Conceptualization, N.G., A.K. and N.L.; Methodology, A.K. and N.L.; Simulation, N.G.; Writing-Original Draft Preparation, N.G., A.K. and N.L.; Writing-Review & Editing, N.G., A.K. and N.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** N.G. would like to thank Council of Scientific and Industrial Research (CSIR), India for supporting her research under the fellowship award number 09/1005(0021)2018-EMR-I. Further, A.K. would like to express his gratitude to Science and Engineering Research Board (SERB), India for the financial support under the MATRICS research grant MTR/2019/000286.

**Acknowledgments:** N.G. would like to thank the Council of Scientific and Industrial Research (CSIR), India, for the award of a research fellowship.

**Conflicts of Interest:** The authors declare no conflicts of interest.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **A Simplified Fractional Order PID Controller's Optimal Tuning: A Case Study on a PMSM Speed Servo**

**Weijia Zheng 1, Ying Luo 2,\*, YangQuan Chen <sup>3</sup> and Xiaohong Wang <sup>4</sup>**


**Abstract:** A simplified fractional order PID (FOPID) controller is proposed by the suitable definition of the parameter relation with the optimized changeable coefficient. The number of the pending controller parameters is reduced, but all the proportional, integral, and derivative components are kept. The estimation model of the optimal relation coefficient between the controller parameters is established, according to which the optimal FOPID controller parameters can be calculated analytically. A case study is provided, focusing on the practical application of the simplified FOPID controller to a permanent magnet synchronous motor (PMSM) speed servo. The dynamic performance of the simplified FOPID control system is tested by motor speed control simulation and experiments. Comparisons are performed between the control systems using the proposed method and those using some other existing methods. According to the simulation and experimental results, the simplified FOPID control system achieves the optimal dynamic performance. Therefore, the validity of the proposed controller structure and tuning method is demonstrated.

**Keywords:** fractional order PID control; PMSM; frequency-domain control design; optimal tuning

#### **1. Introduction**

Recently, fractional calculus has attracted increasing interest in various fields of science and engineering [1–4]. Fractional calculus is a generalization of the traditional integral and differential operators from integer order to real number order [5–8]. Thus, it has a larger feasible scope and greater flexibility in the system modeling and controller design methodology than the classical integer order one [9–11]. Fractional control has aroused theoretical and practical interest in the control community. Different kinds of fractional order controllers and tuning methods have been introduced and studied [12–14].

The fractional order proportional-integral-derivative (FOPID) controller has tunable integral and differential orders, creating the possibility to provide better control performance [15]. However, the design of the FOPID controller is also more difficult. Generally, the tuning methods of the FOPID controller can mainly be divided into the analytic design methods and the optimization methods. The classic frequency-domain method is a typical analytic design method for the FOPI/D controller. Applying this method, three equations can be derived from three frequency-domain specifications [16], according to which the controller parameters can be calculated. However, with only three specifications, this method may not be directly used to design the FOPID controller with five degrees of freedom. On the other hand, the optimization design methods are based on iterative optimization [17,18]. Applying the optimization methods, the FOPID controller parameters are obtained by optimizing an objective function characterizing the performance of the control system, under the constraints corresponding to specific design requirements, such as the system stability and sensitivity [19]. Thus, an optimal FOPID controller can be obtained using the optimization method, but the optimization process requires sufficient time and computing capability.

**Citation:** Zheng, W.; Luo, Y.; Chen, Y.; Wang, X. A Simplified Fractional Order PID Controller's Optimal Tuning: A Case Study on a PMSM Speed Servo. *Entropy* **2021**, *23*, 130. https://doi.org/10.3390/e23020130

Received: 27 November 2020; Accepted: 14 January 2021; Published: 20 January 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In our previous work, an analytic design method was proposed for the FOPID controller, according to the linear relation between the controller parameters [20]. On this basis, an improved FOPID controller is proposed in this paper, building the nonlinear relation between the integral gain *Ki* and derivative gain *Kd*, with a changeable coefficient. The optimal coefficient is modeled using the numerical fitting method, based on its optimal distribution with regard to the plant model characteristics and design specifications. With the estimated model, the parameters of the optimal FOPID controller can be calculated analytically according to the design specifications. Compared with our previous work, the improved FOPID controller proposed in this paper can be applied to a larger scope of plant models and design specifications because a more sophisticated relation between the controller parameters is adopted.

A case study of the proposed controller on the PMSM speed control is provided. The robustness to the gain variations, step response performance, and anti-load disturbance performance of the FOPID control system are tested by simulations and experiments. Comparisons are performed between the control systems using the proposed controller and those using some existing FOPID controllers. The advantages of the proposed method are demonstrated by simulation and experimental results.

The contributions of this paper mainly include: (1) the relations among the FOPID controller parameters are reasonably defined with a changeable coefficient, yielding a simplified FOPID controller structure while keeping complete proportional, integral, and derivative tunability; (2) the estimation model of the optimal relation coefficient between the controller parameters is built, enabling the optimal estimation of the fractional orders and the subsequent analytical calculation of the remaining controller parameters.

The paper is organized as follows: The simplified FOPID controller and the corresponding tuning method are proposed in Section 2. The estimation model of the optimal relation coefficient is discussed and established in Section 3. In Sections 4 and 5, the application of the improved FOPID controller to the PMSM speed control is studied. The robustness and dynamic performance of the control system using the simplified FOPID controller are verified by simulations and experiments. The conclusion is presented in Section 6.

#### **2. Simplified FOPID Controller**

The FOPID controller can be represented as (1),

$$\mathcal{C}(s) = K\_p \left( 1 + \frac{K\_i}{s^{\lambda}} + K\_d s^{\mu} \right), \tag{1}$$

where *Kp*, *Ki*, and *Kd* represent the gains of the proportional, integral, and derivative components, respectively; *λ* and *μ* are the real number orders with 0 < *λ* < 2 and 0 < *μ* < 2.

The typical unit negative feedback control system can be represented as Figure 1, where *G*(*s*) and *C*(*s*) are the plant and controller, respectively, and *nr* and *n* are the reference and output signals, respectively. The classic frequency-domain method depends on three specifications, i.e., the gain crossover frequency *ωc*, the phase margin *ϕm*, and the slope of the phase at *ω<sup>c</sup>* [21], yielding,

$$|\mathcal{C}(j\omega_c)G(j\omega_c)| = 1,\tag{2}$$

$$\text{Arg}[\mathcal{C}(j\omega_c)] + \text{Arg}[G(j\omega_c)] = -\pi + \varphi_m,\tag{3}$$

$$\left. \frac{d\left[ \text{Arg}\left[ \mathcal{C}(j\omega)G(j\omega) \right] \right]}{d\omega} \right|_{\omega = \omega_c} = 0.\tag{4}$$

Therefore, the parameters of the FOPI or FOPD controllers can be calculated according to these specifications. However, five pending parameters of the FOPID controller cannot be solved according to only three equations.

To solve this problem, a relation between *Ki* and *Kd* is proposed as (5),

$$K\_d = \frac{1}{aK\_i},\tag{5}$$

where *a* is a changeable coefficient. The dynamic characteristics of the FOPID controller, e.g., the overshoot and oscillation of the step response, are affected by the fractional orders *λ* and *μ*. Taking advantage of a simple assumption [22], a relation between *λ* and *μ* is proposed as (6),

$$
\lambda = \mu. \tag{6}
$$

Thus, the FOPID controller is converted into a simplified form,

$$\mathcal{C}(s) = K\_p \left( 1 + \frac{K\_i}{s^{\lambda}} + \frac{1}{aK\_i} s^{\lambda} \right). \tag{7}$$

The amplitude and phase of the simplified FOPID controller can be obtained,

$$|\mathbb{C}(j\omega)| = K\_p \sqrt{P(\omega)^2 + Q(\omega)^2},\tag{8}$$

$$\text{Arg}[\mathcal{C}(j\omega)] = \arctan\left(\frac{\mathcal{Q}(\omega)}{P(\omega)}\right),\tag{9}$$

where:

$$P(\omega) = 1 + K\_i \omega^{-\lambda} \cos\left(\frac{\pi}{2}\lambda\right) + \frac{1}{aK\_i} \omega^{\lambda} \cos\left(\frac{\pi}{2}\lambda\right),\tag{10}$$

$$Q(\omega) = \frac{1}{aK_i} \omega^{\lambda} \sin\left(\frac{\pi}{2}\lambda\right) - K_i \omega^{-\lambda} \sin\left(\frac{\pi}{2}\lambda\right). \tag{11}$$
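Equations (8)–(11) translate directly into code. The following helper (our own, not from the paper) evaluates the controller's magnitude and phase; `arctan2` is used instead of `arctan`, which agrees with Equation (9) whenever P(ω) > 0, as is the case for the parameter ranges considered here.

```python
import numpy as np

def fopid_response(w, Kp, Ki, lam, a):
    """|C(jw)| and Arg[C(jw)] of the simplified FOPID controller (7),
    computed via P(w) and Q(w) of Eqs. (10)-(11)."""
    c, s = np.cos(np.pi * lam / 2), np.sin(np.pi * lam / 2)
    P = 1 + Ki * w ** (-lam) * c + w ** lam * c / (a * Ki)
    Q = w ** lam * s / (a * Ki) - Ki * w ** (-lam) * s
    return Kp * np.hypot(P, Q), np.arctan2(Q, P)
```

The result can be cross-checked against a direct complex evaluation of C(jω), since (jω)^λ = ω^λ e^{jλπ/2} for ω > 0.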

**Figure 1.** The closed-loop control system.

If *ω<sup>c</sup>* and *ϕ<sup>m</sup>* are given as the design specifications, substituting (9) into (3) yields,

$$\arctan\left(\frac{Q(\omega_c)}{P(\omega_c)}\right) + \text{Arg}[G(j\omega_c)] = -\pi + \varphi_m.\tag{12}$$

Assuming that the coefficient *a* has been determined, denoting *T* as tan(−*π* + *ϕ<sup>m</sup>* − Arg[*G*(*jωc*)]), an equation relating *Ki* and *λ* can be obtained,

$$s\_1 K\_i^2 + s\_0 K\_i - \frac{1}{a} = 0,\tag{13}$$

where:

$$s_1 = \frac{T\omega_c^{-\lambda}\cos\left(\frac{\pi}{2}\lambda\right) + \omega_c^{-\lambda}\sin\left(\frac{\pi}{2}\lambda\right)}{\omega_c^{\lambda}\sin\left(\frac{\pi}{2}\lambda\right) - T\omega_c^{\lambda}\cos\left(\frac{\pi}{2}\lambda\right)},\tag{14}$$

$$s_0 = \frac{T}{\omega_c^{\lambda}\sin\left(\frac{\pi}{2}\lambda\right) - T\omega_c^{\lambda}\cos\left(\frac{\pi}{2}\lambda\right)}.\tag{15}$$

Substituting (9) into (4), another equation about *Ki* and *λ* is obtained,

$$\frac{\lambda \omega_c^{\lambda-1}}{aK_i} \sin\left(\frac{\pi}{2}\lambda\right) + \frac{2\lambda}{a\omega_c} \sin(\lambda\pi) + \lambda \omega_c^{-\lambda-1} K_i \sin\left(\frac{\pi}{2}\lambda\right) + \frac{M K_i^2}{\omega_c^{2\lambda}} + \frac{M\omega_c^{2\lambda}}{a^2 K_i^2} + \frac{2M}{a} \cos(\lambda\pi)$$

$$+ \frac{2M\omega_c^{\lambda}}{aK_i} \cos\left(\frac{\pi}{2}\lambda\right) + \frac{2M K_i}{\omega_c^{\lambda}} \cos\left(\frac{\pi}{2}\lambda\right) + M = 0, \tag{16}$$

where:

$$M = \left.\frac{d[\text{Arg}[G(j\omega)]]}{d\omega}\right|_{\omega = \omega_c}.\tag{17}$$

The integral gain *Ki* and order *λ* can be calculated by solving (13) and (16), and then, the proportional gain *Kp* can also be calculated by solving (2). Thus, if *a* is determined, all the parameters of the simplified FOPID controller can be calculated according to the design specifications.
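The whole tuning procedure can be prototyped as below. This is our own sketch under stated assumptions, not the authors' implementation: the plant is supplied as a callable returning G(s) for complex s, the phase slope M is obtained by numerical differentiation, and the coupled Equations (13) and (16) are solved by scanning λ on a grid and keeping the pair with the smallest residual of (16).

```python
import numpy as np

def design_fopid(G, a, wc, phim_deg, dw=1e-4):
    """Solve (13) and (16) for (Ki, lambda) by a scan over lambda, then
    get Kp from the magnitude condition (2).  Returns (Kp, Ki, lambda)."""
    phim = np.deg2rad(phim_deg)
    T = np.tan(-np.pi + phim - np.angle(G(1j * wc)))
    # M: slope of the plant phase at wc (central difference), Eq. (17)
    M = (np.angle(G(1j * (wc + dw))) - np.angle(G(1j * (wc - dw)))) / (2 * dw)
    best = None
    for lam in np.linspace(0.05, 1.95, 3801):
        c, s = np.cos(np.pi * lam / 2), np.sin(np.pi * lam / 2)
        den = wc ** lam * (s - T * c)
        if abs(den) < 1e-12:
            continue
        s1 = wc ** (-lam) * (T * c + s) / den          # Eq. (14)
        s0 = T / den                                   # Eq. (15)
        disc = s0 ** 2 + 4 * s1 / a                    # discriminant of (13)
        if disc < 0 or abs(s1) < 1e-15:
            continue
        for Ki in ((-s0 + np.sqrt(disc)) / (2 * s1),
                   (-s0 - np.sqrt(disc)) / (2 * s1)):
            if Ki <= 0:
                continue
            # residual of Eq. (16)
            r = (lam * wc ** (lam - 1) / (a * Ki) * s
                 + 2 * lam / (a * wc) * np.sin(lam * np.pi)
                 + lam * Ki * wc ** (-lam - 1) * s
                 + M * Ki ** 2 * wc ** (-2 * lam)
                 + M * wc ** (2 * lam) / (a ** 2 * Ki ** 2)
                 + 2 * M / a * np.cos(lam * np.pi)
                 + 2 * M * wc ** lam / (a * Ki) * c
                 + 2 * M * Ki * wc ** (-lam) * c + M)
            if best is None or abs(r) < best[0]:
                best = (abs(r), lam, Ki)
    _, lam, Ki = best
    Cnorm = 1 + Ki * (1j * wc) ** (-lam) + (1j * wc) ** lam / (a * Ki)
    Kp = 1.0 / (abs(Cnorm) * abs(G(1j * wc)))          # from Eq. (2)
    return Kp, Ki, lam
```

By construction, the returned controller satisfies the magnitude condition (2) exactly and the phase condition (12) up to the arctangent branch.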

#### **3. Estimation Model Establishment**

According to the proposed tuning method, the coefficient *a* should be determined before the calculation of the FOPID controller parameters. Thus, in order to improve the control performance, the distribution of the optimal *a* should be studied. In this paper, we concentrate on the third-order plant model described by (18),

$$G(s) = \frac{K}{s^3 + \tau\_1 s^2 + \tau\_2 s},\tag{18}$$

where *K*, *τ*1, and *τ*<sup>2</sup> are the parameters of the plant. The estimation model of *a* is established in the hyperspace defined by the ranges of the plant model parameters (*τ*1, *τ*2) and the design specifications (*ωc*, *ϕm*). The ranges of *τ*<sup>1</sup> and *τ*<sup>2</sup> are determined according to the parameters of the plant models in actual applications, while those of *ω<sup>c</sup>* and *ϕ<sup>m</sup>* are determined according to the design requirements. In this paper, the range of *τ*<sup>1</sup> is set from 90 to 180 and that of *τ*<sup>2</sup> is set from 6000 to 11,000. The range of the gain crossover frequency *ω<sup>c</sup>* is set from 35 rad/s to 70 rad/s, and that of the phase margin *ϕ<sup>m</sup>* is set from 30◦ to 60◦, covering the design requirements of a class of motion control systems [23].

#### *3.1. Optimal Samples' Collection*

Several values of *τ*<sup>1</sup> and *τ*<sup>2</sup> are uniformly selected from their ranges, respectively, obtaining (*τ*1,1, *τ*1,2, ..., *τ*1,*m*) and (*τ*2,1, *τ*2,2, ..., *τ*2,*n*). Since the plant model gain *K* has no influence on the estimation of *a*, it is given a fixed value. Thus, several test models can be established by combining the values of *τ*<sup>1</sup> and *τ*2,

$$G_{i,j}(s) = \frac{K}{s^3 + \tau_{1,i}s^2 + \tau_{2,j}s},\tag{19}$$

where *i* = 1, 2, ..., *m*, *j* = 1, 2, ..., *n*. Similarly, several values of *ω<sup>c</sup>* (*ωc*,1, *ωc*,2, ..., *ωc*,*p*) and *ϕ<sup>m</sup>* (*ϕm*,1, *ϕm*,2, ..., *ϕm*,*q*) are selected from their ranges to be the given design specifications.

The integral of time and absolute error (ITAE) is adopted as the loss function to evaluate the dynamic performance of the control system,

$$J = \int\_0^\infty t|e(t)|dt,\tag{20}$$

where *e*(*t*) represents the error between the reference and output signals.
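In discrete time, the ITAE integral (20) is approximated from the sampled error signal; a minimal helper (our own, using the trapezoidal rule) is:

```python
import numpy as np

def itae(t, e):
    """Discrete approximation of J = int t*|e(t)| dt (trapezoidal rule)."""
    f = t * np.abs(e)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)
```

For instance, for e(t) = e^(-t) the integral equals 1, which the approximation reproduces on a sufficiently fine grid.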

The optimal sample of *a* for each test model (*τ*1, *τ*2) and design index (*ωc*, *ϕm*) is collected following the steps shown in Figure 2. An accuracy threshold *σ* is set for the search of the optimal *a*. If the value resolution of the obtained *a* is smaller than *σ*, this value is considered to be the optimum; otherwise, another loop of search needs to be performed in a smaller range of *a*. For example, as shown in Figure 3, if the *k*th value of *a*, *ak*, is the current optimal value, but its resolution is larger than *σ*, namely *ak*<sup>+</sup><sup>1</sup> − *ak* > *σ*, then a new range of *a* will be created as (*ak*−1, *ak*+1), in which a new optimum will be obtained.
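The search procedure of Figures 2 and 3 amounts to a grid search that repeatedly narrows the interval around the current best value until the grid resolution drops below σ. A generic sketch (our own; the loss J(a), i.e., the closed-loop ITAE obtained with coefficient a, is assumed to be supplied by the caller):

```python
def refine_search(J, lo, hi, n=20, sigma=1e-3):
    """Minimize J on [lo, hi]: evaluate an (n+1)-point grid, shrink the
    interval to the two cells around the best point, and repeat until the
    grid resolution is below the accuracy threshold sigma (cf. Figs. 2-3)."""
    while (hi - lo) / n > sigma:
        grid = [lo + i * (hi - lo) / n for i in range(n + 1)]
        k = min(range(n + 1), key=lambda i: J(grid[i]))
        lo, hi = grid[max(k - 1, 0)], grid[min(k + 1, n)]
    return (lo + hi) / 2
```

For a unimodal loss, each pass shrinks the interval by a factor of n/2, so only a handful of passes is needed even starting from the wide initial range [0.001, 500].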

**Figure 2.** The determining process of the optimal *a*.

**Figure 3.** The construction of the new range of *a*.

According to the model parameter ranges, several values of *τ*1: 90, 100, ..., 180, and *τ*2: 6000, 6200, ..., 11,000, are selected to generate the test models. Similarly, several values of *ωc*: 34 rad/s, 36 rad/s, ..., 70 rad/s and *ϕm*: 30◦, 32◦, ..., 60◦ are selected to be the design specifications. The initial range of *a* is from 0.001 to 500. The accuracy threshold *σ* is 0.001. Thus, following the steps shown in Figure 2, the optimal values of *a* corresponding to all the test models and design specifications are collected.

#### *3.2. Estimation Model Establishment*

Given the design specifications (*ωc*, *ϕm*), an optimal FOPID controller can be designed for a plant model *G*(*s*), according to an optimal value of *a*, which depends on the plant model characteristics (*τ*1, *τ*2) and design specifications (*ωc*, *ϕm*). The estimation model is established to approximate the distribution law of the optimal *a*.

Firstly, the distribution of the optimal *a* for a single plant model with regard to *ω<sup>c</sup>* and *ϕ<sup>m</sup>* is studied. Taking the test model *G*2,5(*s*) (*τ*<sup>1</sup> = 100, *τ*<sup>2</sup> = 6800) as an example, the optimal values of *a* corresponding to different given crossover frequencies *ω<sup>c</sup>* and a fixed phase margin *ϕ<sup>m</sup>* (*ϕ<sup>m</sup>* = 30◦) are selected and plotted as the *ωc*–*a* relation curve in Figure 4. According to Figure 4, the distribution of the optimal *a* can be approximated as a curve.

**Figure 4.** The *ωc*–*a* relation curve with *ϕm* fixed to be 30◦.

The *ωc*–*a* relation curves of different *ϕ<sup>m</sup>* for test model *G*2,5(*s*) are plotted in Figure 5. It can be seen that the *ωc*–*a* relation curves corresponding to different *ϕ<sup>m</sup>* are close to each other. Thus, an assumption is adopted to simplify the analysis, i.e., the difference between the *ωc*–*a* relation curves corresponding to different *ϕ<sup>m</sup>* can be ignored. Therefore, for the same plant model, the optimal value of *a* is assumed to be only determined by *ωc*.

Adopting the simplifying assumption, an estimation model needs to be built for the mean values of the optimal *a*. The *ωc*–mean *a* relation corresponding to *G*2,5(*s*) is plotted as data spots in Figure 6.

It can be seen that the mean *a* values with regard to *ω<sup>c</sup>* obey an obvious distribution law, which can be described by an exponential function,

$$a = A(\tau_1, \tau_2)e^{B(\tau_1, \tau_2)\omega_c},\tag{21}$$

where *A* and *B* are the coefficients determined by the model parameters *τ*<sup>1</sup> and *τ*2. The values of *A* and *B* can be obtained using the numerical fitting methods. The fitting function is plotted as the red curve in Figure 6. Fitting the *ωc*–mean *a* relations of all the plant models, the values of *A* and *B* corresponding to different plant models: *Ai*,*<sup>j</sup>* and *Bi*,*j*, are obtained, where the subscript *i* corresponds to that of *τ*1,*<sup>i</sup>* and the subscript *j* corresponds to that of *τ*2,*j*, *i* = 1, 2, ..., *m*, *j* = 1, 2, ..., *n*.
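Since Equation (21) is exponential in ω<sub>c</sub>, the fit reduces to linear least squares on log a; `fit_exponential` below is our own helper name for this step.

```python
import numpy as np

def fit_exponential(wc, a_mean):
    """Fit a = A*exp(B*wc) by linear least squares on log(a), Eq. (21)."""
    B, logA = np.polyfit(np.asarray(wc, float), np.log(a_mean), 1)
    return np.exp(logA), B
```

Given the ω<sub>c</sub>–mean a data spots of Figure 6, this returns the coefficients A and B of the red fitting curve.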

**Figure 5.** The *ωc*–*a* relation curves correspond to different *ϕm*.

**Figure 6.** The *ωc*–mean *a* relation and fitting curve of the test model *G*2,5(*s*).

Secondly, the relation between the coefficient *A* and the model parameters (*τ*1, *τ*2) is studied. Taking *τ*2/*τ*<sup>1</sup> as the abscissa and the corresponding coefficient *A* as the ordinate, the distribution of *A* with regard to *τ*2/*τ*<sup>1</sup> is plotted in Figure 7. As can be seen, the distribution of *A* with regard to *τ*2/*τ*<sup>1</sup> can be approximated as a curve.

**Figure 7.** The distribution of *A* with regard to *τ*2/*τ*1.

The *τ*2/*τ*1–*A* relation is plotted again in Figure 8, without distinguishing the data spots corresponding to different plant models. According to the distribution of the data spots, the *τ*2/*τ*1–*A* relation can be fitted by a model with two exponential functions,

$$A(\tau\_1, \tau\_2) = M e^{P \frac{\tau\_2}{\tau\_1}} + N e^{Q \frac{\tau\_2}{\tau\_1}},\tag{22}$$

where *M*, *N*, *P*, and *Q* are the model coefficients, which can be obtained using numerical fitting methods. The fitting function is plotted as the red curve in Figure 8.

**Figure 8.** The *τ*2/*τ*1–*A* relation and the fitting curve.

Thirdly, the three-dimensional distribution of coefficient *B* with regard to *τ*<sup>1</sup> and *τ*<sup>2</sup> is plotted in Figure 9.

**Figure 9.** The distribution of *B* with regard to *τ*<sup>1</sup> and *τ*2.

Taking *τ*<sup>1</sup> and *τ*<sup>2</sup> as the independent variables, the (*τ*1, *τ*2)–*B* relation can be fitted by a cubic polynomial function,

$$B(\tau\_1, \tau\_2) = p\_{00} + p\_{10}\tau\_2 + p\_{01}\tau\_1 + p\_{20}\tau\_2^2 + p\_{11}\tau\_2\tau\_1 + p\_{02}\tau\_1^2 + p\_{30}\tau\_2^3 + p\_{21}\tau\_2^2\tau\_1 + p\_{12}\tau\_2\tau\_1^2,\tag{23}$$

where *p*00, *p*10, *p*01, *p*20, *p*11, *p*02, *p*30, *p*21, and *p*<sup>12</sup> are the model coefficients, which can be obtained using the numerical fitting methods. Therefore, all the coefficients of the estimation model are obtained.
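Equation (23) is linear in its coefficients, so the fit is an ordinary least-squares solve over a design matrix of the cubic terms. A sketch with our own helper name `fit_B` (for the parameter magnitudes used here, with τ<sub>2</sub> up to 11,000, rescaling τ<sub>1</sub> and τ<sub>2</sub> before fitting improves conditioning):

```python
import numpy as np

def fit_B(tau1, tau2, B):
    """Least-squares fit of the cubic model (23); coefficients returned in
    the order p00, p10, p01, p20, p11, p02, p30, p21, p12."""
    t1 = np.asarray(tau1, float)
    t2 = np.asarray(tau2, float)
    X = np.column_stack([np.ones_like(t1), t2, t1, t2**2, t2 * t1, t1**2,
                         t2**3, t2**2 * t1, t2 * t1**2])
    p, *_ = np.linalg.lstsq(X, np.asarray(B, float), rcond=None)
    return p
```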

#### **4. Simulation Study**

#### *4.1. Feasible Region Study*

The design flexibility of the proposed FOPID controller can be verified by studying the feasible regions of the design specifications. The feasible region of the design specifications includes the (*ωc*, *ϕm*) combinations, according to which the reasonable FOPID controller can be obtained by solving (2)–(4). To demonstrate the advantage of the proposed method, the feasible region of the simplified FOPID controller is compared with those of the FOPI and IOPID controllers.

Taking the test model *G*1,26(*s*) (*τ*<sup>1</sup> = 90, *τ*<sup>2</sup> = 11,000) as an example, the feasible regions of the FOPI, IOPID, and FOPID controllers are plotted in Figures 10–12, respectively, where the feasible design specifications are marked in blue. According to Figure 10, if the design specifications are in the region where both *ω<sup>c</sup>* and *ϕ<sup>m</sup>* are large, we are unable to design an FOPI controller to satisfy (2)–(4) simultaneously. Similarly, according to Figure 11, we are unable to design an IOPID controller if both *ω<sup>c</sup>* and *ϕ<sup>m</sup>* are small. In contrast, according to Figure 12, the feasible region of the FOPID controller covers the entire region of the design specifications. Therefore, the proposed FOPID controller achieves more design options and flexibility than the FOPI and IOPID controllers.

**Figure 10.** The feasible region of the FOPI controller.

**Figure 11.** The feasible region of the IOPID controller.

**Figure 12.** The feasible region of the FOPID controller.

#### *4.2. PMSM Speed Servo Plant*

The proposed estimation model and tuning method are applied to design the FOPID controllers for a class of PMSM speed servo systems. Applying the *d* − *q* coordinates and the field-oriented control scheme, the dynamic characteristics of a PMSM can be described by the following equations,

$$u_q = Ri_q + L_q \frac{di_q}{dt} + C_e n, \tag{24}$$

$$\frac{GD^2}{375} \frac{dn}{dt} = C_m i_q - T_L, \tag{25}$$

where *uq* and *iq* are the *q*-axis voltage and current, respectively, *R* is the stator resistance, *Lq* is the *q*-axis stator inductance, *Ce* is the induced voltage constant, *n* is the motor speed in revolutions per minute (RPM), *Cm* is the torque constant, *TL* is the load disturbance torque, and *GD*<sup>2</sup> is the flywheel inertia.

In the PMSM servo system, the *q*-axis voltage is often supplied by the pulse-width modulation (PWM) inverter, whose dynamic characteristics can be approximated by a first-order filter with time constant *Ts*. Adopting a PI controller as the feedback controller of the *q*-axis current,

$$\mathcal{K}\_i(s) = K\_s(1 + \frac{1}{T\_{\rm s}s}),\tag{26}$$

the *q*-axis voltage can be obtained as:

$$u_q(s) = \frac{K_s}{T_s s}\big(i_{qr}(s) - i_q(s)\big),\tag{27}$$

where *iqr* is the *q*-axis reference current. Thus, according to (24), (25), and (27), the transfer function of the PMSM speed servo plant (from *iqr* to *n*) can be represented as:

$$G(s) = \frac{\frac{K_s}{C_e T_m T_s T_l}}{s^3 + \frac{1}{T_l}s^2 + \frac{K_s K_l}{R T_s T_l}s},\tag{28}$$

where *Tl* is the electromagnetic time constant, *Tl* = *L*/*R*, and *Tm* is the electromechanical time constant, *Tm* = *GD*2*R*/(375*CeCm*). The transfer function of the PMSM speed servo plant model used in this paper is described as:

$$G(s) = \frac{47,979.257}{s^3 + 127.38s^2 + 9995.678s}. \tag{29}$$

#### *4.3. Gain Robustness Study*

Taking the PMSM speed servo as the plant model, setting the design specifications as *ω<sup>c</sup>* = 40 rad/s and *ϕ<sup>m</sup>* = 55◦, the optimal coefficient *a* is estimated as 9.968. Thus, the FOPID controller is obtained,

$$C\_1(s) = 8.032 \left( 1 + \frac{13.207}{s^{0.983}} + 0.0076 s^{0.983} \right). \tag{30}$$

The open-loop Bode diagram of the PMSM servo system using the FOPID controller is shown in Figure 13. It can be seen that the magnitude and phase characteristics of the control system satisfy the design specifications. The phase characteristic has zero slope at *ωc*. Thus, the systems with gain variations will have similar phase margins as the nominal system.
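The stated specifications can be checked numerically from the plant (29) and controller (30); because the printed coefficients are rounded, the check holds only approximately.

```python
import numpy as np

# Nominal plant (29) and the designed controller C1(s) of Eq. (30)
G = lambda s: 47979.257 / (s**3 + 127.38 * s**2 + 9995.678 * s)
C1 = lambda s: 8.032 * (1 + 13.207 * s**(-0.983) + 0.0076 * s**0.983)

wc = 40.0
L = C1(1j * wc) * G(1j * wc)                 # open-loop response at wc
gain = abs(L)                                # ~1 at the crossover frequency
pm = np.degrees(np.angle(L)) + 180.0         # phase margin, ~55 degrees
```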

**Figure 13.** The open-loop Bode diagram of the control system.

The step response is performed to test the overshoots of the control systems with gain variations. The nominal gain of the plant is multiplied by 120% and 80% to simulate the gain variations. The step responses of the nominal system and those with gain variations are shown in Figure 14.

It can be seen that the responses of the control systems with gain variations have similar overshoots, satisfying the robustness requirement.

**Figure 14.** The step responses of the simplified FOPID control systems with different loop-gains (simulation).

#### *4.4. Comparisons with Some Existing Methods*

An optimization-based tuning method was proposed in [24], with the sensitivity and complementary sensitivity functions introduced as the constraints. Applying this method, an optimal FOPID controller is designed for the PMSM speed control system,

$$C\_3(s) = 8.896 \left( 1 + \frac{29.815}{s^{1.299}} + 0.0685s^{0.403} \right). \tag{31}$$

The gain crossover frequency of the obtained control system is *ω<sup>c</sup>* = 51.6 rad/s, and the phase margin is *ϕ<sup>m</sup>* = 50◦. According to these design specifications, the optimal coefficient *a* is estimated as 5.047, and the FOPID controller is obtained,

$$\mathcal{C}\_4(s) = 10.451 \left( 1 + \frac{21.017}{s^{0.991}} + 0.0094 s^{0.991} \right). \tag{32}$$

The step response simulation is performed, using the optimal FOPID controller *C*3(*s*) (denoted as opt-FOPID) and the proposed FOPID controller *C*4(*s*) (denoted as a-FOPID) as the speed controllers, respectively. To guarantee a fair comparison, the two systems are made to have similar rising times. The response curves and the performance indexes are shown in Figure 15 and Table 1, respectively.

The load disturbance response simulation is also performed to test the anti-load disturbance performance of the control systems. The response curves and performance indexes are shown in Figure 16 and Table 2, respectively.

**Table 1.** The step response performance indexes of the control systems using the optimal (opt)-FOPID and a-FOPID (simulation).


**Figure 15.** The step responses of the control systems using the opt-FOPID and a-FOPID (simulation).

**Figure 16.** The load disturbance responses of the control systems using the opt-FOPID and a-FOPID (simulation).

**Table 2.** The anti-load disturbance performance indexes of the control systems using the opt-FOPID and a-FOPID (simulation).


According to Figure 15 and Table 1, the responses of the two systems have similar overshoots, but the system using the a-FOPID has a shorter settling time. Therefore, the system using the a-FOPID achieves better step response performance. According to Figure 16 and Table 2, the response of the system using the a-FOPID has a smaller speed drop and a shorter recovery time. Therefore, the system using the a-FOPID achieves better anti-load disturbance performance.

A Bode shaping-based tuning method for the FOPID controller was proposed in [25]. Applying this method, an FOPID controller is designed for the PMSM control system,

$$C\_5(s) = 7.532 \left( 1 + \frac{49.843}{s^{1.27}} + 0.0604s^{0.556} \right). \tag{33}$$

The gain crossover frequency of the obtained control system is *ω<sup>c</sup>* = 41.5 rad/s, and the phase margin is *ϕ<sup>m</sup>* = 55.7◦. According to these design specifications, the optimal coefficient *a* is estimated as 9.128, and the FOPID controller is obtained,

$$\mathcal{C}\_6(s) = 8.362 \left( 1 + \frac{13.628}{s^{0.986}} + 0.008s^{0.986} \right) \,\text{.}\tag{34}$$

Step response simulation is performed, using the Bode shaping-based FOPID controller *C*5(*s*) (denoted as BS-FOPID) and the proposed FOPID controller *C*6(*s*) (denoted as a-FOPID) as the speed controllers, respectively. The response curves and the performance indexes are shown in Figure 17 and Table 3, respectively. The load disturbance response simulation is also performed. The response curves and performance indexes are shown in Figure 18 and Table 4, respectively.

**Figure 17.** The step responses of the control systems using the Bode shaping-based (BS)-FOPID and a-FOPID (simulation).

**Table 3.** The step response performance indexes of the control systems using the BS-FOPID and a-FOPID (simulation).


According to Figure 17 and Table 3, the response of the system using the a-FOPID has a smaller oscillation and a shorter settling time. Therefore, the system using the a-FOPID achieves better step response performance. According to Figure 18 and Table 4, the two responses have a similar speed drop and recovery time, but the response of the system using the a-FOPID has a smaller oscillation. Therefore, the system using the a-FOPID achieves better anti-load disturbance performance.

**Figure 18.** The load disturbance responses of the control systems using the BS-FOPID and a-FOPID (simulation).

**Table 4.** The anti-load disturbance performance indexes of the control systems using the BS-FOPID and a-FOPID (simulation).


#### **5. Experimental Study**

Figure 19 shows the PMSM speed control platform used in this paper. The PMSM model is a Sanyo P10B18200BXS. In the experiments, the fractional order operator *s<sup>r</sup>* is realized by applying the impulse invariant discretization method [26].

**Figure 19.** The PMSM speed control platform.

#### *5.1. Gain Robustness Study*

Step response experiments are performed to test the gain robustness of the control system using the proposed FOPID controller. The proportional gain of the FOPID controller is multiplied by 120% and 80% to simulate the gain variations. The step responses of the nominal system and those with gain variations are shown in Figure 20.

**Figure 20.** The step responses of the simplified FOPID control systems with different loop-gains (experiment).

According to Figure 20, similar to the simulation result, the responses of the control systems with gain variations have similar overshoots, satisfying the robustness requirement.

#### *5.2. Comparisons with Some Existing Methods*

Step response experiments are performed, using the optimal FOPID controller *C*3(*s*) (opt-FOPID) and the proposed FOPID controller *C*4(*s*) (a-FOPID) as the speed controllers, respectively. The response curves and the performance indexes are shown in Figure 21 and Table 5, respectively. The load disturbance response experiment is also performed to test the anti-load disturbance performance of the control systems. The response curves and performance indexes are shown in Figure 22 and Table 6, respectively.

According to Figure 21 and Table 5, similar to the simulation result, the responses of the two systems have similar overshoots, but the response of the system using the a-FOPID has a shorter settling time. Therefore, the system using the a-FOPID achieves better step response performance. According to Figure 22 and Table 6, the responses of the two systems have similar speed drops, but the response of the system using the a-FOPID has a shorter recovery time. Therefore, the system using the a-FOPID achieves better anti-load disturbance performance.

**Table 5.** The step response performance indexes of the control systems using the opt-FOPID and a-FOPID (experiment).


**Figure 21.** The step responses of the control systems using the opt-FOPID and a-FOPID (experiment).

**Figure 22.** The load disturbance responses of the control systems using the opt-FOPID and a-FOPID (experiment).

**Table 6.** The anti-load disturbance performance indexes of the control systems using the opt-FOPID and a-FOPID (experiment).


Step response experiments are performed, using the Bode shaping-based FOPID controller *C*5(*s*) (BS-FOPID) and the simplified FOPID controller *C*6(*s*) (a-FOPID) as the speed controllers, respectively. The response curves and the performance indexes are shown in Figure 23 and Table 7, respectively. The load disturbance response experiment is also performed to test the anti-load disturbance performance of the control systems. The response curves and performance indexes are shown in Figure 24 and Table 8, respectively.

**Figure 23.** The step responses of the control systems using the BS-FOPID and a-FOPID (experiment).

**Table 7.** The step response performance indexes of the control systems using the BS-FOPID and a-FOPID (experiment).


**Figure 24.** The load disturbance responses of the control systems using the BS-FOPID and a-FOPID (experiment).


**Table 8.** The anti-load disturbance performance indexes of the control systems using the BS-FOPID and a-FOPID (experiment).

According to Figure 23 and Table 7, the response of the system using the a-FOPID has a smaller overshoot and a shorter settling time. Therefore, the system using the a-FOPID achieves better step response performance. According to Figure 24 and Table 8, the speed drops and recovery times of the two responses are close to each other, but the response of the system using the a-FOPID has a smaller oscillation. Therefore, the system using the a-FOPID achieves better anti-load disturbance performance. From the simulation and experimental results, the simplified FOPID controller achieves flexible tuning capability, sufficient robustness to gain variations, and optimal step response performance.

#### **6. Conclusions**

A simplified FOPID controller is proposed by building relations between the controller parameters. An estimation model for the optimal relation coefficient *a* is built for a class of third-order models, according to which the optimal FOPID controller can be obtained analytically. An actual application of the proposed controller and tuning method to the PMSM speed servo is studied by simulation and experiments, verifying the robustness and dynamic performance of the simplified FOPID control system. The advantages of the proposed method are demonstrated by comparisons with some other existing methods. Some issues may be studied in future work, such as improving the relation between the fractional orders and applying the simplified FOPID controller to other classes of plants.

**Author Contributions:** Methodology, Y.C. and W.Z.; validation, W.Z. and Y.L.; resources, X.W.; writing, W.Z.; supervision, Y.L.; funding acquisition, W.Z., Y.L. and X.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China grant numbers 51975234 and 61803087, the Natural Science Foundation of Guangdong, China grant number 2019A1515110180, the Projects of Guangdong Provincial Department of Education grant numbers 2017KQNCX215, 2018KTSCX237 and 2019KZDZX1034, the Science and Technology Planning Project of Guangdong, China grant number 2016B090911003, and the Science and Technology Program of Guangzhou, China grant number 201902010066. The APC was funded by 2019A1515110180.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Statistical Assessment of Discrimination Capabilities of a Fractional Calculus Based Image Watermarking System for Gaussian Watermarks**

**Mario Gonzalez-Lee 1,\*,†, Hector Vazquez-Leal 2,3,†, Luis J. Morales-Mendoza 1,†, Mariko Nakano-Miyatake 4,†, Hector Perez-Meana 4,† and Juan R. Laguna-Camacho 5,†**


**Abstract:** In this paper, we explore the advantages of a fractional calculus based watermarking system for detecting Gaussian watermarks. To reach this goal, we selected a typical watermarking scheme and replaced the detection equation set by another set of equations derived from fractional calculus principles; then, we carried out a statistical assessment of the performance of both schemes by analyzing the Receiver Operating Characteristic (ROC) curve and the False Positive Percentage (FPP) when they are used to detect Gaussian watermarks. The results show that the ROC of a fractional equation based scheme has 48.3% more Area Under the Curve (AUC) and a False Positive Percentage median of 0.2%, whilst the selected typical watermarking scheme has 3%. In addition, the experimental results suggest that the target applications of fractional schemes for detecting Gaussian watermarks are as semi-fragile image watermarking systems robust to Gaussian noise.

**Keywords:** fractional calculus; Gaussian watermarks; statistical assessment; false positive rate; semi-fragile watermarking system

#### **1. Introduction**

Digital watermarking has gained popularity in the past few decades as a copyright enforcement tool. It is an active research field that includes applications such as data authentication and data indexing among other practical applications [1–3]. The scenario of copyright enforcement is as follows: the copyright holder wants to exploit some digital media, so he embeds a watermark under the premise that, in case of an unauthorized person exploiting the media, the copyright holder would be able to demonstrate in court that his watermark was embedded in the media and hence he owns all rights to the media.

A watermarking system embeds a signal, called the watermark, into another signal known as the cover; a cover might be digital media such as an image, audio, or video. Most proposed watermarking systems generate a pseudo-random signal (the watermark) using a user's key and then embed this watermark into the cover; conversely, the watermarking system is able to detect the watermark or even retrieve it from the watermarked cover. If the watermark samples are in the set {−1, 1}, then the watermark is called binary; sometimes, designers let the watermark be a pseudo-random sequence with Gaussian distribution; this kind of watermark is called a Gaussian watermark.

**Citation:** Gonzalez-Lee, M.; Vazquez-Leal, H.; Morales-Mendoza, L.J.; Nakano-Miyatake, M.; Perez-Meana, H.; Laguna-Camacho, J.R. Statistical Assessment of Discrimination Capabilities of a Fractional Calculus Based Image Watermarking System for Gaussian Watermarks. *Entropy* **2021**, *23*, 255. https://doi.org/10.3390/e23020255

Received: 18 December 2020; Accepted: 13 January 2021; Published: 23 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A watermarking system can have two types of errors during its attempt to detect a watermark:

**Error type I:** The system fails to find a watermark that was embedded; this is called a False Negative (FN).

**Error type II:** The system finds a given watermark even when either no watermark or another watermark was embedded; this is called a False Positive (FP).

An FP is considered a flaw that must be avoided because it might lead to a legal dispute over the copyright of the digital media. For this reason, systems that exhibit a high FPP are impractical and are thus excluded from the literature. Usually, a watermarking system has a negligible FPP when detecting binary watermarks; conversely, some systems might have a high FPP when detecting Gaussian watermarks.

To clarify this issue, consider the following example: Figure 1 (left) shows an image watermarked with a Gaussian watermark. Only one watermark was embedded; however, the system detects several watermarks as if they were actually embedded, as shown in Figure 1 (right). Assume that a court acknowledges as the copyright owner any individual who claims the rights to some digital media, granted he can prove that the watermarking system detects his watermark within the media. Under these conditions, an attacker would only have to search for a watermark that produces a positive detection and could then claim ownership of the media, causing a legal dispute.

**Figure 1.** Faulty detection of a Gaussian watermark due to False Positives. (**left**) Watermarked image. *g* = 5, *PSNR* = 34.14 dB. (**right**) The systems verify the presence of several watermarks; the cases that fall in the red zone are False Positives.

To help mitigate this issue, in a previous paper we proposed replacing the detection equations of watermarking systems to reduce the FPP. Although the results were interesting and seemed promising, our tests were not conclusive due to the low number of images in the database used in the experiments; thus, the purpose of this paper is to fill the remaining gaps in our previous proposal by analyzing the cases we left unexplored using a bigger image database. In this paper, we put our early proposal on a firmer basis; we:


With these results as a basis, we expect designers of watermarking systems to take advantage of Gaussian watermarks when appropriate to meet their design goals. At the moment, it is difficult to detect Gaussian watermarks using simple equations, so we provide an alternative that reuses previously proposed schemes by using fractional calculus based equations.

The usage scenarios for such schemes include:


Another scenario will be discussed later.

The rest of the paper is organized as follows: in Section 2, we review the background of the analyzed watermarking scheme. A discussion about related works is presented in Section 3. Section 4 presents a Fractional Scheme for watermarking. In Section 5, we discuss the materials and methods of analysis used to carry out the experiments; next, in Section 6, we present the experimental results; then, in Section 7, we discuss the experimental results and present the conclusions, and finally the references are in the last section.

#### **2. The Watermarking Model**

Before continuing with the background fundamentals, let us define the terminology used in the remainder of this paper. One often refers to different watermarks, so we will call the set of different watermarks W; *wk* is the k-th watermark of the set W, and *wk*[*i*] denotes the i-th sample of the k-th watermark. The set of images that serve as covers is X; similarly, *xk* denotes the k-th image and *xk*[*i*] is the i-th sample of the k-th image. *yk*[*i*] is the i-th sample of the k-th watermarked image. Note that, although we are focusing on images, we will use one index for the sake of simplicity, so consider *i* = (*r*, *c*) a coordinate pair of the image.

A simple model approach to watermarking is to make analogies to the field of the theory of communications. In this context, we assume that the watermark is transmitted through a communications channel as pictured in Figure 2. The model has the following variables: the cover, which is a signal used as host for the watermark; a user's key as input for a pseudo-random number generator; and the embedding gain, which is related to the watermark's energy. In an ideal scenario, the cover does not distort the watermark; however, in practice, this cannot be achieved, so the effects of the cover on a watermark are modeled as the distortion caused by the channel. Attacks on the watermark are modeled as noise. An attack is a signal processing operation performed on the watermarked media with the goal of making the watermark undetectable by the watermarking system.

**Figure 2.** General model of watermarking as a communication process.

#### *2.1. Watermark Embedding*

There are two basic rules for embedding watermarks: the additive rule and the multiplicative rule. We will focus on the additive embedding rule since it is widely used in most related works.

The watermarking system embeds the watermark *wl* into the cover *xl* producing the watermarked signal *yl* as shown in Figure 3. This scheme uses the additive rule defined as:

$$y\_l = x\_l + g w\_l, \tag{1}$$

where *yl* is the watermarked signal and *g* is the embedding gain.
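The additive rule (1) with a Gaussian watermark can be sketched in a few lines of NumPy; the random cover image below is a stand-in for a real one, and the seed plays the role of the user's key:

```python
import numpy as np

rng = np.random.default_rng(42)                       # the user's key seeds the PRNG
x_l = rng.integers(0, 256, (512, 512)).astype(float)  # stand-in for a grayscale cover
g = 5.0                                               # embedding gain used later in Section 5
w_l = rng.standard_normal(x_l.shape)                  # zero-mean, unit-variance Gaussian watermark
y_l = x_l + g * w_l                                   # additive rule (1)

# Distortion check: with g = 5 and a unit-variance watermark, the PSNR comes
# out close to the ~34 dB mean the paper reports for its image set.
psnr = 10.0 * np.log10(255.0**2 / np.mean((y_l - x_l) ** 2))
```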

**Figure 3.** Watermark embedding scheme.

#### *2.2. Watermark Detection*

A typical watermark system assesses the presence of the watermark by computing two statistics: a decision variable which is a measurement of the presence of the watermark within the watermarked image, and a threshold that helps to decide if the watermark is present or absent. If the decision variable is greater than or equal to the threshold, then the watermark was detected; otherwise, the watermark is absent as shown in Figure 4.

**Figure 4.** Block diagram of the watermark detection process.

Most watermarking systems have detected watermarks using the cross-correlation formula since the early works on watermarking; an example is the highly influential paper by Cox et al. [3]. The watermarking system uses the received and possible noisy watermarked media (*y*∗ *<sup>l</sup>* ) for detecting the watermark; first, it computes a decision variable *dI*(*wl*) as follows:

$$d\_I(w\_l) = \frac{1}{N} \sum\_{i=1}^{N} w\_l[i] y\_l^\*[i];\tag{2}$$

Next, the system compares *dI*(*wl*) to a threshold (*ThI*(*wl*)) and, if *dI*(*wl*) ≥ *ThI*(*wl*), then the detection is positive; the threshold is computed using the following equation [4]:

$$\operatorname{Th}\_I(w\_l) = 3.3 \sqrt{2 \frac{\sigma^2}{N}} \tag{3}$$

where *σ*<sup>2</sup> is the variance of *y*<sup>∗</sup> *l* .

Many state-of-the-art algorithms use (1)–(3) for inserting and detecting the watermark as discussed later in this paper. We will call (2) and (3), the integer equation set—hence, the subscript *I* of the detection variable set. A watermarking scheme based on Equations (1)–(3) is shown in Figure 5.

The decision of the system for the integer equation set is computed as:

$$D\_I(w\_l) = \begin{cases} 1, & d\_I(w\_l) \geq Th\_I(w\_l), \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

**Figure 5.** Integer watermark detecting scheme.

Many works use (1) and (2) to embed and detect watermarks, respectively, as discussed in the next section.
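The integer equation set can be sketched directly from (2) and (3); in this sketch, *σ*<sup>2</sup> is taken as the sample variance of the received media, as the text indicates:

```python
import numpy as np

def detect_integer(w, y_star):
    """Integer equation set: cross-correlation decision variable (2)
    and threshold (3); returns (decision, d_I, Th_I)."""
    N = w.size
    d_I = np.sum(w * y_star) / N                      # Equation (2)
    Th_I = 3.3 * np.sqrt(2.0 * np.var(y_star) / N)    # Equation (3)
    return d_I >= Th_I, d_I, Th_I
```

For a watermark actually embedded with gain *g*, the expected value of *d<sub>I</sub>* is approximately *g*, well above the 3.3-sigma threshold for a 512 × 512 image.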

#### **3. Works Related to Watermarking Based on Fractional Calculus**

Fractional Calculus (FC) has gained attention in recent years; for example, Refs. [5–8] are good references that cover the basics of FC, ranging from introductory to advanced theory. Many scientists have used it for modeling several physical phenomena with applications to engineering; for example, in [9–11], the authors present applications of FC to the analysis of control systems. In [12–14], the authors present applications to digital filter design. In [15,16], the authors discuss an approach to linear systems analysis for both the continuous and discrete cases. Researchers have already started to develop FC applications to watermarking; related works exhibit a tendency to adapt (1) and (2) for working with fractional calculus based approaches.

Some authors use a fractional derivative for watermarking since there is a relationship between the order of the derivative and the resulting function; this relationship is difficult to establish. For example, the authors of [17] use the Grünwald–Letnikov fractional operator for computing a pseudo-random sine function, allowing two fractional orders *α* and *β* to act as keys. The authors claim that this scheme is robust toward occlusion attack; however, this is the only test they reported. The work [18] is similar to [17]. The main difference between those works is that authors of [18] use the fractional Cauchy formula for the sine function. Authors report that the system is robust; nevertheless, their results are supported by the test in just one image lacking evidence for confirming the system's reliability.

Other authors use the Fractional Fourier Transform (FrFT) for watermarking since there is a strong dependency between the orders and the resulting coefficient set of the FrFT, a dependency that seems random. The algorithm proposed in [19] uses the FrFT coefficients as the embedding domain. The authors report good results; however, they present just a

case of study. A similar approach is presented in [20]. This approach also uses the fractional orders as the secret keys. The watermark is detected using standard cross-correlation. The authors claim that the system is robust toward JPEG compression, noise addition, and image manipulation operations such as median filtering, Gaussian smoothing, and sharpening filtering. Another work that uses the FrFT is [21]; its authors affirm that their proposal is robust to geometrical transform, filtering, and histogram stretching; however, they carried out too few experiments. In [22], the authors present an approach based on the FrFT with a random modification to the phase. The resulting system is more similar to a digital signature based system than to a typical watermarking system. This system is robust against cropping, salt and pepper noise addition, uniform noise addition, Gaussian noise addition, noise addition in both the amplitude and the phase, JPEG compression, and histogram equalization operations. Another idea presented in [23] is to generate a watermark in the FrFT domain and embed it into an image also in the FrFT domain using the additive rule. The authors used the cross-correlation for detecting the watermark. This scheme is robust toward occlusion attack, which is the only attack reported by the authors.

The Random Fractional Fourier Transform (RFrFT) is a variation of the FrFT; it has the same properties of FrFT but has the advantage that the spectrum is random and exhibits a high embedding capacity and robustness for watermarking applications. An RFrFT application to watermarking is presented in [24]. This system computes the RFrFT with a given random phase; then, it divides the transformed image into blocks and computes their fractal dimension; next, it selects a set of those blocks and uses the highest amplitude in each block for watermark embedding using Amplitude Shift Keying (ASK). The watermark extraction is accomplished by reversing previous steps. The system computes the Mean Square Error to measure the robustness using both the extracted and the real watermark. They tested their system by performing three attacks: noise addition, cropping, and JPEG compression.

Another fractional calculus based transform, the Discrete Fractional Random Transform (DFRNT), was used in [25]. This work is similar to [24]; first, the system computes the DFRNT; then, it divides the signal into blocks and selects a set of blocks randomly; next, it selects the highest amplitudes for watermark embedding using Phase Shift Keying (PSK). The authors report that their proposal is robust against Gaussian noise addition, cropping, and low pass filtering; however, they present too few tests.

One more fractional based transform is the Fractional Dual-Tree Complex Wavelet Transform (FrDT-WT); the FrDT-WT is used to find the wavelet transform in the Fourier domain, resulting in a mathematical description of the multiresolution properties. The work presented in [26] exploits the fact that the randomness of the FrDT-WT coefficients depends on the fractional order, and it also uses a biometric pattern to further enhance security. The main idea is to build two biometric images; then, use the SURF algorithm to compute the robust matching point vectors; next, use these vectors to compute the keys for building a chaotic map. The watermark extraction uses both the original and the watermarked images. The authors report that their system is robust. The attacks covered in the tests include average filtering, median filtering, Gaussian noise addition, salt and pepper noise addition, JPEG compression, SPIHT compression, row-column deletion, resizing, cropping, rotation, histogram equalization, contrast adjustment, and sharpening attacks; however, only six images were used for the tests; furthermore, the reported results correspond to their best case.

Another work is [27] that is almost the same as the system presented in [26]. The main difference between these works is that Ref. [27] uses the Redundant Fractional Wavelet Transform (RFrWT) due to a problem with the discrete FrDT-WT related to the use of decimators.

The authors of [28] present an interesting idea; unlike most watermarking schemes, their system does not embed a watermark into a host image, but they use Visual Cryptography (VC) and a Visual Secret Sharing Scheme. The system constructs two shares that convey a secret message in the following way: the encoder divides the host image into blocks; then, it selects a set of blocks and computes the FrFT using orders *α* and *β*; next, it

computes the Singular Value Decomposition (SVD) of the transformed blocks and uses the first value of the resulting SVD for computing the master share according to the standard rules of secret sharing schemes. The authors report that their scheme resists various signal processing operations such as JPEG compression, average filtering, median filtering, blurring filtering, sharpening filtering, Gaussian noise addition, contrast adjustment, gamma correction, histogram equalization, resizing, rotation, and geometrical distortion.

All of these works have in common the use of (1) and (2) to embed and detect watermarks; from this perspective, we can say that the overall difference among them is the use of some transform coefficient set for watermarking. In other words, they use already proposed equations, and the novelty of these works relies on the use of a different embedding domain. This increases the complexity of the watermarking system and brings other problems related to the multiple definitions of fractional operators proposed until now.

On the other hand, the authors of [29] analyzed the watermarking systems proposed in [17–28] and observed that they use (1) and (2), so they proposed a new, improved equation set to substitute (2) and (3). They showed that this modification increases the system's robustness, so a watermarking system designer might prefer fractional equations as a reliable solution for copyright enforcement. However, they limited their study to the case of binary watermarks, adding that they would skip the case of Gaussian watermarks since the system based on (2) and (3) was not reliable in that case, so a fair comparison to their proposed equation set was not possible in the context of the experiments carried out to test their scheme.

The authors of [30] explored the case of Gaussian watermarks, and their results suggest that the scheme proposed in [29] reduces the False Positive Percentage; however, they limited the benchmark corpus to 20 images from the standard image set.

The fractional scheme proposed in [30] for the Gaussian watermark case needs a deeper study; for this reason, we carried out the present study, in which we explore the behavior of the fractional scheme for detecting Gaussian watermarks. We sought to confirm that the fractional scheme proposed in [29] reduces the false positive percentage of the detector when Gaussian watermarks are embedded and, by reaching this goal, to confirm that the fractional scheme is reliable for watermarking applications when Gaussian watermarks are used; thus, the novelty of this paper is to generalize the results presented in [29,30]. The main advantage of the proposed scheme is that it avoids the problems related to the use of fractional transforms found in previous works, while keeping the complexity almost the same, for detecting Gaussian watermarks.

#### **4. Fractional Calculus Approach to Watermark Detection**

The detection variable derived from FC principles proposed in [29] is:

$$d\_{F}(w\_{l}) = -\operatorname{Im}\left[\frac{3}{4}\frac{1}{N}\sqrt{\left(\sum\_{i=1}^{N}y\_{l}^{\*}[i]w\_{l}[i]\right)^{2} - \frac{2}{3}\sigma^{2}N[2NH-1]} - \epsilon\right],\tag{5}$$

where Im[·] is the imaginary part operator, and the threshold is:

$$Th\_F(w\_l) = k\_P \sigma^2 \sqrt{\frac{H}{\epsilon}},\tag{6}$$

where:

$$\epsilon = \frac{3}{4} \frac{1}{N} \sqrt{\frac{2}{3} \sigma^2 N [2NH - 1]},\tag{7}$$

with $H = \ln\left(\sqrt{2\pi\sigma^2 e}\right)$, where *σ*<sup>2</sup> is the variance of *y*<sup>∗</sup>*<sup>l</sup>*. We call (5) and (6) the fractional equation set. A fractional scheme based on (5) and (6) is shown in Figure 6.

The decision of the system for the fractional equation set is:

$$D\_F(w\_l) = \begin{cases} 1, & d\_F(w\_l) \geq Th\_F(w\_l), \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$

**Figure 6.** Fractional watermark detecting scheme.
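Equations (5)–(7) can be transcribed directly into NumPy; the sketch below uses a complex square root so that the Im[·] operator in (5) is meaningful, and the detector constant *k<sub>P</sub>* in (6), whose value is not given in this excerpt, is left as a placeholder argument:

```python
import numpy as np

def detect_fractional(w, y_star, k_P=1.0):
    """Fractional equation set from [29]: decision variable (5) and
    threshold (6), with epsilon given by (7).  The default k_P is an
    assumption, not a value taken from the paper."""
    N = w.size
    var = np.var(y_star)                          # sigma^2, variance of y*_l
    H = np.log(np.sqrt(2.0 * np.pi * var * np.e))
    B = (2.0 / 3.0) * var * N * (2.0 * N * H - 1.0)
    eps = (3.0 / (4.0 * N)) * np.sqrt(B)          # Equation (7)
    inner = np.sum(y_star * w) ** 2 - B           # radicand of Equation (5)
    d_F = -np.imag((3.0 / (4.0 * N)) * np.sqrt(inner + 0j) - eps)
    Th_F = k_P * var * np.sqrt(H / eps)           # Equation (6)
    return d_F >= Th_F, d_F, Th_F
```

Note that *d<sub>F</sub>* is nonzero only when the radicand in (5) is negative, which is what makes the detector's response sharply different from the plain cross-correlation of (2).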

If we use (5) and (6) for detecting Gaussian watermarks, we get the result shown in Figure 7, which clearly shows improved detection characteristics since there are no false positives. We seek to confirm that the scheme in Figure 6 is more reliable than the scheme in Figure 5. This follows the strategy stated earlier in this paper: reuse the algorithm in Figure 5 by replacing its detection equations with fractional calculus based equations, resulting in a possibly improved algorithm, and then verify the effectiveness of this strategy. For this reason, there is no comparison to related works as a means of control for the experiments. In other words, a fair experiment, in the context of this paper's purpose, is to compare the original algorithm against the same algorithm with fractional equations and assess the improvement. Thus, the only control needed is the original algorithm and, as a result, the outcomes of the experiments are reliable.

**Figure 7.** Detection of the watermark. (**left**) using (2) and (3); (**right**) using (5) and (6), and note the lack of False Positives. The cases that fall in the red zone are False Positives.

#### **5. Materials and Methods**

To carry out the experiments, we used 10,000 images of the BOWS database as the set X; each image in this set is grayscale with a size of 512 × 512 pixels, and the luminance values are in the range [0, 255].

We used the embedding scheme shown in Figure 3 for watermarking each image in the set X. In addition, the embedding gain was fixed for all cases to the value *g* = 5; this setting leads to a Peak Signal-to-Noise Ratio (PSNR) mean of 34.21 dB for the entire set, giving a fair balance between robustness and imperceptibility of the watermark. The embedded watermark *wl* was selected at random from the watermark set W for each image; each watermark in the set was equally probable.

The goal of the tests was to assess the capacity of watermarking schemes shown in Figures 5 and 6 to reduce the false positives by computing the FPP and the ROC curve.

For the first test, we computed the FP of each image. To achieve this, each image in the set X was watermarked; then, the system tried to detect all watermarks in W within a single image; next, all FP were identified and counted using (4) and (8). The false positives computing process is summarized in Procedure 1, and it is described in Figure 8 (left). False positives gave us an insight about the reliability of both the integer and the fractional schemes.

For the sake of simplicity, and without loss of generality, we indicated *D*(*wl*) instead of *DI*(*wl*) or *DF*(*wl*) in all procedures since the same steps were followed for both schemes.

We selected the Receiver Operating Characteristic (ROC) since it is regarded as an objective measure to evaluate the performance of a decision technique. Thus, as a second test, we computed the ROC as follows: each image in the set X was watermarked using watermark *wl*; then, we computed *dI*(*wl*), *dF*(*wl*), *DI*(*wl*), and *DF*(*wl*). Those values and the corresponding ground truth values of *DI*(*wl*) and *DF*(*wl*) were recorded. The data were used to derive a Generalized Linear Model for estimating the ROC for both the integer and the fractional schemes. The data collecting steps are summarized in Procedure 2 and further explained in Figure 8 (right). The ROC curves were used to evaluate the integer and the fractional schemes to clarify which of them is more reliable.

In a last test, we examined the robustness of the fractional scheme. To achieve this goal, each image in the set X was watermarked to obtain the set of watermarked images Y, and then an attack was carried out on the watermarked images; next, we counted the cases where the embedded watermark was detected to compute the percentage of detected watermarks. The process of computing the detection rate is summarized in Procedure 3. The percentage of watermarks detected after the watermarked image was attacked suggests the target applications of the fractional scheme based on its robustness.

#### **Procedure 1** Procedure to record measures for False positives.

**Require:** Image set X, Watermark set W.



#### **Procedure 2** Procedure to record measures for getting the ROC curve.

**Require:** Image set X, Watermark set W.


#### **Procedure 3** Procedure to record Detection Rate.

**Require:** Image set X, Watermark set W.


**Figure 8.** Collecting data. (**left**) All the system's responses that cross the threshold when no watermark was embedded are collected, since these are false positives; these data fall in the red zone. (**right**) We collected the true positive (data in the green zone), a single true negative (data in the blue zone), and all false positives for each image in the set Y (data in the red zone).

#### **6. Experimental Results**

As a first test, we computed the false positive percentages for both the integer and the fractional schemes and built a boxplot. Figure 9 (left) shows that the false positives for the integer scheme span from 0.1% to 16.15%; in contrast, the fractional scheme has very low percentages of false positives, and the range of values concentrates around 0.2%.

**Figure 9.** Statistical assessment of discrimination characteristics. (**left**) value ranges of false positives for both the Integer and Fractional watermarking schemes. No outliers are drawn for the sake of clearness; (**right**) comparison of the ROC curves of the integer and fractional schemes.

In our second test, we evaluated the quality of both schemes; we used the data we collected to draw the ROC curve shown in Figure 9 (right). As a result of this test, we found that the ROC of the integer scheme has an Area Under the Curve (AUC) of 50.9%, whereas the fractional scheme obtained an AUC of 99.2% for the same test.

The last test consisted of examining the successful detection rate after attacks; this was accomplished by attacking each watermarked image in the set Y. The attacks performed were: average filtering, median filtering, Gaussian noise addition, speckle noise addition, salt and pepper noise addition, JPEG compression, cropping, removing random rows and columns, substituting random rows and columns, and scaling. The corresponding figures are in Appendix A for the sake of readability of this section.

A bar plot of the percentage of detected watermarks after the image set Y was filtered with an average filter is shown in Figure A1. The attack was repeated for window sizes of 3 × 3, 5 × 5, and 7 × 7. The resulting bar plot shows that the percentage of successful watermark detection after the attack is about 6% and becomes lower as the window size increases.

We performed a similar attack; this time, a median filter was used to filter the set Y. Results shown in Figure A2 are similar to those obtained for the average filtering attack; in this case, detection percentages are lower than 10%.

The next test comprised adding Gaussian noise to each watermarked image in the set Y and then trying to detect the watermark. We constructed a bar plot showing the percentages of detected watermarks for various noise variances. Figure A3 shows that the scheme is robust to Gaussian noise. Because this result looks atypical, we inspected some cases to rule out the possibility that the Gaussian noise itself triggers false positives and thereby inflates the detection rate; an example is presented in Figure 10.

**Figure 10.** Detection of the watermark. (**left**) noisy watermarked image; the noise variance was 0.05. (**right**) the corresponding evaluation of (5).

The following test consists of adding salt and pepper noise and then detecting the watermark. Results depicted in Figure A4 show that the fractional scheme is robust up to noise densities of 20%; the detection rate drops for higher noise densities.

We carried out the next test by adding speckle noise before trying to detect the watermark. Results in Figure A5 show that the fractional scheme is robust up to noise variances of 0.2. After this limit, the detection rate drops.

A very common scenario is to compress images using the JPEG standard, so the next test comprised watermarking the image and compressing the image with the JPEG standard, and then detecting the watermark. Figure A6 shows that the fractional scheme is robust to JPEG compression up to a quality factor of 90% and then the detection percentage starts to decline.

Another common signal processing operation is the cropping attack. Figure A7 shows results when the watermarked image is cropped. This figure shows that the fractional scheme is robust up to 20% of cropped pixels.

The next test selects *t* rows and *t* columns at random and then removes these rows and columns from the watermarked image; the resulting image is smaller than the original watermarked image, so the image is then scaled to match the size of the original watermarked image. The watermark was then detected, and the results are shown in Figure A8. They show that the fractional scheme is not robust to this attack, since it exhibits a detection rate of around 8% even when only 10 rows and columns are removed.

Another test, similar to the previous one, selects *t* rows and *t* columns at random, and then substitutes the selected rows and columns with the adjacent row or column of the same image. The watermark was detected, and results are shown in Figure A9. Results show that the fractional scheme is robust up to substituting 100 rows and columns.

Finally, we carried out a scaling attack: the watermarked image was scaled to make it smaller and then restored to its original size. The results are shown in Figure A10; this figure shows that the fractional scheme is robust for scaling factors down to 90%. In other words, we shrank the image to 90% of its original size and then restored it to the original size before trying to detect the watermark.

#### **7. Conclusions**

In this study, we compared the FPP of the original watermarking scheme versus the corresponding version with detection equations derived from fractional calculus; evaluated the quality of both schemes as a watermark detector by comparing their ROC curves, and examined the successful detection rate after attacking the watermarked images to define robustness of the system. We performed several tests that allowed us to conclude the following facts:

The false positive percentage is much lower for the fractional scheme than for the integer scheme. According to Figure 9 (left), the FPP spans from 0.2% to 16.2% for the integer scheme, whilst the FPP concentrates around 0.2% for the fractional scheme. This means it is more likely to obtain an FPP of 0.2% when using the fractional scheme, and the false positive rate will be lower than the corresponding rate of an integer scheme.

Results show that the fractional scheme is a reliable method for detecting Gaussian watermarks according to Figure 9 (right); the fractional scheme has a significant advantage compared to the integer scheme since the AUC is higher for the fractional case (*AUC* = 99.20% versus *AUC* = 50.90%); this means that the fractional scheme has higher discriminative power compared to the integer scheme.

In addition, the experimental results in Figures A1–A10 show that this system is fragile to all attacks presented in Section 6, except for the Gaussian noise addition attack; this is because the noise is added in the same manner as the watermarks are, so the system treats Gaussian noise as a watermark. The target applications of such a scheme include cases where the watermarks should not survive attacks; an example of a practical application of a fractional scheme is authenticating information.

Since the target application might be authenticating information, it would be convenient to propose another value of *kp*; this value should be higher than that used in this study, since this will help to reduce the detection rate after attacks. Additional usage scenarios include: the system designer wants to enhance the discriminative power of a system already proposed; the watermark is information that closely follows the Gaussian distribution; and the complexity of the watermarking system has to be low.

The results provide designers of watermarking systems with an alternative to take advantage of Gaussian watermarks when appropriate to meet their design goals. Thus, the proposed strategy is an alternative to reuse previously proposed schemes by using fractional calculus based equations.

The results obtained in this study complement the study in [29], where the case of Gaussian watermarks was left unexplored; thus, this paper provides the designer of watermark systems with a clearer insight into the potential and practical applications of a fractional watermark detector.

The characteristics to discriminate between patterns with Gaussian statistical distribution suggest that the fractional equations might be used in pattern recognition applications where samples have a Gaussian distribution.

**Author Contributions:** Conceptualization, M.G.-L. and H.V.-L.; Methodology, M.G.-L. and H.V.-L.; Software, L.J.M.-M. and J.R.L.-C.; Validation, M.N.-M. and H.P.-M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors thank the Universidad Veracruzana for their support for this work. In addition, the authors thank the students Ivan de Gaona-Marquez, Marco A. Salas-Moreno, and Flavio C. Garcia-Salas who helped during the development of this work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

This appendix contains complementary experimental data that the reader might want to examine more closely.

**Figure A1.** Percentage of successful watermark detection after an average filter attack for various window sizes.

**Figure A2.** Percentage of successful watermark detection after a median filter attack for various window sizes.

**Figure A3.** Percentage of successful watermark detection when Gaussian noise is added to the watermarked image. The mean of the noise was zero and the horizontal axis corresponds to the variance of the noise.

**Figure A4.** Percentage of successful watermark detection when salt and pepper noise is added to the watermarked image. The horizontal axis shows the noise density.

**Figure A5.** Percentage of successful watermark detection when speckle noise is added to the watermarked image. The horizontal axis shows the noise variance.

**Figure A6.** Percentage of successful watermark detection after the watermarked image was compressed using the JPEG standard. The horizontal axis corresponds to the JPEG compression quality factor.

**Figure A7.** Percentage of successful watermark detection after cropping. This figure spans various cropping percentages.

**Figure A8.** Percentage of successful watermark detection after a number of rows and columns were removed from the watermarked image. The horizontal axis shows the number of rows and columns removed.

**Figure A9.** Percentage of successful watermark detection after a number of rows and columns of the watermarked image were substituted with another row or column. The horizontal axis shows the number of rows and columns replaced.

**Figure A10.** Percentage of successful watermark detection when the watermarked image is scaled. The horizontal axis shows the scaling factor.

#### **References**


## *Article* **Variable-Order Fractional Models for Wall-Bounded Turbulent Flows**

**Fangying Song <sup>1</sup> and George Em Karniadakis 2,\***


**Abstract:** Modeling of wall-bounded turbulent flows is still an open problem in classical physics, with relatively slow progress in the last few decades beyond the log law, which only describes the intermediate region in wall-bounded turbulence, i.e., 30–50 y+ to 0.1–0.2 R+ in a pipe of radius R. Here, we propose a fundamentally new approach based on fractional calculus to model the entire mean velocity profile from the wall to the centerline of the pipe. Specifically, we represent the Reynolds stresses with a non-local fractional derivative of variable order that decays with the distance from the wall. Surprisingly, we find that this variable fractional order has a universal form for all Reynolds numbers and for three different flow types, i.e., channel flow, Couette flow, and pipe flow. We first use existing databases from direct numerical simulations (DNSs) to learn the variable-order function and subsequently test it against other DNS data and experimental measurements, including the Princeton superpipe experiments. Taken together, our findings reveal the continuous change in the rate of turbulent diffusion from the wall as well as the strong nonlocality of turbulent interactions that intensify away from the wall. Moreover, we propose alternative formulations, including a divergence variable fractional (two-sided) model for turbulent flows. The total shear stress is represented by a two-sided symmetric variable fractional derivative. The numerical results show that this formulation can lead to smooth fractional-order profiles in the whole domain. This new model improves on the one-sided model, which is considered in the half domain (wall to centerline) only. We use a finite difference method for solving the inverse problem, but we also introduce the fractional physics-informed neural network (fPINN) for solving the inverse and forward problems much more efficiently. In addition to the aforementioned fully-developed flows, we model turbulent boundary layers and discuss how the streamwise variation affects the universal curve.

**Keywords:** fractional conservation laws; variable fractional model; turbulent flows; fractional PINN; physics-informed learning

#### **1. Introduction**

Reynolds [1] was the first to statistically describe turbulence by decomposing the instantaneous velocity vector into an average field and its fluctuation. Upon substitution into the Navier–Stokes equations and averaging, assuming quasi-stationarity, a new modified equation emerged for the average velocity that includes an additional term, namely, the averaged dissipation tensor leading to the turbulence-closure problem [2]. Addressing the closure complexity has been a century-long pursuit, starting with the seminal work of Prandtl [3], who proposed a simplified mixing length model analogous with Fick's law of local diffusion. Interestingly, at about the same time, Richardson [4], in an attempt to unify turbulent diffusion with molecular diffusion, combined geophysical measurements with Brownian motion to produce the famous scaling law on turbulent pair diffusivity. While ingenious, both approaches assume implicitly locality in turbulent interactions, which limits the universality of the derived correlations—an open standing question for over a century. As stated by Kraichnan [5], Prandtl's approach is valid only when the

**Citation:** Song, F.; Karniadakis, G.E. Variable-Order Fractional Models for Wall-Bounded Turbulent Flows. *Entropy* **2021**, *23*, 782. https://doi.org/10.3390/e23060782

Academic Editors: Bruce J. West and José A. Tenreiro Machado

Received: 14 May 2021 Accepted: 15 June 2021 Published: 20 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

spatial scale of inhomogeneity of the mean field is large compared to the mixing length. This assumption is clearly violated in most turbulent flows, e.g., in Reynolds' pipe flow, where the turbulent eddies are of the size of the pipe radius. This has motivated research in nonlocal constitutive equations of turbulence, and Prandtl, in subsequent work [6], developed a turbulent shear-layer model in an attempt to introduce non-locality in his approach. Kraichnan [5] pioneered such non-local approximations and, based on his work, more recently generalized versions of the second Prandtl non-local model were proposed in the literature [7].

Fractional calculus is an effective tool to solve complex problems with nonlocality and scale-free self-similar processes as well as non-Gaussian statistics. Lévy statistics lead to anomalous diffusion [8] and can effectively model turbulent intermittency [9]. Hence, it is possible that turbulent eddy diffusion could be accurately modeled by fractional Reynolds stresses [10]. Based on physical arguments, in order to represent nonlocality and intermittency, Chen [11] proposed a fractional Laplacian as a model for representing the Reynolds stress with a fixed fractional exponent *α* = 2/3. More recently, starting with the Boltzmann equation, Epps et al. [12] rigorously derived the fractional Navier–Stokes equations by replacing the Maxwell–Boltzmann distribution with the more general Lévy *α*-stable distribution; see a recent extension of this work in [13]. For *α* = 2, the new equations revert to the standard Navier–Stokes equations, while for *α* = 1, we obtain the logarithmic velocity profile known as the law of the wall [14]. The work of Epps et al. [12] laid a new framework for turbulence modeling that may lead to new fundamental understanding of turbulence, but it is only valid in an open domain and thus ignores the important issue of nonlocal boundary conditions encountered in defining fractional Laplacians in bounded domains [15].

The work we include here incorporates our first paper [16] published in the archives, and is a significant extension. We also refer to the work of [17], who modeled the total shear stress directly in wall units by formulating a one-sided variable-order model using the Caputo fractional derivative for Couette flow [17] and in ongoing work on transitional and turbulent boundary layers. For the case of Couette flow, universality was found. We note that directly formulating the problem in wall units does not require modeling of any additional coefficients, unlike the formulation in the present study.

The remainder of this paper is organized as follows: Since the small-scale components can be described as an anomalous diffusion [11], we introduce the variable-order fractional calculus in the next section. Then, we formulate the inverse optimization problem corresponding to the governing equations. We present the fractional differential equations that model different turbulent flows (e.g., channel flow, Couette flow, and pipe flow) in Section 2. The inverse problem is solved by a finite difference (FD) method to obtain the fractional order. Moreover, we introduce the fractional physics-informed neural network for solving the inverse problem to find the variable orders. In Section 3, we present numerical results that show the universal fractional-order profiles of the channel and pipe flow as a function of the distance from the wall, a unique capability enabled by fractional calculus. In particular, we discovered that this fractional-order function is universal for all Reynolds numbers and for different geometries. Finally, we provide a short summary in Section 4.

#### **2. Variable-Order Fractional Models for Turbulent Flows**

The first fractional model for the Reynolds averaged Navier–Stokes equations was developed by Chen [11], who proposed a fractional Laplacian to model the Reynolds stresses and to account for intermittency [18,19] as follows:

$$\frac{\partial \mathcal{U}}{\partial t} + \mathcal{U} \cdot \nabla \mathcal{U} = -\frac{1}{\rho} \nabla P + \nu\_0 \Delta \mathcal{U} - \gamma (-\Delta)^{1/3} \mathcal{U},\tag{1}$$

where *U* is the average velocity and *γ* is the turbulent diffusion coefficient. Hence, the effective fractional order in this model is fixed at *α* = 2/3. This value is consistent with the Richardson superdiffusion scaling for homogeneous turbulence that leads to a *t*<sup>3</sup> scaling for the mean square displacement, but it is not valid for wall-bounded turbulence where anisotropy and the distance from the wall determine the effective rate of turbulent diffusion. Defining a fractional Laplacian in multiple dimensions and in bounded domains is still an open issue in fractional calculus, and extending it to variable orders is challenging [15]. However, other somewhat equivalent definitions based on tempered fractional calculus [20] may lead to satisfactory nonlocal representations as well; specifically, in a Boltzmannian framework, Samiee et al. [13] developed a tempered fractional subgrid-scale model to capture high-order structures at the inertial and dissipative ranges. As Richardson first noted, the velocity field in the atmosphere shares a number of properties with the Weierstrass function, i.e., it appears to be continuous but non-differentiable, and this provides a strong case for fractional modeling of turbulence in the atmosphere but also in wall-bounded flows in engineering applications.

In this section, we present a variable-order fractional model for turbulent flows. We firstly consider a one-sided model for channel and pipe flows. Furthermore, we formulate an inverse problem for the fractional order *α*(*y*). We present a finite difference method and design a physics-informed neural network (PINN) to obtain the fractional order. Finally, we propose a divergence variable fractional (two-sided) model for turbulent flows.

#### *2.1. Turbulent Channel Flow and Pipe Flow*

#### 2.1.1. One-Sided Fractional Derivative Modeling

For wall-bounded turbulence, the effective rate of diffusion varies with distance from the wall. Hence, we exploit the power of fractional calculus that allows variable fractional order, and we propose a variable-order fractional differential equation for modeling the Reynolds stresses, i.e., *α*(*y*), where *y* is the distance from the wall. In particular, we consider fully developed turbulent flows with one-dimensional (dimensionless) averaged velocity *U*(*y*) = *u*/*V* (where *V* is the characteristic velocity), including channel flows and pipe flows for which we apply a unified fractional modeling approach. Specifically, assuming that the flow direction is along *x* and *y* is the wall-normal direction (distance from the wall), we consider the variable fractional model (VFM-I) in the normalized interval [0, 1]:

$$\text{(VFM-I)}\quad \frac{\partial}{\partial y}\left(\nu_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'}\right) = \nu(y)\,D_y^{\alpha(y)}\mathcal{U} = f,\ \forall y \in \Lambda = (0,1],\tag{2}$$

with $\alpha(0) = 1$ and $0 \leq \alpha(y) \leq 1$, where $D_y^{\alpha}$ is the (Caputo) fractional derivative, $f = -\frac{1}{\rho}\,\partial P/\partial x$ is a constant pressure gradient, $\mathcal{U}(y)$ is the mean velocity we want to model, and $\nu_0$ is the kinematic viscosity. The Caputo derivative is defined as:

$$D_y^{\alpha} \mathcal{U}(y) = \frac{1}{\Gamma(1-\alpha)} \int_0^y (y-\tau)^{-\alpha}\, \mathcal{U}'(\tau)\, d\tau,$$

and it is identical to the Riemann–Liouville left-sided derivative because *U*(0) = 0. Interestingly, we can obtain the scalar coefficient *ν*(*y*) (we refer to it as turbulent diffusivity, although it does not have the correct units) explicitly in terms of the fractional order *α*(*y*) from:

$$\nu(y) = f\,\Gamma(2 - \alpha(y))\, Re_{\tau}^{-\alpha(y)}\, V/u_{\tau}, \tag{3}$$

where $Re_{\tau} = u_{\tau}R/\nu_0$ is the friction Reynolds number, *R* is the radius of the pipe (or the half channel width), and $u_{\tau}$ is the wall friction velocity $u_{\tau} = \sqrt{\tau_w/\rho}$, where $\tau_w = \mu\,\partial \mathcal{U}/\partial y|_{y=0}$ is the wall shear stress with *μ* being the dynamic viscosity.
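As a quick numerical check of Equation (3), the coefficient *ν*(*y*) can be evaluated with the standard gamma function; the values *f* = 1, *Re<sub>τ</sub>* = 180, and *V*/*u<sub>τ</sub>* = 1 below are illustrative assumptions, not taken from the paper:

```python
from math import gamma

def nu(alpha, f=1.0, Re_tau=180.0, V_over_u_tau=1.0):
    """Turbulent 'diffusivity' of Equation (3):
    nu = f * Gamma(2 - alpha) * Re_tau**(-alpha) * V/u_tau."""
    return f * gamma(2.0 - alpha) * Re_tau ** (-alpha) * V_over_u_tau

# At the wall alpha(0) = 1, so nu reduces to f * Gamma(1) / Re_tau = f / Re_tau:
nu_wall = nu(alpha=1.0)
```

This recovers the expected wall limit, where the fractional term reduces to an ordinary first derivative scaled by 1/*Re<sub>τ</sub>*.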

We also discuss an alternative model in which the variable fractional order *α*(*y*) lies between one and two, instead of $0 < \alpha(y) \leq 1$ as in VFM-I; this model is analogous to VFM-I and is defined by:

$$\text{(VFM-II)}\quad \frac{\partial}{\partial y}\left(\nu_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'}\right) = \nu(y)\, D_y^{\alpha(y)} \mathcal{U} = f,\ \forall y \in \Lambda = (0,1],\tag{4}$$

with *α*(0) = 2, and the variable-order 1 ≤ *α*(*y*) ≤ 2 is an unknown function to be determined by the data. The scalar coefficient *ν*(*y*) can also be computed from a similar formula as before, i.e.,

$$\nu(y) = \lim_{y_0 \to \frac{1}{Re_{\tau}}} \frac{f}{D_y^{\alpha(y)} \mathcal{U}\big|_{y_0}}. \tag{5}$$

#### 2.1.2. Numerical Method

We assume that we know the mean velocity $\mathcal{U}(y)$ (also $\mathcal{U}^+(y^+)$) from the DNS data or experimental results. The VFM-I can be written in the form:

$$\nu(y)\, D_y^{\alpha(y)}\mathcal{U} = f, \tag{6}$$

where $f = -\frac{1}{\rho}\,\partial P/\partial x$. Since the fractional order *α*(*y*) is unknown in Equation (6), we need to solve a nonlinear problem to obtain *α*(*y*). Alternatively, we consider the following optimization problem: given $\mathcal{U}$ and *f*, find the *α*(*y*) that satisfies

$$J(\alpha(y)) = \inf_{\alpha(y)\in S} \left\|\nu(y)\, D_y^{\alpha(y)}\mathcal{U} - f\right\|^2, \tag{7}$$

where $S(\Lambda) := \{\alpha(y) : 0 \leq \alpha(y) \leq 1,\ \alpha(y) \in C^0(\Lambda)\}$. If $\alpha^*(y)$ satisfies Equation (6), then we obtain $J(\alpha^*(y)) \equiv 0$.

Next, we present a numerical method for solving the optimization problem (7). The fractional derivative is discretized with the finite difference method. Then, the fractional order *α*(*y*) can be solved point-by-point; for each point $y_n = n\Delta y$, $\Delta y = 1/N$, $n = 1, 2, \cdots, N$, we calculate the fractional derivative $D_y^{\alpha(y_n)}\mathcal{U}^n$ from the DNS data using the finite difference method [21]

$$D\_y^{a(y\_n)} \mathcal{U}^n = \frac{1}{\Gamma(2 - a(y\_n))} \sum\_{j=0}^n b\_j^n \frac{\mathcal{U}^{n+1-j} - \mathcal{U}^{n-j}}{\Delta y^{a(y\_n)}},\tag{8}$$

where $b_j^n := (j+1)^{1-\alpha(y_n)} - j^{1-\alpha(y_n)}$ and $\mathcal{U}^n = \mathcal{U}(y_n)$. The discrete optimization problem can now be written as

$$J_N(\alpha(y)) = \inf_{\alpha(y)\in S} \sum_{n=1}^{N} \left| \nu(y_n)\, D_y^{\alpha(y_n)} \mathcal{U}^n - f(y_n) \right|^2 \Delta y. \tag{9}$$
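A minimal NumPy sketch of the pointwise evaluation in Equations (8) and (9), assuming a uniform grid; note that the sum in Equation (8) reaches the node $y_{n+1}$, so the rule is verified here on a linear profile, for which the Caputo derivative is known analytically:

```python
import numpy as np
from math import gamma

def caputo_l1(U, dy, alpha, n):
    """L1 finite-difference approximation of Equation (8): the Caputo
    derivative of order alpha evaluated from grid values around y_n."""
    j = np.arange(n + 1)
    b = (j + 1.0) ** (1.0 - alpha) - j ** (1.0 - alpha)   # weights b_j^n
    dU = U[n + 1 - j] - U[n - j]                          # U^{n+1-j} - U^{n-j}
    return dU @ b / (gamma(2.0 - alpha) * dy ** alpha)

# Sanity check: for U(y) = y the Caputo derivative of order alpha is
# y^(1-alpha) / Gamma(2 - alpha); the L1 rule is exact for linear data up to
# the half-step shift of the stencil.
N = 200
y = np.linspace(0.0, 1.0, N + 1)
U = y.copy()
n = N // 2                                                # evaluate near y = 0.5
approx = caputo_l1(U, 1.0 / N, alpha=0.5, n=n)
exact = 0.5 ** 0.5 / gamma(1.5)
```

In the inverse problem, one would instead hold $\mathcal{U}^n$ fixed from the DNS data and search over `alpha` at each point so that $\nu(y_n)\,D_y^{\alpha(y_n)}\mathcal{U}^n$ matches $f(y_n)$.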

Finally, we formulate the fractional physics-informed neural network (fPINN) for the inverse problems of the proposed turbulence model; see Figure 1.

The aim of the inverse problem is to estimate the fractional order *α*(*y*) given the mean velocity profile $\mathcal{U}$ in the DNS data. We approximate the variable fractional order *α*(*y*) by a multi-layer feedforward neural network $\alpha_{NN}(y; \theta)$, where $\theta = \{W_j, b_j\}_{j=1}^{l}$ is the collection of parameters of the NN. The locations *y* are the input of the NN, and the output is computed by the recursive formula $Y_j = \sigma(W_j Y_{j-1} + b_j)$ with the initial value $Y_0 = y$. The weight matrix between the $(j-1)$th and $j$th layers has dimension $W_j \in \mathbb{R}^{n_j \times n_{j-1}}$, and $b_j$ is the bias vector of the $j$th layer. The column vectors $Y_{j-1} \in \mathbb{R}^{n_{j-1} \times 1}$ and $Y_j \in \mathbb{R}^{n_j \times 1}$ denote the input and output of the $j$th layer, respectively. The input vector $Y_{j-1}$ is first subject to a linear transformation and then to an element-wise nonlinear function $\sigma(\cdot)$, which is called the activation function. The NN consists of one input layer ($j = 0$), $l - 1$ hidden layers ($j = 1, 2, \cdots, l-1$), and one output layer ($j = l$). The depth of the NN is *l*, and the width of the $j$th layer is $n_j$. To determine the parameters $\theta$, we minimize the following loss function with respect to $\theta$

$$\mathcal{L}(\theta) = \frac{1}{N_t} \sum_{i=1}^{N_t} \left( D_y^{\alpha_{NN}(y_i; \theta)} \mathcal{U}(y_i) - 1 \right)^2 + \left( \alpha_{NN}(0; \theta) - 1 \right)^2, \quad y_i \in (0, 1]. \tag{10}$$

The first term on the right-hand side is the equation residual, and the second term is the constraint on the fractional order at the wall, i.e., *α*(0) = 1. We select $N_t$ training points, $\{y_i\}_{i=1}^{N_t}$, and enforce the equation residual on them to be zero. The fractional derivative is evaluated using the finite difference method (8). We optimize the loss function with respect to *θ*, employing a stochastic gradient descent method, Adam, implemented in TensorFlow. Finally, we estimate the variable fractional order using $\alpha_{NN}(y; \theta)$.
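The loss of Equation (10) can be prototyped without TensorFlow; in this sketch the network $\alpha_{NN}$ is replaced by a plain Python callable, which is an illustrative simplification of the fPINN setup:

```python
import numpy as np
from math import gamma

def l1_caputo(U, dy, alpha, n):
    """Caputo derivative at y_n via the L1 rule of Equation (8)."""
    j = np.arange(n + 1)
    b = (j + 1.0) ** (1.0 - alpha) - j ** (1.0 - alpha)
    return (U[n + 1 - j] - U[n - j]) @ b / (gamma(2.0 - alpha) * dy ** alpha)

def loss(alpha_fn, U, y):
    """Loss of Equation (10): mean squared equation residual over interior
    points plus the wall constraint alpha(0) = 1."""
    dy = y[1] - y[0]
    res = [l1_caputo(U, dy, alpha_fn(y[n]), n) - 1.0
           for n in range(1, len(y) - 1)]
    return float(np.mean(np.square(res)) + (alpha_fn(0.0) - 1.0) ** 2)

# For U = 0 every residual is -1, and a constant alpha_fn = 0.5 adds a wall
# penalty of (0.5 - 1)^2 = 0.25, so the loss is exactly 1.0 + 0.25:
L = loss(lambda t: 0.5, np.zeros(11), np.linspace(0.0, 1.0, 11))
```

In the actual fPINN, `alpha_fn` is the neural network $\alpha_{NN}(y;\theta)$ and this loss is minimized over $\theta$ with Adam.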

**Figure 1.** Basic structure of fPINN in 1D for the inverse fractional-order problem. The left uninformed DNN processes data to predict the fractional order, which also has to satisfy the correct physics of turbulence for the channel fully developed flow, represented by the right informed DNN induced by the fractional governing equation.

#### *2.2. Two-Sided Turbulent Channel Flow*

#### 2.2.1. Fractional Modeling in Divergence Form

We consider the Reynolds averaged momentum equation for incompressible fully developed channel flow; the governing equation is as follows

$$\frac{\partial}{\partial y}\left(\nu_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'}\right) + \frac{1}{\rho} \frac{\partial P}{\partial x} = 0, \ y \in (0, 2), \tag{11}$$

where *ρ* is the density, and *P* and $\mathcal{U}$ are the mean pressure and velocity, respectively. The process of Reynolds averaging introduces the unclosed Reynolds stress $\tau_{ij} = -\rho\overline{u'v'}$. The total shear stress at the wall is $\tau_w$. Integrating the above equation from the wall to an arbitrary wall-normal position *y*, we obtain a new formula as follows

$$\nu_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'} = \tau_w/\rho - \frac{1}{\rho} \frac{\partial P}{\partial x}\, y. \tag{12}$$

We assume the dimensionless wall shear $\tau_w$ and pressure gradient $\partial P/\partial x = C$ are constants. Additionally, we introduce a symmetric divergence variable fractional model for approximating the total shear stress,

$$\text{(DVFM)}\quad \nu_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'} = \nu(y)\, D_{|y|}^{\alpha(y)}\mathcal{U} = 1 - y, \tag{13}$$

with the boundary conditions *α*(0) = *α*(2) = 1, where the fractional derivative is defined as follows

$$D_{|y|}^{\alpha(y)}\mathcal{U} = \frac{1}{2}\left(D_y^{\alpha(y)}\mathcal{U} + {}_{y}D^{\alpha(y)}\mathcal{U}\right), \tag{14}$$

where $D_y^{\alpha(y)}$ and ${}_{y}D^{\alpha(y)}$ are the left and right Caputo derivatives, respectively. The definitions are given as follows

$$\text{Left Caputo derivative:}\quad D_y^{\alpha}\, \mathcal{U}(y) = \frac{1}{\Gamma(1-\alpha)} \int_0^y (y-\tau)^{-\alpha}\, \mathcal{U}'(\tau)\, d\tau,$$

and

$$\text{Right Caputo derivative:}\quad {}\_yD\_2^{\alpha(y)}\mathcal{U}(y) = -\frac{1}{\Gamma(1-\alpha(y))}\int\_y^2 (\tau - y)^{-\alpha(y)}\,\mathcal{U}'(\tau)\,d\tau,$$

and they are identical to the corresponding Riemann–Liouville derivatives because *U*(0) = 0 and *U*(2) = 0. We also propose an eddy viscosity for the fractional momentum equation, with the explicit formula

$$\nu(y) = \Gamma(2 - \alpha(y)) Re\_{\tau}^{-\alpha(y)},\tag{15}$$

where *Re<sup>τ</sup>* = *uτR*/*ν*<sup>0</sup> is the friction Reynolds number, *R* is the radius of the pipe (or the half channel width), and *u<sup>τ</sup>* = (*τw*/*ρ*)<sup>1/2</sup> is the wall friction velocity, where *τ<sup>w</sup>* = *μ∂U*/*∂y*|*y*=<sup>0</sup> is the wall shear stress and *μ* is the dynamic viscosity.
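As a concrete illustration of Equations (14) and (15), the symmetric two-sided derivative can be evaluated numerically by averaging a left-sided L1-type scheme with its mirror image applied to the reversed samples. The sketch below is ours (the function names, grid handling, and the standard L1 weights are assumptions, not code from the paper):

```python
import math

def l1_left(U, dy, n, a):
    """Left Caputo derivative at node n via the standard L1 scheme
    with weights b_j = (j+1)^(1-a) - j^(1-a)."""
    s = sum(((j + 1) ** (1 - a) - j ** (1 - a)) * (U[n - j] - U[n - j - 1])
            for j in range(n))
    return s / (math.gamma(2 - a) * dy ** a)

def two_sided(U, dy, n, a):
    """Symmetric derivative of Eq. (14): the average of the left and right
    Caputo derivatives; the right one equals the left scheme applied to
    the reversed samples at the mirrored node."""
    N = len(U) - 1
    return 0.5 * (l1_left(U, dy, n, a) + l1_left(U[::-1], dy, N - n, a))
```

For a profile symmetric about the channel centerline, e.g. *U*(*y*) = *y*(2 − *y*) on [0, 2], the left and right contributions coincide at the midpoint, which makes a convenient sanity check.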

#### 2.2.2. Numerical Method

We assume that we know the mean velocity *U*(*y*) (equivalently *U*+(*y*+)) from DNS data or experimental results. Since the fractional order *α*(*y*) is unknown in Equation (13), we would need to solve a nonlinear problem to obtain *α*(*y*). Alternatively, we consider the following optimization problem: given *U* and *f*, find *α*(*y*) that minimizes

$$J(\alpha(y)) = \inf\_{\alpha(y)\in S} \left\|\nu(y)D^{\alpha(y)}\_{|y|}\mathcal{U} - f\right\|^2,\tag{16}$$

where *f* = 1 − *y* and *S*(Λ) := {*α*(*y*) : 0 ≤ *α*(*y*) ≤ 1, *α*(*y*) ∈ *C*<sup>0</sup>(Λ)}. If *α*∗(*y*) satisfies Equation (13), then we obtain *J*(*α*∗(*y*)) ≡ 0.

Next, we present a numerical method for solving the optimization problem (16). The fractional derivative is discretized with the finite difference (FD) method. The fractional order *α*(*y*) can then be solved point-by-point: for each point *yn* = *n*Δ*y*, Δ*y* = 1/*N*, *n* = 1, 2, ··· , *N*, we calculate the fractional derivatives $D^{\alpha(y\_n)}\_{|y|}\mathcal{U}^n$ from the DNS data using the finite difference method [21]

$$\text{Left: }\ {}\_0D\_y^{\alpha(y\_n)}\mathcal{U}^n = \frac{1}{\Gamma(2 - \alpha(y\_n))} \sum\_{j=0}^n b\_j^n \frac{\mathcal{U}^{n+1-j} - \mathcal{U}^{n-j}}{\Delta y^{\alpha(y\_n)}},\tag{17}$$

and

$$\text{Right: }\ {}\_yD\_2^{\alpha(y\_n)}\mathcal{U}^n = -\frac{1}{\Gamma(2 - \alpha(y\_n))} \sum\_{j=0}^{N-n+1} c\_j^n \frac{\mathcal{U}^{N-j} - \mathcal{U}^{N-j-1}}{\Delta y^{\alpha(y\_n)}},\tag{18}$$

where $b\_j^n := (j+1)^{1-\alpha(y\_n)} - j^{1-\alpha(y\_n)}$, $c\_j^n = b\_j^n$, and $\mathcal{U}^n = \mathcal{U}(y\_n)$.
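For concreteness, the left-sided scheme (17) with these weights can be coded in a few lines. The sketch below uses the equivalent standard L1 indexing (differences $\mathcal{U}^{n-j} - \mathcal{U}^{n-j-1}$ for $j = 0, \dots, n-1$, i.e., Equation (17) up to an index shift); the function name and the variable-order callable are our own:

```python
import math

def caputo_left_l1(U, dy, alpha):
    """Left Caputo derivative of uniformly sampled U with variable order
    alpha(y); returns values at the interior nodes n = 1..len(U)-1."""
    out = []
    for n in range(1, len(U)):
        a = alpha(n * dy)                      # variable order alpha(y_n)
        # L1 weights b_j = (j+1)^(1-a) - j^(1-a)
        s = sum(((j + 1) ** (1 - a) - j ** (1 - a)) * (U[n - j] - U[n - j - 1])
                for j in range(n))
        out.append(s / (math.gamma(2 - a) * dy ** a))
    return out
```

The L1 scheme is exact on linear data: for *U*(*y*) = *y* it returns $y_n^{1-\alpha}/\Gamma(2-\alpha)$ up to round-off, which makes a handy unit test.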

The discretized optimization problem can now be written as

$$J\_N(\alpha(y)) = \inf\_{\alpha(y)\in S} \sum\_{n=1}^N \left| \nu(y\_n) D\_{|y|}^{\alpha(y\_n)} \mathcal{U}^n - f(y\_n) \right|^2 \Delta y. \tag{19}$$

Here, we use *N* ≈ *Re<sup>τ</sup>* points to solve the above optimization for the channel flow at a given Reynolds number *Reτ*.
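A minimal sketch of this point-by-point inversion: at every node we scan a grid of candidate orders and keep the one minimizing the residual of (19). The grid search stands in for whatever 1D optimizer was actually used, and the manufactured linear test profile is ours; for that profile the residual has a unique zero, so the scan recovers the true order:

```python
import math

def d_alpha(U, dy, n, a):
    """Left Caputo derivative at node n (standard L1 scheme, order a)."""
    s = sum(((j + 1) ** (1 - a) - j ** (1 - a)) * (U[n - j] - U[n - j - 1])
            for j in range(n))
    return s / (math.gamma(2 - a) * dy ** a)

def fit_order_pointwise(U, f, dy, re_tau, m=1001):
    """Recover alpha(y_n) by scanning m candidate orders in [0, 1] per node."""
    grid = [k / (m - 1) for k in range(m)]
    result = []
    for n in range(1, len(U)):
        def residual(a):
            nu = math.gamma(2 - a) * re_tau ** (-a)   # eddy viscosity, Eq. (15)
            return abs(nu * d_alpha(U, dy, n, a) - f[n])
        result.append(min(grid, key=residual))
    return result
```

Because the L1 scheme is exact for linear *U*, the product $\nu(y)D^{\alpha}\mathcal{U}$ reduces to $Re_\tau^{-\alpha} y_n^{1-\alpha}$ in that case, which is monotone in $\alpha$ whenever $Re_\tau\, y_n > 1$, guaranteeing a unique minimizer.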

Alternatively, we propose the fPINN for solving the inverse DVFM problem with the loss function

$$\mathcal{L}(\boldsymbol{\theta}) = \sum\_{n=1}^{N\_t} \left| \nu(y\_n) D\_{|y|}^{\alpha\_{NN}(y\_n;\boldsymbol{\theta})} \mathcal{U}^n - f(y\_n) \right|^2 + |\alpha\_{NN}(0;\boldsymbol{\theta}) - 1|^2 + |\alpha\_{NN}(2;\boldsymbol{\theta}) - 1|^2. \tag{20}$$

#### *2.3. Turbulent Boundary Layer and Couette Flow*

For a boundary layer and Couette flow with zero pressure gradient, the mean two-dimensional continuity and streamwise momentum equations reduce to

$$\frac{\partial (\mathcal{U}\mathcal{U})}{\partial x} + \frac{\partial (\mathcal{V}\mathcal{U})}{\partial y} = \frac{\partial}{\partial y}\left(\nu\_0\frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'}\right). \tag{21}$$

If we assume that the convective effects are small near the wall for the boundary layer problem, then the above equation reduces to

$$\frac{\partial}{\partial y}\left(\nu\_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'}\right) = 0. \tag{22}$$

Here, *U* is viewed as a function of *y* alone, since *∂U*/*∂x* = 0. Because the two plates are infinitely long for the Couette flow, the flow properties cannot change with *x*, and all partial derivatives with respect to *x* vanish. The motion occurs only in the *x* direction, so *V* = 0. After simplifying the RANS equations, turbulent Couette flow is also governed by Equation (22).

Further integrating the above equation provides

$$\nu\_0 \frac{\partial \mathcal{U}}{\partial y} - \overline{u'v'} = C, \tag{23}$$

where *C* is a constant. Since *u*′*v*′ = 0 at the wall, *ν*<sup>0</sup> *∂U*/*∂y* there is simply the wall shear stress *τw*/*ρ*. Then, we have the following equation

$$\text{(TCM)}\quad\nu(y)D\_y^{\alpha(y)}\mathcal{U} = \frac{\tau\_w}{\rho},$$

with *α*(0) = 1 and 0 < *α* ≤ 1, where $D\_y^{\alpha}$ is the (Caputo) fractional derivative and *ν*(*y*) is the eddy viscosity defined as

$$\nu(y) = \Gamma(2 - \alpha(y)) Re\_{\tau}^{-\alpha(y)}.$$

#### Numerical Method

We solve the fractional order *α*(*y*) for the turbulent boundary layer problem and Couette flow using fPINN (Figure 1) with the loss function

$$\begin{split} \mathcal{L}(\boldsymbol{\theta}) &= \sum\_{k=0}^{N\_t} \left(\nu(y\_k) D\_y^{\alpha\_{NN}(y\_k;\boldsymbol{\theta})}\mathcal{U}^k - \frac{\tau\_w}{\rho}\right)^2 + (\alpha\_{NN}(0;\boldsymbol{\theta}) - 1)^2 \\ &= \sum\_{k=1}^{N\_t} \left(Re\_{\tau}^{-\alpha\_{NN}(y\_k;\boldsymbol{\theta})} \sum\_{j=0}^k \frac{b\_j^k}{\Delta y^{\alpha\_{NN}(y\_k;\boldsymbol{\theta})}} (\mathcal{U}^{k+1-j} - \mathcal{U}^{k-j}) - \frac{\tau\_w}{\rho}\right)^2 + (\alpha\_{NN}(0;\boldsymbol{\theta}) - 1)^2, \end{split}$$

where *U* is the DNS data. It changes with *Re<sup>θ</sup>* for the boundary layer problem, so there is (implicit) *x* dependence as well.

#### **3. Numerical Results**

In this section, we present the results for the turbulent channel, pipe, Couette, and boundary layer flows.

#### *3.1. Channel Flow*

3.1.1. Numerical Results of the One-Sided Models

We first consider turbulent channel flow for which DNS data are available up to *Re<sup>τ</sup>* = 5200 [22]. Here, we use the FD scheme with *N* ≈ *Re<sup>τ</sup>* points to solve the aforementioned inverse problem for the channel flow at a given Reynolds number *Reτ*. Solving for *α*(*y*), which uniquely determines the Reynolds stresses, Figure 2a depicts the profiles of the fractional order *α*(*y*) for different *Re<sup>τ</sup>* as a function of the non-dimensional distance from the wall *y* ∈ [0, 1]. We see a strong dependence of *α*(*y*) on *Reτ*; however, if we re-plot all data in terms of the viscous wall units, i.e., *y*<sup>+</sup> = *yuτ*/*ν*<sup>0</sup>, we see a collapse of all results into a single universal curve, as shown in Figure 2b. Moreover, we employ the empirical Spalding formula [23] for *U*<sup>+</sup> = *u*/*u<sup>τ</sup>* in order to extend the results up to *Re<sup>τ</sup>* = 10<sup>6</sup>, and again we obtain a similar universal scaling, with the exception of low *Reτ*, for which the Spalding formula is known to be somewhat inaccurate. We fit these numerical results to obtain the fractional order *α*(*y*+) in wall units as follows

$$a^\*(y^+) = \frac{1 - \phi(y^+)}{2} + \frac{\phi(y^+) + 1}{2} a(y^+),\tag{24}$$

where *φ*(*y*+) = tanh(ln(*y*+/9.5)/1.049) and *a*(*y*+) = 1/(*b* + *κ*|ln(*y*+)|<sup>0.9</sup>), with *b* = 0.855 and *κ* = 0.301 constants. This is a remarkable result, as it goes beyond the logarithmic profile and seamlessly connects the viscous sublayer with the buffer zone, the logarithmic region, and the wake region. Although at first it appears to be a perfect fitting exercise, it has important consequences due to the nonlocal interpretation of the fractional derivative involved, i.e., it shows that nonlocality is stronger away from the wall and at high Reynolds numbers. Using the same data for *U*(*y*), we show that the alternative model VFM-II with 1 ≤ *α*(*y*) ≤ 2 also leads to the same type of universality (Figure 3). However, unlike the aforementioned VFM-I, we are unable to obtain an explicit formula for *ν*(*y*) relating it to the Reynolds number as in the first model (i.e., *α*(*y*) ∈ (0, 1]); instead, we compute it numerically from the DNS data of turbulent channel flow. As shown in Section 3, this alternative fractional model also exhibits a universal scaling if plotted in terms of wall units, with the lowest value *α*(*y*<sup>+</sup> = 10<sup>5</sup>) ≈ 1.3.
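For reference, the universal fit (24) with the constants above is straightforward to evaluate (a sketch; the function name is ours). Near the wall $\varphi(y^+) \to -1$, so $\alpha^* \to 1$ and the integer-order viscous limit is recovered:

```python
import math

def alpha_star(yplus, b=0.855, kappa=0.301):
    """Universal variable fractional order alpha*(y+) of Eq. (24)."""
    phi = math.tanh(math.log(yplus / 9.5) / 1.049)
    a = 1.0 / (b + kappa * abs(math.log(yplus)) ** 0.9)
    return (1 - phi) / 2 + (phi + 1) / 2 * a
```

Evaluating the fit at *y*<sup>+</sup> = 100 gives a value near 0.49, well inside the fractional regime, while the limit toward the wall is 1.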

**Figure 2.** Channel flow modeled with VFM-I: Learning the fractional variable order *α*(*y*) using DNS databases at *Re<sup>τ</sup>* = 180 to 5200: (**a**) profiles of the fractional order *α*(*y*); (**b**) rescaled fractional order *α*(*y*+) in viscous units.

**Figure 3.** Alternative fractional modeled with VFM-II with 1 ≤ *α*(*y*) ≤ 2. The numerical fractional orders are computed based on DNS data for turbulent channel flow at *Re<sup>τ</sup>* = 950, 2000, 4200, 5200: (**a**) plots of the fractional orders *α*(*y*+) in wall units; (**b**) corresponding eddy viscosity coefficients.

To evaluate the predictability of the universal scaling, we now solve the forward Equation (2) to obtain *U*(*y*) at *Re<sup>τ</sup>* = [4200, 6000, 8600], which are cases not used in the training of the model for *α*(*y*+). The results presented in Figures 4 and 5 are in good agreement with DNS and experimental data. We also include the turbulent channel flow results obtained by nested LES [24]. Figures 4 and 5 show that the mean velocity profiles predicted by VFM-I exhibit the correct behavior throughout the channel for Reynolds numbers up to *Re<sup>τ</sup>* = 8600, including the correct slope in the logarithmic layer, and agree with DNS and experimental data in the wake region for all *Re<sup>τ</sup>* = [4200, 6000, 8600].

**Figure 4.** VFM-I: Model predictions for the turbulent channel flow at *Re<sup>τ</sup>* = 4200: (**a**) the solid line (−) represents the numerical solution of the optimization problem and the triangle symbols ( ) represent Equation (24). The blue line represents the fractional order *α*(*y*) and the red line is the eddy viscosity coefficient. This Reynolds number *Re<sup>τ</sup>* = 4200 is not included in the training of the model; (**b**) mean velocity obtained by VFM-I corresponding to the fractional order *α*∗(*y*+) from the left plot.

**Figure 5.** VFM-I: Profiles of the mean velocity for turbulent channel flow at *Re<sup>τ</sup>* = 6000, 8600: the triangle symbol ( ) represents experimental data from [25], the circle symbol (◦) represents experimental data from [26], the solid line (−) represents the VFM-I profile, and the dashed line (−−) represents the LES results [24].

We used fPINN to investigate the turbulent channel flows, with different numbers of training points, in order to study convergence using DNS data at *Re<sup>τ</sup>* = 2000. Figure 6 shows the training results with uniform training points in the interval for *Nt* = 500, 1000, 2000. Figure 7 shows the training results with log-uniform training points in wall-units scaling for *Nt* = 10, 20, 40, 80, and compares the profiles between the training sets; the corresponding loss histories are listed in Table 1. We can observe that the results trained with the log-uniform points are smoother near the wall than those with the uniform training points.

**Table 1.** VFM-I: The history of the loss function with different training data sets for *Re<sup>τ</sup>* = 2000. Log represents the log-uniform training points set.


**Figure 6.** VFM-I: The fractional order obtained from fPINN and from the universal formula derived using point-by-point minimization ("Predict", Equation (24)). The training results for the uniform training sets at iteration steps *Itr* = 10,000, 20,000, 30,000: (**a**) for *Nt* = 500; (**b**) for *Nt* = 1000; (**c**) for *Nt* = 2000.

**Figure 7.** VFM-I: Fractional order for log-uniform training sets at iteration steps *Itr* = 10,000, 20,000, 30,000 for different *Nt* = 10, 20, 40, 80. "Predict" presents the profiles from Equation (24). The friction Reynolds number is *Re<sup>τ</sup>* = 2000. TP, the distribution of the log-uniform training points.

Next, we test the accuracy of the forward problem and the loss function error with the training fractional order predicted by log-uniform training points *Nt* = 20. We solve the fractional equation as follows:

$$\nu(y)D\_y^{\alpha(y)}\mathcal{U} = f,\ \forall y \in (0,1],\tag{25}$$

with *U*(0) = 0, where the fractional order is obtained by training fPINN with *Nt* = 20 and from Equation (24). The corresponding loss function error is defined as follows

$$\mathcal{L}(\boldsymbol{\theta}) = \sum\_{k=1}^{N\_t} \left( Re\_{\tau}^{-\alpha(y\_k)} \sum\_{j=0}^k \frac{b\_j^k}{\Delta y^{\alpha(y\_k)}} \left( \mathcal{U}^{k+1-j}(\boldsymbol{\theta}) - \mathcal{U}^{k-j}(\boldsymbol{\theta}) \right) - f\_k \right)^2 + \left( \mathcal{U}(0; \boldsymbol{\theta}) \right)^2.$$

Figure 8 plots the pointwise error of the mean velocity and the loss function for *Re<sup>τ</sup>* = 4000 and 5000.
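Because the $j = 0$ weight of the L1 sum is $b_0 = 1$ and multiplies the newest difference, the forward problem (25) can be marched explicitly from the wall. A minimal sketch under that standard-L1 assumption (the names and the constant-order manufactured test are ours, not the paper's solver):

```python
import math

def solve_forward(alpha, nu, f, dy, N):
    """March the L1 discretization of nu(y) D_y^{alpha(y)} U = f, U(0) = 0.

    At node n the j = 0 term isolates U^n, so each step is explicit.
    """
    U = [0.0] * (N + 1)
    for n in range(1, N + 1):
        y = n * dy
        a = alpha(y)
        rhs = math.gamma(2 - a) * dy ** a * f(y) / nu(y)
        # history terms j = 1..n-1 with weights b_j = (j+1)^(1-a) - j^(1-a)
        hist = sum(((j + 1) ** (1 - a) - j ** (1 - a)) * (U[n - j] - U[n - j - 1])
                   for j in range(1, n))
        U[n] = U[n - 1] + rhs - hist
    return U
```

With the right-hand side manufactured from *U*(*y*) = *y* and *α* = 0.5, the march reproduces the linear profile to round-off, since L1 is exact on linear data.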

Finally, we use the simplified one-dimensional equation

$$\frac{\partial}{\partial y} \left( \tau\_{uv} - R\_{uv} \right) = \nu(y) D\_y^{\alpha(y)} \mathcal{U} = \frac{\partial P}{\partial x}, \quad y \in (0, 1), \tag{26}$$

where *Ruv* denotes the Reynolds stress *Ruv* = *u*′*v*′ , *τuv* denotes the viscous shear stress *τuv* = *ν*0*∂U*/*∂y*, and *U* is the mean velocity, i.e., the solution of the fractional Equation (26). Then, we obtain the Reynolds stresses by integration,

$$-R\_{uv} = \int\_y^1 \nu(s) D\_s^{\alpha(s)} \mathcal{U}\, ds - \tau\_{uv}. \tag{27}$$

We can compare the predicted Reynolds stresses to their DNS counterparts, *RD*, for turbulent channel flow, using the corresponding viscous shear stress *τ<sup>D</sup>* = *μ∂UD*/*∂y*, where *UD* denotes the mean velocity from the DNS database. In Figure 9, we plot the predicted and DNS profiles for Reynolds numbers *Re<sup>τ</sup>* = 4000, 5200 and the corresponding pointwise errors, which are all in very good agreement. The numerical results of the mean velocities and shear stresses match the DNS data very well for all Reynolds numbers *Re<sup>τ</sup>*; here, we only show the high Reynolds number cases due to space limitations.
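The integral in Equation (27) can be accumulated with a right-to-left trapezoidal rule; a sketch with our own names, where `g` holds samples of $\nu(s)D_s^{\alpha(s)}\mathcal{U}$ and `tau_visc` the viscous stress $\nu_0\,\partial \mathcal{U}/\partial y$:

```python
def reynolds_stress(g, tau_visc, y):
    """-R_uv(y_i) = integral of g from y_i to 1, minus tau_visc(y_i) (Eq. (27)),
    accumulated from the centerline (y = 1) inward via the trapezoidal rule."""
    n = len(y)
    out = [0.0] * n
    out[n - 1] = -tau_visc[n - 1]      # the integral vanishes at y = 1
    acc = 0.0
    for i in range(n - 2, -1, -1):
        acc += 0.5 * (g[i] + g[i + 1]) * (y[i + 1] - y[i])
        out[i] = acc - tau_visc[i]
    return out
```

For a linear integrand the trapezoidal rule is exact, which gives a simple correctness check.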

**Figure 8.** VFM-I: The mean velocity (**left**) for different Reynolds numbers, the pointwise errors of the mean velocity between predictor and DNS data (**middle**), and the loss function (**right**). FD, the fractional order solved by the finite difference method; NN, the results from the neural network.

**Figure 9.** VFM-I: Accurate prediction of the shear stress at (**a**,**b**) *Re<sup>τ</sup>* = 4000, 5200 in outer units and wall units: (**left**) outer scaling; (**middle**) wall units scaling; (**right**) pointwise error of the wall shear stress. Here, *τuv* denotes the wall shear stress for the fractional order predicted by the finite difference (FD) method, *τNN uv* denotes the wall shear stress predicted by the NN, and *τ<sup>D</sup>* is the corresponding profile from DNS data. −*Ruv* denotes the Reynolds shear stress predicted by Equation (24), −*RNN uv* denotes the Reynolds shear stress predicted by the NN, and −*RD* is the corresponding profile from DNS data.

#### 3.1.2. Numerical Results of the Two-Sided Models

In this subsection, we focus on the two-sided models. Solving for *α*(*y*), which uniquely determines the total shear stresses, Figure 10 plots the profiles of the fractional order *α*(*y*) for different *Re<sup>τ</sup>* as a function of the non-dimensional distance between the two walls *y* ∈ [0, 2]. We see a strong dependence of *α*(*y*) on *Reτ*, the same conclusion as for the previous variable fractional model. Furthermore, we re-plot all data in terms of the viscous wall units, i.e., *y*<sup>+</sup> = *yReτ*, and we see an approximate collapse of all results into a single universal curve in the half-plane excluding the wake region (i.e., near the centerline), as shown in Figure 10.

**Figure 10.** Learning the fractional variable-order *α*(*y*) using DNS data bases at *Re<sup>τ</sup>* = 180 to 5200: (**a**) profiles of the fractional order *α*(*y*); (**b**) rescaled fractional order *α*(*y*+) in viscous wall units.

Next, we test the accuracy of the forward problem with the fractional order provided by the inverse optimization problem (19). We solve the divergence variable fractional equation as follows

$$-\frac{d}{dy}\left(\nu(y)D\_{|y|}^{\alpha(y)}\mathcal{U}\right) = 1, \ \forall y \in (0, 2), \tag{28}$$

with *U*(0) = *U*(2) = 0. Figure 11 plots the solutions (left) of the above equation and the pointwise error (right) of the mean velocity in each subfigure for several *Reτ*. We can observe that this model predicts the mean velocity well. Moreover, it can obtain a smooth mean velocity profile in the whole domain along the wall-wise direction.

We also use fPINN (20) to solve the inverse problem to obtain the variable order *α*(*y*). The two results from the two different methods (i.e., FD and fPINN) agree well for all Reynolds numbers.

**Figure 11.** The mean velocity (**left**) and the pointwise difference between the numerical solution and the DNS data (**right**) in each sub-figure.

#### *3.2. Turbulent Pipe Flow*

In this subsection, we consider turbulent pipe flow and again test the universal variable fractional order *α*(*y*+) against DNS and experimental data. First, we examine the highest Reynolds number available from the superpipe experiment [27,28] at *Re<sup>τ</sup>* = 5 × 10<sup>5</sup>, estimated at *ReR* ≈ 3.525 × 10<sup>7</sup> based on the pipe radius *R*. As the experimental data were only available for *y*<sup>+</sup> > 10,000, we synthesized an entire profile from the pipe wall to the centerline using multifidelity Gaussian process regression (M-GPR) [29] as follows: we considered as high-fidelity data the superpipe data in the outer region together with the highest-*Re<sup>τ</sup>* DNS data for channel flow at *Re<sup>τ</sup>* = 5200. We then employed the Spalding curve to provide the low-fidelity data and, using M-GPR, we constructed the final profile as shown in Figure 12a. Having this profile and the VFM-I model transformed into polar coordinates, we can then solve the inverse problem and obtain a new variable fractional order *α*(*y*+). Figure 13a shows that the variable fractional order we obtain for this problem is identical to the function defined by Equation (24). This finding further confirms the universality of the variable fractional order even at very high Reynolds numbers. Having validated the accuracy of the variable fractional order, we can now solve the forward fractional differential problem to obtain predictions of the entire velocity profiles from *Re<sup>τ</sup>* = 10<sup>5</sup> to *Re<sup>τ</sup>* = 5 × 10<sup>5</sup>. Figure 12b plots the results, showing excellent agreement with all available data from the superpipe experiment. Figure 13b plots the mean velocity profiles from the DNS database [30] at low Reynolds numbers, the corresponding VFM predictions, and the Spalding profile.
The universal defect law for pipe flows is not valid for the low Reynolds number range, and this is also in agreement with [27], who argued that the lowest *Reτ* for universality is approximately 5000.

**Figure 12.** Predictions of the mean velocity profile for the superpipe flow from *Re<sup>τ</sup>* = 1 × 10<sup>5</sup> to 5 × 10<sup>5</sup>: (**a**) velocity profile reconstructed from the experimental data ( , [28]), DNS data at *Re<sup>τ</sup>* = 5200 (, [22]), and the Spalding profile (blue line [23]) using multifidelity Gaussian process regression (M-GPR); (**b**) "- -", fractional order with the M-GPR profile at *Re<sup>τ</sup>* = 5 × 10<sup>5</sup>; "-", the profile of Equation (24); and '-·', the corresponding Spalding profile; (**c**) velocity profiles solving the forward fractional model and the Spalding curve against the experimental data.

**Figure 13.** VFM-I for turbulent pipe flow: (**a**) "··", VFM-I model with the channel flow DNS data at *Re<sup>τ</sup>* = 5200; "- -", VFM-I model with the M-GPR profile at *Re<sup>τ</sup>* = 5 × 10<sup>5</sup>; "-", the profile of Equation (24); and '-·', the corresponding Spalding profile; (**b**) '-·' and '··' plot the DNS data at *Re<sup>τ</sup>* = 180 and *Re<sup>τ</sup>* = 1140; '-' the VFM-I model at *Re<sup>τ</sup>* = 2000 and the corresponding Spalding profile.

#### *3.3. Turbulent Couette Flow*

In reference [12], the authors proposed the double-log profile to predict the mean velocity for the Couette flow as follows

$$\mathcal{U}(y) = \frac{1}{2} - \frac{1}{2} \frac{\ln\left( (d+y)/(d+1-y) \right)}{\ln\left( d/(d+1) \right)},\tag{29}$$

where *d* is a small number (*d* ≪ 1) that represents a viscous sublayer or roughness height. The non-dimensional boundary conditions are *U*(0) = 0 and *U*(1) = 1.
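The double-log profile (29) is simple to evaluate (a sketch; the function name and the default *d* are ours, the latter taken from the best-fit value quoted below). Both boundary conditions are built in: at *y* = 0 the logarithm ratio is 1, and at *y* = 1 it is −1:

```python
import math

def double_log(y, d=1.06e-5):
    """Double-log mean-velocity profile of Eq. (29); U(0) = 0, U(1) = 1."""
    return 0.5 - 0.5 * math.log((d + y) / (d + 1 - y)) / math.log(d / (d + 1))
```

By construction the profile is antisymmetric about the midpoint, *U*(*y*) = 1 − *U*(1 − *y*), the same relation used below to extend the fractional prediction to the other half of the channel.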

Here, we consider the predictions from the universal-scaling fractional order *α*∗(*y*+), and we also compare them against the double-log profile. The variable fractional order *α*∗(*y*+) is between zero and one in our turbulence model, so we work in the half-plane *y* ∈ [0, 0.5] (see the dashed square in Figure 14a). We then obtain the results in the other half of the domain with *U*(*y*) = 1 − *U*(1 − *y*), *y* ∈ (0.5, 1]. Figure 14 shows the mean velocity profiles predicted using (29) and the mean velocity predicted by the variable fractional order *α*∗(*y*+). We can observe that the variable fractional model is in agreement with the experimental data as well as with the double-log profile. However, the double-log profile is unable to capture the correct mean velocity near the wall. We also tested the profiles for the low Reynolds number *Re<sup>τ</sup>* = 52, where the numerical data were obtained from reference [31]; for the double-log profile, we could not find a suitable parameter *d* to obtain a good fit at this low *Reτ*. Finally, we show the comparisons between the TCM-predicted mean velocities and DNS data at *Re<sup>τ</sup>* = 250 obtained from reference [32]. Figure 15 shows that the fractional predictions are correct almost everywhere, especially near the wall regions for high Reynolds numbers.

**Figure 14.** Turbulent Couette flow—numerical results for *Re* = 16,500: "-", TCM predictions at *Re<sup>τ</sup>* = 1650; "- -", best fit of the double-log profile in Equation (29) with *d* = 1.06 × 10<sup>−5</sup>; "", experimental data from [33].

**Figure 15.** Turbulent Couette flow at *Re<sup>τ</sup>* = 250: (**a**) "-", TCM predictions; "- -", best fit of the double-log profile in Equation (29) with *d* = 1.06 × 10<sup>−5</sup>; "", DNS data from [32]; (**b**) wall units scaling for the mean velocity profiles.

#### *3.4. Turbulent Boundary Layer Flow*

In this subsection, we focus on the boundary layer problem. We use data available from the KTH turbulence group's turbulent boundary layer DNS [34,35]. We first investigate the correlation between *Re<sup>θ</sup>* (*x*-variable) and *Re<sup>τ</sup>* (*y*-variable); Figure 16 shows the downstream variation in the friction Reynolds number *Reτ*. Unlike the channel flow, here *Reτ* is a function of the streamwise distance *x*.

Then, we test whether the mean velocity of the boundary layer problem exhibits any universality, as the channel and pipe flows do. We solve the forward boundary layer problem with the fractional order predicted by Equation (24) (i.e., the same formula as in the channel flow case), including the wake region. Figure 17 presents the mean velocity profiles from the DNS [34] and from the fractional modeling near the wall for several *Re<sup>θ</sup>* from 670 to 4060, with the corresponding *Reτ* varying from 252 to 1200. We observe that the mean velocities differ in the wake region for different *Reτ*. Figure 18 plots the wake region, which lies between *δ*<sup>+</sup> and the location where the error *E* = 1%. We define this error as the difference in the mean velocity between the DNS data and the fractional model as follows:

$$E = \frac{\mathcal{U} - \mathcal{U}\_f}{\mathcal{U}\_\infty},\tag{30}$$

where *U* is the DNS data and *Uf* denotes the numerical result from the fractional model.

**Figure 17.** TCM: Boundary layer mean velocity profiles from the DNS and fractional modeling near the wall and in the wake region for several *Re<sup>θ</sup>* from 670 to 4060.

Since the mean velocity does not exhibit universality in the wake region, we use fPINNs to investigate the variations in the fractional order there. In Figure 19, we plot the fractional order inferred by fPINN based on the DNS data for *Re<sup>θ</sup>* = 670 to 4060. We can observe that the fractional order varies with *Re<sup>θ</sup>* in the wake region. Then, we train the fractional order in the wake region, selecting the data set *Re<sup>θ</sup>* = 670 to 4060 but excluding *Re<sup>θ</sup>* = 2000. In Figure 20, we present the fractional order in the 2D plane for the *x*-axis and *y*+-axis. Finally, we solve the fractional turbulent boundary layer model with the fractional orders presented in Figure 20. The comparison between the numerical results and the DNS data set is presented in Figure 21.

**Figure 18.** Downstream variations in *δ*<sup>+</sup><sub>99</sub> and the errors *E* = 2% and *E* = 1%. The lower bounds of the wake region are denoted by the blue curve with *E* = 1% and the red curve with *E* = 2% (see Equation (30)).

**Figure 19.** TCM: The fractional order *α*(*y*) learned by a neural network (NN) near the wall and in the wake region for several *Re<sup>θ</sup>* from 670 to 4060, and the corresponding *Re<sup>τ</sup>* from 252 to 1200. We can observe that the fractional order is different for different *Re<sup>θ</sup>* in the wake region. The black line represents the reference fractional order predicted by channel flows; the red curve represents the NN results for different *Re<sup>θ</sup>* .


**Figure 20.** We train the fractional order in the wake region and near the wall selecting the data set *Re<sup>θ</sup>* <sup>=</sup> 670 to 4060, excluding *Re<sup>θ</sup>* <sup>=</sup> 2000; the training region is (*Re<sup>θ</sup>* , *<sup>y</sup>*+)<sup>∈</sup> [670, 4060] <sup>×</sup> [0, 1200]. The training data set is represented as black dots: (**a**) we use spline interpolation (IP) in 2D; (**b**) the fractional order is trained by a neural network with 2 hidden layers and 20 neurons in each hidden layer. (**c**) The black line represents the reference fractional order predicted by channel flows; the red curve represents the fPINN results at *Re<sup>θ</sup>* = 2000; the green line plots the interpolation results IP2D along the green line in (**a**); the blue curve presents the NN along the blue line in (**b**).

**Figure 21.** We solve the fractional turbulent boundary layer model with the fractional orders represented in Figure 20c at *Re<sup>θ</sup>* = 2000. (**a**) The mean velocity; (**b**) the viscous shear stress. The black line represents the reference fractional order predicted by channel flows; the red curve represents the NN1D results at *Re<sup>θ</sup>* = 2000; the green line plots the interpolation results IP2D for *Re<sup>θ</sup>* ; the blue curve represents the NN for *Re<sup>θ</sup>* = 2000.

#### **4. Summary**

We proposed multiple fractional models for wall-bounded turbulent flows in benchmark cases where the mean flow is either one-dimensional (channel, pipe, and Couette flows) or two-dimensional (boundary layer). The main idea is to employ a variable-order fractional gradient that depends on the distance from the wall, starting with an integer order at the wall. The computational problem we addressed is the discovery of the fractional variable-order profile given DNS or experimental data for the mean velocity profile. To this end, we formulated an inverse problem for the fractional order as a function of the distance from the wall, and we solved it using a finite difference method point-by-point and through a new fractional physics-informed neural network (fPINN) that encodes the physics of turbulence expressed via the fractional derivative of variable order. The fractional order is a function of the distance from the wall, a unique capability enabled by fractional calculus. We discovered that this fractional order function is universal for all Reynolds numbers and for different geometries.

The main contributions of this work are: (1) new variable-order fractional turbulence models for the total shear stress in RANS; (2) two solution methods for the non-trivial inverse problem, an FD method and an fPINN, for obtaining the fractional order function; and (3) a universal fractional order profile, discovered for the channel and pipe flows, that allowed us to accurately predict the fractional order for the boundary layer flows.

**Author Contributions:** Conceptualization, F.S. and G.E.K.; methodology, F.S.; software, F.S.; validation, G.E.K.; formal analysis, F.S.; investigation, F.S.; writing—original draft preparation, F.S.; writing—review and editing, G.E.K.; supervision, G.E.K.; funding acquisition, G.E.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Artificial Intelligence Research Associate (AIRA) program of the Defense Advanced Research Projects Agency (DARPA) and the NSF of China (grant No. 11901100).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** G.E.K. acknowledges support by the Artificial Intelligence Research Associate (AIRA) program of the Defense Advanced Research Projects Agency (DARPA). F.S. was also supported by the NSF of China (grant No. 11901100).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **Applications of Distributed-Order Fractional Operators: A Review**

**Wei Ding, Sansit Patnaik, Sai Sidhardh and Fabio Semperlotti \***

Ray W. Herrick Laboratories, School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA; ding242@purdue.edu (W.D.); spatnai@purdue.edu (S.P.); ssidhard@purdue.edu (S.S.)

**\*** Correspondence: fsemperl@purdue.edu

**Abstract:** Distributed-order fractional calculus (DOFC) is a rapidly emerging branch of the broader area of fractional calculus that has important and far-reaching applications for the modeling of complex systems. DOFC generalizes the intrinsic multiscale nature of constant and variable-order fractional operators, opening significant opportunities to model systems whose behavior stems from the complex interplay and superposition of nonlocal and memory effects occurring over a multitude of scales. In recent years, a significant amount of studies focusing on mathematical aspects and real-world applications of DOFC have been produced. However, a systematic review of the available literature and of the state-of-the-art of DOFC as it pertains, specifically, to real-world applications is still lacking. This review article is intended to provide the reader with a road map to understand the early development of DOFC and the progressive evolution and application to the modeling of complex real-world problems. The review starts by offering a brief introduction to the mathematics of DOFC, including analytical and numerical methods, and continues by providing an extensive overview of the applications of DOFC to fields like viscoelasticity, transport processes, and control theory that have seen most of the research activity to date.

**Keywords:** fractional calculus; distributed-order operators; viscoelasticity; transport processes; control theory



**Citation:** Ding, W.; Patnaik, S.; Sidhardh, S.; Semperlotti, F. Applications of Distributed-Order Fractional Operators: A Review. *Entropy* **2021**, *23*, 110. https:// doi.org/10.3390/e23010110

Received: 25 December 2020 Accepted: 11 January 2021 Published: 15 January 2021




#### **1. Introduction**

Fractional calculus (FC) was first introduced as a mathematical generalization of integer-order integration and differentiation. Started in 1695 from a discussion between Leibniz and de L'Hôpital about the possible interpretation of the operator $\mathrm{d}^n/\mathrm{d}x^n$ when $n = 1/2$ [1], FC has been the object of studies for more than 300 years. In the early years, research mostly focused on mathematical aspects of the fractional-order operators; their physical interpretations and potential applications followed much later. Likely, the first application of FC can be traced back to Abel in 1826. Abel [2] applied FC to formulate an integral equation describing a tautochrone problem. Following Abel's study, the integral representation of FC started gaining increasing attention in the mathematics community. Early works mostly focused on the development of analytical formulations to solve selected mathematical problems. The most immediate result of this rapidly growing interest in FC was the expansion of the possible definitions of a fractional operator including, but not limited to, the integral representation (Liouville, Riemann, and Hadamard) and the convergent series representation (Grünwald and Letnikov). While these early studies had pointed out the intriguing role that FC can play when modeling complex processes in physical systems, the bulk of the early research kept focusing on the development of the mathematical framework [3] and on the integration of these operators into ordinary and partial differential equations [4]. It was only in the second half of the twentieth century that the concept of FC started percolating to fields other than mathematics. An area of application that has seen a remarkably rapid growth is that involving the modeling of complex physical phenomena.
Unlike integer-order operators, the intrinsic multiscale nature of fractional operators enabled a very unique and effective approach to model historically challenging physical processes involving, as an example, nonlocality or memory effects. Indeed, many of the early applications of FC to physical modeling included viscoelastic effects [5–12], nonlocal behavior [8,12–24], anomalous and hybrid transport [9–11,24–30], fractal media [12,31–35], and even control theory [36–39]. The interested reader is referred to the work in [40] for a detailed account of the birth and evolution of fractional calculus.

For more than a century, the study of fractional calculus focused on operators accepting a constant and single-valued order; we will refer to these operators as constant-order operators in order to differentiate them from the distributed (but constant) order operators that will be introduced below. Despite constant-order operators being considerably more general than their integer-order counterpart, the constant and single-valued nature of the order still limits their ability to accurately capture certain complex phenomena whose underlying physics could either evolve in time or emerge as the result of the interplay of multiple orders. In relatively recent years, this observation led to the formulation of two remarkable and unique forms of FC operators, namely, the distributed-order and the variable-order operators. The latter definition accounts for operators whose order can be a function of either dependent (e.g., state variables of the system) or independent (e.g., space or time) variables and can change value following the evolution of the system. While this review does not focus on this class of operators, the interested reader is referred to the works in [41,42] for a detailed overview of the mathematical aspects and applications of variable-order operators.

Before proceeding further, we clarify the different acronyms that will be used in this review in order to refer to the different types of fractional-order operators. The single constant-order operators are denoted as "CO" operators, the distributed-order operators (with constant order distribution) are denoted as "DO" operators, and the variable-order operators are denoted as "VO" operators. While VO operators can certainly be single or distributed in nature, with the acronym "VO" we specifically refer to single variable-order operators. Distributed-variable-order operators, which will be introduced later, are denoted as "DVO" operators.

The distributed-order definition of the operator allows considering a superposition of orders and accounting for, as an example, physical phenomena such as memory effects in composite materials [43] or multi-scale effects [44]. A typical example that illustrates the capabilities of this class of operators is the mechanical behavior of viscoelastic materials having spatially varying properties [45]. Distributed-order fractional calculus presents a natural generalization of constant-order fractional calculus (COFC) by integrating the fractional kernel of CO operators over an extended range of orders. Given that the fundamental kernel of a CO operator is retained in the DO operator, DO operators inherit the fundamental properties of COFC, such as the ability to model nonlocality and memory effects, and further extend them to multiple coexisting orders. This latter argument can be interpreted as a superposition of the behavior captured by individual CO operators using different orders within a given range.

The original concept of distributed-order fractional calculus (DOFC) can be traced back to the seminal studies by Caputo on dissipative elastodynamics [46–48]. In these studies, a generalization of the viscoelastic stress–strain constitutive laws, by employing a parallel sequence of fractional-order derivatives, was undertaken. Initially, the author dubbed this operator as the "*mean fractional-order derivative*". A couple of decades later, Caputo [49] formalized the original proposition into the concept of DO derivative and also explored possible solutions to differential equations employing DO derivatives. Later, detailed investigations on the properties of DO operators, and on the properties and solution techniques of DO differential equations (DODE) were conducted in [45,50,51]. Following these pioneering studies on the mathematics of DO operators, in the 1990s and early 2000s, the interest in this topic went beyond the mathematical community and started percolating into several branches of engineering and physics. To date, we estimate that a total of approximately 300 papers have been published in the general area of DOFC. This estimate includes both journal and conference publications spanning a variety of fields including, but not limited to, theoretical and applied mathematics, analytical and numerical methods, viscoelasticity, transport processes, and control theory. A detailed time history and a quantitative assessment of the scientific studies produced in the general area of DOFC are provided in Figure 1.

Given the substantial critical mass reached by this field to date, and in view of the drastic acceleration of the research on DOFC observed in recent years, the time is ripe to assess the state of the field not only in terms of the mathematical formulation, but from the perspective of practical applications. In this review, we will provide a comprehensive discussion of the different fields of application and possible opportunities offered by DOFC to model complex physical problems. We expect that this review would serve as a starting point for the reader interested in approaching this fascinating field. Engineering, physics, chemistry, biology, and finance are only some of the communities that should find several points of interest and material for further consideration in this work.

**Figure 1.** (**a**) Histogram chart showing the historical evolution of scientific publications per year starting from 1995. Note that the first study on distributed-order fractional calculus (DOFC) was published in 1966 by Caputo [46]. Approximately five studies were produced until 1995, which was taken as the starting year for the histogram. (**b**) Pie chart showing the distribution of publications per field. The data used in this figure were collected from Google Scholar.

The remainder of this paper is organized as follows. Section 2 focuses on providing an overview of the main mathematical concepts including basic definitions and properties of DOFC. The section also covers analytical and numerical methods for the calculation of DO operators and for the solution of DODEs. Section 3 briefly discusses the relevance of DO operators with respect to the modeling of complex physical processes. The remaining sections provide a review of the applications of DOFC to real-world problems including viscoelastic systems, transport processes, and control theory.

#### **2. Mathematical Background**

We begin this review by providing a brief summary of the basic definitions and properties of DO operators. Further, we will discuss the properties of differential equations with DO operators, and provide a brief overview of the corresponding analytical and numerical simulation techniques. We highlight here that, unless otherwise mentioned, the DO operator is defined on the basis of a general fractional-order derivative denoted by ${}\_c D\_t^{\alpha}$, evaluated with respect to a generalized independent variable $t$. We emphasize that the notation $t$ used in this section must not necessarily be interpreted as time. Note that $c$ denotes the lower terminal of the fractional derivative. The fractional derivative ${}\_c D\_t^{\alpha}$ can accept different definitions, although the most common for DO operators are those provided by Riemann–Liouville ${}\_{c}^{RL} D\_t^{\alpha}$ and by Caputo ${}\_{c}^{C} D\_t^{\alpha}$ [45]. Finally, also for the sake of brevity, we shall provide only the definitions corresponding to the left-handed fractional derivatives (the right-handed DO derivatives being an immediate extension).

#### *2.1. Definitions and Properties*

From a mathematical perspective, DO derivatives are defined as an integration of either the constant-order or the variable-order fractional derivatives with respect to the noninteger order of differentiation [48–51]. Two approaches to the definition of DO derivatives have been explored [45]. First, the so-called *direct approach* treats the order as a variable so that the DO derivative is defined as [45,49]

$${}\_{\alpha\_1}^{\alpha\_2}\mathcal{D}\_{c,t}^{\alpha}\big(f(t), \kappa(\alpha), \alpha\big) = \int\_{\alpha\_1}^{\alpha\_2} \kappa(\alpha)\, {}\_c D\_t^{\alpha} f(t)\, \mathrm{d}\alpha \tag{1}$$

where the integrand $\kappa(\alpha)\, {}\_c D\_t^{\alpha} f(t)$ is integrated with respect to the independent variable $\alpha$, that is, the fractional order, within the interval $\alpha \in [\alpha\_1, \alpha\_2]$. $\kappa(\alpha)$ is denominated the order-weighting/strength function, or simply the strength function. The second approach, referred to as the *indirect approach*, treats the order as a function of a different independent variable $x$, leading to the following definition [45],

$${}\_{x\_1}^{x\_2}\mathcal{D}\_{c,t}^{\alpha(x)}\big(f(t), \kappa(x), x\big) = \int\_{x\_1}^{x\_2} \kappa(x)\, {}\_c D\_t^{\alpha(x)} f(t)\, \mathrm{d}x \tag{2}$$

where $x \in [x\_1, x\_2]$ is the interval of integration. Similar to $\kappa(\alpha)$, $\kappa(x)$ is also an order strength distribution [45]. The strength function ($\kappa(\alpha)$ or $\kappa(x)$) determines the contribution of each individual CO derivative to the overall DO derivative. As an example, a constant value of the strength function $\kappa(\alpha) = \kappa\_0$ would mean that all the CO derivatives contribute equally to the final DO derivative [49]. The specific choice of this strength function depends on the underlying physics of the problem to be modeled and could be defined as either a continuous or a discrete function of the order $\alpha$ (direct approach) or of the independent variable $x$ (indirect approach). This latter comment is further clarified in the following section by using practical examples.

To better illustrate the above concepts, we present a numerical demonstration of the DO derivatives evaluated for two representative functions of the variable $t$: (1) a sinusoidal function $f(t) = \sin \pi t$ in Figure 2 and (2) a step function $f(t) = \mathrm{H}(t - 1)$ in Figure 3, where $\mathrm{H}$ is the Heaviside function. In Figures 2a and 3a, the strength function is chosen to be $\kappa(\alpha) = 1$, such that it is constant and continuous. In Figures 2b and 3b, a discontinuous strength function $\kappa(\alpha) = \sum\_{\alpha\_j \in \{0.5,\,0.7,\,0.9\}} \tau\_0^{\alpha}\, \delta(\alpha - \alpha\_j)$ is used, where $\tau\_0$ is a positive constant. In generating the above results, we employed the Caputo definition of the fractional derivatives with terminals $(-\infty, t]$. The CO Caputo fractional derivative of the two different functions to an order $\alpha \in (0, 1)$ is [52]:

$${}\_{-\infty}^{\;\,C} D\_t^{\alpha}(\sin \pi t) = \pi^{\alpha} \sin\!\left(\frac{\pi (2t + \alpha)}{2}\right) \tag{3a}$$

$${}\_{-\infty}^{\;\,C} D\_t^{\alpha}\big(\mathrm{H}(t-1)\big) = \mathrm{H}(t-1)\, \frac{(t-1)^{-\alpha}}{\Gamma(1-\alpha)} \tag{3b}$$

The above CO derivatives are also provided in Figures 2 and 3 to facilitate comparison with the DO derivatives. Note that the above expressions for the different CO derivatives identically reduce to their respective first-order (integer) derivatives for the choice of $\alpha = 1$.
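As a concrete illustration of how such a DO derivative can be evaluated, the closed-form CO derivative in Equation (3a) can be integrated numerically over the order. The minimal sketch below (Python; the function names are ours, and we assume the simplest case $\kappa(\alpha) = 1$ with $\alpha \in [0, 1]$) approximates the order integral with a midpoint rule:

```python
import math

def co_caputo_sin(alpha, t):
    # CO Caputo derivative (lower terminal -infinity) of sin(pi*t), Eq. (3a)
    return math.pi ** alpha * math.sin(math.pi * (2.0 * t + alpha) / 2.0)

def do_derivative_sin(t, a1=0.0, a2=1.0, k=1000):
    # DO derivative with kappa(alpha) = 1: midpoint rule over the order alpha
    h = (a2 - a1) / k
    return h * sum(co_caputo_sin(a1 + (i + 0.5) * h, t) for i in range(k))
```

For this particular choice of strength function the order integral even admits a closed form, $\operatorname{Im}\!\big[e^{i\pi t}(e^{c}-1)/c\big]$ with $c = \ln\pi + i\pi/2$, which makes the sketch easy to validate; e.g., call `do_derivative_sin(0.25)`.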

As evident from Figures 2 and 3, the DO derivatives can be perceived as the weighted sum of individual CO derivatives over the specified range of the fractional order $\alpha$. Particularly for $\kappa(\alpha) = 1$, as evident from Figures 2a and 3a, the DO derivative is the linear sum of the CO derivatives with fractional order $\alpha$ spanning the range $[\alpha\_1, \alpha\_2]$. This concept is further illustrated by the examples in Figures 2b and 3b. In these figures, the DO derivatives evaluated for $\tau\_0 = 1$ are the sum of the individual CO derivatives. In contrast, for $\tau\_0 = 2$, wherein the strength function is also a function of the order $\alpha$, we observe a weighted contribution of the different CO derivatives to the DO derivative. The above discussion also explains the shift in the phase of the harmonic function in Figure 2a. More specifically, the phase shift in the DO derivative with respect to the original signal is caused by the contribution of a phase difference of $\pi\alpha/2$ (see Equation (3a)) from each CO derivative. The effect of the strength function on the amplitude, without changes in the phase, is illustrated in Figure 2b. Similarly, for the case of the Heaviside step function in Figure 3, different decaying characteristics can be obtained by varying the definition of the strength function $\kappa(\alpha)$ and its support $[\alpha\_1, \alpha\_2]$. Interesting applications to viscoelasticity based on this observation will be discussed in Section 4.

**Figure 2.** DO derivative of a harmonic function *f*(*t*) = sin *πt* derived following the definitions given in Equation (1). The plot shows the behavior of the derivative for (**a**) continuous and (**b**) discrete strength functions.

**Figure 3.** DO derivative of the Heaviside function *f*(*t*) = H(*t* − 1) derived following the definitions given in Equation (1). The plot shows the behavior of the derivative for (**a**) continuous and (**b**) discrete strength functions.

Lorenzo and Hartley [45] also extended the definitions of DO derivatives by allowing the order distribution to be a function of different variables (such as, for example, space, time, or external loads). This extension introduced the concept of the distributed-variable-order (DVO) operator. Following this extension, the direct and indirect approaches to the definition of DO operators can be reformulated as

$${}\_{\alpha\_1}^{\alpha\_2}\mathcal{D}\_{c,t}^{\alpha(t)}\big(f(t), \kappa(\alpha), \alpha\big) = \int\_{\alpha\_1}^{\alpha\_2} \kappa(\alpha)\, {}\_c D\_t^{\alpha(t)} f(t)\, \mathrm{d}\alpha \tag{4a}$$

$${}\_{x\_1}^{x\_2}\mathcal{D}\_{c,t}^{\alpha(x,t)}\big(f(t), \kappa(x), x\big) = \int\_{x\_1}^{x\_2} \kappa(x)\, {}\_c D\_t^{\alpha(x,t)} f(t)\, \mathrm{d}x \tag{4b}$$

Although providing a very general form of the operator that can capture both multifractal (DO) and evolutionary (VO) behavior, the application of these operators has been rather limited. To date, most applications of DVO operators have been in the area of complex viscoelastic materials (see Section 4.3).

#### *2.2. Distributed-Order Differential Equations*

The present section is intended to briefly introduce the concept of differential equations based on DO operators. Clearly, the concept of DODEs is fairly extensive in itself and the reader is referred to the works in [53,54] for a detailed discussion on the different forms of DODEs and the corresponding solution techniques. Here, we simply introduce the general concept of DODE in order to facilitate the understanding of the discussion on applications presented in the remainder of the paper. Consider the following DODE [49],

$${}\_{0,m}\mathcal{D}\_{0,t}^{\alpha}\big(\kappa(\alpha), u(t), \alpha\big) = f(t) \tag{5}$$

for $m \in \mathbb{N}$. Note that a discrete distribution function $\kappa(\alpha) = \sum\_{j=1}^{n} b\_j\, \delta(\alpha - \alpha\_j)$ reduces the above equation to the following multi-term fractional-order differential equation,

$$\sum\_{j=1}^{n} b\_j\, {}\_0 D\_t^{\alpha\_j} u(t) = f(t) \tag{6}$$

At the same time, a continuous distribution $\kappa(\alpha) \in C[0, m]$ can be perceived as a limiting case of the multi-term definition provided above when $n \to \infty$ [49]. While Equation (5) is an example of a linear DODE, a nonlinear DODE can be given as [55]

$$\int\_{m\_1}^{m\_2} \kappa(\alpha)\, F\big({}\_0 D\_t^{\alpha} u(t)\big)\, \mathrm{d}\alpha = f\big(t, u(t)\big) \tag{7}$$

where $F\big({}\_0 D\_t^{\alpha} u(t)\big)$ is a nonlinear function of the primary variable $u(t)$ and of its fractional derivatives.
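The reduction from Equation (5) to Equation (6) follows from the sifting property of the Dirac delta; writing it out for the discrete strength function (assuming all $\alpha\_j \in (0, m)$) gives

```latex
{}_{0,m}\mathcal{D}_{0,t}^{\alpha}\big(\kappa(\alpha), u(t), \alpha\big)
  = \int_{0}^{m} \sum_{j=1}^{n} b_j\,\delta(\alpha - \alpha_j)\; {}_{0}D_t^{\alpha} u(t)\,\mathrm{d}\alpha
  = \sum_{j=1}^{n} b_j\, {}_{0}D_t^{\alpha_j} u(t).
```

For instance, the choice $n = 2$, $b_1 = b_2 = 1$, $\alpha_1 = 1/2$, $\alpha_2 = 1$ yields the two-term equation ${}_{0}D_t^{1/2} u(t) + \dot{u}(t) = f(t)$.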

For the linear DODE in Equation (5), some common assumptions are employed to ensure that the problem is well-posed, that is, the solution is both bounded and convergent [55,56]:

**Hypothesis 1.** *$\kappa$ is absolutely integrable on the interval $[\alpha\_1, \alpha\_2]$ and satisfies the following inequality,*

$$\int\_{\alpha\_1}^{\alpha\_2} \kappa(\alpha)\, s^{\alpha}\, \mathrm{d}\alpha \neq 0, \quad \text{for} \quad \mathrm{Re}(s) > 0 \tag{8}$$

**Hypothesis 2.** *$f \in L^1[0, \infty)$, where $L^1$ is the Lebesgue space.*

**Hypothesis 3.** *The function $u(t)$ is such that $\left|{}\_0 D\_t^{\alpha} u(t)\right| < M$ for all $t \in [0, \infty)$ and all $\alpha \in [\alpha\_1, \alpha\_2]$, where $M$ is a constant; in other terms, the fractional-order derivative is always bounded. For the limiting case where either of the order bounds tends to infinity (i.e., $\alpha\_1$ or $\alpha\_2 \to \infty$), the boundedness of the DO derivative requires the strength function $\kappa(\alpha)$ to be non-zero only over a finite range, that is, $\kappa(\alpha)$ must have finite support [45].*

Pskhu [57,58] conducted early studies on the solvability of ordinary DODEs. Umarov and Gorenflo [59] extended these studies to analyze the solvability of multipoint problems. Diethelm and Ford [60,61] analyzed the existence and the uniqueness of solutions for linear DODEs, specifically for the case where Caputo-type initial conditions are available. Later, this proof was extended to the case where initial conditions are unknown [55]. It is noteworthy that these studies prove existence and uniqueness for fractional orders $\alpha < 1$, while for $\alpha > 1$ existence and uniqueness are still a conjecture. A similar exercise was performed on nonlinear DODEs with specific application to viscoelastic systems [62] and wave propagation [63]. The existence of solutions to hybrid DODEs was analyzed in [64], where the hybrid differential equations are quadratic perturbations of nonlinear DODEs [65,66]. Atanacković et al. also conducted similar studies on selected forms of DODEs encountered in the study of viscoelastic solids [67,68]. Note that all the aforementioned studies adopt Hypotheses 1–3. Very recently, Fedorov studied linear DODEs that violate Hypothesis 2, resulting in an unbounded operator [69]. This study expanded the application of DODEs to initial and boundary value problems of ultra-slow diffusion.

#### *2.3. Solution of DODEs: Analytical Methods*

Concerning the analytical methods for the solution of DODEs, Caputo first proposed the use of the Laplace transform to derive solutions [49]. Later, Bagley and Torvik [50,51] analyzed this approach in a systematic manner. The results obtained by the application of the Laplace transform to DO derivatives are subject to minor modifications depending on the strength function and its support. Caputo derived the Laplace transform of DO derivatives with the order distributed over an arbitrary interval $[a, b]$. Bagley and Torvik specialized this result to the restricted interval $\alpha \in [0, 1]$, given the numerous practical examples encompassed by this choice. Diethelm and Ford extended the domain to $[0, m]$, $m \in \mathbb{N}$ [60]. The Laplace transform of a DO derivative with order distributed in $[0, m]$, based on the Caputo definition, is given as [56]

$$\begin{aligned} \mathcal{L}\left[\int\_{0}^{m} \kappa(\alpha)\, {}\_0^C D\_t^{\alpha} u(t)\, \mathrm{d}\alpha\right] &= \int\_{0}^{m} \kappa(\alpha)\left(s^{\alpha}\, \mathcal{L}[u](s) - u(0)\, s^{\alpha-1}\right) \mathrm{d}\alpha \\ &\quad - \sum\_{j=1}^{m-1} \int\_{j}^{m} \kappa(\alpha)\, u^{(j)}(0)\, s^{\alpha-j-1}\, \mathrm{d}\alpha \end{aligned} \tag{9}$$

The Laplace transform of the DO derivative for other possible cases such as *α* ∈ [0, ∞] and *α* ∈ [*m* − 1, *m*] can be found in [45,70], respectively.

Using the Laplace transform of the DO derivative in Equation (9), Diethelm and Ford derived the analytical solution of the linear DODE ${}\_{0,m}^{C}\mathcal{D}\_{0,t}^{\alpha} u(t) = f(t)$ as [60]

$$u(t) = u(0) + \mathcal{L}^{-1}\left[\frac{F(s)}{\int\_0^m \kappa(\beta)\, s^{\beta}\, \mathrm{d}\beta}\right] + \sum\_{k=1}^{m-1} u^{(k)}(0)\, \mathcal{L}^{-1}\left[\frac{\int\_0^m \kappa(\beta)\, s^{\beta-k-1}\, \mathrm{d}\beta}{\int\_0^m \kappa(\beta)\, s^{\beta}\, \mathrm{d}\beta}\right] \tag{10}$$

where $\mathcal{L}^{-1}$ is the inverse Laplace transform. Note that the inverse Laplace transform in the above solution can be applied *iff* the assumptions in Hypotheses 1–3, which ensure a bounded solution, are satisfied [60]. Lorenzo and Hartley derived analytical solutions for DODEs employing DO derivatives specifically for an order distributed over $\mathbb{R}^+$ [45]. Other common approaches to derive solutions of DODEs include the Fourier method [71–73], the use of Mittag–Leffler functions [74–76], the spectral representation of the fractional operator [77], and series expansion methods [78,79]. The method of Laplace transforms combined with series approximations using Laguerre polynomials was also used to solve linear and nonlinear DODEs [80]. While the work in [80] focuses on obtaining the solution for one- and two-term fractional-order relaxation equations, the method developed in [80] is highly general and may be extended to DODEs with general strength functions.
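A useful sanity check on Equation (10), worked out here under the assumption of the CO limit $\kappa(\beta) = \delta(\beta - \nu)$ with $0 < \nu < 1$ and $m = 1$, is that the summation becomes empty and

```latex
u(t) = u(0) + \mathcal{L}^{-1}\!\left[\frac{F(s)}{s^{\nu}}\right]
     = u(0) + \frac{1}{\Gamma(\nu)} \int_{0}^{t} (t-\tau)^{\nu-1} f(\tau)\,\mathrm{d}\tau ,
```

which is the classical solution of the Caputo CO problem ${}_0^C D_t^{\nu} u(t) = f(t)$, recovered here as a special case of the DO formula.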

Although in the above discussion we have primarily considered DO derivatives based on the Caputo definition, the Laplace transform of DO derivatives based on the Riemann–Liouville definition can be derived analogously [60]. In fact, as shown in [60], the only difference appears in the terms containing the initial conditions, similar to the CO case [4]. This difference in behavior was also highlighted by Mainardi et al. [81], who employed Laplace transforms to compare the asymptotic behaviors of fundamental solutions to time-fractional DO diffusion equations. Interestingly, different asymptotic behaviors are observed for DO derivatives based on the Riemann–Liouville and Caputo definitions. The difference in the asymptotic behaviors is primarily due to the difference in the way the initial conditions appear in the Laplace transforms of the CO Riemann–Liouville and Caputo derivatives [4,82].

#### *2.4. Solution of DODEs: Numerical Methods*

Although analytical solutions are possible for special types of DODEs [45,60], the rapidly growing application of DOFC to model complex physical systems often requires the use of numerical methods. Starting from basic observations, Diethelm [83] first proposed an approximate numerical method for the solution of multi-term DODEs. Following this initial study, several other numerical methods have been developed. Note that DODEs (see, for example, Equation (5)) can be either ordinary differential equations (ODE) or partial differential equations (PDE), depending on the specific application. The numerical simulation of either a distributed-order ODE or PDE requires the numerical approximation of the DO derivative. Once the approximation of the DO derivative is obtained, the procedure to numerically simulate the DODE follows exactly from classical procedures developed for integer-order equations. In other terms, the main difference between the evaluation of classical integer-order differential equations and DODEs lies in the numerical approximation of the DO derivative. In the interest of brevity, we focus this section only on this latter aspect. In general, the procedure to numerically approximate DO derivatives can be seen as a two-step process:

1. *Step 1*: approximate the integral over the order (typically via a quadrature rule), which reduces the DO derivative to a weighted, multi-term combination of CO derivatives;
2. *Step 2*: approximate each of the resulting CO fractional derivatives by employing one of the numerical methods available for CO operators.


The above two steps can be more practically visualized by considering the following example of DO derivative,

$$\int\_{a}^{b} \phi(\alpha)\, D^{\alpha} u(t)\, \mathrm{d}\alpha \overset{\text{Step 1}}{\approx} \underbrace{\sum\_{i=0}^{k} W\_{i}\, \phi(\alpha\_{i})\, D^{\alpha\_{i}} u(t)}\_{\text{approximation of the integral}} \overset{\text{Step 2}}{\approx} \underbrace{\sum\_{i=0}^{k} W\_{i}\, \phi(\alpha\_{i})\, \Psi(\alpha\_{i}, t)}\_{\text{incorporate approximation of } D^{\alpha\_i} u(t)} \tag{11}$$

where $W\_i$ is the weight obtained from the numerical integration and $\Psi(\alpha\_i, t)$ is the numerical approximation of the CO derivative $D^{\alpha\_i} u(t)$. In summary, at step 1, an approximation of the order integral is computed (often by quadrature rules), and at step 2, the remaining CO derivatives are approximated by employing different types of numerical methods for CO fractional derivatives. Based on this two-step approximation strategy, this section is divided into three parts: (1) a discussion of the most popular quadrature rules for the implementation of step 1, (2) a discussion of the various numerical methods for the implementation of step 2, and (3) a brief discussion of their computational aspects.
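The two-step strategy in Equation (11) can be sketched in code. The fragment below (Python; the function names are ours, and the L1 scheme is used purely as one possible choice of $\Psi(\alpha\_i, t)$ for a Caputo derivative of order $\alpha \in (0, 1)$ on a uniform time grid) combines a midpoint rule over the order with the L1 approximation:

```python
import math

def l1_caputo(u_vals, tau, alpha):
    """L1 approximation of the Caputo derivative of order 0 < alpha < 1 at the
    last grid point, for samples u_vals = [u(0), u(tau), ..., u(n*tau)]."""
    n = len(u_vals) - 1
    acc = 0.0
    for j in range(n):
        b = (j + 1) ** (1.0 - alpha) - j ** (1.0 - alpha)   # L1 weights
        acc += b * (u_vals[n - j] - u_vals[n - j - 1])
    return acc * tau ** (-alpha) / math.gamma(2.0 - alpha)

def do_derivative(u_vals, tau, kappa, a1=0.0, a2=1.0, k=50):
    """Two-step approximation of Eq. (11): midpoint rule over the order
    (step 1) combined with the L1 scheme as Psi(alpha_i, t) (step 2)."""
    h = (a2 - a1) / k
    total = 0.0
    for i in range(k):
        a = a1 + (i + 0.5) * h      # midpoint node in the order variable
        total += h * kappa(a) * l1_caputo(u_vals, tau, a)
    return total
```

A convenient check is $u(t) = t$ with $\kappa(\alpha) = 1$: the L1 scheme is exact for linear functions, so only the order-quadrature error remains.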

#### 2.4.1. Numerical Integration of the Integral Operator (Step 1)

As highlighted in the previous sections, a key difference between DO derivatives and CO derivatives is the existence of an additional integration over the order. To transform the integral form into the multi-term form (first of the two-step process), two common quadrature rules are often used by researchers: (1) Gauss–Legendre quadrature rule and (2) Newton–Cotes quadrature rule. Based on the Gauss–Legendre quadrature rules [84–107], the DO derivative can be approximated using the following multi-term form,

$$\int\_{a}^{b} \phi(\alpha)\, D^{\alpha} u(t)\, \mathrm{d}\alpha = \int\_{a}^{b} g(\alpha, t)\, \mathrm{d}\alpha = \sum\_{i=0}^{k} W\_{i}^{G}\, g(\alpha\_{i}^{G}, t) + R^{G} \tag{12}$$

where $W\_i^G$ are the weights at the Gauss points $\alpha\_i^G$ chosen for this integration over the order, and $R^{G}$ is the quadrature remainder term. Although the Gauss–Legendre quadrature schemes are known to achieve highly accurate results (particularly when dealing with integrands of specific types such as, for example, polynomials), an analysis of the numerical convergence and of the truncation error (including steps 1 and 2) becomes difficult when the integrand consists of fractional derivatives (like $D^{\alpha} u(t)$, as shown in Equation (11)). To overcome these drawbacks of the Gauss–Legendre quadrature, the Newton–Cotes scheme was considered. The Newton–Cotes quadrature schemes can be divided into closed and open approaches, depending on whether the function values at the end points are included. Following the closed approach, the different quadrature rules used for DO derivatives include the trapezoid rule [56,87,106,108–117], Simpson's rule [87,106,111,112,116–121], and Boole's rule [122]. All these schemes are associated with different orders of convergence. Following the open Newton–Cotes approach, the mid-point rule is widely used [107,123–143]. The truncation error at the end of step 1, when employing the Newton–Cotes approach, simply follows the classical results. More specifically, the truncation errors are $\mathcal{O}(h^2)$ for the trapezoid and mid-point rules, $\mathcal{O}(h^4)$ for Simpson's rule, and $\mathcal{O}(h^6)$ for Boole's rule. Given the flexibility in choosing different approximations and the ease of error analysis, the Newton–Cotes method is typically preferred over the Gauss–Legendre approach for the step 1 approximation.
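To make the step 1 trade-offs concrete, the minimal sketch below (Python; the function name is ours, and $\kappa(\alpha) = 1$ is assumed so that the order integral has the closed form $\int_0^1 s^{\alpha}\,\mathrm{d}\alpha = (s-1)/\ln s$) compares three Newton–Cotes rules on the same number of subintervals:

```python
import math

def order_integral(s, rule, k):
    """Approximate I(s) = integral of s**alpha over alpha in [0, 1] with a
    Newton-Cotes rule; the integrand mimics the order integral arising for
    kappa(alpha) = 1 (cf. Hypothesis 1). Exact value: (s - 1)/ln(s)."""
    f = lambda a: s ** a
    h = 1.0 / k
    if rule == "midpoint":
        return h * sum(f((i + 0.5) * h) for i in range(k))
    if rule == "trapezoid":
        return h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, k)) + 0.5 * f(1.0))
    if rule == "simpson":                      # k must be even
        acc = f(0.0) + f(1.0)
        acc += sum((4 if i % 2 else 2) * f(i * h) for i in range(1, k))
        return acc * h / 3.0
    raise ValueError(rule)
```

Running the three rules for, e.g., `s = 2.0` and `k = 8` illustrates the classical error orders quoted above: Simpson's rule is markedly closer to $(s-1)/\ln s$ than the trapezoid rule at the same cost.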

#### 2.4.2. Approximation of the Multi-term Fractional Derivatives (Step 2)

As described in Equation (11), the second step involves the numerical approximation of the CO fractional derivatives within the multi-term expression. Strictly speaking, this approximation directly follows the techniques available for CO derivatives. The literature on numerical methods for the approximation of CO derivatives is extensive and has been the object of books [144] and papers [145–147]. Therefore, for the sake of brevity, we do not review these methodologies again.

The more interesting and challenging aspect, in the context of the DO formulation, is the combination of the step 2 approximation with the spatial and/or temporal discretization of the domain in order to develop computational models for space- and/or time-fractional DODEs. The different discretization techniques can be generally divided into (1) mesh-free approaches and (2) mesh-based approaches. The majority of mesh-free approaches are based on the spectral method, which uses basis functions to approximate the multi-term DO expression obtained in the first step. On the other hand, the mesh-based approaches involve most of the classical methods for differential equations including the finite difference method (FDM) and the finite element method (FEM). Depending on the specific implementation, that is, on the numerical technique adopted to approximate the CO fractional derivative in step 2 and the spatial and/or temporal discretization of the domain, the computational approaches differ in their accuracy and computational cost. This review focuses on this latter aspect. In this regard, we report here the accuracy of each method, wherever available. In order to unify the expressions for convergence analysis of different methods, we will use *τ*, *h*, and *σ* to represent the step-sizes in time, space, and order, respectively.

#### Mesh-Free Approaches

In this section, we briefly describe the different mesh-free approaches available in the literature to numerically simulate DODEs. The majority of these techniques adopt the common strategy of converting the DODE into a system of algebraic equations using orthogonal basis functions. This allows formulating operational matrices which approximate the CO derivatives within the step 2 approximation. Depending on the strategy adopted to develop these matrices (or, equivalently, these algebraic equations) the different mesh-free approaches can be broadly categorized as Galerkin methods, collocation methods, and tau methods. A brief discussion on these methods and some other miscellaneous techniques is provided in the following.


and BPFs and shifted Legendre polynomials [161]. For completeness, we mention that other numerical methods including the Laguerre spectral method [108], the Legendre wavelets method [84], the fractional pseudo-spectral method [162], the reproducing kernel method [163], radial basis function based mesh-free methods [86,114], and the element-free Galerkin method [106] have also been proposed. Further, several semi-analytical approaches including the homotopy perturbation method [164–167], harmonic approximations [168], and the Adomian decomposition method [169–171] have also been proposed and applied to derive the solution of DODEs and multi-term fractional differential equations (FDE).

#### Mesh-Based Approaches

Although many mesh-free approaches can be implemented relatively easily for DO problems involving simple geometries and boundary conditions, algorithms for numerical computations on complex domains (e.g., involving irregular geometry and high-dimensional systems) still present several complexities. This is also reflected in the fact that many 2D and 3D problems have been solved using mesh-based approaches, while a majority of mesh-free approaches focus primarily on 1D problems. FEM is particularly useful in exploring numerical solutions over irregular domains. Among the mesh-based approaches for DODEs, two methods have generated the most interest: finite difference methods (FDM) and finite element methods (FEM). Before proceeding to review these mesh-based approaches, it is important to note a specific challenge faced by this class of methods. More specifically, due to the weak singularity of the integral kernel within the fractional derivative, numerical solutions of initial boundary-value FDEs are typically non-smooth, with sharp gradients near the boundary [172–174]. As the DO derivative is approximated via a weighted sum of CO derivatives (see Equation (11)), this phenomenon also occurs when solving initial boundary-value DODEs [143]. To tackle this weak singularity, the commonly used mesh-based methods need to be improved. One possible approach, commonly adopted in the literature, consists in the use of a graded mesh [87,143]. Remarkably, the use of the graded mesh also helps achieve a high-order convergence [87,143].
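A commonly used family of graded meshes clusters grid points near the initial time, where time-fractional solutions are typically non-smooth. A minimal sketch of such a mesh (the function name and parameter choices are our own illustration, not the specific meshes of [87,143]):

```python
import numpy as np

def graded_mesh(T, N, r):
    """Graded temporal mesh t_j = T * (j/N)**r, j = 0..N.
    r = 1 recovers a uniform mesh; r > 1 clusters points near t = 0,
    where solutions of time-fractional problems are typically non-smooth."""
    j = np.arange(N + 1)
    return T * (j / N) ** r

t = graded_mesh(T=1.0, N=8, r=2.0)
# The first steps are much smaller than the last ones:
print(np.diff(t))
```

The grading exponent *r* is normally chosen in terms of the fractional order so that the mesh resolution matches the strength of the singularity.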

1. Finite difference methods are one of the most widely used mesh-based approaches for the solution of DODEs because they allow easy formulation and implementation. Compared with other approaches, the convergence and accuracy of FDM are easier to analyze [175–177]. A majority of the advanced FDMs are based on the Grünwald–Letnikov method (GLM) [122,142]. Recall that GLM uses a finite number of terms from a convergent series to approximate the fractional derivative and is a widely used approach [4]. Hu [126] used a shifted GLM to simulate a time-fractional DODE with accuracy up to O(*τ*<sup>1+*σ*/2</sup> + *h* + *σ*<sup>2</sup>). Second-order accurate schemes for space-fractional DODEs were developed in [136] by using a Crank–Nicolson scheme in time and a shifted GLM. Similar second-order accurate algorithms can also be found in [133,178]. The second-order accurate backward difference formula, first proposed by Diethelm [145], also appears to be popular among several researchers [124,129,138]. To further improve the numerical accuracy, more elaborate methods were developed using the weighted and shifted GLM (WSGLM). Li [179] developed a numerical scheme with high spatial accuracy (O(*τ*<sup>2</sup> + *h*<sup>4.5</sup> + *σ*<sup>2</sup>)) by combining WSGLM and the parametric quintic spline method. Another scheme capable of delivering high spatial accuracy (O(*τ*<sup>2</sup> + *h*<sup>4</sup> + *σ*<sup>4</sup>)) was proposed by using the WSGLM for temporal approximation and a high-order compact difference scheme for spatial approximation [117]. Yang [180] also proposed a similar composite method based on WSGLM in time and an orthogonal spline collocation method in space. This scheme was shown to be unconditionally stable and accurate up to O(*τ*<sup>2</sup> + *h*<sup>*r*+1</sup> + *σ*<sup>2</sup>), where *r* is the polynomial degree used in the spatial domain.
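Since several of the schemes above build on the Grünwald–Letnikov approximation, a minimal sketch of the GL coefficients and the resulting first-order scheme may clarify the idea. This is illustrative code of the textbook GLM, not a reproduction of any specific cited implementation:

```python
import numpy as np
from math import gamma

def gl_weights(alpha, n):
    """Grunwald-Letnikov coefficients g_k = (-1)^k * binom(alpha, k),
    computed with the stable recurrence g_k = (1 - (alpha + 1)/k) * g_{k-1}."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for k in range(1, n + 1):
        g[k] = (1.0 - (alpha + 1.0) / k) * g[k - 1]
    return g

def gl_derivative(u, h, alpha):
    """First-order GL approximation of the CO derivative D^alpha u on a
    uniform grid: D^alpha u(t_n) ~ h^(-alpha) * sum_k g_k * u_{n-k}."""
    n = len(u) - 1
    g = gl_weights(alpha, n)
    return h ** (-alpha) * np.array(
        [np.dot(g[: i + 1], u[i::-1]) for i in range(n + 1)]
    )

# Sanity check against the exact result D^0.5 t = t^0.5 / Gamma(1.5) at t = 1:
alpha = 0.5
t = np.linspace(0.0, 1.0, 1001)
h = t[1] - t[0]
approx = gl_derivative(t, h, alpha)[-1]
print(approx, 1.0 / gamma(1.5))   # the two values should agree to about three decimals
```

The shifted and weighted-shifted variants (WSGLM) discussed above modify where the coefficients are anchored on the grid to raise the convergence order from O(*τ*) to O(*τ*<sup>2</sup>).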

FDM schemes have also been developed for high-dimensional problems, with particular attention being given to accuracy and convergence performance [141,181]. For applications requiring high accuracy, two techniques are often used: (1) compact FDM (CFDM) and (2) the extrapolation method. Based on a fully discrete difference scheme [182], Ye [132] proposed a CFDM and demonstrated its convergence to be O(*τ*<sup>1+*σ*/2</sup> + *h*<sup>4</sup> + *σ*<sup>2</sup>). Pimenov [121] constructed a linearized difference scheme for nonlinear time-delay DODEs. Several researchers [110,120,183] also obtained a CFDM with order O(*τ*<sup>2</sup> + *h*<sup>4</sup> + *σ*<sup>4</sup>) based on higher-order temporal approximation techniques. Gao [111,116] applied two extrapolation methods in time to achieve high temporal convergence: O(*τ*<sup>2</sup>) and O(*τ*<sup>2</sup>|ln *τ*|<sup>2</sup>). For high-dimensional problems, ADI schemes have become highly popular and help achieve highly accurate (second-order in time and fourth-order in space) numerical schemes [107,184].
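The temporal extrapolation idea mentioned above can be illustrated generically: given any approximation A(*τ*) with a known leading error term, one Richardson step on two step sizes cancels that term. The following toy forward-difference example (our own illustration of the principle, not the specific schemes of [111,116]) shows how a first-order approximation becomes second-order:

```python
from math import exp

def richardson(A, h):
    """One Richardson extrapolation step for a first-order approximation
    A(h) = L + c*h + O(h^2):  2*A(h/2) - A(h) = L + O(h^2)."""
    return 2.0 * A(h / 2.0) - A(h)

# Illustration with a first-order forward difference for f'(x), f = exp, x = 1:
f, x = exp, 1.0
A = lambda h: (f(x + h) - f(x)) / h
h = 1e-2
print(abs(A(h) - exp(1.0)))              # O(h) error, roughly 1.4e-2
print(abs(richardson(A, h) - exp(1.0)))  # O(h^2) error, much smaller
```

For time-fractional schemes the leading error exponent is not necessarily 1, so the extrapolation coefficients are chosen to match the known expansion of the particular scheme.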


#### Computational Aspects of DODEs

The previously discussed numerical schemes for the approximation of fractional derivatives typically generate dense matrices, a clear consequence of the intrinsically nonlocal character of the operator. For discretizations with *N* elements (temporal or spatial), these dense matrices generally require O(*N*<sup>3</sup>) floating point operations and O(*N*<sup>2</sup>) memory for each iteration. In order to reduce this high computational cost, several alternative approaches have been considered. Based on the idea of relabeling employed in ADI methods, Jia [203] developed a fast FDM that stores the coefficient matrix in O(*N*) memory and performs the matrix–vector multiplication in O(*N* log *N*) operations. Two numerical algorithms offering comparable time and space complexity were developed by Jian [142] and Zheng [202]. By expressing the matrix of coefficients as a sum of special diagonal-Toeplitz matrices, Jian derived a fast solution technique based on the preconditioned Krylov subspace method. Zheng proposed an efficient biconjugate gradient stabilized method to solve systems of equations with a Toeplitz-structured coefficient matrix. More recently, a reduced-order ADI method [184] was developed to reduce the computational cost involved in the numerical solution of DODEs.
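The O(*N* log *N*) matrix–vector product underlying these fast solvers rests on embedding the Toeplitz coefficient matrix in a circulant matrix of twice the size and applying the FFT. A minimal numpy sketch of this standard technique (our own illustration, not the implementations of [142,202,203]):

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Multiply a Toeplitz matrix (first column `col`, first row `row`,
    with col[0] == row[0]) by x in O(N log N), by embedding it in a
    2N x 2N circulant matrix and diagonalizing that circulant with the FFT."""
    n = len(x)
    # First column of the circulant embedding: [col, 0, reversed row tail].
    c = np.concatenate([col, [0.0], row[:0:-1]])
    xp = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(xp))
    return y[:n].real

# Check against the dense O(N^2) product:
rng = np.random.default_rng(0)
n = 64
col, row, x = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
row[0] = col[0]
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
print(np.allclose(T @ x, toeplitz_matvec(col, row, x)))   # True
```

Because only the first column and row are stored, the memory cost is O(*N*), matching the complexities quoted above.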

Before proceeding further, it is worth noting that the computational time for the numerical simulation of DODEs can also be reduced via parallel computation and preconditioning of the operational matrices used to approximate the fractional derivatives. While parallel computation has not been directly applied to DODEs, parallel solvers have been developed for CO FDEs [204–206]. Besides the parallel algorithm itself, the effects of different hardware platforms (GPU vs. CPU) [207] and different memory architectures (shared vs. distributed memory) [206] on the computational times for the simulation of CO FDEs have also been studied. Further, preconditioners are often designed to accelerate matrix computations in nonlinear CO FDEs involving iterative solution procedures. Many studies have proposed different types of preconditioners, such as the preconditioned biconjugate gradient method [208] and the generalized minimal residual method [209], for solving nonlinear CO FDEs. Both techniques described above, namely parallel computing and preconditioning, present possible opportunities to reduce the computational time for solving DODEs and are hence worthy of detailed investigation in the future.

#### **3. Relevance of Distributed-Order Operators**

As evident from the definitions presented in Section 2, DO operators can be interpreted as a parallel distribution of derivatives of either integer or fractional orders. It follows that one of the most immediate applications of these operators is to model physical systems whose response is characterized by a superposition of different processes operating in parallel and individually described by either fractional- or integer-order operators. As an example, consider electro-rheological fluids, which change their properties following the application of an electric field. In these media, the order associated with a small fluid element depends on the local field strength. Therefore, if the applied electric field is nonuniform, a corresponding order distribution will exist throughout the material [45]. A similar example consists of modeling the response of an electrical circuit with a distributed network of capacitors exhibiting the well-known fractional-order Curie's law. According to this law, the current through a capacitor varies with time *t* as *i*(*t*) = *V*<sub>0</sub>/(*Ct<sup>α</sup>*), where *V*<sub>0</sub> is a constant voltage and *α* ∈ (0, 1) [210]. These simple examples suggest that there exists a class of physical problems that can be better described by DO operators.

Broadly speaking, the above-described class of physical problems is characterized by the presence of multifractal, or equivalently multifractional, systems [211]. The response of such systems is marked by the presence of multiple temporal and spatial scales, which can be accurately captured via time-fractional and space-fractional DO operators, respectively. The advantage of the DO operator in capturing the hierarchy of scales as well as anomalous scaling effects has been analyzed in detail in [44]. The occurrence of this hierarchy of scales can be better visualized by considering, for example, the modeling of turbulence via the Lévy walk approach. This approach associates a time scale with jump distances, and the multiplicity of scales is explicitly taken into account via an integral equation which contains a coupled memory kernel similar to the DO operator [212]. Other examples of such multifractional processes include the analysis of structures with simultaneous nonlocal and strain-gradient (multiscale) effects [213], the diffusion of particles in microporous materials [214], the analysis of financial markets where distributions of financial data usually possess fast falling power-law tails [215], and even state functions of complex quantum-mechanical systems [216,217].

From a different perspective, DO operators can also be used to retrofit models to experimental data derived from systems with an unknown fractional behavior. The fractionalization of differential equations commonly used in mathematical physics leads to the analysis of the order parameter, say *α*, to be determined via experimental results. As experiments can lead to several values of the fractional order, as a result of different experimental conditions, it is convenient to introduce a DO fractional derivative. This is equivalent to integrating the product of a fractional derivative *D<sup>α</sup>*(·) of the primary response variable (say *u*) and a weight function (or distribution) with respect to the order of the derivative, that is, to evaluate ∫<sub>supp *φ*</sub> *φ*(*α*)*D<sup>α</sup>u* d*α*. In this way, one may use several experimental results and determine a continuous function *φ* rather than focusing on a single variable, that is, the fractional order *α*. This can be interpreted as a homogenization of the different possible fractional processes and the resulting epistemic uncertainties. In other terms, such an approach would enable a valid and accurate analysis of experimental data and allow the development of fractional-order models, without having to identify the specific underlying fractional behavior.

The above remarkable properties of DO operators have led to the development of fractional models capable of describing numerous complex physical processes. Most of the work to date has concentrated on the general areas of viscoelasticity, transport processes, and control theory. We make a few concluding remarks, before proceeding to review the most significant applications of DOFC reported to date in the different areas. Note that the application of DOFC to viscoelasticity and control theory primarily involves the use of time-fractional DO derivatives, while the application to transport processes involves both space- and time-fractional DO derivatives. This separation follows from the underlying physics being captured. In this regard, recall that, while time-fractional DO derivatives are typically used to account for memory effects and dissipation across multiple temporal scales, space-fractional derivatives are used to model nonlocal effects and spatial heterogeneity over multiple spatial scales. In the applications presented below, we do not specify if the DO model is based on a Riemann–Liouville or Caputo (or any other) definition, as it only marginally affects the overall discussion. Finally, we use the following notation in all the subsequent sections: *t* and *x* refer to the independent variables in time and space, respectively.

#### **4. Applications to Viscoelasticity**

Fractional-order derivatives are well suited to capture the dissipation in viscoelastic solids. The differintegral definition of the fractional derivatives allows the effects of deformation history to be incorporated within the stress–strain constitutive models, thus combining the elastic response across different time scales. In this regard, Gemant [218,219], Caputo [46], Bagley and Torvik [5,6], and Chatterjee [7] provided seminal contributions towards the use of fractional-order models to simulate the effect of dissipation in viscoelastic solids. While an approach based on CO time-fractional derivatives is intuitive and has drawn much interest, it is not well suited for applications involving materials characterized by multiple relaxation times. In order to address this gap in modeling viscoelastic systems via the CO derivatives, DO models were proposed [48,49,220]. As mentioned in Section 3, the DO operators allow the multiple relaxation scales to be visualized as separate viscoelastic connections operating simultaneously. Thus, a superposition of multiple CO derivatives (or equivalently, multiple relaxation scales) is achieved via the definition of the DO derivative for viscoelastic solids.

#### *4.1. Constitutive Models*

As mentioned in Section 2.1, the DO derivatives were originally conceptualized to model the dissipative elastic response with several temporal relaxation scales [48]. Following this seminal work, several other viscoelasticity models based on DO derivatives now exist in the literature. These models can be viewed as simplified versions of the following generalized DO stress–strain constitutive law, proposed by Atanacković for viscoelastic solids [221,222]:

$$\int_{0}^{1} \phi_{\sigma}(\gamma)\,{}_{0}D_{t}^{\gamma}\sigma(t)\,\mathrm{d}\gamma = E \int_{0}^{1} \phi_{\epsilon}(\gamma)\,{}_{0}D_{t}^{\gamma}\epsilon(t)\,\mathrm{d}\gamma \tag{13}$$

where *φ<sub>σ</sub>* and *φ<sub>ε</sub>* represent the strength functions corresponding to stress and strain (these are constitutive functions that characterize the viscoelastic response), *E* is the Young's modulus, and <sub>0</sub>*D<sup>γ</sup><sub>t</sub>*(·) is the CO time-fractional derivative. The formulation in Equation (13) is referred to as the most general model because all other models already existing in the literature can be derived from it via suitable assumptions on the additional (fractional-order) constitutive parameters. For instance, the choice *φ<sub>σ</sub>* = *δ*(*γ*) and *φ<sub>ε</sub>* = *δ*(*γ* − 1) for the strength functions results in the standard dashpot. Additional abstractions of the DO constitutive model in Equation (13), describing different viscoelastic elements, are illustrated in Figure 4. Further, as discussed in Equation (6), a discrete choice for the order-distribution weights in Equation (13) results in a multi-term fractional-order expression for the DO definition given above. Employing discrete strength functions in the above equation, the stress and its temporal derivatives (of real, not necessarily integer, order) can be recast in terms of strain and its (real-order) temporal derivatives as follows [223],

$$\sum_{n=0}^{N} a_{n} \left[ {}_{0}D_{t}^{\alpha_{n}} \sigma \right] = \sum_{m=0}^{M} b_{m} \left[ {}_{0}D_{t}^{\beta_{m}} \epsilon \right], \quad t > 0 \tag{14}$$

where the fractional orders are assumed to satisfy 0 ≤ *α*<sub>0</sub> < *α*<sub>1</sub> < ... < *α<sub>N</sub>* < 1 and 0 ≤ *β*<sub>0</sub> < *β*<sub>1</sub> < ... < *β<sub>M</sub>* < 1. The constants *a<sub>n</sub>* and *b<sub>m</sub>* can be interpreted as relaxation times for the viscoelastic solid. As demonstrated in [223], the above multi-term model is effective in modeling both the stress relaxation and the creep response in viscoelastic structures. The integral constitutive relation given in Equation (13) can be interpreted as the continuum limit of the discrete multi-term constitutive relation given in Equation (14). This is also illustrated in Figure 4b, which depicts the DO integral model as the continuum limit of the discrete model in Figure 4a.
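The dashpot reduction of Equation (13) mentioned above can be verified directly with the sifting property of the Dirac delta (using that <sub>0</sub>*D*<sup>0</sup><sub>*t*</sub> is the identity and <sub>0</sub>*D*<sup>1</sup><sub>*t*</sub> is the classical first derivative):

```latex
% With \phi_\sigma(\gamma) = \delta(\gamma) and \phi_\epsilon(\gamma) = \delta(\gamma - 1):
\int_0^1 \delta(\gamma)\, {}_0D_t^{\gamma}\sigma(t)\,\mathrm{d}\gamma
   = {}_0D_t^{0}\sigma(t) = \sigma(t),
\qquad
E\int_0^1 \delta(\gamma - 1)\, {}_0D_t^{\gamma}\epsilon(t)\,\mathrm{d}\gamma
   = E\, {}_0D_t^{1}\epsilon(t) = E\,\dot{\epsilon}(t),
```

so Equation (13) collapses to *σ*(*t*) = *E* *ε̇*(*t*), i.e., a Newtonian dashpot whose viscosity coefficient is played by *E*.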

**Figure 4.** Examples illustrating the different DO models of viscoelasticity along with their respective constitutive relations. It appears that DO operators can model multiple viscoelastic elements within the same general formulation. Dashpots characterized by material constants *η* and order *α* indicate the individual viscoelastic elements. Schematic illustration of (**a**) the multi-term DO viscoelastic model, (**b**) the generalized DO model depicted as an infinite ensemble of elements with *α<sup>i</sup>* ∈ (0, 1] such that Span {*αi*} is (0, 1], and (**c**) the generalized temperature field-dependent VO definition for the DO viscoelastic model.

#### 4.1.1. DO Integral Models

All existing models catering to different lossy materials can be recast into the DO form in Equation (13) (or, equivalently, Equation (14)) by considering different choices for the order-distribution functions. In other words, each of the several distinct classes of viscoelastic solids proposed by Caputo and Mainardi [224], based on the creep and relaxation moduli relations, can be described by the single DO constitutive law via suitable choices of the fractional-order constitutive parameters. This highlights the relevance of DO operators and their scope in modeling viscoelastic constitutive relations when compared with the more classical integer- and fractional-order (CO or VO) models available in the literature. To better illustrate this, consider the following two cases: case I: *φ<sub>σ</sub>* = *δ*(*γ*), *φ<sub>ε</sub>* = *τ*<sub>0</sub><sup>*α*</sup>; and case II: *φ<sub>σ</sub>* = *τ<sub>σ</sub>*<sup>*α*</sup>, *φ<sub>ε</sub>* = *τ<sub>ε</sub>*<sup>*α*</sup>, with *τ<sub>σ</sub>* < *τ<sub>ε</sub>*, *τ* being a material constant. These two choices for the integral forms of the DO constitutive relation are commonly used in modeling viscoelastic solids [43,225–227]. Depending on the choice of the strength functions, Equation (13) can successfully characterize both fluid-like and solid-like viscoelastic materials. Remarkably, salient mechanical characteristics of the viscoelastic materials modeled by these choices, such as the creep and stress relaxation functions, exhibit the experimentally observed power-law attenuation [228].

#### 4.1.2. Multi-Term Fractional Models

Compared to the integral models, the discrete multi-term approach has been more widely used for modeling viscoelastic constitutive relations. This is a direct consequence of the simplicity with which discrete models can be modified in order to account for the different lossy behaviors observed in real materials. The discrete form also facilitates a direct comparison between the viscoelastic behavior captured by DO models and that of the more traditional and established integer-order models. This enables a better understanding of the physical relevance of DO models, and it also allows a more natural approach to material characterization. The following instances of the different viscoelastic models that can be recovered from the multi-term DO law in Equation (14) further illustrate the strength of the DO approach:


In the above discussion, {*a*, *b*, *c*} denote different material constants corresponding to different relaxation times, and {*α*, *β*, *η*} are the fractional orders associated with the different lossy behaviors of the DO model (see Equation (14)). In conclusion, we note that the multi-term fractional model is highly general and offers much flexibility in modeling different types of lossy behavior in viscoelastic solids. This is unlike CO or VO approaches, which require separate models to capture these different behaviors.

#### *4.2. Material Characterization: Methods and Experiments*

It is clear from the discussion in Section 4.1 that several possibilities for the viscoelastic constitutive theories exist, considering suitable choices for the DO model parameters. Before proceeding to review the application of these DO theories to the characterization of viscoelastic materials, we make an important remark. Note that the application of these DO theories to real-world viscoelastic problems requires that these models are physically as well as mathematically consistent. To ensure consistency of the DO viscoelastic theories, there exist restrictions on the choice of the fractional model parameters, which are derived in accordance with the principles of (1) time invariance, (2) causality, and (3) thermodynamics (the dissipation inequality given by the Clausius–Duhem inequality) [49]. The conditions on the strength distribution functions *φ<sub>σ</sub>* and *φ<sub>ε</sub>*, corresponding to the integral definition of the DO law given in Equation (13), are available in [222]. For instance, the thermodynamic law restricts the choice of DO constitutive parameters for the fluid-like viscoelastic materials, discussed in Section 4.1.1, as follows: *τ*<sub>0</sub> > 0. An analogous study conducted on the discrete form of the DO constitutive law (see Equation (14)) identified the restrictions on the relevant constitutive parameters [223]. The investigations conducted in the aforementioned studies were further extended in [53], which analyzed the physical as well as mathematical consistency of the generalized DO model of viscoelasticity. In this regard, note that mathematical consistency ensures the existence and uniqueness of a linear viscoelastic response corresponding to the generalized DO formulation. The framework developed in [53] provides the foundation for a rigorous and consistent application of DOFC to modeling the response of viscoelastic solids.

The discussion in Section 4.1 highlighted the ability of DO operators to capture multiple scales of relaxation time and thereby the different lossy behaviors observed in real materials [220]. For this purpose, the constitutive parameters of the DO constitutive model in Equation (13) that need to be identified are the fractional-order parameters and their numerical range. Initial investigations [82,220] laid a theoretical foundation for this fractional-order system identification problem. Further experiments on the characterization of viscoelastic properties corresponding to the different classes of DO models for commercial polymers are reported in [238]. Such studies were carried out by matching the experimental profiles of the loss and storage moduli for viscoelastic materials [53]. Recall from Section 4.1.2 the relevance of DO operators in modeling multiple forms of viscoelastic behavior. This feature of the DO constitutive models for viscoelastic elements presents an interesting opportunity. To better illustrate this aspect, consider the multi-term DO models depicted in Figure 4a as the sum of several independent viscoelastic connectors with their associated relaxation timescales. This type of arrangement allows incorporating multiple timescales within a single DO model in order to design an optimized fractional damper. The incorporation of multiple timescales (using the DO derivative) can also be visualized from the DO derivative of the Heaviside step function in Figure 3. The relaxation time of the viscoelastic damper can be tuned by an appropriate choice of the constituent CO derivatives and their associated weights within the definition of the DO derivative. This approach presents an opportunity to identify the damper that can deliver a desired behavior in terms of overshoot, peak time, and integrated tracking error [239].
This feature is unlike the classical integer-order or CO constitutive theories that allow only a single type of lossy behavior to be captured with a given model.

#### *4.3. Distributed-Variable-Order Models*

The above discussion presented an overview of the applications that DO models, based on CO derivatives, enable in the general area of viscoelastic solids. A few studies have also explored the extension of these models to employ DO operators based on VO derivatives, hereafter referred to as distributed-variable-order (DVO) operators. Lorenzo and Hartley presented one of the first works exploring the combination of both VO and DO operators in the formulation of the stress–strain constitutive law of viscoelastic solids [45]. They discussed how a DVO operator defined using a spatially-dependent VO law could be used to model the response of a thermorheologically complex material subject to a spatially and temporally varying temperature field. By choosing a spatially-dependent VO law, the resulting DVO model is capable of describing the spatial variation of the viscoelastic properties. The spatial variation of viscoelastic properties can be the result of a combination of internal as well as external conditions such as, for example, varying microstructure, presence of thermal loads, and a distribution of thermal gradients. We merely note that, very recently, this concept of defining a spatially-dependent VO law was used to model nonlocal solids with spatially varying microstructure in [240]. Further, an example of the temperature-dependent DVO viscoelastic model is illustrated in Figure 4c. In this case, the DVO model is required to introduce the effect of a spatially varying temperature field *T*(*x*, *t*) on the multiple timescales present within the DO model for viscoelasticity. This allows an accurate representation of the transient viscoelastic response [220]. It is important to mention that, unlike the DO models employing CO derivatives, the thermodynamic basis for the DVO models still remains to be ascertained.

#### *4.4. Some Practical Applications*

The DO constitutive models have been successfully applied to the analysis of viscoelastic solids. Recall that the different DO constitutive models can be classified primarily into two classes, (1) integral models and (2) multi-term models, corresponding to the choice of the DO derivative. Further, within each of these classes, subdivisions exist depending on the specific functions chosen for (a) the weights of the order-distribution functions and (b) the bounds of the fractional-order *α*. Here, we present some prominent examples studied in the literature that cater to a specific class of viscoelastic solids. These studies include finite solids with appropriate boundary conditions, as well as infinite solids.

Some examples of the constitutive parameters within the DO integral models in Equation (13) were discussed previously in Section 4.1.1. Employing specific choices of the constitutive parameters, successful modeling of the creep response [225] and stress relaxation [226] in finite solids is possible. Further, these integral models find relevance in modeling the vibration of fractional DO oscillators [227]. Patnaik and Semperlotti [168] demonstrated a successful application of DO viscoelastic models in the analysis of nonlinear oscillators with distributed nonlinear properties. In this study, the effect of the order-distribution on the phase and frequency response was captured analytically using asymptotic techniques, and some important characteristics, such as simultaneous phase and amplitude modulation (not seen in integer-order models), were presented. Recently, the scope of DO constitutive models has also been explored to describe viscoelasticity within complex materials such as composites [43].

These studies can also be extended to modeling and analyzing the damping of the structural response. DO models can be utilized to derive moment–curvature relations of viscoelastic rods [241–243]. The DO constitutive relation between moment (*M*) and curvature (*κ*) for the viscoelastic rod is given by

$$\int_0^1 \phi_{M}(\gamma)\,{}_{0}D_{t}^{\gamma}M\,\mathrm{d}\gamma = \int_0^1 \phi_{\kappa}(\gamma)\,{}_{0}D_{t}^{\gamma}\kappa\,\mathrm{d}\gamma \tag{15}$$

In this equation, the choice of *φ<sub>M</sub>* = *δ*(*γ*) and *φ<sub>κ</sub>* = *EIδ*(*γ*) (*EI* being the bending modulus) reduces the above expression to the classical Euler–Bernoulli beam theory. The solution of the above DODE reflects the influence of viscoelastic damping on the bending response of beams. Similar exercises can be conducted for more complex geometries with the help of the advanced numerical techniques discussed in Section 2.4.

Employing the multi-term definition of the DO constitutive relations, the DO moment–curvature relations can be revisited for different classes of viscoelastic solids. For instance, DO bending relations analogous to the generalized Zener model were derived to study the dynamics of a viscoelastic rod in [243,244]. Similarly, the lateral vibration of a viscoelastic rod modeled according to the generalized Kelvin–Voigt behavior was studied in [229]. The choice of *φ<sub>M</sub>* = *δ*(*γ*) + *aδ*(*γ* − *α*) and *φ<sub>κ</sub>* = *EI*(*δ*(*γ*) + *bδ*(*γ* − *α*) + *cδ*(*γ* − *β*)), which is a generalization of the standard Zener model, was proposed in [235] and used in [245] to study the lateral vibrations of a viscoelastic rod. DO models were also used to analyze the influence of viscoelastic foundations on the dynamic stability of local and nonlocal rods [246]. Similarly, Varghaei et al. [247] investigated the nonlinear vibration of viscoelastic beams using a generalized Kelvin–Voigt model implemented via DO derivatives. Finally, Duan and Chen [248] investigated oscillatory shear flow between two parallel plates using the DO form of the constitutive law for viscoelastic fluids. Thanks to the generality of the DO models of viscoelasticity, different effects of viscoelasticity on the structural response can be captured by employing specific choices of the constitutive parameters. For instance, different viscoelastic constitutive models were employed in a study of the influence of damping on the propagation of an initial Dirac delta disturbance through an infinite medium. This provides the necessary foundation for designing an optimized damper, as in [239].

#### **5. Applications to Transport Processes**

Several experimental investigations have shown that transport processes in many classes of materials are often characterized by anomalous mechanisms exhibiting either memory effects over various temporal scales or nonlocal effects over several spatial scales [249–251]. A direct consequence of this is, for instance, the loss of the scaling invariance noted in classical transport processes. Consequently, such processes cannot be modeled by using CO (integer or fractional) or even VO differential equations, as CO and VO diffusion equations lead to self-similar probability densities with a characteristic displacement exhibiting spatio-temporal scaling. The loss of the spatio-temporal scaling is a direct result of the presence of a spectrum of temporal or spatial scales in the transport process. The presence of several temporal scales, as an example, can be the result of a mixture of delay sources of variable strength [252], while the presence of distributed spatial scales can occur in transport through multifractal materials [211,215,253] (see Figure 5). Real-world examples of such complex transport processes include applications in geophysical and atmospheric phenomena [254–257], financial markets [258], turbulence [259], and even biology and medicine [211]. As discussed in Section 3, DODEs are very well suited to model such non-scaling anomalous transport processes exhibiting effects over multiple temporal and/or spatial scales.

**Figure 5.** (**a**) Underground aquifers contain heterogeneous layers of soils where each layer is characterized by a different level of porosity. The diffusion of groundwater through this multifractal media can be better described by DO operators, by replicating (mathematically) the parallel action of the different porous media in the order-distribution (see Section 5.3). Additional examples of multifractal systems where transport processes are better described via DO operators: (**b**) the diffusion of ions in neuronal dendrites [211], (**c**) the diffusion of pigments to form patterns in animals (see Section 5.2), and (**d**) turbulent flows. The subfigures (**a**–**d**) are taken from Wikipedia.

From a thorough review of the literature, it appears that anomalous diffusion, among other types of anomalous transport processes, has seen the largest number of applications of DOFC. Therefore, we start by reviewing the application of DO models to complex diffusive transport processes, and then move on to other processes including reaction–diffusion, advection–diffusion, and hybrid propagation. In an effort to keep this review contained and focused on the main applications of DOFC to physical modeling, we present the key aspects and mathematical characteristics of the use of DODEs in the modeling of transport processes. The interested reader can find extensive mathematical details on the implementation of DO transport models in [54].

#### *5.1. Anomalous Diffusion Processes*

As highlighted previously, diffusion processes in several classes of media exhibit strong anomalies wherein the mean square displacement (MSD) is not characterized by a definite (or unique) scaling exponent [260–263]. As an example, the MSD in several systems grows as a power of the logarithm of time (*strong anomaly*) and shares the interesting property that the probability distribution of the particle's position at long times is a double-sided exponential [261–264]. More specifically, the MSD varies as

$$
\langle \mathbf{x}^2(t) \rangle \approx \log^\nu t \tag{16}
$$

where *ν* is a positive constant. These diffusion processes are referred to as ultraslow (or, sometimes, superslow) diffusion processes and they do not conform to self-affine random processes. The most commonly cited example of such a strong anomalous diffusion process is the Sinai diffusion (*ν* = 4), in which the particle moves in a quenched random force field [265]. Additional examples of such ultraslow diffusion behavior include polymer physics [266], numerical experiments on an area-preserving parabolic map on a cylinder [267], motion in aperiodic environments [268], and a family of iterated maps [269]. We highlight that, apart from ultraslow diffusion, there exist other strong anomalies including retarding and accelerating subdiffusion, as well as retarding and accelerating superdiffusion. The specific form of the DO governing equation suitable to model each phenomenon depends entirely on two factors: (1) the use of time- and/or space-fractional DO derivatives, and (2) the support of the strength function corresponding to the time- and/or space-fractional DO derivative. In the following, we review the different modeling possibilities arising from combinations of these factors.
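Since Equation (16) implies that no single power law can describe the growth, the crossover can be checked numerically. The following sketch (plain Python; the comparison exponent 0.5 and the time instants are illustrative choices, not from the cited works) contrasts the Sinai-type MSD log<sup>4</sup> *t* with a slow power law:

```python
import math

def msd_ultraslow(t, nu=4.0):
    """Sinai-type ultraslow MSD, <x^2> ~ log^nu(t), cf. Equation (16) with nu = 4."""
    return math.log(t) ** nu

def msd_power_law(t, exponent=0.5):
    """Self-similar subdiffusive MSD for comparison, <x^2> ~ t^exponent."""
    return t ** exponent

# At moderate times the logarithmic MSD can still exceed the slow power law...
assert msd_ultraslow(1e10) > msd_power_law(1e10)
# ...but any power law eventually overtakes log^nu(t): no single scaling
# exponent describes the ultraslow process.
assert msd_ultraslow(1e12) < msd_power_law(1e12)
```

Any power law eventually dominates log<sup>*ν*</sup> *t*, which is why ultraslow processes cannot be assigned a unique scaling exponent.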

In a series of seminal studies, Chechkin et al. [261,270,271] developed a DO framework for strongly anomalous diffusion mechanisms. They considered the time-fractional DO diffusion equation:

$$\int\_0^1 \tau^{\beta-1} \, \phi(\beta) \, D\_t^{\beta} c(t, \mathbf{x}) \, \mathrm{d}\beta = \mathbb{D} \, D\_{\mathbf{x}}^2 c(t, \mathbf{x}) \tag{17}$$

where *c*(*t*, *x*) denotes the particle concentration, D denotes the diffusion coefficient, and *τ* is a positive constant representing a characteristic time of the problem; the strength function was chosen as *φ*(*β*) = *νβ*<sup>*ν*−1</sup>. The normalization condition for *φ*(*β*) on [0, 1], i.e., $\int_0^1 \phi(\beta) \, \mathrm{d}\beta = 1$, requires *ν* > 0. As established in [261], this choice of *φ*(*β*) leads to ultraslow kinetics. More specifically, for the above mathematical setup, the MSD is obtained as

$$
\langle \mathbf{x}^2(t) \rangle \approx \begin{cases}
\frac{2\mathbb{D}}{\nu} \, t \log(\tau/t) & t/\tau \ll 1 \\
\frac{2\mathbb{D}}{\nu(1+\nu)} \, \tau \log^{\nu}(t/\tau) & t/\tau \gg 1
\end{cases} \tag{18}
$$

As evident, strong diffusion anomalies are described within the above DO diffusion formalism. In fact, it appears that the DODE in Equation (17) describes a subdiffusive random process which is subordinate to the Wiener process with a diffusion exponent decreasing in time (*retarding subdiffusion*). The same behavior was further highlighted by demonstrating that the modes of the solution, obtained via separation of variables, show an ultraslow, logarithmic decay pattern. The waiting times (*ψ*(*t*)) of the diffusing particles corresponding to this setup are [271]

$$
\psi(t) \propto \frac{1}{t[\log(t/\tau)]^{1+\nu}} \tag{19}
$$

and they do not have finite moments. Clearly, the DO diffusion equation can be interpreted as a limit of the continuous time random walk (CTRW) model with an extremely broad waiting-time probability density function (PDF), so that there are no finite moments [271].

We highlight that several authors have also analyzed the diffusion characteristics obtained via discrete order distributions [272–274] as well as a uniform strength distribution [261,272–274]. For the discrete time-fractional DO with *φ*(*β*) = *φ*<sub>1</sub>*δ*(*β* − *β*<sub>1</sub>) + *φ*<sub>2</sub>*δ*(*β* − *β*<sub>2</sub>) (0 < *β*<sub>1</sub> < *β*<sub>2</sub> ≤ 1, *φ*<sub>1</sub> > 0, *φ*<sub>2</sub> > 0, and *φ*<sub>1</sub> + *φ*<sub>2</sub> = 1), the characteristic displacement grows initially as *t*<sup>*β*<sub>2</sub></sup>, whereas at large times it grows as *t*<sup>*β*<sub>1</sub></sup>, indicating slow yet power-law growing diffusion. For the uniform strength function, that is *φ*(*β*) = 1, the MSD is given as

$$
\langle \mathbf{x}^2(t) \rangle \approx \begin{cases}
2\overline{\mathbb{D}} \, t \log(\tau/t) & t/\tau \ll 1 \\
2\overline{\mathbb{D}} \, \tau \log(t/\tau) & t/\tau \gg 1
\end{cases} \tag{20}
$$

It appears that the DODE with the uniform strength function leads to slightly anomalous superdiffusion at small times, and to ultraslow diffusion at large times.

Another example of strongly anomalous diffusion processes corresponds to accelerating superdiffusion wherein the MSD, similar to ultraslow diffusion, does not exhibit a unique spatio-temporal scaling. In this class of diffusion processes, the diffusion exponent increases with time. Such processes are characterized using the following space-fractional diffusion equation [261],

$$D\_t^1 c(\mathbf{x}, t) = \int\_{0^+}^{2} l^{\alpha-2} \, \overline{\mathbb{D}} \, \Phi(\alpha) \, D\_{\mathbf{x}}^{\alpha} c(\mathbf{x}, t) \, \mathrm{d}\alpha \tag{21}$$

where *l* is a positive dimensional constant. In [261], the authors obtained the MSD behavior by considering a two-term space-fractional diffusion equation, that is, by choosing the strength function to be Φ(*α*) = Φ<sub>1</sub>*δ*(*α* − *α*<sub>1</sub>) + Φ<sub>2</sub>*δ*(*α* − *α*<sub>2</sub>) with 0 < *α*<sub>1</sub> < *α*<sub>2</sub> ≤ 2. For this DO diffusion equation, it was shown that at small times the characteristic displacement grows as *t*<sup>1/*α*<sub>2</sub></sup>, whereas at large times it grows as *t*<sup>1/*α*<sub>1</sub></sup>, clearly exhibiting superdiffusion with acceleration. The fundamental solutions for this discrete order distribution can be found in [275]. Exact solutions for a triple-order discrete distribution can be found in [276]. Random walk models corresponding to the space-fractional DO diffusion equation are presented in [275,277].

Notably, independently of the specific nature of the DODE (space-fractional or time-fractional) as well as of the strength function, the DO diffusion model no longer exhibits self-similarity or scale invariance. This is a direct result of the fact that the DO derivative modifies the constant- or even variable-order formulation by integrating all possible orders over a certain range. The resulting solutions exhibit memory and/or nonlocal effects over several temporal and/or spatial scales, leading to strong anomalies.

Building upon the time- and space-fractional DO diffusion models presented in Equations (17) and (21), several authors [278–280] developed DO diffusion models that lead to accelerating subdiffusion and retarding superdiffusion, contrary to the retarding subdiffusion and accelerating superdiffusion obtained via Equations (17) and (21), respectively. These DO diffusion models are given as [278–280]

$$D\_t^1 c(\mathbf{x}, t) = \int\_0^1 \phi(\beta) \mathbb{D} D\_t^{1 - \beta} \left[ D\_x^2 c(\mathbf{x}, t) \right] d\beta \tag{22a}$$

$$\int\_0^2 \Phi(\alpha) \, l^{2-\alpha} \, D\_{|\mathbf{x}|}^{2-\alpha} \left[ D\_t^1 c(\mathbf{x}, t) \right] \mathrm{d}\alpha = \overline{\mathbb{D}} \, D\_{\mathbf{x}}^2 c(\mathbf{x}, t) \tag{22b}$$

A direct comparison of the above equations with Equations (17) and (21) indicates an exchange in the presence of the time- and space-fractional DO derivatives, resulting in a class of mixed spatio-temporal DO derivatives. The detailed expressions of the MSD of the particles described via the above equations can be found in [278–280]. The MSD obtained via these formulations indicates that the anomalous diffusion phenomena described via Equation (22a) and Equation (22b) exhibit accelerating subdiffusion and retarding superdiffusion, respectively; that is, they become less anomalous in the course of time. Additional details on these anomalous behaviors are provided in the following. The DO time-fractional diffusion equation (Equation (22a)) describes a subdiffusion process which becomes less subdiffusive or, in other words, more classical in the course of time. The MSD demonstrates the occurrence of a transition from a growth characterized by a smaller exponent to a growth with a larger exponent. Equivalently, the probability density for a particle to remain around the origin exhibits a transition from a slow to a faster decay. We highlight here that the fundamental solution for a discrete form of Equation (22a), considering an infinite domain, can be found in [281]. The DO space-fractional diffusion equation (Equation (22b)) describes power-law truncated Lévy flights, that is, a random process showing a slow convergence to a Gaussian, but exhibiting Lévy-like behavior at short times. This behavior manifests itself in the non-Gaussian Lévy scaling of the probability density to stay at the origin and in superdiffusive behavior. At short times, the central part of the PDF has a Lévy-stable shape, whereas the asymptotics decay with a power law, faster than the decay of the Lévy-stable law. At long times, the central part of the PDF approaches the classical Gaussian shape; however, the asymptotics decay with the same power law.

In addition to the above studies, several researchers have demonstrated the suitability of DOFC for modeling strongly anomalous diffusion behavior, particularly ultraslow diffusion, via stochastic descriptions [215,282–287]. Meerschaert et al. [282,288] developed a stochastic model based on random walks with a random waiting time between jumps. Scaling limits of these random walks are subordinated random processes whose density functions solve the DO ultraslow diffusion equation. Ultraslow diffusion has also been modeled using Langevin stochastic representations in [217,253,284,289]. As shown in [284], the solutions of DO Langevin equations have MSDs which describe retarding subdiffusion and ultraslow diffusion with logarithmic growth. Ultraslow diffusion is also obtained via the wait-first and jump-first Lévy walk models, which underlie the fractional dynamics involving DO material derivatives [289]. The approach in [289] is based on a strongly coupled CTRW, with the distribution of waiting times displaying ultraslow (logarithmic) decay of the tails. Similarly, the authors of [283,285] obtained the space-fractional DO diffusion formulation as the continuum limit of a random process characterized by a distribution of spatially-dependent jumping rates and Lévy-distributed jump sizes. As described in [283,285], such a system is well suited to describe diffusion in multifractal systems which do not possess a unique Hurst exponent and, consequently, exhibit a lack of scaling. The lack of scaling in multifractals requires a generalization of the stochastic Lévy equation by admitting a spectrum of the Lévy index. The continuum limit of this stochastic equation is the DO diffusion equation. A detailed mathematical analysis of the Lévy models is presented in [286], where a Lévy-mixing-based probabilistic interpretation of the DO diffusion model is also given. The characteristics of the model are exemplified by a direct application to slow diffusion, particularly delayed Brownian motion. A similar stochastic representation, given in the form of a Brownian motion subordinated by a Lévy process, was used to model accelerating subdiffusion in [290]. Additionally, the authors of [290] also constructed an algorithm for computer simulations of accelerating subdiffusion paths via Monte Carlo methods.

Before proceeding further, we briefly review the contributions that several researchers made to the different mathematical aspects of the DO diffusion equations. Exact solutions corresponding to Dirichlet, Neumann, and Cauchy boundary conditions for the time-fractional DO diffusion Equation (17) can be found in [291]. The fundamental solution of the DODE corresponding to a uniform strength distribution can be found in [272–274]. Mainardi et al. [292] obtained the fundamental solution of the time-fractional DO diffusion equation based on its Mellin–Barnes integral representation. They also presented a series expansion of the fundamental solution that clearly highlights, within the solution, the presence of several time-scales related to the distribution of the fractional orders in the DO diffusion equation. Asymptotic solutions to initial and boundary value problems based on the DO time-fractional diffusion equations can be found in [293,294]. Some additional and important mathematical aspects, such as the existence of the solution to different types of DO diffusion equations, the solvability of DO diffusion equations, subordination properties, and positivity of the solution, were addressed in [59,63,263,287,295–300]. In a series of papers [71,72,301], Luchko analyzed the well-posedness of the DO formulation via maximum principles, and obtained *a priori* norm estimates for solutions to both linear and nonlinear DO diffusion equations. Luchko has also provided a survey of these maximum principles in [302]. Further, the well-posedness of the inverse problem, that is, the determination of the strength distribution of the DO and its support, has been analyzed in detail in [303–307]. The analysis of the well-posedness of the inverse problem is essential to promote applications of DOFC since it determines whether the DO framework is suited to model a given real-world application.
In other terms, given a set of experimental or real-world data, the analysis of the inverse problem determines whether DOFC is well suited to model the dataset and hence, it also indicates if the corresponding system exhibits multiscale (temporal and/or spatial) characteristics.

The remarkable properties of the DO diffusion formalism provided a strong foundation for the development of other DO transport formulations: DO reaction–diffusion, DO advection–diffusion, and DO wave propagation. Before reviewing these other applications, we briefly overview some recent, yet remarkable, real-world applications of the DO diffusion formulation (see Figure 5). Grain boundary diffusion in engineering materials at elevated temperatures, which often determines the evolution of microstructure, phase transformations, and certain regimes of plastic deformation and fracture, was modeled via a DO diffusion framework in [308]. DO diffusion equations have also been used to model the diffusion of mobile ions in different electrolytic cells [309–311]. The predictions of the DO model closely matched experimental data which indicated the presence of different diffusive regimes. A similar application was presented in [312], where DO operators were introduced into the Letokhov model of photon diffusion to model non-resonant random lasers. Very recently, the effect of the disordering of nanotubes within an electrode on the impedance of a supercapacitor was modeled using the DO subdiffusion model in [313]. All these applications highlighted the ability of the DO diffusion formulation to accurately capture highly anomalous diffusion behavior arising from the presence of multiple temporal and/or spatial scales.

#### *5.2. Reaction–Diffusion Processes*

An interesting application of DOFC involves the modeling of reaction–diffusion systems. Reaction–diffusion processes describe changes in the concentration of interacting chemical substances both in space and time. Reaction–diffusion processes have been linked to the formation of spots and patterns in different animals and birds [314,315], among many other real-world applications [125,316] (see Figure 5c). Distributed-order derivatives help to account for the heterogeneity and multifractal nature of the diffusing medium, typical of these applications. More importantly, the DO derivatives also account for the multiple sources of the reacting chemicals within the heterogeneous system. This allows for compact yet more comprehensive theoretical formulations of the reaction–diffusion mechanisms when compared to classical integer-order based approaches. Several authors have analyzed complex reaction–diffusion systems using DO derivatives [102,129,149,316,317]. Detailed mathematical formulations along with closed form solutions for DO reaction–diffusion equations can be found in [316,318]. The effect of different strength functions as well as of the specific nature of the DO reaction–diffusion equation was analyzed numerically in [102,129,149]. Very recently, Guo et al. [148] analyzed a 3D reaction–diffusion model of colliding and diffusing Gordon-type solitons. The numerical results provided a deeper understanding of the complicated nonlinear behavior of the 3D Gordon-type soliton system while highlighting the remarkable capabilities of the DO derivatives in capturing the collision and diffusion of the solitons.

#### *5.3. Advection–Diffusion Processes*

The DO diffusion equation formed the basis of several interesting investigations involving strongly anomalous advection–diffusion processes in complex systems, particularly those related to hydrology such as, for example, geomigration [319], transport of solutes in heterogeneous media [257,320], the spread of contaminants in groundwater [321], as well as groundwater flow [322]. Indeed, several theoretical and experimental studies have shown that the transport of fluids and pollutants through geological aquifers exhibits the presence of multiple spatio-temporal scales arising from the multifractal nature of the aquifers. The multifractality is a direct consequence of the porous, fractured, layered, and heterogeneous nature of the aquifers (see Figure 5a). The underlying distinctive characteristics of DOFC make it a very well suited modeling approach for the aforementioned anomalous transport phenomena experienced in hydrology.

The detailed mathematical analysis of a DO advection–diffusion equation with a discrete distribution of orders was presented in [77]. Analytical solutions were obtained in [77] for a time- and space-fractional formulation, and some interesting derivations including the spectral representation of the fractional Laplacian operator were presented. Later, several researchers used DOFC to model advection–diffusion in complex problems, particularly those related to hydrology. A DO advection–diffusion model was proposed in [256] to model infiltration, absorption, and water exchange in mobile and immobile zones of swelling soils. A similar formulation was adopted in [319] to model a geomigration process in a geoporous medium saturated with a salt solution that exhibits subdiffusive characteristics. Several researchers also used DOFC to model subdiffusive characteristics observed in the transport of solutes in heterogeneous porous media [257,320,323]. Very recently, an interesting application of DOFC was proposed to simulate superdiffusion of dissolved phase contaminants in groundwater [321]. In this study, several insights, including the specific impact of different geometric properties of the contaminants on their spatial distribution pattern, were derived using the DO advection–diffusion model.

#### *5.4. Wave Propagation*

Several authors investigated DO models for wave propagation by directly extending the DO diffusion approaches reviewed in Section 5.1. More specifically, this process involved altering the support of the strength function corresponding to the DO time-fractional derivative from [0, 1] to an interval within [1, 2]. The most generalized versions of the one-dimensional DO wave equation can be obtained by modifying Equations (17) and (21) as

$$\int\_{1}^{2} \tau^{\beta-1} \, \phi(\beta) \, D\_t^{\beta} u(t, x) \, \mathrm{d}\beta = E\_0 D\_x^2 u(t, x) \tag{23a}$$

$$D\_t^2 u(x, t) = \int\_{0^+}^{2} l^{\alpha-2} E\_0 \, \Phi(\alpha) \, D\_x^{\alpha} u(x, t) \, \mathrm{d}\alpha \tag{23b}$$

where *u*(*x*, *t*) denotes the particle displacement and *E*<sup>0</sup> denotes a material constant. A different set of DO wave equations can be obtained by modifying the support of the strength function and using mixed spatio-temporal DO derivatives, similar to Equations (22a) and (22b). The qualitative discussions on the application of DO models to multifractal systems, presented for the other types of transport processes reviewed in Section 5, also hold for DO wave propagation. As an example, the propagation of elastic waves through dissipative media exhibiting multifractal viscoelastic behavior (see Section 4) is described via time-fractional DO models [221,324]. Similarly, elastic wave propagation through attenuating media characterized by simultaneous microstructural and nonlocal (hence, multiscale) effects can be described via space-fractional DO models [213]. Important mathematical aspects such as the existence and uniqueness of the solution to the DO time-fractional wave equation have been outlined in detail in [63,325–327]. Additionally, the fundamental solutions of the DO wave equation have been derived in [298,325,327,328] using the technique of the Fourier and Laplace transforms. Numerical experiments highlighting the specific effects of the DO model parameters have been used to derive interesting insights into the DO wave equation in [298,325,328].

Another possible route to develop the DO wave propagation formulation consists in formulating DO stress–strain constitutive relations within the classical elastodynamic problem as proposed in [324,329]:

$$
\sigma = E_0 \int_0^1 \phi(\beta) \, D_t^{\beta} \varepsilon \, \mathrm{d}\beta \tag{24}
$$

This approach resembles the formulation of DO viscoelastic models (see Section 4) and indeed leads to a hybrid propagation model that also captures dissipation. The DO wave propagation model was then used to simulate the interaction of compressional waves with an interface separating two dissimilar media. Further, the impact of the support and the definition of the strength function on the wave scattering at the interface was analyzed.
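As a numerical illustration of the DO constitutive law in Equation (24), the inner fractional derivative can be approximated by a Grünwald–Letnikov sum and the outer integral over the order *β* by the trapezoidal rule. The sketch below assumes a uniform strength *φ*(*β*) = 1 and the test strain *ε*(*t*) = *t* (so the Caputo and Riemann–Liouville derivatives coincide, since *ε*(0) = 0); all names and parameter values are illustrative:

```python
import math

def gl_fractional_derivative(f, beta, t, n=512):
    """Grünwald-Letnikov approximation of the order-beta derivative of f at time t.

    Valid as a Caputo derivative here because the test strain satisfies f(0) = 0.
    """
    h = t / n
    acc, w = 0.0, 1.0                      # w holds (-1)^k * binom(beta, k)
    for k in range(n + 1):
        acc += w * f(t - k * h)
        w *= (k - beta) / (k + 1)         # recursion for the GL weights
    return acc / h ** beta

def do_stress(strain, t, e0=1.0, n_beta=64):
    """Equation (24) with a uniform strength phi(beta) = 1 on [0, 1]."""
    vals = [gl_fractional_derivative(strain, i / n_beta, t) for i in range(n_beta + 1)]
    # trapezoidal rule over the order variable beta
    return e0 * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1]) / n_beta

# For strain(t) = t the exact stress at t = 1 is E0 * ∫ 1/Gamma(2 - beta) d(beta);
# compare against a fine quadrature of the analytic integrand.
exact = sum(1.0 / math.gamma(2.0 - b / 2000.0) for b in range(1, 2000))
exact = (exact + 0.5 * (1.0 / math.gamma(2.0) + 1.0 / math.gamma(1.0))) / 2000.0
assert abs(do_stress(lambda t: t, 1.0) - exact) / exact < 0.01
```

The same two-level discretization (quadrature over the order, plus a time-stepping rule for each fixed order) underlies most numerical treatments of DO operators discussed in Section 2.4.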

#### **6. Applications to Control Theory**

In this section, we analyze the applications of DOFC to control theory. The foundation as well as the motivation for the application of DOFC to control theory follows from the successful application of COFC to model complex control phenomena. The use of CO fractional controllers has enabled robust control and helped achieve highly desirable dynamic control characteristics. A detailed review of the theory and applications of COFC in control theory can be found in [36]. In this regard, recall that a fractional derivative implicitly embeds within itself time-delays or, in other terms, it accounts for the memory of past events. Consequently, the presence of a distribution of fractional-order derivatives translates, physically, to the presence of a mixture of delay sources (similar to what is discussed in Section 5). These DO characteristics have helped achieve high-performance controllers with several applications ranging from secure messaging [330], to the control of motors [331,332], as well as accurate frameworks to model the robust stability of gene regulatory networks [332]. Broadly speaking, the applications of DOFC to control theory can be divided into two categories: (1) the development of DO controllers and (2) the study of the stability and control of DO systems; the majority of the studies are focused on the latter category. In the following, we first review the DO controllers and their applications, before considering their stability. A few other studies have numerically analyzed various DO system identification techniques [220,333] and DO optimal control problems [100,334]. However, the basic DO control theory employed in the latter studies is derived from the two broad categories mentioned above.

#### *6.1. DO Controllers and Filters*

Several theoretical and experimental studies have shown that fractional-order designs can enhance both the flexibility and robustness of the controllers as a result of the additional parameters represented by the fractional orders themselves. Tuning of the fractional orders allows for superior control characteristics. As an example, consider the CO PID controller PI<sup>*λ*</sup>D<sup>*μ*</sup>. The value of the order *λ* in PI<sup>*λ*</sup>D<sup>*μ*</sup> control affects the slope of the low-frequency range of the system as well as the peak value of the system. On the other hand, the value of the order *μ* affects the accuracy of the dynamic closed-loop response, the system overshoot, and the stability. For a more detailed discussion of the roles of *λ* and *μ*, the interested reader is referred to the work in [36]. It is immediate that a distribution of several CO controllers can lead to highly accurate and robust control. In fact, DOFC allows the development of a highly generalized controller from which all other types of controllers (such as, for example, the classical integrator and differentiator, the classical PID, and the fractional PI<sup>*λ*</sup>D<sup>*μ*</sup>) can be recovered.
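The combined action of the PI<sup>*λ*</sup>D<sup>*μ*</sup> terms can be sketched numerically. The snippet below (illustrative only; the gains and the Grünwald–Letnikov discretization are hypothetical choices, not taken from [36]) assembles the controller output and recovers the classical PID in the limit *λ* = *μ* = 1:

```python
def fractional_pid(e, t, kp, ki, kd, lam=1.0, mu=1.0, n=1000):
    """u(t) = kp*e(t) + ki*D^(-lam)e(t) + kd*D^(mu)e(t) via Grünwald-Letnikov sums.

    The GL weights w_k = (-1)^k * binom(beta, k) are generated recursively;
    beta = -lam yields the fractional integral, beta = mu the derivative.
    Illustrative sketch; gains and orders are hypothetical tuning values.
    """
    def gl(beta):
        h = t / n
        acc, w = 0.0, 1.0
        for k in range(n + 1):
            acc += w * e(t - k * h)
            w *= (k - beta) / (k + 1)
        return acc / h ** beta
    return kp * e(t) + ki * gl(-lam) + kd * gl(mu)

# Classical limit lam = mu = 1 with e(t) = t: u = kp*t + ki*t^2/2 + kd.
u = fractional_pid(lambda t: t, 1.0, kp=2.0, ki=1.0, kd=3.0)
assert abs(u - 5.5) < 0.01
```

Tuning *λ* and *μ* away from 1 then interpolates continuously between the I, P, and D actions, which is precisely the flexibility exploited by the CO and DO controllers discussed here.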

In the most general form, the transfer function of a DO controller can be expressed as [36]

$$G(s) = \int\_{\beta\_1}^{\beta\_2} \phi(\beta) \frac{1}{s^{\beta}} \, \mathrm{d}\beta \tag{25}$$

where *s* is a complex variable. The interval [*β*<sub>1</sub>, *β*<sub>2</sub>] dictates the specific nature of the controller. Note that a DO low-pass filter can be obtained from the DO controller via the transformation *s* → *T*(*β*)*s* + 1 [335]. The above formulation is highly general in the sense that all the classical, CO, and DO controllers can be recovered from it by an appropriate choice of the strength function. As an example, the classical integrator can be obtained by choosing *φ*(*β*) = *δ*(*β* − 1), the classical differentiator from *φ*(*β*) = *δ*(*β* + 1), the classical PID from *φ*(*β*) = *k*<sub>*P*</sub>*δ*(*β*) + *k*<sub>*I*</sub>*δ*(*β* − 1) + *k*<sub>*D*</sub>*δ*(*β* + 1) (*k*<sub>*P*</sub>, *k*<sub>*I*</sub>, and *k*<sub>*D*</sub> are constants to be tuned), the fractional PID from *φ*(*β*) = *k*<sub>*P*</sub>*δ*(*β*) + *k*<sub>*I*</sub>*δ*(*β* − *λ*) + *k*<sub>*D*</sub>*δ*(*β* + *μ*), and so on. It is immediate to see that a DO PID controller can also be obtained directly from the controller in Equation (25) by insisting that the support of the strength function lies within the interval [−1, 1]. DO PID controllers have been studied in detail in a series of papers by Jakovljević et al. [336–338]. Note that, in the case of a DO controller, the strength function in Equation (25) can have infinite support. In fact, as established in [339], any DO controller can be developed by an appropriate composition of the DO integrator (0 ≤ *β*<sub>1</sub> < *β*<sub>2</sub> ≤ 1), the classical integrator (1/*s*), and the classical differentiator (*s*). The different DO controllers are schematically illustrated in Figure 6.
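For a uniform strength *φ*(*β*) = 1 on [*β*<sub>1</sub>, *β*<sub>2</sub>], the integral in Equation (25) admits the closed form *G*(*s*) = (*s*<sup>−*β*<sub>1</sub></sup> − *s*<sup>−*β*<sub>2</sub></sup>)/ln *s* for *s* ≠ 1, which provides a convenient consistency check for any numerical evaluation of a DO integrator. A minimal sketch (illustrative sample points; the principal branch is assumed for complex powers):

```python
import cmath

def do_integrator_quadrature(s, beta1=0.0, beta2=1.0, n=2000):
    """Trapezoidal quadrature of G(s) = ∫ s^(-beta) d(beta) with phi = 1 (Eq. (25))."""
    h = (beta2 - beta1) / n
    total = 0.5 * (s ** (-beta1) + s ** (-beta2))
    for k in range(1, n):
        total += s ** (-(beta1 + k * h))
    return h * total

def do_integrator_closed_form(s, beta1=0.0, beta2=1.0):
    """Closed form (s^(-beta1) - s^(-beta2)) / ln(s), valid for s != 1."""
    return (s ** (-beta1) - s ** (-beta2)) / cmath.log(s)

# Consistency check at a few real and complex frequencies.
for s in (2.0, 0.5 + 2.0j, 10.0j):
    assert abs(do_integrator_quadrature(s) - do_integrator_closed_form(s)) < 1e-5
```

Evaluating either form along *s* = *iω* gives the frequency response of the DO integrator, from which the wide, non-constant phase curves mentioned below can be traced.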

The impulse response and asymptotic behavior of the DO controllers have been derived in [335,340]. Additionally, a physical realization of the DO integrator using a series of capacitors has been developed in [210,340]. The DO controllers have been applied to control motors [338] and robots [331] among many other applications [36]. As observed in these studies, the DO controllers reduce the maximum overshoot while guaranteeing a fast dynamic response and a zero steady-state error [36,336–338]. Furthermore, the phase curves of DO PID controllers are non-constant and much wider than the corresponding CO controllers making them more robust to system uncertainties [331]. Therefore, the DO controllers exhibit unique frequency response characteristics, and provide highly robust and accurate control.

**Figure 6.** Block diagram illustrating the feedback DO controller based on Equation (25). The fractional orders *μ*<sub>*k*</sub>, *λ*<sub>*k*</sub> ∈ (0, 1]. This is a highly general controller from which all classical, CO, and DO controllers, as well as the DO PID controller, can be recovered by an appropriate choice of the controller constants. As an example, the DO differentiator can be obtained by setting *K*<sub>*I*</sub><sup>*λ*<sub>*k*</sub></sup> = 0, *K*<sub>*P*</sub> = 0, and *K*<sub>*D*</sub><sup>*μ*<sub>*k*</sub></sup> ≠ 0. As evident, the DO differentiator consists of a network of CO differentiators. Similarly, the DO PID controller would require that *K*<sub>*P*</sub> ≠ 0, *K*<sub>*D*</sub><sup>*μ*<sub>*k*</sub></sup> ≠ 0, and *K*<sub>*I*</sub><sup>*λ*<sub>*k*</sub></sup> ≠ 0.

#### *6.2. Stability and Control of DO Systems*

The development of robust and accurate DO controllers prompted several researchers to analyze the stability of both linear and nonlinear DO dynamical systems. Most of the studies conducted on linear systems correspond to the bounded-input bounded-output (BIBO) stability analysis of DO linear time-invariant (LTI) systems. On the other hand, the nonlinear studies have focused primarily on the Lyapunov stability of the equilibrium points of the DO system. First, we briefly review the key highlights of DO LTI systems and their applications. Consider a DO system described via the following LTI DODE and algebraic output equation,

$$\begin{cases} \displaystyle\int_0^1 \phi(\beta)\, D_t^{\beta} x(t)\, \mathrm{d}\beta = A x(t) + B u(t), \\ y(t) = C x(t) + D u(t), \end{cases} \tag{26}$$

where *x*(*t*) is the state vector, *u*(*t*) is the input, and *y*(*t*) is the output of the system; *A*, *B*, *C*, and *D* are matrices of appropriate dimensions. Note that the interval of the DO derivative in the above single-input single-output (SISO) system can be generalized to an arbitrary interval [*β*1, *β*2] ⊆ [0, 1]. Applying the Laplace transform and its inverse to the above DODE, under the assumptions that *x*(0) = 0 and *u*(*t*) = *δ*(*t*), the following expression is obtained,

$$\mathbf{x}(t) = \mathcal{L}^{-1}\left[\underbrace{\left[\left(\int_0^1 \phi(\beta)\, s^{\beta}\, \mathrm{d}\beta\right) \mathbf{I} - \mathbf{A}\right]^{-1}}_{\mathbf{G}(s)} \mathbf{B}\right](t) \tag{27}$$

where *I* denotes the identity matrix. As established in [341–343], the DO LTI system in Equation (26) with the transfer function *H*(*s*) = *CG*(*s*)*B* + *D* is BIBO stable *iff* all the roots of the secular equation $\left|\left(\int_0^1 \phi(\beta)\, s^{\beta}\, \mathrm{d}\beta\right) I - A\right| = 0$ have negative real parts. The contours of the stability region have been derived from this principle for different definitions of the strength function in [342,344]. The stability contours are often impossible to express via elementary functions, which makes the stability tests of DO systems more complicated than those of their constant- and integer-order counterparts. In this regard, the Lagrange inversion theorem was utilized in [345] to obtain explicit expressions for the stability contours. Several interesting properties of these stability curves, such as the slope of the tangent at very high and very low frequencies, convexity, the inability of a curve to intersect itself, location in the first and fourth quadrants, and the shifting and enlargement of the stability region via multiplication of the strength distribution by suitable functions, have been presented in [346–348]. The above-mentioned properties of the stability boundaries were used in [347] to present a remarkable framework for the robust stability analysis of DO LTI systems with uncertain strength distributions and dynamic matrices. More specifically, these properties were used to show that the stability boundary of DO LTI systems can be accurately located in a certain region of the complex plane defined by the upper and lower bounds of the strength distribution. These results are sufficient to ensure robust stability in DO LTI systems with uncertain strength functions and uncertain dynamic matrices. The framework presented in [347] is highly relevant for real-world applications, which are commonly accompanied by uncertainties. Additional discussions on the stabilization, controllability, and passification of DO LTI systems can be found in [349–352].
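The root condition described above can be prototyped numerically. The sketch below evaluates the characteristic determinant $\left|\left(\int_0^1 \phi(\beta)s^{\beta}\mathrm{d}\beta\right)I - A\right|$ with a trapezoidal quadrature over the order and scans its magnitude along the imaginary axis. This grid scan is only a heuristic indicator of whether roots lie near the axis, not a substitute for the contour analysis in the cited works; the strength function and dynamic matrix used are illustrative.

```python
import numpy as np

def char_det(s, A, phi, n=201):
    """Characteristic determinant Delta(s) = det((int_0^1 phi(b) s^b db) I - A)
    of the DO LTI system (26); the order integral uses the trapezoidal rule."""
    betas = np.linspace(0.0, 1.0, n)
    vals = np.array([phi(b) * s ** b for b in betas])
    h = betas[1] - betas[0]
    order_int = h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    A = np.atleast_2d(np.asarray(A, dtype=complex))
    return np.linalg.det(order_int * np.eye(A.shape[0]) - A)

def looks_bibo_stable(A, phi, omegas, tol=1e-6):
    """Heuristic scan: declare the system 'apparently stable' when |Delta|
    stays bounded away from zero along s = j*omega on the sampled grid."""
    return min(abs(char_det(1j * w, A, phi)) for w in omegas) > tol

# Illustrative scalar system with uniform strength phi(beta) = 1 and A = -1.
omegas = np.linspace(0.01, 50.0, 500)
print(looks_bibo_stable([[-1.0]], lambda b: 1.0, omegas))  # prints True
```

For this example the real part of the order integral is nonnegative along the imaginary axis, so the determinant cannot vanish there; a genuine stability test must also exclude roots in the open right half-plane, which is exactly what the stability contours discussed above accomplish.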

The DO LTI framework discussed above has been used to analyze different systems: the solar-wind-driven magnetosphere–ionosphere system (a complex driven-damped dynamical system which exhibits a variety of dynamical states) [341,348], a DO Lotka–Volterra predator–prey system (a system with multiple time delays) [353], the DO Chen system [354], and gene regulatory systems [332]. All the aforementioned applications differ primarily in the choice of the strength function, which directly affects the stability and control of the system.
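Whatever strength function is chosen, simulating such models requires evaluating the DO operator on the left-hand side of Equation (26), which typically reduces to a quadrature over constant-order derivatives. The sketch below illustrates this for the Caputo definition acting on a power function, where the CO derivative has the closed form $D_t^{\beta} t^p = \frac{\Gamma(p+1)}{\Gamma(p+1-\beta)}\, t^{p-\beta}$. The uniform strength function $\phi(\beta) = 1$ is an illustrative choice, not one drawn from the cited studies.

```python
import numpy as np
from math import gamma

def do_caputo_of_power(t, p, phi, n=401):
    """Distributed-order Caputo derivative of x(t) = t**p, i.e.
        int_0^1 phi(beta) * D_t^beta t**p dbeta,
    using the closed form D_t^beta t**p = Gamma(p+1)/Gamma(p+1-beta) * t**(p-beta)
    and the composite trapezoidal rule over the order beta."""
    betas = np.linspace(0.0, 1.0, n)
    vals = np.array([phi(b) * gamma(p + 1) / gamma(p + 1 - b) * t ** (p - b)
                     for b in betas])
    h = betas[1] - betas[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

# Uniform strength phi(beta) = 1, x(t) = t^2, evaluated at t = 1: the integral
# reduces to int_0^1 2/Gamma(3 - beta) dbeta, whose integrand rises from 1 to 2.
val = do_caputo_of_power(t=1.0, p=2, phi=lambda b: 1.0)
```

The same quadrature-over-orders idea underlies most numerical schemes for DODEs: the DO operator is replaced by a weighted sum of CO derivatives, each of which is then discretized by a standard fractional-order method.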

In nonlinear systems, researchers have focused mainly on analyzing the Lyapunov stability of equilibrium points, as mentioned previously. The Lyapunov direct method was first generalized to nonlinear time-varying DO systems in [355–357] and was used to determine the stability or asymptotic stability of certain nonlinear systems, including a DO analog of the Lorenz system. The theoretical framework proposed in [355,356] was then used to analyze different nonlinear time-varying DO systems, including a DO consensus model [358], the DO Lorenz system [359], and the DO Van der Pol oscillator [330,360]. The consensus of multi-agent systems with fixed directed graphs, described by DODEs, was analyzed in [358], and sufficient conditions were obtained for robust consensus in the presence and absence of external disturbances. Recently, the stability and control of a DO Van der Pol oscillator were analyzed in [330], wherein the intervals of the model parameters for which this oscillator exhibits periodic, chaotic, and hyperchaotic behavior were calculated using Lyapunov exponents. Further, a robust scheme was presented in [330] to achieve complete synchronization between two DO hyperchaotic unforced Van der Pol oscillators; this synchronization enabled the development of a secure messaging system for text containing letters, numbers, and symbols.

#### **7. Conclusions**

This paper presented an overview of the general area of Distributed-Order Fractional Calculus (DOFC), with particular focus on its applications to the scientific modeling of complex systems. A branch of the broader field of fractional calculus, DOFC has rapidly emerged and captured the attention of many researchers in science and engineering. This rapid growth is mostly due to its remarkable ability to capture complex multiscale processes. Phenomena like multiple relaxation times in viscoelasticity, multiple temporal and spatial scales in transport processes, and mixtures of time delays in control theory, to name a few, have all illustrated the advantages of DOFC over more traditional integer-order techniques. The main goal of this review was to provide a snapshot in time of the field of DOFC and to guide the interested reader on an introductory journey through this fascinating topic. In this regard, we highlight that the content of technical papers was only briefly addressed in order to favor a more general discussion of the evolution of the field in its different areas of application.

Despite the recent substantial growth in DOFC research, there are still many areas holding significant opportunities for further development. While some preliminary work is available on distributed-variable models, a comprehensive framework for distributed-variable-order fractional calculus (DVOFC) is still lacking. A key factor that adds to the complexity of formulating DVOFC is the existence of different definitions of VO operators that exhibit different memory characteristics. Thus, a unified definition of the different variable- and distributed-order operators, together with an analysis of their mathematical properties, would certainly be beneficial. In these operators, the order variation can be a function of different dependent or independent physical variables (such as temperature, space, time, and energy). The combination of the DO and VO formalisms should allow the simulation of highly complex physical systems that are both evolutionary (therefore requiring VO operators) and multifractal (requiring DO operators) in nature. Another possible extension of currently available DO operators follows from the use of normalized self-similar strength functions within the definition of DO operators, which can be considered analogous to random-order operators. Particularly lacking is a rigorous mathematical analysis of the properties of such operators. Despite the above challenges, the extension of DOFC to these areas can have important applications in modeling the random and chaotic dynamics observed, for example, in turbulence, noise and vibration control, or even financial systems. These models could even form the basis for the development of highly accurate risk analysis and control models.

It should be pointed out that, despite the rapidly growing number of related studies, there are still several open questions that need to be addressed before DOFC can become a mainstream modeling approach for common real-world applications. A critical step to promote the broader use of DOFC models is to establish the connection between the mathematical properties of DO operators (i.e., the strength function and its support) and the physical properties and parameters of the system to be modeled. In other terms, the identification of closed-form relations linking the mathematical parameters of the DO operators to the physical parameters of the system at hand is of paramount importance to foster the use of DOFC tools in scientific modeling.

**Author Contributions:** W.D., S.P. and S.S. performed the literature review. All authors contributed equally to the manuscript writing. All authors have read and agreed to the published version of the manuscript.

**Funding:** The following work was partially supported by the National Science Foundation (NSF) under grants MOMS #1761423 and DCSD #1825837 and by the Defense Advanced Research Project Agency (DARPA) under grant #D19AP00052. The content and information presented in this manuscript do not necessarily reflect the position or the policy of the government. The material is approved for public release; distribution is unlimited.

**Data Availability Statement:** This article has no additional data.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Entropy* Editorial Office E-mail: entropy@mdpi.com www.mdpi.com/journal/entropy


ISBN 978-3-0365-2827-4