**Recent Advances in Single-Particle Tracking: Experiment and Analysis**

Editors

**Janusz Szwabiński, Aleksander Weron**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Janusz Szwabiński Wrocław University of Science and Technology Poland

Aleksander Weron Wrocław University of Science and Technology Poland

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Entropy* (ISSN 1099-4300) (available at: https://www.mdpi.com/journal/entropy/special_issues/single-particle_tracking).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-3485-5 (Hbk) ISBN 978-3-0365-3486-2 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.


## **About the Editors**

**Janusz Szwabiński** obtained his B.Sc. in Physics from the University of Wrocław, Poland, and his Ph.D. in Science from Saarland University in Saarbruecken, Germany. He held postdoctoral positions at the University of Wrocław, Poland, and the University of Geneva, Switzerland. He is currently a university professor in the Department of Applied Mathematics at the Wrocław University of Science and Technology and the chair of the Polish chapter of EU-MATH-IN, a European Service Network of Mathematics for Industry and Innovation. He is mainly interested in complex systems and has a track record in multidisciplinary research, with applications in statistical physics, biology, social science, and economics.

**Aleksander Weron** obtained his B.Sc. in Mathematics from the University of Wrocław, Poland, and his Ph.D. in Mathematics from Wrocław University of Science and Technology, Poland. He held a postdoctoral position at Tbilisi State University, Georgia (1971–1972), a Visiting Research Professorship at the Center for Stochastic Processes, University of North Carolina, Chapel Hill, USA (1983–1984), and a Senior Research Fellowship under the Fulbright Program at the University of California (UCSB, Santa Barbara, and UCLA, Los Angeles), USA (1995–1996). He is currently a full professor in the Department of Applied Mathematics at the Wrocław University of Science and Technology and has recently been awarded the status of professor magnus for his scientific excellence. He is also the head of the Hugo Steinhaus Center for Stochastic Methods. Professor Weron is an author or editor of 13 books and has published over 145 research papers on probability theory, stochastic processes, and their applications to physics, biology, and economics. His recent research interests include anomalous diffusion, ergodicity testing, fractional dynamics, and molecule imaging.

## **Preface to "Recent Advances in Single-Particle Tracking: Experiment and Analysis"**

Studying diffusion is not the newest topic in the field of statistical physics. The idea is commonly linked to Robert Brown, who investigated the motion of pollen grains in water in 1827. However, the transportation of particles and their dynamics had already been analyzed by Jan Ingenhousz. His description of the movement of coal dust suspended in alcohol dates back to 1785. Since then, interest in the molecular phenomenon of diffusion has remained practically unbroken, because it appears in many domains, including physics, chemistry, biology, sociology, economics, and finance.

Starting with the pioneering experiments by Perrin and Nordlund in the 1910s, a quantitative analysis of microscopy images of diffusing particles has become an important technique for various disciplines. Over time, this method has evolved into what is now known as single-particle tracking (SPT). The concept has deeply penetrated molecular biology and statistical and chemical physics because it helps to unveil the local physical properties of molecules and their environment. It has also become a popular field in applied mathematics.

The growth of single-particle tracking as a research topic is enormous. A recent query for SPT in one of the popular scientific databases resulted in almost 2 million direct hits. Advances in recent years have led to a vast array of new scientific lines of inquiry. Given the deluge of information available, the idea behind this Special Issue was to summarize the recent findings in single-particle tracking and bring them to a broader audience. The 13 contributions focus on different aspects of SPT, both experimental and theoretical. A short summary of the topics covered by the papers can be found in the following.

The first three contributions cover some experimental aspects of SPT. Scheda et al. introduce an original pipeline for the segmentation and analysis of phase-contrast images of the wound-healing scratch assay acquired in a time-lapse. They use an ensemble of pseudo-particles to represent the wound edge. By tracking their stochastic motion, they are able to overcome some limitations of standard approaches due to the change in the shape and density of cells during migration. Speckner and Weiss focus on transport phenomena in intermediate systems, i.e., biochemically active cell extracts. They have performed extensive SPT experiments on beads in native and chemically treated Xenopus laevis extracts to show that the beads feature an anti-persistent subdiffusion that is consistent with fractional Brownian motion. Finally, Zhang and Welsher present a novel 3D single-particle tracking system with a 20% increase in precision compared to traditional approaches. They use smart off-center sampling patterns for the optimal utilization of photons coming from illumination. Their method may be of particular importance for studying biological samples, where photons are often limited to small amounts.

The most commonly used method for the analysis of SPT trajectories is based on the mean-squared displacement (MSD) of particles. Although quite simple in principle, the method is known to have some drawbacks, mainly related to the short lengths of the experimental trajectories, their heterogeneity (i.e., several types of motion within a single path), and the presence of noise. Consequently, there is still a need for more robust analytical methods that go beyond MSD and allow for a proper interpretation of the experimental results. The next five papers in our collection fit into this research direction. Balcerek and Burnecki present a rigorous statistical test to detect a multifractional Brownian motion (i.e., a fractional Brownian motion with a time-dependent Hurst exponent) within the trajectories. Their approach is based on the covariance function and should be helpful in the analysis of anomalous diffusion. Hidalgo-Soria et al. analyze the two-state "jumping diffusivity" model. They show, with the help of perturbation theory, that a non-analytical behavior (a cusp) may be found in the distribution of displacements within the model in the short time limit. Korabel et al. study the heterogeneous intracellular transport of endosomes. To improve the predictive power of the local analysis, they split the ensemble of trajectories into fast and slow subsets prior to the actual analysis. This step allows for a separate treatment of different motion regimes. Lanoiselée et al. propose a new structural approach to detect transient trapping of particles. Their method is based on the recognition of block structures along the diagonal of the recurrence matrix. Stanislavsky and Weron use the conjugate Bernstein function theory to find a connection between the tempered subdiffusion and the diffusion-limited aggregation. Since the model allows for the detection of confined random walks within the trajectories, it may be applied to SPT data.

In recent years, machine learning (ML) has been employed for the analysis of single-particle tracking data. In contrast to standard algorithms, where the user is required to explicitly define the rules of data processing, ML algorithms can directly learn those rules from a series of data. Three papers in this Special Issue cover different aspects of this approach to SPT. Gajowczyk and Szwabiński use a deep recurrent neural network for the classification of trajectories (i.e., the detection of diffusion modes). Loch-Olszewska and Szwabiński tackle the same problem with a more traditional ML approach. Compared to deep learning, the feature-based methods they use do not work with raw trajectories; instead, they require a set of human-engineered features for each trajectory in order to feed a classifier. Although deep learning performs a little better, the traditional method is better in terms of interpretability. Szarek et al. combine the two methods. They use a neural network, together with features calculated from trajectories (autocovariance function), in order to estimate the anomalous exponent from the trajectories. Their approach outperforms the analytical one.

Last but not least, the two remaining contributions cover some topics related to diffusion. Dinis and Parrondo show how to optimally extract work from a Brownian particle that is reversibly contained in an optical tweezer potential. Their model of a molecular motor should work, at least in principle, for systems with much shorter response time of measurements than the systems' relaxation times. Lachowicz and Debowski investigate models that may lead to diauxic growth at the mesoscopic scale. They may help us to understand some complex motion patterns of bacterial cells.

We express our thanks to the authors of the contributions, and to the journal Entropy and MDPI for their support during the preparation of this Special Issue. We hope that you will enjoy reading it, whether you are a newcomer to the field and looking for a place to start, or already working in the field and looking for stimulation. We also hope that you will recommend this issue to your colleagues.

#### **Janusz Szwabiński, Aleksander Weron**

*Editors*

## *Article* **Study of Wound Healing Dynamics by Single Pseudo-Particle Tracking in Phase Contrast Images Acquired in Time-Lapse**

**Riccardo Scheda <sup>1,†</sup>, Silvia Vitali <sup>2,\*</sup>, Enrico Giampieri <sup>3</sup>, Gianni Pagnini <sup>2,4</sup> and Isabella Zironi <sup>1</sup>**


**Abstract:** Cellular contacts modify the way cells migrate in a cohesive group with respect to a free single cell. The resulting motion is persistent and correlated, with cells' velocities self-aligning in time. The presence of a dense agglomerate of cells makes the application of single particle tracking techniques to define cells dynamics difficult, especially in the case of phase contrast images. Here, we propose an original pipeline for the analysis of phase contrast images of the wound healing scratch assay acquired in time-lapse, with the aim of extracting single particle trajectories describing the dynamics of the wound closure. In such an approach, the membrane of the cells at the border of the wound is taken as a unicum, i.e., the wound edge, and the dynamics is described by the stochastic motion of an ensemble of points on such a membrane, i.e., pseudo-particles. For each single frame, the pipeline of analysis includes: first, a texture classification for separating the background from the cells and for identifying the wound edge; second, the computation of the coordinates of the ensemble of pseudo-particles, chosen to be uniformly distributed along the length of the wound edge. We show the results of this method applied to a glioma cell line (T98G) performing a wound healing scratch assay without external stimuli. We discuss the efficiency of the method to assess cell motility and possible applications to other experimental layouts, such as single cell motion. The pipeline is developed in the Python language and is available upon request.

**Keywords:** wound healing dynamics; single pseudo-particle tracking; phase contrast image segmentation

#### **1. Introduction**

In this paper, we propose a pipeline for the segmentation and analysis of phase contrast images acquired in time-lapse in the wound healing scratch assay, to overcome some limitations of standard approaches due to the change in shape and density of the cells during migration.

Cellular migration is a fundamental process in animal physiology during both development and maturity. Cells migrate to shape organs and tissues and, in the case of damage, to regenerate them. Furthermore, motility is a primary skill in cancer metastatic processes and in immune responses [1,2]. The capability to migrate is a highly regulated process in which cells respond to external and internal mechanical, electrical, and chemical stimuli by complex physiological processes that promote, enhance, or suppress cell motility [3,4]. Cells can be induced to move in a particular direction by positive and negative guidance signals, while in the absence of external guidance, cells move randomly [5,6].

**Citation:** Scheda, R.; Vitali, S.; Giampieri, E.; Pagnini, G.; Zironi, I. Study of Wound Healing Dynamics by Single Pseudo-Particle Tracking in Phase Contrast Images Acquired in Time-Lapse. *Entropy* **2021**, *23*, 284. https://doi.org/10.3390/e23030284

Academic Editors: Janusz Szwabiński and David Holcman

Received: 24 November 2020 Accepted: 23 February 2021 Published: 26 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In cutaneous wound healing, which is a complex cellular and biochemical process necessary to restore structurally damaged tissue, skin cells migrate from the wound edges towards the empty space to restore skin integrity. In this case, the cohesive group of cells organized in a layer modifies the classical characteristics of single cell migration, and the presence of the wound induces peculiar migration behaviors. In fact, while a certain freedom of movement is maintained inside the tissue, the cells along the edge of the wound (front) move preferentially toward the gap. Such a process involves dynamical interactions between both the contacting cells (which are absent in single cell migration) and the extracellular matrix. These interactions regulate motility enhancement or suppression [7,8].

The wound healing scratch assay is a widespread experimental tool applied to study the collective migration of cells cultured in vitro. Standard protocols provide that a highly confluent monolayer of cells is scratched by a fine pipette tip to create a gap, which is then allowed to heal. As a protocol of analysis, the area of the scratch is measured as a function of time to determine the speed of the closure. This method is meant to simulate a natural wound, and the procedure is simple and easy to set up, but it is difficult to analyze and produce precise and reproducible results [9].

The mathematical continuum models that focus on the collective properties of cells can explain the requirements for the onset of movement and some typical characteristics of cell motility, but are usually limited to small space-time scales. Therefore, they provide little information on how the integration of the lamellipodium protrusion, the retraction of the posterior part, and the transduction of force on the extracellular matrix lead to the long-term prolonged movement of the entire cell. This process is characterized by alternating phases of direct migration and changes of direction and polarization. The coordinated interaction of these phases suggests the existence of intermittency, strong space-time correlations, and a close relationship between units (cell-cell interaction). It is therefore an important question whether the long-term movement of the entire cell can still be understood as a simple diffusive behavior such as Brownian movement or a random walk or whether more advanced dynamic modeling concepts should be applied [10–12].

The change in shape and density of the cells during migration makes it difficult to apply standard automatic single particle tracking (SPT) pipelines to extract the cell migration trajectory in phase contrast images acquired in time-lapse. These difficulties are even greater when collective motion is considered and a dense agglomerate of cells is present. To overcome such limitations, here, we propose a pipeline for segmentation and SPT extraction in phase contrast images of the wound healing scratch assay. The pipeline is original and follows the principle of Occam's razor: it is based on a simple measure, local binary patterns (LBPs), which proves sufficient to classify the texture as cells or background by using a principal component analysis (PCA) and Gaussian mixture classification. The code is available at the GitHub repository https://github.com/riccardoscheda/AnomalousDiffusion (accessed on 1 November 2020). We chose the manual segmentation performed over one experiment as the ground truth. We further compared the performance of our pipeline with segmentation by Otsu thresholding [13] without manual adjustment of the parameters for different frames of the same image. In all cases, the wound edge is approximated as a single membrane and its dynamics by the stochastic motion of a point on the membrane, i.e., a pseudo-particle. This choice is motivated by the fact that in the experiment under study, faster cells do not separate from the borders during wound closure. Therefore, the dynamics and the heterogeneity of the process are characterized through the collection of such SPT trajectories.

The paper is organized as follows: in the Methods Section, we present step by step our pipeline for phase contrast image processing and SPT; in the Results Section, we show the trends of the pseudo-particle trajectories' statistics for a wound healing scratch assay, performed with glioblastoma T98G cells; in the Conclusions Section, we discuss the performance and the SPT statistics of our pipeline, in comparison with the corresponding measurements obtained through the professional tool ImageJ [14].

#### **2. Methods**

#### *2.1. Data*

Glioma cells (T98G), derived from the human brain tumor glioblastoma multiforme (GBM), were plated at a density of 1 × 10<sup>5</sup> cells/cm<sup>2</sup> on 35 mm diameter sterile Petri dishes with a 10 mm diameter glass microwell (MatTek Corporation, Ashland, MA, USA) suitable for optical microscopy. The cell culture, with a population doubling time (PDT) of approximately 28 h, as reported by ATCC, which provided the cell line, was maintained in Gibco™ Minimum Essential Medium (MEM) with Earle's salts (Fisher Scientific, Milano, Italy) supplemented with 10% fetal bovine serum, 1% L-glutamine, 1% sodium pyruvate, and antibiotics (1% penicillin and 1% streptomycin) inside the incubator at 5% CO<sub>2</sub> and 37 °C. All chemicals were purchased from Merck KGaA (Darmstadt, Germany). At 48 h after seeding, the population covered the entire surface as a monolayer of confluent and tightly contacting cells. Using a sterile Gilson pipette tip (10–200 μL), a scratch 200–400 μm wide was made along the middle axis. Immediately afterwards, the specimen was placed into the pre-heated microscope stage incubator on the motorized table of the inverted optical microscope Eclipse Ti (Nikon, Bologna, Italy). The phase-contrast micrographs of multiple visual fields, pre-selected along the narrow scrape by the NIS Elements AR 4.0 (Nikon, Bologna, Italy) software, were acquired at 100× magnification for 20 h at the rate of 4 frames/hour. The setup allowed the acquisition of time-lapse images of living cultured cells maintained in standard conditions for the entire duration of the experiment.

#### *2.2. Image Processing*

The aim of the pipeline was to identify the wound edges, which correspond to the free edge of the two cell layers, separated by the wound. The procedure was done in the following steps, which should be applied to all the frames of an experiment: (i) equalization, to make all the frames comparable; (ii) binarization, to separate the background regions from the cell layers; (iii) wound edge identification; and (iv) storage of the coordinates. We describe here two alternative procedures of binarization, the first based on texture classification and the second on hand drawing the wound edges over the images by using the professional tool ImageJ [14].

#### 2.2.1. Equalization of the Frames

To enhance the contrast between the wound borders and the background regions, we applied contrast limited adaptive histogram equalization (CLAHE) [15] to all the frames.

The image was divided into small blocks (tiles), with a tile size of 50 × 50, to enhance the difference between the cell border and the background (tile size is 8 × 8 by default in [16]). Then, each of these blocks was histogram equalized. Therefore, in a small area, the histogram would be confined to a small region (unless there was noise). If noise was there, it would be amplified. To avoid this, contrast limiting was applied. If any histogram bin was above the specified contrast limit (by default, 40 in [16]), those pixels were clipped and distributed uniformly to other bins before applying histogram equalization. This procedure increased the image contrast and enhanced the texture patterns (e.g., Figure 1b) by equalizing pixels' intensity distribution of all the frames to a fixed range wider than the original ones.
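The contrast-limiting step described above can be sketched for a single tile as follows. This is a minimal NumPy illustration, not the OpenCV implementation used in the pipeline: the function name `clip_limited_equalize` is hypothetical, the clip limit is treated as an absolute pixel count per bin (libraries may scale it differently), and the interpolation between neighboring tiles that a full CLAHE performs is omitted.

```python
import numpy as np

def clip_limited_equalize(tile, clip_limit=40, n_bins=256):
    """Equalize one 8-bit tile after clipping its histogram at clip_limit.

    Assumption: clip_limit is an absolute count per histogram bin.
    """
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(float)
    # Clip bins above the limit and spread the excess uniformly over all bins.
    excess = np.clip(hist - clip_limit, 0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / n_bins
    # Map intensities through the normalized cumulative histogram.
    cdf = np.cumsum(hist)
    cdf = cdf / cdf[-1]
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)
    return lut[tile]
```

Applied to a low-contrast tile, the mapping stretches the occupied intensity range while the clipping keeps a few dominant bins from absorbing the whole output range.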

#### 2.2.2. Image Binarization by Texture Analysis

Image binarization was performed by dividing each frame into 10,000 subimages (12 × 16 pixel subimages in a 1200 × 1600 pixel image) and by classifying each of them as background or cell layer on the basis of a score. The score was built to characterize the texture of each subimage and corresponded to the distribution of the local binary pattern (LBP) values for all the pixels of the subimage (scikit-image Python library [17], skimage.feature.local\_binary\_pattern). We considered grayscale images; thus, the LBP of each pixel corresponded to a scalar value. We calculated the LBP for a pixel by comparing the pixel with its 8 nearest neighbors. A score was assigned to each pair: 1 if the central pixel value was greater than or equal to the neighbor pixel value, and 0 if it was less. The LBP value of the pixel corresponded to the sum of these scores, ranging from 0 to 8, and contained information about the 3 × 3 square of pixels. The frequency of such LBP scores for each subimage was an array of 9 values representing a texture feature of the subimage. Therefore, each frame (image) was characterized by a matrix of 10,000 (subimages) × 9 (LBP scores) values. Principal component analysis (PCA) [18] was performed over the 9 dimensions of the texture score to separate the 10,000 subimages into two clusters: one corresponding to the background regions and one containing the cell layer regions (Figure 1c). Taking the first 5 principal components, the points belonging to the two clusters were classified and labeled (0 or 1) using the Gaussian mixture model clustering algorithm (scikit-learn Python library [19], sklearn.mixture.GaussianMixture). Each point in Figure 1c corresponds to a subimage; hence, the obtained binary color labels (yellow or blue) were used as binarized intensities for the corresponding subimages to build the binarized image (Figure 1d).
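The texture score described above can be sketched in plain NumPy. The sketch follows the verbal description (a sum of eight comparisons per pixel) rather than calling scikit-image, so it should be read as an illustration and not as the pipeline's code. The function names, the edge-replicating treatment of image borders, and the use of an SVD for the PCA step are assumptions. The Gaussian mixture clustering of the projected points is left out.

```python
import numpy as np

def lbp_sum_score(img):
    """Per-pixel score: number of the 8 neighbors not brighter than the center (0..8).

    Assumption: borders are handled by replicating edge pixels.
    """
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    score = np.zeros((h, w), dtype=np.int64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            score += (img >= neighbor)
    return score

def subimage_features(score, sub_h=12, sub_w=16):
    """Normalized 9-bin histogram of scores for each non-overlapping subimage."""
    rows, cols = score.shape[0] // sub_h, score.shape[1] // sub_w
    feats = np.empty((rows * cols, 9))
    for r in range(rows):
        for c in range(cols):
            block = score[r * sub_h:(r + 1) * sub_h, c * sub_w:(c + 1) * sub_w]
            feats[r * cols + c] = np.bincount(block.ravel(), minlength=9) / block.size
    return feats

def pca_project(feats, n_components=5):
    """Project the centered features onto their leading principal components."""
    centered = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

For a 1200 × 1600 frame, `subimage_features` returns the 10,000 × 9 matrix mentioned in the text, and `pca_project` reduces it to 10,000 × 5 points that a two-component mixture model could then label.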

The performance of the algorithm with respect to the size of the subimages was studied in terms of the Pearson correlation of the segmented fronts with the ground truth for square and rectangular subimages of different sizes. Under the requirement of a complete tessellation of the image, this analysis supported the choice of 12 × 16 pixel subimages (see the Supplementary Material).

**Figure 1.** Original image (**a**); transformed image by using adaptive histogram equalization (**b**); 3D scatter plot of the first 3 principal components of the local binary pattern (LBP) score PCA (**c**); the data points in the scatter plot are clustered by the Gaussian mixture model clustering algorithm; color labels refer to cells (blue) or background (yellow); binarized frame image by using texture analysis (**d**).

#### 2.2.3. Wound Edge Recognition from Binarized Images

Contour lines can be easily recognized in a binarized image as the boundaries of the 0 or 1 regions. We applied a function for contour identification, which returns a list of all the contours in an image (OpenCV-Python library, findContours). In each frame, as long as the two cellular fronts remain separate, the longest contour line encloses the central part of the image, identifying at the same time the background region and the two borders of the cell layers (Figure 2a).

**Figure 2.** Image binarized by texture analysis with borders recognized by OpenCV-Python (**a**); example comparison of the borders obtained through the professional tool ImageJ (blue line) and the texture analysis (red dashed line) method superposed on the original image frame (**b**).

#### 2.2.4. Wound Edge Recognition with the ImageJ Professional Tool

The extraction of the fronts with the professional tool ImageJ for image analysis was performed by hand drawing the line of the front over each image (Figure 2b) and then by saving the corresponding coordinates for each frame and for left (L) and right (R) front in a .txt file [14].

#### 2.2.5. Wound Edge Recognition with Otsu Thresholding

For the simple thresholding of the images, we first performed contrast limited adaptive histogram equalization (CLAHE) to enhance the difference between the wound borders and the background, and then blurred the image with a Gaussian blur (OpenCV-Python library) to improve the results of the Otsu thresholding. Next, we applied Otsu thresholding to the image [17], followed by morphological transformations to obtain a smoother wound border. Finally, we collected the coordinates of the borders (OpenCV-Python library, findContours).
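For readers unfamiliar with Otsu's method, the threshold selection can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the library call used in the pipeline; the function name `otsu_threshold` is hypothetical, and the input is assumed to be an 8-bit grayscale image that has already been equalized and blurred.

```python
import numpy as np

def otsu_threshold(img, n_bins=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = np.bincount(img.ravel(), minlength=n_bins).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(n_bins)
    w0 = np.cumsum(prob)                  # weight of the class "<= t"
    w1 = 1.0 - w0                         # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    # Between-class variance; empty classes give 0/0 and are set to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    var_between = np.nan_to_num(var_between, nan=0.0, posinf=0.0)
    return int(np.argmax(var_between))
```

A binary mask then follows as `img > otsu_threshold(img)`; which of the two classes corresponds to the cell layer depends on the image and has to be checked case by case.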

#### *2.3. Pseudo-Particles' Trajectories*

The wound edge (L and R) of the cell layers was considered as a single homogeneous elastic membrane. The movement of such a membrane was tracked by means of *N* points uniformly distributed along its length as pearls on an elastic necklace. To derive the coordinates of the *N* pseudo-particles, we interpolated the wound edges' 2D coordinates as a function of the front length (scipy Python library, scipy.interpolate.interp1d), and then, we computed the coordinates of the *N* pseudo-particles uniformly distributed along its length [20].
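The placement of the pseudo-particles can be sketched as an interpolation in arc length. The version below uses `numpy.interp` in place of `scipy.interpolate.interp1d` to stay dependency-light; the function name `place_pseudo_particles` is hypothetical, and the edge is assumed to be an ordered polyline whose consecutive points are distinct.

```python
import numpy as np

def place_pseudo_particles(edge_xy, n_particles):
    """Return n_particles points equally spaced in arc length along a polyline.

    edge_xy: array of shape (K, 2) with the ordered (x, y) edge coordinates.
    """
    edge_xy = np.asarray(edge_xy, dtype=float)
    seg = np.hypot(*np.diff(edge_xy, axis=0).T)    # length of each segment
    s = np.concatenate(([0.0], np.cumsum(seg)))     # cumulative arc length
    s_new = np.linspace(0.0, s[-1], n_particles)    # uniform positions along the edge
    x = np.interp(s_new, s, edge_xy[:, 0])
    y = np.interp(s_new, s, edge_xy[:, 1])
    return np.column_stack((x, y))
```

Repeating this for every frame with the same `n_particles` gives, for each index *n*, one point per frame, which is what the trajectories are built from.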

The collection of the *N* points constituted the collection of pseudo-particles, and for each of them, an SPT was built by considering its coordinates in the time sequence of the experiment frames. The SPTs in 2D were allowed to cross because of invaginations and protrusions of the front, despite the cells being attached to each other and the membrane of the wound edge being considered as a unicum. However, the average displacement along the membrane was approximately zero because it was constrained by the geometry of the system and the microscope field.

The pseudo-particle *n* at time *t<sub>k</sub>* was defined by the coordinates of the *n*-th pseudo-particle in the *k*-th frame of the image. The collection of the coordinates of the pseudo-particle *n* for all the frames represented the trajectory of the pseudo-particle *n*. Thus, the dynamics of the membrane could be tracked by working on an *N* × *M* matrix, where *N* is the number of tracked pseudo-particles and *M* is the number of frames. The latter represent the time steps of the sampling.

#### *2.4. SPT Statistics*

To study the SPT statistics, we applied the discrete version of mean squared displacement (MSD) and autocorrelation functions (ACFs). In fact, discrete SPT statistics can be performed directly on an SPT *N* (pseudo-particles) × *M* (frames) matrix dataset, one for the *x* coordinate and one for the *y* coordinate, where the frame index *k* = 0, 1, 2, …, *M* − 1 corresponds to the sampling time, i.e., the time step of the process, and *n* = 1, 2, …, *N* corresponds to the index of the pseudo-particle. Statistics on the single trajectory could be performed as matrix operations along the *M* columns, while the ensemble average could be performed by averaging over the *N* rows of the transformed matrix. In the present work, we considered only the movement toward the free edge of the layers for the sake of simplicity, i.e., the *x* component of the quantities of interest. For statistical analysis, we applied a shift to the pseudo-particle position such that *X*(*t* = 0) = 0. Moreover, due to the lack of long stationary trajectories, we considered here only ensemble averages:

$$\mathbb{E}(Y(k)) = \frac{1}{N} \sum_{n=1}^{N} y_n(k) \, , \tag{1}$$

where *y<sub>n</sub>*(*k*) is the value of the variable *Y* for the *n*-th pseudo-particle at time *t* = *k*. The velocity of the pseudo-particle is defined as the increment of the pseudo-particle position *X* per unit sampling time (0.25 h):

$$V(\tau) = X(\tau + 1) - X(\tau), \quad \tau = 0, 1, 2 \dots, M - 2. \tag{2}$$

The increments of the velocity per unit sampling time are defined as the following:

$$A(\tau) = V(\tau + 1) - V(\tau) \,, \quad \tau = 0, 1, 2 \dots, M - 3 \,. \tag{3}$$
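Equations (1)–(3) translate directly into array operations on the *N* × *M* position matrix. The following is a minimal sketch with hypothetical function names; positions are assumed to be stored with pseudo-particles along the rows and frames along the columns, and increments are expressed per sampling step rather than per hour.

```python
import numpy as np

def increments(x):
    """Shifted positions, velocities, and velocity increments from an (N, M) matrix."""
    x = np.asarray(x, dtype=float)
    x = x - x[:, :1]            # shift so that X(t = 0) = 0 for every particle
    v = np.diff(x, axis=1)      # Eq. (2): shape (N, M - 1)
    a = np.diff(v, axis=1)      # Eq. (3): shape (N, M - 2)
    return x, v, a

def ensemble_mean(y):
    """Eq. (1): average over the N pseudo-particles at each time step."""
    return np.asarray(y, dtype=float).mean(axis=0)
```

Dividing `v` by the sampling time of 0.25 h would convert the velocities to physical units per hour.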

The autocorrelation function *ACF<sub>Y</sub>* for the generic variable *Y* reads:

$$ACF_Y(\tau) = \frac{\mathbb{E}[(Y(t_0) - \mu_{t_0})(Y(t_0 + \tau) - \mu_{t_0 + \tau})]}{\sigma_{t_0}\sigma_{t_0 + \tau}}, \quad \tau = 0, 1, 2, \dots, M - 1,\tag{4}$$

where the initial time is *t*<sub>0</sub> = 0, and *σ<sub>k</sub>* and *μ<sub>k</sub>* represent, respectively, the standard deviation and the mean of the variable *Y* at time *t* = *k*.
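An ensemble estimate of Equation (4) can be sketched as follows. The function name `ensemble_acf` is hypothetical, expectations are replaced by averages over the *N* pseudo-particles, and the standard deviation is assumed to be nonzero at every time step (the estimate is undefined otherwise, for instance for a variable that is exactly zero for all particles at *t*<sub>0</sub>).

```python
import numpy as np

def ensemble_acf(y):
    """Eq. (4) with t0 = 0 for an (N, M) matrix; returns an array of length M."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean(axis=0)                 # Y(k) - mu_k for every particle
    sigma = y.std(axis=0)                    # sigma_k over the ensemble
    cov = (dev[:, :1] * dev).mean(axis=0)    # covariance between Y(0) and Y(tau)
    return cov / (sigma[0] * sigma)
```

By construction the first element equals 1, and all values lie between −1 and 1.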

The mean squared value (*MSY*) for the generic variable *Y* reads:

$$MSY(\tau) = \mathbb{E}[\left(Y(t\_0 + \tau) - Y(t\_0)\right)^2], \quad \tau = 0, 1, 2, \dots, M - 1,\tag{5}$$

where the initial time is again *t*<sub>0</sub> = 0.
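Equation (5) admits an equally short sketch under the same matrix layout (rows are pseudo-particles, columns are frames); the function name `ensemble_ms` is hypothetical.

```python
import numpy as np

def ensemble_ms(y):
    """Eq. (5) with t0 = 0: ensemble mean of the squared change of Y since frame 0."""
    y = np.asarray(y, dtype=float)
    return ((y - y[:, :1]) ** 2).mean(axis=0)
```

For purely ballistic motion, *y<sub>n</sub>*(*k*) = *v<sub>n</sub>k*, this returns the ensemble mean of *v<sub>n</sub>*<sup>2</sup> multiplied by *k*<sup>2</sup>, which is a convenient sanity check.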

#### *2.5. Fit Procedure*

All the fits were performed through an ordinary least squares (OLS) regression (scipy Python library, scipy.optimize.curve\_fit), which returned the optimized parameters of the model and their covariance matrix [20]. Poissonian uncertainty for counts in histograms was considered. We further compared (results not shown) the parameters estimated by OLS with the ones obtained by the maximum likelihood estimate (MLE) (scipy Python library, scipy.stats.rv\_continuous.fit).
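As an illustration of a least squares fit that returns both parameters and their covariance, the sketch below fits a straight line with `numpy.polyfit` (swapped in here for `scipy.optimize.curve_fit` to keep the example dependency-light). The linear model and the function name `fit_line` are assumptions chosen for illustration; they are not the specific models fitted in this work.

```python
import numpy as np

def fit_line(t, y):
    """OLS fit of y = slope * t + intercept.

    Returns the parameters (slope, intercept) and their 1-sigma uncertainties,
    taken from the diagonal of the covariance matrix.
    """
    params, cov = np.polyfit(t, y, deg=1, cov=True)
    return params, np.sqrt(np.diag(cov))
```

Fitting the ensemble averaged position over a time window with such a line would give an estimate of a constant drift velocity together with its uncertainty.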

#### **3. Results**

We considered a single field in an experiment of the wound healing scratch assay (without external stimuli applied to the cell substrate) as the test image.

The trends of the *N* SPTs obtained through the texture analysis for the experiment under study are compared with the ones obtained by using the professional tool ImageJ in Figure 3. SPTs from the right front are also mirrored.

**Figure 3.** Comparison of the single particle trajectories obtained with the professional tool ImageJ (blue line) and the texture analysis (red or dashed line) method for the left wound edge (**a**) and the right wound edge (**b**), for *N* = 10<sup>3</sup>.

In Figure 4, we display the temporal trends of the area between the wound edges during wound closure, estimated by using the texture analysis in comparison with the wound edges recognized manually.

**Figure 4.** Normalized area between the wound edges during wound closure as a function of time for the texture analysis (blue line) and the professional tool ImageJ (red line).

The pseudo-particle average position and average velocity for the two methods of analysis are displayed in Figures 5 and 6. The ACFs of the pseudo-particle position and velocity along the *x* coordinate are compared for the two methods of analysis in Figures 7 and 8, respectively. A regime with stationary increments of the velocity was identified for the time range between 5 h and 8 h (Table 1), corresponding to the duration of the regime with constant drift velocity in the ensemble averaged position (Figure 5). The medium could be roughly approximated as viscous, and a constant velocity implies a constant force, on average, applied against friction by the cells. This stationary regime with constant drift velocity was supported by zero correlation in the VACF (Figure 8) and by the symmetric distribution of velocity increments with a zero average. For this time range, the distribution of the instantaneous acceleration (velocity increments) along the *x* coordinate is shown in Figure 9 for the two methods of image segmentation. The tails of these distributions are compatible with both the exponential and the Gaussian scaling, with comparable characteristic scales (Table 2). However, the linear decay of the tails in Figure 10 suggests that a truncated-exponential decay is more plausible. To estimate the consistency between the two methods, we computed the Pearson's correlation (Table 3) for their estimates of the pseudo-particles' coordinates, i.e., the entire collection of estimated positions for the *x* and *y* coordinates, the average position *X*, the average velocity *V*, and the average velocity increments *A* for *N* = 10<sup>3</sup> and *M* = 40.
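The consistency check between the two segmentation methods can be illustrated with a short sketch. It assumes that both methods deliver arrays of identical shape (for example *N* × *M* coordinates, or length-*M* averages) and flattens them before computing Pearson's correlation; the function name is hypothetical.

```python
import numpy as np

def pearson_between_methods(a, b):
    """Pearson correlation between two equally shaped sets of estimates."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("both methods must provide the same number of estimates")
    return np.corrcoef(a, b)[0, 1]
```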

**Figure 5.** Comparison of the average position of the pseudo-particle obtained with the professional tool ImageJ (blue dotted line) and the texture analysis (red dashed line) and their linear OLS best fit (black dashed line; see Table 1 for details) for the left wound edge (**a**) and the right wound edge (**b**).

**Figure 6.** Comparison of the ensemble averaged pseudo-particle velocity obtained with the professional tool ImageJ (blue dotted line) and the texture analysis (red dashed line) for the left wound edge (**a**) and the right wound edge (**b**).

**Figure 7.** Comparison of the coordinate *X* autocorrelation function obtained with the professional tool ImageJ (blue dotted line) and the texture analysis (red dashed line) for the left wound edge (**a**) and the right wound edge (**b**).

**Figure 8.** Comparison of the velocity autocorrelation function obtained with the professional tool ImageJ (blue dotted line) and the texture analysis (red dashed line) method for the left wound edge (**a**) and the right wound edge (**b**).



**Figure 9.** Comparison of the ensemble averaged pseudo-particle acceleration trajectory obtained with the professional tool ImageJ (blue dotted line) and the texture analysis (red dashed line) method for the left wound edge (**a**) and the right wound edge (**b**); comparison of the standard deviation of pseudo-particle acceleration of the ensemble of pseudo-particles obtained with the professional tool ImageJ (blue dotted line) and the texture analysis (red dashed line) method for the left wound edge (**c**) and the right wound edge (**d**).

**Figure 10.** Comparison of the velocity increments (absolute value) obtained with the professional tool ImageJ (blue) and the texture analysis (red) method for the left wound edge (**a**) and the right wound edge (**b**); the best OLS fit of the frequencies of an exponential (dashed line) and a normal distribution (bold line) is shown for the two methods (see Table 2 for details).


**Table 2.** Estimated parameters with their corresponding standard error obtained through OLS regression of the corresponding model and goodness of fit (Adj. R-squared).

**Table 3.** Pearson correlation of the collection of the pseudo-particles' coordinates, *x* and *y*, the average position *X*, the average velocity *V*, and the average velocity increments *A* estimated by ImageJ with the one estimated by texture analysis for *N* = 10<sup>3</sup>.


#### **4. Conclusions**

We present an original method to extract a 2D discrete representation of the wound edge in phase contrast images acquired in time-lapse by texture analysis, and we compare the results with the ones obtained by using the professional tool ImageJ and by Otsu thresholding (see the Supplementary Material for edges derived by thresholding). The dynamics of the wound edges is defined by the SPT of *N* pseudo-particles uniformly distributed along the length of the fronts.

Thus, discrete SPT statistics can be performed directly on the SPT *N* (pseudo-particles) ×*M* (frames) matrix dataset for the *x* coordinate (crossing the wound gap).

We compare SPT statistics of the data obtained by hand drawing with those of the texture analysis: average values, mean squared values, and the autocorrelation function of position and velocity. The two approaches lead to consistent results in terms of the trends of the dynamics (qualitative analysis) and in terms of Pearson's correlation (Table 3). By a visual check, the texture analysis appears more capable of recognizing lamellipodium protrusions than the professional tool ImageJ, because such tiny structures could occasionally be missed by human recognition (Figure 2). On the other hand, the automated procedure may also produce artifacts in the front profile. For example, it would consider pieces of dead cells remaining in the middle of the wound as part of the cell layer when the cells at the wound edges get close to them. For these reasons, the wound edges detected by texture analysis are associated with larger fluctuations between different frames than the ones detected manually. For the same reasons, the pseudo-particles' positions fluctuate more between different frames in the texture analysis dataset, generating larger tails in the distribution of increments along the *x* coordinate for the velocity (and position) of the pseudo-particle (Figure 10) and a larger mean squared velocity, in comparison to the one obtained by the professional tool ImageJ method. Despite this discrepancy, the average drift velocities (Figure 5), which correspond to the mean of the distribution of the position increments, and the average velocity increments (Figure 10) are comparable.

Finally, by studying the SPT statistics, we are able to identify an intermediate regime characterized by a constant average of the cellular front velocity and by exponential tails for the velocity increments' distribution (Table 2). We leave the full characterization of the stochastic process and the biological meaning, which are beyond the scope of the present paper, to future research with an enlarged cohort of experiments, in order to increase the statistics, but also to characterize the inherent variability of the phenomena.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/1099-4300/23/3/284/s1.

**Author Contributions:** Conceptualization, S.V., G.P. and I.Z.; software, R.S. and E.G.; investigation, I.Z.; resources, I.Z.; formal analysis, R.S. and S.V.; methodology, S.V. and G.P.; visualization, R.S. and S.V.; writing—original draft preparation, all authors; supervision, S.V., E.G. and G.P.; funding acquisition, G.P. and I.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** S.V. and G.P. are supported by the Basque Government through the BERC 2018–2021 program and also funded by the Spanish Ministry of Economy and Competitiveness MINECO via the BCAM Severo Ochoa SEV-2017-0718 accreditation.

**Data Availability Statement:** not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Single-Particle Tracking Reveals Anti-Persistent Subdiffusion in Cell Extracts**

**Konstantin Speckner and Matthias Weiss \***

Experimental Physics I, University of Bayreuth, Universitätsstr. 30, D-95447 Bayreuth, Germany; konstantin.speckner@uni-bayreuth.de

**\*** Correspondence: matthias.weiss@uni-bayreuth.de

**Abstract:** Single-particle tracking (SPT) has become a powerful tool to quantify transport phenomena in complex media with unprecedented detail. Based on the reconstruction of individual trajectories, a wealth of informative measures become available for each particle, allowing for a detailed comparison with theoretical predictions. While SPT has been used frequently to explore diffusive transport in artificial fluids and inside living cells, intermediate systems, i.e., biochemically active cell extracts, have been studied only sparsely. Extracts derived from the eggs of the clawed frog *Xenopus laevis*, for example, are known for their ability to support and mimic vital processes of cells, emphasizing the need to also explore the transport phenomena of nano-sized particles in such extracts. Here, we have performed extensive SPT on beads with 20 nm radius in native and chemically treated Xenopus extracts. By analyzing a variety of distinct measures, we show that these beads feature an anti-persistent subdiffusion that is consistent with fractional Brownian motion. Chemical treatments did not grossly alter this finding, suggesting that the high degree of macromolecular crowding in Xenopus extracts equips the fluid with a viscoelastic modulus, hence forcing particles to perform random walks with a significant anti-persistent memory kernel.

**Keywords:** anomalous diffusion; random walk; single-particle tracking

**Citation:** Speckner, K.; Weiss, M. Single-Particle Tracking Reveals Anti-Persistent Subdiffusion in Cell Extracts. *Entropy* **2021**, *23*, 892. https://doi.org/10.3390/e23070892

Academic Editors: Janusz Szwabiński and Aleksander Weron

Received: 15 June 2021 Accepted: 9 July 2021 Published: 13 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Quantifying transport phenomena in soft and living matter on mesoscopic scales virtually always involves optical microscopy techniques, due to their high spatiotemporal resolution. Presumably, the most informative approach in this context is single-particle tracking (SPT). In SPT experiments, the rapid imaging of a sparse set of (fluorescent) particles, e.g., molecules, beads, quantum dots, or even whole organelles, allows for retrieving individual particle positions over time, eventually providing complete trajectories (see, for example, Refs. [1–3] for reviews and [4] for a quantitative comparison of SPT to other techniques). Direct access to particle trajectories facilitates the application of refined analysis approaches [5], with the mean square displacement (MSD) supposedly being the easiest and most familiar measure.

Having SPT data at hand, one can calculate, for example, the time-averaged MSD (TA-MSD), ⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub>, for each trajectory and compare these to their ensemble average, ⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*,*E*</sub>. A commonly observed feature is a power-law scaling of both MSDs, ⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*,*E*</sub> ∼ ⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub> ∼ *τ*<sup>*α*</sup>, with normal Brownian diffusion being indicated by *α* = 1. Scaling exponents *α* < 1 are commonly referred to as 'subdiffusion', whereas values 1 < *α* < 2 are termed 'superdiffusion'. Subdiffusion with scaling exponents 0.3 < *α* < 0.9 has been observed very frequently, at least on short and intermediate time scales, for tracer particles in complex media, e.g., in equilibrated biomimetic crowded fluids [6–10], in the cytoplasm [6,11–15] and in the nucleoplasm [11,16–18] of living cells, as well as on biomembranes [19–22]; see also [23,24] for extensive reviews.

Cell extracts, which are basically only the cytosol of an ensemble of cells without larger organelle compartments, constitute an intermediate between artificial equilibrated media and nonequilibrium fluids inside living cells. Although extracts support vital processes, such as gene transcription [25] or even the formation of a mitotic spindle [26], diffusional transport in these fluids has so far been explored only sparsely. Biochemically active extracts may be derived from different sources, with the eggs of the clawed frog *Xenopus laevis* supposedly being the most popular: here, unfertilized oocytes are collected from the spawn, cooled down and crushed by centrifugation (see [27] for details). Due to different buoyancies, most membranes and the yolk can be separated from the aqueous cytosol, which eventually is obtained as extract. This extract not only includes all necessary biomolecules at physiological concentrations, but also allows for native interactions of proteins and/or nucleic acids and/or sugars. Due to its amphibian origin, the Xenopus extract provides full functionality, e.g., a dynamic cytoskeleton and even functional spindles, already at temperatures around 20 °C [27]. Given the crowdedness of these fluids and their inherent nonequilibrium background noise due to active proteins, it is unclear how particles explore such a (nearly) living fluid. While an early study, which focused on the rheology of Xenopus extracts, already revealed that particles with sizes around 1 μm move subdiffusively in these fluids [28], a study on smaller particles and a detailed investigation of the random walk process associated with the observed subdiffusion have been lacking so far.

As a stochastic process, subdiffusion may arise when the accessible space has a fractal geometry [29], e.g., imposed by a sufficiently dense set of immobile obstacles that form a random percolation cluster. In most experimentally relevant cases, however, obstacles are too mobile to induce an obstructed random walk in a fractal environment (see [20] for a discussion). Power-law distributed waiting times between successive steps can also induce subdiffusion [30], yet at the cost of weak ergodicity breaking [31,32], i.e., the scaling of TA-MSDs is that of normal Brownian motion (*α* = 1), whereas a simple ensemble-averaged MSD of many trajectories shows subdiffusive scaling. The difference between both measures, at least in the unconfined case (see [33,34] for results in confined geometries), is related to a successive aging of the system. While such processes were sometimes observed [13,19], most experimental reports are more consistent with fully ergodic random walk processes, albeit often with a marked anti-correlation of successive steps instead of a purely Markovian random walk. For an abstract description of such a subdiffusive mode of motion, fractional Brownian motion (FBM) [35] with a Hurst coefficient *H* = *α*/2 ≤ 1/2 was used in many cases (see [5,23] for an extensive discussion). In a nutshell, FBM is a Gaussian process with stationary increments whose anti-persistent memory kernel may encode the viscoelastic characteristics of the surrounding medium. Using SPT, experimental data can be tested directly for FBM features not only via MSDs, but also via the power-spectral density (PSD) and correlation functions that report on the memory kernel [5,23].

Here, we show via SPT that beads with 20 nm radius move subdiffusively in native and chemically treated Xenopus extracts. A sublinear scaling of MSDs with an average scaling of *α* ≈ 0.9 is found, accompanied by a significant anti-persistence peak in the velocity autocorrelation function (VACF). The VACF shows excellent agreement with the FBM prediction, and the distribution of step increments is Gaussian, suggesting that the particles perform a subdiffusive random walk of the FBM type. Further support of this notion is given by the PSD and the associated coefficient of variation, both of which also agree very well with the FBM predictions. Chemical treatments of the extract, e.g., depolymerizing the cytoskeleton, do not grossly alter the results, suggesting that the high degree of crowding in Xenopus extracts equips the fluid with a viscoelastic modulus that forces particles to perform FBM-like random walks.

#### **2. Materials and Methods**

*2.1. Microscopy and Single-Particle Tracking*

Fluorescence images were taken with a customized spinning-disk confocal microscope, consisting of a Leica DMI 6000 microscope stand (Leica Microsystems, Wetzlar, Germany) equipped with a CSU-X1 spinning disk unit (Yokogawa Electric, Tokyo, Japan). Samples were illuminated by a 491/561 nm dual-combined DPSS laser (Cobolt, Stockholm, Sweden), and fluorescence was detected in the range of 500–550 nm or 575–625 nm, respectively. The setup was controlled by custom-written LabVIEW software (National Instruments, Austin, TX, USA). Time series of images were recorded at room temperature (about 19 °C) with a Hamamatsu Orca Flash 4V2.0 sCMOS camera (Hamamatsu Photonics, Hamamatsu City, Japan), using an HCPL APO 63x/1.4 oil immersion objective (Leica Microsystems). With a 2 × 2 hardware camera binning, the size of the square pixels was determined as 112.4 nm.

Rhodamine-labeled microtubules in Xenopus extracts were imaged with an exposure time of 250 ms, using the 561 nm excitation channel. To improve the contrast between microtubules and unbound fluorescent tubulin monomers, images were post-processed in Fiji: the images were filtered with a median filter (radius set to 0.7 pixels), and background fluorescence was removed using a rolling-ball algorithm with a 10-pixel radius (built-in function 'subtract background'). Subsequently, the colormap *mpl-viridis* was assigned to all fluorescence images.

In our SPT experiments, fluorescent polystyrene microspheres with a diameter of 40 nm (FluoSpheres NeutrAvidin-Labeled, F8771, Thermo Fisher Scientific, Dreieich, Germany) were used. In contrast to carboxylate surface coupling, neutravidin minimizes unspecific interactions with DNA and RNA complexes or negatively charged surfaces. For calibration measurements, 1:100 stock solutions in DNase/RNase-Free Distilled water (Invitrogen) or 1:20 stock solutions (for Xenopus egg extract experiments) were prepared. On average, about 200–400 fluorescent particles were observed in the field of view of the camera sensor (110 × 70 μm<sup>2</sup>). For tracking, 2000 images were recorded with an exposure time of Δ*t* = 25 ms per frame, using the 491 nm excitation channel.

Particle positions were detected and linked to trajectories by the ImageJ/FIJI plug-in TrackMate [36]. As an input parameter for TrackMate, the diameter of fluorescent particles was estimated via the intensity profiles of 119 particles embedded in pure glycerol, yielding an average full width at half maximum of the point-spread function of 2.9 ± 0.5 pixels. Particle tracking was performed using the Laplacian-of-Gaussian detection algorithm (diameter set to four pixels, threshold set to 50 ± 15 grey values and using sub-pixel localization). No additional filters were applied to the detected spots. Identified particle positions were linked with the simple linear assignment problem tracker adopted from [37]. Here, a maximum linking distance of three pixels was used and frame gaps were not allowed. The minimum trajectory length was set to *N* ≥ 50 positions, and trajectories with a total displacement of less than one pixel were discarded.

Non-assigned detections were removed from the time series; particle trajectories were exported as XML and converted to ASCII files for further processing in Matlab (Matlab 2018b, The Mathworks Inc., Natick, MA, USA). All statistical analyses of particle trajectories were performed with custom-written codes in Matlab that had been checked beforehand for proper function using simulated random walk data. In our analyses, particle trajectories were clipped exactly to lengths of *N* = 70 or *N* = 150 time steps for better comparability within the ensemble (i.e., all shorter trajectories were discarded for the analysis). In total, our ensemble (=the number *M* of trajectories of a given length for a given condition) consisted of 1000–3000 trajectories (*N* = 70) and 150–600 (*N* = 150) for Xenopus extract experiments. For varying glycerol–water mixtures, 1000–5000 (*N* = 70) and 75–1000 (*N* = 150) trajectories were available. The specific ensemble sizes are given in Tables 1 and 2.
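The clipping step can be illustrated by the following Python sketch (the analysis itself was done in Matlab). It assumes that each trajectory is an array of two-dimensional positions and that the first *N* positions are retained; which segment of a longer trajectory is kept is not specified in the text, so this choice and the function name are assumptions.

```python
import numpy as np

def clip_trajectories(trajectories, n_positions):
    """Discard trajectories shorter than n_positions and clip the rest.

    trajectories: iterable of arrays of shape (length_i, 2).
    Returns an array of shape (M, n_positions, 2).
    Assumption: the first n_positions points of each trajectory are kept.
    """
    clipped = [
        np.asarray(t, dtype=float)[:n_positions]
        for t in trajectories
        if len(t) >= n_positions
    ]
    if not clipped:
        return np.empty((0, n_positions, 2))
    return np.stack(clipped)
```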

#### *2.2. Xenopus Extract Preparation and Modification*

Cytostatic factor-arrested (CSF) cytoplasmic extracts were prepared from freshly laid *Xenopus laevis* eggs based on standard protocols [26,27,38]. In brief, eggs in the metaphase stage of meiosis II were collected, dejellied and packed into a centrifugation tube. The packed eggs were crushed and fractionated into three distinctive layers by centrifugation. The mid cytoplasmic layer was carefully isolated and supplemented with 10 μg/mL of the protease inhibitors leupeptin, pepstatin and chymostatin (diluted in DMSO), 10 μg/mL of cytochalasin D, and ATP regeneration mix (190 mM creatine phosphate, 25 mM adenosine triphosphate, 25 mM MgCl<sub>2</sub> and 2.5 mM K-EGTA at pH 7.7) that was diluted 1:50 into the extract. Finally, 0.35 μL from the 1:20 stock solution of fluorescent particles was added to 20 μL of the extract; optionally, pharmaceuticals for affecting microtubule structures were added. Microtubules were labeled by fluorescent tubulin (TL 331M, Cytoskeleton Inc., Denver, CO, USA) according to the manufacturer's protocol: a 10 mg/mL stock solution of fluorescent tubulin, dissolved in general tubulin buffer (80 mM PIPES, 0.5 mM EGTA and 2 mM MgCl<sub>2</sub> at pH 6.9) with 1 mM GTP, was held on ice and added to Xenopus extracts to a working concentration of 50 μg/mL. The CSF extract was chilled immediately on ice and used within a maximum of two hours for the experiments.

Different chemicals were optionally added to Xenopus extracts to affect the microtubule integrity. Here, attention was paid to dilute the extract as little as possible. Preliminary investigations have shown that a total dilution of Xenopus extracts by up to 10% (due to the addition of beads or drugs) appears unproblematic and does not affect the microtubule structures observed. For all experimental conditions, the total volume added for modifying microtubules was balanced by the addition of distilled water to the otherwise untreated extract.

Nocodazole (Sigma-Aldrich, Munich, Germany) was used to depolymerize microtubules [18]. To this end, a stock solution of 10 mM dissolved in dimethylsulfoxide was diluted to a final concentration of 33.3 μM in Xenopus extract that was kept on ice for 10 min. Afterward, the extract was incubated for 15 min at 25 °C and then transferred to microscopy. Paclitaxel (in the remainder referred to as 'taxol', Sigma Aldrich, Germany) was used to stabilize the microtubules. It was added to the extract at a working concentration of 25 μM. Non-hydrolyzable analogues of ATP (ATP*γ*S, Merck, Darmstadt, Germany) or GTP (GTP*γ*S, Merck, Darmstadt, Germany) were used to affect the turnover of chemical energy in the extract. To this end, ATP*γ*S and GTP*γ*S (stored as stock solutions of 25 mM for ATP*γ*S and 12.5 mM for GTP*γ*S in distilled water) were added to the Xenopus extracts at final concentrations of 500 μM and 250 μM, respectively.

#### **3. Results and Discussion**

#### *3.1. Calibration Experiments in Viscous Media*

To arrive at a proper baseline for our SPT experiments, we first tracked fluorescent beads with a radius of *R* = 20 nm in purely viscous media with varying viscosity (see Section 2 for details). In particular, we used here glycerol–water mixtures in the range of 70–90% (per weight) for which the viscosity values are known from the literature. Adding 8% of the bead stock solution to the total volume of these mixtures, viscosities in the range *η* ∈ [0.016, 0.096] Pa · s were probed by our SPT experiments. For consistency among the trajectories and for better comparison to subsequent experiments in Xenopus cell extracts, we fixed the trajectory length to *N* = 70 or alternatively to *N* = 150, bearing in mind that short trajectories may suffer from statistical fluctuations [39] while longer trajectories might show a bias for picking slower particles [15] that are easier to track without losing them (e.g., due to leaving the focal plane).

Two-dimensional trajectories **r**(*t*) = **r**<sub>1</sub>, ..., **r**<sub>*N*</sub> acquired at discrete times *t* = Δ*t*, 2Δ*t*, ..., *N*Δ*t* with frame time Δ*t* = 25 ms and a total measurement time *T* = *N*Δ*t* were first evaluated with their individual TA-MSDs, defined via the following:

$$
\langle r^2(\tau) \rangle\_t = \frac{1}{N-k} \sum\_{j=1}^{N-k} \left( \mathbf{r}\_{j+k} - \mathbf{r}\_j \right)^2 \tag{1}
$$

where *τ* = *k*Δ*t* denotes the lag time. The resulting TA-MSDs of trajectories were fitted with a simple power-law as follows:

$$
\langle r^2(\tau) \rangle\_t = 4K\tau^{\alpha} \tag{2}
$$

in the range *τ* ∈ [0.05, 0.3] s by linear regression of log[*τ*] versus log[⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub>]. For normal Brownian motion (*α* = 1), the generalized diffusion coefficient *K* becomes equivalent to the familiar diffusion constant, *D*. In this case, the Einstein–Stokes relation predicts the diffusion constant to depend on particle radius *R* and medium viscosity *η* as follows:

$$D = \frac{k\_B T}{6\pi\eta R} \tag{3}$$

yielding predictions for diffusion constants in the range *D* ∈ [0.11, 0.67] μm<sup>2</sup>/s for our SPT experiments in glycerol–water mixtures.
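The quoted range can be reproduced with a short calculation based on Equation (3). The sketch below assumes a temperature of 292.15 K, corresponding to the room temperature of about 19 °C stated in the Methods, together with the bead radius *R* = 20 nm and the two limiting viscosities; the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def stokes_einstein_um2_per_s(eta_pa_s, radius_m, temperature_k):
    """Diffusion constant from Eq. (3), converted from m^2/s to um^2/s."""
    d_m2_per_s = K_B * temperature_k / (6.0 * math.pi * eta_pa_s * radius_m)
    return d_m2_per_s * 1e12

T_ROOM = 292.15   # assumed: about 19 degrees Celsius
RADIUS = 20e-9    # bead radius of 20 nm

d_fast = stokes_einstein_um2_per_s(0.016, RADIUS, T_ROOM)  # about 0.67 um^2/s
d_slow = stokes_einstein_um2_per_s(0.096, RADIUS, T_ROOM)  # about 0.11 um^2/s
```

Under these assumptions, the two limiting viscosities give values that round to the boundaries of the range stated above.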

In line with our expectations for purely viscous fluids, we observed on average normal diffusion, i.e., *α* = 1, for all viscosities and trajectory lengths (see summary in Table 1), although the individual TA-MSD scaling exponents showed marked fluctuations in the range *α* ∈ [0.8, 1.2] (see Figure 1 for representative MSDs and the probability density function of scaling exponents, *p*(*α*)) due to the limited statistics in TA-MSDs (see [39] for a detailed discussion). Accordingly, the extracted diffusion coefficients *K* also showed marked fluctuations, yet the average over all trajectories again yielded a mean value of *K* that compared favorably to the predicted values of *D* (cf. Table 1). This finding also indicates that particle radii are, on average, near to their declared and expected value, in line with previous findings [4]. Please note the slight but visible bias toward lower values of *K* for longer trajectories, indicating an unwanted bias toward slower particles that could be tracked over longer periods. Moreover, the number of trajectories available for the analysis clearly increases with increasing viscosity, since slower-moving particles can be tracked more easily. Overall, these calibration experiments demonstrate that normal diffusion with the anticipated mobility is found via SPT in purely viscous fluids. Deviations from unity in the (mean) scaling exponent of MSDs can therefore be taken as a clear signature of significant anomalous diffusion.

**Table 1.** Summary of glycerol concentrations (weight percent) in glycerol–water mixtures, along with the respective viscosities *η*, and predicted diffusion constants *D*. Average scaling exponents *α* (found via fitting all TA-MSDs for trajectories of 20 nm radius particles as described in the main text, followed by averaging the individual values of *α*) are near to unity and mean diffusion coefficients *K* (also obtained by averaging the results for individual TA-MSDs) compare favorably to the predicted values of *D*. Results for trajectories with length *N* = 70 and *N* = 150 are given in upper and lower lines, respectively. The ensemble size of evaluated trajectories for the respective condition is given by *M*.


**Figure 1.** (**a**) Representative TA-MSDs for trajectories with length *N* = 70 (randomly chosen from the ensemble) from experiments in glycerol–water mixtures (red thin lines) and in Xenopus extract (black thin lines), together with the respective ensemble-averaged TA-MSDs (colored thick lines). For better visibility, data for calibration experiments have been shifted upward tenfold. The scaling for normal diffusion (⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub> ∼ *τ*) is indicated by a red dashed line; vertical grey dashed lines indicate the fit region used to analyze individual TA-MSDs. (**b**) The PDF of anomaly exponents, *p*(*α*), as obtained from fitting TA-MSDs in glycerol–water mixtures features a mean *α* ≈ 1, irrespective of the trajectory length (black-grey histogram: *N* = 70, blue histogram: *N* = 150).

#### *3.2. Evaluation of Tracer Motion in Native Xenopus Extract*

As a next step, we explored the diffusive motion of the same fluorescent particles in Xenopus extracts (see Materials and Methods for details). Given that these extracts are complex and crowded fluids with an active biochemistry, we anticipated considerable differences to the simple glycerol–water mixtures. As before, we restricted ourselves to trajectory lengths *N* = 70 and *N* = 150, and used the TA-MSD and its ensemble average for a first characterization. Representative TA-MSDs are shown together with the ensemble average in Figure 1a.

Next, we fitted all TA-MSDs with Equation (2) to extract the respective scaling exponents, *α*, and generalized diffusion coefficients, *K*. Here, we tacitly assume that static and dynamic localization errors are negligible for our data; we will confirm this assumption below. Still, to soften any remnant influence of localization errors, especially for retrieving the scaling exponent *α*, we did not take the first point of TA-MSDs (at *τ* = Δ*t*) into account for fitting. Fitting was performed as in the calibration measurements.
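The evaluation described above, i.e., the TA-MSD of Equation (1) followed by the log-log regression of Equation (2) in the range *τ* ∈ [0.05, 0.3] s, can be sketched as follows. This is a minimal Python/NumPy illustration rather than the Matlab code used for the analysis; the function names are hypothetical and `numpy.polyfit` stands in for whichever regression routine was actually employed. With Δ*t* = 25 ms, the chosen fit range automatically excludes the first lag *τ* = Δ*t*.

```python
import numpy as np

def ta_msd(traj, max_lag):
    """Time-averaged MSD of Eq. (1) for lags k = 1..max_lag.

    traj: array of shape (N, 2) with positions at successive frames.
    """
    traj = np.asarray(traj, dtype=float)
    return np.array([
        np.mean(np.sum((traj[k:] - traj[:-k]) ** 2, axis=1))
        for k in range(1, max_lag + 1)
    ])

def fit_power_law(msd, dt, tau_min=0.05, tau_max=0.3):
    """Fit <r^2(tau)>_t = 4 K tau^alpha by linear regression in log-log space.

    msd[k-1] is assumed to belong to the lag time tau = k * dt.
    Returns (alpha, K).
    """
    tau = dt * np.arange(1, len(msd) + 1)
    tol = 1e-9  # guards against floating-point rounding at the range limits
    sel = (tau >= tau_min - tol) & (tau <= tau_max + tol)
    slope, intercept = np.polyfit(np.log(tau[sel]), np.log(msd[sel]), 1)
    return slope, np.exp(intercept) / 4.0
```

For a single trajectory `traj`, a call such as `fit_power_law(ta_msd(traj, 12), dt=0.025)` returns the pair (*α*, *K*); repeating this for all trajectories gives the samples from which histograms such as *p*(*α*) and *p*(*K*) can be built.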

Evaluating all TA-MSDs yielded probability density functions (PDFs) for the scaling exponent, *p*(*α*), and the generalized diffusion coefficient, *p*(*K*). Inspecting *p*(*α*) (shown in Figure 2a) reveals that the ensemble of trajectories shows, on average, a slight subdiffusive scaling of TA-MSDs with a mean *α* = 0.89. The width of the PDF (about ±0.2 around the mean) highlights, again, marked fluctuations between individual trajectories. In fact, similar fluctuations in the trajectory-wise values of *α* are expected already from mere statistical fluctuations due to fairly short trajectories (see [39] for discussion). Part of the width in *p*(*α*), however, may also reflect the spatially varying properties of the extract that is explored and reported on by different particles. Remarkably, trajectories of length *N* = 70 and *N* = 150 resulted in comparable PDFs and the same mean, i.e., longer trajectories were not biased toward lower scaling exponents. Notably, our observation of a subdiffusive motion of beads with 20 nm radius in Xenopus extracts is consistent with an earlier report [28] that reported scaling exponents in the range *α* ∈ [0.7, 0.95], with lower values emerging for larger particles (radii in the range 0.1–1 μm).

**Figure 2.** (**a**) The PDF of anomaly exponents, *p*(*α*), as obtained from fitting TA-MSDs in the interval *τ* ∈ [0.05, 0.3] s, features a mean *α* ≈ 0.9, irrespective of the trajectory length (black-grey histogram: *N* = 70, blue histogram: *N* = 150). The considerable width of the PDF may not only reflect statistical fluctuations but is likely to also report on spatially varying material properties of the Xenopus extract. Performing a bootstrapping approach with geometric averaging (black-open histogram) confirms the slightly subdiffusive motion of particles, while an arithmetic averaging (red histogram) overestimates the mean scaling exponent; see also main text for discussion. Please note the logarithmic *y*-axis. (**b**) The PDF of generalized diffusion coefficients, *p*(*K*), shown here versus the average area covered in one second, *K* × (1 s)<sup>*α*</sup>, features an almost lognormal shape (indicated by full lines) for trajectory lengths *N* = 70 (grey/black) and *N* = 150 (blue), with a slight tendency for lower mobilities in longer trajectories. Please see the main text for discussion. (**c**) A scatter plot of trajectory-wise values of *α* and *K* (blue and grey symbols) highlights a correlation between these two quantities, in good agreement with results on simulated FBM trajectories with a Hurst coefficient *H* = *α*/2 = 0.45 (red symbols). The black dashed line is an empirical guide for the eye. FBM simulation data have been shifted upward fivefold for better visibility.

To further confirm and validate the significance of the mean exponent *α* ≈ 0.9, we exploited a bootstrapping approach [15]. Based on a total set of several hundred TA-MSDs, we randomly selected 100 trajectories and averaged these to a single, sub-ensemble averaged MSD from which we determined the scaling exponent *α*. This random drawing from the total set of TA-MSDs and the subsequent averaging was repeated 200 times to obtain a PDF for these scaling exponents. By construction, the width of this PDF can be expected to be much smaller [15], yielding a better estimate for the mean. Averaging over the subensemble (SE) of TA-MSDs was either done arithmetically (⟨⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub>⟩<sub>SE</sub>), or by geometric averaging (exp[⟨log⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub>⟩<sub>SE</sub>]). As a result, we found that the mean scaling exponent obtained by bootstrapping with arithmetic averaging was ⟨*α*⟩<sub>SE,*a*</sub> = 0.98, which is consistent with normal diffusion. In contrast, geometric averaging yielded ⟨*α*⟩<sub>SE,*g*</sub> = 0.90, in agreement with the mean of the raw PDF *p*(*α*) shown in Figure 2a. Since geometric averaging boils down to an arithmetic averaging of individual scaling exponents (due to the invoked logarithm), this is, in fact, the more trustworthy approach for retrieving the average scaling exponent. Arithmetic averaging rather averages TA-MSDs with respect to their individual (and strongly fluctuating) diffusion coefficients, hence obscuring the mean scaling law and overestimating *α* (see [15] for another example). The difference in scaling obtained by the two averaging procedures may also be understood as a consequence of Jensen's inequality. The mapping *ϕ* : *α* → *t*<sup>*α*</sup> is convex for any choice of a positive real number *t*. Jensen's inequality then states that *t*<sup>⟨*α*⟩</sup> = *ϕ*(⟨*α*⟩) ≤ ⟨*ϕ*(*α*)⟩ = ⟨*t*<sup>*α*</sup>⟩, in line with our observation.
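The two averaging schemes can be illustrated with a minimal Python sketch. It assumes a synthetic ensemble of TA-MSDs that are exact power laws with randomly drawn exponents and prefactors; the chosen distributions, the seed, drawing without replacement, and the helper names `fit_exponent` and `bootstrap_exponents` are illustrative assumptions and not the data or code used in this study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic ensemble: M exact power-law TA-MSDs, K_i * tau**alpha_i.
M = 600
tau = np.linspace(0.05, 0.3, 6)                          # fit interval in seconds
alpha = rng.normal(0.9, 0.2, size=M)                     # trajectory-wise exponents
K = rng.lognormal(mean=np.log(0.4), sigma=0.8, size=M)   # trajectory-wise prefactors
tamsd = K[:, None] * tau[None, :] ** alpha[:, None]      # shape (M, len(tau))


def fit_exponent(msd, tau):
    """Slope of log(MSD) versus log(tau) from a linear least-squares fit."""
    slope, _ = np.polyfit(np.log(tau), np.log(msd), 1)
    return slope


def bootstrap_exponents(tamsd, tau, n_draw=100, n_rep=200, geometric=True):
    """Exponents of sub-ensemble averaged MSDs from repeated random draws."""
    out = np.empty(n_rep)
    for r in range(n_rep):
        # Assumption: trajectories are drawn without replacement.
        idx = rng.choice(len(tamsd), size=n_draw, replace=False)
        sub = tamsd[idx]
        if geometric:
            avg = np.exp(np.mean(np.log(sub), axis=0))   # geometric mean
        else:
            avg = np.mean(sub, axis=0)                   # arithmetic mean
        out[r] = fit_exponent(avg, tau)
    return out


alpha_geo = bootstrap_exponents(tamsd, tau, geometric=True)
alpha_ari = bootstrap_exponents(tamsd, tau, geometric=False)
print(alpha_geo.mean(), alpha_ari.mean(), alpha.mean())
```

In this idealized setting, the exponent of a geometrically averaged sub-ensemble equals the arithmetic mean of the exponents in that sub-ensemble, whereas arithmetic averaging weights each trajectory by the magnitude of its MSD, so its fitted exponent generally differs.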

To check for the influence of localization errors when retrieving the scaling exponent from individual TA-MSDs, we took the following approach: using immobilized beads, we determined the contribution of the static localization offset to be a positive additive constant of about 8 × 10<sup>−4</sup> μm<sup>2</sup> (i.e., 20 nm accuracy of positions), whereas the dynamic localization error for our frame time and diffusion coefficients amounts to a negative constant with modulus 0.008 μm<sup>2</sup> or lower. Considering either of these two extreme values as additive constants to the power law Equation (2) while fitting TA-MSDs resulted in minor deviations of ±0.03 from the previously found value for *α*. We therefore conclude that our SPT data show mild yet significant subdiffusion. In the following paragraphs, we further corroborate this conclusion by additional measures.

Let us now focus on the PDF of generalized diffusion coefficients, *p*(*K*), retrieved from individual TA-MSDs (Figure 2b). To remove the ambiguity of units for varying scaling exponents (units of *K* are μm<sup>2</sup>/s<sup>*α*</sup>), we report these PDFs as a function of *K* × 1 s<sup>*α*</sup>, which represents the typical area covered by the particle within a second. As indicated already in the discussion of the previous paragraph, individual values of *K* varied considerably between trajectories, resulting in a very broad, almost lognormal-shaped PDF. Again, statistical fluctuations as well as locus-specific mobilities are likely to contribute to the width of *p*(*K*). In contrast to the anomaly exponents, a slight bias for smaller diffusion coefficients was visible for longer trajectories, seen as a slight shift of the peak in *p*(*K*) between *N* = 70 and *N* = 150 (cf. Figure 2b). Still, in both cases, the typical area explored within a second matches roughly with our calibration experiments in highly viscous glycerol–water mixtures (cf. Table 1), albeit the mean scaling exponent is clearly different. Furthermore, a scatter plot of trajectory-wise values for *α* and *K* (Figure 2c) highlights a correlation of these two quantities. This finding is in line with results of FBM simulations (*M* = 2000 or *M* = 1000 two-dimensional trajectories of length *N* = 70 or *N* = 150, obtained via the circulant method [40] with *H* = *α*/2 = 0.45 and *K* = 0.4 μm<sup>2</sup>/s<sup>*α*</sup>; also shown in Figure 2c), which gives a first hint that an anti-persistent stochastic process underlies the acquired trajectories.

Let us briefly insert here an intuitive explanation of why a lognormal-like shape of *p*(*K*) is seen here and has also been reported frequently in the literature for other experiments (see [18,22,41–48] for examples). For the simplicity of the argument, we restrict ourselves to standard Brownian motion with Gaussian increment statistics and no memory kernel. Then, the diffusion constant *D* = ⟨*r*<sup>2</sup>(*τ*)⟩<sub>*t*</sub>/(4*τ*), as retrieved from a TA-MSD fit, is the finite mean of positive, squared Gaussian random numbers, Δ*r*<sup>2</sup> (cf. Equation (1)). The associated PDF of *ϑ* = Δ*r*<sup>2</sup>/⟨Δ*r*<sup>2</sup>⟩ follows a special variant of the gamma distribution, known as Porter–Thomas distribution [49], *p*(*ϑ*) = exp{−*ϑ*/2}/√(2*πϑ*). Similar PDFs, featuring a power law that is cut by an exponential, have been observed, for example, for blinking quantum dots [50]. For power-law PDFs with a cutoff, it is known that a finite sum (or average) of random numbers will only slowly approach a Gaussian PDF, the hallmark of the central limit theorem. Indeed, numerically drawing *N* random numbers from the Porter–Thomas PDF and averaging (summing) them as *x* = ∑<sub>*i*=1</sub><sup>*N*</sup> *ϑ*<sub>*i*</sub>/*N* yields a PDF *p*(*x*) with a nonzero skewness ∼ 1/√*N* that resembles a lognormal PDF. Hence, even mere statistical reasons can lead already to an apparently lognormal PDF of diffusion constants if trajectories are short enough. Varying mobilities, encountered by particles in different spatial positions of a heterogeneous sample, broaden the PDF even further. Please also note that applying a logarithmic transformation is basically only one variant of the more general class of Box–Cox transformations *x* → (*x*<sup>*λ*</sup> − 1)/*λ* [51], namely the one for *λ* = 0.
These transformations were introduced as a means to symmetrize a given data set, eventually yielding a Gaussian-shaped PDF of the transformed data when choosing the optimal *λ*. Restricting the choice to *λ* = 0, i.e., simply logarithmizing the data, therefore will reduce the skewness in virtually all practical cases but may not yet completely symmetrize the data, leaving a residual non-zero skewness. As a consequence, any skewed data set will assume a more Gaussian shape upon applying a logarithmic transformation, although this mere statistical approach does, in general, not yet reveal the reason for the skewness and apparent lognormal PDF of the original data. In particular, inferring from an apparent lognormal shape of the PDF that the underlying data is the product of independent, identically-distributed variables may not be a compelling conjecture.
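The statistical argument above can be checked numerically with a short Python sketch. It assumes that Porter–Thomas variates are generated as squares of standard normal random numbers (a chi-squared distribution with one degree of freedom, for which the average of *N* variates has skewness √(8/*N*)); the sample sizes, the seed and the helper names `skewness` and `averaged_porter_thomas` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def skewness(x):
    """Sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5


def averaged_porter_thomas(n_steps, n_samples=20_000):
    """Averages of n_steps Porter-Thomas variates (squared standard normals)."""
    theta = rng.standard_normal((n_samples, n_steps)) ** 2
    return theta.mean(axis=1)


for n_steps in (10, 70, 150):
    x = averaged_porter_thomas(n_steps)
    # Columns: N, skewness of the averages, chi-squared expectation sqrt(8/N),
    # and the residual skewness after a logarithmic (lambda = 0) transformation.
    print(n_steps, skewness(x), np.sqrt(8 / n_steps), skewness(np.log(x)))
```

The printed values allow a direct comparison of the skewness before and after taking the logarithm, i.e., of how far the *λ* = 0 transformation symmetrizes the averaged data for a given trajectory length.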

Coming back to our experimental data, we next explored whether the underlying random walk process is Gaussian. To this end, we considered the step increments (*δx*<sub>*i*</sub>, *δy*<sub>*i*</sub>) = (*x*<sub>*i*+*n*</sub> − *x*<sub>*i*</sub>, *y*<sub>*i*+*n*</sub> − *y*<sub>*i*</sub>) taken within a period *δt* = *n*Δ*t*. These follow a Gaussian PDF if the process is a simple FBM process. Recently, however, deviations from a Gaussian PDF were reported [14,15,52,53], highlighting heterogeneous diffusion characteristics that could be rationalized by random walks with spatiotemporally fluctuating transport coefficients [54,55] and/or by systems with spatial disorder [56,57]. To account for the strongly fluctuating diffusion coefficients, we normalized the step increments *δx* and *δy* of each trajectory by the respective root-mean-square values. The resulting set of normalized steps did not exhibit significant differences between x- and y-coordinates. We therefore combined both into a single set of normalized increments, *χ*, which resulted in a symmetric PDF so that inspecting *p*(|*χ*|) was sufficient.
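The normalization of increments can be sketched in Python as follows. The function name `normalized_increments`, the array layout (one array of shape (*N*, 2) per trajectory) and the synthetic Brownian test data are assumptions made for illustration only.

```python
import numpy as np


def normalized_increments(trajs, n=1):
    """Pool x- and y-increments over n frames, each normalized by the
    root-mean-square increment of its own trajectory and coordinate."""
    chi = []
    for traj in trajs:                       # traj has shape (N, 2)
        steps = traj[n:] - traj[:-n]         # (dx_i, dy_i) for a lag of n frames
        rms = np.sqrt(np.mean(steps**2, axis=0))
        chi.append((steps / rms).ravel())
    return np.concatenate(chi)


# Hypothetical test data: Brownian trajectories with widely varying step sizes.
rng = np.random.default_rng(2)
scales = rng.lognormal(sigma=1.0, size=300)
trajs = [np.cumsum(s * rng.standard_normal((150, 2)), axis=0) for s in scales]

chi = normalized_increments(trajs, n=1)
# Histogram estimate of p(|chi|), to be compared with a standard Gaussian.
hist, edges = np.histogram(np.abs(chi), bins=40, range=(0, 4), density=True)
```

By construction, the pooled increments have unit mean square regardless of the trajectory-wise mobility, so that deviations from a standard Gaussian reflect the increment statistics rather than the spread of diffusion coefficients.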

Data for different *δt* show overall a very good agreement with a standard Gaussian and are incompatible with a simple exponential (Figure 3), indicating that no major diffusion heterogeneity is present in our SPT data from Xenopus extracts. For |*χ*| > 3 and *δt* ≥ 5Δ*t*, the experimental PDF falls slightly below the Gaussian benchmark for unknown reasons. Despite this slight deviation, it appears fair to conclude that the trajectories emerged from a mildly subdiffusive Gaussian process, suggesting that FBM is the most likely model that describes our experimental data.

**Figure 3.** The PDF of normalized increments taken within a period *δt*, shown here as *p*(|*χ*|), complies well with a standard Gaussian (black full line) for different choices of *δt* (color-coded symbols). For *δt* ≥ 5Δ*t* and |*χ*| > 3, consistently lower probabilities than the Gaussian benchmark are observed for unknown reasons.

To follow up on this hypothesis and probe the existence of a non-trivial memory kernel in our experimental data, we employed the ensemble- and time-averaged velocity autocorrelation function (VACF) of each trajectory, defined as follows:

$$C(\tau) = \left\langle \frac{\langle \mathbf{v}(t)\mathbf{v}(t+\tau)\rangle_t}{\langle \mathbf{v}(t)^2\rangle_t} \right\rangle_E, \tag{4}$$

with **v**(*t*)=[**r**(*t* + *δt*) − **r**(*t*)]/*δt* denoting the instantaneous velocity that is simply the two-dimensional step **r**(*t* + *δt*) − **r**(*t*) taken within integer multiples of the frame time, *δt* = *n*Δ*t*.

It is convenient to rescale the lag time *τ* = *k*Δ*t* with *δt*, yielding a dimensionless time *ξ* = *τ*/*δt* = *k*/*n*. For FBM, an analytical prediction for the VACF was derived [5,35]:

$$C_{\text{FBM}}(\xi) = \left\{ (\xi + 1)^{\alpha} + |\xi - 1|^{\alpha} - 2\xi^{\alpha} \right\} / 2 \,. \tag{5}$$

The fact that the VACF does not depend on *n* and *k* but only on the ratio *ξ* = *k*/*n* reflects the self-similarity of FBM processes. Localization errors in SPT experiments can break this self-similarity [58,59], i.e., using different *δt* for rescaling *τ* to *ξ* leads to progressive deviations from Equation (5). In fact, very recently, the VACF was shown to be a sensitive reporter for detecting localization errors for FBM from the sub- to the superdiffusive regime [60], as even small localization errors lead to significant changes of *C*(*ξ* = 1) at different choices of *δt*. In our case, however, rescaling with different *δt* led to an almost perfect collapse of all data to the master curve predicted by Equation (5), see Figure 4. In fact, this finding is in favorable agreement with the earlier rheology results on Xenopus extracts that revealed a significant viscoelastic response [11,28], linking the anti-persistent dip in the VACF to a viscoelastic memory kernel of the medium. Moreover, the good agreement with Equation (5) for all choices of *δt* confirms our previous notion that localization offsets (to which VACFs are very sensitive) are negligible for our data.
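A minimal Python sketch of Equations (4) and (5) is given below. The function names `vacf_fbm` and `vacf`, the array layout of trajectories, and the Brownian test ensemble are assumptions for illustration; they are not taken from the analysis code of this study.

```python
import numpy as np


def vacf_fbm(xi, alpha):
    """FBM prediction of Equation (5) for the normalized VACF."""
    xi = np.asarray(xi, dtype=float)
    return ((xi + 1) ** alpha + np.abs(xi - 1) ** alpha - 2 * xi**alpha) / 2


def vacf(trajs, n, k):
    """Ensemble- and time-averaged normalized VACF of Equation (4) for
    delta_t = n frames and lag time tau = k frames, i.e., xi = k / n."""
    values = []
    for traj in trajs:                        # traj has shape (N, 2)
        v = (traj[n:] - traj[:-n]) / n        # velocity in units of length per frame
        norm = np.mean(np.sum(v * v, axis=1))
        corr = np.mean(np.sum(v[: len(v) - k] * v[k:], axis=1))
        values.append(corr / norm)            # time average, normalized per trajectory
    return np.mean(values)                    # ensemble average


# Hypothetical test ensemble: normal diffusion (alpha = 1).
rng = np.random.default_rng(3)
trajs = [np.cumsum(rng.standard_normal((150, 2)), axis=0) for _ in range(400)]

print(vacf(trajs, n=2, k=1), vacf_fbm(0.5, 1.0))   # xi = 0.5
print(vacf(trajs, n=2, k=2), vacf_fbm(1.0, 1.0))   # xi = 1
print(vacf_fbm(1.0, 0.9))                          # negative for alpha < 1
```

For the mean exponent *α* = 0.9, Equation (5) yields *C*(*ξ* = 1) = (2<sup>0.9</sup> − 2)/2, a small negative value, which is the anti-persistent dip discussed above.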

**Figure 4.** The normalized VACF, *C*(*ξ*), for different choices of *δt* (color-coded symbols) shows excellent agreement with the FBM prediction [Equation (5)] when inserting the mean scaling exponent *α* = 0.9 (full black line). In particular, a clearly negative value of *C*(*ξ* = 1) confirms an antipersistent random walk, most likely of the FBM type. No significant changes of the VACF minimum are seen for different *δt*, confirming that trajectories are not plagued by localization errors.

As a further piece of evidence that the mild subdiffusion seen for particle motion in Xenopus extracts is due to an antipersistent FBM process, we probed the power-spectral density (PSD) of individual trajectories:

$$S(f) = \frac{1}{T} \left| \int_0^T e^{ift} x(t)\, dt \right|^2 + \left| \int_0^T e^{ift} y(t)\, dt \right|^2 \tag{6}$$

and the corresponding ensemble average, ⟨*S*(*f*)⟩<sub>*E*</sub>. A wealth of analytical information is available for PSDs and their trajectory-wise fluctuations [61,62]. Alerted by the observations that arithmetic and geometric averaging can perturb power-law effects in the ensemble of trajectories (cf. above), we aimed at softening the influence of the grossly varying diffusion coefficients *K* in the subsequent analysis. Therefore, we normalized all trajectories by their respective root-mean-square step length within successive frames, in line with the approach taken when probing the Gaussian shape of the statistics of increments (cf. context of Figure 3).

As expected, the ensemble-averaged PSD of these normalized trajectories (for *N* = 70 and *N* = 150) followed the analytical prediction *S*(*f*) ∼ 1/*f*<sup>1+*α*</sup>, around which PSDs of individual trajectories fluctuated to a considerable extent (Figure 5). These fluctuations encode another important hallmark of FBM via the coefficient of variation, defined as *γ*(*f*) = *σ*(*f*)/⟨*S*(*f*)⟩<sub>*E*</sub> with *σ*(*f*) denoting the standard deviation of trajectory-wise PSDs. For FBM, asymptotic values *γ* = 1 for subdiffusion and *γ* = √5/2 for normal diffusion were predicted and verified before [61,62]. To calculate the coefficient of variation for our data, we randomly drew 1000 curves from the ensemble of one-dimensional TA-PSDs for the *x*- and *y*-direction and removed those 5% of TA-PSDs with the largest deviations from the ensemble-averaged PSD. The resulting values for *γ* fully complied with the FBM predictions (Figure 6). Normally diffusive trajectories from calibration experiments converge toward the predicted value *γ* = √5/2, whereas subdiffusive trajectories clearly assume lower values that eventually converge to the universal unity value for large frequencies.
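The coefficient of variation can be estimated along the following lines. This Python sketch assumes one-dimensional tracks stored as rows of an array, approximates the Fourier integral by a discrete Fourier transform at the frequencies *f*<sub>*k*</sub> = 2π*k*/*T*, and uses simulated normal diffusion as test data; the function names `psd_1d` and `coefficient_of_variation` are hypothetical, and the root-mean-square normalization and the removal of 5% outliers described above are deliberately omitted.

```python
import numpy as np


def psd_1d(tracks, dt=1.0):
    """Single-trajectory PSDs of 1D tracks, |sum_n x_n exp(i f t_n) dt|^2 / T,
    evaluated at the discrete frequencies f_k = 2 pi k / T with k >= 1."""
    tracks = np.atleast_2d(tracks)            # shape (M, N)
    T = tracks.shape[1] * dt
    spectrum = np.fft.rfft(tracks, axis=1) * dt   # discretized Fourier integral
    f = 2 * np.pi * np.arange(spectrum.shape[1]) / T
    return f[1:], np.abs(spectrum[:, 1:]) ** 2 / T


def coefficient_of_variation(tracks, dt=1.0):
    """gamma(f) = sigma(f) / <S(f)>_E over an ensemble of 1D tracks."""
    f, S = psd_1d(tracks, dt)
    return f, S.std(axis=0) / S.mean(axis=0)


# Hypothetical test ensemble: M = 4000 normally diffusive tracks of length N = 150.
rng = np.random.default_rng(4)
tracks = np.cumsum(rng.standard_normal((4000, 150)), axis=1)

f, gamma = coefficient_of_variation(tracks)
# Compare the frequency-averaged estimate with the normal-diffusion value sqrt(5)/2.
print(gamma[4:60].mean(), np.sqrt(5) / 2)
```

For real data, the same two functions would be applied to the *x*- and *y*-coordinates of the normalized trajectories, and *γ* would be inspected as a function of the dimensionless frequency *fT*.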

**Figure 5.** The PSDs of individual trajectories (black and blue thin lines, representing trajectories with length *N* = 70 and *N* = 150, respectively) fluctuate around the ensemble-averaged PSD (thick colored lines). In both cases, the FBM prediction for a scaling *S*(*f*) ∼ 1/*f*<sup>1+*α*</sup> (with *α* = 0.9, dashed line) is nicely met. For better visibility, data for *N* = 150 have been shifted upward 100-fold.

**Figure 6.** The coefficient of variation of individual PSDs with respect to the ensemble mean, *γ*(*f*), for normally diffusive trajectories from calibration experiments (red line) clearly assumes higher values than those for trajectories from the Xenopus extract (blue and black lines), irrespective of the trajectory length, *N*. As predicted for FBM, these subdiffusive SPT data converge toward *γ* = 1, whereas normally diffusive data from calibration experiments converge to the predicted value *γ* = √5/2. Both are clearly distinct from the prediction for superdiffusive FBM motion, *γ* = √2. For convenience, frequencies *f* were made dimensionless by multiplication with the total time *T* = *N*Δ*t* covered in each trajectory.

Altogether, we conclude from the analyses of our SPT data that beads with 20 nm radius feature a mild antipersistent subdiffusion in Xenopus extract with a mean scaling exponent *α* = 0.9. Typical signatures of an FBM with a Hurst coefficient *H* = *α*/2 = 0.45 are clearly visible, suggesting that the viscoelasticity of the extract determines these distinct random-walk properties.

#### *3.3. From Native to Pharmaceutically Treated Xenopus Extracts*

To explore to what extent the observed subdiffusion is altered when challenging biochemical processes in the extract, we also performed single-particle tracking experiments after applying pharmaceuticals to the extract. In particular, we applied taxol or nocodazole to either stabilize or completely disrupt the microtubule filaments (see Materials and Methods for details). Typical fluorescence images of the extract (stained for the beads and the microtubule filaments) highlight the strong differences between untreated and chemically challenged extracts (Figure 7). While untreated extracts feature some microtubule filaments that might obstruct the free diffusion of beads, the addition of nocodazole completely eradicates these structures, potentially resulting in a decreased obstruction of bead motion. In contrast, the addition of taxol stabilizes microtubules and, therefore, even enhances the gel-like geometry within the extract.

**Figure 7.** Representative fluorescence images of beads (upper panel) and microtubules (lower panel) in native and pharmaceutically treated Xenopus extracts (see Materials and Methods for details); scale bars indicate 10 μm. While native extracts feature a significant amount of microtubule filaments (left column), the addition of nocodazole completely eradicates these higher-order structures (right column). In contrast, stabilizing microtubules by taxol further enhances the 'filament jungle' (middle column).

Somewhat unexpectedly, however, altering the microtubule filament array had, on average, only minor effects (Table 2). While disrupting microtubules had, on average, no significant effect at all (with *α* and *K* being almost unchanged), the addition of taxol induced a slight enhancement of the subdiffusion, i.e., a lower *α*, in line with the notion that increased density of filaments may further hamper the beads' free diffusion. Still, the effect is fairly small when bearing in mind that scaling exponents around and below *α* ≈ 0.5 were reported already for similar sized particles in the comparable cytoplasm of living cells [6,11,15]. Our findings, therefore, indicate that mainly macromolecular crowding of the fluid on length scales below 1 μm, which induces a viscoelastic memory kernel, underlies the observed subdiffusion. Higher-order structures, such as cytoskeletal assemblies or endomembranes, appear to be less important for the observed (sub)diffusion of beads in Xenopus extract.


**Table 2.** Summary of results found for different conditions of Xenopus extracts. Data for trajectories with length *N* = 70 and *N* = 150 are given in upper and lower lines, respectively. The ensemble size for the respective condition is given by *M*.

To complement these insights, we also applied non-hydrolyzable analogues of ATP and GTP to prevent non-equilibrium processes that are fueled by these nucleotides (see Materials and Methods for details). In this case, the extract became very heterogeneous with tracking results from separated loci differing strongly (from total immobilization up to normal diffusion). Removing the immobilized tracks, the average behavior was in reasonable agreement with our findings for untreated and taxol-treated extracts. This finding suggests that (apart from immobilization loci) the ATP- and GTP-dependent processes are also of little importance for the motion of small beads. As a caveat, we would like to emphasize, however, that all of these findings might be subject to change when larger particles or different surface properties are considered, as these might interact differently with macromolecules and higher-order structures. In support of this statement, we would like to refer the reader to previous measurements on the diffusion of quantum dots in the cytoplasm of mammalian cells, where strong variations in the mobility and apparent diffusion anomaly were observed upon varying the particles' surface chemistry [63]. In fact, understanding anomalous diffusion in complex media at non-equilibrium conditions, e.g., in the crowded interior of living cells, is still a major challenge (see [64] for a recent study).

#### **4. Conclusions**

In summary, we have shown here that beads with 20 nm radius explore cell extracts from eggs of *Xenopus laevis* by mild subdiffusion that bears all properties of an FBM random walk. This mode of motion is largely conserved when treating the extract with pharmaceuticals that alter microtubule filaments or ATP/GTP-dependent processes. Therefore, the emergence of subdiffusion is most likely a consequence of the extracts' viscoelasticity that is induced by a high degree of macromolecular crowding, albeit changes in the beads' surface chemistry might also enhance or lower the diffusion anomaly due to transient and unspecific interactions with larger structures in the fluid [63]. In any case, given that even mild subdiffusion was predicted and observed to significantly alter biochemical reactions (see, e.g., [65–67]), our data provide a helpful clue for a deeper understanding of self-organization and pattern formation processes in cell extracts.

**Author Contributions:** Experiments and analyses: K.S.; concept, design and supervision of the study: M.W. Both authors wrote the manuscript and have read and agreed to the published version of the manuscript.

**Funding:** VolkswagenStiftung (Az. 92738), EliteNetwork of Bavaria (Study Program Biological Physics).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available from the corresponding author upon reasonable request.

**Acknowledgments:** Financial support by the VolkswagenStiftung (Az. 92738) and by the EliteNetwork of Bavaria (Study Program Biological Physics) is gratefully acknowledged. We thank B. Neumann and O. Stemmann (University of Bayreuth, Genetics) for providing Xenopus eggs and M. Thampi, S. Krauss, P.-Y. Gires, and A. Hanold for extract preparation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Information-Efficient, Off-Center Sampling Results in Improved Precision in 3D Single-Particle Tracking Microscopy**

**Chen Zhang and Kevin Welsher \***

Department of Chemistry, Duke University, Durham, NC 27708, USA; chen.zhang2@duke.edu **\*** Correspondence: kdw32@duke.edu

**Abstract:** In this work, we present a 3D single-particle tracking system that can apply tailored sampling patterns to selectively extract photons that yield the most information for particle localization. We demonstrate that off-center sampling at locations predicted by Fisher information utilizes photons most efficiently. When performing localization in a single dimension, optimized off-center sampling patterns gave doubled precision compared to uniform sampling. A ~20% increase in precision compared to uniform sampling can be achieved when a similar off-center pattern is used in 3D localization. Here, we systematically investigated the photon efficiency of different emission patterns in a diffraction-limited system and achieved higher precision than uniform sampling. The ability to maximize information from the limited number of photons demonstrated here is critical for particle tracking applications in biological samples, where photons may be limited.

**Keywords:** 3D single-particle tracking; Fisher information; non-uniform illumination

#### **1. Introduction**

Single-particle tracking (SPT) [1] has led to numerous advances in unveiling sophisticated intracellular biophysical events, including diffusion of membrane proteins [2], transportation of intracellular vesicles [3], and viral internalization events [4,5]. Despite the tremendous progress made in the field, what can be gleaned from single-particle trajectories is limited by the intrinsic localization precision. In a conventional microscope, this limit is given by *σ*/√*N*, where *σ* is the size of the microscope's point-spread function (PSF) and *N* is the number of photons collected [6,7]. The diffraction limit dictates the size of the PSF, so increasing the number of photons collected per localization is typically the only method available for increasing precision. While the use of artificial particles [8] can produce a higher flux of photons and improve localization precision, conventional organic fluorophores and fluorescent proteins remain essential in biophysical studies. These probes can only yield a finite number of photons before undergoing irreversible photobleaching, so it is crucial to maximize the information available from this limited number of photons. In a typical particle tracking experiment, the particle is uniformly illuminated as, typically, the emitter's position is not known at the outset of the experiment. An under-explored avenue for increasing precision is adjusting the excitation pattern around the emitter to get beyond the localization limit described above. This type of advance is only possible if the particle position is known a priori, at least to some degree of certainty. Recent efforts have focused on improving localization precision through non-uniform illumination in the context of a super-resolution technique [9]. Gallatin et al. proposed a globally optimized strategy in which the particle should be sampled at the maximum of the first-order derivative of the square-root of the intensity [10].
This theory indicated that the optimized sampling pattern is non-uniform, and the particle position should not be directly sampled, but did not provide experimental support. Both of these works suggest that non-uniform illumination shows promise for improved localization with a limited number of photons.

This study builds upon 3D single-molecule active real-time tracking microscopy (3D-SMART) [11], a previously introduced real-time 3D single-particle tracking (RT-3D-SPT) system [12], by utilizing the emerging concept of non-uniform illumination to improve localization precision significantly. A 2-fold increase in precision was observed in both 2D (XY-plane) and 1D (Z-axis) localization, as has previously been noticed in several super-resolution-based methods [13,14]. Unlike existing methods that require sophisticated PSF engineering or specific materials, here, we have achieved higher photon efficiency in 3D localization with a diffraction-limited point-scanning confocal microscope.

**Citation:** Zhang, C.; Welsher, K. Information-Efficient, Off-Center Sampling Results in Improved Precision in 3D Single-Particle Tracking Microscopy. *Entropy* **2021**, *23*, 498. https://doi.org/10.3390/e23050498

Academic Editors: Janusz Szwabiński and Aleksander Weron

Received: 21 March 2021 Accepted: 19 April 2021 Published: 22 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this work, we investigate the potential of non-uniform illumination in the context of real-time 3D single-particle tracking (RT-3D-SPT) [1]. Developed by several groups over the past decades [15–22], RT-3D-SPT uses active feedback to keep a single particle at the center of the microscope objective's focal volume. In 3D-SMART, the laser spot is guided to sample the XY-plane following a Knight's Tour pattern, while simultaneously sampling along the Z-axis following a sine wave, creating a scanning volume (Figure 1a–c) that samples the vicinity of the particle of interest in an approximately uniform manner (Figure 1d). Since the particle is held stationary in the lab frame, a priori information about the particle position is available during the measurement. The question then arises: If a priori information is available regarding the particle's position, can the above-described precision limit be surpassed? We explore this potential in the following work. By examining the expected information extracted from photons collected at various locations relative to the particle center, we show that off-center sampling leads to dramatically increased precision compared to uniform illumination schemes. We then demonstrate that this photon-efficient sampling can be applied in practice using a 3D patterned laser spot. The sampling density of an off-center, information-efficient pattern in 3D is shown in Figure 1e.

**Figure 1.** (**a**) Complete laser scanning volume with a dimension of 1 × 1 × 2 μm implemented in 3D-SMART; (**b**) The laser is scanned in a Knight's Tour pattern in the XY-plane using an EOD; (**c**) Along the Z-axis, a TAG lens is used to drive the focus in a sinusoidal pattern. The color bar indicates photon arrival density along the Z-axis; (**d**) Scanning density in the volume when sampling in the default pattern shown in (**b**,**c**); (**e**) Scanning density of a 3D information-efficient pattern, with the XY-plane, scanned in an off-center, information-efficient pattern. The Z-axis is scanned with a sine wave with the laser power unmodulated.

#### **2. Theory**

The probability density of observing a photon from an emitter in a microscope in one dimension is approximated by:

$$f(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{\frac{-(x-\mu)^2}{2\sigma^2}} \tag{1}$$

where *μ* is the particle position and *σ* is related to the width of the PSF. The Gaussian distribution is a good approximation for the actual diffraction pattern of a point-source, which is an Airy function [23]. The size of the PSF is ultimately limited by diffraction and is not user-adjustable. The prefactor 1/√(2*πσ*<sup>2</sup>) is a normalization factor and will be neglected for the rest of this discussion. The Fisher Information (FI) is a statistical measure employed to quantify the amount of information expected when estimating a parameter of the underlying distribution [24]. The FI (*J*) is inversely related to the precision and is given by:

$$J(\theta) = E_{\theta} \left[ \left( \frac{\partial}{\partial \theta} \ln f(x|\theta) \right)^{2} \right] \tag{2}$$

Here, *θ* is a parameter (or vector of parameters) of the underlying distribution, and *E*<sub>*θ*</sub> is the expectation value. The FI for Equation (1) above is:

$$J(\mu) = \int_{\mathcal{X}} f(x|\mu) \frac{(x - \mu)^2}{\sigma^4} dx \tag{3}$$

Taken over all space, the expectation value above yields a constant value of *J*(*μ*) = 1/*σ*<sup>2</sup>, the average amount of FI contributed by each observed photon. The expectation is replaced by the integral in this expression. It is also noticeable that *σ* will be the constant prefactor upon integration and does not affect the solution. For a total of *N* photons, the FI is simply *J*(*μ*) = *N*/*σ*<sup>2</sup>. The FI is the inverse of the expected variance, so it is straightforward to see that this is simply the limit of localization precision (*σ*/√*N*) described above. However, this is only the average value, and photons collected from certain parts of the distribution contribute more to the overall FI than others. To see this, we can more closely examine the integrand above.

$$f(x|\mu) \frac{\left(x - \mu\right)^2}{\sigma^4} \tag{4}$$

This integrand, which we will refer to as the "information density", is plotted in Figure 2 to show the contribution of observed photons versus *x* − *μ* (so the origin is the particle position). From Figure 2, it is easily seen that there are three critical points. There is a minimum in the information density at *x* = *μ*, and two maxima at *x* = *μ* ± *σ*√2. The minimum at *x* = *μ* contributes zero FI, meaning that photons collected precisely from the particle center yield no information on the particle position. The maxima indicate that photons collected from the off-center positions yield the most FI. In a typical imaging experiment, which employs uniform illumination, the above analysis is not applicable. First, the particle's location is unknown, so it is impossible to collect photons only from specific areas around the particle's position. Second, while the photons collected exactly from the particle center yield zero FI, there is no downside to collecting them if unlimited photons are available. However, there are experimental conditions under which it is possible, and even desirable, to tailor the excitation pattern. For single-molecule tracking or super-resolution imaging, a finite number of photons can be extracted from each molecule before irreversible photobleaching occurs. In these experiments, it is therefore beneficial to collect photons from high information areas only, if possible. That being said, this approach is typically impossible because there is no a priori information of the particle position. This is where the a priori information regarding the particle position, available from active-feedback single-molecule tracking, comes in.
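The two statements above, namely that the integral of the information density equals 1/*σ*<sup>2</sup> and that its maxima lie at *x* = *μ* ± *σ*√2, can be verified numerically with a few lines of Python. The PSF width `sigma = 0.25` (arbitrary length units), the grid and the variable names are illustrative assumptions; unlike in the discussion above, the normalization prefactor of Equation (1) is kept here so that the integral can be compared directly with 1/*σ*<sup>2</sup>.

```python
import numpy as np

sigma = 0.25          # hypothetical PSF width, arbitrary length units
mu = 0.0              # particle position
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
dx = x[1] - x[0]

# Normalized Gaussian of Equation (1) and the integrand of Equation (3).
pdf = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
info_density = pdf * (x - mu) ** 2 / sigma**4

total_fi = np.sum(info_density) * dx          # Riemann-sum estimate of J(mu)
# Location of the maximum on the positive side of the particle position.
x_peak = x[np.argmax(np.where(x > mu, info_density, 0.0))]

print(total_fi, 1 / sigma**2)                 # both approximately 16
print(x_peak - mu, sigma * np.sqrt(2))        # both approximately 0.3536
```

Setting the derivative of exp[−(*x* − *μ*)<sup>2</sup>/(2*σ*<sup>2</sup>)] (*x* − *μ*)<sup>2</sup> to zero gives (*x* − *μ*)[2 − (*x* − *μ*)<sup>2</sup>/*σ*<sup>2</sup>] = 0, which reproduces the minimum at *x* = *μ* and the two maxima at *x* = *μ* ± *σ*√2 found numerically.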

In the following section, we survey three-dimensional laser scanning patterns to identify the most information-efficient sampling patterns. We start with a discussion of applying this sampling along the XY-plane and Z-axis separately, followed by a discussion of achieving isotropic information-efficient sampling in all three dimensions.

**Figure 2.** Demonstration of the proposed information-efficient sampling in a single dimension. The green dots represent the experimentally observed intensity of a bead along the X-axis, and the green line shows the Gaussian fit of the PSF. The red line shows the information density of the PSF. The orange-shaded areas indicate photons with high information density.

#### **3. Results**

#### *3.1. Identification of a Photon-Efficient Sampling Pattern in the XY-Plane*

In the RT-3D-SPT system reported by Hou et al. [12], a focused laser spot is guided by an electro-optic deflector (EOD) to scan a 5 × 5 grid with 250 nm between adjacent pixels in the XY-plane (Figure 3a). Each pixel is sampled for 20 μs in a scanning cycle. The 5 × 5 grid, typically scanned in a Knight's Tour pattern, ensures a uniform illumination pattern. However, the FI-based analysis above suggests that off-center positions should be selectively sampled to achieve the highest photon efficiency. To obtain an unbiased search for photon-efficient patterns, we evaluated subsets of the default 5 × 5 pattern. The 25 pixels were divided into six different groups of inequivalent pixels based on their distance to the scan center (Figure 3b). Immobilized fluorescent beads distributed in PBS buffer were then scanned using the default 5 × 5 EOD pattern in the XY-plane. A piezoelectric stage was then used to step the particle position in 20 nm increments through the center of the scan area. All 62 possible combinations of inequivalent patterns (Figure S1) were tested to identify the most photon-efficient sampling pattern. In each experiment, photons were collected from each of the 25 pixels, but only photons obtained from pixels in a given pattern were used to perform data analysis.
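The grouping of the 25 pixels into six inequivalent sets can be reproduced with a few lines of code. This is a sketch only: it assumes that "inequivalent" means "different distance from the scan center", and that the count of 62 refers to all non-empty subsets of the six groups other than the full 5 × 5 pattern. Both readings are assumptions that happen to be consistent with the numbers in the text, and the group labels used here need not match the numbering in Figure 3b.

```python
from itertools import combinations

PIXEL_SPACING_NM = 250.0  # spacing between adjacent pixels, as stated in the text

# Pixel indices of the 5 x 5 grid, measured from the scan center.
pixels = [(ix, iy) for ix in range(-2, 3) for iy in range(-2, 3)]

# Group pixels by squared distance (in pixel units) from the center.
groups = {}
for ix, iy in pixels:
    groups.setdefault(ix * ix + iy * iy, []).append((ix, iy))

squared_distances = sorted(groups)

# All non-empty subsets of the six groups, excluding the full 5 x 5 pattern.
subsets = [
    combo
    for size in range(1, len(squared_distances))
    for combo in combinations(squared_distances, size)
]

for d2 in squared_distances:
    radius_nm = PIXEL_SPACING_NM * d2 ** 0.5
    print(f"{len(groups[d2]):2d} pixels at {radius_nm:6.1f} nm from the center")
print(f"{len(subsets)} candidate sub-patterns")
```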

For each different sampling pattern, the particle position was estimated using maximum likelihood estimates (MLEs). In MLEs, the likelihood (*L*) for a position estimate (*μ*) from an arbitrary number of photons (*N*) is defined as the product of probability density of photon arrival positions based on a given model *f* which is a function of *μ*:

$$L(\mu|X) = \prod\_{i=1}^{N} f(x\_i|\mu) \tag{5}$$

Here the model *f* is a Gaussian distribution described in (1), and *X* refers to the set of arrival positions *x<sub>i</sub>* (*i* = 1, 2, ..., *N*) of photons used for estimation. The best estimate for the particle position *μ* is obtained when *L* is maximized. A detailed example of MLE is shown in Figure S2. Localization precision at different numbers of photons of each estimation is shown in Figure S3. In this study, 2000 consecutively collected photons were used for each position estimate unless otherwise stated.
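The maximization of the likelihood can be illustrated with a simplified one-dimensional example. The sketch below assumes continuous photon arrival positions drawn from a Gaussian of known width and maximizes the log-likelihood on a grid; the actual analysis operates on photons from discrete scan pixels (Figure S2), which is not reproduced here. The values of `sigma` and `true_mu`, and all function names, are hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(mu, x, sigma):
    """Logarithm of the product over photons of a Gaussian f(x_i | mu) with known sigma."""
    x = np.asarray(x, dtype=float)
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi)))

def mle_position(x, sigma, grid):
    """Return the candidate position on `grid` with the largest log-likelihood."""
    log_l = np.array([gaussian_log_likelihood(mu, x, sigma) for mu in grid])
    return grid[np.argmax(log_l)]

rng = np.random.default_rng(1)
sigma = 120.0     # nm, hypothetical PSF width
true_mu = 35.0    # nm, hypothetical particle position

# 2000 photons per position estimate, as in the text.
photons = rng.normal(true_mu, sigma, size=2000)

grid = np.linspace(-100.0, 100.0, 2001)   # candidate positions, 0.1 nm apart
mu_hat = mle_position(photons, sigma, grid)

print(f"MLE position: {mu_hat:.1f} nm, sample mean: {photons.mean():.1f} nm")
```

For this idealized model the maximum of the likelihood coincides with the sample mean of the arrival positions, up to the grid spacing. That is no longer true once only a subset of scan positions is sampled, which is why a numerical maximization is the more general tool.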

MLE analysis was performed on immobilized 190 nm fluorescent beads that were stepped in 20 nm intervals over a 100 nm range for the XY-plane. The average standard deviation of MLE positions at different stage positions was used to quantify the precision (Figure 3d–f). It was observed that the MLE positions were proportional, but not exactly equal, to the expected particle positions. Off-center sampling generally results in an overestimation of the actual particle motion, compared to uniform sampling which generally yields realistic position estimation. An underestimation of the actual particle motion is associated with sampling patterns in which the center pixel is oversampled compared to uniform sampling. This observation was validated by a simulation of sampling a 2D Gaussian emitter (Figure S4). Calibration was performed to account for these differences in response to the same changes in particle position (Figure S5).

**Figure 3.** (**a**) 5 × 5 Knight's Tour scanning pattern; (**b**) Inequivalent pixels based on distance from the scan center; (**c**) 4-pixel 4-Corners scan, which uses only pixels from pattern 3 in (**b**), that was found to have the highest precision; (**d**–**f**) MLE positions versus stage positions obtained with the default 5 × 5, 4-Corners, and 4-Corners plus center pixel scan patterns. The average standard deviation of the estimated positions was measured to be 5.5, 2.8, and 4.4 nm, respectively. The orange line shows the relative stage position. The dark and light blue lines show the estimated position calculated for every 2000 photons. The different shades of blue indicate estimation based on two different stage steps, 20 nm apart; (**g**) Precision obtained from selectively using photons obtained from pixels in different inequivalent patterns (pattern 2–6 in 3b) alone (blue), specific pattern plus center pixel (red), and default 5 × 5 pattern (orange).

Interestingly, though we did not propose an a priori model based on Fisher information, the unbiased search resulted in an off-center, FI-efficient pattern. One 4-pixel pattern, which we refer to as the 4-Corners pattern (Figure 3c), yielded the highest precision of 2.6 ± 0.3 nm, compared with the default pattern (where all 25 pixels are sampled) which gave 4.5 ± 0.3 nm precision (Figure 3d,e). In both cases, localization was performed using 2000 photons. Additional data were obtained with the laser excitation matching the 4-Corners pattern to validate that selectively using specific photons in data post-processing was equivalent to sampling with the designated pattern. The resulting precision was 2.8 ± 0.4 nm, in excellent agreement with the post-acquisition processed data above. The effect of the number of photons on the relative advantage of the 4-Corners pattern was also investigated, showing a doubling in precision for various values of *N* (Figure S6). Notably, when the center pixel is added back to the 4-Corners pattern, the precision is nearly two-fold worse than the 4-Corners pattern alone and comparable to the full 25-pixel scan (4.7 ± 1.1 nm vs. 4.6 ± 0.8 nm, Figure 3f). Complete 2D trajectories of Figure 3d–f are shown in Figure S7. We note here that different sampling patterns yield different emission rates for the same laser power. For example, the default 5 × 5 pattern gives a roughly 1.5-fold increase in intensity compared to the 4-Corners pattern, but the 4-Corners pattern still exhibits doubled precision when sampling with equal bin time (Figure S8). A more thorough investigation of patterns consisting of inequivalent pixels No. 2–6 (Figure 3b) with or without the center pixel showed that precision obtained with the center pixel was always worse than without (Figure 3g). The poor precision upon sampling the center pixel shows the importance of not sampling low FI areas around the particle.

#### *3.2. Laser Modulation in Z-Axis*

We then proceeded to achieve photon-efficient sampling along the Z-axis. A Tunable Acoustic Gradient (TAG) lens [25,26] was used to create custom illumination patterns along the axial direction. The TAG lens deflects a focused laser spot in a sine wave with an amplitude of ~1 μm around the focal plane at a frequency of ~70 kHz (Figure 1c). The following mathematical relation describes the probability density of this sine wave with an amplitude of 1:

$$f(z) = \frac{1}{\pi\sqrt{1 - z^2}}\tag{6}$$
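Equation (6) is the arcsine density, which is the distribution of sin(*θ*) when the phase *θ* is uniform over one period. A quick Monte Carlo check, given here as an illustrative sketch with an arbitrary sample size and an arbitrary "central half" window, is:

```python
import numpy as np

def arcsine_pdf(z):
    """Probability density of Eq. (6) for a unit-amplitude sine scan."""
    return 1.0 / (np.pi * np.sqrt(1.0 - z ** 2))

rng = np.random.default_rng(0)

# Sample the scan position at uniformly distributed phases of the sine wave.
phase = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
z = np.sin(phase)

# Fraction of time spent in the central half of the scan range, |z| < 0.5.
frac_center = np.mean(np.abs(z) < 0.5)
# Exact value from the arcsine distribution: (2 / pi) * arcsin(0.5) = 1/3.
frac_center_exact = 2.0 * np.arcsin(0.5) / np.pi

print(f"simulated: {frac_center:.4f}, exact: {frac_center_exact:.4f}")
print(f"density at center: {arcsine_pdf(0.0):.3f}, near edge: {arcsine_pdf(0.99):.3f}")
```

Only about one third of the scan time falls in the central half of the range, while the density grows without bound towards the turning points at *z* = ±1.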

The probability density has a distribution where most probability is piled up at the edges (top and bottom) of the scanning volume (Figure 1d). Photon arrivals obtained by scanning an immobilized fluorescent bead with a non-modulated, continuous wave (CW) laser spot confirm this distribution (Figure 4a). According to the theory described above, photons collected at a certain distance away from the center contain the highest information density and lead to the most efficient sampling. However, using CW laser modulation, many photons are still collected from the center of the scanning volume (where the particle spends most of its time). Real-time modulation of the laser intensity was applied to shift the photon arrival distribution away from this low information density area. To do so, the frequency and phase of the TAG lens were captured by a field-programmable gate array (FPGA, NI-7852R). A digital signal with the same frequency and adjustable phase delay was sent to a lock-in amplifier (SR850, Stanford Research Systems) and then to a multiplier circuit to double the original frequency. The frequency-doubled signal was then used to modulate the laser's power, creating custom illumination patterns along the Z-axis. A detailed illustration of how the signal was processed in the system is shown in Figure S9. Phase delays between 0° and 90° were tested. Figure 4 shows photon arrival distribution from immobilized 190 nm fluorescent beads sampled at each of the various conditions (CW, in-phase modulation, out-of-phase modulation). At 0° phase delay, photon arrivals occurred at the imaging volume center (in-phase modulation, Figure 4b). When the delay was 90° (out-of-phase modulation), photon arrivals were clustered at the edges, with a minimal number of photons at the center (Figure 4c). Complete trajectories of data shown in Figure 4d–f are shown in Figure S10.
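The effect of the phase delay on where the laser power is delivered can be illustrated with an idealized model. The sketch below assumes a unit-amplitude sinusoidal scan and a raised-cosine power profile at twice the scan frequency, with the phase delay applied before frequency doubling. The actual waveform produced by the lock-in amplifier and multiplier circuit is not specified in the text, so this profile is an assumption. The bead, the PSF, and detection noise are ignored: the numbers describe the excitation delivered along the scan rather than measured photon counts.

```python
import numpy as np

def center_fraction(power, z, half_width=0.5):
    """Fraction of the delivered power with |z| < half_width along the scan."""
    inside = np.abs(z) < half_width
    return power[inside].sum() / power.sum()

def modulated_power(theta, phase_delay_deg):
    """Assumed raised-cosine power at twice the scan frequency (hypothetical profile)."""
    delay = np.deg2rad(phase_delay_deg)
    return 0.5 * (1.0 + np.cos(2.0 * (theta - delay)))

# One period of the scan, sampled uniformly in phase; z is the normalized axial position.
theta = np.linspace(0.0, 2.0 * np.pi, 200_000, endpoint=False)
z = np.sin(theta)

cw = center_fraction(np.ones_like(theta), z)
in_phase = center_fraction(modulated_power(theta, 0.0), z)
out_of_phase = center_fraction(modulated_power(theta, 90.0), z)

print(f"CW: {cw:.3f}, in-phase: {in_phase:.3f}, out-of-phase: {out_of_phase:.3f}")
```

Under these assumptions a 0° delay concentrates the power near the center of the scan, while a 90° delay moves almost all of it to the edges, in qualitative agreement with Figure 4b,c.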

**Figure 4.** (**a**–**c**) Laser intensity versus laser position in Z-axis with photon arrival distribution of 10,000 photons from imaging a 190 nm fluorescent bead with TAG lens scanning in unmodulated (continuous wave) or modulated power (no XY scanning). Note that the scanning rate of the TAG lens was held constant at 70 kHz; (**d**–**f**) Estimated particle position versus stage position obtained from step tests. Each position estimation was based on 2000 photons. The average standard deviation of estimated positions at each stage position in (**d**) CW, (**e**) in-phase modulation, and (**f**) out-of-phase modulation from the partial trajectories shown in the figures was measured to be 17.6, 50.3, and 5.9 nm, respectively.

Upon performing step tests and data calibration similar to the XY scanning above, the average precisions obtained with CW, in-phase modulation, and out-of-phase modulation were 18.2 ± 2.3, 52.0 ± 24.2, and 9.2 ± 1.3 nm, respectively (Figure 4d–f). These results reaffirmed that off-center sampling (out-of-phase modulation) is the most information-efficient along the Z-axis, as precision was nearly doubled compared with CW power (9.2 vs. 18.2 nm). In-phase modulation, which only samples near the particle center, yielded very poor position estimates.

#### *3.3. Determination of Optimized Sampling Parameters in 3D*

Photon-efficient sampling patterns with the ability to localize in all three dimensions were determined by step tests similar to those described above. Step tests with the full 25 XY pixels and CW laser modulation along Z were first performed as a reference. These yielded precisions of 9.3 ± 0.8 nm in X and 22.2 ± 1.6 nm in Z (*n=5*). Step tests were then performed with the 4-Corners EOD pattern and CW laser, which gave average precisions of 6.8 ± 0.6 nm in X and 19.6 ± 1.7 nm in Z, confirming the advantage of information-efficient off-center sampling. A comparison of step tests in X obtained with the default 5 × 5 pattern and the 4-Corners EOD pattern with TAG lens operating in CW power is shown in Figure 5. These results again confirm the importance of not collecting photons from the center of the 3D volume. The magnitude of the EOD scale (size of each pixel) and amplitude of TAG lens scan that gave the highest precision in X and Z were found to be 200 nm and 30% TAG lens amplitude (FWHM = 1.93 μm), respectively (Figure S11). Apart from the amplitude of the TAG lens, the laser power modulation pattern also altered the precision. When imaging with the 4-Corners pattern and out-of-phase modulation, the Z precision was high (12.6 ± 0.4 nm), but the X precision was low (22.8 ± 3.1 nm). Inversely, the 4-Corners pattern and in-phase modulation resulted in high X precision (4.7 ± 0.2 nm) and low precision in Z (36.0 ± 5.0 nm). Photon arrival distribution along the Z-axis of 190 nm beads sampled with the 4-Corners pattern and modulated power at different phase delay is shown in Figure S12. It is noticeable that out-of-phase modulation in axial-only scanning resulted in a "bi-plane" distribution (Figure 4c), similar to previously reported work [27].

It should be noted that when XY scanning is enabled, photons from non-edge positions are not necessarily excluded (Figure S12h).

**Figure 5.** (**a**,**b**) Estimated position versus stage position sampled with the 4-Corners and default patterns. Both used CW laser power modulation along the axial direction. The standard deviation of estimated positions (based on 2000 photons) was (**a**) 7.4 and (**b**) 10.5 nm from the partial trajectories shown above; (**c**) X, Z, and 3D precision of five different immobilized 190 nm fluorescent beads sampled with the 4-Corners pattern in the XY-plane and modulated power at different phase delay along the Z-axis.

To obtain a single measure of the overall precision in all dimensions for each condition, we define an overall 3D precision

$$\text{3D precision} = \sqrt{\frac{P\_x^2 + P\_y^2 + P\_z^2}{3}} = \sqrt{\frac{2P\_x^2 + P\_z^2}{3}}$$

The 3D precision is derived from X and Z precision here since X and Y were sampled via the same mechanism. Step test trajectories obtained with the 4-Corners pattern and modulated laser power with a phase delay of 50.4° yielded the highest 3D precision (12.3 ± 0.9 nm). This result is comparable to the 3D precision found for the 4-Corners pattern and CW laser power (12.6 ± 1.1 nm). It is also noteworthy that it is possible to achieve equivalent X and Z precision when a phase delay of ~70° is applied, as is shown in Figure 5c, where the X and Z precisions intersect. The estimated precision at this point is *P* = *P<sub>x</sub>* = *P<sub>z</sub>* = 14.3 nm. This condition makes it possible to conduct isotropic sampling, despite having non-uniform point spread function scales in different dimensions (Figure S13c–d).
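As a worked check, the overall 3D precision can be evaluated from the X and Z precisions reported in this section. The sketch reads the definition as the root mean square of the per-axis precisions with *P<sub>y</sub>* = *P<sub>x</sub>*, a reading that reproduces the reported values. Because the reported numbers are averages over several beads, inserting mean X and Z values is only an approximation, and the function name is illustrative.

```python
import math

def precision_3d(p_x, p_z, p_y=None):
    """Root mean square of the per-axis precisions; p_y defaults to p_x."""
    if p_y is None:
        p_y = p_x
    return math.sqrt((p_x ** 2 + p_y ** 2 + p_z ** 2) / 3.0)

# X and Z precisions (nm) quoted in the text for CW laser power.
four_corners_cw = precision_3d(6.8, 19.6)   # 4-Corners EOD pattern
default_cw = precision_3d(9.3, 22.2)        # default 5 x 5 pattern

print(f"4-Corners + CW: {four_corners_cw:.1f} nm")
print(f"5 x 5 + CW:     {default_cw:.1f} nm")
```

The two results round to 12.6 nm and 14.9 nm, matching the 3D precisions quoted for these two conditions.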

#### **4. Discussion and Conclusions**

This study showed that implementing information-efficient laser scanning patterns led to dramatic improvements in precision in all three dimensions. A 4-pixel, 4-Corners pattern yielded the highest precision of 2.6 ± 0.3 nm in the XY-plane compared to 4.6 ± 0.8 nm given by a default 5 × 5 pattern (43.5% increase). When sampling the Z-axis only, out-of-phase modulation of the laser power relative to the TAG lens phase (which gave a bi-modal distribution of photon arrivals) gave the highest precision of 9.2 ± 2.6 nm, compared to 18.2 ± 4.6 nm given by CW power modulation (49.5% increase in precision). In 3D scanning, the 4-Corners pattern with laser power modulated with a 50.3° phase delay gave a 3D precision of 12.3 ± 0.9 nm, compared to 14.9 ± 1.1 nm given by the default 5 × 5 pattern with CW power modulation (17.5% increase). These results are consistent with the hypothesis that sampling at higher Fisher information regions leads to the best precision. Moreover, it is also shown that sampling directly at the center of the particle is inefficient, with those photons carrying little or no information. Achieving high photon efficiency is extremely important as typical fluorophores give out only a limited number of photons. The off-center, information-efficient imaging proposed in this work is the first to achieve higher photon efficiency by merely scanning a focused laser spot without requiring any PSF engineering.

It is worth noting that existing RT-3D-SPT methods may already have used similar off-center excitation patterns. For example, orbital tracking methods, developed by Gratton [28], Mabuchi [29,30], Lamb [19,31], and others, utilize a circular scanning pattern in the XY-plane, which minimizes sampling of the particle center. The motivation for such an approach was to sense changes in the particle position by modulating the particle's intensity. Others have utilized a method where the laser is scanned across the vertices of a tetrahedron [32,33]. All of these methods benefit from the off-center sampling that we demonstrate above. Unlike previous works [15,20] that utilized scan-free localization, using the EOD and TAG lens here is advantageous in defining a custom excitation pattern due to their highly tunable nature and the intrinsically information-efficient pattern caused by the TAG lens' sinusoidal motion.

Our work revealed the importance of information-efficient excitation patterns in particle localization and tracking using non-uniform illumination. The information-efficient patterns investigated in this work could shed light on an emerging field in biology: slowly moving particles. Recent studies have shown that even the previously considered "static" intracellular vesicles could still undergo small-scale motions of different rates, which could profoundly influence the fate of these particles [34]. Information-efficient sampling should optimize combined spatiotemporal precision in such demanding experiments, where the diffusive step sizes are extremely small and the fluorophores are short lived.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/e23050498/s1, Figures S1–S10, and detailed methods.

**Author Contributions:** Conceptualization, C.Z. and K.W.; methodology, C.Z. and K.W.; data collection; C.Z.; formal analysis, C.Z. and K.W.; writing—original draft preparation, C.Z. and K.W.; writing—review and editing, C.Z. and K.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors acknowledge financial support from the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM124868, the National Science Foundation under Grant No. 1847899, and Duke University.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data from this study are available from the corresponding author upon request.

**Conflicts of Interest:** The authors declare no conflict of interest.


## *Article* **Testing of Multifractional Brownian Motion**

#### **Michał Balcerek \*,† and Krzysztof Burnecki †**

Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wroclaw University of Science and Technology, Wyspianskiego 27, 50-370 Wroclaw, Poland; krzysztof.burnecki@pwr.edu.pl

**\*** Correspondence: michal.balcerek@pwr.edu.pl

† These authors contributed equally to this work.

Received: 18 November 2020; Accepted: 10 December 2020; Published: 12 December 2020

**Abstract:** Fractional Brownian motion (FBM) is a generalization of the classical Brownian motion. Most of its statistical properties are characterized by the self-similarity (Hurst) index 0 < *H* < 1. In nature one often observes changes in the dynamics of a system over time. For example, this is true in single-particle tracking experiments where a transient behavior is revealed. The stationarity of increments of FBM restricts substantially its applicability to model such phenomena. Several generalizations of FBM have been proposed in the literature. One of these is called multifractional Brownian motion (MFBM) where the Hurst index becomes a function of time. In this paper, we introduce a rigorous statistical test on MFBM based on its covariance function. We consider three examples of the functions of the Hurst parameter: linear, logistic, and periodic. We study the power of the test for alternatives being MFBMs with different linear, logistic, and periodic Hurst exponent functions by utilizing Monte Carlo simulations. We also analyze mean-squared displacement (MSD) for the three cases of MFBM by comparing the ensemble average MSD and ensemble average time average MSD, which is related to the notion of ergodicity breaking. We believe that the presented results will be helpful in the analysis of various anomalous diffusion phenomena.

**Keywords:** multifractional Brownian motion; autocovariance function; power of the statistical test; Monte Carlo simulations

#### **1. Introduction**

Over the last decades, massive advances in single-particle tracking (SPT), partially based on superresolution microscopy of fluorescently tagged tracers, or fluorescence correlation spectroscopy have allowed experimentalists to obtain insight into the motion of submicron tracer particles or even single molecules in complex environments, such as living biological cells, down to nanometer precision and at submillisecond time resolution [1,2].

The observed data obtained by SPT experiments often show pronounced deviations from Brownian motion, namely, anomalous diffusion of the power-law form

$$\mathbb{E}X(t)^2 \simeq K\_\alpha t^\alpha \tag{1}$$

of the mean-squared displacement (MSD) is observed [3–8]. *K<sub>α</sub>* is the anomalous diffusion coefficient. Depending on the magnitude of the anomalous diffusion exponent *α* we distinguish subdiffusion for 0 < *α* < 1 from superdiffusion with *α* > 1 [5,8]. Subdiffusion is typically observed for submicron particles in both bacterial and eukaryotic cells [6,9–13], in artificially crowded [14,15] and structured [16–19] liquids, in pure and protein-crowded lipid bilayer systems [12,20–25], as well as in groundwater systems [26]. Superdiffusion occurs in the presence of active motion, for instance, in living biological cells [27–29] or due to bulk-surface exchange [30,31]. While typically anomalous diffusion refers to the power-law behavior (1) with a fixed *α*, an increasing number of systems are reported in which the local scaling exponent of the MSD (1) is an explicit function of time, *α*(*t*). Such transient behavior has, for instance, been observed for green fluorescent proteins in cells or for the motion of lipid molecules in protein-crowded bilayer membranes [25,32].

Fractional Brownian motion (FBM) introduced by Kolmogorov in 1940 and rediscovered by Mandelbrot and van Ness in 1968 [33–35] is a generalization of the classical Brownian motion (BM). Most of its statistical properties are characterized by the self-similarity (Hurst) index 0 < *H* < 1. FBM is *H*-self-similar, namely for every *c* > 0 we have *B<sub>H</sub>*(*ct*) =<sup>*D*</sup> *c<sup>H</sup>B<sub>H</sub>*(*t*) in the sense of all finite dimensional distributions, and has stationary increments. It is the only Gaussian process satisfying these properties. FBM is the overdamped description for viscoelastic motion and thus intimately connected to the fractional Langevin processes, an attractive framework for many physical systems [36], for instance, of lipid molecules in bilayer membranes [22,25,37]. The second moment of FBM reads E*B<sub>H</sub>*<sup>2</sup>(*t*) = *σ*<sup>2</sup>*t*<sup>2*H*</sup>, where E*B<sub>H</sub>*<sup>2</sup>(1) = *σ*<sup>2</sup> > 0. As a consequence, for *H* < 1/2 we obtain subdiffusive dynamics with antipersistent motion, whereas for *H* > 1/2, the process is superdiffusive and persistent. Since FBM is the classical model for power-law dependence, a number of statistical tests have already been introduced for this process in the literature. Let us mention here the tests based on the autocovariance function (ACVF), MSD, and detrending moving average statistics [38–40].

FBM has stationary increments, which do not allow us to model processes whose regularity of paths and "memory depth" change in time [41]. Several generalizations of FBM have been proposed recently. One of these, called multifractional Brownian motion (MFBM), was proposed by Peltier and Véhel [42] with time-varying Hurst exponent *H*(*t*) which is a Hölder function. The variance at time *t* of the MFBM *B*<sub>*H*(*t*)</sub>(*t*) is given by Var(*B*<sub>*H*(*t*)</sub>(*t*)) = *σ*<sup>2</sup>*t*<sup>2*H*(*t*)</sup> [43]. The time-varying Hurst exponent *H*(*t*) characterizes the path regularity of the process at time *t*: sample paths near *t* with small *H*(*t*), close to 0, are space-filling and highly irregular, while paths with large *H*(*t*), close to 1, are very smooth. The variance constant *σ*<sup>2</sup> determines the "energy level" of the process. This natural extension of FBM results in the loss of some of FBM's basic properties; in particular, the increments of MFBM are non-stationary and the process is no longer self-similar.

Other, similar generalizations are limited to a piecewise constant *H* [44] but, what is important from a data analysis point of view, is that they lead to continuous Gaussian processes with stationary increments. Let us also mention an idea involving an appropriate class of covariance functions. Ryvkina [45] uses such covariance functions to define Gaussian processes to extend FBM and MFBM to a class of fractional Brownian motions with a variable Hurst parameter parameterized by a set of all measurable functions with values in (1/2, 1), and different from MFBMs. However, from a biological data point of view, such a range for *H* values is not practical since it only corresponds to a superdiffusive (long-range dependent) case.

MFBMs have become popular as flexible models in describing real-life signals of high-frequency features in geoscience, microeconomics, and turbulence, to name a few [43]. They are closely related to the notion of transient diffusion dynamics observed in biological experiments. The article is structured as follows. In Section 2, the MFBM is defined and its basic properties are presented. We also recall formulas for the ensemble average MSD, time average MSD, and present three Hurst exponent functions that will be analyzed in the sequel. In Section 3, the main results are presented. We introduce a statistical test on MFBM based on its ACVF which is presented as a quadratic form. Next, the power of the test is studied for the three cases corresponding to different Hurst exponent functions. We show the areas where the test is very strong in distinguishing between the processes and the cases when it fails in this respect. Finally, Section 4 summarizes and concludes our work.

#### **2. Model and Methods**

Let us start with a definition of the MFBM.

**Definition 1.** *(Multifractional Brownian motion). Process* (*B*<sub>*H*(*t*)</sub>(*t*))<sub>*t*≥0</sub> *is called a multifractional Brownian motion (MFBM) if it is a centered Gaussian process with covariance function*

$$\text{Cov}(B\_{H(t)}(t), B\_{H(s)}(s)) = D(H(t), H(s)) \left( t^{H(t) + H(s)} + s^{H(t) + H(s)} - |t - s|^{H(t) + H(s)} \right), \tag{2}$$
 
*where*

$$D(x, y) = \frac{\sigma^2 \sqrt{\Gamma(2x + 1)\Gamma(2y + 1)\sin(\pi x)\sin(\pi y)}}{2\Gamma(x + y + 1)\sin\left(\pi \frac{x + y}{2}\right)}\tag{3}$$

*for some σ* > 0 *and Hölder function H* : [0, ∞) → [*a*, *b*] ⊂ (0, 1) *of some exponent β* > 0 *[46].*
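Definition 1 translates directly into a covariance matrix on a discrete time grid, from which Gaussian sample paths can be drawn. The following is a minimal sketch rather than the simulation method used in the paper: it factorizes the covariance matrix by eigendecomposition (with negative round-off eigenvalues clipped to zero), which is simple but scales cubically with the number of time points. The function names, the time grid, and the example Hurst function are illustrative assumptions.

```python
import math
import numpy as np

# Element-wise Gamma function built from the standard library.
_gamma = np.vectorize(math.gamma, otypes=[float])

def D(x, y, sigma2=1.0):
    """Prefactor D(x, y) of Eq. (3)."""
    num = sigma2 * np.sqrt(
        _gamma(2 * x + 1) * _gamma(2 * y + 1) * np.sin(np.pi * x) * np.sin(np.pi * y)
    )
    den = 2.0 * _gamma(x + y + 1) * np.sin(np.pi * (x + y) / 2.0)
    return num / den

def mfbm_covariance(t, hurst, sigma2=1.0):
    """Covariance matrix of Eq. (2) on the time grid t for the Hurst function hurst(t)."""
    t = np.asarray(t, dtype=float)
    h = hurst(t)
    h_sum = h[:, None] + h[None, :]
    tt, ss = t[:, None], t[None, :]
    return D(h[:, None], h[None, :], sigma2) * (
        tt ** h_sum + ss ** h_sum - np.abs(tt - ss) ** h_sum
    )

def simulate_mfbm(t, hurst, n_paths=1, seed=0):
    """Draw centered Gaussian paths with the MFBM covariance (one path per row)."""
    cov = mfbm_covariance(t, hurst)
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = np.clip(eigval, 0.0, None)       # remove negative round-off
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((len(eigval), n_paths))
    return ((eigvec * np.sqrt(eigval)) @ noise).T

def hurst_linear(t):
    """Example Hurst function rising linearly from 0.3 to 0.6 on [0, 1000]."""
    return 0.3 / 1000.0 * t + 0.3

t = np.arange(1.0, 201.0)                     # illustrative time grid
cov = mfbm_covariance(t, hurst_linear)
paths = simulate_mfbm(t, hurst_linear, n_paths=3)
```

On the diagonal, *D*(*H*, *H*) = *σ*<sup>2</sup>/2, so the matrix reproduces the variance *σ*<sup>2</sup>*t*<sup>2*H*(*t*)</sup>, and for a constant Hurst function it reduces to the FBM covariance.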

The second moment of MFBM scales as E[*B*<sup>2</sup><sub>*H*(*t*)</sub>(*t*)] = *σ*<sup>2</sup>*t*<sup>2*H*(*t*)</sup>. Hence, we will call *H*(*t*) the Hurst exponent function. Furthermore, for *H*(*t*) ≡ *H* ∈ (0, 1) MFBM becomes standard FBM. In general, MFBM has non-stationary increments. Its increment process *Y*(*t*) =<sup>def</sup> *B*<sub>*H*(*t*+1)</sub>(*t* + 1) − *B*<sub>*H*(*t*)</sub>(*t*) possesses the long-range dependence property, in the sense that

$$\forall \delta > 0, \forall s \ge 0 \quad \sum\_{k=0}^{\infty} \left| \text{Corr} \left( Y(s), Y(s + k\delta) \right) \right| = +\infty,$$

where Corr is the correlation function, i.e., Corr(*Y*(*t*), *Y*(*s*)) = Cov(*Y*(*t*), *Y*(*s*)) / √(E*Y*<sup>2</sup>(*t*) E*Y*<sup>2</sup>(*s*)) [46].

Furthermore, the function *H*(*t*) can be pointwise interpreted as a local self-similarity parameter, i.e.,

$$\lim\_{\varepsilon \to 0^{+}} \left( \frac{B\_{H(u+\varepsilon t)}(u+\varepsilon t) - B\_{H(u)}(u)}{\varepsilon^{H(u)}} \right)\_{t \in \mathbb{R}\_{+}} = s(u) \left( B\_{H}(t) \right)\_{t \in \mathbb{R}\_{+}},$$

where *BH* is a fractional Brownian motion with index *H* ≡ *H*(*u*), *s*(*u*) is a scaling function [47] and the convergence is on the space of continuous functions endowed with the topology of the uniform convergence on compact sets.

#### *2.1. Mean-Squared Displacement*

Let us now recall different estimators of the MSD for a sample of *n* trajectories **X**<sub>1</sub>, **X**<sub>2</sub>, ..., **X**<sub>*n*</sub>, each with *N* observations, that is, **X**<sub>*i*</sub> consists of *X<sub>i</sub>*(*t*<sub>1</sub>), *X<sub>i</sub>*(*t*<sub>2</sub>), ..., *X<sub>i</sub>*(*t<sub>N</sub>*) equally spaced in time, *i* = 1, 2, ..., *n*. Ensemble average MSD (EAMSD) is defined as follows:

$$\text{EAMSD}(\tau = m\Delta t) = \frac{1}{n} \sum\_{k=1}^{n} \left( X\_k(t\_1 + \tau) - X\_k(t\_1) \right)^2,\tag{4}$$

where *m* = 1, ..., *N* − 1 and Δ*t* = *t*<sub>2</sub> − *t*<sub>1</sub>. Time average MSD (TAMSD) is defined for each trajectory as

$$\text{TAMSD}(\tau = m\Delta t, k) = \frac{1}{N - m} \sum\_{j=1}^{N-m} \left( X\_k(t\_j + m\Delta t) - X\_k(t\_j) \right)^2. \tag{5}$$

Finally, we consider ensemble and time average MSD (EATAMSD) which is an average of TAMSDs:

$$\text{EATAMSD}(\tau) = \frac{1}{n} \sum\_{k=1}^{n} \text{TAMSD}(\tau, k). \tag{6}$$
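The three estimators in Eqs. (4)–(6) can be written compactly for trajectories stored as rows of an array. This is a minimal sketch: the array layout (one trajectory per row, columns *t*<sub>1</sub>, ..., *t<sub>N</sub>*), the function names, and the Brownian test data are assumptions made for illustration.

```python
import numpy as np

def eamsd(X, m):
    """Ensemble average MSD at lag m (Eq. (4)); X has shape (n, N), 1 <= m <= N - 1."""
    X = np.asarray(X, dtype=float)
    return np.mean((X[:, m] - X[:, 0]) ** 2)

def tamsd(x, m):
    """Time average MSD of a single trajectory x at lag m (Eq. (5))."""
    x = np.asarray(x, dtype=float)
    return np.mean((x[m:] - x[:-m]) ** 2)   # average over the N - m available pairs

def eatamsd(X, m):
    """Ensemble average of the single-trajectory TAMSDs (Eq. (6))."""
    return np.mean([tamsd(x, m) for x in X])

# Illustrative data: ordinary Brownian motion with unit-variance steps,
# for which both estimators should be close to the lag m.
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((2000, 200)), axis=1)

lag = 10
print(f"EAMSD:   {eamsd(X, lag):.2f}")
print(f"EATAMSD: {eatamsd(X, lag):.2f}")
```

For a process with stationary increments, such as the Brownian motion used here, the two averages agree up to sampling error. A persistent gap between them is the signature discussed in the next paragraph.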

Physicists often observe systems where EAMSD and EATAMSD are different. Such behavior is called weak ergodicity breaking. In mathematics, the notion of ergodicity is restricted to stationary processes. Since the increments of MFBM lack stationarity, the results have to be interpreted with particular care.

#### *2.2. Three Cases of the Hurst Exponent Function*

Following [48], we consider three basic families of the function *H*(*t*), namely

$$\begin{aligned} \text{linear} \quad &H(t) = at + b, \quad t \in [0, T], \\ \text{logistic} \quad &H(t) = \frac{c - b}{1 + \exp\left\{-d\frac{t - t\_0}{T}\right\}} + b, \quad t \in [0, T], \\ \text{periodic} \quad &H(t) = a\sin\left(4\pi \frac{t}{T}\right) + b, \quad t \in [0, T] \end{aligned}$$

for some time horizon *T* > 0. Furthermore, in the sequel we consider only the case with parameter *σ*<sup>2</sup> = 1 in (2). Such functions are continuous and as a consequence satisfy the Hölder condition, so for MFBM to be properly defined we only require *H*(*t*) ∈ (0, 1) for all *t* ∈ [0, *T*] [42]. This choice of functions can be interpreted as follows. In the linear case, MFBM can switch steadily from short- to long-range dependence or vice versa, whereas in the logistic case such change is quite rapid and it happens between two levels. The latter case closely resembles instantaneous change in dependence or jump-type regime switching (such cases would lead to a non-Hölder function). An alternative to the logistic function which is also considered in the literature is the arctan function [49]. Finally, the periodic case represents a situation where such changes are gradual and repetitive.

In the paper, we focus on the following special cases with specified parameters:

$$\begin{aligned} \text{linear function}: \quad &H^{(1)}(t) = \frac{0.3}{1000}t + 0.3, \quad t \in [0, 1000], \\\text{logistic function}: \quad &H^{(2)}(t) = \frac{0.3}{1 + \exp\left\{-100\frac{t - 500}{1000}\right\}} + 0.3, \quad t \in [0, 1000], \\\text{periodic function}: \quad &H^{(3)}(t) = 0.15 \sin\left(4\pi \frac{t}{1000}\right) + 0.45, \quad t \in [0, 1000]. \end{aligned}$$

We choose those specific parameters so that all of the cases have a similar "average" behavior of the function *H*(*t*), i.e., its mean is close to 0.45, and the function itself has values in the interval [0.3, 0.6]. We illustrate those cases in Figures 1–3. On the top left panel of each of these figures, we can see three simulated trajectories. The function *H*(*t*) is presented on the top right panel, whereas on the bottom panel we can see a behavior of the corresponding MSDs. It is important to note that EAMSD (blue line) is directly related to the variance of the model at time *τ*, i.e., EAMSD(*τ*) = Var(*X*(*τ*)), thus, from (2), it should behave like *τ*<sup>2*H*(*τ*)</sup>.
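The stated properties of the three special cases (values within [0.3, 0.6] and a mean close to 0.45) are easy to confirm numerically. The sketch below evaluates the three functions with the parameters given above on an evenly spaced grid; the grid and the function names are illustrative choices.

```python
import numpy as np

T = 1000.0  # time horizon used for the special cases

def hurst_linear(t):
    return 0.3 / T * t + 0.3

def hurst_logistic(t):
    return 0.3 / (1.0 + np.exp(-100.0 * (t - 500.0) / T)) + 0.3

def hurst_periodic(t):
    return 0.15 * np.sin(4.0 * np.pi * t / T) + 0.45

t = np.linspace(0.0, T, 1001)
cases = {
    "linear": hurst_linear(t),
    "logistic": hurst_logistic(t),
    "periodic": hurst_periodic(t),
}

for name, h in cases.items():
    print(f"{name:9s} min={h.min():.3f} max={h.max():.3f} mean={h.mean():.3f}")
```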

For the linear case (see Figure 1), since *H*<sup>(1)</sup>(*t*) increases steadily from 0.3 to 0.6, the trajectories exhibit more variability at larger times. This is also reflected in the EAMSD dynamics. In addition, such a model exhibits weak ergodicity breaking (i.e., a lack of equality between EATAMSD and EAMSD), which can be inferred from the bottom panel.
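The comparison between EAMSD and EATAMSD can be sketched in a few lines. The snippet below assumes the standard definitions (EAMSD(*τ*) as the ensemble mean of *X*(*τ*)<sup>2</sup> for a process started at zero, and the time-averaged MSD as the average of squared increments over a single trajectory). It uses ordinary Brownian motion as a stand-in, because the MFBM simulation scheme is not reproduced in this section. For Brownian motion both quantities should be close to *τ*, so no ergodicity breaking is expected.

```python
import numpy as np


def eamsd(trajs):
    """Ensemble-averaged MSD; entry k corresponds to lag tau = k + 1.

    trajs has shape (n_trajectories, N), column j holding X(j + 1), X(0) = 0.
    """
    return np.mean(trajs ** 2, axis=0)


def tamsd(traj, lags):
    """Time-averaged MSD of a single trajectory for the given integer lags."""
    return np.array([np.mean((traj[lag:] - traj[:-lag]) ** 2) for lag in lags])


def eatamsd(trajs, lags):
    """Ensemble average of the time-averaged MSDs."""
    return np.mean([tamsd(x, lags) for x in trajs], axis=0)


rng = np.random.default_rng(0)
n, N = 2000, 200
# Stand-in process: Brownian motion sampled at t = 1, ..., N.
trajs = np.cumsum(rng.standard_normal((n, N)), axis=1)

lags = np.arange(1, 21)
ea = eamsd(trajs)[lags - 1]
eata = eatamsd(trajs, lags)
print(np.round(ea[:5], 2), np.round(eata[:5], 2))
```

Replacing the stand-in trajectories with simulated MFBM paths would give the kind of comparison shown in the bottom panels of Figures 1–3.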

Next, for the logistic case (see Figure 2), *H*<sup>(2)</sup>(*t*) increases quite rapidly from 0.3 to 0.6 near *t* = 500. As a consequence, we can see a switch in the behavior of the simulated trajectories: for times *t* < 500 they exhibit far less variability than for *t* > 500. Intuitively, for times *t* < 500 the trajectories locally exhibit short-range dependence, whereas for *t* > 500 they locally exhibit long-range dependence. Again, we can see weak ergodicity breaking in the bottom panel.

**Figure 1.** MFBM with the linear Hurst exponent function. Top left panel: three simulated trajectories. Top right panel: illustration of the function *H*(*t*) used in simulations. Bottom panel: comparison of EAMSD (solid blue line) with EATAMSD (dashed red line) and its 95% confidence interval (red shaded area). EAMSD and EATAMSD with confidence interval were calculated on the basis of 1000 simulated trajectories of MFBM.

**Figure 2.** MFBM with the logistic Hurst exponent function. Top left panel: three simulated trajectories. Top right panel: illustration of the function *H*(*t*) used in simulations. Bottom panel: comparison of EAMSD (solid blue line) with EATAMSD (dashed red line) and its 95% confidence interval (red shaded area). EAMSD and EATAMSD with confidence interval were calculated on the basis of 1000 simulated trajectories of MFBM.

Finally, for the periodic case (see Figure 3), *H*<sup>(3)</sup>(*t*) varies between 0.3 and 0.6. We can clearly see two different regimes of behavior: for times when *H*<sup>(3)</sup>(*t*) is bigger, trajectories are smoother and generally have larger values, in contrast to times when *H*<sup>(3)</sup>(*t*) is smaller. Despite the lack of stationarity, here the EAMSD almost always lies in the confidence region of the EATAMSD, which could suggest that there is no weak ergodicity breaking.

**Figure 3.** MFBM with the periodic Hurst exponent function. Top left panel: three simulated trajectories. Top right panel: illustration of the function *H*(*t*) used in simulations. Bottom panel: comparison of EAMSD (solid blue line) with EATAMSD (dashed red line) and its 95% confidence interval (red shaded area). EAMSD and EATAMSD with confidence interval were calculated on the basis of 1000 simulated trajectories of MFBM.

#### **3. Results**

In applications, it is crucial to be able to check whether a stochastic model describes empirical data well. Despite dedicated identification methods for the MFBM [50–53], to the best of the authors' knowledge, there is no rigorous statistical test designed for such a process. Here, we propose an approach using a simple test statistic which also contains useful information about the process itself.

#### *3.1. Test*

For testing purposes, we follow an approach based on the ACVF which was introduced by Balcerek and Burnecki [38]. The ACVF is a very popular statistic and it is also one of the simplest quadratic forms. For a random sample **X**<sub>*N*</sub> = {*X*(1), *X*(2), ... , *X*(*N*)} and *τ* ∈ {1, 2, ... , *N* − 1}, it is defined as follows:

$$\text{ACVF}\_N(\tau) = \frac{1}{N - \tau} \sum\_{i=1}^{N-\tau} X(i + \tau)X(i). \tag{7}$$

Here, we only consider a version of ACVF without subtracting the sample mean as it does not influence performance of tests based on this statistic for a centered process [38] and it makes the formulas much simpler.
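As an illustration of Equation (7), the sample ACVF without mean subtraction can be computed as below. This is a minimal sketch; the function name `acvf` and the zero-based array layout (`x[i - 1]` holds *X*(*i*)) are choices made here for illustration.

```python
import numpy as np


def acvf(x, tau):
    """Sample ACVF of Eq. (7), without subtracting the sample mean.

    x[i - 1] holds X(i) for i = 1, ..., N; tau must lie in {1, ..., N - 1}.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= tau <= n - 1:
        raise ValueError("tau must be in {1, ..., N - 1}")
    # Sum of X(i + tau) * X(i) for i = 1, ..., N - tau, divided by N - tau.
    return np.dot(x[tau:], x[:-tau]) / (n - tau)


x = np.array([1.0, 2.0, 3.0, 4.0])
print(acvf(x, 1))  # (2*1 + 3*2 + 4*3) / 3 = 20/3
print(acvf(x, 2))  # (3*1 + 4*2) / 2 = 5.5
```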

Let us now introduce a matrix **A**(*τ*) = {*a*(*τ*; *i*, *j*)}<sub>*i*,*j*=1</sub><sup>*N*</sup>, where

$$a(\tau; i, j) = \begin{cases} \frac{1}{N} \mathbb{I}(i = j) & \text{if } \tau = 0,\\ \frac{1}{2} \frac{1}{N - \tau} \mathbb{I}(|i - j| = \tau) & \text{if } \tau = 1, 2, \dots, N - 1,\\ 0 & \text{otherwise}, \end{cases} \tag{8}$$

and I is the indicator function. To summarize, the matrix **A**(*τ*) is either diagonal (for *τ* = 0), with elements 1/*N* on the diagonal, or Toeplitz, with only two nonzero subdiagonals (starting at the (1 + *τ*)th row and the (1 + *τ*)th column) with elements 1/(2(*N* − *τ*)). The statistic ACVF<sub>*N*</sub>(*τ*) can now be expressed as the quadratic form **X**<sub>*N*</sub><sup>T</sup>**A**(*τ*)**X**<sub>*N*</sub>, which (as shown in [38]) has a generalized *χ*<sup>2</sup> distribution, that is

$$\text{ACVF}\_N(\tau) = \sum\_{i=1}^N \lambda\_i(\tau) Z\_i^2, \tag{9}$$

where the *Z*<sub>*i*</sub> are i.i.d. standard normal variables (so *Z*<sub>*i*</sub><sup>2</sup> has a *χ*<sup>2</sup><sub>1</sub> distribution) and *λ*<sub>*i*</sub>(*τ*) are the eigenvalues of the matrix **Σ**<sub>*N*</sub>(*τ*) = **Σ**<sup>1/2</sup>**A**(*τ*)**Σ**<sup>1/2</sup>, with **Σ** being the (theoretical) autocovariance matrix of our trajectory **X**<sub>*N*</sub>. It is important to note that this result is true regardless of whether the considered model is stationary or not.
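The construction in Equations (8) and (9) can be sketched numerically. The code below builds **A**(*τ*), checks that the quadratic form reproduces the sample ACVF, and computes the eigenvalues of **Σ**<sup>1/2</sup>**A**(*τ*)**Σ**<sup>1/2</sup>. The covariance used here is that of standard Brownian motion, min(*i*, *j*), chosen only as a stand-in because the MFBM autocovariance (2) is not reproduced in this section. All function names are illustrative.

```python
import numpy as np


def acvf_matrix(N, tau):
    """Matrix A(tau) of Eq. (8)."""
    if tau == 0:
        return np.eye(N) / N
    A = np.zeros((N, N))
    idx = np.arange(N - tau)
    # Two symmetric off-diagonals at distance tau from the main diagonal.
    A[idx, idx + tau] = 0.5 / (N - tau)
    A[idx + tau, idx] = 0.5 / (N - tau)
    return A


def sym_sqrt(S):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def gen_chi2_weights(Sigma, tau):
    """Eigenvalues lambda_i(tau) of Sigma^(1/2) A(tau) Sigma^(1/2)."""
    R = sym_sqrt(Sigma)
    return np.linalg.eigvalsh(R @ acvf_matrix(Sigma.shape[0], tau) @ R)


N, tau = 50, 1
A = acvf_matrix(N, tau)

# The quadratic form x^T A x agrees with the direct sum in Eq. (7).
rng = np.random.default_rng(1)
x = rng.standard_normal(N)
quad = x @ A @ x
direct = np.dot(x[tau:], x[:-tau]) / (N - tau)

# Stand-in covariance (assumption): Brownian motion, Sigma[i, j] = min(i, j).
idx = np.arange(1, N + 1)
Sigma_bm = np.minimum.outer(idx, idx).astype(float)
lam = gen_chi2_weights(Sigma_bm, tau)
print(quad, direct, lam.sum())
```

The eigenvalues sum to the trace of **A**(*τ*)**Σ**, which is the expected value of the statistic; for this stand-in covariance it equals (*N* − *τ* + 1)/2.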

Let us now formulate a test for checking whether a random sample **X**<sub>*N*</sub> comes from the MFBM with function *H* : [0, *T*] → (0, 1), where *T* is the time horizon:

*H*<sub>0</sub> : the sample comes from the model with function *H*(*t*)

versus

*H*<sub>1</sub> : the sample comes from the model with a function different from *H*(*t*).

We will use ACVF<sub>*N*</sub> as a test statistic, with its distribution given by Equation (9), to calculate the critical regions of the test for a given significance level. Naturally, the eigenvalues *λ*<sub>*i*</sub>(*τ*) depend on the matrix **A**(*τ*) as well as on the matrix **Σ**. The elements of **Σ** are given by the ACVF (2) and are calculated using the function *H*(*t*) from the null hypothesis.
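Given the eigenvalues *λ*<sub>*i*</sub>(*τ*), critical values can be approximated by sampling the weighted sum in Equation (9). The sketch below is one possible way to do this and makes two assumptions that are not stated in the text: the rejection region is taken to be two-sided with equal tail probabilities, and the quantiles are estimated by Monte Carlo rather than by a dedicated numerical method for the generalized *χ*<sup>2</sup> distribution. The function names are hypothetical.

```python
import numpy as np


def critical_values(lam, alpha=0.05, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the alpha/2 and 1 - alpha/2 quantiles
    of sum_i lam_i * Z_i^2 with i.i.d. standard normal Z_i."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    draws = np.zeros(n_draws)
    for weight in lam:
        # Add one weighted chi-square(1) term at a time to keep memory small.
        draws += weight * rng.chisquare(1, size=n_draws)
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])


def reject_null(stat, lam, alpha=0.05):
    """Two-sided decision (assumption): reject if stat falls outside
    the equal-tailed (1 - alpha) region."""
    lo, hi = critical_values(lam, alpha)
    return bool(stat < lo or stat > hi)


# Sanity check with a single unit weight, i.e., a plain chi-square(1) law.
lam_demo = np.array([1.0])
lo, hi = critical_values(lam_demo)
print(lo, hi)
print(reject_null(6.0, lam_demo), reject_null(1.0, lam_demo))
```

Because **A**(*τ*) is not positive definite for *τ* ≥ 1, some weights can be negative; the sampling approach above handles that case without modification.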

#### *3.2. Three Power Case Studies*

The power of the test is the probability of rejecting the null hypothesis when the alternative is true. The power is an important characteristic of any statistical test. We consider the following null hypotheses, which correspond to the examples presented in Figures 1–3.

$$\begin{aligned} \text{linear function null hypothesis}: \quad &H\_0 : H(t) = H^{(1)}(t), \quad t \in [0, 1000], \\ \text{logistic function null hypothesis}: \quad &H\_0 : H(t) = H^{(2)}(t), \quad t \in [0, 1000], \\ \text{periodic case null hypothesis}: \quad &H\_0 : H(t) = H^{(3)}(t), \quad t \in [0, 1000]. \end{aligned}$$

In our studies, for all considered cases, we calculate the power of the test by using Monte Carlo simulations. We assume that the significance level is equal to 5%. In our Monte Carlo simulations we consider the time horizon *T* = 1000 and equally spaced time points *t* = 1, 2, ... , 1000. For each set of parameters from the alternative hypothesis, we simulate *n* = 1000 trajectories, calculate the test statistic (7), and check if the null hypothesis is rejected at the 5% significance level. In the test statistic, we consider only *τ* = 1 since other choices of *τ* lead to worse results. Finally, we estimate the power of this test for each considered case by calculating the fraction of rejected null hypotheses.
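The Monte Carlo procedure described above can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation. First, FBM with a constant Hurst exponent is used as a stand-in for MFBM, since its covariance is standard and the MFBM autocovariance (2) is not reproduced in this section. Second, the critical values are obtained by simulating the statistic under the null directly, which targets the same distribution as Equation (9). Third, a two-sided equal-tailed region is assumed. Finally, a short sample length is used to keep the example fast.

```python
import numpy as np


def fbm_cov(N, H):
    """Covariance of FBM at t = 1, ..., N (constant Hurst exponent H)."""
    t = np.arange(1, N + 1, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))


def simulate(Sigma, n, rng):
    """Draw n centered Gaussian trajectories with covariance Sigma."""
    L = np.linalg.cholesky(Sigma)
    return rng.standard_normal((n, Sigma.shape[0])) @ L.T


def acvf_rows(X, tau):
    """ACVF_N(tau) of Eq. (7) for every row of X."""
    N = X.shape[1]
    return np.sum(X[:, tau:] * X[:, :-tau], axis=1) / (N - tau)


def power(Sigma_null, Sigma_alt, tau=1, alpha=0.05, n=1000,
          n_null=20_000, seed=0):
    """Fraction of alternative trajectories rejected by the test."""
    rng = np.random.default_rng(seed)
    null_stats = acvf_rows(simulate(Sigma_null, n_null, rng), tau)
    lo, hi = np.quantile(null_stats, [alpha / 2, 1 - alpha / 2])
    alt_stats = acvf_rows(simulate(Sigma_alt, n, rng), tau)
    return float(np.mean((alt_stats < lo) | (alt_stats > hi)))


N = 100  # shorter than the paper's 1000, for speed
Sigma0 = fbm_cov(N, 0.45)
p_null = power(Sigma0, Sigma0, n=500)            # should be near alpha
p_alt = power(Sigma0, fbm_cov(N, 0.9), n=500)    # clearly different model
print(p_null, p_alt)
```

When the alternative coincides with the null, the estimated power should be close to the significance level, which mirrors the check reported for the null parameters in the text.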

We present the results in the form of power functions with arguments being the parameters of the function *H*(*t*) from the alternative hypothesis. For all of the cases, we considered the alternative coming from the same family of functions as the function *H*(*t*) in the null hypothesis, i.e., linear alternative for linear null, etc.

First, let us consider testing MFBM with the linear Hurst exponent function. The power function for this case is shown in Figure 4. The left panel presents the power function with respect to the parameters *a* and *b* from the alternative hypothesis *H*<sub>1</sub> : *H*(*t*) = *at* + *b*, and the right panel the corresponding heat map. We can see "layers" (regions on the heat map with the same color) for which our test has a very similar power. For example, the deep blue region on the right panel corresponds to processes indistinguishable from the null hypothesis process. We believe that the shape of the region is related to the construction of our test: the test statistic ACVF<sub>*N*</sub> takes into account all addends *X*(*t*)*X*(*t* + *τ*) with the same weight, so it is not that relevant whether *t* is big or not. In the case of MFBM, for which neither the process nor its increments are stationary, this might be an important factor. As a consequence, the test has difficulty distinguishing between MFBMs with increasing and decreasing Hurst exponent functions if their means are similar. However, this conjecture is not very precise: the constant case *H* ≡ 0.45 (the mean of the null function), which matches the alternative hypothesis with *a* = 0, *b* = 0.45, yields a power much higher than the significance level. We also note that for the parameters from the null hypothesis, *a* = 0.3/1000 and *b* = 0.3, the power of the test is approximately equal to 5%, which is the assumed significance level. On the heat map, white regions represent areas for which the MFBM is not well defined, i.e., *H*(*t*) ∉ (0, 1) for some *t* ∈ [0, 1000].
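The white regions of the heat map follow directly from the requirement *H*(*t*) ∈ (0, 1). A minimal sketch of such a validity mask for the linear family is given below; the parameter grid is hypothetical, since the axis ranges of Figure 4 are not stated in the text.

```python
import numpy as np


def is_well_defined(H_values):
    """True if H(t) lies strictly inside (0, 1) at every sampled time."""
    return bool(np.all((H_values > 0.0) & (H_values < 1.0)))


t = np.linspace(0.0, 1000.0, 1001)  # time points covering [0, 1000]

# Hypothetical parameter grid for the alternative H(t) = a * t + b.
a_grid = np.linspace(-1e-3, 1e-3, 41)
b_grid = np.linspace(0.05, 0.95, 19)

# valid[i, j] is False where the pair (a_grid[i], b_grid[j]) would fall
# into a white region of the heat map.
valid = np.array([[is_well_defined(a * t + b) for b in b_grid]
                  for a in a_grid])

print(is_well_defined(0.3 / 1000 * t + 0.3))  # null hypothesis parameters
print(is_well_defined(1e-3 * t + 0.5))        # reaches 1.5, not admissible
print(valid.mean())                           # fraction of admissible pairs
```

The same mask applies to the logistic and periodic families after replacing the expression for *H*(*t*).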

Let us now consider the second case, that is, MFBM with the logistic function *H*(*t*). The power function is shown in Figure 5. The left panel presents the power function with respect to the parameters *b* and *c* from the alternative hypothesis *H*<sub>1</sub> : *H*(*t*) = (*c* − *b*)/(1 + exp{−*d*(*t* − *t*<sub>0</sub>)/*T*}) + *b*, and the right panel the corresponding heat map. The parameter *b* is related to the local self-similarity parameter for *t* < 500, and *c* to that for *t* > 500. Similarly to the case with the linear function null hypothesis, here we can observe "layers" in which the parameters *b* and *c* are almost symmetric (e.g., the null hypothesis case where *b* = 0.3 and *c* = 0.6 is closely related to the case *b* = 0.6 and *c* = 0.3). Moreover, we can see that the power function seems to be quite high in the cases when a tested sample has the *b* parameter close to the value 0.3 from the null hypothesis, but *c* is far from 0.6, or vice versa.

**Figure 4.** Power of the introduced test for the linear Hurst exponent function *H*(*t*) = *at* + *b* with respect to parameters *a* and *b*. The null hypothesis is *a* = 0.3/1000 and *b* = 0.3. The right panel depicts the results in the form of a heat map with the red 'x' sign representing parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different *a*s and *b*s.

**Figure 5.** Power of the introduced test for the logistic Hurst exponent function *H*(*t*) = (*c* − *b*)/(1 + exp{−100(*t* − 500)/1000}) + *b* with respect to parameters *c* and *b*. The null hypothesis is *c* = 0.6 and *b* = 0.3. The right panel depicts the results in the form of a heat map with the red 'x' sign representing parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different *c*s and *b*s.

Lastly, let us consider the case of MFBM with the periodic function *H*(*t*). The power function is shown in Figure 6. The left panel presents the power function with respect to the parameters *a* and *b* from the alternative hypothesis *H*<sub>1</sub> : *H*(*t*) = *a* sin(4*πt*/*T*) + *b*, and the right panel the corresponding heat map. The parameter *b* is related to the "mean" behavior of the function *H*(*t*), whereas the parameter *a* corresponds to its amplitude. Again, similarly to the two previous null hypotheses, we can observe "layers" of similar power values. Those layers are symmetric with respect to *a* = 0. This means that for alternatives with opposite parameters *a* the test seems to return the same power. Let us note that this is not intuitive, since such opposite *a*s are related to completely different local behaviors of the self-similarity parameter. On the heat map, white regions represent areas for which the MFBM is not well defined, i.e., *H*(*t*) ∉ (0, 1) for some *t* ∈ [0, 1000].

**Figure 6.** Power of the introduced test for the periodic Hurst exponent function *H*(*t*) = *a* sin(4*πt*/1000) + *b* with respect to parameters *a* and *b*. The null hypothesis is *a* = 0.15 and *b* = 0.45. The right panel depicts the results in the form of a heat map with the red 'x' sign representing parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different *a*s and *b*s.

Finally, we would like to emphasize that the introduced test requires the MFBM parameters to be fixed (we test whether the data follow MFBM with fixed parameters). In practice, when analyzing empirical data, the parameters are often estimated. In the literature, methods for estimating the Hurst exponent function *H*(*t*) in the MFBM framework have already been introduced [50–52] and later combined to improve both the goodness of fit and the computational speed of the algorithm [53].

#### **4. Discussion and Conclusions**

For power-law anomalous diffusion of the form (1) with constant anomalous diffusion exponent *α* a number of models exist, including continuous-time random walks, fractional Langevin equation motion, FBM, or scaled Brownian motion [8]. These models all have different physical properties such as the PDF or their ergodic and aging properties [8].

In this paper, we concentrated on MFBM, which is a generalization of FBM for Hölder continuous functions *H*(*t*) that allows the Hurst exponent to vary in time. The time-varying Hurst exponent has an impact on both the statistical properties of the process and trajectory characteristics. MFBM helps to model phenomena whose regularity of paths and anomalous diffusion exponent change in time. The process no longer has stationary increments and it is not self-similar, but the variance scales in a natural way as *t*<sup>2*H*(*t*)</sup>.

Following the idea of testing FBM based on the ACVF statistic [38], in this paper, we introduced a rigorous statistical test on MFBM with the ACVF statistic presented as a quadratic form. We derived the distribution of the statistic, which is the generalized *χ*<sup>2</sup>. In order to study the efficiency of the test, we took into consideration three possible classes of the Hurst exponent function, namely linear, logistic, and periodic. For those cases, we conducted power studies with the help of Monte Carlo simulations. As alternatives, we considered MFBMs within the same class of the Hurst exponent function but with different parameters.

We found ranges of the parameters where the test is more sensitive to differences and ranges where it fails to distinguish between the models. It appears that for the linear Hurst exponent function the test is most sensitive to changes in the mean of the function. If the means are similar then the test often fails, even if the functions have completely different patterns, namely, one is increasing and the other decreasing. The latter observation may sound like a serious objection to using the test, but, in practice, an experimentalist knows whether the anomalous diffusion exponent increases or decreases in time. In the logistic case, the situation is different, namely the mean does not matter as much as in the linear case. Now, the test is most sensitive to deviations from the true values of the two levels (*c* and *b*), with the exception that swapping *c* and *b* does not change the power of the test (so, again, it does not matter if the function increases or decreases). For the periodic case, we again have a different situation. The test is sensitive to changes of the amplitude of the sine function and of the value of the free term, but it does not detect the sign of the amplitude parameter.

Finally, we note that we checked the behavior of the test for other sets of parameters of the null hypotheses and different sample lengths, and the conclusions were similar. We only found that the range of possible *H* values from the null hypothesis has an influence on the width of the acceptance regions (the wider the range the wider the acceptance region, which is reasonable). We present some of the additional tests' power simulation studies in Appendix A. Figure A1 presents different cases of the null hypothesis for the linear case, Figure A2 for the logistic case, and Figure A3 for the periodic case. Tests for the linear case were performed for length *N* = 1000, whereas logistic and periodic cases were studied for length *N* = 200.

In sum, we introduced a rigorous statistical test for MFBM based on the ACVF statistic presented as a quadratic form. We highlighted the weak and strong points of the test. Improving the efficiency of the test will be a subject of our future studies. We believe that the obtained results can help to understand the mechanisms underlying various anomalous diffusion phenomena.

**Author Contributions:** Conceptualization, M.B.; methodology, M.B. and K.B.; visualization, M.B.; writing—original draft preparation, M.B. and K.B.; writing—review and editing, M.B. and K.B.; investigation, K.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** The second author would like to acknowledge the Beethoven Grant no.: DFG-NCN 2016/23/G/ST1/04083.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**

In Figures A1–A3, we present power functions of tests related to different null hypotheses. In Figure A1, we consider *H*<sub>0</sub> : *H*(*t*) = −(0.3/1000)*t* + 0.7 (top panel), a case in which the function *H* begins in the superdiffusive regime and then decreases linearly to 0.4; *H*<sub>0</sub> : *H*(*t*) = −(0.4/1000)*t* + 0.5 (middle panel), a case in which the function *H* begins in the diffusive regime and then decreases linearly to 0.1; and *H*<sub>0</sub> : *H*(*t*) = (0.6/1000)*t* + 0.2 (bottom panel), a case in which the function *H* begins in the strong subdiffusive regime and then increases linearly to 0.8.

**Figure A1.** Power of the introduced test for the linear Hurst exponent function *H*(*t*) = *at* + *b* with respect to parameters *a* and *b*. The null hypotheses are: *a* = −0.3/1000 and *b* = 0.7 (top panel), *a* = −0.4/1000 and *b* = 0.5 (middle panel), *a* = 0.6/1000 and *b* = 0.2 (bottom panel). All of the panels depict the results in the form of a heat map with the red 'x' sign representing parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different *a*s and *b*s.

In Figure A2, we consider *H*<sub>0</sub> : *H*(*t*) = −0.4/(1 + exp{−100(*t* − 500)/1000}) + 0.5 (top panel), a case in which the function *H* begins in the diffusive regime and ends in the strong subdiffusive regime; *H*<sub>0</sub> : *H*(*t*) = −0.3/(1 + exp{−100(*t* − 500)/1000}) + 0.6 (middle panel), a case in which the function *H* begins in the superdiffusive regime and ends in the subdiffusive regime; and *H*<sub>0</sub> : *H*(*t*) = 0.6/(1 + exp{−100(*t* − 500)/1000}) + 0.2 (bottom panel), a case in which the function *H* begins in the strong subdiffusive regime and ends in the strong superdiffusive regime.

**Figure A2.** Power of the introduced test for the logistic Hurst exponent function *H*(*t*) = (*c* − *b*)/(1 + exp{−100(*t* − 500)/1000}) + *b* with respect to parameters *c* and *b*. The null hypotheses are: *c* = 0.1 and *b* = 0.5 (top panel), *c* = 0.3 and *b* = 0.6 (middle panel), *c* = 0.8 and *b* = 0.2 (bottom panel). All of the panels depict the results in the form of a heat map with the red 'x' sign representing parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different *c*s and *b*s.

In Figure A3, we consider *H*<sub>0</sub> : *H*(*t*) = 0.1 sin(4*πt*/1000) + 0.8 (top panel), a case in which the function *H* varies periodically between 0.7 and 0.9, i.e., in the strong superdiffusion regime; *H*<sub>0</sub> : *H*(*t*) = 0.2 sin(4*πt*/1000) + 0.7 (middle panel), a case in which the function *H* varies periodically between 0.5 and 0.9, i.e., in the superdiffusion regime; and *H*<sub>0</sub> : *H*(*t*) = 0.5 sin(4*πt*/1000) + 0.4 (bottom panel), a case in which the function *H* varies periodically between 0.1 and 0.9, i.e., in the whole spectrum of anomalous diffusion.

**Figure A3.** Power of the introduced test for the periodic Hurst exponent function *H*(*t*) = *a* sin(4*πt*/1000) + *b* with respect to parameters *a* and *b*. The null hypotheses are: *a* = 0.1 and *b* = 0.8 (top panel), *a* = 0.2 and *b* = 0.7 (middle panel), *a* = 0.5 and *b* = 0.4 (bottom panel). All of the panels depict the results in the form of a heat map with the red 'x' sign representing parameters in the null hypothesis. White regions represent areas for which MFBM is not well defined. The powers were calculated by means of Monte Carlo simulations on the basis of simulated data from the MFBM with different *a*s and *b*s.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Cusp of Non-Gaussian Density of Particles for a Diffusing Diffusivity Model**

**M. Hidalgo-Soria 1,\*,†, E. Barkai 1,† and S. Burov 2,\*,†**


† These authors contributed equally to this work.

**Abstract:** We study a two state "jumping diffusivity" model for a Brownian process alternating between two different diffusion constants, *D*<sub>+</sub> > *D*<sub>−</sub>, with random waiting times in both states whose distribution is rather general. In the limit of long measurement times, Gaussian behavior with an effective diffusion coefficient is recovered. We show that, for equilibrium initial conditions and when the limit of the diffusion coefficient *D*<sub>−</sub> → 0 is taken, the short time behavior leads to a cusp, namely a non-analytical behavior, in the distribution of the displacements *P*(*x*, *t*) for *x* → 0. Visually this cusp, or tent-like shape, resembles similar behavior found in many experiments of diffusing particles in disordered environments, such as glassy systems and intracellular media. This general result depends only on the existence of finite mean values of the waiting times at the different states of the model. Gaussian statistics in the long time limit is achieved due to ergodicity and convergence of the distribution of the temporal occupation fraction in state *D*<sub>+</sub> to a *δ*-function. The short time behavior of the same quantity converges to a uniform distribution, which leads to the non-analyticity in *P*(*x*, *t*). We demonstrate how the super-statistical framework is a zeroth order short time expansion of *P*(*x*, *t*), in the number of transitions, that does not yield the cusp-like shape. The latter, considered as the key feature of experiments in the field, is found with the first correction in perturbation theory.

**Keywords:** CTRW; diffusing-diffusivity; occupation time statistics

#### **1. Introduction**

The emergence of non-Gaussian features for the positional probability density function (PDF) of particle spreading, denoted *P*(*x*, *t*), in a disordered environment is a common attribute that arises in many different physical and biological systems. Specifically, a tent-like shape of the PDF, in the semi-log scale, together with a linear time dependence of the mean square displacement (MSD) appears for diffusion in glassy systems [1], biological cells [2–7], and colloidal suspensions [8–11]. This tent shape, sometimes fitted with a Laplace distribution *P*(*x*, *t*) ∼ exp(−*C*|*x*|) with *C* a constant, suggests that the decay of the PDF is exponential. This feature is becoming a more frequent observation for the spreading of molecules. Phenomenological approaches include diffusing diffusivity models, in which non-Gaussianity is obtained by coupled stochastic differential equations with random diffusion coefficients [7,12–20], and the path-integral formalism for Brownian motion in the presence of a sink [21]. More recently, theoretical frameworks describing this behavior emerged from continuous time random walk (CTRW) approaches employing large deviations theory [22–24] and microscopic models such as molecular dynamics of tracer particles in polymer networks [25,26] and interacting particles with fluctuating sizes [27–29], the so-called Hitchhiker model [28].

While in some of the systems the non-Gaussian behavior disappears when the measurement time is made long enough, the short time tent-like decay of the PDF seems to be a universal phenomenon [22]. It is then natural to ask if there is some sort of universality that can be deduced for the temporal limit of short times. Within the diffusing diffusivity models, exponentially decaying propagators in the large *x* limit have been observed by employing a dichotomous process for the diffusivity [13]. The latter model consists of a "fast" and a "slow" phase, with diffusion coefficients *D*<sub>+</sub> and *D*<sub>−</sub>, respectively [13,30]. Furthermore, the appearance of a cusp at small displacements has also been reported in different diffusive approaches, such as the Sinai model [31], the quenched trap model [32–37], models with a spatially dependent diffusivity [38], the Lévy–Lorentz gas model [39], and the fractional Fokker–Planck equation [40]. It is important to notice that the cusp found in [31–34,36,38–40] is within the context of anomalous diffusion in the MSD sense, and those presented in [35–37] are for normal diffusive systems.

**Citation:** Hidalgo-Soria, M.; Barkai, E.; Burov, S. Cusp of Non-Gaussian Density of Particles for a Diffusing Diffusivity Model. *Entropy* **2021**, *23*, 231. https://doi.org/10.3390/e23020231

Received: 20 December 2020 Accepted: 9 February 2021 Published: 17 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

<sup>1</sup> Department of Physics, Institute of Nanotechnology and Advanced Materials, Bar-Ilan University, Ramat-Gan 5290002, Israel; Eli.Barkai@biu.ac.il

It is worth mentioning that several systems in nature exhibit, or can be reduced to, a dichotomous process. Examples of two state systems include nuclear magnetic imaging to measure the diffusion of heterogeneous molecules [41], diffusion in glassy materials [1], blinking quantum dots [42,43], diffusion in single-molecule tracking experiments [4,7], and protein conformational dynamics [44]. Other approaches for analyzing two state systems were also devised over the years; see heterogeneous molecular transport [41], telegraphic noise [43], Lévy flights [45], and CTRW models [1,22].

In this work, we deal with a two state jumping diffusivity model with equilibrium initial conditions, i.e., we assume that the process started long before the measurement began. The long measurement time behavior of the positional PDF for this model is Gaussian and is independent of the specifics of the waiting times at the different diffusive states. A rather unexpected result is achieved for the opposite temporal regime. We find that, in the limit of short measurement times, the positional PDF of the molecule spreading in the two state jumping diffusivity model attains a cusp, or more generally a tent-like shape. Our result is based on the statistics of the temporal occupation fraction of the diffusivity states, defined as the time spent in state *D*<sub>+</sub> over the total measurement time. The Gaussian behavior in the long measurement time is dictated by the *δ*-function shape of the distribution of this temporal occupation fraction, a feature that is solely based on the ergodic properties of the system. We show that, in the limit of short measurement time, the distribution of the temporal occupation fraction attains a uniform distribution that leads to the mentioned cusp behavior of *P*(*x*, *t*). The uniformity of the occupation fraction is a general result in the sense that it does not depend on the statistics of the waiting times in the two states, and the latter can be arbitrary. The non-Gaussian behavior of *P*(*x*, *t*) for short measurement times is as general as the Gaussian behavior of the propagator for long times. We then show that our approach reproduces the results of a specific representative system with exponentially distributed waiting times.

Our manuscript is organized as follows: in Section 2.1, we introduce the jumping diffusivity model and the initial conditions utilized in this work. In Section 2.2, we develop our theory for the statistics of the occupation time in the short measurement time limit, for which the PDFs of the waiting times in states *D*<sub>±</sub> are rather general. The obtained behavior of the occupation fraction is used in order to describe the non-Gaussian features of *P*(*x*, *t*), i.e., its cusp shape, that is observed in this model. In Section 2.3, we corroborate our previous results for a system with exponentially distributed waiting times. In Section 3.1, we discuss briefly how these theoretical results differ from those found within the super-statistical approach [12,14,21] and further how our approach may be applicable in experiments. Finally, in Section 4, we present a summary of our results, and we discuss briefly the recent work of Postnikov et al. [37], who considered a model with quenched disorder, emphasizing the importance of equilibrium initial conditions. The main derivations are given in the corresponding Appendices.

#### **2. Results**

#### *2.1. The Model*

We consider a two state renewal model, with a stochastic diffusion field *D*(*t*) for a particle in a random medium. The position of the particle follows a diffusion process given by *dx*(*t*)/*dt* = √(2*D*(*t*)) *ξ*. Here *D*(*t*) ∈ {*D*<sub>+</sub>, *D*<sub>−</sub>} is a dichotomous process with *D*<sub>−</sub> < *D*<sub>+</sub>, and *ξ* is a standard white noise, i.e., with mean zero, variance one, and delta correlated. As an example of the dynamics of the model, at a given time, the particle follows a pure diffusion process with a diffusion coefficient *D*<sub>+</sub> > 0 during a period *τ*. After this time period has elapsed, the diffusion coefficient jumps and, during the next time interval, the particle diffuses with diffusion coefficient *D*<sub>−</sub>. The waiting times at each state *D*<sub>±</sub> are distributed according to a general PDF *ψ*<sub>±</sub>(*τ*), with mean waiting times *τ*<sub>±</sub>. The subscript ± denotes whether the waiting times are defined for the *D*<sub>+</sub> or *D*<sub>−</sub> states. In the following, we present the two-state model with *D*<sub>−</sub> = 0, while the case with *D*<sub>+</sub> > *D*<sub>−</sub> > 0 is analyzed in Appendix A. In Figure 1, we show representative trajectories for the position at time *t*, *x*(*t*), while, in Figure 2, we present the same for *D*(*t*) and we show the notation we use.

**Figure 1.** Typical trajectory of *x*(*t*) given by Equation (2) with *D*<sub>+</sub> = 10 (blue regions) and *D*<sub>−</sub> = 0 (red regions). For this trajectory, exponential waiting times with ⟨*τ*⟩<sub>+</sub> = 1 and ⟨*τ*⟩<sub>−</sub> = 5 were used.

We define *T*<sub>±</sub> as the occupation time in state "±", namely the total amount of time that the process diffuses with *D*<sub>+</sub> or *D*<sub>−</sub> during *t*. Jumps between states *D*<sub>+</sub> and *D*<sub>−</sub> occur at random times *t*<sub>1</sub>, *t*<sub>2</sub>, etc., until a final measurement time *t*, and clearly *t* = *T*<sub>+</sub> + *T*<sub>−</sub>. The time intervals between successive jumps are defined by *τ*<sub>1</sub> = *t*<sub>1</sub>, *τ*<sub>2</sub> = *t*<sub>2</sub> − *t*<sub>1</sub>, *τ*<sub>3</sub> = *t*<sub>3</sub> − *t*<sub>2</sub>, etc. (see Figure 2). Then, the occupation times in each state, when started from *D*<sub>+</sub>, are explicitly given by

$$\begin{array}{rcll} T_{+} &=& \tau_{1} + \tau_{3} + \dots + \tau_{N}, & \\ T_{-} &=& \tau_{2} + \tau_{4} + \dots + \tau_{N-1} + \tau^{*} & \text{if} \quad N = 2k + 1, \\ T_{+} &=& \tau_{1} + \tau_{3} + \dots + \tau_{N-1} + \tau^{*}, & \\ T_{-} &=& \tau_{2} + \tau_{4} + \dots + \tau_{N} & \text{if} \quad N = 2k, \end{array} \tag{1}$$

where *N* is the random number of transitions performed between the two states during the measurement time *t* and *k* is an integer. The measurement time *t* and *N* satisfy *t* ≥ *t*<sub>*N*</sub>, with *t*<sub>*N*</sub> = *τ*<sub>1</sub> + *τ*<sub>2</sub> + ... + *τ*<sub>*N*</sub>, i.e., the exact time when the *N*th jump was performed. The backward recurrence time *τ*<sup>∗</sup> is defined by *τ*<sup>∗</sup> = *t* − *t*<sub>*N*</sub> [46]. Each waiting time *τ*<sub>*i*</sub> follows *τ*<sub>*i*</sub> = *t*<sub>*i*</sub> − *t*<sub>*i*−1</sub> with *i* ∈ (1, *N*). For this particular initial condition, odd values of *i* in *τ*<sub>*i*</sub> refer to waiting times at *D*<sub>+</sub> and even values of *i* to waiting times during which the diffusion coefficient is *D*<sub>−</sub> (see Figure 2). Expressions similar to Equation (1) are also obtained when the process starts from *D*<sub>−</sub>; see Equation (A65).
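The bookkeeping of Equation (1) can be illustrated with a short sketch. The function name `occupation_times` and its arguments are hypothetical and not part of the original work; the sketch only assumes that the successive sojourn times *τ*<sub>1</sub>, *τ*<sub>2</sub>, ... are available as a list, and it truncates the sojourn that straddles the measurement time, which plays the role of the backward recurrence time *τ*<sup>∗</sup>.

```python
from typing import Sequence, Tuple


def occupation_times(waits: Sequence[float], t: float,
                     start_plus: bool = True) -> Tuple[float, float]:
    """Return (T_plus, T_minus) for a measurement time t, following Equation (1).

    waits[0], waits[1], ... are the successive sojourn times tau_1, tau_2, ...
    The states alternate, starting from '+' when start_plus is True.
    The sojourn that contains t is cut at t; the cut piece is tau* = t - t_N.
    """
    t_plus = 0.0
    t_minus = 0.0
    elapsed = 0.0
    in_plus = start_plus
    for tau in waits:
        if elapsed >= t:
            break
        dwell = min(tau, t - elapsed)  # truncate the last sojourn at t
        if in_plus:
            t_plus += dwell
        else:
            t_minus += dwell
        elapsed += dwell
        in_plus = not in_plus  # every completed sojourn ends with a jump
    if elapsed < t:
        raise ValueError("waiting times do not cover the measurement time")
    return t_plus, t_minus


# N = 2 completed transitions (at t_1 = 1 and t_2 = 3), tau* = 1.5
print(occupation_times([1.0, 2.0, 3.0], 4.5))  # (2.5, 2.0)
```

In this example the process starts from *D*<sub>+</sub>, so *T*<sub>+</sub> = *τ*<sub>1</sub> + *τ*<sup>∗</sup> and *T*<sub>−</sub> = *τ*<sub>2</sub>, as in the *N* = 2*k* branch of Equation (1).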

Since the particle diffuses with a constant diffusion coefficient *D*<sub>+</sub> for a time *τ*<sub>1</sub> when starting from *D*<sub>+</sub>, the position *x*(*τ*<sub>1</sub>) is simply *x*(*τ*<sub>1</sub>) = √(2*D*<sub>+</sub>*τ*<sub>1</sub>) *ξ*<sub>1</sub>, where *ξ*<sub>1</sub> is a zero-mean Gaussian random variable with ⟨*ξ*<sub>1</sub><sup>2</sup>⟩ = 1. In the state with diffusion constant *D*<sub>−</sub> = 0, the particle does not move; therefore, *x*(*t*<sub>2</sub>) − *x*(*t*<sub>1</sub>) = 0 and *x*(*t*<sub>3</sub>) − *x*(*t*<sub>2</sub>) = √(2*D*<sub>+</sub>*τ*<sub>3</sub>) *ξ*<sub>3</sub>, where *ξ*<sub>3</sub> is a zero-mean Gaussian random variable with ⟨*ξ*<sub>3</sub><sup>2</sup>⟩ = 1, independent of *ξ*<sub>1</sub>. Generally, *x*(*t*<sub>*i*</sub>) − *x*(*t*<sub>*i*−1</sub>) = √(2*D*<sub>±</sub>*τ*<sub>*i*</sub>) *ξ*<sub>*i*</sub>, where all *ξ*<sub>*i*</sub> are independent zero-mean Gaussian random variables that satisfy ⟨*ξ*<sub>*i*</sub><sup>2</sup>⟩ = 1. By using Equation (1) and exploiting the properties of sums of independent Gaussian variables, we obtain that the position at a general time *t* is given by

$$x(t) = \sqrt{2D_{+}T_{+}}\,\xi, \tag{2}$$

when *D*<sub>+</sub> > 0 and *D*<sub>−</sub> = 0. Equation (2) holds irrespective of the state at *t* = 0. We see that the particle's position is a product of two independent random variables: a factor proportional to the square root of the time spent in the state *D*<sub>+</sub>, and a standard Gaussian random variable.
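Equation (2) suggests a direct way to sample positions without integrating the Langevin equation. The following is a minimal sketch under stated assumptions: the waiting times are taken to be exponential (so that drawing the initial state with probability ⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] already corresponds to an equilibrium start, because the exponential law is memoryless), and the function names are illustrative only.

```python
import math
import random


def sample_occupation_time(t, tau_plus, tau_minus, rng):
    """Draw T_+ for exponential sojourn times with means tau_plus and tau_minus.

    Assumption: exponential waiting times, so an equilibrium start only
    requires choosing the initial state with the stationary probability.
    """
    in_plus = rng.random() < tau_plus / (tau_plus + tau_minus)
    remaining = t
    t_plus = 0.0
    while remaining > 0.0:
        mean = tau_plus if in_plus else tau_minus
        dwell = min(rng.expovariate(1.0 / mean), remaining)
        if in_plus:
            t_plus += dwell
        remaining -= dwell
        in_plus = not in_plus
    return t_plus


def sample_position(t, d_plus, tau_plus, tau_minus, rng):
    """Draw x(t) = sqrt(2 D_+ T_+) * xi, Equation (2), for D_- = 0."""
    t_plus = sample_occupation_time(t, tau_plus, tau_minus, rng)
    return math.sqrt(2.0 * d_plus * t_plus) * rng.gauss(0.0, 1.0)
```

With the parameters of Figure 1 (*D*<sub>+</sub> = 10, ⟨*τ*⟩<sub>+</sub> = 1, ⟨*τ*⟩<sub>−</sub> = 5), the sample mean of *x*<sup>2</sup> should approach 2*D*<sub>+</sub>*t*⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>], since the mean occupation time in equilibrium is *t*⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>].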

**Figure 2.** Alternating process for the diffusivity, starting from the state '+' and *N* = 2*k* + 1. For the case of equilibrium initial conditions exposed in Section 2.2, for *N* = 1, *τ*<sup>1</sup> works as the forward recurrence time with PDF Equation (10).

In the following, we consider a situation in which the process started long before the measurement began, i.e., at *t* = 0, the process had already been running for a very long time. In this way, the measurement begins from an initial condition in which the system is in equilibrium, meaning that the probability to start from *D*<sub>+</sub> is ⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] and accordingly the probability to start from *D*<sub>−</sub> is ⟨*τ*⟩<sub>−</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] (see [46,47]). For this set-up, the PDF of the occupation time *T*<sub>+</sub>, *f*<sub>*t*</sub>(*T*<sub>+</sub>), is determined by the contribution from starting at *D*<sub>+</sub> and the contribution from starting at *D*<sub>−</sub>, yielding

$$f\_t(T\_+) = \frac{\langle \tau \rangle\_+}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} f\_t^+(T\_+) + \frac{\langle \tau \rangle\_-}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} f\_t^-(T\_+),\tag{3}$$

where *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>) is the PDF of *T*<sub>+</sub> for measurement time *t*, given that the process has started from ±. Since *D*<sub>−</sub> = 0, Equation (2) dictates that the positional PDF, provided that the system has occupied the state with *D*<sub>+</sub> for a time *T*<sub>+</sub>, is given by

$$P(\mathbf{x}|T\_+) = \frac{e^{-\frac{x^2}{4D\_+T\_+}}}{\sqrt{4\pi D\_+T\_+}}.\tag{4}$$

The propagator of the system is obtained via integrating over all possible values of the occupation time *T*+, whose PDF is *ft*(*T*+) Equation (3), yielding

$$P(x, t) = \int_0^t P(x|T_+)\, f_t(T_+)\, dT_+. \tag{5}$$

Likewise, we can work with the temporal occupation fraction, which is defined by *p*+ = *T*+/*t* with 0 ≤ *p*<sup>+</sup> ≤ 1. In this case, the positional PDF for a specific value of *p*<sup>+</sup> follows

$$P(\mathbf{x}|p\_{+}) = \frac{e^{-\frac{\mathbf{x}^{2}}{4D\_{+}tp\_{+}}}}{\sqrt{4\pi D\_{+}tp\_{+}}}.\tag{6}$$

and the propagator is obtained similarly to Equation (5), but using the PDF of *p*+, which we denote by *gt*(*p*+),

$$P(x, t) = \int_0^1 P(x|p_+)\, g_t(p_+)\, dp_+. \tag{7}$$

Since the properties of *P*(*x*|*T*+) or *P*(*x*|*p*+) are known, the task of computing the propagator completely depends on our ability to calculate the PDF of *T*+ or *p*+. In the following section, we address this problem.

#### *2.2. The General Case: Arbitrary Distribution of Waiting Times*

Two regimes of the process are of special interest: the long and the short limits of the measurement time *t*. The two limits involve different considerations when computing the PDFs of the occupation time (*T*<sub>+</sub>) and of the occupation fraction (*p*<sub>+</sub>). We first handle the regime of small *t* and then treat the *t* → ∞ limit.

#### 2.2.1. Short Time Regime

The PDF of the occupation time *T*<sub>+</sub> is defined by Equation (3). We condition on the number of transitions *N*, and each term *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>) is given by

$$f\_t^{\pm}(T\_+) = \sum\_{N=0}^{\infty} f\_t^{\pm}(T\_+|N) Q\_t^{\pm}(N),\tag{8}$$

where *Q*<sub>*t*</sub><sup>±</sup>(*N*) is the probability to perform exactly *N* transitions during *t* when the process started at ±, and *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>|*N*) is the PDF of *T*<sub>+</sub> when exactly *N* transitions were performed (during *t*) and the process has started from ±. This conditional probability is obtained by counting the number of trajectories of temporal span *t* that started from the ± state and performed exactly *N* transitions, out of the total number of trajectories that started from the ± state and for which the diffusion spent a total time *T*<sub>+</sub> at this state. Utilizing Equation (8), we rewrite Equation (3) as

$$f_t(T_+) = \frac{\langle \tau \rangle_+}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \sum_{N=0}^{\infty} f_t^+(T_+|N)\, Q_t^+(N) + \frac{\langle \tau \rangle_-}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \sum_{N=0}^{\infty} f_t^-(T_+|N)\, Q_t^-(N). \tag{9}$$

Since we consider a renewal process, the expression for *Q*<sub>*t*</sub><sup>±</sup>(*N*) is known in Laplace space [42], as *Q̂*<sub>*s*</sub><sup>±</sup>(*N*) = L{*Q*<sub>*t*</sub><sup>±</sup>(*N*)} = ∫<sub>0</sub><sup>∞</sup> *Q*<sub>*t*</sub><sup>±</sup>(*N*) exp(−*ts*) *dt*, for any general *ψ̂*<sub>±</sub>(*s*) = L{*ψ*<sub>±</sub>(*τ*)}. Concretely, *Q*<sub>*t*</sub><sup>±</sup>(*N*) is obtained by taking into account all the possibilities to perform *N* jumps up to time *t*<sub>*N*</sub> < *t*, and no additional jumps during the backward recurrence time *τ*<sup>∗</sup>. This amounts to a convolution of *N* + 1 random variables. It is important to notice that, since we assume equilibrium initial conditions, *τ*<sub>1</sub>, which is measured from *t* = 0, is only a part of a full renewal event and is termed the forward recurrence time. The PDF of *τ*<sub>1</sub> for the ± state, *f*<sub>*eq*</sub><sup>±</sup>(*τ*<sub>1</sub>), is provided by (see [46])

$$f_{eq}^{\pm}(\tau_1) = \left(1 - \int_0^{\tau_1} \psi_{\pm}(\tau) \, d\tau\right) \Big/ \langle \tau \rangle_{\pm} \tag{10}$$

and in Laplace space L{*f*<sub>*eq*</sub><sup>±</sup>(*τ*<sub>1</sub>)} = (1 − *ψ̂*<sub>±</sub>(*s*))/(⟨*τ*⟩<sub>±</sub>*s*). This initial condition stems from the equilibrium of the underlying process, in which we do not have a jump at the initial time (*t*<sub>0</sub> = 0 in Figure 2). In the literature [13,42,46,48,49], the case where the renewal process starts at *t* = 0 is called ordinary or non-equilibrium. As we will see below, within our approach this case does not yield any universal features for *P*(*x*, *t*); hence, the assumption of an equilibrium process is important in our methodology (see the discussion of non-equilibrium initial conditions in Appendix B).
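As an illustration of Equation (10), the sketch below evaluates the forward recurrence PDF for an arbitrary waiting-time CDF and samples it by inversion for one concrete case. The choice of uniform waiting times on (0, *b*) is an assumption made here for concreteness, and all function names are hypothetical. For this case, Equation (10) gives *f*<sub>*eq*</sub>(*τ*<sub>1</sub>) = (1 − *τ*<sub>1</sub>/*b*)/(*b*/2), whose CDF 1 − (1 − *τ*<sub>1</sub>/*b*)<sup>2</sup> can be inverted in closed form.

```python
import math


def forward_recurrence_pdf(cdf, mean, tau1):
    """Equation (10): f_eq(tau_1) = (1 - Psi(tau_1)) / <tau>.

    cdf is the cumulative distribution Psi of a full waiting time and
    mean is its first moment <tau>.
    """
    return (1.0 - cdf(tau1)) / mean


def uniform_cdf(b):
    """CDF of a waiting time distributed uniformly on (0, b)."""
    def cdf(tau):
        return min(max(tau / b, 0.0), 1.0)
    return cdf


def sample_forward_uniform(b, u):
    """Inverse-CDF sample of the forward recurrence time for U(0, b).

    u is a uniform random number in [0, 1). The equilibrium CDF is
    F_eq(tau) = 1 - (1 - tau / b)**2 on 0 <= tau <= b.
    """
    return b * (1.0 - math.sqrt(1.0 - u))
```

For exponential waiting times the same formula returns the exponential PDF itself, so in that special case the equilibrium and ordinary processes have the same first waiting time distribution.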

The probability of not performing any jumps during *τ*<sup>∗</sup> is equivalent to the probability of obtaining a waiting time *τ*<sub>*N*+1</sub> > *τ*<sup>∗</sup>, i.e., 1 − ∫<sub>0</sub><sup>*τ*∗</sup> *ψ*<sub>±</sub>(*τ*) *dτ*. Eventually, by implementing the initial equilibrium condition, we obtain

$$\begin{aligned} \hat{Q}_{s}^{\pm}(0) &= \frac{1 - \frac{1 - \hat{\psi}_{\pm}(s)}{\langle \tau \rangle_{\pm} s}}{s}, \\ \hat{Q}_{s}^{\pm}(1) &= \left( \frac{1 - \hat{\psi}_{\pm}(s)}{\langle \tau \rangle_{\pm} s} \right) \left( \frac{1 - \hat{\psi}_{\mp}(s)}{s} \right), \\ \hat{Q}_{s}^{\pm}(2) &= \left( \frac{1 - \hat{\psi}_{\pm}(s)}{\langle \tau \rangle_{\pm} s} \right) \hat{\psi}_{\mp}(s) \left( \frac{1 - \hat{\psi}_{\pm}(s)}{s} \right), \\ \hat{Q}_{s}^{\pm}(3) &= \left( \frac{1 - \hat{\psi}_{\pm}(s)}{\langle \tau \rangle_{\pm} s} \right) \hat{\psi}_{\mp}(s)\, \hat{\psi}_{\pm}(s) \left( \frac{1 - \hat{\psi}_{\mp}(s)}{s} \right). \end{aligned} \tag{11}$$

On the right-hand side of all the equations above, we have a product of functions in Laplace space, which implies convolutions as we transform from *s* to *t*. The first factor on the right-hand side of Equation (11) stems from the equilibrium initial condition under study. We assume that the PDF of the waiting times is analytic for *τ* → 0; thus, we can express *ψ*<sub>±</sub>(*τ*) as [22,23]

$$\psi_{\pm}(\tau) \sim C_{A_{\pm}}^{\pm} \tau^{A_{\pm}} + C_{A_{\pm} + 1}^{\pm} \tau^{A_{\pm} + 1} + \dots, \tag{12}$$

with *A*<sub>±</sub> ≥ 0 an integer. As an example, consider the case with exponential waiting times, i.e., *ψ*<sub>±</sub>(*τ*) = *ψ*(*τ*) = exp(−*τ*/⟨*τ*⟩)/⟨*τ*⟩, namely the waiting times at the *D*<sub>±</sub> states are identically distributed. Its expansion is *ψ*(*τ*) ∼ 1/⟨*τ*⟩ − *τ*/⟨*τ*⟩<sup>2</sup>, with *A*<sub>±</sub> = 0, *C*<sup>±</sup><sub>*A*±</sub> = 1/⟨*τ*⟩ and *C*<sup>±</sup><sub>*A*±+1</sub> = −1/⟨*τ*⟩<sup>2</sup>. The analyticity of *ψ*<sub>±</sub>(*τ*), Equation (12), is a very mild demand that covers a wide range of sojourn time distributions. Since we are interested in the small *t* limit, the corresponding behavior in Laplace space is found for *s* → ∞, where the leading terms of *ψ̂*<sub>±</sub>(*s*) are [22]

$$\hat{\psi}_{\pm}(s) \sim \frac{\Gamma(A_{\pm} + 1)\, C_{A_{\pm}}^{\pm}}{s^{A_{\pm} + 1}} + \frac{\Gamma(A_{\pm} + 2)\, C_{A_{\pm} + 1}^{\pm}}{s^{A_{\pm} + 2}} + \dots. \tag{13}$$

For the mentioned example with exponential waiting times, *ψ̂*(*s*) ∼ 1/[⟨*τ*⟩*s*]. Using Equation (13) for *Q̂*<sub>*s*</sub><sup>±</sup>(*N*), we obtain, in the *s* → ∞ limit that corresponds to the short time limit at the focus of our interest,

$$\begin{array}{rcl} \hat{Q}_{s}^{\pm}(0) & \sim & \dfrac{1}{s} - \dfrac{1}{\langle \tau \rangle_{\pm} s^{2}} + \dfrac{\Gamma(A_{\pm} + 1)\, C_{A_{\pm}}^{\pm}}{\langle \tau \rangle_{\pm} s^{A_{\pm} + 3}} + \dots \\ \hat{Q}_{s}^{\pm}(1) & \sim & \dfrac{1}{\langle \tau \rangle_{\pm} s^{2}} - \dfrac{2\, C_{A_{\pm}}^{\pm}\, \Gamma(A_{\pm} + 1)}{\langle \tau \rangle_{\pm} s^{A_{\pm} + 3}} + \dots \\ \hat{Q}_{s}^{\pm}(2) & \sim & \dfrac{\Gamma(A_{\mp} + 1)\, C_{A_{\mp}}^{\mp}}{\langle \tau \rangle_{\pm} s^{A_{\mp} + 3}} + \dots \\ \hat{Q}_{s}^{\pm}(3) & \sim & \dfrac{\Gamma(A_{\pm} + 1)\, \Gamma(A_{\mp} + 1)\, C_{A_{\pm}}^{\pm}\, C_{A_{\mp}}^{\mp}}{\langle \tau \rangle_{\pm} s^{A_{\pm} + A_{\mp} + 4}} + \dots \end{array} \tag{14}$$

We see that the leading terms for all *Q̂*<sub>*s*</sub><sup>±</sup>(*N*) with *N* > 1 are of the order 1/*s*<sup>*γ*</sup> with *γ* > 2. Thus, in the small *t* limit, terms with *N* > 1 contain contributions that scale like *t*<sup>*γ*−1</sup> and are negligible with respect to the *N* ∈ {0, 1} cases. Therefore, only the first two *Q*<sub>*t*</sub><sup>±</sup>(*N*) are taken into account, i.e.,

$$Q\_t^{\pm}(0) \quad \sim \quad 1 - \frac{t}{\langle \tau \rangle\_{\pm}},\tag{15}$$

$$Q\_t^{\pm}(1) \quad \sim \quad \frac{t}{\langle \tau \rangle\_{\pm}}.\tag{16}$$

This is an expected result since, for short times, only contributions from zero transitions and from a single transition are important. Having calculated *Q*<sub>*t*</sub><sup>±</sup>(*N*), we have advanced towards obtaining the behavior of the PDF of *T*<sub>+</sub> according to Equation (9). To complete the derivation, one needs to compute the relevant contributions of *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>|*N*) in the *t* → 0 limit. First, we see that the conditional distribution *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>|0) depends only on the starting state. There are only two types of trajectories that have performed 0 transitions, i.e., those that have spent all the time either at *D*<sub>+</sub> or at *D*<sub>−</sub>. Consequently,

$$f_t^+(T_+|0) = \delta(t - T_+), \tag{17}$$

$$f_t^-(T_+|0) = \delta(T_+). \tag{18}$$

The calculation of *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>|*N*) is carried out by conditioning on the first event. If starting from the + state, the process will spend a time *τ*<sub>1</sub> at this state before jumping to the − state. *τ*<sub>1</sub> can attain any value 0 ≤ *τ*<sub>1</sub> ≤ *T*<sub>+</sub>, and for the remaining time *t* − *τ*<sub>1</sub> the process has to perform one transition less. In general, without regard to the initial conditions of the problem, an integration over all possible values of *τ*<sub>1</sub> provides the relation

$$f\_t^+\left(T\_+|N'+1\right) = \int\_0^{T\_+} \frac{1}{B\_+} \psi\_+\left(\tau\_1\right) f\_{t-\tau\_1}^-\left(T\_+ - \tau\_1|N'\right) d\tau\_1 \tag{19}$$

with *N*′ + 1 = *N*, and *B*<sub>+</sub> a normalization factor. For instance, for *N* = 1, we have that ∫<sub>0</sub><sup>*t*</sup> *ψ*<sub>+</sub>(*τ*<sub>1</sub>)/*B*<sub>+</sub> *dτ*<sub>1</sub> = 1, which stems from the fact that we consider only trajectories of time span *t*. The corresponding formula for *f*<sub>*t*</sub><sup>−</sup>(*T*<sub>+</sub>|*N*′ + 1) is

$$f_{t}^{-}(T_{+}|N'+1) = \int_{0}^{t-T_{+}} \frac{1}{B_{-}} \psi_{-}(\tau_{1})\, f_{t-\tau_{1}}^{+}(T_{+}|N') \,d\tau_{1}. \tag{20}$$

Since we are assuming equilibrium initial conditions, the *ψ*<sub>±</sub> in the *N*′ + 1 element of the iterative forms (Equations (19) and (20)) must be replaced by *f*<sub>*eq*</sub><sup>±</sup> (Equation (10)). As was already noted above, only the *N* = 0 and *N* = 1 cases are of interest in the small *t* limit. Then, according to Equations (17)–(20) and (10), we get for *N* = 1

$$f\_t^+(T\_+|1) = \frac{f\_{eq}^+(T\_+)}{\int\_0^t f\_{eq}^+(t') \, dt'},\tag{21}$$

$$f_t^-\left(T_+|1\right) = \frac{f_{eq}^-(t-T_+)}{\int_0^t f_{eq}^-(t') \, dt'}. \tag{22}$$

Using the small time approximation of *ψ*±(*τ*) Equation (12), in Equations (21) and (22), we obtain that, independently of the starting state,

$$f\_t^{\pm}(T\_+|1) \sim \frac{1}{t}.\tag{23}$$

The 1/*t* dependence comes from the integral factors in Equations (21) and (22); all the other terms in the numerator and denominator simply cancel out. See Appendix B for a complementary derivation of Equation (23) using the definition of the joint PDF of *T*<sub>+</sub> and *N*. Gathering Equations (15)–(18) and (23) in Equation (9), we find that

$$\begin{split} f\_t(T\_+) &\sim \frac{\langle \tau \rangle\_+}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \Big( 1 - \frac{t}{\langle \tau \rangle\_+} \Big) \delta(t - T\_+) \\ &\quad + \frac{\langle \tau \rangle\_-}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \Big( 1 - \frac{t}{\langle \tau \rangle\_-} \Big) \delta(T\_+) + \frac{2}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-}. \end{split} \tag{24}$$

The PDF of the occupation fraction is obtained trivially from Equation (24) by changing variables to *p*+ = *T*+/*t*

$$\begin{split} g_t(p_+) &\sim \frac{\langle \tau \rangle_+}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \Big( 1 - \frac{t}{\langle \tau \rangle_+} \Big) \delta(1 - p_+) \\ &\quad + \frac{\langle \tau \rangle_-}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \Big( 1 - \frac{t}{\langle \tau \rangle_-} \Big) \delta(p_+) + \frac{2t}{\langle \tau \rangle_+ + \langle \tau \rangle_-}. \end{split} \tag{25}$$

The third term in Equations (24) and (25) is uniform, i.e., independent of *T*<sub>+</sub> or *p*<sub>+</sub>, and this is the first main result of this paper. All further contributions to the PDF of *p*<sub>+</sub> only introduce terms of higher order in *t* and are thus negligible in the small *t* limit. This means that, for equilibrium initial conditions, regardless of the exact form of *ψ*<sub>±</sub>(*τ*), the PDF of *p*<sub>+</sub> (Equation (25)) is always uniform for 0 < *p*<sub>+</sub> < 1. This general uniform behavior of the PDF of the occupation fraction is applicable to an extremely large class of waiting time PDFs *ψ*<sub>±</sub>(*τ*). As a remark, the connection between the conditional PDF of *T*<sub>+</sub>, *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>|*N*), and the joint PDF of *T*<sub>+</sub> and *N*, *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>, *N*), is discussed in Appendix B.1. In the following, it is shown that this uniformity leads to universal features of the propagator in the limit of small *t*. In Sections 2.3.1 and 2.3.2, we treat particular examples (with exponential waiting times) that are exactly tractable, without any simplifications or assumptions. The results agree perfectly with the general form in Equation (25). It is important to notice that our approximations affect only the form of *g*<sub>*t*</sub>(*p*<sub>+</sub>) and do not affect *P*(*x*|*p*<sub>+</sub>). This allows us to obtain the behavior of *P*(*x*, *t*) for any −∞ < *x* < ∞, as is shown below.
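The uniform term of Equation (25) can be probed with a small Monte Carlo experiment. The following is a sketch under assumptions that are not taken from the original simulations: waiting times are uniform on (0, *b*<sub>±</sub>), the first sojourn is drawn from the forward recurrence PDF of Equation (10) by inversion, and the function name and sample sizes are illustrative.

```python
import math
import random


def sample_fraction(t, b_plus, b_minus, rng):
    """Draw the occupation fraction p_+ = T_+ / t for an equilibrium start.

    Assumption: waiting times are U(0, b_plus) in '+' and U(0, b_minus) in '-'.
    The first sojourn is a forward recurrence time, Equation (10), sampled by
    inverting its CDF 1 - (1 - tau / b)**2; later sojourns are full waits.
    """
    mean_plus, mean_minus = b_plus / 2.0, b_minus / 2.0
    in_plus = rng.random() < mean_plus / (mean_plus + mean_minus)
    remaining = t
    t_plus = 0.0
    first = True
    while remaining > 0.0:
        b = b_plus if in_plus else b_minus
        if first:
            wait = b * (1.0 - math.sqrt(1.0 - rng.random()))
            first = False
        else:
            wait = rng.uniform(0.0, b)
        dwell = min(wait, remaining)
        if in_plus:
            t_plus += dwell
        remaining -= dwell
        in_plus = not in_plus
    return t_plus / t


rng = random.Random(2024)
t, b_plus, b_minus = 0.2, 5.0, 10.0
samples = [sample_fraction(t, b_plus, b_minus, rng) for _ in range(100000)]
interior = [p for p in samples if 0.0 < p < 1.0]
weight = len(interior) / len(samples)   # compare with 2 t / (<tau>_+ + <tau>_-)
mean_interior = sum(interior) / len(interior)  # close to 1/2 if uniform
```

For short *t*, the weight of the interior 0 < *p*<sub>+</sub> < 1 should be close to 2*t*/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] and the interior samples should be spread approximately evenly, up to corrections of higher order in *t*.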

#### 2.2.1.1. *P*(*x*, *t*) for Arbitrary Waiting Times

In order to obtain the positional PDF for small *t*, we combine Equations (6), (7), and (25), which, after integration, gives

$$\begin{split} P(x,t) &= \frac{\langle \tau \rangle_{+}}{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}} \left(1 - \frac{t}{\langle \tau \rangle_{+}}\right) \frac{e^{-\frac{x^{2}}{4D_{+}t}}}{\sqrt{4\pi D_{+}t}} + \frac{\langle \tau \rangle_{-}}{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}} \left(1 - \frac{t}{\langle \tau \rangle_{-}}\right) \delta(x) \\ &\quad + \left(\frac{2t}{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}}\right) \left\{\frac{e^{-\frac{x^{2}}{4D_{+}t}}}{\sqrt{\pi D_{+}t}} - \frac{|x|}{2D_{+}t} \left(1 - \mathrm{Erf}\left(\frac{|x|}{\sqrt{4D_{+}t}}\right)\right)\right\}. \end{split} \tag{26}$$

Considering *x* ≠ 0, in the limit *x* → 0 we have exp(−*x*<sup>2</sup>/4*D*<sub>+</sub>*t*) ∼ 1 − *x*<sup>2</sup>/4*D*<sub>+</sub>*t* and 1 − Erf(|*x*|/√(4*D*<sub>+</sub>*t*)) ∼ 1 − 2|*x*|/√(4*πD*<sub>+</sub>*t*). After substituting these expansions, Equation (26) turns into

$$P(x,t) \sim \frac{3t + \langle \tau \rangle_{+}}{\sqrt{4\pi D_{+}t}\,[\langle \tau \rangle_{+} + \langle \tau \rangle_{-}]} - \frac{|x|}{D_{+}[\langle \tau \rangle_{+} + \langle \tau \rangle_{-}]} + K_{1}x^{2}, \tag{27}$$

with *K*<sub>1</sub> = (5*t* − ⟨*τ*⟩<sub>+</sub>)/[8(⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>)√*π* (*D*<sub>+</sub>*t*)<sup>3/2</sup>]. We can see that, in Equation (27), there is a linear dependence on |*x*| in the vicinity of *x* = 0. This means that for short enough measurement times the PDF of *x* will always have a tent-like shape, irrespective of the distributions *ψ*<sub>±</sub> that were chosen (see Figure 3 below). Only the mean sojourn times affect the shape. This is a general result for the short time regime, and it is based on the general fact that the PDF of the temporal occupation fraction is uniform for 0 < *p*<sub>+</sub> < 1. Concretely, at short times when |*x*| is small, the decay of *P*(*x*, *t*) will always resemble an exponential one. For large |*x*|, the form of *P*(*x*, *t*) must be Gaussian, because this limit is determined by the instances when no transition to *D*<sub>−</sub> was ever made and the transport is controlled by diffusion with *D*<sub>+</sub>. Moreover, if we only look at the particles that have moved, i.e., we discard the delta function at *x* = 0 in Equation (26), we can relate these dynamics to experiments that condition the measurements on the movement of the particles. This procedure is called population splitting; see [50,51]. Technically, if *D*<sub>−</sub> > 0, the cusp is not found; however, as long as *D*<sub>−</sub>/*D*<sub>+</sub> ≪ 1, the tent-like shape will be found; for further details, see Appendix A.
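Equation (26) can be checked numerically. The sketch below (function names are illustrative, not from the original work) evaluates the curly bracket of Equation (26) in closed form, compares it with a midpoint-rule integration of Equation (6) over a uniform density on 0 < *p*<sub>+</sub> < 1, and returns the regular part of the propagator together with the weight of the delta peak.

```python
import math


def uniform_part_closed(x, t, d_plus):
    """Curly bracket of Equation (26): integral of P(x|p_+) over 0 < p_+ < 1."""
    a = abs(x)
    gauss = math.exp(-x * x / (4.0 * d_plus * t)) / math.sqrt(math.pi * d_plus * t)
    return gauss - a / (2.0 * d_plus * t) * math.erfc(a / math.sqrt(4.0 * d_plus * t))


def uniform_part_quadrature(x, t, d_plus, n=20000):
    """Same quantity from a midpoint rule applied to Equation (6), for x != 0."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * h
        var = 4.0 * d_plus * t * p
        total += math.exp(-x * x / var) / math.sqrt(math.pi * var)
    return total * h


def short_time_propagator(x, t, d_plus, tau_plus, tau_minus):
    """Return (regular part of Equation (26), weight of the delta peak at x = 0)."""
    s = tau_plus + tau_minus
    gauss = math.exp(-x * x / (4.0 * d_plus * t)) / math.sqrt(4.0 * math.pi * d_plus * t)
    regular = (tau_plus / s) * (1.0 - t / tau_plus) * gauss
    regular += (2.0 * t / s) * uniform_part_closed(x, t, d_plus)
    delta_weight = (tau_minus / s) * (1.0 - t / tau_minus)
    return regular, delta_weight
```

Integrating the regular part over *x* and adding the delta weight gives one, which reflects that the three weights in Equation (25) sum to unity. The slope discontinuity of the regular part at *x* = 0 is the cusp discussed above.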

#### 2.2.2. Long Time Regime

In the limit *t* → ∞, the PDF of the temporal occupation fraction *g*<sub>*t*</sub>(*p*<sub>+</sub>) follows a different but also general form. As mentioned, we focus on the case where both *ψ*<sub>±</sub> have finite first moments, ⟨*τ*⟩<sub>±</sub> > 0. In the long time limit, ergodicity is satisfied, namely the equivalence of *ensemble* and temporal averages is attained. In particular, the *ensemble* average of the occupation fraction at *D*<sub>+</sub> equals the temporal average, which is given by the ratio of the mean waiting time at *D*<sub>+</sub> to the sum of the mean waiting times at *D*<sub>+</sub> and *D*<sub>−</sub>, i.e., ⟨*p*<sub>+</sub>⟩ = ⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] (see Appendix F). Thus, in the long time limit, *g*<sub>*t*</sub>(*p*<sub>+</sub>) converges to a *δ*-function,

$$g_t(p_+) \xrightarrow[t \to \infty]{} \delta \left(p_+ - \frac{\langle \tau \rangle_+}{\langle \tau \rangle_+ + \langle \tau \rangle_-}\right). \tag{28}$$

Since ergodicity prevails, by using Equation (28) in Equation (7), the positional PDF gets the form

$$P(x,t) = \sqrt{\frac{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}}{4\pi D_{+}t \langle \tau \rangle_{+}}}\; e^{-\frac{x^{2}(\langle \tau \rangle_{+} + \langle \tau \rangle_{-})}{4D_{+}t \langle \tau \rangle_{+}}}. \tag{29}$$

In the long time limit, the positional PDF given by Equation (29) represents a Gaussian propagator with an effective diffusion coefficient *D*<sub>+</sub>⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>]. Since ⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] < 1, the effective diffusion coefficient is always smaller than *D*<sub>+</sub>. Indeed, this slow-down is expected, because the particle spends a portion of the time in the state with *D*<sub>−</sub> = 0, during which it does not move.
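As a small worked example of Equation (29), the following sketch computes the effective diffusion coefficient and the corresponding Gaussian propagator. The function names are illustrative; the parameter values in the usage lines are the ones quoted in the caption of Figure 3.

```python
import math


def effective_diffusion(d_plus, tau_plus, tau_minus):
    """Effective long-time diffusion coefficient D_+ <tau>_+ / (<tau>_+ + <tau>_-)."""
    return d_plus * tau_plus / (tau_plus + tau_minus)


def long_time_propagator(x, t, d_plus, tau_plus, tau_minus):
    """Gaussian propagator of Equation (29), valid for D_- = 0 and large t."""
    d_eff = effective_diffusion(d_plus, tau_plus, tau_minus)
    return math.exp(-x * x / (4.0 * d_eff * t)) / math.sqrt(4.0 * math.pi * d_eff * t)


# Uniform case of Figure 3: <tau>_+ = 2.5, <tau>_- = 5, D_+ = 10
d_uniform = effective_diffusion(10.0, 2.5, 5.0)   # one third of D_+
# Gamma case of the right panel: <tau>_+ = 2, <tau>_- = 8, D_+ = 10
d_gamma = effective_diffusion(10.0, 2.0, 8.0)     # one fifth of D_+
```

The two parameter sets give different effective diffusion coefficients, which is consistent with the caption's remark that the last set was chosen to avoid overlapping curves.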

**Figure 3.** Distribution of displacements *P*(*x*, *t*) in semi-log scale, obtained by simulations, for a two-state system with uniformly distributed waiting times and with gamma-distributed waiting times. The left panel presents short time results, where a tent-like shape with a non-analytic cusp is clearly visible, while the right panel exhibits Gaussian statistics for long times. Left: *P*(*x*, *t*) for *t* = 1 with *τ* ∼ *U*(0, 5) at *D*<sub>+</sub> and *τ* ∼ *U*(0, 10) at *D*<sub>−</sub> (red triangles), with ⟨*τ*⟩<sub>+</sub> = 2.5 < ⟨*τ*⟩<sub>−</sub> = 5. In addition, *t* = 2 with *τ* ∼ *Gamma*(0.5, 8) at *D*<sub>+</sub> and *τ* ∼ *Gamma*(0.5, 12) at *D*<sub>−</sub> (blue squares), such that ⟨*τ*⟩<sub>+</sub> = 4 < ⟨*τ*⟩<sub>−</sub> = 6. Both cases fit Equation (26) (red and blue solid lines) with a tent-like shape. In both normalized histograms, there is a peak at *x* = 0 representing the Dirac delta function in Equation (26). Right: *P*(*x*, *t*) for *t* = 30 and uniformly distributed waiting times (green triangles) with the same parameters as above, and for gamma-distributed waiting times (orange squares) with *τ* ∼ *Gamma*(2, 1) at *D*<sub>+</sub>, *τ* ∼ *Gamma*(8, 1) at *D*<sub>−</sub>, and ⟨*τ*⟩<sub>+</sub> = 2 < ⟨*τ*⟩<sub>−</sub> = 8. We employed the last set of parameters for the gamma-distributed waiting times in order to avoid overlapping curves. *P*(*x*, *t*) converges to the Gaussian statistics of Equation (29) (green and orange solid lines). In all the presented cases, *D*<sub>+</sub> = 10 and *D*<sub>−</sub> = 0 were used.

#### 2.2.3. Simulations

The two general limits of *g*<sub>*t*</sub>(*p*<sub>+</sub>), Equations (25) and (28), produce two different prevailing forms of *P*(*x*, *t*), Equations (26) and (29). In Figure 3, we compare the analytical formulas, Equations (26) and (29) (solid lines), with simulations of two different two-state models. The first has uniformly distributed waiting times, *τ* ∼ *U*(0, 5) for *D*<sub>+</sub> and *τ* ∼ *U*(0, 10) for *D*<sub>−</sub>, with *t* = 1 (red triangles) and *t* = 30 (green triangles), such that ⟨*τ*⟩<sub>+</sub> = 2.5 < ⟨*τ*⟩<sub>−</sub> = 5. Here, the notation *τ* ∼ *U*(*a*, *b*) means that *τ* has a uniform distribution with *a* and *b* the minimum and maximum values, respectively. The second has gamma-distributed waiting times, *τ* ∼ *Gamma*(*k*, *θ*), where *k* is the shape parameter and *θ* the corresponding scale parameter. In this case, the PDF follows

$$\psi_{\pm}(\tau) = \frac{\tau^{k-1} e^{-\frac{\tau}{\theta}}}{\Gamma(k)\theta^{k}}, \tag{30}$$

which in particular implies a cumulative distribution function *F*(*τ*) = *γ*(*k*, *τ*/*θ*)/Γ(*k*), with *γ*(*x*, *y*) the lower incomplete gamma function and Γ(*x*) the standard gamma function. For the latter case, we used *τ* ∼ *Gamma*(0.5, 8) at *D*<sub>+</sub> and *τ* ∼ *Gamma*(0.5, 12) at *D*<sub>−</sub>, for *t* = 2 (blue squares) and *t* = 30 (orange squares), with ⟨*τ*⟩<sub>+</sub> = 4 < ⟨*τ*⟩<sub>−</sub> = 6. As we can see, in the short time regime, for uniform and gamma-distributed waiting times (red triangles and blue squares), *P*(*x*, *t*) has a tent shape for short displacements in agreement with Equation (26), together with a peak at *x* = 0 due to the Dirac delta function in Equation (26). For *t* = 30 (green triangles and orange squares), each case of *P*(*x*, *t*) converges to Gaussian statistics (Equation (29)).

The cusp we have found for small |*x*| implies that we may approximate the distribution on a small scale with a Laplace-like distribution, *P*(*x*, *t*) ∼ exp(−*C*|*x*|). However, this clearly does not hold globally for large *x*; see Figure A2 in Appendix C. Still, within the interval of short displacements, and due to the presence of the delta peak at the origin, we expect this span to contribute considerably to the normalization of *P*(*x*, *t*). In particular, we find that the area underneath the curve for the case of uniformly distributed waiting times (red line) in the left panel of Figure 3 has a value of 0.88 for *x* ∈ (−4, 4). The corresponding area for gamma-distributed waiting times (blue curve) in the same figure has a value of 0.89 for *x* ∈ (−8, 8).
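A simulation of this kind can be sketched as follows. This is not the authors' simulation code: it is a minimal illustration that assumes the equilibrium initial condition can be approximated by starting the renewal process a long burn-in time before *t* = 0, and the function names, burn-in length, and sample sizes are arbitrary choices. It relies on `random.gammavariate(alpha, beta)`, whose second argument is the scale parameter.

```python
import math
import random


def sample_t_plus(t, plus, minus, burn, rng):
    """Occupation time T_+ in the window [0, t] for gamma waiting times.

    plus and minus are (shape, scale) pairs for the '+' and '-' states.
    Assumption: the process is started in '+' at time -burn, so that it is
    close to equilibrium when the measurement begins at time 0.
    """
    clock = -burn
    in_plus = True
    t_plus = 0.0
    while clock < t:
        shape, scale = plus if in_plus else minus
        end = clock + rng.gammavariate(shape, scale)
        if in_plus:
            overlap = min(end, t) - max(clock, 0.0)
            if overlap > 0.0:
                t_plus += overlap
        clock = end
        in_plus = not in_plus
    return t_plus


def sample_position(t, d_plus, plus, minus, burn, rng):
    """Position at time t from Equation (2) with D_- = 0."""
    t_plus = sample_t_plus(t, plus, minus, burn, rng)
    return math.sqrt(2.0 * d_plus * t_plus) * rng.gauss(0.0, 1.0)


rng = random.Random(7)
fractions = [sample_t_plus(30.0, (0.5, 8.0), (0.5, 12.0), 200.0, rng) / 30.0
             for _ in range(4000)]
mean_fraction = sum(fractions) / len(fractions)
```

For *Gamma*(0.5, 8) and *Gamma*(0.5, 12), the mean occupation fraction should be close to ⟨*τ*⟩<sub>+</sub>/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>] = 0.4. A histogram of `sample_position` values could then be compared with Equation (26) at short *t* or Equation (29) at long *t*.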

#### *2.3. Exponentially Distributed Waiting Times*

In this section, we obtain *gt*(*p*+) for a specific distribution of waiting times, but using different methods, which let us corroborate the validity of our general approach described above. We analyze the case of exponential waiting times in states with *D*<sup>+</sup> and *D*−, each waiting time following a PDF given by

$$\psi_{\pm}(\tau) = \frac{e^{-\frac{\tau}{\langle \tau \rangle_{\pm}}}}{\langle \tau \rangle_{\pm}}. \tag{31}$$

We first show the case of a two-state system with equal mean waiting times and then investigate the complementary case. In Appendix D, we analyze both cases for non-equilibrium initial conditions, e.g., a system starting from *D*<sub>+</sub>.

#### 2.3.1. Equal Mean Waiting Times ⟨*τ*⟩<sub>+</sub> = ⟨*τ*⟩<sub>−</sub>

Let us consider a system with ⟨*τ*⟩<sub>+</sub> = ⟨*τ*⟩<sub>−</sub> = ⟨*τ*⟩. We know that the temporal occupation fraction *p*<sub>+</sub> and *T*<sub>+</sub> can be related to the difference of occupation times, defined by *S*<sub>*t*</sub> = *T*<sub>+</sub> − *T*<sub>−</sub>, as *S*<sub>*t*</sub> = 2*T*<sub>+</sub> − *t* = 2*p*<sub>+</sub>*t* − *t* [46]. In this section, we analyze the double Laplace transform of the PDF of *S*<sub>*t*</sub>, called *φ*<sub>*t*</sub>(*S*<sub>*t*</sub>), with Laplace pairs *S*<sub>*t*</sub> ⇔ *v* and *t* ⇔ *s*. In [46], the transform *φ̂*<sub>*s*</sub>(*v*) is provided by

$$\hat{\phi}_s(v) = \frac{s[1 - \hat{\psi}(s+v)\hat{\psi}(s-v)] + v[\hat{\psi}(s+v) - \hat{\psi}(s-v)]}{(s^2 - v^2)[1 - \hat{\psi}(s+v)\hat{\psi}(s-v)]}. \tag{32}$$

The Laplace transform of *ψ*(*τ*) in Equation (31) is given by L{*ψ*(*τ*)} = *ψ̂*(*s*) = 1/(1 + ⟨*τ*⟩*s*). Substituting *ψ̂*(*s*) in Equation (32), we obtain

$$
\hat{\phi}_s(v) = \frac{s + \frac{2}{\langle \tau \rangle}}{s^2 + \frac{2s}{\langle \tau \rangle} - v^2}. \tag{33}
$$

In Appendix E, an analytical expression for the PDF of *St* is found, i.e., inverse Laplace transform of Equation (33) is performed (see Equation (A44)). Then, remember that the temporal occupation fraction in the plus state *p*+ is related to the difference of occupation times as *St* = 2*p*+*t* − *t*. We can employ Equation (A44) for obtaining the PDF of *p*+, which is given by

$$\begin{split} g_t(p_+) &= \frac{1}{2} e^{-\frac{t}{\langle \tau \rangle}} \left\{ \delta(1 - p_+) + \delta(p_+) \right\} \\ &\quad + \frac{t\,\Theta(t - |2p_+ t - t|)\, e^{-\frac{t}{\langle \tau \rangle}}}{\langle \tau \rangle} \left[ I_0 \left( \frac{2t \sqrt{p_+(1 - p_+)}}{\langle \tau \rangle} \right) + \frac{I_1 \left( \frac{2t \sqrt{p_+(1 - p_+)}}{\langle \tau \rangle} \right)}{2 \sqrt{p_+(1 - p_+)}} \right]. \end{split} \tag{34}$$

A similar expression for a system with non-equilibrium initial conditions (always starting from *D*<sub>+</sub>) is found in Appendix D. In the short time limit *t* → 0, i.e., *t* ≪ ⟨*τ*⟩, the continuous part of Equation (34) can be approximated by a uniform distribution,

$$g_t(p_+) \sim \frac{e^{-\frac{t}{\langle \tau \rangle}}}{2} \left\{ \delta(1 - p_+) + \delta(p_+) \right\} + \frac{t}{\langle \tau \rangle}. \tag{35}$$

For ⟨*τ*⟩<sub>+</sub> = ⟨*τ*⟩<sub>−</sub> = ⟨*τ*⟩, Equation (35) agrees with Equation (25) obtained by the general approach of Section 2.2. In the left panel of Figure 4, we show the short time approximation of *g*<sub>*t*</sub>(*p*<sub>+</sub>) (Equation (35)) compared with the general formula in Equation (34); it is evident that both results agree perfectly. In the right panel of Figure 4, we show Equation (34) for short and long measurement times. *g*<sub>*t*</sub>(*p*<sub>+</sub>) evolves from a uniform distribution to a peaked distribution centered at its mean value ⟨*p*<sub>+</sub>⟩ = 1/2 (see Appendix E for a deduction of the central moments of *g*<sub>*t*</sub>(*p*<sub>+</sub>)).

**Figure 4.** Left: Comparison between *g*<sub>*t*</sub>(*p*<sub>+</sub>), Equation (34) (red solid line), and the short time uniform approximation, Equation (35) (black asterisks), for exponentially distributed waiting times, Equation (31), with ⟨*τ*⟩<sub>±</sub> = ⟨*τ*⟩ = 1 and *t* = 0.1. Right: *g*<sub>*t*</sub>(*p*<sub>+</sub>), Equation (34), for ⟨*τ*⟩ = 1 and *t* ∈ {0.1, 0.5, 1, 2, 5, 10}.
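The comparison between Equations (34) and (35) can be reproduced with a few lines of code. The sketch below is illustrative only: it evaluates the continuous part of Equation (34) for 0 < *p*<sub>+</sub> < 1, with the modified Bessel functions computed from their power series (truncated at a fixed number of terms, which is an assumption adequate for moderate arguments), and compares it with the uniform value *t*/⟨*τ*⟩.

```python
import math


def bessel_i(nu, y, terms=40):
    """Modified Bessel function I_nu(y) for integer nu >= 0 from its power series."""
    return sum((y / 2.0) ** (2 * k + nu)
               / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))


def g_continuous(p, t, tau):
    """Continuous part of Equation (34) for 0 < p < 1 and equal mean waits tau."""
    root = math.sqrt(p * (1.0 - p))
    y = 2.0 * t * root / tau
    bracket = bessel_i(0, y) + bessel_i(1, y) / (2.0 * root)
    return (t / tau) * math.exp(-t / tau) * bracket


def g_uniform(t, tau):
    """Uniform short-time approximation, last term of Equation (35)."""
    return t / tau


exact = g_continuous(0.5, 0.1, 1.0)
approx = g_uniform(0.1, 1.0)
```

For *t* = 0.1 and ⟨*τ*⟩ = 1 the two values differ by a few percent, and the difference shrinks as *t*/⟨*τ*⟩ decreases. As a consistency check, the integral of the continuous part over 0 < *p*<sub>+</sub> < 1 plus the total weight exp(−*t*/⟨*τ*⟩) of the two delta functions equals one.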

Positional Distribution Function

An analytical expression for the positional distribution function *P*(*x*, *t*) (given by Equation (7)), with *gt*(*p*+) provided by Equation (34), can be deduced by using the series representation of the modified Bessel functions, *I*<sub>*ν*</sub>(*y*) = ∑<sub>*k*=0</sub><sup>∞</sup> (*y*/2)<sup>2*k*+*ν*</sup>/[*k*! Γ(*ν* + *k* + 1)]. The integration in Equation (7) yields

$$\begin{split} P(x,t) &= \frac{e^{-\frac{t}{\langle \tau \rangle} - \frac{x^2}{4D\_+t}}}{2\sqrt{4\pi D\_+t}} + \frac{\delta(x)e^{-\frac{t}{\langle \tau \rangle}}}{2} \\ &+ \frac{t\, e^{-\frac{t}{\langle \tau \rangle} - \frac{x^2}{4D\_+t}}}{\langle \tau \rangle \sqrt{4\pi D\_+t}} \left\{ \sum\_{k=0}^{\infty} \frac{(-1)^k \pi}{k!} \left(\frac{t}{\langle \tau \rangle}\right)^{2k} \left[ \frac{{}\_1F\_1\left(k+1; \frac{1}{2}-k; \frac{x^2}{4D\_+t}\right)}{\Gamma(2k+\frac{3}{2})\Gamma(\frac{1}{2}-k)} - \frac{\left(\frac{x^2}{4D\_+t}\right)^{k+\frac{1}{2}} {}\_1F\_1\left(2k+\frac{3}{2}; k+\frac{3}{2}; \frac{x^2}{4D\_+t}\right)}{\Gamma(k+1)\Gamma(k+\frac{3}{2})} \right] \right. \\ &\left. + \frac{1}{2} \sum\_{k=0}^{\infty} \frac{(-1)^k \pi}{(k+1)!} \left(\frac{t}{\langle \tau \rangle}\right)^{2k+1} \left[ \frac{{}\_1F\_1\left(k+1; \frac{1}{2}-k; \frac{x^2}{4D\_+t}\right)}{\Gamma(2k+\frac{3}{2})\Gamma(\frac{1}{2}-k)} - \frac{\left(\frac{x^2}{4D\_+t}\right)^{k+\frac{1}{2}} {}\_1F\_1\left(2k+\frac{3}{2}; k+\frac{3}{2}; \frac{x^2}{4D\_+t}\right)}{\Gamma(k+1)\Gamma(k+\frac{3}{2})} \right] \right\}, \end{split} \tag{36}$$

with <sub>1</sub>*F*<sub>1</sub>(*a*; *b*; *z*) the confluent hypergeometric function of the first kind. Nonetheless, in the short time limit, we can use the uniform approximation of *gt*(*p*+) (Equation (35)), and then Equation (7) provides

$$P(x,t) \sim \frac{e^{-\frac{t}{\langle \tau \rangle} - \frac{x^2}{4D\_+t}}}{2\sqrt{4\pi D\_+t}} + \frac{\delta(x)e^{-\frac{t}{\langle \tau \rangle}}}{2} + \frac{t}{\langle \tau \rangle} \left\{ \frac{2e^{-\frac{x^2}{4D\_+t}}}{\sqrt{4\pi D\_+t}} - \frac{|x|}{2D\_+t} \left[1 - \mathrm{Erf}\left(\frac{|x|}{\sqrt{4D\_+t}}\right) \right] \right\}, \tag{37}$$

which agrees with the results obtained above in Equation (26), when *τ*<sup>+</sup> = *τ*− and for *t* −→ 0, since, in that limit, exp(−*t*/*τ*) ∼ 1 − *t*/*τ*. In particular, for *x* ≠ 0 and taking *x* −→ 0, Equation (37) yields a tent shaped propagator described by

$$P(x,t) \sim \frac{3t + \langle \tau \rangle}{4 \langle \tau \rangle \sqrt{\pi D\_{+}t}} - \frac{|x|}{2D\_{+} \langle \tau \rangle} + K\_{2} x^{2},\tag{38}$$

with *K*<sub>2</sub> = (5*t* − ⟨*τ*⟩)/[16⟨*τ*⟩√*π* (*D*<sub>+</sub>*t*)<sup>3/2</sup>] and in concordance with Equation (27). On the other hand, within this short time limit, for large displacements |*x*| −→ ∞, the two terms between curly braces in Equation (37) cancel each other, and only the first term in Equation (37) is left (when *x* ≠ 0). This is due to the expansion 1 − Erf(*z*) ∼ exp(−*z*<sup>2</sup>)/(√*π* *z*) for *z* −→ ∞, in our case *z* = |*x*|/√(4*D*<sub>+</sub>*t*). Then, Equation (37) can be approximated by

$$P(x,t) \underset{|x| \to \infty}{\sim} \frac{e^{-\frac{t}{\langle \tau \rangle} - \frac{x^2}{4D\_+t}}}{2\sqrt{4\pi D\_+t}}.\tag{39}$$

This Gaussian behavior of *P*(*x*, *t*) at the tails is expected. The large |*x*| limit is dominated by trajectories for which no transitions to *D*<sup>−</sup> were performed and a pure diffusion process with *D*+ occurs.
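The three regimes of the short time propagator can be checked against each other with a few lines of code. The sketch below is only an illustration: it evaluates the continuous part of Equation (37) (the delta peak at *x* = 0 is left out), the tent expansion of Equation (38), and the Gaussian tail of Equation (39). The function names are ours, and the complementary error function `math.erfc` is used for 1 − Erf to avoid cancellation at large |*x*|.

```python
import math


def p_short(x, t, tau, d_plus):
    """Continuous part of Eq. (37); the delta peak at x = 0 is omitted."""
    s = math.sqrt(4.0 * d_plus * t)
    gauss = math.exp(-(x / s) ** 2) / math.sqrt(4.0 * math.pi * d_plus * t)
    no_jump = 0.5 * math.exp(-t / tau) * gauss
    one_jump = (t / tau) * (
        2.0 * gauss - abs(x) / (2.0 * d_plus * t) * math.erfc(abs(x) / s)
    )
    return no_jump + one_jump


def p_tent(x, t, tau, d_plus):
    """Small-|x| expansion, Eq. (38), including the K_2 x^2 term."""
    k0 = (3.0 * t + tau) / (4.0 * tau * math.sqrt(math.pi * d_plus * t))
    k1 = 1.0 / (2.0 * d_plus * tau)
    k2 = (5.0 * t - tau) / (16.0 * tau * math.sqrt(math.pi) * (d_plus * t) ** 1.5)
    return k0 - k1 * abs(x) + k2 * x * x


def p_tail(x, t, tau, d_plus):
    """Large-|x| Gaussian tail, Eq. (39)."""
    return 0.5 * math.exp(-t / tau - x * x / (4.0 * d_plus * t)) / math.sqrt(
        4.0 * math.pi * d_plus * t
    )


t, tau, d_plus = 0.1, 1.0, 10.0  # parameters of Figure 5, short time
for x in (0.0, 0.05, 0.5, 5.0, 10.0):
    print(x, p_short(x, t, tau, d_plus), p_tent(x, t, tau, d_plus), p_tail(x, t, tau, d_plus))
```

The one-sided slope of `p_short` at the origin is −1/(2*D*<sub>+</sub>⟨*τ*⟩) on the right and the opposite on the left, which is the cusp. The residual difference between `p_short` and `p_tent` near *x* = 0 comes from replacing exp(−*t*/⟨*τ*⟩) by 1 − *t*/⟨*τ*⟩ in Equation (38).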

When *t* >> *τ*, ergodicity is satisfied and, therefore, the system on average visits the two states the same amount of time. Namely, the *ensemble* average of *p*+ is equal to the corresponding fraction of the average waiting times. In this case, when *τ*<sup>+</sup> = *τ*−, the occupation fraction is concentrated at *p*+ = 1/2. Thus, the PDF of *p*+ is represented by the delta function

$$
g\_t(p\_+) \xrightarrow[t \to \infty]{} \delta \left(p\_+ - \frac{1}{2}\right). \tag{40}
$$

Substituting Equation (40) in Equation (7), we recover Gaussian statistics for the displacements

$$P(x,t) \sim \frac{e^{-\frac{x^2}{2D\_+t}}}{\sqrt{2\pi D\_+t}}.\tag{41}$$

In Figure 5, we present the two different limit distributions for *P*(*x*, *t*): the short time limit, Equation (37), for *t* = 0.1 (red circles) and *t* = 0.5 (blue crosses), and the Gaussian limit, Equation (41), for *t* = 5 (orange circles) and *t* = 10 (green crosses), both in terms of the normalized variable *z* = *x*/√*t*. As we can see, the displacements follow a tent shape for short times (black solid line) and a Gaussian one in the long time limit (magenta solid line).

**Figure 5.** *P*(*z*, *t*) in semi-log scale, with *z* = *x*/√*t*. Left: For short times *t* = 0.1 (red circles) and *t* = 0.5 (blue crosses), *P*(*z*, *t*) is represented by Equation (37) (black solid line) with a tent like shape. Right: The same for large times *t* = 5 (orange circles) and *t* = 10 (green crosses), where *P*(*z*, *t*) converges to the Gaussian distribution Equation (41) (magenta solid line). In all the cases, *D*<sup>+</sup> = 10, *D*<sup>−</sup> = 0, and *τ* = 1 were used.
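The crossover from the tent shape to the Gaussian limit can also be seen in a direct simulation. The following sketch is an assumption-laden illustration, not the simulation code behind Figure 5: it samples the occupation time *T*<sub>+</sub> for exponential waiting times with an equilibrium start, draws the displacement as a Gaussian of variance 2*D*<sub>+</sub>*T*<sub>+</sub> (with *D*<sub>−</sub> = 0), and reports the MSD and the kurtosis, which equals 3 for a Gaussian. The names `occupation_time` and `displacement_stats`, the sample size, and the seed are arbitrary.

```python
import math
import random


def occupation_time(t, tau_plus, tau_minus, rng):
    """Time spent in the '+' state during (0, t); exponential sojourn times.

    For exponential waiting times the equilibrium (forward recurrence) time
    is again exponential, so only the initial state needs equilibrium weights.
    """
    in_plus = rng.random() < tau_plus / (tau_plus + tau_minus)
    remaining, t_plus = t, 0.0
    while remaining > 0.0:
        mean = tau_plus if in_plus else tau_minus
        stay = min(rng.expovariate(1.0 / mean), remaining)
        if in_plus:
            t_plus += stay
        remaining -= stay
        in_plus = not in_plus
    return t_plus


def displacement_stats(t, tau_plus, tau_minus, d_plus, n, seed=0):
    """Return (MSD, kurtosis) of x(t) = sqrt(2 D_+ T_+) * xi with D_- = 0."""
    rng = random.Random(seed)
    m2 = m4 = 0.0
    for _ in range(n):
        t_plus = occupation_time(t, tau_plus, tau_minus, rng)
        x = math.sqrt(2.0 * d_plus * t_plus) * rng.gauss(0.0, 1.0)
        m2 += x * x
        m4 += x ** 4
    m2 /= n
    m4 /= n
    return m2, m4 / (m2 * m2)


for t in (0.1, 10.0):
    msd, kurt = displacement_stats(t, 1.0, 1.0, 10.0, n=20000)
    print(t, msd, kurt)
```

With *τ*<sup>+</sup> = *τ*−, the MSD stays close to *D*<sub>+</sub>*t*, the variance of Equation (41), at both times, while the kurtosis drops from well above 3 at *t* = 0.1 towards 3 at *t* = 10.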

2.3.2. Different Mean Waiting Times *τ*<sup>+</sup> ≠ *τ*−

Relaxing the assumption of equal mean waiting times for exponentially distributed sojourn times in the model, we now take *τ*<sup>+</sup> ≠ *τ*−, with waiting times following Equation (31). As mentioned, for equilibrium initial conditions, the PDF of *T*+ is given by Equation (3). Let *f̂*<sub>*s*</sub><sup>±</sup>(*u*) be the double Laplace transform of *f*<sub>*t*</sub><sup>±</sup>(*T*<sub>+</sub>), defined as

$$\hat{f}\_s^{\pm}(u) = \int\_0^\infty \int\_0^\infty f\_t^{\pm}(T\_+)\, e^{-uT\_+ - st}\, dT\_+\, dt.$$

Then, the different terms of the PDF of *T*<sub>+</sub> in Equation (3) are provided in Laplace space by [42,47,49]

$$\hat{f}\_s^+(u) = \left\{ \hat{\psi}\_+(s+u) \left[ \frac{1-\hat{\psi}\_-(s)}{s} \right] + \frac{1-\hat{\psi}\_+(s+u)}{s+u} \right\} \frac{1}{1-\hat{\psi}\_+(s+u)\hat{\psi}\_-(s)},\tag{42}$$

$$\hat{f}\_s^-\left(u\right) = \left\{\hat{\psi}\_-\left(s\right) \left[\frac{1-\hat{\psi}\_+\left(s+u\right)}{s+u}\right] + \frac{1-\hat{\psi}\_-\left(s\right)}{s}\right\} \frac{1}{1-\hat{\psi}\_+\left(s+u\right)\hat{\psi}\_-\left(s\right)}.\tag{43}$$

Summing up Equations (42) and (43) according to Equation (3), we obtain, for exponentially distributed waiting times,

$$\hat{f}\_s(u) = \frac{\langle \tau \rangle\_-^2 + \langle \tau \rangle\_+^2 (1 + \langle \tau \rangle\_- s) + \langle \tau \rangle\_+ \langle \tau \rangle\_- [2 + \langle \tau \rangle\_- (s + u)]}{(\langle \tau \rangle\_+ + \langle \tau \rangle\_-) [\langle \tau \rangle\_- s + \langle \tau \rangle\_+ (1 + \langle \tau \rangle\_- s)(s + u)]}. \tag{44}$$
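The algebra leading from Equations (42) and (43) to Equation (44) can be verified numerically at arbitrary points (*s*, *u*). The sketch below assumes that Equation (3), which is not reproduced in this section, weights the two terms by the equilibrium probabilities ⟨*τ*⟩<sub>±</sub>/(⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>), and that the exponential waiting time PDF has the Laplace transform 1/(1 + ⟨*τ*⟩*s*). All function names are illustrative.

```python
def psi_hat(s, tau):
    """Laplace transform of an exponential waiting time PDF with mean tau."""
    return 1.0 / (1.0 + tau * s)


def f_plus(s, u, tau_p, tau_m):
    """Eq. (42): process starting in the '+' state."""
    a, b = psi_hat(s + u, tau_p), psi_hat(s, tau_m)
    return (a * (1.0 - b) / s + (1.0 - a) / (s + u)) / (1.0 - a * b)


def f_minus(s, u, tau_p, tau_m):
    """Eq. (43): process starting in the '-' state."""
    a, b = psi_hat(s + u, tau_p), psi_hat(s, tau_m)
    return (b * (1.0 - a) / (s + u) + (1.0 - b) / s) / (1.0 - a * b)


def f_weighted(s, u, tau_p, tau_m):
    """Equilibrium-weighted sum of Eqs. (42) and (43) (assumed form of Eq. (3))."""
    total = tau_p + tau_m
    return (tau_p * f_plus(s, u, tau_p, tau_m) + tau_m * f_minus(s, u, tau_p, tau_m)) / total


def f_closed(s, u, tau_p, tau_m):
    """Closed form of Eq. (44)."""
    num = tau_m ** 2 + tau_p ** 2 * (1.0 + tau_m * s) + tau_p * tau_m * (2.0 + tau_m * (s + u))
    den = (tau_p + tau_m) * (tau_m * s + tau_p * (1.0 + tau_m * s) * (s + u))
    return num / den


print(f_weighted(0.7, 1.3, 1.0, 5.0), f_closed(0.7, 1.3, 1.0, 5.0))
```

Setting *u* = 0 integrates out *T*<sub>+</sub>, so Equation (44) must reduce to 1/*s*, the Laplace transform of the normalization condition; this gives a second, independent check.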

Taking the double inverse Laplace transform of Equation (44) with respect to *u* ⇔ *T*<sup>+</sup> and *s* ⇔ *t* and changing variables to *p*<sup>+</sup> = *T*+/*t*, we obtain the PDF for *p*<sup>+</sup> (see details in Appendix F)

$$\begin{split} g\_t(p\_+) &= \frac{\langle \tau \rangle\_- e^{-\frac{t}{\langle \tau \rangle\_-}}}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \delta(p\_+) + \frac{\langle \tau \rangle\_+ e^{-\frac{t}{\langle \tau \rangle\_+}}}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \delta(1 - p\_+) + \frac{2t}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \left\{ I\_0 \left( 2t \sqrt{\frac{p\_+ (1 - p\_+)}{\langle \tau \rangle\_+ \langle \tau \rangle\_-}} \right) \right. \\ &\left. + \left[ \frac{(1 - p\_+)\sqrt{\langle \tau \rangle\_+ \langle \tau \rangle\_-}}{\langle \tau \rangle\_+} + \frac{p\_+ \sqrt{\langle \tau \rangle\_+ \langle \tau \rangle\_-}}{\langle \tau \rangle\_-} \right] \frac{I\_1 \left( 2t \sqrt{\frac{p\_+ (1 - p\_+)}{\langle \tau \rangle\_+ \langle \tau \rangle\_-}} \right)}{2 \sqrt{p\_+ (1 - p\_+)}} \right\} e^{-\frac{t p\_+}{\langle \tau \rangle\_+} - \frac{t(1 - p\_+)}{\langle \tau \rangle\_-}}. \end{split} \tag{45}$$

For the case when *τ*<sup>+</sup> = *τ*− = *τ*, Equation (45) recovers Equation (34) obtained by the methods reported in [52,53]. The case of non-equilibrium initial conditions is shown in Appendix D.

In the short time regime, strictly speaking when *t* << *τ*±, by expanding Equation (45) for *t* −→ 0, *gt*(*p*+) can be approximated by the uniform distribution

$$g\_t(p\_+) \sim \frac{\langle \tau \rangle\_- e^{-\frac{t}{\langle \tau \rangle\_-}}}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \delta(p\_+) + \frac{\langle \tau \rangle\_+ e^{-\frac{t}{\langle \tau \rangle\_+}}}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \delta(1 - p\_+) + \frac{2t}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-}. \tag{46}$$

As mentioned above, Equation (25), which was deduced for general PDFs of waiting times, includes Equation (46) as a particular case. For the uniform approximation of *gt*(*p*+) (Equation (46)), the positional PDF (Equation (7)) is

$$\begin{split} P(x,t) &\sim \frac{\langle \tau \rangle\_{+} e^{-\frac{t}{\langle \tau \rangle\_{+}} - \frac{x^{2}}{4D\_{+}t}}}{(\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-})\sqrt{4\pi D\_{+}t}} + \frac{\langle \tau \rangle\_{-}}{\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}} e^{-\frac{t}{\langle \tau \rangle\_{-}}}\delta(x) \\ &+ \frac{2t\, e^{-\frac{x^{2}}{4D\_{+}t}}}{(\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-})\sqrt{\pi D\_{+}t}} - \frac{|x|}{D\_{+}(\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-})} \left[1 - \mathrm{Erf}\left(\frac{|x|}{\sqrt{4D\_{+}t}}\right)\right], \end{split} \tag{47}$$

which agrees with the general case described by Equation (26).

Similar to Section 2.2, in the limit *t* −→ ∞, the PDF of the occupation fraction *gt*(*p*+) follows Equation (28). In addition, the PDF of the displacements in the long time regime is given by Equations (7) and (29), recovering Gaussianity.

In Figure 6, we show *gt*(*p*+) for exponential waiting times with *τ*<sup>+</sup> = 1 and *τ*− = 5. In the left panel, we compare the uniform approximation of Equation (46) (black asterisks) with the full solution Equation (45) (red solid line), observing an excellent agreement. In the right panel of Figure 6, the behavior of *gt*(*p*+) (as provided by Equation (45)) is displayed. As we can see, it starts with a uniform distribution for short times and then evolves to a peaked distribution centered at *p*<sup>+</sup> = *τ*+/(*τ*<sup>+</sup> + *τ*−) = 1/6. As shown in Appendix D, for non-equilibrium initial conditions, the PDF of *p*+ is still uniform within the short time regime. See also Appendix B for other similar cases.

**Figure 6.** Left: Comparison between *gt*(*p*+) Equation (45) (red solid line) and the uniform approximation Equation (46) (black asterisks) for *τ*<sup>+</sup> = 1, *τ*− = 5 and *t* = 0.1. Right: *gt*(*p*+) Equation (45) for *τ*<sup>+</sup> = 1, *τ*− = 5 and *t* ∈ {0.1, 0.5, 2, 5, 10, 20}.
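Two properties of Equation (45) lend themselves to a quick numerical check: the PDF is normalized, and for equilibrium initial conditions its mean equals ⟨*τ*⟩<sub>+</sub>/(⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>) at every measurement time, not only asymptotically. The sketch below is illustrative: it sums the Bessel series directly and uses midpoint quadrature, and the function names are ours.

```python
import math


def bessel_i(nu, y, tol=1e-16):
    """Modified Bessel function I_nu(y), y >= 0, summed from its power series."""
    half = 0.5 * y
    term = half ** nu / math.gamma(nu + 1)
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= half * half / (k * (k + nu))
        total += term
    return total


def g_continuous(p, t, tau_p, tau_m):
    """Continuous part of Eq. (45) for 0 < p < 1."""
    root = math.sqrt(p * (1.0 - p))
    gm = math.sqrt(tau_p * tau_m)
    y = 2.0 * t * root / gm
    weight = (1.0 - p) * gm / tau_p + p * gm / tau_m
    # I_1(y) / (2 sqrt(p(1-p))) tends to t / (2 gm) at the end points
    i1_part = bessel_i(1, y) / (2.0 * root) if root > 0.0 else 0.5 * t / gm
    decay = math.exp(-t * p / tau_p - t * (1.0 - p) / tau_m)
    return 2.0 * t / (tau_p + tau_m) * (bessel_i(0, y) + weight * i1_part) * decay


def mass_and_mean(t, tau_p, tau_m, n=4000):
    """Total probability and mean of p_+, delta peaks included."""
    total = tau_p + tau_m
    w0 = tau_m * math.exp(-t / tau_m) / total  # weight of delta(p_+)
    w1 = tau_p * math.exp(-t / tau_p) / total  # weight of delta(1 - p_+)
    h = 1.0 / n
    mass, mean = w0 + w1, w1
    for i in range(n):
        p = (i + 0.5) * h
        g = g_continuous(p, t, tau_p, tau_m) * h
        mass += g
        mean += p * g
    return mass, mean


for t in (0.1, 2.0, 20.0):
    print(t, mass_and_mean(t, 1.0, 5.0))
```

For *τ*<sup>+</sup> = 1 and *τ*− = 5 the mean stays at 1/6 for all *t*, while the shape changes from nearly uniform to sharply peaked, as in the right panel of Figure 6.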

Finally, in Figure 7, we show the corresponding positional spreading for the normalized variable *z* = *x*/√*t*. As we can see, at short times, *t* = 0.1 (red circles) and *t* = 0.5 (blue crosses), *P*(*z*, *t*) (given by Equation (47)) attains a tent-like shape. In the long run, *t* = 20 (orange circles) and *t* = 30 (green squares), *P*(*z*, *t*) has a Gaussian distribution given by Equation (29).

**Figure 7.** *P*(*z*, *t*) in semi-log scale, with *z* = *x*/√*t*, for a system with *τ*<sup>+</sup> = 1 and *τ*− = 5. For short times *t* = 0.1 (red circles) and *t* = 0.5 (blue crosses), *P*(*z*, *t*) is represented by Equation (47) (black solid line) with a tent like shape. For large times *t* = 20 (orange circles) and *t* = 30 (green diamonds), *P*(*z*, *t*) converges to the Gaussian statistics Equation (29) (magenta solid line). In all the cases, *D*<sup>+</sup> = 10 and *D*<sup>−</sup> = 0 were used. Compared with Figure 5, in this case, the Gaussian curve is above the tent curve, contrary to the case with equal mean waiting times. This is because the coefficient of the Gaussian curve Equation (29) is larger than the weight of the delta peak in Equation (47). In Figure 5, we have the opposite: the weight of the corresponding delta function in Equation (37) is larger than the coefficient of the Gaussian Equation (41).

#### **3. Discussion**

### *3.1. The Histogram of the Diffusion Coefficient as Extracted from Experimental Data*

#### 3.1.1. Super-Statistics

We have found that at *x* = 0, *P*(*x*, *t*) exhibits a cusp. A mathematically similar non-analytical behavior is found using an approach called super-statistics [12,14,21,54], which was used to explain laboratory observations. This framework postulates that the distribution of diffusion constants in the system is exponential, namely *P*(*D*) = exp(−*D*/⟨*D*⟩)/⟨*D*⟩ for *D* > 0, with ⟨*D*⟩ the average diffusivity. Then, the diffusion follows a Gaussian process with a random *D*. This approach gives

$$P(x,t) = \int\_0^\infty \frac{e^{-\frac{x^2}{4Dt}}}{\sqrt{4\pi Dt}} \frac{e^{-\frac{D}{\langle D \rangle}}}{\langle D \rangle} dD = \frac{e^{-\frac{|x|}{\sqrt{\langle D \rangle t}}}}{\sqrt{4\langle D \rangle t}}. \tag{48}$$

Here, on the right-hand side, we have the Laplace PDF, which was used by Laplace in 1774 [55] to describe his linear law of errors [56]. In addition, as in our case, within the super-statistics method, we see in Equation (48) a non-analytical behavior since *P*(*x*, *t*) ∼ *C*<sub>1</sub> − *C*<sub>2</sub>|*x*| for small *x*, with *C*<sub>1</sub>, *C*<sub>2</sub> constants. Our work does not support the Laplace law, see Equations (26) and (27). However, maybe more importantly, the whole approach presented in this manuscript differs from the super-statistical approach in the following way. In our model, we have two diffusion constants, *D*<sup>+</sup> and *D*<sup>−</sup> = 0 (see Appendix A for the case when *D*<sup>−</sup> ≠ 0). Hence, the PDF of diffusion constants is *P*(*D*) = *aδ*(*D*) + *bδ*(*D* − *D*+), with *a*, *b* ≥ 0. It follows that the super-statistical approach predicts that the diffusing packet *P*(*x*, *t*) is a sum of a delta function corresponding to non-moving particles and a Gaussian packet describing the movers. Thus, when the non-moving particles are excluded, we have perfect Gaussian behavior. This is actually correct, to leading order, for very short times. Thus, the super-statistical approach gives the correct *t* −→ 0 behavior but fails to predict the main issue (in our opinion), namely the cusp at *x* = 0. To explore the non-analytical behavior, one needs to go to the next order terms in the expansion to include paths with a transition between states. Then, as we have shown, the equilibrium initial condition yields a uniform distribution of the occupation fraction, Equation (25). It is this fact that brings the non-analytical behavior in the final result for *P*(*x*, *t*), Equation (27), graphically represented by a "tent"; see Figures 3, 5 and 7. It follows that the exponential conspiracy, in which the distribution of diffusion constants is exponential, is not a necessary condition for a cusp like behavior of *P*(*x*, *t*).

We further remark that the non-analytical behavior is also found in the context of normal diffusion in [35–37] and of anomalous diffusion in [31–34,36,38–40].
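The super-statistical integral of Equation (48) is easy to confirm numerically. In the sketch below, which is only an illustration, the substitution *D* = *u*<sup>2</sup> removes the integrable singularity of the Gaussian prefactor at *D* = 0, and the result is compared with the Laplace density of scale √(⟨*D*⟩*t*). The function names, the cutoff of the integration range, and the number of quadrature points are arbitrary choices.

```python
import math


def superstat_numeric(x, t, d_mean, n=20000, u_max_factor=10.0):
    """Numerical value of the integral in Eq. (48) using D = u**2."""
    u_max = u_max_factor * math.sqrt(d_mean)
    h = u_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h  # midpoint rule, never evaluates u = 0
        total += math.exp(-x * x / (4.0 * u * u * t) - u * u / d_mean)
    return 2.0 * total * h / (d_mean * math.sqrt(4.0 * math.pi * t))


def laplace_closed(x, t, d_mean):
    """Right-hand side of Eq. (48): Laplace PDF with scale sqrt(<D> t)."""
    b = math.sqrt(d_mean * t)
    return math.exp(-abs(x) / b) / (2.0 * b)


for x in (0.0, 0.5, 2.0):
    print(x, superstat_numeric(x, 1.0, 1.0), laplace_closed(x, 1.0, 1.0))
```

Unlike the two state propagator, this density is exponential for all *x*: it has the same kind of cusp at the origin but no Gaussian tail.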

#### 3.1.2. Time Average MSD

We note that, in single molecule experiments, the time average mean squared displacement (TAMSD) is used in many cases to estimate the distribution of diffusion constants [2,5,6]. Since time averages are recorded over a finite measurement time, the time average fluctuates. Hence, we have naturally a distribution of the estimator for the diffusion parameters. In addition, the aforementioned two delta peak distribution of *D*, i.e., on *D*<sup>+</sup> and on *D*−, is expected to be smeared out. This topic was extensively studied in a wide variety of models [57,58].

We now investigate the fluctuations of the time averaged diffusivities in a two state model and their implications in the distribution of diffusion coefficients obtained from real experimental data. For a further analysis of the time average diffusivity within a two state system, see [59,60].

We note that, in different single particle tracking experiments with non-Gaussian propagators, the recorded distribution of the diffusion coefficient *D* (obtained by means of TAMSD analysis) is relatively broad and peaked close to the origin [2,5,6]. Those experimental distributions of *D* are typically fitted by exponential [6] or gamma [2] distributions. Within the two state model, the diffusivity takes only two possible values, *D*<sup>−</sup> or *D*+, but the respective TAMSD analysis gives values of *D* around *D*<sup>−</sup> and *D*<sup>+</sup> [59]. The average diffusivity is given by ⟨*D*⟩ = (*D*+*τ*<sup>+</sup> + *D*−*τ*−)/(*τ*<sup>+</sup> + *τ*−). How different, then, is the distribution of the diffusivities extracted via TAMSD techniques in a two state model from the one found in single molecule experiments? As we show next, this is determined by the values of *D*<sup>±</sup> and *τ*±. In Figure 8, we show the distribution of the diffusion coefficients obtained by means of TAMSD analysis for *D*<sup>+</sup> = 10, *D*<sup>−</sup> = 0 and exponentially distributed waiting times. We show two different cases: the first one with the same mean waiting times *τ*<sup>+</sup> = *τ*− = 1 (see red boxes), and the second one with different mean waiting times, such that *τ*<sup>+</sup> = 1 and *τ*− = 5 (see blue boxes).

**Figure 8.** Distribution of diffusion coefficients *P*(*D*) obtained via TAMSD analysis of simulated trajectories of a two state system with *D*<sup>+</sup> = 10, *D*<sup>−</sup> = 0 and exponentially distributed waiting times. Estimates of *D* were extracted from the linear plots of the TAMSD versus the lag time. We show two cases: a system with the same mean waiting times *τ*<sup>+</sup> = *τ*− = *τ* = 1 (red boxes) and a system with different mean waiting times, *τ*<sup>+</sup> = 1 and *τ*− = 5 (blue boxes). For the system with the same mean waiting times, the average diffusivity found in the simulations is ⟨*D*⟩ = 4.98, and, for the case of different mean waiting times, we have ⟨*D*⟩ = 1.69. In both cases, we used *t* = 1000 and 1000 trajectories.

As we can see in Figure 8, when the difference between the diffusion coefficients is large, as in our case *D*<sup>+</sup> = 10 > *D*<sup>−</sup> = 0, *P*(*D*) is relatively broad. Nonetheless, for the case with *τ*<sup>+</sup> = 1 and *τ*− = 5, the peak of *P*(*D*) is closer to the origin compared to the case with *τ* = 1.

This difference between mean waiting times in each state is the second factor that determines the shape of *P*(*D*). For instance, when *τ*<sup>+</sup> < *τ*−, the more time the process spends in the state "−", the closer the observed values of *D* are to *D*−. In this latter case, the distribution of *D* is peaked close to the origin since *D*<sup>−</sup> < *D*+. Thus, we can say that, when the differences between the diffusivities (and the mean waiting times) in the different states are pronounced, i.e., *D*<sup>−</sup> << *D*<sup>+</sup> and *τ*<sup>+</sup> << *τ*−, *P*(*D*) in the two state model resembles the distributions found in single molecule experiments [2,5,6].
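The procedure behind Figure 8 can be sketched in code under several simplifying assumptions that are ours and not the authors': the state is switched on a discrete time grid with probability *dt*/⟨*τ*⟩<sub>±</sub> per step, the diffusivity is estimated from the TAMSD at a single lag instead of a linear fit over lag times, and far fewer and shorter trajectories are used than in the figure. All function names and parameters are illustrative.

```python
import math
import random


def simulate_trajectory(n_steps, dt, tau_p, tau_m, d_plus, d_minus, rng):
    """Two state diffusion on a time grid (discrete-time switching approximation)."""
    in_plus = rng.random() < tau_p / (tau_p + tau_m)  # equilibrium initial state
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        d = d_plus if in_plus else d_minus
        x += math.sqrt(2.0 * d * dt) * rng.gauss(0.0, 1.0)
        path.append(x)
        mean_stay = tau_p if in_plus else tau_m
        if rng.random() < dt / mean_stay:  # leave the current state
            in_plus = not in_plus
    return path


def tamsd(path, lag):
    """Time average MSD of a single trajectory at a lag given in steps."""
    n = len(path) - lag
    return sum((path[i + lag] - path[i]) ** 2 for i in range(n)) / n


def estimate_d(path, dt, lag=1):
    """Single-trajectory diffusivity estimate, TAMSD / (2 * lag time)."""
    return tamsd(path, lag) / (2.0 * lag * dt)


def diffusivity_sample(n_traj, n_steps, dt, tau_p, tau_m, d_plus, d_minus=0.0, seed=1):
    """Diffusivity estimates from independent trajectories."""
    rng = random.Random(seed)
    return [
        estimate_d(simulate_trajectory(n_steps, dt, tau_p, tau_m, d_plus, d_minus, rng), dt)
        for _ in range(n_traj)
    ]


for tau_m in (1.0, 5.0):
    sample = diffusivity_sample(100, 2000, 0.1, 1.0, tau_m, 10.0)
    print(tau_m, sum(sample) / len(sample), min(sample), max(sample))
```

The sample mean should lie near ⟨*D*⟩ = (*D*<sub>+</sub>⟨*τ*⟩<sub>+</sub> + *D*<sub>−</sub>⟨*τ*⟩<sub>−</sub>)/(⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>), and the spread of the individual estimates shows how the two delta peaks of *P*(*D*) are smeared out by the finite measurement time.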

#### **4. Conclusions**

From symmetry of the density of spreading particles *P*(*x*, *t*) = *P*(−*x*, *t*), we expect an analytical expansion of the propagator as *P*(*x*, *t*) ∼ *K*<sub>1</sub> − *K*<sub>2</sub>*x*<sup>2</sup> + ..., with *K*<sub>1</sub>, *K*<sub>2</sub> constants. Instead, in the two state model handled throughout this work, we get an expansion that is linear in |*x*|, see Equation (27). This is a non-analytical expansion graphically represented by a tent like structure, see Figures 3, 5 and 7. As mentioned above, Laplace in 1774 considered a similar non-analytical PDF, *P*(*x*) = exp(−|*x*|)/2 for −∞ < *x* < ∞ [55,56]. However, the expression we find is clearly non-exponential, see Equation (26). Furthermore, for large *x*, we get a Gaussian behavior for *P*(*x*, *t*). It should be noted that a non-analytical behavior is found only if *D*<sup>−</sup> = 0, see Appendix A for further details. In practice, we may approach the non-analytical features of *P*(*x*, *t*) as *D*<sup>−</sup> becomes small.

Recently, a very general theory was developed for the non-Gaussian spreading of packets of particles. Using a CTRW framework, it was shown that, for any analytical PDF of waiting times, in the large *x* limit *P*(*x*, *t*) ∼ exp(−*C*|*x*| ln |*x*|), with *C* a constant [22]. In that model, one thus finds exponential tails for large *x*, while, here, the anomaly, i.e., the cusp or tent like feature of *P*(*x*, *t*), comes from the small *x* limit.

Recently, Postnikov et al. [37] investigated a model of diffusion in a quenched disordered setting, where the diffusive field is spatially varying. They showed that equilibrium initial conditions play a major role, stating: "within the class of models with quenched disorder, the Itô model under equilibrium conditions is the only promising candidate for the description of Brownian Non Gaussian diffusion (BnG)." Note that here BnG means a model or system where the MSD increases linearly *for all times* and the propagator is non-Gaussian. Our model uses a time dependent diffusivity, and we showed that equilibrium initial conditions are indeed a key requirement. Here, we note that BnG does not imply a cusp, and vice versa. Namely, we may find a system where the MSD increases linearly in time, for the entire span of time, with or without a cusp for *P*(*x*, *t*) at *x* = 0. The main focus of our work is the presence of a cusp for *P*(*x*, *t*). Regarding the behavior of the MSD, it can be shown that, when equilibrium initial conditions are applied, ⟨*T*<sub>+</sub>⟩ = (⟨*τ*⟩<sub>+</sub>*t*)/[⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>], for all times *t* (see Appendix G). Then, by Equations (A62) and (A68), the MSD is provided by

$$
\langle x^2(t) \rangle = 2 \left( \frac{D\_+ \langle \tau \rangle\_+ + D\_- \langle \tau \rangle\_-}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \right) t, \tag{49}
$$

for any time *t*. Thus, if the process starts from equilibrium, the MSD grows linearly for all times and we have BnG. Nevertheless, we would like to emphasize that our model is exhibiting BnG, but specifically *P*(*x*, *t*) has a cusp only if *D*<sup>−</sup> = 0, and practically when *D*<sup>−</sup> << *D*+.
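The statement that the mean occupation time is linear in *t* at all times holds for general waiting time PDFs, provided the process starts from equilibrium. The following Monte Carlo sketch illustrates this for gamma distributed sojourn times with integer shape and unit scale, as used in Figure A1. The equilibrium start is our construction: the initial state is chosen with probability ⟨*τ*⟩<sub>±</sub>/(⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>), and the first sojourn is drawn from the forward recurrence time, sampled as a uniform fraction of a length-biased interval (for a Gamma(*k*, 1) PDF the length-biased PDF is Gamma(*k* + 1, 1)). The function name and the sample size are illustrative.

```python
import random


def mean_occupation_fraction(t, shape_plus, shape_minus, n, seed=2):
    """Monte Carlo estimate of <T_+>/t for Gamma(k, 1) sojourns, equilibrium start."""
    rng = random.Random(seed)
    tau_p, tau_m = float(shape_plus), float(shape_minus)  # mean of Gamma(k, 1) is k
    acc = 0.0
    for _ in range(n):
        in_plus = rng.random() < tau_p / (tau_p + tau_m)
        shape = shape_plus if in_plus else shape_minus
        # forward recurrence time: uniform fraction of a length-biased sojourn
        stay = rng.random() * rng.gammavariate(shape + 1, 1.0)
        remaining, t_plus = t, 0.0
        while remaining > 0.0:
            stay = min(stay, remaining)
            if in_plus:
                t_plus += stay
            remaining -= stay
            in_plus = not in_plus
            stay = rng.gammavariate(shape_plus if in_plus else shape_minus, 1.0)
        acc += t_plus
    return acc / (n * t)


for t in (0.5, 2.0, 20.0):
    print(t, mean_occupation_fraction(t, 3, 6, n=100000))
```

For Gamma(3, 1) and Gamma(6, 1) sojourns the estimate stays close to 3/(3 + 6) = 1/3 at short, intermediate, and long times. Starting every trajectory in the "+" state instead would give a fraction near 1 at short times, and the MSD would no longer be linear from *t* = 0.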

To summarize, we emphasize that we have shown, by means of the statistics of the temporal occupation, that there is a universality for the PDF of the temporal occupation fraction in a two state model. For PDFs of waiting times with finite first moments, *gt*(*p*+) can be approximated by a uniform distribution following Equation (25). This leads to tent like decaying propagators (Equation (26)) similar to those found in many experimental systems. We corroborate our results by solving analytically a two state system with exponentially distributed waiting times. We have shown that, either for short or long times, the distribution of displacements *P*(*x*, *t*) has a general form, either a "tent" or a Gaussian bell curve. These two endpoints of the positional PDF are independent of the actual form of the distribution of waiting times. The crucial point within our framework is the generality of the behavior of the PDF of the occupation fraction *p*+, being a uniform distribution for short times and a delta peak for long times. The former was found for a system with equilibrium initial conditions. We note that, for certain types of non-equilibrium initial conditions, we can still get a uniform PDF for the fraction occupation time; however, this is not generic (see details in Appendix B). Therefore, the non-Gaussian features are readily present in our model within the short time regime, and regardless of the specifics of the waiting times.

Mathematically, we presented an expansion in terms of the number of transitions from state + to − and backwards. Naturally, for very short times, the leading contribution to the packet comes from the paths with zero transitions, and then the packet is simply a sum of two Gaussian curves with diffusion coefficients *D*<sup>+</sup> and *D*−. However, we showed that, by going to next order terms in the expansion, namely considering the paths with a single jump, we get the cusp like shape, found in the limit *D*<sup>−</sup> → 0. Thus, the whole effect is achieved by using a perturbation approach obtaining the leading order correction to the trivial behavior. Put differently, a widely popular super statistical approach is found to miss one of the main issues of the field, namely the cusp in *P*(*x*, *t*). A super-statistical approach [14,54] uses a distribution of diffusivities, which in our model is a sum of two delta functions, at *D*<sup>−</sup> = 0 and *D*+. This does not give the cusp, as it is merely the zeroth order of the perturbation theory developed here.

**Author Contributions:** Conceptualization, E.B. and S.B.; methodology, E.B. and S.B.; software, M.H.-S.; validation, E.B., S.B. and M.H.-S.; formal analysis, M.H.-S. and S.B.; investigation, M.H.-S.; writing - original draft preparation, M.H.-S.; writing - review and editing, S.B. and E.B.; visualization, M.H.-S.; supervision, E.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** E.B. and M.H.-S. are thankful for the support of the Israel Science Foundation Grant No. 1898/17. S.B. is grateful for the support of the Pazy foundation Grant No. 61139927 and the Israel Science Foundation Grant No. 2796/20.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All the data sets obtained by numerical simulations or data analysis are available from the corresponding authors upon request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

| Abbreviation | Definition |
|---|---|
| CTRW | Continuous Time Random Walk |
| TAMSD | Time Average Mean Squared Displacement |

### **Appendix A. A Two State Model with *D*<sub>+</sub> > *D*<sub>−</sub> > 0**

When *D*<sup>+</sup> > *D*<sup>−</sup> > 0, the process for the displacements becomes

$$x(t) = \sqrt{2D\_{+}T\_{+}}\,\xi\_{1} + \sqrt{2D\_{-}(t - T\_{+})}\,\xi\_{2},\tag{A1}$$

with *ξ*<sup>1</sup> and *ξ*<sup>2</sup> each i.i.d. Gaussian variables. In this case, the form of the conditioned PDF is given by

$$P(\mathbf{x}, t | T\_+) = \frac{e^{-\frac{\mathbf{x}^2}{4[D\_+T\_+ + D\_-(t-T\_+)]}}}{\sqrt{4\pi[D\_+T\_+ + D\_-(t-T\_+)]}}.\tag{A2}$$

Then, the marginal distribution for the displacements follows

$$P(x,t) = \int\_0^1 \frac{e^{-\frac{x^2}{4t[D\_+p\_+ + D\_-(1-p\_+)]}}}{\sqrt{4\pi t[D\_+p\_+ + D\_-(1-p\_+)]}} g\_t(p\_+) dp\_+. \tag{A3}$$

*Appendix A.1. P*(*x*, *t*) *for Arbitrary Waiting Times*

As we did in Section 2.2.1.1, using the general forms obtained above, i.e., Equations (9), (17), (18), (23), and (28), we can analyze *P*(*x*, *t*) in the short and long time limits.

**Figure A1.** Distribution of displacements *P*(*x*, *t*) obtained by simulations of a two state system with *D*<sup>+</sup> > *D*<sup>−</sup> > 0 and gamma distributed waiting times *τ* ∼ *Gamma*(3, 1) at *D*<sup>+</sup> and *τ* ∼ *Gamma*(6, 1) at *D*<sup>−</sup> following Equation (30). We compare with Equation (A4) (solid lines) for *t* = 0.5, *τ*<sup>+</sup> = 3, *τ*− = 6, *D*<sup>+</sup> = 10, and three values of the slow diffusivity: *D*<sup>−</sup> = 0.1 (red triangles), *D*<sup>−</sup> = 5 (cyan squares) and *D*<sup>−</sup> = 9 (magenta circles). An exponential like decay is present at small values of *x* when *D*<sup>+</sup> = 10 >> *D*<sup>−</sup> = 0.1 (red solid line). When *D*<sup>−</sup> −→ *D*<sup>+</sup> (cyan and magenta solid lines), *P*(*x*, *t*) follows a full Gaussian distribution.

Appendix A.1.1. Short Time Regime

Substituting Equation (25) in Equation (A3), we get

$$\begin{split} P(x,t) &= \frac{\langle \tau \rangle\_{+}}{\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}} \left(1 - \frac{t}{\langle \tau \rangle\_{+}} \right) \frac{e^{-\frac{x^{2}}{4D\_{+}t}}}{\sqrt{4\pi D\_{+}t}} + \frac{\langle \tau \rangle\_{-}}{\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}} \left(1 - \frac{t}{\langle \tau \rangle\_{-}} \right) \frac{e^{-\frac{x^{2}}{4D\_{-}t}}}{\sqrt{4\pi D\_{-}t}} \\ &+ \frac{2}{\sqrt{\pi}(\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-})[D\_{-} - D\_{+}]} \left\{ \sqrt{D\_{-}t}\, e^{-\frac{x^{2}}{4D\_{-}t}} - \sqrt{D\_{+}t}\, e^{-\frac{x^{2}}{4D\_{+}t}} \right. \\ &\left. - \frac{\sqrt{\pi}|x|}{2} \left[ \mathrm{Erf} \left(\frac{|x|}{\sqrt{4D\_{+}t}}\right) - \mathrm{Erf} \left(\frac{|x|}{\sqrt{4D\_{-}t}}\right) \right] \right\}. \end{split} \tag{A4}$$

In Figure A1, we compare Equation (A4) (in solid lines) and *P*(*x*, *t*) obtained by simulations of a two state model for a fixed value of *D*<sup>+</sup> and different values of *D*−, such that *D*<sup>+</sup> > *D*−. In all cases, we used gamma distributed waiting times *τ* ∼ *Gamma*(3, 1) for the state with *D*<sup>+</sup> and *τ* ∼ *Gamma*(6, 1) for the state with *D*<sup>−</sup> (the gamma distribution is defined by Equation (30)). As we can see, when *D*<sup>+</sup> >> *D*−, e.g., *D*<sup>+</sup> = 10 and *D*<sup>−</sup> = 0.1 (red triangles), the PDF of the displacements has a non-Gaussian peak at small values of *x*; for large values of *x*, it follows a Gaussian distribution. When the values of *D*<sup>−</sup> approach *D*<sup>+</sup> (cyan squares and magenta circles), *P*(*x*, *t*) is fully described by Gaussian statistics even in the short time limit.
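Equation (A4) can be evaluated directly, and one simple consistency check is that it integrates to one over *x*: the weights of the two zero-jump Gaussians and of the one-jump term add up to (⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub> − 2*t* + 2*t*)/(⟨*τ*⟩<sub>+</sub> + ⟨*τ*⟩<sub>−</sub>) = 1. The sketch below uses the parameters of Figure A1; the function names, the integration window, and the grid are illustrative choices, and the formula requires *D*<sub>+</sub> ≠ *D*<sub>−</sub> and *D*<sub>−</sub> > 0.

```python
import math


def p_short_a4(x, t, tau_p, tau_m, d_plus, d_minus):
    """Short time propagator of Eq. (A4) for D_+ > D_- > 0."""
    total = tau_p + tau_m
    ax = abs(x)

    def gauss(d):
        return math.exp(-x * x / (4.0 * d * t)) / math.sqrt(4.0 * math.pi * d * t)

    # paths without any transition during (0, t)
    no_jump = (tau_p / total) * (1.0 - t / tau_p) * gauss(d_plus) + (tau_m / total) * (
        1.0 - t / tau_m
    ) * gauss(d_minus)
    # contribution of the uniform part of the occupation fraction PDF
    brace = (
        math.sqrt(d_minus * t) * math.exp(-x * x / (4.0 * d_minus * t))
        - math.sqrt(d_plus * t) * math.exp(-x * x / (4.0 * d_plus * t))
        - 0.5 * math.sqrt(math.pi) * ax
        * (math.erf(ax / math.sqrt(4.0 * d_plus * t)) - math.erf(ax / math.sqrt(4.0 * d_minus * t)))
    )
    one_jump = 2.0 / (math.sqrt(math.pi) * total * (d_minus - d_plus)) * brace
    return no_jump + one_jump


def integrate_a4(t, tau_p, tau_m, d_plus, d_minus, x_max=60.0, n=24000):
    """Midpoint integral of Eq. (A4) over (-x_max, x_max)."""
    h = 2.0 * x_max / n
    return sum(
        p_short_a4(-x_max + (i + 0.5) * h, t, tau_p, tau_m, d_plus, d_minus) for i in range(n)
    ) * h


for d_minus in (0.1, 5.0, 9.0):
    print(d_minus, p_short_a4(0.0, 0.5, 3.0, 6.0, 10.0, d_minus),
          integrate_a4(0.5, 3.0, 6.0, 10.0, d_minus))
```

The value at the origin grows strongly as *D*<sub>−</sub> decreases, which is the narrow central peak seen in Figure A1 for *D*<sub>−</sub> = 0.1.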

Appendix A.1.2. Long Time Regime

In the long time limit, the PDF of temporal occupation fraction is provided by Equation (28), then, according to Equation (A3), the PDF of the displacements is determined by

$$P(x,t) \sim \sqrt{\frac{\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}}{4\pi t [D\_{+} \langle \tau \rangle\_{+} + D\_{-} \langle \tau \rangle\_{-}]}}\, e^{-\frac{x^{2} (\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-})}{4t [D\_{+} \langle \tau \rangle\_{+} + D\_{-} \langle \tau \rangle\_{-}]}}.\tag{A5}$$

Thus, the Gaussian limit is also restored.

*Appendix A.2. P*(*x*, *t*) *for Exponentially Distributed Waiting Times with τ*<sup>+</sup> ≠ *τ*−

In the short time regime, we can use the uniform approximation Equation (46) in Equation (A3), and the distribution for the displacements yields

$$\begin{split} P(x,t) &\sim \frac{\langle \tau \rangle\_{+}}{\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}} \frac{e^{-\frac{t}{\langle \tau \rangle\_{+}} - \frac{x^{2}}{4D\_{+}t}}}{\sqrt{4\pi D\_{+}t}} + \frac{\langle \tau \rangle\_{-}}{\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}} \frac{e^{-\frac{t}{\langle \tau \rangle\_{-}} - \frac{x^{2}}{4D\_{-}t}}}{\sqrt{4\pi D\_{-}t}} \\ &+ \frac{1}{(\langle \tau \rangle\_{+} + \langle \tau \rangle\_{-}) (D\_{-} - D\_{+}) \sqrt{\pi}} \Bigg\{ \sqrt{4D\_{-}t}\, e^{-\frac{x^{2}}{4D\_{-}t}} - \sqrt{4D\_{+}t}\, e^{-\frac{x^{2}}{4D\_{+}t}} \\ &+ \sqrt{\pi}\,|x| \Bigg[ \mathrm{Erf} \Big( \frac{|x|}{\sqrt{4D\_{-}t}} \Big) - \mathrm{Erf} \Big( \frac{|x|}{\sqrt{4D\_{+}t}} \Big) \Bigg] \Bigg\}. \end{split} \tag{A6}$$

For the long time regime, we use *gt*(*p*+) provided by Equation (28); then, according to Equation (A3), the PDF of the displacements follows Gaussian statistics described by Equation (A5).

**Appendix B. A Complementary Deduction of** $f_t^{\pm}(T_+|1)$

In this section, we obtain Equation (23) from the definition of conditional probability. The conditional probability of *T*+, given that *N* jumps have been made, follows

$$f\_t^{\pm}(T\_+|N) = \frac{f\_t^{\pm}(T\_+,N)}{Q\_t^{\pm}(N)},\tag{A7}$$

with the distribution of jumps defined by [46]

$$Q_t^{\pm}(N) = \langle \mathbf{1}_{(t_N, t_{N+1})}(t) \rangle, \tag{A8}$$

with $\mathbf{1}_{(a,b)}(t)$ the indicator function, which is equal to one if $t \in (a, b)$ and zero if $t \notin (a, b)$. The average $\langle \cdot \rangle$ is over all the values of the $\tau_i$, with $i \in \{1, 2, \dots, N + 1\}$. In addition, $f_t^{\pm}(T_+, N)$ is the joint probability of $T_+$ and $N$, which satisfies [46]

$$f_t^{\pm}(T_+,N) = \langle \delta(y - T_+)\, \mathbf{1}_{(t_N, t_{N+1})}(t) \rangle,\tag{A9}$$

with $t_N = \tau_1 + \dots + \tau_N$ and the average $\langle \cdot \rangle$ defined as above.

Let us evaluate Equations (A8) and (A9), and therefore Equation (A7), for the case $N = 1$. It is important to notice that, for equilibrium initial conditions such as those handled in Section 2.2, when $N = 1$, the corresponding average over $\tau_1$ is given by the forward recurrence time distribution, Equation (10). Following Equation (A8), and taking the Laplace transform defined as $\hat{Q}_s^{\pm}(N) = \int_0^\infty e^{-st} Q_t^{\pm}(N)\,dt$, after simple manipulations, we obtain

$$\begin{split} \hat{Q}_s^{\pm}(1) &= \int_0^\infty e^{-s\tau_1} f_{eq}^{\pm}(\tau_1)\, d\tau_1 \int_0^\infty \left( \frac{1 - e^{-s\tau_2}}{s} \right) \psi_\mp(\tau_2)\, d\tau_2 \\ &= \left( \frac{1 - \hat{\psi}_\pm(s)}{\langle \tau \rangle_{\pm}\, s} \right) \left( \frac{1 - \hat{\psi}_\mp(s)}{s} \right) , \end{split} \tag{A10}$$

which is already the result shown in Equation (11).

For the joint distribution $f_t^{\pm}(T_+, 1)$, following Equation (A9) and taking the double Laplace transform defined as $\hat{f}_s^{\pm}(u, N) = \int_0^\infty e^{-uT_+} \int_0^\infty e^{-st} f_t^{\pm}(T_+, N)\,dt\,dT_+$, after performing the corresponding integrals for the case in which we start from "+", we get

$$\begin{split} \hat{f}_s^+(u,1) &= \int_0^\infty e^{-(s+u)\tau_1} f_{eq}^+(\tau_1)\, d\tau_1 \int_0^\infty \left(\frac{1-e^{-s\tau_2}}{s}\right) \psi_-(\tau_2)\, d\tau_2 \\ &= \left(\frac{1-\hat{\psi}_+(s+u)}{\langle \tau \rangle_+(s+u)}\right) \left(\frac{1-\hat{\psi}_-(s)}{s}\right). \end{split} \tag{A11}$$

Following the same procedure for the case when the process started from "−", we obtain

$$\hat{f}_s^-(u,1) = \left(\frac{1-\hat{\psi}_-(s)}{\langle\tau\rangle_-\, s}\right)\left(\frac{1-\hat{\psi}_+(s+u)}{s+u}\right).\tag{A12}$$

Next, we show the connection of the joint PDF, Equation (A9), with the uniform distribution of the occupation times, Equation (24). Let $\hat{f}_s^{eq}(u, 1)$ be the double Laplace transform of $f_t^{eq}(T_+, 1)$, i.e., the joint PDF of $T_+$ and one single jump, starting from equilibrium. Clearly, the former follows

$$\hat{f}\_s^{eq}(u,1) = \frac{\langle \tau \rangle\_+}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \hat{f}\_s^+(u,1) + \frac{\langle \tau \rangle\_-}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \hat{f}\_s^-(u,1). \tag{A13}$$

Using Equations (A11) and (A12) in Equation (A13), we get

$$\hat{f}_s^{eq}(u,1) = \frac{2}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \left( \frac{1 - \hat{\psi}_+(s + u)}{s + u} \right) \left( \frac{1 - \hat{\psi}_-(s)}{s} \right). \tag{A14}$$

One of the key features of our paper is found when we consider both $u$ and $s$ to be large. This corresponds to the short time limit, when $T_+$ and $t$ are of the same order. Then, we may use $\hat{\psi}_+(s + u), \hat{\psi}_-(s) \to 0$ in Equation (A14), yielding

$$\hat{f}_s^{eq}(u, 1) \sim \frac{2}{[\langle \tau \rangle_+ + \langle \tau \rangle_-](s+u)s}.\tag{A15}$$

Equation (A15) is easy to invert, and we find in the short time limit

$$f_t^{eq}(T_+,1) \sim \frac{2}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \quad \text{for } T_+ < t. \tag{A16}$$

This is the short time uniformity we have found, which in turn, as explained in Section 2.2, gives the cusp-like shape in $P(x, t)$. Equation (A16) is the last term in Equation (24), corresponding to $N = 1$ (the first two terms in Equation (24) are contributions from $N = 0$).
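The short-time uniformity can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not the simulation code used for the figures. It takes $\mathrm{Gamma}(3, 1)$ and $\mathrm{Gamma}(6, 1)$ waiting times, starts every trajectory in the "+" state, and draws the first sojourn from the forward recurrence time distribution by sampling a length-biased sojourn (a $\mathrm{Gamma}(4, 1)$ variable) and a uniform point inside it. Conditioned on exactly one jump, $T_+/t$ should then be close to uniform on $(0, 1)$, and the probability of one jump should be close to $t/\langle\tau\rangle_+$.

```python
import random


def sample_equilibrium_first_wait(shape, rng):
    """Forward recurrence time for Gamma(shape, 1) waiting times.

    The sojourn covering t = 0 is length-biased, i.e. Gamma(shape + 1, 1),
    and the observation starts at a uniformly distributed point inside it.
    """
    return rng.random() * rng.gammavariate(shape + 1.0, 1.0)


def occupation_given_one_jump(t, shape_plus, shape_minus, n_samples, seed=0):
    """Collect T_+ / t over trajectories that start in '+' and jump exactly once."""
    rng = random.Random(seed)
    collected = []
    for _ in range(n_samples):
        tau1 = sample_equilibrium_first_wait(shape_plus, rng)
        if tau1 >= t:
            continue  # N = 0: no jump before t
        tau2 = rng.gammavariate(shape_minus, 1.0)
        if tau1 + tau2 > t:  # N = 1: exactly one jump in (0, t)
            collected.append(tau1 / t)
    return collected


n_samples = 200_000
t_obs = 0.2
samples = occupation_given_one_jump(t_obs, 3.0, 6.0, n_samples)
mean_fraction = sum(samples) / len(samples)
prob_one_jump = len(samples) / n_samples
lower_third = sum(1 for p in samples if p < 1.0 / 3.0) / len(samples)
print(mean_fraction, prob_one_jump, lower_third)
```

A mean near 1/2 and a lower-third weight near 1/3 are what a uniform conditional distribution predicts. Deviations are expected to grow as the observation time approaches the mean waiting times.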

Now, we are interested in inverting Equations (A10)–(A12) and then applying the definition of conditional probability, Equation (A7). First, since we are dealing with the short time limit $t \to 0$, in Laplace space this corresponds to the limit $s \to \infty$ and $u \to \infty$. For this particular approximation, due to the definition of the Laplace transform $\hat{\psi}_{\pm}(s) = \int_0^\infty e^{-st} \psi_{\pm}(t)\,dt$, we have that $\lim_{s \to \infty} \hat{\psi}_{\pm}(s) = 0$ for a general $\psi_{\pm}(\tau)$. In this case, Equations (A10)–(A12) are approximated by

$$
\hat{Q}_s^{\pm}(1) \sim \frac{1}{\langle \tau \rangle_{\pm}\, s^2}, \tag{A17}
$$

$$
\hat{f}_s^{\pm}(u, 1) \sim \frac{1}{\langle \tau \rangle_{\pm}(s+u)s}.\tag{A18}
$$

Inverting Equation (A17) with respect to $s$ and Equation (A18) with respect to $u$ and $s$, with $0 < T_+ < t$, we obtain

$$Q_t^{\pm}(1) \sim \frac{t}{\langle \tau \rangle_{\pm}},\tag{A19}$$

$$f_t^{\pm}(T_+,1) \sim \frac{1}{\langle \tau \rangle_{\pm}}.\tag{A20}$$

Now, substituting Equations (A19) and (A20) in the conditional probability Equation (A7) for *N* = 1, we obtain

$$f_t^{\pm}(T_+|1) \sim \frac{1}{t}, \tag{A21}$$

which is the same result shown in Equation (23) of Section 2.2. As expected, since the joint distribution $f_t^{\pm}(T_+, 1)$ in Equation (A20) does not depend on time or on any other variable, it gives the uniform distribution, Equation (24), when it is used for computing the PDF of the occupation time. The same procedure for $N \geq 2$ gives a joint distribution $f_t^{\pm}(T_+, N)$ that, in the double Laplace space, is given by

$$\hat{f}_{s}^{+}(u,N) = \left( \frac{1-\hat{\psi}_{+}(s+u)}{\langle\tau\rangle_{+}(s+u)} \right) \hat{\psi}_{-}^{k}(s)\, \hat{\psi}_{+}^{k}(s+u) \left( \frac{1-\hat{\psi}_{-}(s)}{s} \right) \quad \text{if} \quad N = 2k+1, \tag{A22}$$

$$\hat{f}_{s}^{+}(u,N) = \left( \frac{1-\hat{\psi}_{+}(s+u)}{\langle\tau\rangle_{+}(s+u)} \right) \hat{\psi}_{+}^{k-1}(s+u)\, \hat{\psi}_{-}^{k}(s) \left( \frac{1-\hat{\psi}_{+}(s+u)}{s+u} \right) \quad \text{if} \quad N = 2k, \tag{A23}$$

$$\hat{f}_{s}^{-}(u,N) = \left( \frac{1-\hat{\psi}_{-}(s)}{\langle\tau\rangle_{-}\, s} \right) \hat{\psi}_{-}^{k}(s)\, \hat{\psi}_{+}^{k}(s+u) \left( \frac{1-\hat{\psi}_{+}(s+u)}{s+u} \right) \quad \text{if} \quad N = 2k+1, \tag{A24}$$

$$\hat{f}_{s}^{-}(u,N) = \left( \frac{1-\hat{\psi}_{-}(s)}{\langle\tau\rangle_{-}\, s} \right) \hat{\psi}_{-}^{k-1}(s)\, \hat{\psi}_{+}^{k}(s+u) \left( \frac{1-\hat{\psi}_{-}(s)}{s} \right) \quad \text{if} \quad N = 2k. \tag{A25}$$

In order to analyze the joint distribution, we have to deal with the analytical expression of $\psi_{\pm}(\tau)$ defined by Equation (12). For instance, for $N = 2$, after substituting Equation (13) in Equations (A23) and (A25), the dominant terms are $\hat{f}_s^{+}(u, 2) \sim [\Gamma(A_- + 1)\, C^{-}_{A_-}]/[\langle\tau\rangle_+ (s+u)^2 s^{A_- + 1}]$ and $\hat{f}_s^{-}(u, 2) \sim [\Gamma(A_+ + 1)\, C^{+}_{A_+}]/[\langle\tau\rangle_- s^2 (s+u)^{A_+ + 1}]$, respectively. Inverting the double Laplace transform gives, in each case, a positive power of $T_+$ and therefore of $t$, e.g., $f_t^{\pm}(T_+, 2) \sim T_+^{A_{\mp} + 1}$. Since $A_{\mp} \geq 0$ is a non-negative integer, this correction term is negligible for $t \to 0$ ($T_+ \to 0$), and so are the remaining terms in $f_t^{\pm}(T_+, N)$ with $N > 2$. We conclude that, for equilibrium initial conditions, the uniformity of the PDF of the occupation time (fraction) in the short time limit is always preserved, as long as $\psi_{\pm}(\tau)$ is analytic.

#### *Appendix B.1. Non-Equilibrium Initial Conditions*

Still employing the joint distribution $f_t^{\pm}(T_+, N)$, it can be shown as follows that, for non-equilibrium initial conditions, a waiting time PDF that in Laplace space behaves as $\hat{\psi}_{\pm}(s) \sim 1/s$ for short times (large $s$) can lead to a uniform distribution of the occupation time and therefore to a tent shape in $P(x, t)$.

In the case of non-equilibrium initial conditions, starting either from $D_+$ or from $D_-$, the averages over $\tau_1$ within $Q_t^{\pm}(N)$, Equation (A8), and the joint distribution $f_t^{\pm}(T_+, N)$, Equation (A9), are no longer given by $f_{eq}^{\pm}(\tau_1)$, Equation (10). Instead, these averages are performed using the waiting time PDF $\psi_{\pm}(\tau_1)$. Following the same procedure as above for the case $N = 1$, the double Laplace transform of the joint distribution $f_t^{\pm}(T_+, 1)$ yields

$$
\hat{f}_s^{\pm}(u, 1) = \hat{\psi}_{\pm}(s + u) \left( \frac{1 - \hat{\psi}_{\mp}(s)}{s} \right). \tag{A26}
$$

For a system with non-equilibrium initial conditions, in order to recover the uniform distribution in the PDF of *T*+, it is enough to ask that, for large *s* (short times), the PDF of the waiting times in the Laplace space follows

$$
\hat{\psi}\_{\pm}(s) \sim \frac{1}{s}.\tag{A27}
$$

Substituting Equation (A27) in Equation (A26), we have that $\hat{f}_s^{\pm}(u, 1) \sim 1/[(s + u)s]$. This implies in real space, for $0 < T_+ < t$, that the joint distribution follows $f_t^{\pm}(T_+, 1) \sim 1$, and therefore, because of Equation (3), we also have a uniform distribution for $T_+$. As an example of a model in which $\hat{\psi}_{\pm}(s)$ goes as Equation (A27) and for which, as expected for non-equilibrium initial conditions, $\psi_{\pm}(\tau) \neq f_{eq}^{\pm}(\tau_1)$, we have the case in which the PDF of waiting times is the sum of two exponential functions, e.g., $\psi_{\pm}(\tau) = (1/2)\{[\exp(-\tau/C_{1\pm})/C_{1\pm}] + [\exp(-\tau/C_{2\pm})/C_{2\pm}]\}$, with $C_{1\pm}, C_{2\pm} > 0$. In this case, for $s \to \infty$, $\hat{\psi}_{\pm}(s) \sim [C_{1\pm} + C_{2\pm}]/[2C_{1\pm}C_{2\pm}s]$ and $f_{eq}^{\pm}(\tau_1) = [\exp(-\tau_1/C_{1\pm}) + \exp(-\tau_1/C_{2\pm})]/[C_{1\pm} + C_{2\pm}]$. For this latter case, since $\hat{\psi}_{\pm}(s)$ satisfies Equation (A27), following the same analysis as above, we find that the joint distribution $f_t^{\pm}(T_+, 1)$ is uniform.
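The large-$s$ behavior quoted for the two-exponential example is easy to confirm numerically. The following sketch uses the exact Laplace transform of each exponential component, $1/(1 + Cs)$, and compares $s\,\hat{\psi}(s)$ with the coefficient $(C_1 + C_2)/(2C_1C_2)$. The time scales chosen are hypothetical and only serve as an illustration.

```python
def psi_hat(s, c1, c2):
    """Laplace transform of psi(tau) = (1/2) [exp(-tau/c1)/c1 + exp(-tau/c2)/c2]."""
    return 0.5 * (1.0 / (1.0 + c1 * s) + 1.0 / (1.0 + c2 * s))


def large_s_coefficient(c1, c2):
    """Coefficient of 1/s in the large-s expansion of psi_hat."""
    return (c1 + c2) / (2.0 * c1 * c2)


c1, c2 = 0.5, 2.0  # hypothetical time scales, chosen only for illustration
ratios = [s * psi_hat(s, c1, c2) / large_s_coefficient(c1, c2) for s in (1e2, 1e4, 1e6)]
print(ratios)
```

The ratios approach one from below as $s$ grows, which is the $1/s$ decay required by Equation (A27) up to a constant prefactor.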

In Appendix D, we show that, for a system with non-equilibrium initial conditions and exponentially distributed waiting times with equal or different mean values, the distribution of the occupation time (fraction) is also uniform. This is mainly because, for exponentially distributed waiting times, the forward recurrence time distribution in equilibrium, $f_{eq}^{\pm}(\tau_1)$ in Equation (10), is equal to $\psi_{\pm}(\tau_1)$, as in the non-equilibrium case. Furthermore, for exponentially distributed sojourn times, Equation (A27) is also satisfied, since their PDF in Laplace space follows $\hat{\psi}_{\pm}(s) \sim 1/[\langle\tau\rangle_{\pm} s]$ for $s \to \infty$. By using Equation (3), this gives the uniform distribution shown in Equations (A35) and (A40).

For $\hat{\psi}_{\pm}(s)$ given in Equation (A27), the correction terms for $N \geq 2$, following the same analysis as in the case of equilibrium initial conditions, yield elements of the form $f_t^{\pm}(T_+, N) \sim T_+^{N-1}$, which are negligible for $t \to 0 \iff T_+ \to 0$. Thus, they do not contribute to the PDF of the occupation fraction.

#### **Appendix C.** *P***(***x***,***t***) from Simulations with Uniform and Gamma Distributed Waiting Times within the Complete Range of** *x*

The left panel of Figure 3 displays the cusp of $P(x, t)$ for short times and small displacements. In Figure A2, we show $P(x, t)$ on a semi-log scale for the whole span of $x$, obtained from simulations (with the same parameters as above) of a two-state model with uniform (red triangles) and gamma (blue squares) distributed waiting times. In each case, we compare the normalized histogram of the simulation data with the short time analytical formula, Equation (26), finding a perfect agreement. As we can see, the cusp is located at the origin, and, for large displacements, Gaussianity is recovered.

**Figure A2.** Distribution of displacements *P*(*x*, *t*) in semi-log scale, obtained from simulations, of a two state system with uniform and gamma distributed waiting times within the short time limit and displaying the whole span of *x*. *P*(*x*, *t*) for uniformly distributed waiting times is shown in red triangles. In addition, the case of gamma distributed waiting times is shown in blue squares. We employed the same set of parameters as those used in Figure 3 in the left panel. Both cases fit with Equation (26) (red and blue solid lines).

#### **Appendix D. PDF of Occupation Times for Exponentially Distributed Waiting Times and Non-Equilibrium Initial Conditions**

We consider the case of a system with exponentially distributed waiting times, with $\langle\tau\rangle_+ = \langle\tau\rangle_-$ in Equation (31). Here, we address the situation with non-equilibrium initial conditions. In particular, the initial conditions are such that the probability of starting in the state with $D_+$ is 1 and the probability of starting in the state with $D_-$ is 0. The PDF of $T_+$ then satisfies

$$f\_t(T\_+) = f\_t^+(T\_+) = \sum\_{N=0}^{\infty} f\_t^+(T\_+, N). \tag{A28}$$

With $f_t^{+}(T_+, N)$ given by Equation (A9), explicitly for this case, we have [46]

$$\begin{split} f_t^+(T_+, 2k+1) &= \int \dots \int \delta\Big(T_+ - \sum_{i=1\,(\text{odd})}^{2k+1} \tau_i\Big)\, \mathbf{1}_{(t_{2k+1}, t_{2k+2})}(t)\, \psi(\tau_1) \psi(\tau_2) \\ &\quad \dots\, \psi(\tau_{2k+2})\, d\tau_1 d\tau_2 \dots d\tau_{2k+2} \quad \text{if} \quad N = 2k+1, \\ f_t^+(T_+, 2k) &= \int \dots \int \delta\Big(T_+ - \sum_{i=1\,(\text{odd})}^{2k-1} \tau_i - \tau^*\Big)\, \mathbf{1}_{(t_{2k}, t_{2k+1})}(t)\, \psi(\tau_1) \psi(\tau_2) \\ &\quad \dots\, \psi(\tau_{2k+1})\, d\tau_1 d\tau_2 \dots d\tau_{2k+1} \quad \text{if} \quad N = 2k, \end{split} \tag{A29}$$

with $\mathbf{1}_{(a,b)}(t)$ the indicator function, equal to 1 if $t \in (a, b)$ and 0 if $t \notin (a, b)$. We work with the double Laplace transform $\mathcal{L}\{f_t^{+}(T_+, N)\} = \hat{f}_s^{+}(u, N)$, with $t \Leftrightarrow s$ and $T_+ \Leftrightarrow u$, which is given by $\hat{f}_s^{+}(u, N) = \int_0^\infty e^{-uT_+} \int_0^\infty e^{-st} f_t^{+}(T_+, N)\,dt\,dT_+$. Thus, taking the double Laplace transform of Equation (A29), after substitution of $\psi(\tau)$, we have

$$\begin{aligned} \hat{f}_s^+(u, 2k+1) &= \hat{\psi}^{k+1}(s+u)\,\hat{\psi}^k(s) \left(\frac{1-\hat{\psi}(s)}{s}\right) \quad \text{if} \quad N=2k+1,\\ \hat{f}_s^+(u, 2k) &= \hat{\psi}^k(s+u)\,\hat{\psi}^k(s) \left(\frac{1-\hat{\psi}(s+u)}{s+u}\right) \quad \text{if} \quad N=2k. \end{aligned} \tag{A30}$$

Thus, using Equation (A30) for summing over all the values of *N* in Equation (A28), we get

$$\hat{f}\_s(u) = \begin{pmatrix} \hat{\psi}(s+u)\frac{1-\hat{\psi}(s)}{s} + \frac{1-\hat{\psi}(s+u)}{s+u} \end{pmatrix} \frac{1}{1-\hat{\psi}(s+u)\hat{\psi}(s)}.\tag{A31}$$

For exponentially distributed waiting times, $\hat{\psi}(s) = 1/(1 + \langle\tau\rangle s)$. By substituting $\hat{\psi}(s)$ in Equation (A31), we get that the double Laplace transform of Equation (A28) is given by

$$\hat{f}\_s(u) = \frac{2 + \langle \tau \rangle s}{2s + \langle \tau \rangle s^2 + (1 + \langle \tau \rangle s)u}. \tag{A32}$$
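The algebra leading from Equations (A30) and (A31) to Equation (A32) can be double-checked with exact rational arithmetic. The sketch below evaluates the general expression and the closed form at a few arbitrarily chosen rational points (these points are assumptions for illustration) and also sums the series over $N$ term by term.

```python
from fractions import Fraction


def psi_hat(s, tau):
    """Laplace transform of an exponential waiting time PDF with mean tau."""
    return 1 / (1 + tau * s)


def f_hat_general(s, u, tau):
    """Equation (A31) evaluated with exponential waiting times."""
    p_su, p_s = psi_hat(s + u, tau), psi_hat(s, tau)
    numerator = p_su * (1 - p_s) / s + (1 - p_su) / (s + u)
    return numerator / (1 - p_su * p_s)


def f_hat_closed(s, u, tau):
    """Equation (A32)."""
    return (2 + tau * s) / (2 * s + tau * s ** 2 + (1 + tau * s) * u)


def f_hat_partial_sum(s, u, tau, k_max):
    """Sum of Equation (A30) over N = 0, 1, ..., 2 * k_max + 1."""
    p_su, p_s = psi_hat(s + u, tau), psi_hat(s, tau)
    total = Fraction(0)
    for k in range(k_max + 1):
        even = p_su ** k * p_s ** k * (1 - p_su) / (s + u)
        odd = p_su ** (k + 1) * p_s ** k * (1 - p_s) / s
        total += even + odd
    return total


# Arbitrary rational test points (s, u, <tau>), chosen only for illustration.
points = [(Fraction(1, 3), Fraction(2, 5), Fraction(7, 4)),
          (Fraction(5), Fraction(1, 2), Fraction(3))]
checks = [f_hat_general(s, u, tau) == f_hat_closed(s, u, tau) for s, u, tau in points]
print(checks)
```

Because `Fraction` arithmetic is exact, equality here confirms the simplification at the sampled points rather than only to floating-point precision.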

By the same procedures used in Appendix F, the inversion of the double Laplace transform of Equation (A32) yields

$$\begin{split} f_t(T_+) &= \delta(t - T_+)\, e^{-\frac{t}{\langle \tau \rangle}} + \frac{e^{-\frac{t}{\langle \tau \rangle}}}{\langle \tau \rangle}\, {}_0\tilde{F}_1 \left( ; 1; \frac{T_+(t - T_+)}{\langle \tau \rangle^2} \right) \\ &+ \frac{e^{-\frac{t}{\langle \tau \rangle}}}{\langle \tau \rangle^2}\, T_+\, {}_0\tilde{F}_1 \left( ; 2; \frac{T_+(t - T_+)}{\langle \tau \rangle^2} \right). \end{split} \tag{A33}$$

Employing the identity $I_\nu(y) = (y/2)^{\nu}\, {}_0\tilde{F}_1(; \nu + 1; y^2/4)$ [61] and changing variables, we obtain the PDF of the occupation fraction, which follows

$$\begin{split} g_t(p_+) &= \delta(1 - p_+)\, e^{-\frac{t}{\langle \tau \rangle}} + \frac{t}{\langle \tau \rangle} \Bigg\{ I_0 \left( \frac{2t}{\langle \tau \rangle} \sqrt{p_+(1 - p_+)} \right) \\ &+ p_+\, \frac{I_1 \left( \frac{2t}{\langle \tau \rangle} \sqrt{p_+(1 - p_+)} \right)}{\sqrt{p_+(1 - p_+)}} \Bigg\}\, e^{-\frac{t}{\langle \tau \rangle}}. \end{split} \tag{A34}$$

By taking the series expansion of Equation (A34) in the limit *t* −→ 0, the PDF of *p*<sup>+</sup> can be approximated by

$$
g_t(p_+) \sim \delta(1 - p_+)\, e^{-\frac{t}{\langle \tau \rangle}} + \frac{t}{\langle \tau \rangle}. \tag{A35}
$$

Therefore, for 1 > *p*+ > 0, the PDF of *p*+ follows a uniform distribution (see the left panel of Figure A3), as in the case of equilibrium initial conditions (Equation (35)).
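A short numerical sketch can illustrate both the normalization of Equation (A34) and the quality of the uniform approximation. It assumes $\langle\tau\rangle = 1$, evaluates the modified Bessel functions from their power series (so that only the standard library is needed), and integrates the continuous part with a midpoint rule. The function names are illustrative.

```python
import math


def bessel_i(nu, y, terms=40):
    """Modified Bessel function I_nu(y) for integer nu >= 0, from its power series."""
    return sum((y / 2.0) ** (2 * k + nu) / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))


def g_continuous(p, t, tau):
    """Continuous part of g_t(p_+) in Equation (A34), for 0 < p < 1."""
    root = math.sqrt(p * (1.0 - p))
    y = 2.0 * t * root / tau
    return (t / tau) * (bessel_i(0, y) + p * bessel_i(1, y) / root) * math.exp(-t / tau)


def total_probability(t, tau, n=4000):
    """Delta weight at p_+ = 1 plus the midpoint-rule integral of the continuous part."""
    integral = sum(g_continuous((i + 0.5) / n, t, tau) for i in range(n)) / n
    return math.exp(-t / tau) + integral


print(total_probability(2.0, 1.0))
print([g_continuous(p, 0.01, 1.0) / 0.01 for p in (0.1, 0.5, 0.9)])
```

The first line should be close to one for any $t$. The second line shows the ratio of the continuous part to $t/\langle\tau\rangle$ at a short time, which should be close to one across the interval if the uniform approximation holds.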

**Figure A3.** Left: $g_t(p_+)$, Equation (A34), for $\langle\tau\rangle = 1$ and $t \in \{0.1, 0.5, 1, 2, 5, 10\}$ with non-equilibrium initial conditions (starting from state "+"). The uniform approximation of $g_t(p_+)$, Equation (A35), for $t = 0.1$ is shown in black circles. Right: $g_t(p_+)$, Equation (A39), for $\langle\tau\rangle_+ = 1$, $\langle\tau\rangle_- = 5$ and $t \in \{0.1, 0.5, 2, 5, 10, 20\}$ with non-equilibrium initial conditions (starting from state "+"). The uniform approximation of $g_t(p_+)$, Equation (A40), for $t = 0.1$ is shown in black circles.

*P*(*x*, *t*) is obtained by exploiting the uniform approximation of *gt*(*p*+) in Equation (A35), i.e.,

$$P(x,t) \sim \frac{e^{-\frac{t}{\langle \tau \rangle} - \frac{x^2}{4D_+t}}}{\sqrt{4\pi D_+t}} + \frac{t}{\langle \tau \rangle} \left\{ \frac{2e^{-\frac{x^2}{4D_+t}}}{\sqrt{4\pi D_+t}} - \frac{|x|}{2D_+t} \left[1 - \mathrm{Erf}\left(\frac{|x|}{\sqrt{4D_+t}}\right)\right] \right\}.\tag{A36}$$

Equation (A36) follows the same structure as Equation (37), i.e., the case with equilibrium initial conditions.

When $\langle\tau\rangle_+ \neq \langle\tau\rangle_-$, we have to include $\psi_{\pm}(\tau)$ and $\hat{\psi}_{\pm}(s)$ in Equations (A29) and (A30). Summing the resulting expressions in Equation (A28), we obtain

$$\hat{f}\_s(u) = \begin{pmatrix} \hat{\psi}\_+(s+u)\frac{1-\hat{\psi}\_-(s)}{s} + \frac{1-\hat{\psi}\_+(s+u)}{s+u} \end{pmatrix} \frac{1}{1-\hat{\psi}\_+(s+u)\hat{\psi}\_-(s)}.\tag{A37}$$

By employing $\hat{\psi}_{\pm}(s) = 1/(1 + \langle\tau\rangle_{\pm} s)$ in Equation (A37), we obtain that the double Laplace transform of the PDF of $T_+$ is provided by [49]

$$\hat{f}_s(u) = \frac{\langle \tau \rangle_+ + \langle \tau \rangle_- + \langle \tau \rangle_+ \langle \tau \rangle_- s}{\langle \tau \rangle_- s + \langle \tau \rangle_+ (1 + \langle \tau \rangle_- s)(s + u)}.\tag{A38}$$

The inverse Laplace transform of Equation (A38) is obtained by the same procedures explained above and in Appendix F. Eventually, we obtain

$$\begin{split} g_{t}(p_{+}) &= \delta(1-p_{+})\,e^{-\frac{t}{\langle\tau\rangle_{+}}} + \frac{t}{\langle\tau\rangle_{+}} \Bigg\{ I_{0} \Bigg( 2t\sqrt{\frac{p_{+}(1-p_{+})}{\langle\tau\rangle_{+}\langle\tau\rangle_{-}}} \Bigg) \\ &+ \sqrt{\frac{\langle\tau\rangle_{+}}{\langle\tau\rangle_{-}}}\,p_{+}\, \frac{I_{1}\Big( 2t\sqrt{\frac{p_{+}(1-p_{+})}{\langle\tau\rangle_{+}\langle\tau\rangle_{-}}} \Big)}{\sqrt{p_{+}(1-p_{+})}} \Bigg\}\, e^{-\frac{tp_{+}}{\langle\tau\rangle_{+}} - \frac{t(1-p_{+})}{\langle\tau\rangle_{-}}}. \end{split} \tag{A39}$$

We recover Equation (A34) when $\langle\tau\rangle_+ = \langle\tau\rangle_- = \langle\tau\rangle$. In the short time limit, Equation (A39) also follows a uniform distribution for $1 > p_+ > 0$; see the right panel of Figure A3. In this case, the PDF of $p_+$ is given by

$$
g_t(p_+) \sim \delta(1 - p_+)\, e^{-\frac{t}{\langle \tau \rangle_+}} + \frac{t}{\langle \tau \rangle_+}.\tag{A40}
$$

Thus, for exponentially distributed waiting times, for equilibrium and non-equilibrium conditions, in the short time regime, the PDF of the occupation fraction is always uniform. This feature is only valid for exponentially distributed waiting times, and it is not necessarily fulfilled for other distributions of waiting times.

*P*(*x*, *t*) for the specific case of non-equilibrium initial conditions and exponentially distributed waiting times is

$$P(x,t) \sim \frac{e^{-\frac{t}{\langle \tau \rangle_{+}} - \frac{x^{2}}{4D_{+}t}}}{\sqrt{4\pi D_{+}t}} + \frac{t}{\langle \tau \rangle_{+}} \left\{ \frac{e^{-\frac{x^{2}}{4D_{+}t}}}{\sqrt{\pi D_{+}t}} - \frac{|x|}{2D_{+}t} \left[ 1 - \mathrm{Erf}\left(\frac{|x|}{\sqrt{4D_{+}t}}\right) \right] \right\}. \tag{A41}$$

#### **Appendix E. Deduction of** $g_t(p_+)$ **for Waiting Times with Similar Mean Waiting Times**

We can use the results found in [52,53] for inverting the double Laplace transform of $S_t$ as provided by Equation (33). For a Fourier–Laplace transform, $\hat{\kappa}(\omega, s) = \int_{-\infty}^{\infty} \int_0^\infty e^{i\omega\tilde{x}} e^{-st} \kappa(\tilde{x}, t)\,dt\,d\tilde{x}$, of the form

$$
\hat{\kappa}(\omega, s) = \frac{2\bar{\lambda} + s}{s^2 + 2\bar{\lambda}s + c^2 \omega^2},
\tag{A42}
$$

the inversion yields the result [52,53]

$$\kappa(\tilde{x},t) = \frac{1}{2}e^{-\frac{t}{\tilde{\lambda}}} \left\{ \delta(\tilde{x} - ct) + \delta(\tilde{x} + ct) \right\} + \frac{e^{-\frac{t}{\tilde{\lambda}}}}{2\tilde{\lambda} c}\, \Theta(ct - |\tilde{x}|) \left[ I_0(z(t)) + \frac{t}{\tilde{\lambda} z(t)} I_1(z(t)) \right], \tag{A43}$$

with $z(t) = \frac{1}{\tilde{\lambda} c} \sqrt{c^2 t^2 - \tilde{x}^2}$ and $\tilde{\lambda} = 1/\bar{\lambda}$.

Now, we can compare the double Laplace transform of $S_t$ given in Equation (33) with the result shown in Equation (A42). Concretely, we can relate the Laplace variable $v$ to the Fourier variable $\omega$ in $\hat{\kappa}(\omega, s)$. Since both the Laplace and the Fourier transforms have exponential kernels, setting $v = ic\omega$ makes the former equivalent to a Fourier transform, and we can use the expression given by Equation (A43) for inverting $\hat{\phi}_s(v)$. In our case, $c = 1$ and $\tilde{\lambda} = \langle\tau\rangle$, so $S_t$ and $\omega$ are Fourier conjugates, and the inversion of $\hat{\phi}_s(v)$ results in

$$\begin{split} \phi_{t}(S_{t}) &= \frac{1}{2}e^{-\frac{t}{\langle \tau \rangle}} \Big\{ \delta(S_{t}-t) + \delta(S_{t}+t) \Big\} \\ &+ \frac{\Theta(t-|S_{t}|)}{2\langle \tau \rangle}\, e^{-\frac{t}{\langle \tau \rangle}} \Bigg[ I_{0} \Bigg( \frac{\sqrt{t^{2}-S_{t}^{2}}}{\langle \tau \rangle} \Bigg) + \frac{t\, I_{1}\Big(\frac{\sqrt{t^{2}-S_{t}^{2}}}{\langle \tau \rangle}\Big)}{\sqrt{t^{2}-S_{t}^{2}}} \Bigg]. \end{split} \tag{A44}$$

By changing variables, $S_t = 2T_+ - t = 2p_+ t - t$, we obtain Equation (34) in a straightforward manner. The results in Equation (34) can also be obtained by the inversion of the double Laplace transform ($t \Leftrightarrow s$ and $T_+ \Leftrightarrow u$) of the PDF of $T_+$, Equation (3) (see Appendix F). In this case, the double Laplace transform of the PDF of $T_+$ is given by

$$\hat{f}\_s(u) = \frac{4 + 2\langle \tau \rangle s + \langle \tau \rangle u}{2\langle \tau \rangle s^2 + 4s + (2 + 2\langle \tau \rangle s)u}. \tag{A45}$$

Finally, we mention that the moments of *T*+ and therefore *p*+ can be obtained by expanding Equation (A45) in powers of *u* as

$$\hat{f}_{s}(u) = \frac{1}{s} - \frac{1}{2s^2}u + \frac{1 + \langle \tau \rangle s}{2s^3(2 + \langle \tau \rangle s)}u^2 + O(u^3). \tag{A46}$$

The first two moments of *T*+ are then

$$
\langle T_{+} \rangle \sim \frac{t}{2}, \tag{A47}
$$

$$
\langle T_+^2 \rangle \sim \left(\frac{\langle \tau \rangle}{4} + \frac{t}{4}\right)t + \frac{\langle \tau \rangle^2}{8} \Big(e^{-\frac{2t}{\langle \tau \rangle}} - 1\Big). \tag{A48}
$$

For $p_+ = T_+/t$, we obtain

$$
\langle p_{+} \rangle \sim \frac{1}{2}, \tag{A49}
$$

$$
\langle p_+^2 \rangle \sim \frac{1}{4} + \frac{\langle \tau \rangle}{4t} + \frac{\langle \tau \rangle^2}{8t^2} \left( e^{-\frac{2t}{\langle \tau \rangle}} - 1 \right), \tag{A50}
$$

$$\mathrm{Var}(p_+) \sim \frac{\langle \tau \rangle}{4t} + \frac{\langle \tau \rangle^2}{8t^2} \Big( e^{-\frac{2t}{\langle \tau \rangle}} - 1 \Big). \tag{A51}$$
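The second moment in Equation (A50) can be compared with a direct simulation of the two-state process. The sketch below assumes exponential sojourn times with a common mean $\langle\tau\rangle = 1$, an equilibrium start in which both states are equally likely, and an observation time $t = 2$. It relies on the memoryless property, so the first sojourn is drawn from the same exponential distribution as the others. All names and parameter values are illustrative.

```python
import math
import random


def second_moment_exact(t, tau):
    """Equation (A50) for <p_+^2> with equal mean waiting times."""
    return 0.25 + tau / (4.0 * t) + tau ** 2 / (8.0 * t ** 2) * (math.exp(-2.0 * t / tau) - 1.0)


def simulate_fraction(t, tau, rng):
    """One trajectory of the two-state process; returns p_+ = T_+ / t."""
    in_plus = rng.random() < 0.5  # equilibrium start: both states equally likely
    clock, time_plus = 0.0, 0.0
    while clock < t:
        wait = min(rng.expovariate(1.0 / tau), t - clock)  # truncate at the observation time
        if in_plus:
            time_plus += wait
        clock += wait
        in_plus = not in_plus
    return time_plus / t


rng = random.Random(1)
n_traj = 100_000
fractions_sim = [simulate_fraction(2.0, 1.0, rng) for _ in range(n_traj)]
mean_sim = sum(fractions_sim) / n_traj
second_sim = sum(p * p for p in fractions_sim) / n_traj
print(mean_sim, second_sim, second_moment_exact(2.0, 1.0))
```

For very short times the exact expression tends to 1/2, as expected when $p_+$ is either 0 or 1 with equal probability, and for long times the variance decays as $\langle\tau\rangle/(4t)$.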

**Appendix F. Deduction of** $g_t(p_+)$ **for Waiting Times with** $\langle\tau\rangle_+ \neq \langle\tau\rangle_-$

Here, we show the procedure for obtaining Equation (45) in Section 2.3.2. Starting from the double Laplace transform of the PDF of *T*+ given by Equation (44), first by inverting with respect to *u* ⇐⇒ *T*+, we get

$$\begin{split} \hat{f}_{s}(T_{+}) &= \frac{\langle \tau \rangle_{-}^{2}\, \delta(T_{+})}{(\langle \tau \rangle_{+} + \langle \tau \rangle_{-}) (1 + \langle \tau \rangle_{-} s)} \\ &+ \frac{(\langle \tau \rangle_{+} + \langle \tau \rangle_{-} + \langle \tau \rangle_{+} \langle \tau \rangle_{-} s)^{2}}{\langle \tau \rangle_{+} (\langle \tau \rangle_{+} + \langle \tau \rangle_{-}) (1 + \langle \tau \rangle_{-} s)^{2}} \exp\left[-T_{+} s\, \frac{\langle \tau \rangle_{+} + \langle \tau \rangle_{-} + \langle \tau \rangle_{+} \langle \tau \rangle_{-} s}{\langle \tau \rangle_{+} (1 + \langle \tau \rangle_{-} s)}\right]. \end{split} \tag{A52}$$

The exponent in Equation (A52) can be written as $-T_+ s \left[1 + \frac{\langle\tau\rangle_-}{\langle\tau\rangle_+ + \langle\tau\rangle_+\langle\tau\rangle_- s}\right]$. The inversion of Equation (A52) with respect to $s \Leftrightarrow t$ can be expressed as

$$f_t(T_+) = \frac{\langle \tau \rangle_-\, e^{-\frac{t}{\langle \tau \rangle_-}}}{\langle \tau \rangle_+ + \langle \tau \rangle_-}\, \delta(T_+) + \mathcal{L}^{-1} \{\hat{q}(s)\hat{h}(s)\}. \tag{A53}$$

Thus, the inversion of the second term in Equation (A53) is given by the convolution theorem, $\mathcal{L}^{-1}\{\hat{q}(s)\hat{h}(s)\} = \int_0^t \mathcal{L}^{-1}\{\hat{q}(s)\}\big|_{t-t'}\, \mathcal{L}^{-1}\{\hat{h}(s)\}\big|_{t'}\, dt'$, with $\hat{q}(s) = \frac{(\langle\tau\rangle_+ + \langle\tau\rangle_- + \langle\tau\rangle_+\langle\tau\rangle_- s)^2}{\langle\tau\rangle_+ (\langle\tau\rangle_+ + \langle\tau\rangle_-)(1 + \langle\tau\rangle_- s)^2}$ and $\hat{h}(s) = e^{-T_+ s}\, e^{-\frac{T_+ \langle\tau\rangle_- s}{\langle\tau\rangle_+ + \langle\tau\rangle_+\langle\tau\rangle_- s}}$. The inverse Laplace transform of $\hat{q}(s)$ is given by

$$\mathcal{L}^{-1}\{\hat{q}(s)\} = \frac{2e^{-\frac{t}{\langle\tau\rangle_-}}}{\langle\tau\rangle_+ + \langle\tau\rangle_-} + \frac{t\, e^{-\frac{t}{\langle\tau\rangle_-}}}{\langle\tau\rangle_+ (\langle\tau\rangle_+ + \langle\tau\rangle_-)} + \frac{\langle\tau\rangle_+\, \delta(t)}{\langle\tau\rangle_+ + \langle\tau\rangle_-}. \tag{A54}$$

The inverse Laplace transform of $\hat{h}(s)$ can be obtained by rewriting the exponent in the second factor of $\hat{h}(s)$ as $-\frac{T_+ \langle\tau\rangle_- s}{\langle\tau\rangle_+ + \langle\tau\rangle_+\langle\tau\rangle_- s} = -\frac{T_+}{\langle\tau\rangle_+} + \frac{T_+ \langle\tau\rangle_-}{\langle\tau\rangle_+\langle\tau\rangle_- + \langle\tau\rangle_-^2 \langle\tau\rangle_+ s}$; then we obtain

$$\begin{split} \mathcal{L}^{-1}\{\hat{h}(s)\} &= \mathcal{L}^{-1}\Big\{e^{-\frac{T_{+}\langle\tau\rangle_{-}s}{\langle\tau\rangle_{+} + \langle\tau\rangle_{+} \langle\tau\rangle_{-} s}}\Big\}\Big|_{t-T_{+}} \Theta(t-T_{+}) \\ &= e^{-\frac{T_{+}}{\langle\tau\rangle_{+}}} \mathcal{L}^{-1}\Big\{e^{\frac{T_{+}\langle\tau\rangle_{-}}{\langle\tau\rangle_{+}\langle\tau\rangle_{-} + \langle\tau\rangle_{-}^{2} \langle\tau\rangle_{+} s}}\Big\}\Big|_{t-T_{+}} \Theta(t-T_{+}) \\ &= e^{-\frac{T_{+}}{\langle\tau\rangle_{+}}} \Bigg[e^{-\frac{(t-T_{+})}{\langle\tau\rangle_{-}}} \sqrt{\frac{T_{+}}{\langle\tau\rangle_{+} \langle\tau\rangle_{-} (t-T_{+})}}\,I_{1}\Bigg(2\sqrt{\frac{T_{+} (t-T_{+})}{\langle\tau\rangle_{+} \langle\tau\rangle_{-}}}\Bigg) \\ & \qquad + \frac{e^{-\frac{(t-T_{+})}{\langle\tau\rangle_{-}}}}{\langle\tau\rangle_{+} \langle\tau\rangle_{-}^{2}}\, \delta\left(\frac{t-T_{+}}{\langle\tau\rangle_{+} \langle\tau\rangle_{-}^{2}}\right) \Bigg] \Theta(t-T_{+}). \end{split} \tag{A55}$$

Substituting Equations (A54) and (A55) in Equation (A53) and integrating, we obtain

$$\begin{split} f_{t}(T_{+}) &= \frac{\langle \tau \rangle_{-}\, e^{-\frac{t}{\langle \tau \rangle_{-}}}}{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}}\, \delta(T_{+}) + \frac{\langle \tau \rangle_{+}\, e^{-\frac{t}{\langle \tau \rangle_{+}}}}{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}}\, \delta(t - T_{+}) \\ &+ \Bigg\{ \frac{2}{\langle \tau \rangle_{+} + \langle \tau \rangle_{-}}\, {}_0\tilde{F}_{1} \left( ; 1; \frac{T_{+}(t - T_{+})}{\langle \tau \rangle_{+} \langle \tau \rangle_{-}} \right) \\ &+ \left[ \frac{t - T_{+}}{\langle \tau \rangle_{+} (\langle \tau \rangle_{+} + \langle \tau \rangle_{-})} + \frac{T_{+}}{\langle \tau \rangle_{-} (\langle \tau \rangle_{+} + \langle \tau \rangle_{-})} \right] {}_0\tilde{F}_{1} \left( ; 2; \frac{T_{+}(t - T_{+})}{\langle \tau \rangle_{+} \langle \tau \rangle_{-}} \right) \Bigg\}\, e^{-\frac{T_{+}}{\langle \tau \rangle_{+}} - \frac{(t - T_{+})}{\langle \tau \rangle_{-}}}. \end{split} \tag{A56}$$

Employing the identity $I_\nu(y) = (y/2)^{\nu}\, {}_0\tilde{F}_1(; \nu + 1; y^2/4)$ [61] and changing variables, we obtain the form of $g_t(p_+)$ provided by Equation (45). The same procedure can be employed for a system with equal mean waiting times, by inverting the double Laplace transform in Equation (A45) to obtain the $g_t(p_+)$ shown in Equation (34). It also applies to the non-equilibrium cases treated in Appendix D; see Equations (A34) and (A39).

Finally, we give the first two moments of $T_+$ and $p_+$. Proceeding as in Appendix E, we obtain the moments of $T_+$ by expanding Equation (A37) in powers of $u$, which yields

$$
\langle T_+ \rangle \sim \frac{\langle \tau \rangle_+ t}{\langle \tau \rangle_+ + \langle \tau \rangle_-},
\tag{A57}
$$

$$
\langle T_+^2 \rangle \sim \frac{\langle \tau \rangle_+^2 t^2}{(\langle \tau \rangle_+ + \langle \tau \rangle_-)^2} + \frac{2 \langle \tau \rangle_+^2 \langle \tau \rangle_-^2 t}{(\langle \tau \rangle_+ + \langle \tau \rangle_-)^3} + \frac{2 \langle \tau \rangle_+^3 \langle \tau \rangle_-^3}{(\langle \tau \rangle_+ + \langle \tau \rangle_-)^4} \left( e^{-\frac{(\langle \tau \rangle_+ + \langle \tau \rangle_-)t}{\langle \tau \rangle_+ \langle \tau \rangle_-}} - 1 \right). \tag{A58}
$$

Therefore, since $p_+ = T_+/t$, the moments of $p_+$ are

$$\langle p_+ \rangle \sim \frac{\langle \tau \rangle_+}{\langle \tau \rangle_+ + \langle \tau \rangle_-}, \tag{A59}$$

$$\langle p_+^2 \rangle \sim \frac{\langle \tau \rangle_+^2}{(\langle \tau \rangle_+ + \langle \tau \rangle_-)^2} + \frac{2 \langle \tau \rangle_+^2 \langle \tau \rangle_-^2}{t (\langle \tau \rangle_+ + \langle \tau \rangle_-)^3} + \frac{2 \langle \tau \rangle_+^3 \langle \tau \rangle_-^3}{t^2 (\langle \tau \rangle_+ + \langle \tau \rangle_-)^4} \left( e^{-\frac{(\langle \tau \rangle_+ + \langle \tau \rangle_-)t}{\langle \tau \rangle_+ \langle \tau \rangle_-}} - 1 \right), \tag{A60}$$

$$\mathrm{Var}(p_+) \sim \frac{2\langle \tau \rangle_+^2 \langle \tau \rangle_-^2}{t(\langle \tau \rangle_+ + \langle \tau \rangle_-)^3} + \frac{2\langle \tau \rangle_+^3 \langle \tau \rangle_-^3}{t^2(\langle \tau \rangle_+ + \langle \tau \rangle_-)^4} \left( e^{-\frac{(\langle \tau \rangle_+ + \langle \tau \rangle_-)t}{\langle \tau \rangle_+ \langle \tau \rangle_-}} - 1 \right). \tag{A61}$$

**Appendix G. Deduction of the MSD in a Two-State Model with ⟨*τ*⟩<sub>+</sub> ≠ ⟨*τ*⟩<sub>−</sub>**

From Equation (A1), we can compute the second moment of *x*(*t*) as

$$\begin{aligned} \langle x^2(t) \rangle &= \left\langle \left( \sqrt{2D_+ T_+}\, \xi_1 + \sqrt{2D_- (t - T_+)}\, \xi_2 \right)^2 \right\rangle \\ &= 2D_+ \langle T_+ \rangle + 2D_- (t - \langle T_+ \rangle). \end{aligned} \tag{A62}$$

In the second line of Equation (A62), we have employed the linearity of $\langle \cdot \rangle$ and the properties of independent standard normal random variables, i.e., $\langle \xi_i^2 \rangle = 1$ and $\langle \xi_i \xi_j \rangle = 0$ (with $i, j \in \{1, 2\}$ and $i \neq j$). Thus, we only need to find $\langle T_+ \rangle$. To do so, we start from the definition of the average occupation time

$$\begin{aligned} \langle T_+ \rangle &= \int_0^{\infty} T_+ f_t(T_+)\, dT_+ \\ &= \int_0^{\infty} f_t(T_+) \left( -\frac{d}{du} e^{-uT_+} \right) \Big|_{u=0} dT_+ \\ &= -\lim_{u \to 0} \left( \frac{d}{du} \hat{f}_t(u) \right), \end{aligned} \tag{A63}$$

with $\hat{f}_t(u) = \int_0^{\infty} f_t(T_+) e^{-uT_+}\, dT_+$, where $T_+ \Leftrightarrow u$ are Laplace conjugates. Now, for equilibrium initial conditions, the PDF of the occupation time is given by

$$f_t(T_+) = \frac{\langle \tau \rangle_+}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \sum_{N=0}^{\infty} f_t^+(T_+, N) + \frac{\langle \tau \rangle_-}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \sum_{N=0}^{\infty} f_t^-(T_+, N), \tag{A64}$$

with $f_t^{\pm}(T_+, N)$ the joint PDF of the occupation time at $D_+$ and the number of jumps between states during $t$, given that the process started from $D_{\pm}$. When starting from $D_+$ and having $N = 2k + 1$ or $N = 2k$ jumps, the occupation time satisfies Equation (1) in each case. When the initial state is $D_-$, we have

$$\begin{array}{rcll} T_+ &=& \tau_2 + \tau_4 + \dots + \tau^{*} & \quad \text{if} \quad N = 2k + 1, \\ T_+ &=& \tau_2 + \tau_4 + \dots + \tau_N & \quad \text{if} \quad N = 2k, \end{array} \tag{A65}$$

with $\tau^{*} = t - t_N$ the backward recurrence time. The joint PDF $f_t^{\pm}(T_+, N)$ is defined in Equation (A9), and its double Laplace transform, $\hat{f}_s^{\pm}(u, N) = \int_0^{\infty} \int_0^{\infty} f_t^{\pm}(T_+, N) \exp(-uT_+ - st)\, dT_+\, dt$, is given in Equations (A22)–(A25). When $N = 0$, we have

$$\begin{array}{rcl} \hat{f}^+_s(u,0) &=& \dfrac{1-\left(\frac{1-\hat{\psi}_+(s+u)}{\langle\tau\rangle_+(s+u)}\right)}{s+u}, \\ \hat{f}^-_s(u,0) &=& \dfrac{1-\left(\frac{1-\hat{\psi}_-(s)}{\langle\tau\rangle_- s}\right)}{s}. \end{array}\tag{A66}$$

To obtain $\hat{f}_s(u)$, we compute the double Laplace transform of Equation (A64) and then sum Equations (A22), (A25), and (A66) over all values of $N$. Thereafter, we compute the derivative of $\hat{f}_s(u)$ with respect to $u$ and take its limit as $u \to 0$. After algebraic simplifications, we obtain

$$\lim\_{u \to 0} \left( \frac{d}{du} \hat{f}\_s(u) \right) = -\frac{\langle \tau \rangle\_+}{\langle \tau \rangle\_+ + \langle \tau \rangle\_-} \frac{1}{s^2}. \tag{A67}$$

To obtain the average occupation time of Equation (A63), we invert Equation (A67) with respect to $s$, which gives

$$
\langle T_+ \rangle = \left( \frac{\langle \tau \rangle_+}{\langle \tau \rangle_+ + \langle \tau \rangle_-} \right) t. \tag{A68}
$$

Finally, substituting Equation (A68) into Equation (A62), we get Equation (49), which indicates that the MSD is linear in *t* at all times.
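As a numerical sanity check of Equations (A62) and (A68), the following minimal sketch simulates the two-state process by Monte Carlo. It assumes exponentially distributed waiting times with means ⟨*τ*⟩<sub>±</sub> and equilibrium initial conditions. The function names and parameter values are illustrative and are not taken from the derivation above.

```python
import math
import random


def occupation_time(t, tau_plus, tau_minus, rng):
    """Time spent in the '+' state during (0, t) for one realization.

    Assumption: exponential waiting times with means tau_plus and tau_minus,
    and an equilibrium initial state, i.e. the process starts in '+' with
    probability tau_plus / (tau_plus + tau_minus).
    """
    in_plus = rng.random() < tau_plus / (tau_plus + tau_minus)
    clock, t_plus = 0.0, 0.0
    while True:
        mean = tau_plus if in_plus else tau_minus
        wait = rng.expovariate(1.0 / mean)
        if clock + wait >= t:
            # The last sojourn is cut off by the end of the measurement.
            if in_plus:
                t_plus += t - clock
            return t_plus
        if in_plus:
            t_plus += wait
        clock += wait
        in_plus = not in_plus


def simulate_two_state(t, tau_plus, tau_minus, d_plus, d_minus, n=20000, seed=1):
    """Return sample estimates of <T_+> and <x^2(t)> from n realizations."""
    rng = random.Random(seed)
    sum_tp, sum_x2 = 0.0, 0.0
    for _ in range(n):
        tp = occupation_time(t, tau_plus, tau_minus, rng)
        # Displacement as in the first line of Equation (A62).
        x = (math.sqrt(2.0 * d_plus * tp) * rng.gauss(0.0, 1.0)
             + math.sqrt(2.0 * d_minus * (t - tp)) * rng.gauss(0.0, 1.0))
        sum_tp += tp
        sum_x2 += x * x
    return sum_tp / n, sum_x2 / n
```

For example, with *t* = 5, ⟨*τ*⟩<sub>+</sub> = 1, ⟨*τ*⟩<sub>−</sub> = 2, *D*<sub>+</sub> = 1 and *D*<sub>−</sub> = 0.1, the sample averages returned by `simulate_two_state(5.0, 1.0, 2.0, 1.0, 0.1)` should approach ⟨*T*<sub>+</sub>⟩ = 5/3 and ⟨*x*<sup>2</sup>⟩ = 4, up to statistical error.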



## *Article* **Local Analysis of Heterogeneous Intracellular Transport: Slow and Fast Moving Endosomes**

**Nickolay Korabel 1,\*, Daniel Han 1,2,3, Alessandro Taloni 4, Gianni Pagnini 5,6, Sergei Fedotov 1, Viki Allan <sup>2</sup> and Thomas Andrew Waigh 3,\***


**Abstract:** Trajectories of endosomes inside living eukaryotic cells are highly heterogeneous in space and time and show anomalous diffusion due to a combination of viscoelasticity, caging, aggregation and active transport. Some of the trajectories display switching between persistent and anti-persistent motion, while others jiggle around in one position for the whole measurement time. We split the ensemble of endosome trajectories into slow moving subdiffusive and fast moving superdiffusive endosomes and analyzed them separately. The mean squared displacements and velocity autocorrelation functions confirm the effectiveness of the splitting methods. Applying the local analysis, we show that both ensembles are characterized by a spectrum of local anomalous exponents and local generalized diffusion coefficients. Slow and fast endosomes have exponential distributions of local anomalous exponents and power law distributions of generalized diffusion coefficients. This suggests that heterogeneous fractional Brownian motion is an appropriate model for both fast and slow moving endosomes. This article is part of a Special Issue entitled: "Recent Advances In Single-Particle Tracking: Experiment and Analysis" edited by Janusz Szwabiński and Aleksander Weron.

**Keywords:** heterogeneous; anomalous diffusion; endosomes

#### **1. Introduction**

Intracellular transport of organelles, such as endosomes, has been described by anomalous diffusion caused by different mechanisms [1,2]. Various models have been proposed to describe it, such as fractional Brownian motion (FBM), continuous time random walks and fractional Langevin equations [3]. However, which of these models is the most appropriate remains a topic of much debate.

To decipher which mechanism is at work and determine the appropriate mathematical model to describe it, a large ensemble of trajectories is necessary. Modern experimental techniques facilitate the tracking of large ensembles of intracellular objects for considerable amounts of time. Therefore, the extraction of meaningful statistical information from trajectories is becoming an important issue. The traditional statistical analysis of trajectories includes quantification of ensemble evolution in time and space using the ensemble-averaged mean squared displacements (EMSD), time-averaged MSD (TMSD), probability density functions of displacements and correlation functions. As the accessible measurement time in experiments increases with better live-cell microscopy techniques, the accurate analysis of single trajectories has become possible [4]. New methods of trajectory analysis were developed, such as local time-averaged MSD [5], first passage probability analysis [6–8] and time-averaged diffusion coefficients [9].

**Citation:** Korabel, N.; Han, D.; Taloni, A.; Pagnini, G.; Fedotov, S.; Allan, V.; Waigh, T.A. Local Analysis of Heterogeneous Intracellular Transport: Slow and Fast Moving Endosomes. *Entropy* **2021**, *23*, 958. https://doi.org/10.3390/e23080958

Academic Editors: Janusz Szwabiński and Aleksander Weron

Received: 14 June 2021 Accepted: 23 July 2021 Published: 27 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Improved microscopy imaging, tracking and analysis methods revealed the intrinsic spatial and temporal heterogeneity within individual trajectories of numerous biological processes [5,10–18]. Significant progress has also been made in analysis and interpretation of superresolution single particle trajectories [19–23]. Recently, individual trajectories of quantum dots in the cytoplasm of living cultured cells were found to perform subdiffusive motion of the FBM type with switching between two distinct mobility states [24]. In contrast to homogeneous systems, heterogeneous trajectories are most prominently described by broad distributions of diffusivities and anomalous exponents, an exponential probability distribution of diffusivities and a Laplace probability distribution of displacements [25]. These observations led to the development of various heterogeneous diffusion models [26–39].

Recently, the intracellular transport of endosomes in eukaryotic cells was shown to be described by spatiotemporal heterogeneous fractional Brownian motion (hFBM) with non-constant Hurst exponents [40]. By analyzing the local motion of endosomes, we found that it is characterized by power-law probability distributions of displacements and displacement increments, exponential probability distributions of local anomalous exponents and power-law probability distributions of local generalized diffusion coefficients. In this paper, we split the ensemble of endosomes into slow and fast moving vesicles, which is the main difference between this study and that of [40]. This splitting allows us to study the sub-ensembles separately, in addition to studying the ATP driven active transport of endosomes. In particular, it addresses a central question: What is the appropriate mathematical model to describe the subdiffusive transport of slow moving endosomes? By locally analyzing the slow and fast endosomal trajectories, we find that both are characterized by exponential distributions of anomalous exponents and power-law distributions of generalized diffusion coefficients. This suggests that hFBM is an appropriate model for both slow and fast endosomes.

Endosome trajectories are composed of segments of active and passive motion, and therefore they could be further decomposed into directed runs and random motion. We segmented endosomal trajectories in this way in [10]. In this study, we separated endosome trajectories into superdiffusive trajectories and subdiffusive trajectories for their whole duration. Subdiffusive trajectories do not contain segments of directed movement and cannot be segmented further into active and passive motion. In contrast, fast superdiffusive trajectories can be further segmented. We leave the segmentation of fast trajectories into directed runs and random motion for future work.

#### **2. Materials and Methods**

#### *2.1. Experimental Trajectories*

We studied a large ensemble of two dimensional experimental trajectories, **r**(*t*) = {*x*(*t*), *y*(*t*)}, of early endosomes in a stable MRC5 cell line expressing GFP-Rab5. Trajectories were obtained from tracking wide-field fluorescence microscopy videos (see [10] for experimental details). We studied 103,361 experimental trajectories of early endosomes, the same data acquired in [10]. Three live-cell microscopy videos of MRC5 cells stably expressing GFP-Rab5 can be found in the Supplementary Material (https://zenodo.org/record/5106450#.YPBsEuhKhPY, accessed on 23 July 2021). An example of experimental trajectories is shown in Figure A1. The endosomes were tracked using automated tracking software (AITracker, based on a convolutional neural network) [41]. Currently, it is not yet feasible to determine the diameter of endosomes in these experiments, because they are diffraction limited. Thus, it was possible to track the centers of endosomes with sub-pixel accuracy, but not the sizes of the smaller endosomes (less than 200 nm). The duration of all trajectories, *T*, is well fitted by a power-law distribution, *T*<sup>−1.85</sup> [40], which is a manifestation of the heterogeneity of the trajectories. Slow moving endosomes stay longer within the observation volume and therefore have longer trajectories than fast moving endosomes, leading to the emergence of the power-law probability distribution for the trajectories' duration.

#### *2.2. Splitting of Ensemble into Slow and Fast Moving Endosomes*

We split the ensemble of trajectories into slow and fast moving endosomes using the distance traveled by endosomes:

$$R(t) = \sqrt{(x(t) - x(0))^2 + (y(t) - y(0))^2}.\tag{1}$$

Trajectories which possess active motion have periods of rapid increase or decrease of *R* (Figure 1A). Fast trajectories, which have active motion, are defined as those for which max{*R*(*t*)} exceeds a threshold, and slow trajectories, which exhibit only passive motion, as those for which max{*R*(*t*)} stays below it. Here, max{*R*(*t*)} denotes the maximum value of *R*(*t*) attained in the time interval (0, *t*). We choose a threshold of 0.25 μm. In the Appendix, we show that changing the threshold to 0.2 μm in the splitting does not qualitatively change the results. Therefore, we define fast moving endosomes as those that, in the time interval (0, *t*), experienced at least one period of active motion and whose maximum distance travelled from the origin exceeds the threshold of 0.25 μm. Otherwise, an endosome is defined as slow moving. Small variations of the threshold value do not affect the EMSDs of slow and fast moving endosomes, which suggests that the splitting method is robust (Figure A3).
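The distance-based splitting rule can be sketched as follows. This is a minimal illustration, assuming each trajectory is given as two equal-length lists of coordinates in μm; the function names are hypothetical, and the sketch applies only the threshold on max{*R*(*t*)}, without any separate detection of active periods.

```python
import math


def max_distance_from_origin(xs, ys):
    """Maximum of R(t), Equation (1), over the whole trajectory (in um)."""
    x0, y0 = xs[0], ys[0]
    return max(math.hypot(x - x0, y - y0) for x, y in zip(xs, ys))


def classify_by_distance(xs, ys, threshold_um=0.25):
    """Label a trajectory 'fast' if max{R(t)} exceeds the threshold, else 'slow'."""
    return "fast" if max_distance_from_origin(xs, ys) > threshold_um else "slow"


def split_ensemble(trajectories, threshold_um=0.25):
    """Split a list of (xs, ys) trajectories into (slow, fast) sub-ensembles."""
    slow, fast = [], []
    for xs, ys in trajectories:
        label = classify_by_distance(xs, ys, threshold_um)
        (fast if label == "fast" else slow).append((xs, ys))
    return slow, fast
```

Re-running `split_ensemble` with `threshold_um=0.2` gives a quick way to test how sensitive the sub-ensembles are to the choice of threshold.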

Changing the splitting threshold from max{*R*(*t*)} = 0.25 μm to max{*R*(*t*)} = 0.2 μm increased the number of slow trajectories by 12%. Therefore, in addition to the splitting method based on the distance travelled, we also tested a second method, which makes use of a neural network (NN) estimate of the time-dependent Hurst exponent *H*(*t*) at the single trajectory level [10]. The procedure is as follows: (1) estimate the time-dependent anomalous exponent *α<sub>NN</sub>*(*t*) using the NN; (2) if the anomalous exponent is superdiffusive, *α<sub>NN</sub>*(*t*) > 1, for more than 4 consecutive time points, the endosome is considered as fast moving. Otherwise, the endosome is labeled as slow moving (see Figure A2). The correct implementation of the NN procedure requires a minimum time window [10] that is larger than the duration of some of the endosomal trajectories. Hence, short trajectories were discarded in this analysis. The similarity of the distributions of generalized diffusion coefficients (Figures A3B and A4B) suggests that the chosen threshold max{*R*(*t*)} = 0.25 μm was reasonable. Alternative methods of binary classification could be performed using the first passage probability analysis [7] or implementing the normalized radius of gyration of each trajectory [42].
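Step (2) of the NN-based procedure reduces to finding a sufficiently long run of superdiffusive estimates. The sketch below assumes that the time series of estimated exponents is already available from some estimator (the neural network itself is not reproduced here); the function names and arguments are illustrative.

```python
def has_superdiffusive_run(alpha_series, min_consecutive=5, alpha_threshold=1.0):
    """True if alpha > alpha_threshold holds at min_consecutive successive points.

    "More than 4 consecutive time points" corresponds to min_consecutive=5.
    """
    run = 0
    for alpha in alpha_series:
        run = run + 1 if alpha > alpha_threshold else 0
        if run >= min_consecutive:
            return True
    return False


def classify_by_exponent(alpha_series):
    """Label a trajectory 'fast' or 'slow' from its time-dependent exponent."""
    return "fast" if has_superdiffusive_run(alpha_series) else "slow"
```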

**Figure 1.** Endosomes are split into slow and fast moving: (**A**) Distance *R*(*t*) traveled by fast (black curves) and slow (red curves) endosomes (nine sample experimental trajectories are shown). Most experimental trajectories possess active motion visible as a rapid increase or decrease of *R*; (**B**) EMSDs (solid curves) and E-TMSDs (dashed curves) of fast (black curves) and slow (blue curves) endosomes compared with EMSD and E-TMSD of all trajectories (orange curves). The dashed-dotted and dashed-double-dotted lines represent the functions *t*<sup>1.26</sup> and *t*, respectively.

#### *2.3. Ensemble and Time Averaged Mean Squared Displacements*

From the two dimensional experimental trajectories **r**(*t*) = {*x*(*t*), *y*(*t*)}, we calculated the ensemble-averaged mean squared displacement (EMSD) as

$$\text{EMSD}(t) = \frac{\langle \mathbf{r}^2(t) \rangle}{l^2},\tag{2}$$

where *l* is the length scale, for which we choose *l* = 1 μm, and

$$\langle \mathbf{r}^2(t) \rangle = \left\langle \left(x_i(t) - x_i(0)\right)^2 + \left(y_i(t) - y_i(0)\right)^2 \right\rangle,\tag{3}$$

where the angular brackets denote averaging over an ensemble of trajectories, ⟨*A*⟩ = ∑<sub>*i*=1</sub><sup>*N*</sup> *A<sub>i</sub>*/*N*, and *N* is the number of trajectories in the ensemble.

By fitting the EMSD to power law functions, the anomalous exponent *α* and the generalized diffusion coefficient *D<sub>α</sub>* can be extracted using

$$\text{EMSD}(t) = 4D_\alpha \left(\frac{t}{\tau}\right)^\alpha,\tag{4}$$

where *α* and *D<sub>α</sub>* are constants which characterize the averaged transport properties of the ensemble of endosomal trajectories. The time scale *τ* = 1 s and the length scale *l* = 1 μm are introduced in order to make the generalized diffusion coefficient *D<sub>α</sub>* dimensionless.
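Equations (2)–(4) can be sketched in a few lines. The example assumes trajectories sampled at a fixed time step and stored as `(xs, ys)` lists in μm, and it fits the power law by ordinary least squares on log-log axes, which is one common choice and not necessarily the fitting procedure used for the results reported here. All names are illustrative.

```python
import math


def emsd(trajectories, lag, length_scale=1.0):
    """EMSD, Equations (2) and (3), at a lag given in time steps.

    Only trajectories that are long enough to reach this lag contribute.
    """
    squared = [((xs[lag] - xs[0]) ** 2 + (ys[lag] - ys[0]) ** 2) / length_scale ** 2
               for xs, ys in trajectories if len(xs) > lag]
    return sum(squared) / len(squared)


def fit_power_law(times, msd_values, tau=1.0):
    """Fit MSD = 4 * D * (t / tau) ** alpha, Equation (4); return (alpha, D).

    Uses a straight-line least-squares fit of log(MSD) against log(t / tau).
    """
    lx = [math.log(t / tau) for t in times]
    ly = [math.log(m) for m in msd_values]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    alpha = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sxx
    d_alpha = math.exp(my - alpha * mx) / 4.0
    return alpha, d_alpha
```

A log-log fit weights all decades of time equally, so the fitted range should be restricted to the regime of interest (for instance the intermediate time scale) before calling `fit_power_law`.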

The time-averaged mean squared displacement (TMSD) of an individual trajectory {*x<sub>i</sub>*, *y<sub>i</sub>*} of duration *T* is calculated as:

$$\text{TMSD}\_i(t) = \frac{\overline{\delta^2(t)}}{l^2},\tag{5}$$

where *l* is the length scale, for which we chose *l* = 1 μm, and

$$\overline{\delta^2(t)} = \frac{\int_0^{T-t} \left[ \left( x_i(t'+t) - x_i(t') \right)^2 + \left( y_i(t'+t) - y_i(t') \right)^2 \right] dt'}{T-t}.\tag{6}$$

TMSDs of individual trajectories are averaged further over the ensemble of trajectories to get the ensemble-time-averaged MSD (E-TMSD):

$$\text{E-TMSD}(t) = \langle \text{TMSD}\_i(t) \rangle,\tag{7}$$

where the angular brackets denote averaging over an ensemble of trajectories as before.
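For trajectories sampled at a fixed frame interval, the integral in Equation (6) becomes an average over all pairs of points separated by the lag. A minimal sketch of Equations (5)–(7) under that assumption, with illustrative names and lags expressed in time steps:

```python
def tmsd(xs, ys, lag, length_scale=1.0):
    """Time-averaged MSD of one trajectory, Equations (5) and (6)."""
    n_pairs = len(xs) - lag
    if n_pairs <= 0:
        raise ValueError("lag must be shorter than the trajectory")
    total = sum((xs[k + lag] - xs[k]) ** 2 + (ys[k + lag] - ys[k]) ** 2
                for k in range(n_pairs))
    return total / n_pairs / length_scale ** 2


def e_tmsd(trajectories, lag, length_scale=1.0):
    """Ensemble average of the TMSDs, Equation (7).

    Trajectories shorter than the lag are skipped.
    """
    values = [tmsd(xs, ys, lag, length_scale)
              for xs, ys in trajectories if len(xs) > lag]
    return sum(values) / len(values)
```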

#### *2.4. Local Analysis of Endosomal Trajectories*

The time-local statistical analysis was implemented as follows. We considered only the portion of a single endosomal trajectory within a window of size *W* centered around the time *t*, i.e., (*t* − *W*/2, *t* + *W*/2). We calculated the TMSD within this chunk of trajectory only: this is the reason for the acronym L-TMSD, i.e., the local TMSD. The experimental detection of the endosomal motion is achieved with the frame rate 1/Δ*t* s<sup>−1</sup>, hence *t* = *i*Δ*t* (here, *i* = 0, 1, 2, ... is the time index) and *W* = *N*Δ*t*, with *N* > 10. The first 10 points of the L-TMSD were fitted with the power-law function

$$\text{L-TMSD} = 4D^L(t) \left(\frac{t'}{\tau}\right)^{\alpha^L(t)},\tag{8}$$

where the lag time *t*′ extends up to 10Δ*t*, and *α<sup>L</sup>*(*t*) and *D*<sub>*α*<sup>*L*</sup></sub>(*t*) are the local anomalous exponent and the local generalized diffusion coefficient, respectively. We iterate this procedure by shifting the time window by a single Δ*t* (*i* → *i* + 1) until the end of the experimental endosomal trace, thus obtaining *α<sup>L</sup>*(*t*) and *D*<sub>*α*<sup>*L*</sup></sub>(*t*) along the entire trajectory. Notice that *α<sup>L</sup>* and *D<sup>L</sup>* are not constant in time: they vary, being local properties of each endosomal trajectory.
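The sliding-window procedure can be sketched as below. The example assumes a window of `window` consecutive points (more than 10), a least-squares fit of the first 10 lags on log-log axes, and *τ* = 1 s; the window length, the fitting method and the function names are illustrative choices rather than the exact settings used for the reported results.

```python
import math


def fit_loglog(lag_times, msd, tau=1.0):
    """Least-squares fit of log(msd) = log(4 D) + alpha * log(t / tau)."""
    lx = [math.log(t / tau) for t in lag_times]
    ly = [math.log(m) for m in msd]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    alpha = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sxx
    return alpha, math.exp(my - alpha * mx) / 4.0


def local_analysis(xs, ys, dt, window=15, n_fit=10):
    """Return a list of (t_center, alpha_L, D_L), one entry per window position."""
    if window <= n_fit:
        raise ValueError("window must contain more than n_fit points")
    results = []
    for start in range(len(xs) - window + 1):
        wx, wy = xs[start:start + window], ys[start:start + window]
        lag_times, msd = [], []
        for lag in range(1, n_fit + 1):
            n_pairs = window - lag
            m = sum((wx[k + lag] - wx[k]) ** 2 + (wy[k + lag] - wy[k]) ** 2
                    for k in range(n_pairs)) / n_pairs
            if m > 0.0:  # log(0) is undefined, so skip immobile lags
                lag_times.append(lag * dt)
                msd.append(m)
        if len(lag_times) >= 2:
            alpha, d_local = fit_loglog(lag_times, msd)
        else:
            alpha = d_local = float("nan")
        t_center = (start + (window - 1) / 2.0) * dt
        results.append((t_center, alpha, d_local))
    return results
```

For a purely ballistic test trajectory moving at 1 μm/s, every window should return *α<sup>L</sup>* = 2 and *D<sup>L</sup>* = 0.25, which is a convenient check of the implementation.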

#### *2.5. The Time and Ensemble-Time Averaged Velocity Auto-Correlation Functions*

The time-averaged velocity auto-correlation function (TVACF) along a single trajectory is defined as:

$$\text{TVACF}_{i}(t) = \frac{\int_{0}^{T-t-\tau} \vec{v}(t'+t) \cdot \vec{v}(t')\,dt'}{T-t-\tau},\tag{9}$$

where **v**(*t*) = [**r**(*t* + *τ*) − **r**(*t*)]/*τ*. TVACFs of individual trajectories are averaged further over the ensemble of trajectories to get the ensemble-time averaged VACF (E-TVACF):

$$\text{E-TVACF}(t) = \langle \text{TVACF}\_i(t) \rangle,\tag{10}$$

where the angular brackets denote averaging over an ensemble of trajectories. The velocity autocorrelation function was suggested as a tool to distinguish between subdiffusion models [43].
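A discrete version of Equations (9) and (10) for uniformly sampled trajectories might look as follows. Both the lag and the velocity interval *τ* are expressed in time steps here, which is an assumption of this sketch; the function names are illustrative.

```python
def tvacf(xs, ys, lag, tau_steps, dt):
    """Time-averaged VACF of one trajectory, Equation (9).

    Velocities are finite differences over tau = tau_steps * dt, and the
    average runs over all available starting times.
    """
    tau = tau_steps * dt
    vx = [(xs[k + tau_steps] - xs[k]) / tau for k in range(len(xs) - tau_steps)]
    vy = [(ys[k + tau_steps] - ys[k]) / tau for k in range(len(ys) - tau_steps)]
    n_terms = len(vx) - lag
    if n_terms <= 0:
        raise ValueError("trajectory too short for this lag and tau")
    return sum(vx[k + lag] * vx[k] + vy[k + lag] * vy[k]
               for k in range(n_terms)) / n_terms


def e_tvacf(trajectories, lag, tau_steps, dt):
    """Ensemble average of the TVACFs, Equation (10); short trajectories are skipped."""
    values = [tvacf(xs, ys, lag, tau_steps, dt)
              for xs, ys in trajectories if len(xs) - tau_steps - lag > 0]
    return sum(values) / len(values)
```

A trajectory that reverses direction at every step gives a negative value at a lag of one step, which is the anti-persistent signature discussed for slow endosomes, while a ballistic trajectory gives a constant positive value.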

#### **3. Results**

We split the ensemble of endosomes into slow and fast moving vesicles using the two methods described above (see Methods, Figures 1A and A2). For both slow and fast endosomes, the EMSDs and E-TMSDs show similar behavior, which suggests ergodicity (see Methods and Figure 1B). MSDs of slow endosomes do not increase in time, which confirms that these trajectories have no active periods of motion. Surprisingly, we found that both EMSDs and E-TMSDs of slow endosomes are decreasing functions of time, which to our knowledge has never been observed before. We explain this behavior in terms of the coupling between the average diffusivities of slow trajectories and their duration (see Figure 4 and the discussion below). Conversely, MSDs of fast endosomes are increasing functions of time in the intermediate time scale (0.2, 2) s. The anomalous exponent extracted from the EMSD or E-TMSD of fast endosomes is *α* ≈ 1, smaller than the anomalous exponent obtained by considering all trajectories without distinction into fast or slow, i.e., *α* ≈ 1.26. Notice that two subdiffusive regimes characterize the MSD time behavior for fast and all trajectories. The first, at small time scales (*t* ≤ 10<sup>−1</sup> s), can be attributed to measurement errors [44–46]. The second, at longer time scales (*t* > 10 s), was shown to be spurious and to originate from the coupling of the trajectories' duration and their diffusivities [40,47]. We suggest that, due to this coupling, the anomalous exponents deduced from the power-law fit of EMSD and E-TMSD do not capture the essential characteristics of the endosome superdiffusive motility, nor shed light on its fundamental aspects. Therefore, to reveal the effect of the duration of trajectories on the statistical analysis, we consider only trajectories longer than a certain threshold *T* [40].

Figure 2A,B shows the EMSDs and E-TMSDs of slow and fast endosomes, considering only experimental trajectories with duration longer than *T* seconds (2 or 8 s). Unlike those of the slow moving endosomes, the MSDs of fast vesicles (Figure 2B) behave qualitatively similarly for *T* = 2 s, *T* = 8 s or no threshold *T* at all (all fast endosomes considered, as in Figure 1B, black curve). However, in the intermediate regime, the superdiffusive behavior, ∝ *t*<sup>1.26</sup>, becomes more apparent and stable. In [40] we found that this process is described by the space-time heterogeneous FBM with the Hurst exponent *H* that randomly switches between persistent, *H* > 0.5, and anti-persistent, *H* < 0.5, regimes, together with the coupling between the diffusivity and duration of trajectories, which accounts for spurious subdiffusion at longer time scales. Moreover, the EMSD curves obtained for *T* = 2 s and *T* = 8 s deviate considerably from the corresponding E-TMSD curves.

**Figure 2.** EMSDs and E-TMSDs (solid and dashed curves) of experimental trajectories of slow (**A**) and fast moving endosomes (**B**). Black curves correspond to *T* → ∞ s. Red and blue curves represent EMSDs and E-TMSDs of experimental trajectories which have duration longer than 2 and 8 s, respectively. The dashed-dotted line in (**A**) represents the function *t*<sup>0.5</sup>. In (**B**), the dashed-dotted line and the dashed-double-dotted line represent the linear, *t*, and super-linear, *t*<sup>1.26</sup>, functions, respectively.

The MSDs of slow endosomes (Figure 2A) display very different, but ergodic, behavior. For 0.01 < *t* < 2 s, the MSDs of all slow endosomes decrease in time. On the other hand, the MSDs of the sub-ensembles of slow endosomes with *T* = 2 s and *T* = 8 s reveal subdiffusive trends with *α* ∼ 0.5. As in the case of fast moving endosomes, we argue that this behavior is due to the coupling between the diffusivity and duration of trajectories. Therefore, we attempt to confirm this hypothesis by performing simulations of an ensemble of heterogeneous FBM trajectories with constant Hurst exponent *H* = 0.25 (see Figure 4).

The velocity auto-correlation functions (VACF) also confirm the effectiveness of this simple threshold splitting (Figure 3A,B). Indeed, slow and fast endosomes have very different VACFs. Ensemble-time averaged VACFs (E-TVACFs) of fast endosomes (Figure 3B) are positive as expected for superdiffusive motion. In contrast, E-TVACFs of slow endosomes have negative dips at *t* = *τ* and approach zero from negative values (Figure 3A). Such behavior is characteristic of FBM and the generalized Langevin equation but cannot be reproduced by the CTRW model [3].

**Figure 3.** Time-ensemble averaged VACF (E-TVACF) of experimental trajectories of slow (**A**) and fast endosomes (**B**) calculated for different *τ* given in the legend.

To verify that heterogeneous FBM describes slow moving endosomes, we simulated an ensemble of hFBM trajectories. Individual hFBM trajectories were simulated with constant Hurst exponent *H* = 0.25. For standard FBM, this would correspond to subdiffusive MSDs, ⟨**r**<sup>2</sup>(*t*)⟩ ∼ *t*<sup>2*H*</sup> ∼ *t*<sup>0.5</sup>. The duration of hFBM trajectories was drawn from the power-law distribution *φ*(*T*) ∼ *T*<sup>−1.85</sup>, in accordance with the experimental evidence [40]. The generalized diffusion coefficients were chosen to decrease with the duration of trajectories as a power law, i.e., *D* ∼ *T*<sup>−0.6</sup>. As shown in Figure 4, the EMSDs of hFBM trajectories agree well with the experimental data.
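The simulation described above can be sketched with the Cholesky method for FBM, which is one standard way to generate exact FBM samples and is not necessarily the algorithm used for Figure 4. The time step, the minimum duration `t_min`, the prefactor `d0` and the truncation at `n_max` steps are illustrative assumptions; only *H* = 0.25 and the exponents 1.85 and 0.6 come from the text.

```python
import math
import random


def fbm_cholesky(n, hurst, dt):
    """Lower-triangular Cholesky factor of the FBM covariance
    0.5 * (s^2H + t^2H - |t - s|^2H) at times dt, 2*dt, ..., n*dt."""
    two_h = 2.0 * hurst
    times = [(k + 1) * dt for k in range(n)]
    cov = [[0.5 * (s ** two_h + t ** two_h - abs(t - s) ** two_h) for s in times]
           for t in times]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(acc) if i == j else acc / chol[j][j]
    return chol


def sample_path(chol, n, scale, rng):
    """One FBM coordinate with n steps, starting at 0. The leading n x n block
    of a Cholesky factor is the factor of the leading covariance block."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [0.0] + [scale * sum(chol[i][j] * z[j] for j in range(i + 1))
                    for i in range(n)]


def simulate_hfbm(n_traj, hurst=0.25, dt=0.1, n_max=64, t_min=0.5, d0=0.01, seed=0):
    """Ensemble of 2D FBM trajectories with power-law durations and D ~ T^-0.6."""
    rng = random.Random(seed)
    chol = fbm_cholesky(n_max, hurst, dt)
    trajectories = []
    for _ in range(n_traj):
        # Inverse-transform sampling of phi(T) ~ T^-1.85 for T >= t_min.
        duration = t_min * (1.0 - rng.random()) ** (-1.0 / 0.85)
        # Durations beyond n_max * dt are clamped (assumption of this sketch).
        n = min(n_max, max(2, int(duration / dt)))
        d_gen = d0 * (n * dt / t_min) ** (-0.6)
        scale = math.sqrt(2.0 * d_gen)  # per-coordinate variance 2 D t^2H
        xs = sample_path(chol, n, scale, rng)
        ys = sample_path(chol, n, scale, rng)
        trajectories.append((xs, ys))
    return trajectories
```

The pure-Python Cholesky factorization scales as the cube of `n_max`, so this sketch is only practical for short trajectories; longer ones would call for a numerical library or a circulant-embedding method.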

**Figure 4.** EMSDs calculated for simulated hFBM trajectories (solid lines) as a function of time interval. Black curves correspond to EMSDs of all trajectories of slow endosomes and hFBM, blue curves are EMSDs of trajectories longer than *T* = 2 s and cyan curves are EMSDs of trajectories longer than *T* = 8 s. The subdiffusive behavior with the anomalous exponent *α* = 0.5 is shown as the dashed-dotted line. The EMSDs of slow experimental endosomal trajectories are shown for comparison (dashed lines). Notice that hFBM trajectories were simulated without external noise (measurement error), which led to a discrepancy between simulated and experimental EMSDs at small time scales.

We next implemented the local analysis [40] to better characterize the slow and fast endosomal dynamics. We calculated the local TMSDs (L-TMSD) for each experimental trajectory at various times *t* (see Methods). From the fit of L-TMSD to Equation (8), we extracted the local anomalous exponents *α<sup>L</sup>*(*t*) and the local generalized diffusion coefficients *D*<sub>*α*<sup>*L*</sup></sub>(*t*) for slow and fast endosomes separately. The local anomalous exponents *α<sup>L</sup>*(*t*) and the local generalized diffusion coefficients *D*<sub>*α*<sup>*L*</sup></sub>(*t*) appear to be positively correlated both for slow and fast endosomes (see Figure A6). The origin of these correlations is not known and will be investigated in future publications. In [40], we found that PDFs of local anomalous exponents and local generalized diffusion coefficients do not depend on the window size or the time *t* (stationary) and are best fitted with exponential and power law functions, respectively.

The PDFs of *α<sup>L</sup>* and *D*<sub>*α*<sup>*L*</sup></sub> for slow and fast endosomes are shown in Figure 5A,B. In both cases, the PDFs of *α<sup>L</sup>* follow an exponential distribution, while those of *D*<sub>*α*<sup>*L*</sup></sub> are best fitted with a power-law. However, the parameters characterizing the distribution shapes are very different. Furthermore, the parameters for the fast endosomes' PDFs coincide with those found by considering all experimental trajectories [40]. This is in agreement with a heterogeneous FBM model of endosomal transport [40], which describes the endosome motion as FBM with non-constant Hurst exponents.

**Figure 5.** Distribution of local anomalous exponents *α<sup>L</sup>* (**A**) and local generalized diffusion coefficients *D<sup>L</sup>* (**B**) obtained from experimental trajectories of slow and fast endosomes. The dashed and dashed-dotted lines are best fits to exponential (**A**) and power-law PDFs (**B**). In (**A**), they correspond to 1.86 exp(−1.86*α<sup>L</sup>*) for the PDF of *α<sup>L</sup>* of fast endosomes (dashed line) and 4.3 exp(−4.3*α<sup>L</sup>*) for the PDF of *α<sup>L</sup>* of slow endosomes (dashed-dotted line). In (**B**), they correspond to (*D<sup>L</sup>*)<sup>−1.5</sup> for the PDF of *D<sup>L</sup>* of fast endosomes (dashed line) and (*D<sup>L</sup>*)<sup>−2.7</sup> for the PDF of *D<sup>L</sup>* of slow endosomes (dashed-dotted line).
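Parameters of the exponential and power-law forms quoted for Figure 5 can be estimated directly from samples. The sketch below uses maximum-likelihood estimators, which is a substitute technique chosen for illustration and is not claimed to be the fitting procedure behind the figure. It assumes non-negative exponents for the exponential PDF and a known lower cut-off `d_min` for the power-law tail; both function names are hypothetical.

```python
import math


def exponential_rate_mle(samples):
    """MLE of lam in the PDF lam * exp(-lam * a) for a >= 0: lam = 1 / mean."""
    return len(samples) / sum(samples)


def power_law_tail_mle(samples, d_min):
    """MLE (Hill estimator) of gamma in p(D) ~ D^(-1 - gamma) for D >= d_min.

    Samples below d_min are ignored.
    """
    tail = [d for d in samples if d >= d_min]
    return len(tail) / sum(math.log(d / d_min) for d in tail)
```

With this parametrization, a PDF decaying as (*D<sup>L</sup>*)<sup>−1.5</sup> corresponds to *γ* = 0.5 and one decaying as (*D<sup>L</sup>*)<sup>−2.7</sup> to *γ* = 1.7.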

Finally, we calculated propagators of experimental trajectories for slow and fast endosomes (Figure 6). Using the power-law forms of the distributions of local generalized diffusion coefficients of slow, *p<sub>S</sub>*(*D<sup>L</sup>*) ∼ (*D<sup>L</sup>*)<sup>−1−*γ<sub>S</sub>*</sup>, and fast, *p<sub>F</sub>*(*D<sup>L</sup>*) ∼ (*D<sup>L</sup>*)<sup>−1−*γ<sub>F</sub>*</sup>, endosomes with *γ<sub>S</sub>* ≈ 1.7 and *γ<sub>F</sub>* ≈ 0.5 (Figure 5), we fit the propagators with the propagators of hFBM, PDF(*ξ*) ∼ |*ξ*|<sup>−1−2*γ*</sup>, with *γ* = *γ<sub>S</sub>* for slow endosomes and *γ* = *γ<sub>F</sub>* for fast endosomes (see Supplementary Note and [33]). For slow endosomes (Figure 6A), we also compare the experimental PDFs with the analytical propagator for obstructed diffusion in two dimensions, *ξ*<sup>−0.108</sup> exp(−|*ξ*|<sup>1.65</sup>) [48].

**Figure 6.** Distribution of the scaled x-component of the coordinate, *ξ* = *x*/*σ<sub>x</sub>*, obtained from experimental trajectories of slow (**A**) and fast (**B**) endosomes. The dashed-dotted lines correspond to the power-law fits |*ξ*|<sup>−1−2*γ<sub>S</sub>*</sup> for slow endosomes (*γ<sub>S</sub>* ≈ 1.7) and |*ξ*|<sup>−1−2*γ<sub>F</sub>*</sup> for fast endosomes (*γ<sub>F</sub>* ≈ 0.5). In (**A**), we also compare the PDF of slow endosomes with the analytical propagator for obstructed diffusion (dashed line) [48].

#### **4. Discussion**

In this paper, we extend our investigation of the heterogeneous intracellular transport of endosomes based on the local analysis of experimental trajectories [40]. Individual endosomes move for long distances in a heterogeneous way with short bursts of directed motility, interspersed with periods of subdiffusive motion [49,50]. The heterogeneous character of this motion is also manifested as some endosomes are less motile than others. Some endosomes look as if they are jiggling in one position for the whole period of observation. Therefore, we split the ensemble of trajectories into slow and fast moving endosomes. The distinct time behavior of mean squared displacements and velocity autocorrelation functions confirms the effectiveness of these methods. The splitting allowed us to study passive subdiffusive and active superdiffusive transport of endosomes separately.

Comparing the behavior of fast endosomes (MSDs, VACFs and propagators) to the behavior of the entire ensemble, we find that they are most consistent with FBM models [40]. Therefore, we conclude that fast endosomes follow heterogeneous FBM (hFBM) [40]. The ergodicity (Figure 2A) and the VACF (Figure 3A) suggest that slow endosomes are also described by hFBM or by heterogeneous generalized fractional Langevin equation motion. For slow endosomes, crowding and obstruction effects could also lead to subdiffusive behavior [2,4]. It is known that obstructed diffusion has many similarities with FBM, such as stationarity of the increments and the equivalence of the time and ensemble MSDs [48,51]. The propagators provide a clear way to distinguish obstructed diffusion from FBM. Therefore, we calculated the propagators of slow endosomes from the experimental trajectories and compared them with the analytical prediction for the propagator of obstructed diffusion and with the prediction of hFBM. The results shown in Figure 6 indicate that slow endosomes follow hFBM at longer time scales, while on smaller scales obstructed diffusion likely contributes to their subdiffusive behavior as well. Crowding effects remain a possible source of anomalous diffusion of slow endosomes. Recently, in numerical simulations, the motion of lipids in crowded membrane conditions was shown to be multifractal and anomalous. The dynamics was no longer described by the mechanism consistent with the fractional Langevin equation or by any single known mechanism. Instead, the motion was found to be non-Gaussian and heterogeneous, yet it maintained its ergodic properties [52], which is similar to what we observed for experimental trajectories of slow endosomes.

Both slow and fast endosomal trajectories are found to be highly heterogeneous in space and time. The spatial heterogeneity in the form of coupling between endosome diffusivity and duration of endosome trajectory explains the behavior of the MSDs. Longer trajectories have smaller generalized diffusion coefficients since in experiments slowly moving endosomes with smaller diffusion coefficients stay longer in the field of view, having longer durations. For slow and fast endosomes, we can conclude that EMSD and E-TMSD are not adequate to describe the large heterogeneity exhibited in space and time. Therefore, we applied a time local analysis of individual trajectories.

From the local analysis, we found that slow and fast endosomal trajectories are both characterized by exponentially distributed anomalous exponents and power-law distributed generalized diffusion coefficients. However, the parameters of these distributions are different. Although the factors that cause the power-law distributed generalized diffusion coefficients for slow and fast endosomes could be different, some common factors can exist. One of them could be the scale-free properties of endosomal networks [53]. Hence, the differences in endosome diameters could generate distinct diffusive properties intrinsic to each endosome. Heterogeneous diffusion generated by the fluctuations of molecular size was found in single-molecule experiments within the cell [14,18,42]. Another common factor promoting power-law distributions of generalized diffusion coefficients could be non-specific interactions with the endoplasmic reticulum or other organelles and large intracellular structures. Recently, non-specific interactions were shown to generate heterogeneous diffusion of nanosized objects in mammalian cells [47].

Our analysis of endosomal transport would be valuable for both fundamental cell biology and nanomedicine applications such as drug and gene delivery. In these applications, nanoparticles are often used as cargo-carrying vesicles, which in turn utilize the endosomal network for their intracellular transport. For example, gold nanoparticles were shown to cluster inside endosomes and move via sub- and superdiffusion [54]. Our results would also be useful for the nanoparticle enhanced radiation therapy of cancer [55–57] where clusters of nanoparticles inside endosomes are used for dose enhancement.

In the future, we expect microscopy techniques will improve in tandem with tracking algorithms, providing datasets with larger ranges of time scales and improved resolution. Thus, further subclassification of ensembles of endosomal tracks (beyond the binary fast and slow separation) will become possible towards the ultimate goal of single-molecule specificity. Increasing the dynamic range (to submillisecond time scales) will allow the stepping motion of the motor proteins (kinesin and dynein) attached to microtubules to be connected with the spectra of $\alpha$ and $D_\alpha$ for the fast moving endosomes at a fundamental level.

**Supplementary Materials:** The following are available online at https://zenodo.org/record/5106450#.YPBsEuhKhPY, Video S1: Videos for MRC5 cells stably expressing GFP-Rab5.

**Author Contributions:** N.K. analyzed the data. N.K. and A.T. performed computer simulations. N.K., A.T. and T.A.W. wrote the paper. D.H. designed and trained the neural network. N.K., D.H., A.T., G.P., S.F., V.A. and T.A.W. interpreted the data and edited and approved the final version. All authors have read and agreed to the published version of the manuscript.

**Funding:** N.K., S.F. and V.A. acknowledge financial support from EPSRC Grant No. EP/V008641/1. D.H. acknowledges financial support from the Wellcome Trust Grant No. 215189/Z/19/Z. G.P. is supported by the Basque Government through the BERC 2018–2021 programs and the Spanish Ministry of Economy and Competitiveness MINECO through the BCAM Severo Ochoa excellence accreditation SEV-2017-0718.

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Figure A1: An example of the experimental endosome trajectories measured in MRC5 cells stably expressing GFP-Rab5. See the main text for details.

**Figure A1.** An example of the experimental endosomal trajectories (30,000 trajectories are shown).

Figure A2: Two splitting methods used to separate endosome trajectories into slow and fast moving endosomes. The first method uses the maximum distance traveled *R*(*t*). The second method uses the time-dependent anomalous exponent *H*(*t*) estimated with the neural network. An example of the two trajectories is shown, which were classified as slow and fast by both methods. See the main text for details of the methods.

**Figure A2.** An example of experimental trajectories of slow and fast moving endosomes obtained using the maximum distance traveled *R*(*t*) (**top**) and the time-dependent anomalous exponent *H*(*t*) estimated with the neural network (**bottom**).

Figure A3: The EMSD of slow and fast moving endosomes calculated with the splitting method, which uses the maximum distance traveled *R*(*t*). Two values of the threshold produce qualitatively similar results, which suggests that the splitting method is robust against small variations of the threshold.

**Figure A3.** The EMSD of experimental trajectories of slow and fast moving endosomes calculated with the splitting method, which uses the maximum distance traveled *R*(*t*), for two values of the threshold: 0.2 μm and 0.25 μm.

Figure A4: Comparison of the distributions of anomalous exponents $\alpha_{NN}$ and generalized diffusion coefficients $D_{NN}$ with the distributions of local anomalous exponents $\alpha_L$ and local generalized diffusion coefficients $D_L$ of slow moving endosomes. Anomalous exponents $\alpha_{NN}$ were estimated using a neural network with window size 0.26 s. The generalized diffusion coefficients $D_{NN}$ were estimated by fitting the local TMSD of the trajectory with the power law $D_{NN} t^{\alpha_{NN}}$. The distribution of $\alpha_{NN}$ has a maximum of 0.6 and decays faster than the distribution of $\alpha_L$. This may be because many short trajectories are missing in the NN analysis, since the NN could only analyze trajectories with durations longer than its window size [10]. The distributions of generalized diffusion coefficients (right panel), on the other hand, are similar to each other.

**Figure A4.** Slow endosomes. (**Left**) Distribution of anomalous exponents $\alpha_{NN}$ of slow moving endosome trajectories (the solid curve) compared with the distribution of local anomalous exponents $\alpha_L$ of slow moving endosome trajectories (the dashed curve); (**Right**) Distribution of generalized diffusion coefficients $D_{NN}$ of slow moving endosome trajectories (the solid curve) compared with the distribution of local generalized diffusion coefficients $D_L$ of slow moving endosome trajectories (the dashed curve).
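The power-law fit of a time-averaged MSD mentioned for Figures A4 and A5 can be sketched as follows. This is only a minimal illustration, assuming a plain log-log least-squares fit over the first few lags; the function names `tmsd` and `fit_power_law`, the choice of lags, and the fitting procedure are assumptions of this example and are not taken from the original analysis.

```python
import numpy as np

def tmsd(traj, max_lag):
    """Time-averaged MSD of one trajectory (shape: n_points x n_dims)."""
    traj = np.asarray(traj, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((traj[l:] - traj[:-l]) ** 2, axis=-1))
                    for l in lags])
    return lags, msd

def fit_power_law(lags, msd, dt=1.0):
    """Fit msd = D * t**alpha by linear regression in log-log coordinates."""
    slope, intercept = np.polyfit(np.log(lags * dt), np.log(msd), 1)
    return slope, np.exp(intercept)  # (alpha, D)

# Ballistic test track x(t) = t: the TMSD is t**2, so alpha = 2 and D = 1.
track = np.column_stack([np.arange(100.0), np.zeros(100)])
alpha, D = fit_power_law(*tmsd(track, max_lag=10))
```

Applied to a short window sliding along a trajectory, the same fit would give window-local estimates of the exponent and of the generalized diffusion coefficient.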

Figure A5: Comparison of the distributions of anomalous exponents $\alpha_{NN}$ and generalized diffusion coefficients $D_{NN}$ with the distributions of local anomalous exponents $\alpha_L$ and local generalized diffusion coefficients $D_L$ of fast moving endosomes. Anomalous exponents $\alpha_{NN}$ were estimated using a neural network with window size 0.26 s. The generalized diffusion coefficients $D_{NN}$ were estimated by fitting the local TMSD of the trajectory with the power law $D_{NN} t^{\alpha_{NN}}$. The distributions of anomalous exponents (left panel) are similar to each other, while the distributions of generalized diffusion coefficients (right panel) are almost indistinguishable.

**Figure A5.** Fast endosomes. (**Left**) Distribution of anomalous exponents $\alpha_{NN}$ of fast moving endosome trajectories (the solid curve) compared with the distribution of local anomalous exponents $\alpha_L$ of fast moving endosome trajectories (the dashed curve); (**Right**) Distribution of generalized diffusion coefficients $D_{NN}$ of fast moving endosome trajectories (the solid curve) compared with the distribution of local generalized diffusion coefficients $D_L$ of fast moving endosome trajectories (the dashed curve).

Figure A6: Local anomalous exponents $\alpha_L$ and local generalized diffusion coefficients $D_L$ are positively correlated for both slow and fast moving endosomes.

**Figure A6.** Correlation between local anomalous exponents $\alpha_L$ and local generalized diffusion coefficients $D_L$ for slow (**A**) and fast (**B**) moving endosomes.

#### **References**


## *Article* **Detecting Transient Trapping from a Single Trajectory: A Structural Approach**

**Yann Lanoiselée 1,2,\*, Jak Grimes 1,2, Zsombor Koszegi 1,2 and Davide Calebiro 1,2**


**Abstract:** In this article, we introduce a new method to detect transient trapping events within a single particle trajectory, thus allowing the explicit accounting of changes in the particle's dynamics over time. Our method is based on new measures of a smoothed recurrence matrix. The newly introduced set of measures takes into account both the spatial and temporal structure of the trajectory. Therefore, it is adapted to study short-lived trapping domains that are not visited by multiple trajectories. Contrary to most existing methods, it does not rely on using a window, sliding along the trajectory, but rather investigates the trajectory as a whole. This method provides useful information to study intracellular and plasma membrane compartmentalisation. Additionally, this method is applied to single particle trajectory data of *β*2-adrenergic receptors, revealing that receptor stimulation results in increased trapping of receptors in defined domains, without changing the diffusion of free receptors.

**Keywords:** single particle trajectory; stochastic processes; trapping; confinement

#### **1. Introduction**

Single particle methods, which track fluorescent molecules over time, allow for the quantification of biological events with unprecedented spatial and temporal resolution. In cell biology, the complex organisation of the plasma membrane significantly impacts the lateral diffusion of membrane proteins, leading to non-stationary motion patterns. A proper interpretation of these complex trajectories requires that we take into account the changes in a molecule's underlying motion mechanism. For example, transient trapping of G-protein-coupled receptors and G-proteins is closely related to a restricted collision-coupling model [1,2]. In this model, the association rates of molecules on the plasma membrane are enhanced by the presence of confining nano-domains, where receptors and G-proteins are more likely to encounter one another. However, using analysis tools that assume the same molecular motion over time leads to incorrect interpretations of the underlying biology. An intermittent process alternating between free Brownian motion and trapping (as observed in [3]) can wrongly be interpreted as a case of anomalous diffusion with an anomalous exponent *α* < 1.

In the present article, we introduce a method to detect transient trapping events within a single trajectory. An advantage of analysing transient trapping events is the possibility of quantifying the binding kinetics of a molecule through different cellular nano-domains. Additionally, this approach does not require multiple visits of independent molecules to the same nano-domain to assess trapping and does not assume trapping nano-domains to be long-lived. Our strategy is to isolate different trapped portions of trajectories by considering the spatial self-localization of consecutive points within a single trajectory. We introduce local measures computed for each trajectory point, *n* ∈ [1, *N*], containing information on neighbouring trajectory coordinates as a way to elucidate the structure of the trajectory. For each trajectory position, the number of neighbours considered for the local measure is determined by the number of consecutive trajectory coordinates within the range of the test lengthscale.

**Citation:** Lanoiselée, Y.; Grimes, J.; Koszegi, Z.; Calebiro, D. Detecting Transient Trapping from a Single Trajectory: A Structural Approach. *Entropy* **2021**, *23*, 1044. https://doi.org/10.3390/e23081044

Academic Editors: Janusz Szwabiński and Aleksander Weron

Received: 15 June 2021 Accepted: 4 August 2021 Published: 13 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The detection of trapping is challenging and has been the subject of investigation by several authors. A possible strategy, based on an ensemble of trajectories, consists of evaluating trapping domains from the evaluation of local confining force [4–8]. On the side of single trajectory analysis, techniques were based on the maximum square displacement [9–11], although they are generally too sensitive to noise and local fluctuations of trajectory dynamics. Following this, a number of methods were developed, including: image analysis techniques [12], a model specific maximum likelihood estimator [13], random forest models [14], back propagation neural network approaches [15], moment-scaling spectrum analysis [16], and standardized maximum distance [17]. Another approach proposes to detect confinement size based on first-passage times [18]. Closer to our approach, Sikora et al. [19,20] have developed a method for transient confinement identification based on recurrence statistics, and Verdier et al. [21] used a graphical representation of trajectories to identify the diffusion mode of a whole trajectory. Most of the above-mentioned techniques [9–11,14–16,19,20] rely on time window approaches. Alternatively, our method is based on a recurrence matrix and investigates a trajectory as a whole, whilst still determining sub-trajectory dynamics.

Recurrence matrices are used in various areas of science. They can be used to reconstruct protein structure [22] and are even used to detect structural changes in reaction-diffusion systems [23]. In general, they are used for quantifying non-linear time-series derived from dynamical systems, such as detecting protein conformation changes in molecular dynamics [24] or for quantifying physiological measurements [25]. In the context of dynamical systems, it has been shown that one can reconstruct the chaotic attractor associated with a time-series [26]. Additionally, the influence of observational noise on recurrence plots has been previously investigated [27], in addition to recurrence plots being used for testing time-series stationarity [28]. Although the concept of a recurrence matrix is not new, we construct it in a modified way that greatly limits the effect of outliers and localisation error. Our central hypothesis is that a trapping event within a trajectory is translated into a recurrence matrix as a square block structure along the diagonal of the recurrence matrix. We introduce three new local measures that are particularly relevant in detecting block structures along the diagonal, which are characteristic signatures of molecular trapping. From both our construction and these newly introduced measures, we derive a quantity that is invariant when the molecule is trapped and close to zero everywhere else.

In Section 3, we perform extensive simulations and tests to assess the reliability of our method and its robustness to noise for both 2D and 3D trajectories, comparing our method to the 'Divide and Conquer Moment Scaling Spectrum' (DC-MSS) [16]. Finally, in Section 4, we apply our method to single particle tracking trajectories of a prototypical G-protein-coupled receptor (the *β*2-adrenergic receptor), analysing the effects of different pharmacological treatments on receptor trapping.

#### **2. Methods**

We consider either 2- or 3-dimensional trajectories composed of $N$ successive coordinates $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, where bold face emphasises the multi-dimensionality of each data point. To make our analysis independent of the trajectory scale, trajectory increments (one-step displacements) are rescaled on each coordinate by their empirical standard deviation. Therefore, the results obtained for Brownian motion are independent of its diffusion coefficient. A recurrence matrix is then calculated from the distance between each pair of points within the trajectory. For each trajectory (see Figure 1a), we construct a positive matrix with Gaussian weights (see Figure 1b):

$$M\_{i,j} = \exp\left(-\frac{1}{2} \left(\frac{|\mathbf{x\_i} - \mathbf{x\_j}|}{\lambda}\right)^2\right),\tag{1}$$

where $|\mathbf{x}_i - \mathbf{x}_j|$ denotes the distance between two points $i$ and $j$, and $\lambda$ is the test lengthscale. Each element $M_{i,j}$ is in the range [0, 1], taking values close to 0 when the distance $|\mathbf{x}_i - \mathbf{x}_j|$ is larger than $\lambda$. The weights are chosen to be Gaussian in such a way that each element $M_{i,j}$ remains close to 1 for $|\mathbf{x}_i - \mathbf{x}_j|/\lambda < 1$ and decays fast when $|\mathbf{x}_i - \mathbf{x}_j|/\lambda > 1$. Therefore, the presence of a trapped portion of a trajectory of size $\approx \lambda$ translates in the matrix $M$ as a square block of near-1 entries whose diagonal is aligned with the matrix diagonal. Due to the random nature of molecule displacement, the entries $M_{i,j}$ are noisy, making it difficult to determine the transition between different phases of motion. To overcome this, a local smoothing of the matrix is performed. This operation can be done with computational efficiency by convolving the matrix $M$ with a normalized and constant square matrix of size $(2\mu + 1) \times (2\mu + 1)$, where $\mu$ is the smoothing parameter, through a fast Fourier transform (FFT). An advantage of this is to greatly limit the effect of outliers, such as one-step large jumps in position due to tracking errors within a particle's trajectory. Whereas locally averaging trajectory coordinates would greatly disturb the shape of the trajectory and enhance the effect of outliers, zeroes in the Laplacian matrix induced by an outlier are removed by local averaging if an outlier lies inside a trapping block.
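As a rough illustration of the construction above, the following Python sketch rescales a trajectory by the standard deviation of its increments, builds the Gaussian-weighted matrix of Equation (1), and smooths it with a normalized (2*μ* + 1) × (2*μ* + 1) box kernel via FFT. The function names (`rescale`, `recurrence_matrix`, `smooth_fft`) and the zero padding at the matrix edges are assumptions made for this example, not details taken from the original implementation.

```python
import numpy as np

def rescale(traj):
    """Divide each coordinate by the empirical std of its one-step increments."""
    traj = np.asarray(traj, dtype=float)
    return traj / np.diff(traj, axis=0).std(axis=0)

def recurrence_matrix(traj, lam):
    """Gaussian-weighted matrix M[i, j] = exp(-0.5 * (|x_i - x_j| / lam)**2)."""
    diff = traj[:, None, :] - traj[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    return np.exp(-0.5 * dist2 / lam ** 2)

def smooth_fft(M, mu):
    """Local average over a (2*mu+1) x (2*mu+1) window, computed via FFT.

    Assumption: zero padding outside the matrix, so values near the
    edges are damped compared with the interior.
    """
    n, w = M.shape[0], 2 * mu + 1
    size = n + w - 1  # size of the full linear convolution
    kernel = np.full((w, w), 1.0 / w ** 2)
    spectrum = (np.fft.rfft2(M, s=(size, size))
                * np.fft.rfft2(kernel, s=(size, size)))
    full = np.fft.irfft2(spectrum, s=(size, size))
    return full[mu:mu + n, mu:mu + n]  # crop back to the original shape

# Example on a simulated 2D Brownian trajectory (test lengthscale 1, mu = 2).
rng = np.random.default_rng(0)
traj = rescale(np.cumsum(rng.normal(size=(200, 2)), axis=0))
M_smooth = smooth_fft(recurrence_matrix(traj, lam=1.0), mu=2)
```

Building the full matrix costs memory quadratic in the trajectory length, which is acceptable for trajectories of a few thousand points but would need a different strategy for much longer ones.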

The smoothed recurrence matrix is then thresholded to obtain a binary matrix $B$ by setting to one all the values larger than a critical value $p_c$ (see Figure 1c). Here, we choose the critical value to be $p_c = \exp(-1)$, so two points of a trajectory are considered colocalizing if they are within a distance $\lambda\sqrt{2}$ from each other. These manipulations turn the problem of finding trapped regions in the trajectories into finding square block structures along the diagonal of the binary matrix $B$.
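The link between the threshold exp(−1) and the distance *λ*√2 follows directly from Equation (1), and can be checked numerically. The short sketch below is illustrative only; the name `binarize` is hypothetical, and the distance relation is exact for unsmoothed entries, holding only approximately once local averaging has been applied.

```python
import numpy as np

P_C = np.exp(-1.0)  # critical value used in the text

def binarize(M_smooth, p_c=P_C):
    """Binary matrix: 1 where the (smoothed) weight exceeds p_c, else 0."""
    return (np.asarray(M_smooth) > p_c).astype(np.uint8)

# A pair of points separated by lam * sqrt(2) sits exactly at the threshold:
lam = 1.0
d = lam * np.sqrt(2.0)
weight_at_d = np.exp(-0.5 * (d / lam) ** 2)  # equals exp(-1) up to rounding

B = binarize(np.array([[1.0, 0.3], [0.3, 1.0]]))
```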

From matrix *B*, one has to identify the individual block structures. This could be achieved by employing a clustering algorithm, such as the k-means or k-medoids algorithms; however, these require the number of clusters to be known. Even though empirical methods exist to estimate the number of clusters, such as the 'elbow' or 'silhouette' methods, they do not perform well when clusters differ greatly in size [29]. Although spectral clustering [29] does not suffer from these limitations on cluster sizes, detection of the number of clusters relies on spectral gap detection, which fails when blocks overlap. Thus, it would not be suited for situations where a molecule jumps from one trap to another (hop diffusion, described in [10]). We therefore introduce a new methodology that is specific to the detection of block structures and solves all of the aforementioned issues.

We wish to detect whether any trajectory step $n \in [1, \dots, N]$ is part of a block or not (i.e., trapped or not). For this purpose, we define three measures that can be constructed from each point $B_{n,n}$ along the matrix diagonal (see Figure 1d for a visual illustration). (i) $t_{|}(n)$: The block time, which is the approximate trapping duration seen from the $n$-th trajectory coordinate. It is computed as the number of matrix elements being both equal to 1 and connected to $B_{n,n}$ along the vertical line $B_{n \pm k, n}$. (ii) $t_{\perp}(n)$: The neighbouring time, which is related to the size of the window $2t_{\perp}(n) + 1$ centred on time point $n$ for which all points colocalize. The neighbouring time is computed as the number of connected matrix elements being equal to 1 along the line perpendicular to the matrix diagonal and going through $B_{n,n}$. (iii) $t_{\parallel}(n)$: The persistence time, which is the length of the segment formed by connected matrix elements being equal to 1 that are parallel to the matrix diagonal and starting from the extremity of the segment used to compute $t_{\perp}$. This determines how many frames $m$ in the future the lower bound $t_{\perp}(n) \le t_{\perp}(n + m)$ holds.

Let us consider an ideal case where the whole trajectory is trapped such that the recurrence matrix is an *N* × *N* square with matrix elements being equal to 1 everywhere. From these three measures, one can deduce an invariant quantity that is valid for any point along the matrix diagonal (see proof in Appendix A):

$$\nu(n) = \frac{t\_{|}(n)}{t\_{\parallel}(n) + t\_{\perp}(n) - 1} = 1. \tag{2}$$

Figure 1e illustrates the computed block time as a function of time (red) and how neighbouring (cyan) and persistence (purple) time compensate each other to verify the equality. This equality is a necessary condition for the point $\mathbf{x}_n$ to belong to a square block. Specific events or features related to trapping interruption will cause violation of this equality.
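One possible reading of the three measures can be sketched in Python as follows, taking the invariant as the block time divided by the sum of persistence and neighbouring times minus one (the relation illustrated in Figure 1e). The sketch assumes a symmetric binary matrix, counts the neighbouring time over the whole anti-diagonal run through the diagonal element, and counts the persistence time as the run of ones along the off-diagonal passing through the upper extremity of that run. These conventions, the 0-based index, and the names `run_length` and `block_measures` are assumptions of this example; they reproduce the ideal cases described in the text, but the authors' implementation may differ in detail.

```python
import numpy as np

def run_length(line, idx):
    """Length of the run of consecutive nonzero entries of `line` containing idx."""
    if not line[idx]:
        return 0
    lo = hi = idx
    while lo > 0 and line[lo - 1]:
        lo -= 1
    while hi < len(line) - 1 and line[hi + 1]:
        hi += 1
    return hi - lo + 1

def block_measures(B, n):
    """Return (t_block, t_perp, t_par, nu) at diagonal element B[n, n] (0-based n)."""
    N = B.shape[0]
    if not B[n, n]:
        return 0, 0, 0, 0.0
    # Block time: connected ones along the vertical line through B[n, n].
    t_block = run_length(B[:, n], n)
    # Neighbouring time: connected ones along the anti-diagonal B[n-k, n+k];
    # by symmetry of B the run has k elements on each side of the centre.
    k = 0
    while n - (k + 1) >= 0 and n + (k + 1) < N and B[n - (k + 1), n + (k + 1)]:
        k += 1
    t_perp = 2 * k + 1
    # Persistence time: run of ones parallel to the diagonal, passing through
    # the extremity B[n-k, n+k] of the anti-diagonal segment.
    t_par = run_length(np.diagonal(B, offset=2 * k), n - k)
    return t_block, t_perp, t_par, t_block / (t_par + t_perp - 1)

# Fully trapped 8-step trajectory (all ones) versus ideal free motion (identity).
trapped = block_measures(np.ones((8, 8), dtype=int), 2)  # third step, as in Figure 1d
free = block_measures(np.eye(8, dtype=int), 2)
```

With these conventions the fully trapped case gives an invariant of 1 at every diagonal point, while the 1-diagonal (free) case gives 1/*N*, matching the two limits discussed in the text.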

For an ideal free portion, the matrix is diagonal (let us call it 1-diagonal). In the case of a trapping event followed by free diffusion at $n + 1$, there is a sharp transition from $\nu(n - 1) = 1$ to $\nu(n + 1) = 1/N$. Let us consider a special case, where two successive trapping events are spatially separated such that their corresponding blocks, of size $s_1$ and $s_2$, respectively, share only a single point (the transition point $n$) that lies on the matrix diagonal. Given that the trajectory is longer than the two trapping events, $N > s_1 + s_2$, there is a sharp transition at $n$ because the increase of $t_{\parallel}$ from $s_1 - 2$ to $N$ is not compensated by the increase of $t_{|}$ from $s_1$ to $s_1 + s_2 - 1$ (see Figure 1f). In the case where two trapping events occur successively at even closer locations, their corresponding blocks will overlap. Even though the equality would be broken at the transition point, departure from $\nu = 1$ may not be very sharp because the transition point is no longer on the matrix diagonal, and accordingly, the persistence time is bounded by the block sizes, $t_{\parallel} < s_1 + s_2$. Adding $n_d$ diagonal lines along each side of the matrix diagonal, such that an ideal free motion for which $B$ is a 1-diagonal matrix would become a $(2n_d + 1)$-diagonal matrix, helps to enhance the variation in $\nu$ at the transition point by changing the bound to $t_{\parallel} < N - 2n_d$, where $n_d$ is the number of diagonals. Adding a sufficient number of lines makes the persistence time $t_{\parallel}(n)$ almost as long as the trajectory duration itself, so that when the invariant is violated, $\nu$ becomes very close to 0.
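Filling diagonal lines on each side of the main diagonal is a simple array operation. The following sketch, with the hypothetical helper name `add_diagonals`, sets to one every element whose row and column indices differ by at most a given number of lines, so that a 1-diagonal matrix becomes a banded matrix of the corresponding width.

```python
import numpy as np

def add_diagonals(B, n_d):
    """Return a copy of B with n_d diagonals filled on each side of the main one."""
    B = np.asarray(B)
    rows, cols = np.indices(B.shape)
    out = B.copy()
    out[np.abs(rows - cols) <= n_d] = 1
    return out

# An ideal free portion (identity matrix) becomes tridiagonal for n_d = 1.
banded = add_diagonals(np.eye(5, dtype=int), n_d=1)
```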

The number of diagonal lines that should be added depends on the lengthscale $\lambda$ and the smoothing parameter $\mu$. In general, adding more diagonal lines allows one to distinguish between traps that are very close to each other. In turn, a large number of diagonal lines reduces the precision of change-point detection for isolated traps. In order to decide the number of diagonal lines used, we performed simulations. Block time has been calculated from $M = 10^3$ simulated trajectories of $N = 2 \times 10^3$ steps drawn from two reference types of motion that mimic free diffusion.

In the first case, we simulated Brownian motion (Bm) as the classical model of a freely moving molecule in a homogeneous medium. In the second case, we simulated subdiffusive, fractional Brownian motion (fBm) [30] with anomalous exponent *α* = 0.7 (Hölder exponent *H* = 0.35) as a prototypical diffusion in a crowded environment at percolation threshold [31–34]. In both cases, trajectories were simulated in both 2 and 3 dimensions. As a compromise between sensitivity and precision, the tenth percentile values of block time are used for the rest of the paper for the numbers of diagonal lines to be filled, independent of the dimension of the problem and of the user's choice of reference model (see Appendix D for a comparison of the effect on detection results of the number of added lines).
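For readers who wish to reproduce reference trajectories of this kind, a subdiffusive fractional Brownian motion can be generated from the exact covariance of its increments. The sketch below uses a Cholesky factorisation, which is simple but scales cubically with the number of steps; it is a generic textbook construction, not necessarily the generator used by the authors, and the function names are hypothetical.

```python
import numpy as np

def fgn_autocovariance(k, hurst):
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = np.abs(np.asarray(k, dtype=float))
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + np.abs(k - 1) ** h2)

def simulate_fbm(n_steps, hurst, dim=2, seed=None):
    """fBm trajectory with independent coordinates, shape (n_steps, dim)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_steps)
    cov = fgn_autocovariance(np.subtract.outer(idx, idx), hurst)
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    increments = chol @ rng.normal(size=(n_steps, dim))
    return np.cumsum(increments, axis=0)

# Exponent 0.35 corresponds to an anomalous exponent alpha = 0.7.
fbm_traj = simulate_fbm(500, hurst=0.35, dim=2, seed=0)
```

For an exponent of 0.5 the off-diagonal covariances vanish and the construction reduces to ordinary Brownian motion, which gives a quick sanity check.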

It is possible that our invariant is broken because of lacunarities inside blocks due to the random nature of molecules' displacements, which can easily be avoided by filling lacunarities inside block components along the diagonal (function imfill in MATLAB). Figure 1g presents the three measures for the trajectory in Figure 1a. The graph shows that inside a block, the pattern is very similar to the one presented in Figure 1e for the ideal case and shows the large change in persistence time at a block transition.

We claim that the $n$-th point of the trajectory is in a block when $\nu(n)$ is larger than a critical value $\nu_c$. In practice, blocks are never perfect squares, so we choose $\nu_c = 3/4$ as a criterion such that blocks can be deformed as illustrated in Figure 1h. However, even in the case of purely free motion (e.g., Brownian motion), some blocks would still be detected because it takes a random finite time to escape a region of size $\lambda$. To ensure that a detected block is due to trapping and not due to chance, we chose a *p*-value approach. For each test lengthscale and each type of test motion (2D and 3D Brownian motion and fractional Brownian motion), we simulated $10^3$ trajectories and computed the matrices $B_{ij}$ before adding diagonal lines based on our previous simulations. Those simulated trajectories were very long ($10^4$ steps each) in order to ensure the capture of very large potential blocks as the test lengthscale $\lambda$ increases. Block size was computed as the number of consecutive points for which our criterion $\nu(n) > \nu_c$ is verified. From these simulations, we estimated in each case the empirical cumulative probability density of block size. For a given *p*-value $p_{val}$, the hypothesis that a detected block is a real trapping event (compared to the reference simulated motion: Bm or fBm) is then rejected if the cumulative probability density associated with the tested block size is smaller than $1 - p_{val}$.
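The acceptance test described above can be sketched in a few lines. The example below extracts block sizes as runs of consecutive points above the critical value, then compares a tested block size against an empirical cumulative distribution built from reference (free-motion) block sizes. The function names and the toy reference sample are placeholders for illustration; in practice the reference sizes would come from the Bm or fBm simulations described in the text.

```python
import numpy as np

def block_sizes(nu, nu_c=0.75):
    """Lengths of runs of consecutive points with nu > nu_c."""
    inside = np.asarray(nu) > nu_c
    padded = np.concatenate(([False], inside, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])  # run starts and ends
    return edges[1::2] - edges[::2]

def is_trapping(size, reference_sizes, p_val=0.05):
    """Keep a block if its empirical cumulative probability is >= 1 - p_val."""
    ref = np.sort(np.asarray(reference_sizes))
    cdf = np.searchsorted(ref, size, side="right") / ref.size
    return bool(cdf >= 1.0 - p_val)

# Toy example: reference block sizes 1..100 stand in for simulated free motion.
reference = np.arange(1, 101)
sizes = block_sizes([1.0, 1.0, 0.2, 0.9, 0.9, 0.9, 0.1])
long_block_kept = is_trapping(99, reference)
short_block_kept = is_trapping(50, reference)
```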

**Figure 1.** (**a**). Simulated 2D trajectory alternating between free Brownian motion and reflected diffusion in a disk of radius $R = 1$. The diffusion coefficient is $D = 1/2$ in both cases, and the duration of each state is Poisson distributed with mean $T_f = 5$ and $T_f = 30$ for free and reflected motions, respectively. The red circle denotes the beginning and the blue square the end of the trajectory. (**b**). Matrix $M$ computed from the trajectory in (**a**) with a test lengthscale $\lambda = 1$. (**c**). Binary matrix $B$ after thresholding $M$ in (**b**), filling the lacunarities and adding diagonal lines. (**d**). Illustration of the block time $t_{|}$ (red), the neighbouring time $t_{\perp}(n)$ (purple), and the persistence time $t_{\parallel}(n)$ (cyan) computed at the step $n = 3$ of an ideal $B$ matrix illustrating a fully trapped trajectory of 8 steps. (**e**). Illustration of the equality $t_{|} = t_{\parallel}(n) + t_{\perp}(n) - 1$ along the diagonal $B_{nn}$ for a perfect block. (**f**). Illustration demonstrating that at the transition between two blocks, the persistence time $t_{\parallel}(n)$ becomes as long as the trajectory itself. (**g**). Computation of $t_{|}$ (red), $t_{\perp}(n)$ (purple), and $t_{\parallel}(n)$ (cyan) based on (**c**). (**h**). Block invariant $\nu(n)$ (blue) computed over time based on (**g**) against the threshold value $\nu_c = 0.75$; green rectangles underline misclassified trajectory portions. (**i**). Classified trajectory, where black represents free portions and different colours represent distinct detected trapped portions.

#### **3. Simulations**

In this section, we present performance tests for our algorithm in 2D and 3D and compare it to an alternative algorithm, DC-MSS [16], where possible (in 2D).

#### *3.1. Fixed Parameters*

For the analysis, the smoothing parameter *μ*, the number of lines to be filled along the matrix diagonal, and the *p*-value needed to be set. We chose *μ* = 2 in such a way that high-frequency variability in the matrix *Mij* would be dampened without significantly affecting the precision of change-point detection. Then, based on simulations on the effect of different additional diagonal lines (see Appendix D), we added a number of diagonal lines corresponding to the tenth percentile of block-times. Finally, reasoning that the tail of a Brownian motion's first-passage-time distribution from the centre to the border of a disk spans over multiple timescales, choosing a *p*-value very close to 1 would exclude many transient trapping events. Accordingly, we fixed the *p*-value *pval* = 0.05 as a compromise between the sensitivity and reliability of the method and then varied simulation parameters to assess the potential of our approach.

#### *3.2. Simulation*

To test our methodology, for each data point presented below, we simulated $10^3$ trajectories, either 2D or 3D, of $10^3$ steps each. Molecules alternate between a free diffusive state and a trapped state in which the molecule remains within a region of set size. We chose the free state to be Brownian motion with one-step diffusion lengthscale $\sigma = 1$, corresponding to a diffusion coefficient $D = 1/2$. The trapped state was chosen to be reflected Brownian motion inside a disk (2D) or sphere (3D) of radius $R$ with the same diffusion coefficient. Similarly, we produced another dataset (2D and 3D), where instead of Brownian motion, we modelled free portions with fractional Brownian motion with Hölder exponent $H = 0.35$, corresponding to an anomalous exponent $\alpha = 0.7$. In all cases, the random duration of each state was chosen to be Poisson distributed with mean $\tau_{Bm}$ and $\tau_{trap}$ for the free and trapped states, respectively. White noise with standard deviation $\sigma_{err}$ was added to trajectory coordinates to model the effect of the localisation error, starting from low noise $\sigma_{err} = \sigma/10$ to mild noise $\sigma_{err} = \sigma/2$ and finally strong noise with a standard deviation equal to that of the trajectory one-step displacements, $\sigma_{err} = \sigma$.
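A simplified 2D version of such a test trajectory can be generated as follows. Note the assumptions: confinement is imposed here by redrawing any step that would leave the disk, which is a crude stand-in for a properly reflecting boundary; the trap is centred on the position at which trapping starts; and the function name and parameter names are hypothetical. The sketch is meant to show the alternation of states and the added localisation noise, not to reproduce the authors' simulation scheme.

```python
import numpy as np

def simulate_intermittent(n_steps, tau_free, tau_trap, radius,
                          sigma=1.0, sigma_err=0.0, seed=None):
    """2D trajectory alternating between free and confined Brownian motion.

    Returns (noisy positions, boolean array marking trapped steps).
    """
    rng = np.random.default_rng(seed)
    pos = np.zeros(2)
    traj = np.empty((n_steps, 2))
    trapped = np.zeros(n_steps, dtype=bool)
    t, in_trap = 0, False
    while t < n_steps:
        # Poisson distributed state duration (at least one step).
        duration = max(1, rng.poisson(tau_trap if in_trap else tau_free))
        centre = pos.copy()  # trap centre (only used in the trapped state)
        for _ in range(min(duration, n_steps - t)):
            step = rng.normal(0.0, sigma, size=2)
            if in_trap:
                # Assumption: redraw steps that would exit the disk.
                while np.linalg.norm(pos + step - centre) > radius:
                    step = rng.normal(0.0, sigma, size=2)
            pos = pos + step
            traj[t], trapped[t] = pos, in_trap
            t += 1
        in_trap = not in_trap
    noise = rng.normal(0.0, sigma_err, size=traj.shape)  # localisation error
    return traj + noise, trapped

positions, labels = simulate_intermittent(
    1000, tau_free=50, tau_trap=100, radius=1.0, sigma_err=0.1, seed=0)
```

The returned labels give the ground truth against which a detection result could be scored frame by frame.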

#### *3.3. Results*

Figure 2a shows the results when both the mean free duration and the mean trapping duration were varied, while the trapping radius was always *R* = 1 and the test lengthscale was *λ* = 1. Different levels of noise, *σerr* = *σ*/10, *σ*/2, *σ*, were added in Figure 2a–c, respectively. In these cases, the minimal duration for detecting a trapping event is *τp*0.05 = 9 frames (see table in Appendix E). In these three cases, when there is no confinement at all (*τconf*/*τp*0.05 = 0), the recognition score is close to 1, meaning that the algorithm is robust to false negatives and is able to confirm the absence of trapping. In most cases, more than 90% of trajectories are correctly assigned to their state. The method performs poorly when the trapping duration is close to or shorter than *τp*0.05 or when the time spent between two trapping events is smaller than the time it takes to explore a distance larger than the trap size. Both mild and strong noise do lower the recognition score, but only marginally. Figure 2d–f tests the case where the radius is *R* = 3 and the test lengthscale is *λ* = 3. In this case, the conclusions are the same, but one has to keep in mind that the durations are much longer because the minimum duration for detecting trapping with *λ* = 3 is *τp*0.05 = 42 (see table in Appendix E).

The cases presented above are idealised because, except when searching for a particular trap size, one does not precisely know the size of traps a priori. A reasonable range can instead be determined by observation of the experimental data. Taking advantage of the robustness to false negatives offered by our *p*-value approach, we propose combining the recognition for each lengthscale into a single one. We combine results by taking the union of detected trapped frames, considering lengthscales in the range *λ* ∈ [1, *λmax*] by increments of 0.5. We simulated trajectories alternating between free motion and trapping of distributed sizes. Possible trap radii are uniformly distributed in the range [1, *Rmax*], where *Rmax* = 1, 2, 3 in Figure 2g–i, respectively. The duration in each trapped state is set to be *τconf* = 6*R*<sup>2</sup> + 50, so the trapping time takes into account the radius of the trapping area plus an offset of 50 frames. Trapping was simulated as reflected Brownian motion with an integration step *dt* = 1/2 unless the diffusion length during a step was larger than a third of the radius, √(2*D* *dt*) > *R*/3, in which case positions were approximated as being uniformly distributed inside the trap.
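The combination step amounts to a frame-wise logical OR of the per-lengthscale detections. A minimal sketch is given below, with hypothetical helper names; the lower bound of the lengthscale grid is left as a parameter because the text and the figure captions state it differently.

```python
def lengthscale_grid(lambda_min, lambda_max, step=0.5):
    """Test lengthscales from lambda_min to lambda_max in steps of `step`."""
    n = int(round((lambda_max - lambda_min) / step))
    return [lambda_min + step * k for k in range(n + 1)]


def combine_detections(masks):
    """Frame-wise union of boolean trapped masks, one mask per lengthscale."""
    if not masks:
        raise ValueError("need at least one mask")
    if len({len(m) for m in masks}) != 1:
        raise ValueError("all masks must cover the same frames")
    return [any(frame) for frame in zip(*masks)]


def trap_duration(radius):
    """Trapped-state duration used in the simulations: 6 R^2 + 50 frames."""
    return 6 * radius ** 2 + 50


# Toy example: two lengthscales flag different frames as trapped.
combined = combine_detections([[True, False, False],
                               [False, False, True]])
```

Because the union can only add trapped frames, its reliability rests on each single-lengthscale detection rarely flagging free motion as trapped.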

**Figure 2.** Each panel presents the recognition score ∈ [0, 1] for 2D trajectories alternating between free and trapped motions. (**a**–**c**) Trapping radius is *R* = 1, and test lengthscale is *λ* = 1. Shown are tested combinations of free Brownian motion of mean duration *τBm* = [5, 10, 20, ... , 70] and mean trapping duration *τconf* ∈ [0, 60]; coordinates are perturbed with white noise of level *σerr* = *σ* × [0.1, 0.5, 1] (**a**–**c**). (**d**–**f**) *R* = 3 and *λ* = 3. Each rectangle represents a combination of free Brownian motion of mean duration *τBm* = [5, 10, 20, ... , 70] and mean trapping duration *τconf* ∈ [0, 210]; coordinates are perturbed with white noise of level *σerr* = *σ* × [0.1, 0.5, 1]. (**g**–**i**) Free motion is Brownian motion; noise level *σerr* = 0.5*σ* was added to trajectories. Trapping radius is in the range *R* ∈ [1, *Rmax*], where *Rmax* = 1, 2, 3 in (**g**–**i**). In each case, test lengthscales from 1/2 to *λmax* by increments of 1/2 are combined where *λmax* = 1, 2, 3 (dashed red, dotted-dashed blue, and dotted magenta). Black line shows the recognition score obtained from DC-MSS algorithm. (**j**–**l**) Same as for (**g**–**i**) except that the free motion is replaced by subdiffusive fractional Brownian motion with Hölder exponent *H* = 0.35.

For each of these three cases, noise level *σerr* = 0.5*σ* was added to trajectories, and we then tested our combination scheme with three possible values *λmax* = 1, 2, 3. For comparison, we applied the DC-MSS algorithm [16] to our simulated data with the default parameters. DC-MSS separates the data into four categories: immobile, confined, free, and superdiffusive. To make it comparable to our scheme, we considered the first two categories as being 'trapped' and the latter two as being 'free'. In Figure 2g, the performance of DC-MSS is better than ours when we overestimate the maximum test lengthscale with *λmax* = 3, which is three times the maximum trap size *Rmax* = 1. In turn, choosing *λmax* = 2 already significantly improves our classification, and *λmax* = 1 gives close to perfect recognition. Then, in cases *Rmax* = 2 (Figure 2h) and *Rmax* = 3 (Figure 2i), DC-MSS had a consistently lower score for any choice of the parameter *λmax*. It may be surprising that in Figure 2h,i, *λmax* = 1 outperforms the other *λmax* in all cases, even though the traps can be larger than this. We explain this by the fact that the test lengthscale does not specify the trap size to be discovered and instead describes distances between points to be considered 'in the vicinity'. When a molecule spends enough time inside a trap of radius *R* = 3, then even with *λmax* = 1, any trajectory point will colocalize with many other points in such a way that the recurrence matrix *Mij* will be in 'quasi-block' form (a block with many holes). In this case, the combination of the smoothing step and the lacunarity filling will complete the block and allow for accurate detection. In turn, larger lengthscales *λmax* will tend to include, along with a trap, some free points in the vicinity of the confinement area, thus lowering the recognition score.
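To make the comparison concrete, the mapping of the four DC-MSS categories onto two states can be sketched as below. The string labels are hypothetical stand-ins for whatever encoding the DC-MSS output uses, and the score shown (fraction of frames whose state matches the ground truth) is an assumed definition for illustration only.

```python
TRAPPED_CATEGORIES = {"immobile", "confined"}
FREE_CATEGORIES = {"free", "superdiffusive"}


def to_trapped_mask(labels):
    """Map four-category labels to True (trapped) / False (free)."""
    mask = []
    for label in labels:
        if label in TRAPPED_CATEGORIES:
            mask.append(True)
        elif label in FREE_CATEGORIES:
            mask.append(False)
        else:
            raise ValueError(f"unknown category: {label!r}")
    return mask


def frame_agreement(predicted, truth):
    """Fraction of frames on which predicted and true states agree."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("masks must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


predicted = to_trapped_mask(["immobile", "free", "confined", "superdiffusive"])
score = frame_agreement(predicted, [True, True, True, False])
```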

We also considered the case in which trajectories alternate between subdiffusive fractional Brownian motion and trapping. In the case of a single trap size, results were similar to those obtained in Figure 2a–f (data not shown). In the case of multiple trap radii (see Figure 2j–l), similar results were obtained, meaning that our approach can distinguish subdiffusion due to molecular crowding from actual trapping in a nano-domain. In comparison, the DC-MSS algorithm tends to misclassify free portions as being trapped, thus giving lower scores. In Appendix C, additional simulations performed in 3D gave similar results for both diffusive Brownian motion and subdiffusive fractional Brownian motion as 'free states' (see Figure A1).

Lastly, we verified that trajectory duration has only negligible effects as long as trajectory duration is longer than the minimum duration for trapping detection (not shown).

#### **4. Application to Experimental Data**

Based on our methodology, with *λ* = [0.5, 1, 1.5, 2], smoothing parameter *μ* = 2 and *pval* = 0.05, and subdiffusive fBm as our reference for free motion, we investigated the effect of different drugs on the diffusion and trapping of the *β*<sub>2</sub>-adrenergic receptor (*β*<sub>2</sub>AR) on the plasma membrane. We recorded fluorescently labelled *β*<sub>2</sub>AR molecules with total internal reflection microscopy as they diffused in the plasma membrane of living cells (2D recording) (see Appendix B for experimental methods). We first characterized receptors under basal conditions (36 cells), without pharmacological stimulus. Next, we treated the cells with a gold-standard agonist (isoproterenol) that activates receptors (47 cells). Additionally, we probed receptors with a neutral antagonist (propranolol), which prevents ligand-dependent receptor activation (29 cells). Figure 3a–c show, respectively, all of the trajectories longer than 50 frames (for improved visibility) from a single cell for each described treatment. Portions of trajectories are coloured according to their identified state (trapped in red and free in blue).

It clearly appears that, although trapping is present in all cases, the prevalence of trapping is increased upon agonist stimulation. This is quantitatively supported in Figure 3d, where it is shown that under basal conditions, 39.2% of receptors at each frame were trapped on the plasma membrane. Upon agonist stimulation, this percentage increased to 52.2%, while it remained similar (45.5%) after neutral antagonist treatment. To test the relevance of the observed change, we used a non-parametric Kruskal–Wallis test with Tukey–Kramer correction for multiple comparisons. We found the change between basal and agonist stimulation to be significant (*p* = 2 × 10<sup>−4</sup>), clearly demonstrating an effect of agonist stimulation on receptor diffusion dynamics. In contrast, the change between basal and neutral antagonist treatment was not significant (*p* = 0.74), while the difference between agonist and neutral antagonist was significant (*p* = 9 × 10<sup>−3</sup>), suggesting that the drug employed directly influences the receptor trapping, an increase in which correlates with activation of the receptors.

We then sought to further explore the differences observed between these cases. For each trapped trajectory portion, we computed the trapped radius as the distance from the estimated centre of the trap (evaluated as the median of *x* and *y* coordinates for a trapped portion) and the point further away than 95% of points within the trapped portion. In Figure 3e, we binned all of the trapped radii into an empirical probability density function (pdf) which was revealed to be similar for the three conditions, suggesting that the trapping domains are of the same nature in all cases. In all cases, the pdf of trapped radii could be fitted approximately with a Gamma distribution, highlighting the exponential decay of the tail of the distribution. This was further reinforced by the computation of the empirical pdf of trapped portions' durations (see Figure 3f), from which we again obtained a similar empirical pdf for all three conditions. The tails of the trapping duration pdf were fitted to a stretched exponential distribution, thus encompassing the wide (yet finite) range of trapping durations.
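The trap-radius estimate described above (median centre, then the distance exceeded by only about 5% of points) can be sketched as follows. The nearest-rank convention for the 95th percentile is an assumption made here; the authors' code may use a different percentile rule.

```python
import math
import statistics


def trap_radius(points, quantile=0.95):
    """Estimate a trap radius from the (x, y) points of a trapped portion.

    The centre is the coordinate-wise median; the radius is the
    nearest-rank `quantile` of the distances to that centre.
    """
    if not points:
        raise ValueError("points must not be empty")
    cx = statistics.median([p[0] for p in points])
    cy = statistics.median([p[1] for p in points])
    distances = sorted(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    rank = max(1, min(len(distances), math.ceil(quantile * len(distances))))
    return distances[rank - 1]


# Toy trapped portion: eleven points on a line, centred on the origin.
segment = [(float(k), 0.0) for k in range(-5, 6)]
radius = trap_radius(segment)
```

Using the median and a high percentile, instead of the mean and the maximum, keeps the estimate from being dominated by a few outlying localisations.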

**Figure 3.** (**a**–**c**). All receptor trajectories longer than 50 frames from a single cell in each group; trajectory portions are coloured according to whether they are detected as free (blue) or trapped (red). Cells are, respectively: (**a**) in basal state, (**b**) stimulated with agonist, and (**c**) treated with neutral antagonist. (**d**). Proportion of trapped molecules per frame; each point corresponds to a cell for basal (black), agonist stimulated (yellow), and neutral antagonist treated (green). (**e**,**f**). Empirical probability density function for basal (black), agonist stimulated (yellow), and neutral antagonist treated (green) of (**e**) trap radius and (**f**) trapping duration. Grey lines denote fitting with Gamma distribution (**e**) and stretched exponential (**f**). (**g**,**h**). Empirical probability density estimated for free trajectory portions longer than 50 frames of (**g**) the anomalous diffusion exponent *α* and (**h**) the corresponding generalized diffusion coefficient *Dα*.

Finally, we enquired into the dynamics of free trajectory portions. To do so, following [35], we computed the time-averaged mean square displacement (TAMSD) of each portion on each coordinate as

$$\delta^2(n, N) = \frac{1}{N - n} \sum_{k=1}^{N-n} (x_{k+n} - x_k)^2 \tag{3}$$

and summed the result for both coordinates before performing a non-linear fitting, over the lag-time range *n* ∈ [1, 5], with the formula for the ensemble-averaged TAMSD of a 2D ergodic anomalous diffusion process (e.g., fractional Brownian motion) with localisation error *σerr*:

$$
\langle \delta^2(n, N) \rangle = 4D_\alpha n^\alpha + 4\sigma_{err}^2 \tag{4}
$$

where *α* is the anomalous exponent, and *Dα* is the generalized diffusion coefficient. From this, we obtained the empirical pdf for both the anomalous exponent and the generalized diffusion coefficient for each condition and observed once again that it was remarkably consistent among the tested conditions. The exponents for free portions of trajectories (see Figure 3g) were distributed slightly over *α* = 1 (average exponent *α* = 1.04, 1.05, 1.04), corresponding to simple Brownian motion. The generalized diffusion coefficients were very similar in all tested conditions (see Figure 3h) with an average *Dα* = 0.173, 0.169, 0.168 μm<sup>2</sup> s<sup>−1</sup> for basal, agonist, and antagonist, respectively. For comparison, we computed the pdf of the exponent and *Dα* from simulated Brownian motion (see Figure 3g,h), using the same parameters for trajectory duration and mean diffusion coefficient as the free portions found in the case of the tested agonist. The distributions obtained from simulations match the experimental ones for the exponent (average exponent from simulation is *αsim* = 1.05). However, the experimental distributions of diffusion coefficients are wider than the simulated one. We conclude that the distributed nature of the estimated exponent is mainly due to the intrinsic randomness of the TAMSD applied to random trajectories [36,37], while the spread of *Dα* highlights the heterogeneous nature of the cell membrane.
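A minimal sketch of the TAMSD computation of Equation (3) and of a fit of the model 4*Dα* *n*<sup>*α*</sup> + 4*σerr*<sup>2</sup> over lags 1 to 5 is given below. The fitting strategy is a swapped-in simplification and not the authors' non-linear fit: for each exponent on a grid, the model is linear in *Dα* and *σerr*<sup>2</sup>, so those two are obtained by ordinary least squares and the exponent with the smallest residual is retained. No positivity constraint is imposed, so noisy data may return a negative *σerr*<sup>2</sup>. Function names are hypothetical.

```python
def tamsd(xs, lag):
    """Time-averaged MSD of one coordinate at a given lag (Equation (3))."""
    n_terms = len(xs) - lag
    if lag < 1 or n_terms < 1:
        raise ValueError("lag must satisfy 1 <= lag < len(xs)")
    return sum((xs[k + lag] - xs[k]) ** 2 for k in range(n_terms)) / n_terms


def tamsd_2d(points, lags):
    """Sum of the per-coordinate TAMSD for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [tamsd(xs, n) + tamsd(ys, n) for n in lags]


def fit_tamsd(lags, msd, alphas=None):
    """Fit msd ~ 4 D n^alpha + 4 s; returns (alpha, D, s), s = sigma_err^2."""
    if alphas is None:
        alphas = [k / 100 for k in range(10, 201)]  # 0.10, 0.11, ..., 2.00
    m = len(lags)
    mean_y = sum(msd) / m
    best = None
    for alpha in alphas:
        u = [4 * n ** alpha for n in lags]
        mean_u = sum(u) / m
        suu = sum((ui - mean_u) ** 2 for ui in u)
        if suu == 0:
            continue
        # Closed-form least squares for y = slope * u + intercept.
        slope = sum((ui - mean_u) * (yi - mean_y)
                    for ui, yi in zip(u, msd)) / suu
        intercept = mean_y - slope * mean_u
        sse = sum((slope * ui + intercept - yi) ** 2 for ui, yi in zip(u, msd))
        if best is None or sse < best[0]:
            best = (sse, alpha, slope, intercept / 4)
    if best is None:
        raise ValueError("no usable exponent on the grid")
    return best[1], best[2], best[3]


# Noise-free synthetic check with alpha = 0.7, D = 0.2, sigma_err^2 = 0.01.
lags = [1, 2, 3, 4, 5]
synthetic = [4 * 0.2 * n ** 0.7 + 4 * 0.01 for n in lags]
alpha_hat, d_hat, s_hat = fit_tamsd(lags, synthetic)
```

With only five lags and three parameters, the estimates from a single short trajectory scatter widely, which is consistent with the spread of exponents discussed in the text.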

Altogether, these results shed light on the effects of different drug treatments on receptor dynamics. We observe that receptors do not slow down after agonist stimulation. In fact, the change we observe is that receptors are more likely to be trapped, with the nature of the trapping domains remaining the same. For the case of the antagonist, we do not find a significant difference compared to the basal condition, which correlates with the proposed model where neutral antagonists impart no intrinsic activity on the receptor in the absence of an accompanying agonist. We conclude that on timescales longer than our exposure time frame (30 ms), receptors alternate between free lateral diffusion that could be modelled by Brownian motion with fluctuating diffusion coefficient [38–49] and transient trapping in nano-domains of distributed size.

#### **5. Conclusions**

In conclusion, we present an algorithm (Code availability: MATLAB code can be downloaded from https://github.com/YannLanoiselee/Transient\_trapping\_analysis, accessed on 9 August 2021) that can accurately detect transient trapping events from a single trajectory either in two or three dimensions. Our approach is based on recognizing block structures along the diagonal of a thresholded, smoothed recurrence matrix. To this end, we introduced three local measures to be computed along the diagonal of the matrix from which we deduced an invariant quantity inside blocks (trapped portions).

Then, based on a set of user-inputted test lengthscales and on simulations of Brownian and fractional Brownian motions in 2D and 3D as reference processes, we could assess the minimal size of blocks that could be interpreted as the molecule actually being trapped and not a block due to chance, depending on a *p*-value. We tested our method on a set of simulated data and verified the good performance in 2D and 3D when the free type of motion is either Brownian motion or subdiffusive fractional Brownian motion with anomalous exponent *α* = 0.7. We checked the robustness of our results against increasing magnitudes of localisation error. We also compared our 2D results with the classification obtained from the DC-MSS algorithm [16] and showed that our method is more accurate in the task of detecting trapping in all tested cases.

Finally, we applied our analysis to single-particle trajectories of *β*<sub>2</sub>-adrenergic G-protein-coupled receptors recorded through total internal reflection microscopy. Three conditions were tested: the basal state, stimulation with an agonist, and treatment with a neutral antagonist. In all cases, we found that molecules explore traps with similar distributions of size and duration. Instead, it was only the frequency with which molecules were trapped that was different. TAMSD analysis of the free portions of trajectories led to the conclusion that molecules were mostly undergoing Brownian motion, with a variety of parameters indicative of cell membrane heterogeneity. The demonstration of this technique on real biological data and delineation of pharmacological principles using it (agonist = activation, antagonist = net 0 effect) suggest that our methodology to detect trapping events can be used to study the complexity of both intracellular (3D) and membrane proteins (2D) in live cells.

**Author Contributions:** Conceptualization, Y.L.; methodology, Y.L.; software, Y.L.; validation, Y.L., J.G. and Z.K.; formal analysis, Y.L.; investigation, Y.L.; resources, J.G. and Z.K.; data curation, J.G. and Z.K.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L., J.G., Z.K. and D.C.; supervision, D.C.; project administration, D.C.; funding acquisition, D.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by a Wellcome Trust Senior Research Fellowship (212313/Z/18/Z to D.C.).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in Lanoiselée, Yann; Grimes, Jak; Koszegi, Zsombor; Calebiro, Davide (2021): Trajectory of individual of beta-2 adrenergic receptors at the plasma membrane of Chinese hamster ovary cells (CHO-K1) obtained from TIRF microscope. figshare. Dataset. (https://doi.org/10.6084/m9.figshare.15157410, accessed on 9 August 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of the Square Block Invariant**

To prove the equality in Equation (2), we proceed by two inductions. For a square block of fixed side length *c*, we start by noting the symmetry with respect to the line perpendicular to the matrix diagonal going through point *c*/2 (that lies between two points for odd *c*). Then, we define *n* ∈ [1, *c*/2]; our relationship is verified for *n* = 1, and we suppose it true for *n*. Then, observing that *t*|(*n* + 1) = *t*|(*n*), *t*(*n* + 1) = *t*(*n*) − 2, and *t*⊥(*n* + 1) = *t*⊥(*n*) + 2, we deduce *ν*(*n* + 1) = *ν*(*n*) = 1. For the second induction, we start by noting that our relationship is valid for *c* = 1, and we suppose it is true for *c* = *k*. Then, for *c* = *k* + 1, we find that *t*⊥(*n*) remains unchanged, while both *t*|(*n*) and *t*(*n*) increase by one, thus again verifying our equality. The relationship is thus valid for arbitrary block sizes and at any point along the diagonal within the block.

#### **Appendix B. Experimental Methods**

#### *Appendix B.1. Materials*

Cell culture reagents, Lipofectamine 2000, and TetraSpeck fluorescent beads were purchased from Thermo Fisher Scientific. Isoproterenol and Propranolol were from Tocris Bioscience. The fluorescent SNAP-Surface 549 was from New England Biolabs. Ultraclean glass coverslips were obtained as previously described [50]. For single-molecule experiments, Chinese hamster ovary K1 (CHO-K1) cells (ATCC) were cultured in phenol red-free Dulbecco's modified Eagle's medium (DMEM)/F12, supplemented with 10% FBS, penicillin, and streptomycin at 37 °C, 5% CO<sub>2</sub>. Cells were seeded onto ultraclean 25 mm round glass coverslips at a density of 3 × 10<sup>5</sup> cells per well. On the following day, cells were transfected using Lipofectamine 2000 with N-terminally SNAP-tagged human *β*<sub>2</sub>AR (SNAP-*β*<sub>2</sub>AR) [50] and N-terminally GFP-tagged clathrin light chain (GFP-CCP) (kindly provided by Emanuele Cocucci and Tom Kirchhausen), following the manufacturer's protocol. Cells were labelled with 1 μM SNAP-Surface 549 in complete culture medium for 20 min at 37 °C and imaged by single-molecule microscopy ≈ 4 h after transfection to obtain low physiological protein expression levels [2,50]. Cells were washed with complete culture medium and imaged in Hank's balanced salt solution (HBSS) supplemented with 10 mM HEPES. The labelling efficiency was ≈ 90% [50] with non-specific labelling < 1%. *β*<sub>2</sub>ARs were either stimulated with 10 μM Isoproterenol or treated with 10 μM Propranolol.

#### *Appendix B.2. Single-Molecule Microscopy*

Single-molecule microscopy experiments were performed using total internal reflection fluorescence (TIRF) microscopy on a custom system, based on an Eclipse Ti2 microscope (Nikon, Japan) equipped with a 100× oil-immersion objective (NA 1.49, Nikon); 405, 488, 561, and 637 nm diode lasers; an iLas TIRF illuminator; quadruple band excitation and dichroic filters; a quadruple beam splitter; 1.5× tube lens (Cairn Research); four EMCCD cameras (iXon Ultra 897, Andor); and hardware focus stabilization. The sample and objective were maintained at 37 °C throughout the experiments. Multicolour single-molecule image sequences were acquired simultaneously at full frame in frame transfer mode, corresponding to one image every 30 ms. Automated single-particle detection and tracking were performed with the u-track software [51], and the obtained trajectories were further analysed using custom algorithms in the MATLAB environment as previously described [2].

#### **Appendix C. Simulations for the 3D Case**

In this section, we present the simulation results obtained in three dimensions.

**Figure A1.** Each panel presents the recognition score ∈ [0, 1] for 3D trajectories alternating between free and trapped motions. (**a**–**c**) Free motion is Brownian motion with added noise level *σerr* = 0.5*σ*. Trapping radius is in the range *R* ∈ [1, *Rmax*], where *Rmax* = 1, 2, 3 in (**a**–**c**). In each case, test lengthscales from 1/2 to *λmax* by increments of 1/2 are combined, where *λmax* = 1, 2, 3 (dashed red, dotted-dashed blue, and dotted magenta). (**d**–**f**) Same as for (**a**–**c**) except that the free motion is replaced by subdiffusive fractional Brownian motion with Hölder exponent *H* = 0.35.

#### **Appendix D. Effect of the Number of Diagonals Filled**

In Figure A2, we analysed 2D and 3D simulated trajectories alternating between either Bm or fBm and reflected Brownian motion. Then, we computed the recognition score for three possible maximum lengthscales *λmax* = 1, 2, 3. For each of these *λmax*, we computed results depending on the number of lines that were added along the diagonal of the binary matrix *B*. We considered no diagonal added at all (*d*<sub>0</sub>), or the tenth percentile *d*<sub>10</sub> or median *d*<sub>50</sub> of the block time obtained from simulations of Bm or fBm in either 2D or 3D. In all considered cases, the best maximum lengthscale was *λmax* = 1, and the best recognition score was obtained with *d*<sub>10</sub>. The case *d*<sub>0</sub> was similar to *d*<sub>10</sub> for *Rmax* = 1 but failed for larger trapping radii. On the other hand, *d*<sub>50</sub> could not capture change-points for a short duration of the free states as well as *d*<sub>10</sub>.

**Figure A2.** Each panel presents the recognition score ∈ [0, 1] for 2D and 3D trajectories alternating between free and trapped motions. Each column corresponds to a trapping radius range *R* ∈ [1, *Rmax*], where *Rmax* = 1, 2, 3, respectively. Rows correspond to different dimensionality and types of free motion: 2D and Bm, 2D and fBm, 3D and Bm, and 3D and fBm, respectively. For the first two rows, black lines correspond to predictions from DC-MSS. Red, blue, and magenta lines correspond to *λmax* = 1, 2, 3, while crosses, circles, and squares indicate that lines along the diagonal of the binary matrix *B* have been filled according to the zeroth, tenth, and fiftieth (median) percentiles of block times computed from simulations for either Bm or fBm according to the situation.

#### **Appendix E.** *p* **Value Tables**

In this section, we present the minimal trapped duration including the number of filled diagonal lines (for *d*<sub>10</sub>) corresponding to *p*-values [0.1, 0.05, 0.01] for different test lengthscales *λ* for fixed smoothing parameter *μ* = 2 and *ν<sub>c</sub>* = 0.75 (see Table A1). Values corresponding to each test lengthscale *λ* are obtained from 10<sup>3</sup> simulated trajectories of 10<sup>4</sup> steps. Simulations have been performed on 2-dimensional Brownian motion with diffusion coefficient *D* = 1/2 (although the result is independent of *D* because of rescaling) and on 2-dimensional fractional Brownian motion (each coordinate generated independently) with diffusion coefficient *D* = 1/2 and a Hölder exponent *H* = 0.35 corresponding to an anomalous diffusion exponent *α* = 0.7, similar to what is found in diffusion in a crowded molecular environment.

In the case of 3D diffusion (see Table A2), similar simulations have been performed with one extra dimension. A table of the minimal trapped duration, including the number of filled diagonal lines, corresponding to *p*-values [0.1, 0.05, 0.01] is given below. The durations are generally shorter because a trajectory has one more degree of freedom to escape. For Brownian motion, the mean square displacement is increased by 50%.
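The 50% figure follows from the standard mean square displacement of *d*-dimensional Brownian motion with diffusion coefficient *D*; the short derivation below is added for convenience and relies only on that textbook relation:

$$
\langle r^2(t) \rangle_{d} = 2\,d\,D\,t, \qquad \frac{\langle r^2(t) \rangle_{3}}{\langle r^2(t) \rangle_{2}} = \frac{6Dt}{4Dt} = \frac{3}{2}
$$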


**Table A1.** Minimal size for a block to be considered a trapped portion when the reference motion is 2D Brownian motion (left) and 2D fractional Brownian motion with *H* = 0.35 (right).

**Table A2.** Minimal size for a block to be considered a trapped portion when the reference motion is 3D Brownian motion (left) and 3D fractional Brownian motion with *H* = 0.35 (right).


#### **References**


## *Article* **Look at Tempered Subdiffusion in a Conjugate Map: Desire for the Confinement**

#### **Aleksander Stanislavsky \* and Aleksander Weron**

Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland; aleksander.weron@pwr.edu.pl

**\*** Correspondence: astex@ukr.net

Received: 27 October 2020; Accepted: 16 November 2020; Published: 18 November 2020

**Abstract:** The Laplace distribution of random processes was observed in numerous situations that include glasses, colloidal suspensions, live cells, and firm growth. Its origin is not as trivial as in the case of the Gaussian distribution, which is supported by the central limit theorem. Sums of Laplace distributed random variables are not Laplace distributed. We discovered a new mechanism leading to the Laplace distribution of observable values. This mechanism changes the contribution ratio between the jump and continuous parts of random processes. Our concept uses properties of Bernstein functions and subordinators connected with them.

**Keywords:** anomalous diffusion; statistical analysis; single-particle tracking; trajectory classification

#### **1. Introduction**

Based on a myriad of examples in the physical sciences, the 1963 Nobel Prize winner in physics E.P. Wigner emphasized the exceptional role of mathematics in understanding the physical structure of the world around us [1]. Indeed, mathematics is a kind of mental tool created for this purpose, and the world is organized in a logical pattern very similar to mathematics [2]. Thus, mathematics turns out to be the language of science and technology. In many experiments, the single-molecule motion manifests anomalous diffusion, quite unlike classical Brownian diffusion, whose mean-squared displacement (MSD) is linear in time [3]. To describe the data, a number of theoretical models were developed. The most popular of them are: continuous-time random walk (CTRW) and fractional Fokker–Planck equation (FFPE) [4–7], fractional Klein–Kramers equation [8], obstructed diffusion (OD) [9,10], random walk on random walk (RWRW) [11,12], fractional Brownian motion (FBM) [13–16], fractional Lévy *α*-stable motion (FLSM) [17–19], fractional Langevin equation (FLE) [20,21] and autoregressive fractionally integrated moving average (ARFIMA), see [22] and references therein. The ARFIMA model [23–25] is a discrete time analogue of the overdamped fractional Langevin equation [26] responsible for the non-Gaussian law (Lévy *α*-stable) and a long memory. Moreover, the ARFIMA process is a universal and simple discrete time model for fractional dynamics of empirical data. Recall also that the celebrated FBM and FLSM are nothing but limiting cases of ARFIMA. Since the ARFIMA models were successful in analyzing data from other fields (econometrics, see the 2003 Nobel Prize in Economic Sciences for C.W.J. Granger and R. Engle; finance and engineering [27–29]), many statistical tools (and computer packages, e.g., ITMS [24]) are widely available for users, see [25,30].

A relation between the physical environment and mathematical models is crucial [1,30].


By using the conjugated Bernstein function theory [31] for a subordinated diffusion, we uncover here a general universal behavior for the pairs of conjugated subordinators. Namely, one can connect the tempered subdiffusion with the diffusion-limited aggregation. Moreover, for the pure Brownian motion this is the Laplace distribution, whereas for Lévy flights it is its generalization, the Linnik distribution. It should also be noted that a large part of well-known anomalous diffusion processes can be represented as time-changed Brownian motion. Thus, by employing the result of I. Monroe [32], we find that an anomalous diffusion process is represented as time-changed Brownian motion if and only if it is a semimartingale [33]. Randomizing the time of the Brownian motion *B*(*t*) by using the independent random process *U*(*t*), we obtain a new process *X*(*t*) = *B*(*U*(*t*)). Such an operation is called subordination; it was first introduced by S. Bochner [34] (see also [35]). The process *B*, called the parent process, is directed by the new operational time clock *U*, called the subordinator. The first reasonable usage of subordination in physics dates back to [36]; see also [37,38]. Later, physicists developed a considerable intuition on subordination [39,40]. For very recent results, see [41].
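The operation *X*(*t*) = *B*(*U*(*t*)) can be illustrated numerically. The sketch below uses a gamma process as the random clock purely because it is easy to sample with the standard library; this is an illustrative choice and not one of the subordinators studied in this article, and the parameter `nu` is a hypothetical name for the scale of the clock increments.

```python
import math
import random


def subordinated_path(n_steps, dt=0.01, nu=0.1, seed=0):
    """Sample X(t) = B(U(t)) on a regular grid of physical time t.

    U has independent gamma increments with mean dt (shape dt / nu,
    scale nu), so it is non-decreasing. Given an increment du of the
    operational time, standard Brownian motion advances by N(0, du).
    """
    rng = random.Random(seed)
    clock, path = [0.0], [0.0]
    for _ in range(n_steps):
        du = rng.gammavariate(dt / nu, nu)
        clock.append(clock[-1] + du)
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(du)))
    return clock, path


clock, path = subordinated_path(500)
```

Long stretches in which the clock barely advances appear in `path` as near-immobile episodes, which gives some intuition for how a random operational time reshapes the statistics of the parent process.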

In the method of single-particle tracking (SPT), a major result is that motion in cell membranes is not limited to pure diffusion. Several modes of motion have been detected, including immobile, confined, tethered, directed, normal diffusion, and anomalous diffusion [42]. After an ensemble average, the time dependence of the MSD for each pure mode of motion is well distinguished from the others [43,44]. One of the important phenomena related to the classification of modes of motion is that practically all experimental results show apparent transitions among modes of motion [45]. If a transition is real, it causes this nonclassical behavior to be different. Studies of such transitions are of great interest [46–48]. In cell membranes, anomalous diffusion is most likely the result of both obstacles to diffusion and traps with a distribution of binding energies or escape times [49]. Confined motion may result from corrals formed by cytoskeletal proteins near the membrane, from tethering to immobile species, or from restrictions to motion imposed by lipid domains [50]. The confined diffusion of plasma membrane proteins or lipids can be regarded as a special case of subdiffusion [51]. Analytical treatments have been provided for certain shapes of the confinement zones and the characteristic mobilities [52]. The motion of single biomolecules inside a living cell often exhibits subdiffusion in the confined and crowded environment [53]. For the interpretation of experimental results and quantitative predictions of the diffusion behavior, theoretical models can be extremely helpful. One of them is presented in this paper. It describes the transition from subdiffusion to a confined state. It is interesting that the model has a one-to-one connection with the well-known tempered subdiffusion, which demonstrates the transition from subdiffusion to normal diffusion in condensed matter physics [54] and geophysics [55].
Such characteristic crossover from subdiffusion to normal diffusion has been observed also in lipid bilayer systems [56–58]. The tempered stable process in different guises has been intensively researched recently [59–67]. The aim of this work is to get the stochastic representation of anomalous diffusion tending to the confinement. We compare its properties with similar ones for the tempered subdiffusion. Finally, our results are applied for relaxation processes with non-exponential decay as well as for the analysis of the experimental data with confined random trajectories of G proteins and receptors in living cells.

#### **2. Conjugate Laplace Exponents and Stochastic Representation of Anomalous Diffusion**

The subdiffusive dynamics is fruitfully modeled as a diffusive motion *X*(*τ*) subordinated by a wide class of random processes subject to infinitely divisible distributions. If the stochastic process *X*(*τ*) has the probability density function (PDF) *h*(*x*, *τ*), it is a solution of the ordinary Fokker–Planck (FP) equation

$$
\partial h(x, \tau)/\partial \tau = \hat{L}(x) \, h(x, \tau) \,, \tag{1}
$$

where *L̂*(*x*) is the time-independent FP operator (for example, −(*∂*/*∂x*) *F*(*x*) + *D* *∂*<sup>2</sup>/*∂x*<sup>2</sup> with a force *F*). Generally, the operator *L̂* can be both multidimensional and fractional in space, but with no loss of generality we will consider the one-dimensional case. Infinitely divisible distributions, following the Lévy–Khintchine formula [68], are characterized by the exponentially weighted function

$$
\langle e^{-u\,T\_{\Psi}(\tau)}\rangle = e^{-\tau\Psi(u)} = \int\_0^\infty e^{-ut} \, g\_{\Psi}(t,\tau) \, dt \,, \tag{2}
$$

where Ψ(*u*) is called the Laplace exponent, and *g*<sub>Ψ</sub>(*t*, *τ*) is the PDF of this process. Note that only Bernstein functions can serve as Laplace exponents. This is a very extensive class of functions [31]. In the theory of Bernstein functions a special role is played by so-called conjugate pairs [69]. If one member of the pair is Ψ(*s*), then the other takes the form Φ(*s*) = *s*/Ψ(*s*). The parent process *X*(*τ*) may be subordinated by *S*<sub>Ψ</sub>(*t*) = inf{*τ* > 0 : *T*<sub>Ψ</sub>(*τ*) > *t*} as well as by *S*<sub>Φ</sub>(*t*) = inf{*τ* > 0 : *T*<sub>Φ</sub>(*τ*) > *t*}, where *T*<sub>Φ</sub> is the conjugate subordinator. Both cases lead to the FP equation in a general form

$$p(x,t) = q(x) + \int_0^t d\tau \, M(t-\tau) \, \hat{L}(x) \, p(x,\tau) \, , \tag{3}$$

where *M*(*t*) is the memory function [54,70]. The kernel *M*(*t*) has a simple expression after the Laplace transform. Denote the inverse Laplace transform by *L*<sub>*t*</sub><sup>−1</sup>. This gives

$$M_{\Psi}(t) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} e^{st} \frac{ds}{\Psi(s)} = L_t^{-1} \frac{1}{\Psi(s)} \, , \tag{4}$$

$$M_{\Phi}(t) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} e^{st} \, \Psi(s) \frac{ds}{s} = L_t^{-1} \frac{\Psi(s)}{s} \, , \tag{5}$$

where *c* is large enough that 1/Ψ(*s*) (for the first case) and Ψ(*s*)/*s* (for the second) are defined for *s* ≥ *c*, and *i*<sup>2</sup> = −1. The PDF of the operational time *S*<sub>Φ</sub>(*t*) is simply written as a Laplace image

$$\tilde{f}(\tau,s) = \frac{1}{\Psi(s)} e^{-\tau s/\Psi(s)}.\tag{6}$$

The solution (propagator) of Equation (3) takes the form of a subordination integral

$$p(x,t) = \int_0^\infty h(x,\tau) \, f(\tau,t) \, d\tau \,. \tag{7}$$

Using the Brownian motion as a parent process

$$h_B(x, \tau) = \frac{1}{\sqrt{2\pi D \tau}} \exp\left(-\frac{x^2}{2D\tau}\right),\tag{8}$$

the Laplace image *p̃*(*x*, *s*) is written as a tabulated integral [71], expressed in terms of the modified Bessel function of the third kind. As its index is equal to 1/2, we get the following propagator

$$\tilde{p}(x,s) = \frac{1}{\sqrt{2D}} \frac{1}{\sqrt{s\Psi(s)}} \exp\left(-2\frac{|x|\sqrt{s}}{\sqrt{2D\Psi(s)}}\right) \,. \tag{9}$$

Similar calculations can be carried out for *S*<sub>Ψ</sub>(*t*); we do not present them here. If the moments of a parent process *X*(*τ*) are known exactly, as in the case of Brownian motion, the moments of the process *X*[*S*<sub>Φ</sub>(*t*)] can be found analytically. Using the MSD of Brownian motion in the form ⟨*B*<sup>2</sup>(*τ*)⟩ = *Dτ*, where *D* is a diffusion constant, the MSDs of *Y*(*t*) = *B*[*S*<sub>Ψ</sub>(*t*)] and *Y*(*t*) = *B*[*S*<sub>Φ</sub>(*t*)] read

$$\left< B^2[S\_\Psi(t)] \right> \, \, = \, \, D \, L\_t^{-1} \frac{1}{s \Psi(s)} = D \int\_0^t M\_\Psi(y) \, dy \, \,\,\tag{10}$$

$$\left\langle B^2[S\_{\Phi}(t)] \right\rangle = \left. D \, L\_t^{-1} \frac{\Psi(s)}{s^2} = D \int\_0^t M\_{\Phi}(y) \, dy \right. \tag{11}$$

It is clear that the MSD depends on the function Ψ(*s*), but this dependence manifests itself differently for the two members of a conjugate pair. A similar analysis of two different forms of the Fokker–Planck equation in which the memory kernels form conjugate pairs has been carried out in [72,73]. When the memory kernel is an exponentially truncated power law, the MSD can approach saturation. In the next section, we will look at specific examples.
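As a simple check of Equations (10) and (11), consider the untempered case Ψ(*s*) = *s*<sup>*α*</sup> with 0 < *α* < 1, whose conjugate partner is Φ(*s*) = *s*<sup>1−*α*</sup>. This particular choice is ours and is made only for illustration. Using the standard inversion of *s*<sup>−*ν*</sup> to *t*<sup>*ν*−1</sup>/Γ(*ν*), one finds

$$\left< B^2[S_\Psi(t)] \right> = D \, L_t^{-1} \frac{1}{s^{1+\alpha}} = \frac{D \, t^{\alpha}}{\Gamma(1+\alpha)} \,, \qquad \left< B^2[S_\Phi(t)] \right> = D \, L_t^{-1} \frac{1}{s^{2-\alpha}} = \frac{D \, t^{1-\alpha}}{\Gamma(2-\alpha)} \,.$$

Both members of the pair are subdiffusive, with exponents *α* and 1 − *α*, and the two expressions coincide only for *α* = 1/2.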

#### **3. Tempered** *α***-Stable Process and Its Conjugate Partner**

An important example of infinitely divisible subordinators is the family of tempered *α*-stable processes, which have all moments of the operational time [74]. In this case the diffusive motion demonstrates an intermediate behavior between subdiffusion and normal diffusion [54,55]. The Laplace exponent is then Ψ<sub>temp</sub>(*s*) = (*s* + *δ*)<sup>*α*</sup> − *δ*<sup>*α*</sup>, where *δ* is a positive constant and 0 < *α* < 1. If *δ* equals zero, the tempered *α*-stable process becomes ordinary *α*-stable. Let Brownian motion be the parent process, directed by the inverse tempered *α*-stable process. The MSD of the subordinated diffusion is

$$\left< x^2(t) \right> = D \int_0^t e^{-\delta y} y^{\alpha-1} E_{\alpha,\alpha}(\delta^\alpha y^\alpha) \, dy \,, \tag{12}$$

where *E*<sub>*α*,*β*</sub>(*x*) = ∑<sub>*k*=0</sub><sup>∞</sup> *x*<sup>*k*</sup>/Γ(*αk* + *β*) is the two-parameter Mittag–Leffler function [75]. If *t* ≪ 1 (or *δ* → 0), this value tends to *Dt*<sup>*α*</sup>/Γ(*α* + 1), whereas for *t* ≫ 1 (or *α* → 1) it is linear in time, *Dδ*<sup>1−*α*</sup>*t*/*α*, as expected for normal diffusion (Figure 1a). From the asymptotic values of ⟨*x*<sup>2</sup>(*t*)⟩ it is easy to obtain the crossover time *t*<sub>x</sub> = [*αδ*<sup>*α*−1</sup>/Γ(*α* + 1)]<sup>1/(1−*α*)</sup> between the two diffusive modes, also shown in Figure 1a. This diffusion is anomalous at short times and almost normal at long times.
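The crossover time follows from equating the short-time and long-time asymptotes of the MSD. A small Python sketch of this arithmetic is given below; the function names are ours, and the parameter values are those quoted for Figure 1.

```python
from math import gamma


def msd_short(t, D, alpha):
    """Short-time asymptote D t^alpha / Gamma(alpha + 1)."""
    return D * t**alpha / gamma(alpha + 1.0)


def msd_long(t, D, alpha, delta):
    """Long-time asymptote D delta^(1 - alpha) t / alpha."""
    return D * delta ** (1.0 - alpha) * t / alpha


def crossover_time(alpha, delta):
    """Time at which the two asymptotes of the tempered MSD intersect."""
    base = alpha * delta ** (alpha - 1.0) / gamma(alpha + 1.0)
    return base ** (1.0 / (1.0 - alpha))


alpha, delta, D = 0.6, 1.0, 1.0
t_x = crossover_time(alpha, delta)
print(f"t_x = {t_x:.4f}")
print(f"asymptotes at t_x: {msd_short(t_x, D, alpha):.4f}, {msd_long(t_x, D, alpha, delta):.4f}")
```

The diffusion constant *D* cancels in the balance, so the crossover time depends only on *α* and *δ*.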

**Figure 1.** (Color online) Mean squared displacement of tempered subdiffusion (**a**) and its conjugate partner (**b**) with *α* = 0.6 and *δ* = 1 (for *D* = 1). The dashed red and dash-dot green lines show the asymptotic behavior. Panel (**a**) indicates a transition of the subdiffusion into normal diffusion at long times, whereas panel (**b**) shows the emergence of diffusion-limited aggregation.

Next, we study the diffusion motion with the conjugate Laplace exponent *s*/Ψtemp(*s*). Its MSD is not difficult to find. It is expressed in terms of the three-parameter Mittag–Leffler function [75], having the following Taylor series

$$E_{\alpha,\beta}^{\rho}(x) = \sum_{k=0}^{\infty} \frac{(\rho,k) \, x^k}{\Gamma(\alpha k+\beta) \, k!}, \quad \alpha,\beta > 0,\tag{13}$$

where (*ρ*, *k*) = *ρ*(*ρ* + 1)(*ρ* + 2)...(*ρ* + *k* − 1) is Appell's symbol with (*ρ*, 0) = 1, *ρ* ≠ 0. The MSD also has an analytical form

$$\begin{aligned} \left< x^2(t) \right> &= D \int_0^t e^{-\delta y} y^{-\alpha} E_{1, 1-\alpha}(\delta y) \, dy - D \delta^\alpha t \\ &= D e^{-\delta t} t^{1-\alpha} E_{1, 2-\alpha}^2(\delta t) - D \delta^\alpha t \,, \end{aligned} \tag{14}$$

that gives the short- and long-time behavior

$$\left< x^2(t) \right> = \begin{cases} Dt^{1-\alpha}/\Gamma(2-\alpha) & \text{if } t \to 0 \\ D\alpha \delta^{\alpha-1} & \text{if } t \to \infty \end{cases} \tag{15}$$

The relation between the two members of the conjugate pair is quite non-trivial. If for the Laplace exponent Ψ<sub>temp</sub>(*s*) the pure subdiffusion evolves to normal diffusion in time, then for the conjugate case *s*/Ψ<sub>temp</sub>(*s*) the subdiffusion transforms into diffusion-limited aggregation. Figure 1b presents this evolution. It has a simple explanation. As normal diffusion is characterized by the Laplace exponent Ψ(*s*) = *s*, its conjugate partner has Φ(*s*) = *s*/Ψ(*s*) = 1. This clearly implies confinement. Using the asymptotic behavior of the MSD, we determine the crossover time *t*<sub>x</sub> = [Γ(2 − *α*) *αδ*<sup>*α*−1</sup>]<sup>1/(1−*α*)</sup> between the diffusive regimes. Consequently, the duality relation between infinitely divisible subordinators allows one to generate a new scenario of trap impact, in which diffusion is less anomalous at short times and extremely anomalous at long times.
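The two MSD curves can also be obtained directly from their Laplace images in Equations (10) and (11), without evaluating Mittag–Leffler functions. The sketch below uses the Gaver–Stehfest inversion algorithm, which is our choice for illustration and not necessarily the numerical method used by the authors; all function names are hypothetical. It compares the numerical MSD of the conjugate partner with the two limits of Equation (15).

```python
from math import factorial, gamma, log


def stehfest_coefficients(n=12):
    """Gaver-Stehfest weights V_1..V_n (n must be even)."""
    half = n // 2
    coeffs = []
    for k in range(1, n + 1):
        total = 0.0
        for j in range((k + 1) // 2, min(k, half) + 1):
            total += (j**half * factorial(2 * j)) / (
                factorial(half - j) * factorial(j) * factorial(j - 1)
                * factorial(k - j) * factorial(2 * j - k)
            )
        coeffs.append((-1) ** (k + half) * total)
    return coeffs


def invert_laplace(F, t, coeffs):
    """Approximate f(t) from its Laplace image F(s), sampled on the real axis."""
    ln2_t = log(2.0) / t
    return ln2_t * sum(v * F(k * ln2_t) for k, v in enumerate(coeffs, start=1))


def psi_temp(s, alpha, delta):
    """Laplace exponent of the tempered alpha-stable subordinator."""
    return (s + delta) ** alpha - delta**alpha


def msd_tempered(t, D, alpha, delta, coeffs):
    """Equation (10): MSD = D * L^{-1}[1 / (s * Psi(s))]."""
    return D * invert_laplace(lambda s: 1.0 / (s * psi_temp(s, alpha, delta)), t, coeffs)


def msd_conjugate(t, D, alpha, delta, coeffs):
    """Equation (11): MSD = D * L^{-1}[Psi(s) / s^2]."""
    return D * invert_laplace(lambda s: psi_temp(s, alpha, delta) / s**2, t, coeffs)


alpha, delta, D = 0.6, 1.0, 1.0
coeffs = stehfest_coefficients(12)
for t in (1e-3, 1e-1, 1.0, 20.0):
    print(t, msd_tempered(t, D, alpha, delta, coeffs), msd_conjugate(t, D, alpha, delta, coeffs))
print("saturation level from Equation (15):", D * alpha * delta ** (alpha - 1.0))
```

Stehfest inversion works in ordinary double precision only for moderate `n` (about 10 to 14), which is sufficient here because both MSD curves are smooth and non-oscillatory.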

The propagator under the conjugate Laplace exponent *s*/Ψ<sub>temp</sub>(*s*), computed numerically, is shown in Figure 2. The propagator has a cusp, which is preserved for *t* → ∞. Recall that the tempered subdiffusion loses this feature at long times. If the *y* axis is logarithmic, as in Figure 2, the propagator of tempered subdiffusion tends to a parabola (panel a), whereas in the confined case it takes a triangular shape (panel b). This is not surprising because for *t* → ∞ the propagator of diffusive motion with the conjugate Laplace exponent *s*/Ψ<sub>temp</sub>(*s*) can be found analytically, and its form corresponds to the well-known Laplace (or double exponential) distribution [76,77], namely

$$\lim_{t \to \infty} p(x, t) = \frac{1}{\sqrt{2D\alpha\delta^{\alpha - 1}}} \exp\left(-\frac{2|x|}{\sqrt{2D\alpha\delta^{\alpha - 1}}}\right) \tag{16}$$

with a location parameter *μ* = 0 (in general, it may be nonzero) and a scale parameter *θ* = √(*αδ*<sup>*α*−1</sup>*D*/2) > 0. Although the PDF of the Laplace distribution is reminiscent of the normal distribution, they are different: the normal distribution is expressed in terms of the squared difference from the mean, whereas the Laplace distribution is expressed in terms of the absolute difference from the mean. Therefore, the Laplace distribution has fatter (more precisely, moderate) tails than the normal distribution (which always has thin tails) [68]. To obtain the Laplace distribution as an average of elementary Gaussians, the necessary ("superstatistical") distribution of the diffusivities is exponential [78–80]. It should be noted that Brownian yet non-Gaussian diffusion is not the same as the process considered in this paper. The stationary Laplace distribution of particles' motion also arises in compartmentalized media [81]. The inverse cumulative distribution function of the Laplace distribution is *x*<sub>c</sub> = −*θ* ln(2 − 2*q*) for 1/2 ≤ *q* < 1. The value *x*<sub>c</sub> is such that any observation from this distribution with the scale parameter *θ* does not exceed *x*<sub>c</sub> with probability *q* [77]. This allows one to estimate the borders of the confinement region from the values of *D*, *α* and *δ*. The Laplace distribution as a confined case is characteristic of Brownian motion as a parent process. If the parent process is a more general infinitely divisible motion, the confined distribution is different; it is presented in the next section.
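The estimate of the confinement border can be made concrete with a few lines of Python. The sketch below is ours and uses hypothetical function names: it computes the scale parameter *θ* from *D*, *α* and *δ*, evaluates *x*<sub>c</sub> = −*θ* ln(2 − 2*q*) as the *q*-quantile (valid for 1/2 ≤ *q* < 1), and uses the symmetry of the distribution to obtain the probability mass inside [−*x*<sub>c</sub>, *x*<sub>c</sub>], which is 2*q* − 1.

```python
from math import exp, log, sqrt


def laplace_scale(D, alpha, delta):
    """Scale parameter theta = sqrt(alpha * delta^(alpha - 1) * D / 2)."""
    return sqrt(alpha * delta ** (alpha - 1.0) * D / 2.0)


def laplace_cdf(x, theta):
    """CDF of the zero-centred Laplace distribution with scale theta."""
    if x < 0.0:
        return 0.5 * exp(x / theta)
    return 1.0 - 0.5 * exp(-x / theta)


def laplace_quantile(q, theta):
    """Upper-half quantile x_c = -theta * ln(2 - 2q), valid for 0.5 <= q < 1."""
    if not 0.5 <= q < 1.0:
        raise ValueError("this branch of the quantile function needs 0.5 <= q < 1")
    return -theta * log(2.0 - 2.0 * q)


D, alpha, delta = 1.0, 0.6, 1.0
theta = laplace_scale(D, alpha, delta)
q = 0.975
x_c = laplace_quantile(q, theta)
inside = laplace_cdf(x_c, theta) - laplace_cdf(-x_c, theta)
print(f"theta = {theta:.4f}, x_c = {x_c:.4f}, P(|X| <= x_c) = {inside:.4f}")
```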

**Figure 2.** (Color online) Propagator *p*(*x*, *t*) for the tempered subdiffusion (**a**) and its conjugate partner (**b**), tending to confinement, with a constant potential, *α* = 0.5 and *δ* = 1, drawn for consecutive dimensionless instants of time. Starting from the Dirac delta function and passing through the subdiffusive PDF, for *t* → ∞ the value *p*(*x*, *t*) becomes the normal distribution, shown by the black dotted line in panel (**a**), and the Laplace distribution (black dotted line) in panel (**b**).

Note that for *α* = 1/2 the MSDs of the tempered subdiffusion and its conjugate partner coincide at short times. The point is that the Laplace exponent *s*<sup>1/2</sup> is the only one that is mapped into itself by the duality relation between conjugate pairs of Laplace exponents [82]. If *α* > 1/2, then for the same value of *α* the MSD of tempered subdiffusion is less anomalous than the MSD of its conjugate partner at short times. For *α* < 1/2 the opposite happens. Usually the duality relation accelerates the more anomalous subdiffusion (in the sense of *α* < 1/2) and slows down the subdiffusion that is too fast (with *α* > 1/2). This is especially evident for multi-scale anomalous diffusion [82].

#### **4. Confined Distributions for Infinitely Divisible Motion**

Now we apply our approach to infinitely divisible motion as a parent process, whereas the subordinator has the Laplace exponent *s*/[(*s* + *δ*)<sup>*α*</sup> − *δ*<sup>*α*</sup>], leading to a confined distribution for *t* → ∞. Without loss of generality the one-dimensional case will be presented. Consider any infinitely divisible motion by using the characteristic function in the form

$$\hat{h}(k,t) = \int_{-\infty}^{\infty} e^{ikx} h(x, t) \, dx = e^{-D^* t \, \Xi(|k|) / 2} \,, \tag{17}$$

where *D*<sup>∗</sup> is a generalized diffusion constant. In the case of *β*-stable Lévy motion the characteristic exponent Ξ(|*k*|) is equal to |*k*|<sup>*β*</sup> with *β* ∈ (0, 2). There are also other well-known examples of the characteristic exponent: (i) (|*k*|<sup>2</sup> + *m*<sup>*β*/2</sup>)<sup>2/*β*</sup> − *m*, *β* ∈ (0, 2); (ii) log(1 + |*k*|<sup>*β*</sup>), *β* ∈ (0, 2]; (iii) *b*|*k*|<sup>2</sup> + |*k*|<sup>*β*</sup>; (iv) log((1 + |*k*|<sup>2</sup>) + √((1 + |*k*|<sup>2</sup>)<sup>2</sup> − 1)) and so on [69].

The next step is to consider the subordination of such parent processes. For this purpose we use the same subordinator that led to the Laplace distribution above for Brownian motion. Based on the simple forms of *ĥ*(*k*, *τ*) and *f̃*(*τ*, *s*), the solution (propagator) of a subordinated infinitely divisible motion is conveniently written as the Laplace–Fourier transform, taking the form

$$\tilde{\hat{p}}(k,s) = \frac{1}{s} \frac{s/\left( (s+\delta)^\alpha - \delta^\alpha \right)}{D^*\Xi(|k|)/2 + s/\left( (s+\delta)^\alpha - \delta^\alpha \right)} \,. \tag{18}$$

As lim<sub>*s*→0</sub> *s*/((*s* + *δ*)<sup>*α*</sup> − *δ*<sup>*α*</sup>) = *δ*<sup>1−*α*</sup>/*α*, using the final value theorem (lim<sub>*t*→∞</sub> *p̂*(*k*, *t*) = lim<sub>*s*→0</sub> *s* *p̂̃*(*k*, *s*)), the confined characteristic function is written as

$$\hat{p}(k,\infty) = \frac{1}{1 + D^* \alpha \delta^{\alpha-1} \Xi(|k|)/2}. \tag{19}$$

For ordinary Brownian motion it is not difficult to check this result explicitly by the inverse Fourier transform, as this is a tabulated integral [71]. Other forms of the characteristic exponent Ξ(|*k*|) do not lead to such simple analytical expressions, but they can be evaluated numerically. Note that 1/[1 + *A*Ξ(|*k*|)] with *A* = *D*<sup>∗</sup>*αδ*<sup>*α*−1</sup>/2 > 0 is an even function, and thus its Fourier transform is equivalent to the cosine transform.
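The Brownian check mentioned above can be written out in one line. Assuming Ξ(|*k*|) = *k*<sup>2</sup> and *D*<sup>∗</sup> = *D*, so that *A* = *Dαδ*<sup>*α*−1</sup>/2, the cosine transform of Equation (19) is the standard integral

$$p(x,\infty) = \frac{1}{\pi} \int_0^\infty \frac{\cos(k|x|)}{1 + A k^2} \, dk = \frac{1}{2\sqrt{A}} \exp\left(-\frac{|x|}{\sqrt{A}}\right) .$$

Since 2√*A* = √(2*Dαδ*<sup>*α*−1</sup>), this reproduces the Laplace distribution of Equation (16) with the scale parameter *θ* = √*A*.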

Taking the *β*-stable Lévy motion with the characteristic exponent Ξ(|*k*|) = |*k*|<sup>*β*</sup> for *β* ∈ (0, 2), the confined characteristic function *p̂*<sub>*β*</sub>(*k*, ∞) is the characteristic function of the symmetric Linnik distribution [83] (or the *β*-Laplace distribution, following Pillai [84]), whose PDF is

$$\begin{aligned} p_{\beta}(x, \infty) &= \frac{1}{\pi} \int_{0}^{\infty} \frac{\cos(k|x|)}{1 + Ak^{\beta}} \, dk \\ &= \frac{A}{\pi |x|} \int_{0}^{\infty} \frac{\sin(z^{1/\beta}|x|)}{(1 + Az)^{2}} \, dz \,. \end{aligned} \tag{20}$$

The last expression was obtained by integration by parts (with the substitution *z* = *k*<sup>*β*</sup>) and converges better in numerical integration. Examples are shown in Figure 3. The symmetric Linnik distribution has attracted considerable attention from researchers [85–88]. Generally, the PDF is unimodal [89], geometrically stable [90] and can be expressed in terms of Meijer's G-function [91]. Moreover, the peak of the density is finite for 1 < *β* ≤ 2 (see Figure 3a), whereas it becomes infinite for 0 < *β* ≤ 1 (shown in Figure 3b) [92]. Based on the tabulated integral of [93], its value in the finite case is

$$p_{\beta}(0,\infty) = \frac{1}{\pi} \int_{0}^{\infty} \frac{dk}{1 + Ak^{\beta}} = \frac{1}{\beta A^{1/\beta} \sin(\pi/\beta)}\,. \tag{21}$$

A series expansion for small *x* is written as

$$p_{\beta}(x,\infty) = \frac{1}{\beta} \sum_{n=0}^{\infty} \frac{(-1)^{n}}{(2n)!} \frac{x^{2n}}{A^{(1+2n)/\beta}} \frac{1}{\sin[\pi(1+2n)/\beta]}.\tag{22}$$

If *x* = 0, only the *n* = 0 term survives, and one obtains the previous expression. According to [94], the asymptotic expansion for large *x* reads

$$p_{\beta}(x,\infty) = \frac{1}{\pi} \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\Gamma(1+n\beta) \, A^n}{|x|^{1+\beta n}} \sin(\pi\beta n/2) \,. \tag{23}$$

Consequently, the leading term of this expansion becomes

$$p_{\beta}(x, \infty) \sim \frac{\Gamma(1+\beta)\sin(\pi\beta/2)}{\pi} \frac{A}{|x|^{1+\beta}}.\tag{24}$$

There are some specific examples representable by tabular integrals [93] that will be considered elsewhere.

Since Ξ(|*k*|) is a Bernstein function (in other words, a function having a completely monotone derivative), the characteristic function 1/[1 + *A*Ξ(|*k*|)] is typical for a geometrically infinitely divisible PDF [95]. In any case the PDF is symmetric and unimodal. Depending on Ξ(|*k*|), it has a finite or infinite maximum. This is because the integral ∫<sub>0</sub><sup>∞</sup> *dk*/[1 + *A*Ξ(*k*)] has a single improper point, namely *k* → ∞, where the integral is either convergent or divergent.

**Figure 3.** (Color online) Propagators *p*(*x*, *t*) for parent processes with the *β*-stable Lévy distribution: (**a**) 1 < *β* < 2; (**b**) 0 < *β* ≤ 1; under the subordinator conjugate to a tempered random process in the sense of Bernstein functions, for *t* → ∞. The value *A* = *D*<sup>∗</sup>*αδ*<sup>*α*−1</sup>/2 is taken equal to 1.

We can formulate the following **Confinement Principle:** Any subordinated infinitely divisible motion, in which the subordinator is characterized by the Laplace exponent conjugate to a tempered *α*-stable process, has a confined probability distribution. By the infinitely divisible motion we mean a wide class of infinitely divisible processes, including Brownian motion (as a marginal case), Lévy stable motion (Lévy flight) and many other processes with jumps. It is important that each case of characteristic exponents in such an infinitely divisible motion determines its confined probability distribution. For the pure Brownian motion this is the Laplace distribution whereas for the Lévy flights its generalization is the Linnik distribution. This procedure covers a class of geometrically infinitely divisible distributions as a confined case of the infinitely divisible motion subordinated by a special subordinator responsible for the confinement.

#### **5. Conditionally Non-Exponential Decay of Relaxation**

Our comparative analysis of tempered and confined diffusion can be readily extended to relaxation processes with non-exponential decay. As is well known [96,97], the manifestations of many-body effects in anomalous dynamics of relaxing systems, independent of the physical and chemical structures of their interacting entities, are successfully described by stochastic tools. Then the relaxation function of non-exponential relaxation is written as

$$\phi\_{\Psi}(t) = \int\_{0}^{\infty} e^{-b\tau} f\_{\Psi}(\tau, t) \, d\tau,\tag{25}$$

where *b* is a constant, and *f*<sub>Ψ</sub>(*τ*, *t*) is the PDF of the inverse subordinator *S*<sub>Ψ</sub>(*t*) = inf{*τ* > 0 : *T*<sub>Ψ</sub>(*τ*) > *t*}. The Laplace image of *f*<sub>Ψ</sub>(*τ*, *t*) takes the simple form

$$
\tilde{f}_{\Psi}(\tau, s) = \frac{\Psi(s)}{s} e^{-\tau\Psi(s)}.\tag{26}
$$

Then the Laplace transform of *φ*Ψ(*t*) in time yields

$$
\tilde{\phi}\_{\Psi}(s) = \frac{1}{s} \frac{\Psi(s)}{\Psi(s) + b}.\tag{27}
$$

Similar conversions that we omit can be done for *s*/Ψ(*s*).
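The step from Equations (25) and (26) to Equation (27) is a single exponential integral over *τ*. Written out, together with the analogous result for the conjugate exponent Φ(*s*) = *s*/Ψ(*s*) (our own rearrangement, given here only as a sketch), it reads

$$\tilde{\phi}_{\Psi}(s) = \frac{\Psi(s)}{s} \int_0^\infty e^{-(b + \Psi(s))\tau} \, d\tau = \frac{1}{s} \frac{\Psi(s)}{\Psi(s) + b} \,, \qquad \tilde{\phi}_{\Phi}(s) = \frac{1}{s} \frac{s/\Psi(s)}{s/\Psi(s) + b} = \frac{1}{s + b \, \Psi(s)} \,.$$

For the tempered exponent one has (*s* + *δ*)<sup>*α*</sup> − *δ*<sup>*α*</sup> ≈ *αδ*<sup>*α*−1</sup>*s* as *s* → 0, so the final value theorem applied to the second expression gives the nonzero long-time limit 1/(1 + *αδ*<sup>*α*−1</sup>*b*).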

Based on the Laplace exponent of tempered diffusion Ψ<sub>temp</sub>(*s*) = (*s* + *δ*)<sup>*α*</sup> − *δ*<sup>*α*</sup> and its conjugate partner *s*/Ψ<sub>temp</sub>(*s*), it is not difficult to obtain their relaxation functions numerically. They are presented in Figure 4.

**Figure 4.** (Color online) Relaxation functions caused by the inverse tempered subordinator (**a**) and its conjugate partner (**b**), respectively, with *α* = 0.6, *δ* = 1 and *b* = 1. The first represents the tempered relaxation, and the second is confined. The dashed red line shows a conditionally non-exponential decay due to the confinement effect (lim<sub>*t*→∞</sub> *φ*<sub>conf</sub>(*t*) = const ≠ 0).

Using the above relationship, we have found the asymptotic behavior of the functions. They read

$$\begin{aligned} 1 - \phi_{\text{temp}}(t) &\simeq b t^{\alpha}/\Gamma(\alpha + 1) \,, \\ 1 - \phi_{\text{conf}}(t) &\simeq b t^{1 - \alpha}/\Gamma(2 - \alpha) \end{aligned} \tag{28}$$

at short times and accordingly

$$\begin{aligned} \lim_{t \to \infty} \phi_{\text{temp}}(t) &= \lim_{t \to \infty} e^{-bt\delta^{1-\alpha}/\alpha} = 0, \\ \lim_{t \to \infty} \phi_{\text{conf}}(t) &= \frac{1}{1 + \alpha\delta^{\alpha-1}b} = \text{const} \end{aligned} \tag{29}$$

at long times. Note that both types of relaxation start from 1 and initially decay as a power function of time. However, the tempered relaxation tends to zero exponentially, whereas the confined relaxation does not reach zero at all. From the physical point of view, the latter model can be interpreted in the following way: dipoles ordered by an external field do not fall into disorder with probability 1 after the field is removed, even as *t* tends to infinity. Therefore, we believe that this model demonstrates a conditionally non-exponential decay. In this context it should be mentioned that the conditionally exponential decay model is a key to the concept of clusters and their dynamics toward an imperfectly ordered state, used for the explanation of relaxation in dielectric materials [98–100]. During the relaxation process the strongly coupled local (intracluster) motions are expected to be generated first, followed by the weakly coupled (intercluster) motions, which produce the partial long-range structure. Each of these motions, those leading to the local structure order and those leading to the cluster ordering in general, has its own perceptible contribution to the observed features, such as the relaxation function in the time domain and the susceptibility in the frequency domain.

#### **6. G-Proteins vs. a2AR Receptors from the Analysis of the SPT Data**

As an example of detecting the Laplace confinement in experimental data, we present our analysis of random trajectories obtained from a recent SPT study on G protein-coupled receptors, namely the motion and interaction of individual receptors and G proteins on the surface of living cells [44]. Two types of particles were studied, and only the basal-case data (without drug stimulation): G protein-coupled receptors (below we simply call them receptors) and the G proteins with which the receptors interact. The sample data consisted of 20,000 trajectories in 30 sets for G proteins and more than 35,000 trajectories in 30 sets for receptors, which walk randomly along the *x* and *y* coordinates.

The first aim of the data study is to classify the dynamics of both types of particles. The classification is based on the standardized maximal distance *T*<sub>*n*</sub> of a random process from its starting point [101]. This approach is well justified. Indeed, if the motion is driven by fractional Brownian motion, then the best among the available methods is the one based on the *p-VAR* test, especially for longer trajectories. However, if the particle dynamics can be described by the Ornstein–Uhlenbeck process or diffusive Brownian motion, then the method yielding the smallest errors is based on the *MAX* test [102–105]. Following the procedure, we use the statistical test:


Then *T*<sub>*n*</sub> is compared with the quantiles of order *α*/2 and 1 − *α*/2 (for example, *α* = 5%) for different trajectory lengths *n*. The decision rule is as follows: *T*<sub>*n*</sub> < *q*<sub>*n*</sub>(*α*/2) means a confined motion, whereas *T*<sub>*n*</sub> > *q*<sub>*n*</sub>(1 − *α*/2) means superdiffusion (or directed motion). If *q*<sub>*n*</sub>(*α*/2) < *T*<sub>*n*</sub> < *q*<sub>*n*</sub>(1 − *α*/2), then *X* = {*X*<sub>1</sub>, *X*<sub>2</sub>, ..., *X*<sub>*n*</sub>} is Brownian motion. This permits us to classify the trajectories available for processing. The results are shown in Figure 5. As seen from this figure, most of the trajectories are Brownian motion: 69% for G proteins and 78% for receptors. The contribution of superdiffusion is the smallest, 2%. The rest corresponds to confined motion, which is of particular interest to us. Next, we estimate the statistics of such trajectories. It is assumed that confined random walks can occur in two cases. The first is classical, the Ornstein–Uhlenbeck model, which gives the normal distribution for *t* → ∞. The second leads to the Laplace confinement for *t* → ∞ considered above. Possible transitions of the particles' diffusion type within single trajectories have also been noted and investigated. For example, a statistical procedure for detecting transitions of the MSD exponent value within a single trajectory has been proposed in [104].
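The decision rule can be sketched in Python as follows. Because the defining formula of the statistic is not reproduced here, the sketch assumes one common form of the standardized maximal distance: the largest distance from the starting point divided by the square root of the trajectory duration times an increment-based variance estimate. This form, the Monte Carlo estimation of the quantiles under a Brownian null hypothesis, and all function names are our assumptions rather than the authors' implementation. The significance level is called `level` to avoid confusion with the stability index *α*.

```python
import random
from math import hypot, sqrt


def max_statistic(traj, dt=1.0):
    """Standardized maximal distance from the starting point (assumed form)."""
    n = len(traj) - 1
    x0, y0 = traj[0]
    max_dist = max(hypot(x - x0, y - y0) for x, y in traj[1:])
    # Per-coordinate variance rate estimated from the squared increments.
    squared_steps = sum(
        (traj[i + 1][0] - traj[i][0]) ** 2 + (traj[i + 1][1] - traj[i][1]) ** 2
        for i in range(n)
    )
    sigma2 = squared_steps / (2.0 * n * dt)
    return max_dist / sqrt(n * dt * sigma2)


def brownian_trajectory(n, rng):
    """2D random walk with independent unit-variance Gaussian steps."""
    points = [(0.0, 0.0)]
    for _ in range(n):
        x, y = points[-1]
        points.append((x + rng.gauss(0.0, 1.0), y + rng.gauss(0.0, 1.0)))
    return points


def null_quantiles(n, level=0.05, n_sim=2000, seed=0):
    """Monte Carlo estimates of q_n(level/2) and q_n(1 - level/2) for Brownian motion."""
    rng = random.Random(seed)
    stats = sorted(max_statistic(brownian_trajectory(n, rng)) for _ in range(n_sim))
    lower = stats[int(n_sim * level / 2)]
    upper = stats[int(n_sim * (1 - level / 2)) - 1]
    return lower, upper


def classify(traj, q_low, q_high):
    """Apply the decision rule: below q_low confined, above q_high superdiffusion."""
    t_n = max_statistic(traj)
    if t_n < q_low:
        return "confined"
    if t_n > q_high:
        return "superdiffusion"
    return "Brownian"


n = 100
q_low, q_high = null_quantiles(n)
rng = random.Random(1)
directed = [(0.5 * i, 0.0) for i in range(n + 1)]                             # ballistic toy path
trapped = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n + 1)]  # jitter around a fixed site
print(classify(directed, q_low, q_high), classify(trapped, q_low, q_high))
```

The two toy trajectories are extreme cases chosen so that the outcome does not depend on the Monte Carlo noise in the estimated quantiles.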

To discriminate between the normal and Laplace distribution functions for the statistics of G-protein and receptor confined trajectories, we apply the logarithm of the ratio of their maximized likelihoods [106]. The approach requires the calculation of means, medians, sample variances and averages of the absolute difference between data values and the median. This statistical test gives the ratio *Q* > 0 for the normal distribution; otherwise the Laplace distribution is preferred. The results of this second test, together with those of the first test, are also presented in Figure 5. They show that for G proteins the confined trajectories obey the normal and Laplace distributions equally often, whereas for receptors the normal distribution is approximately twice as common as the Laplace distribution. It should also be pointed out that the share of confined trajectories with normal statistics is the same for G proteins and receptors. Judging by the contribution of Brownian motion in all the sets of trajectories, the difference between G proteins and receptors in the percentage ratio of confined trajectories with normal and Laplace distributions can indicate greater mobility of receptors compared with G proteins. It should be mentioned that Laplace distributions were detected in the complex diffusive behavior of RNA-protein particles [107].
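The logarithm of the ratio of maximized likelihoods can be computed from exactly the sample quantities listed above. The Python sketch below uses the standard maximum-likelihood estimates (sample mean and variance for the normal law, median and mean absolute deviation from the median for the Laplace law). It is an illustration on synthetic data with hypothetical function names, not the authors' code, and it does not reproduce the handling of trajectories in [106].

```python
import random
from math import log, pi
from statistics import median


def log_likelihood_ratio(data):
    """Q = ln(L_normal / L_Laplace), both likelihoods maximized over their parameters."""
    n = len(data)
    mean = sum(data) / n
    sigma2 = sum((x - mean) ** 2 for x in data) / n    # ML variance of the normal law
    med = median(data)
    theta = sum(abs(x - med) for x in data) / n        # ML scale of the Laplace law
    loglik_normal = -0.5 * n * (log(2.0 * pi * sigma2) + 1.0)
    loglik_laplace = -n * (log(2.0 * theta) + 1.0)
    return loglik_normal - loglik_laplace


rng = random.Random(2024)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# The difference of two independent unit exponentials is Laplace distributed.
laplace_sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(2000)]

q_normal = log_likelihood_ratio(normal_sample)
q_laplace = log_likelihood_ratio(laplace_sample)
print(f"Q (normal data) = {q_normal:.1f}, Q (Laplace data) = {q_laplace:.1f}")
```

A positive *Q* favors the normal distribution and a negative *Q* the Laplace distribution, in line with the rule stated above.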

The occurrence of the Laplace distribution for confined trajectories in the experimental data used here seems natural. First, most of the trajectories are Brownian motion. What could be a parent process for subordination in this environment? Brownian motion is preferred. Why? Because we observe the following **Competition Principle** *between parent processes: Brownian motion, Lévy motion or another infinitely divisible process compete even for a fixed subordinator, conjugate to the tempered α-stable one, that is responsible for confinement*. If Brownian motion is the parent process, the confined distribution in our subordination approach can have only the Laplace form. In the above data sets no feature typical, for example, of Lévy motion is detected. If such features were present, generalized Laplace distributions could come into play as confined distributions. Another case is the Ornstein–Uhlenbeck process, which leads to normal statistics in confined trajectories; it has the same (Brownian) roots too. Therefore, the joint presence of normal and Laplace distributions in confined trajectories is quite logical and physically justified.

**Figure 5.** (Color online) Analysis of the experimental data for G-protein and receptor random-walk trajectories along the coordinates *x* and *y*, restricted to trajectories of length greater than or equal to 50.

#### **7. Discussion**

We have revealed that the conjugate property of Bernstein functions connects the tempered stable subdiffusion with the diffusion-limited aggregation by a one-to-one mapping (in fact, a bijection). If pure subdiffusion is characterized by multiple trapping events with infinite mean sojourn time, and the power-law exponent of the MSD is constant in time, then a truncated power-law distribution of trapping times leads to tempered subdiffusion, in which diffusion is anomalous at short times and normal (the contribution of traps seems to disappear) at long times [45]. The interpretation of anomalous diffusion tending to confinement is that the trap impact has the opposite tendency: long waiting times in traps dominate more and more, so that it becomes impossible to leave such traps. This model, just like the tempered one, is applicable to the analysis of SPT data. Its effects are present in confined random motions of G proteins and receptors in living cells. We have established that the form of the confined distribution depends on the PDF of the parent process under subordination. If the parent process is Brownian motion, the confined distribution has only the Laplace form. If the parent process is Lévy motion, the confined distribution takes the Linnik form. If the support of the parent process is changed from (−∞, ∞) to (0, ∞), the Mittag–Leffler distribution arises as a confined limit. All this shows that the presented method offers ample opportunities for the study of confined random walks in complex systems. Concerning relaxation phenomena, complete disorder (e.g., in the form of charge neutralization in dielectrics) does not occur in a relaxing system with confinement features. This concept can be used for developing new cluster models of non-exponential relaxation. It will be considered in more detail elsewhere. Our new methodology is generally valid in a wide class of problems of transport in random media that include live cells, relaxation in heterogeneous substances, and jump-diffusion.

**Author Contributions:** A.S. analyzed tempered subdiffusion in a conjugate map based on Brownian motion and performed the analysis of corresponding relaxation decay and experimental data. A.W. invented the confined distributions for infinitely divisible motion. A.S. and A.W. wrote the text. A.S. prepared Figures 1, 2 and 4, and Figures 3 and 5 were prepared by the authors together. Both authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** A.S. is grateful to the Hugo Steinhaus Center for its hospitality during his visit to Wrocław University of Science and Technology and acknowledges the support of NAWA PPN/ULM/2019/1/00087/DEC/1. A.W. acknowledges the support of Beethoven Grant No. DFG-NCN 2016/23/G/ST1/04083.

**Acknowledgments:** The authors would like to thank T. Sungkaworn and D. Calebiro for providing the experimental data analyzed in Section 6.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Detection of Anomalous Diffusion with Deep Residual Networks**

**Miłosz Gajowczyk † and Janusz Szwabiński \*,†**

Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, 50-370 Wrocław, Poland; gajowczyk.milosz@gmail.com

**\*** Correspondence: janusz.szwabinski@pwr.edu.pl

† These authors contributed equally to this work.

**Abstract:** Identification of the diffusion type of molecules in living cells is crucial to deduce their driving forces and hence to get insight into the characteristics of the cells. In this paper, deep residual networks have been used to classify the trajectories of molecules. We started from the well-known ResNet architecture, developed for image classification, and carried out a series of numerical experiments to adapt it to the detection of diffusion modes. We managed to find a model that has a better accuracy than the initial network, but contains only a small fraction of its parameters. The reduced size significantly shortened the training time of the model. Moreover, the resulting network is less prone to overfitting and generalizes better to unseen data.

**Keywords:** SPT; anomalous diffusion; machine learning classification; deep learning; residual neural networks

#### **1. Introduction**

Recent advances in single particle tracking (SPT) [1–4] have made it possible to observe single molecules in living cells with remarkable spatio-temporal resolution. Monitoring the details of molecules' diffusion has become the key method for investigating their complex environments.

The data collected in SPT experiments often reveal deviations from Brownian motion [5], i.e., the normal diffusion governed by Fick's laws [6] and characterized by a linear time-dependence of the mean square displacement (MSD) of the molecules. Those deviations are referred to as anomalous diffusion, a field intensively studied in the physics community [7–10]. Since Richardson found a cubic scaling of MSD for particles in turbulent flows [11], anomalous diffusion has been observed in many processes including tracer particles in living cells [12–14], transport on fractal geometries [15], charge carrier transport in amorphous semiconductors [16], quantum optics [17], bacterial motion [18], foraging of animals [19], human travel patterns [20] and trends in financial markets [21]. Depending on the type of nonlinearity, anomalous diffusion is further divided into sub- and superdiffusion, two categories corresponding to sub- and superlinear MSD, respectively.

Several analytical approaches have already been attempted to analyze mobility patterns of molecules. The most popular one is based on the mean square displacement [7,22–25]. The appeal of this method lies in its relative simplicity. However, it is known to have several limitations due to the finite precision of SPT setups [7,22,26,27] and the lack of significant statistics (short trajectories and/or very few ones). To overcome these problems, several other analytic methods have been proposed [27–38]. Most of them simply replace MSD by other features calculated from trajectories (e.g., radius of gyration [28] or velocity autocorrelation function [39]).

In the last few years, classification of diffusion modes utilizing machine learning (ML) algorithms has been gaining popularity. Bayesian approaches [40–42], random forests [43–47], gradient boosting [44–47], neural networks [48], and deep neural networks [44,49–51] have already been used in an attempt to either just classify the trajectories or to extract quantitative information about them (e.g., the anomalous exponent [45,49,51]). The ML approach seems to be more powerful than the analytical one. However, the latter usually offers a deeper insight into the underlying processes governing the dynamics of molecules.

**Citation:** Gajowczyk, M.; Szwabiński, J. Detection of Anomalous Diffusion with Deep Residual Networks. *Entropy* **2021**, *23*, 649. https://doi.org/10.3390/e23060649

Academic Editor: Alberto Guillén

Received: 6 April 2021 Accepted: 19 May 2021 Published: 22 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Despite the enormous progress in both the analytical and ML methods, the analysis of SPT data remains challenging. The classification results produced by different methods often do not agree with each other [27,38,46,47]. The reasons are similar to the ones limiting the applicability of MSD: localization errors, short trajectories, or irregular sampling. Thus, there is still a need for new robust methods for the analysis of anomalous diffusion. To catalog the already existing approaches, to assess their usability and to trigger the search for new ones, a challenge (called the AnDi challenge) was launched last year by an international team of scientists [52].

In this paper, we are going to present a novel approach to anomalous diffusion based on deep residual networks (ResNets) [53]. In general, deep learning is quite interesting from the perspective of an end user, since it is able to extract features from raw data automatically, without any intervention by a human expert [54]. We already tested the applicability of convolutional neural networks (CNN) to SPT data [44]. They turned out to be very accurate. However, their architecture was quite complicated and the training times (including an automatic search for an optimal model) were of the order of days. Moreover, the resulting network had problems with the generalization to data coming from sources different from the ones used to generate the training set. Residual networks are a class of CNNs able to cure most of the problems the original CNN architecture is facing (i.e., vanishing and/or exploding gradients, saturation of accuracy with increasing depth). They excel in image classification: a ResNet network won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015.

We will start from the smallest of the residual architectures, i.e., ResNet18, and then perform a series of numerical experiments in order to adapt it to the characterization of anomalous diffusion. Our strategy for model tuning will be quite simple and focused mainly on the reduction of the parameters of the network. However, it should be noted here that sophisticated methods for designing small models with good performance already exist [55–58]. The resulting network will then be applied to the G protein-coupled receptors and G proteins data set, already analyzed in Refs. [38,46,47]. Although our method is not a direct response to the AnDi Challenge [52] (e.g., we use different diffusion models for training), it is consistent with its goal to search for new robust algorithms for classification.

The paper is structured as follows. In Section 2, we briefly introduce the basics of MSD-based methods, the diffusion models we are interested in as well as the residual networks, which will be used for classification. In Section 3, data sets are briefly discussed. The search for the optimal architecture and the performance of the resulting model are presented in Section 4. The results are concluded in the last section.

#### **2. Models and Methods**

#### *2.1. Traditional Analysis*

A typical SPT experiment yields a series of coordinates (2D or 3D) over time for every observed particle. Those series have to be analyzed in order to find a relationship between the individual trajectories and the characteristics of the system at hand [59]. Typically, the first step of the analysis is the detection of the type of diffusion encoded in the trajectories.

The most common approach to classification of diffusion is based on the mean-square displacement (MSD) of particles [7,22–25]. The recorded time series is evaluated in terms of the time averaged MSD (TAMSD),

$$\overline{\delta\_t^2(\Delta)} = \frac{1}{t - \Delta} \int\_0^{t - \Delta} \left[ \mathbf{x}(t' + \Delta) - \mathbf{x}(t') \right]^2 \mathrm{d}t', \tag{1}$$

where *x*(*t*) is the position of the particle at time *t* and Δ is the time lag separating the consecutive positions of the particle. Typically, $\overline{\delta\_t^2(\Delta)}$ is calculated in the limit Δ ≪ *t* to obtain good statistics, since the number of positions contributing to the average decreases with increasing Δ.

The idea behind the MSD-based method is simply to evaluate the experimental MSD curves, i.e., $\overline{\delta\_t^2(\Delta)}$ as a function of the varying time lag Δ, and then to fit them with a theoretical model of the form

$$
\overline{\delta\_t^2(\Delta)} \simeq K\_\alpha \Delta^\alpha, \tag{2}
$$

where *K*<sub>*α*</sub> is the generalized diffusion coefficient and *α* is the so-called anomalous exponent. The value of the latter is used to discriminate between different diffusion types. The case *α* = 1 corresponds to normal diffusion (ND), also known as Brownian motion [5]. In this physical scenario, a particle moves freely in its environment. In other words, it does not meet any obstacles in its path, and it also does not interact with other distant molecules. Any non-Brownian (*α* ≠ 1) form of particle transport is referred to as anomalous diffusion. A sublinear MSD (*α* < 1) stands for subdiffusion, which is appropriate to represent particles slowed down due to viscoelastic properties of their surroundings [60], particles colliding with obstacles [61,62] or trapped particles [63,64]. A superlinear case (*α* > 1) indicates superdiffusion, which relates to a fast and usually directed motion of particles driven by molecular motors [65].
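The MSD-based procedure can be illustrated with a short sketch. It assumes a trajectory sampled at regular intervals, replaces the integral in Eq. (1) by a discrete average, and estimates *α* by an ordinary least-squares line through the log-log MSD curve. The function names and the choice of a plain log-log fit are illustrative assumptions, not the fitting routine used in the cited works.

```python
import math

def tamsd(xs, ys, lag):
    """Discrete time-averaged MSD at an integer lag (in frames)."""
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(trajectory)")
    total = sum((xs[i + lag] - xs[i]) ** 2 + (ys[i + lag] - ys[i]) ** 2
                for i in range(n - lag))
    return total / (n - lag)

def fit_anomalous_exponent(xs, ys, dt, max_lag):
    """Least-squares line through (log lag time, log TAMSD).

    Returns (alpha, K_alpha); requires max_lag >= 2 and a non-zero MSD.
    """
    log_t = [math.log(k * dt) for k in range(1, max_lag + 1)]
    log_m = [math.log(tamsd(xs, ys, k)) for k in range(1, max_lag + 1)]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    slope = (sum((a - mean_t) * (b - mean_m) for a, b in zip(log_t, log_m))
             / sum((a - mean_t) ** 2 for a in log_t))
    return slope, math.exp(mean_m - slope * mean_t)

# Ballistic motion along a straight line has MSD = (v * lag time)^2,
# so the fitted exponent should be 2 (superdiffusion).
xs = [0.5 * i for i in range(200)]
ys = [0.0] * 200
alpha, k_alpha = fit_anomalous_exponent(xs, ys, dt=1 / 30, max_lag=10)
```

Restricting `max_lag` to a small fraction of the trajectory length mirrors the remark that the TAMSD is only statistically reliable for short time lags.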

#### *2.2. Choice of Diffusion Models*

Many different theoretical models of diffusion may be used for analysis of experimental data (see Ref. [9] for a detailed overview). However, following Refs. [43,44], we decided to consider four models: normal diffusion [5], directed motion (DM) [22,66,67], fractional Brownian motion (FBM) in subdiffusive mode [68], and confined diffusion (CD) [40]. According to Saxton [7], for those basic models of diffusion in 2D, we have:

$$\begin{aligned} \overline{\delta\_{ND}^2(\Delta)} &= 4D\Delta, \\ \overline{\delta\_{FBM}^2(\Delta)} &= 4D\Delta^{\alpha}, \\ \overline{\delta\_{DM}^2(\Delta)} &= 4D\Delta + \left(v\Delta\right)^2, \\ \overline{\delta\_{CD}^2(\Delta)} &\simeq r\_c^2 \left[1 - A\_1 \exp\left(\frac{-4A\_2D\Delta}{r\_c^2}\right)\right]. \end{aligned} \tag{3}$$

Here, *v* is the drift velocity in the directed motion, the constants *A*<sub>1</sub> and *A*<sub>2</sub> characterize the shape of the confinement, and *r*<sub>*c*</sub> is the confinement radius.

#### *2.3. Deep Learning Classification Methods*

The MSD-based method has become very popular in the SPT community due to its simplicity. It should work flawlessly for pure long trajectories with no localization errors. However, real trajectories usually contain a lot of noise, which makes the fitting of mathematical models to MSD curves challenging, even in the case of normal diffusion [22]. Moreover, many experimental trajectories are short, limiting the evaluation of the MSD curves to just a few time lags. As a consequence, there is a need for methods going beyond MSD to provide reliable information concerning the trajectories.

In a recent paper [44], we proposed two machine learning methods that outperform the MSD analysis in case of noisy data. The first one is perceived as traditional machine learning and utilizes a set of human-engineered features that should be extracted from trajectories to feed the classifiers (see also Refs. [46,47] for a more extensive analysis). The second one is based on deep neural networks, which constitute the state-of-the-art of the modern machine learning classification. We showed that both methods perform similarly on the synthetic test data. However, the deep learning approach may seem appealing to practitioners from the SPT community because it usually operates on raw trajectories as input data and does not require human intervention to create features for each trajectory. A cascade of multiple layers of nonlinear processing units is used in this case for automatic feature identification, extraction, and transformation [69].

#### 2.3.1. Convolutional Neural Networks

Convolutional neural networks (CNN) were used in Ref. [44] for classification purposes. This choice was triggered by the fact that those networks have already been successful in many tasks including time series analysis [70]. A CNN usually has two components. The first one, consisting of hidden layers, extracts features from raw data. The second one, the fully connected part of the network, is responsible for classification (see Figure 1 for a schematic representation of a CNN). In order to detect features in the input data, the hidden layers perform a series of convolutions and pooling operations. Each convolution provides its own map of features (a 3D array) by utilizing a filter that slides over the input data. The size of the maps is reduced in the pooling elements.

**Figure 1.** A schematic representation of a CNN network (source: Ref. [44]).

Choosing the right depth of the network is a challenging task. In Ref. [44], we assumed the architecture of the form (see also Ref. [71] for implementation details)

$$\text{Batch} - \left[ \text{Conv} - \text{Batch} - \text{ReLU} \right] \ast N - \text{Dense} - \text{ReLU} - \text{Dense} - \text{SoftMax}, \tag{4}$$

and then performed a random search in the architecture and hyperparameter space in order to find the optimal model as well as other parameters required to initialize it. Here, *Batch* is the batch normalization layer, i.e., a layer performing normalization of the data (not explicitly shown in Figure 1). *Conv* and *Dense* stand for convolution and dense layers, respectively. *ReLU* is the abbreviation of the rectified linear unit, which is an activation function filtering out negative values from the output of the preceding layer. Finally, *SoftMax* is the activation function determining the final output of the classifier. We did not use pooling layers in this model because reducing the spatial size of the 2D trajectories is usually not necessary. The procedure resulted in a network consisting of six convolutional layers and two dense ones.

#### 2.3.2. ResNet Architecture

Although the model resulting from the above procedure performed well on our synthetic data (accuracy at the level of 97%), its architecture was quite complicated and the network itself was relatively deep, resulting in processing times of the order of days on a cluster of 24 CPUs with 50 GB total memory. However, long training times were not the only issue. It is known that with the increasing depth the problem of vanishing/exploding gradients may appear in the training phase of neural networks. Moreover, the training error may increase with the number of layers, resulting in a saturation of accuracy [53].

This is the reason why in this paper we decided to use the residual network (ResNet) [53]. It is a class of CNNs, which utilizes shortcuts (skip connections) to jump over several layers of the networks. Those shortcuts allow the network to make progress even if several layers have stopped learning because there is one blocking the backpropagation (Figure 2).

**Figure 2.** A regular CNN (left) versus a ResNet. Thanks to the skip connections in ResNet, the signal can easily pass a blocking layer in the backpropagation phase.

The residual network may be understood as a stack of residual units, where each unit is a small neural network with a skip connection. The outline of the unit is shown in Figure 3. For a given input *x*, the desired mapping we want to obtain by learning is *H*(*x*). Since the shortcut connection carries the input of the unit directly to the addition operator shown in the figure, the rest of the unit needs only to learn the residual mapping *F*(*x*) = *H*(*x*) − *x*. When a regular CNN is initialized, its weights are close to zero, so the network just outputs values close to zero. After adding the shortcuts, the network initially models the identity function. Therefore, if the target function is close to the identity (which is often the case), the training phase will be significantly shorter than in the case of a regular CNN.
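The effect of the shortcut at initialization can be checked with a toy example. The sketch below is a deliberately simplified stand-in for a residual unit: dense matrices replace the convolutions, and batch normalization and the activation after the addition are omitted. All names are hypothetical and chosen only for this illustration.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def linear(weights, values):
    """Matrix-vector product; stands in for a convolution layer."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def plain_unit(w1, w2, x):
    """Two stacked layers without a shortcut: the output is F(x)."""
    return linear(w2, relu(linear(w1, x)))

def residual_unit(w1, w2, x):
    """The same layers with a skip connection: the output is F(x) + x."""
    return [f + v for f, v in zip(plain_unit(w1, w2, x), x)]

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all weights at zero, the plain unit outputs zeros,
# while the residual unit reproduces its input (identity mapping).
plain_out = plain_unit(zeros, zeros, x)
residual_out = residual_unit(zeros, zeros, x)
```

With weights that are small but non-zero, `residual_out` stays close to `x`, which is the sense in which the unit starts out near the identity function.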

In Figure 4, the actual ResNet architecture is shown. We see that the core of the network is divided into four stages. Each of them contains, in addition to the residual units, a downsampling block. Its role is to reduce the information making its way across the network.

#### 2.3.3. XResNet

In 2018, three modifications to the original ResNet architecture have been proposed under the common name XResNet [72]. Going into their details is beyond the scope of this paper. However, since they are known to have a non-negligible effect on the accuracy of the resulting model in some scenarios, we decided to include them in our search for the optimal architecture.

**Figure 3.** Residual unit in ResNet.

**Figure 4.** The architecture of ResNet. The downsampling block at the beginning of each stage helps to reduce the amount of information in the case of deeper networks (path B is used in this case).

#### **3. Synthetic and Experimental Data**

#### *3.1. Synthetic Training Data*

The main factor limiting the deployment of machine learning to trajectory analysis is the availability of high-quality training data. It should contain a reasonable (i.e., large) amount of input data (trajectories) and the corresponding desired output (their diffusion types). Since the true diffusion types of real experimental trajectories are not known (otherwise we would not need any new classification method), synthetic sets generated with computer simulations of different diffusion models are used for training. An ML algorithm uses the input–output pairs to learn the rules for data processing. Once trained, it is able to use those rules to classify new unseen trajectories.

As already mentioned in Section 2.2, we decided to follow Refs. [43,44] and use four basic models of diffusion to generate the training set of trajectories. The simulation methods will be briefly described in the remaining part of this section.

#### 3.1.1. Normal Diffusion

Although several equivalent methods for simulation of Brownian motion exist, we will follow the approach presented by Michalet [22]. In case of normal diffusion, the probability distribution of the displacement's norm of a particle is given by the Rayleigh distribution

$$P(u) = \frac{2u}{4D\Delta t} \exp\left(\frac{-u^2}{4D\Delta t}\right), \quad u \ge 0, \tag{5}$$

where *u* is the absolute distance traveled by the particle in time Δ*t*. Thus, to simulate a trajectory, we have to randomly choose a start position of a particle and a random direction of the displacement *ϕ* and then pick a random step length *u* from the distribution (5). The new position of the particle is calculated,

$$\begin{array}{rcl} x\_{new} &=& x\_{old} + u \cos \phi, \\ y\_{new} &=& y\_{old} + u \sin \phi, \end{array} \tag{6}$$

and taken as the starting point for the next move. The whole procedure is repeated till a trajectory of a desired length is generated.
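A minimal sketch of this procedure is given below. It draws the step length from the Rayleigh distribution of Eq. (5) by inverting its cumulative distribution function and draws the direction uniformly from [0, 2π). The function names, the use of Python's `random` module, and the default start position are assumptions made for illustration only.

```python
import math
import random

def rayleigh_step(d, dt, rng):
    """Inverse-CDF sample of the step length u from Eq. (5)."""
    return math.sqrt(-4.0 * d * dt * math.log(1.0 - rng.random()))

def simulate_normal_diffusion(n_steps, d, dt, rng, start=(0.0, 0.0)):
    """Return a list of n_steps + 1 positions (x, y) of a 2D Brownian particle."""
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        u = rayleigh_step(d, dt, rng)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        x += u * math.cos(phi)
        y += u * math.sin(phi)
        path.append((x, y))
    return path

rng = random.Random(0)
path = simulate_normal_diffusion(n_steps=5000, d=1.0, dt=1 / 30, rng=rng)

# Sanity check: the mean squared step length should be close to 4 * D * dt.
msd_1 = sum((x1 - x0) ** 2 + (y1 - y0) ** 2
            for (x0, y0), (x1, y1) in zip(path, path[1:])) / (len(path) - 1)
```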

#### 3.1.2. Directed Motion

The simulation algorithm for the Brownian motion may be easily extended to generate a trajectory for diffusion with drift. All we have to do is simply to add a correction to the particle's position due to its active motion:

$$dx\_i = v \Delta t \cos \beta\_i, \tag{7}$$

$$dy\_i = v \Delta t \sin \beta\_i, \tag{8}$$

where *v* is the norm of the drift velocity and *β* its direction. Once we have the corrections, we add them to the new coordinates:

$$\begin{array}{rcl} x\_{new} &=& x\_{old} + u \cos \phi + dx\_i, \\ y\_{new} &=& y\_{old} + u \sin \phi + dy\_i. \end{array} \tag{9}$$

The drift velocity is one of the parameters of the simulation. However, instead of setting its value directly, we will rather use an active-motion-to-diffusion ratio [43]:

$$R = \frac{v^2 T}{4D},\tag{10}$$

where *T* is the time duration (i.e., the length of the trajectory). In our simulations, we will draw a random value of *R* from a given range and then calculate *v* for given *D* and *T*. In this way, it will be easier to generate similar trajectories with different values of *v* and *D*.
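The conversion from *R* to *v* and the drift correction can be sketched as follows. The example assumes, for simplicity, a drift direction *β* that stays constant along the whole trajectory; the text leaves open whether the direction may vary from step to step. Function names are hypothetical.

```python
import math
import random

def drift_speed(r, d, total_time):
    """Invert Eq. (10): v = sqrt(4 * D * R / T)."""
    return math.sqrt(4.0 * d * r / total_time)

def simulate_directed_motion(n_steps, d, dt, r, beta, rng):
    """Brownian steps plus a drift of speed v in the fixed direction beta."""
    v = drift_speed(r, d, n_steps * dt)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        # Diffusive part: Rayleigh step length and uniform direction.
        u = math.sqrt(-4.0 * d * dt * math.log(1.0 - rng.random()))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        # Drift correction added on top of the diffusive displacement.
        x += u * math.cos(phi) + v * dt * math.cos(beta)
        y += u * math.sin(phi) + v * dt * math.sin(beta)
        path.append((x, y))
    return path

# A large R makes the drift dominate: the end point is close to (v * T, 0).
path = simulate_directed_motion(n_steps=100, d=1.0, dt=0.01, r=1e6,
                                beta=0.0, rng=random.Random(0))
```

Drawing *R* instead of *v* keeps the relative strength of drift and diffusion comparable across trajectories with different *D* and *T*.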

#### 3.1.3. Confined Diffusion

Again, a small modification of the model for normal diffusion is needed to simulate a particle confined inside a reflective circular boundary. We simply divide every step of the simulation into 100 substeps with Δ*t*′ = Δ*t*/100. Then, a normal diffusion move is carried out in every substep. The new position of the particle after all substeps will be updated only if the distance from the center of the boundary to new coordinates is smaller than the radius *r*<sub>*c*</sub> of the boundary.

Following Wagner et al. [43], we will introduce a boundedness parameter *B*, defined as the area of the smallest ellipse enclosing a normal diffusion trajectory (with no confinement) divided by the area of the confinement,

$$B = \frac{A\_{\text{ellipse}}}{\pi r\_c^2} \simeq \frac{DN\Delta t}{r\_c^2}.\tag{11}$$

It will help us to control the level of trappedness of particles in the simulations. *B* will be set randomly for each synthetic trajectory. Based on its value, the radius *r*<sub>*c*</sub> will be calculated for given *D*, *N*, and Δ*t*.
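One possible reading of the confinement rule is sketched below: the radius is obtained by inverting the approximation in Eq. (11), and a substep is rejected whenever it would take the particle outside the circle. Whether rejection is applied per substep or once per full step, and the placement of the circle's center at the starting point, are assumptions of this sketch.

```python
import math
import random

def confinement_radius(b, d, n_steps, dt):
    """Invert the approximation in Eq. (11): r_c = sqrt(D * N * dt / B)."""
    return math.sqrt(d * n_steps * dt / b)

def simulate_confined_diffusion(n_steps, d, dt, b, rng, n_sub=100):
    """Diffusion inside a circle of radius r_c centered at the origin."""
    r_c = confinement_radius(b, d, n_steps, dt)
    sub_dt = dt / n_sub
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        for _ in range(n_sub):
            u = math.sqrt(-4.0 * d * sub_dt * math.log(1.0 - rng.random()))
            phi = rng.uniform(0.0, 2.0 * math.pi)
            new_x = x + u * math.cos(phi)
            new_y = y + u * math.sin(phi)
            # Accept the substep only if it stays inside the boundary.
            if math.hypot(new_x, new_y) < r_c:
                x, y = new_x, new_y
        path.append((x, y))
    return path, r_c

path, r_c = simulate_confined_diffusion(n_steps=20, d=1.0, dt=1 / 30, b=3.0,
                                        rng=random.Random(1))
```

Larger values of `b` give a smaller radius and therefore a more strongly trapped particle.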

#### 3.1.4. Fractional Brownian Motion

In addition to the confined diffusion, we will also use fractional Brownian motion to simulate the subdiffusive motion. FBM is the solution of the stochastic differential equation

$$dX\_t^i = \sigma dB\_t^{H,i}, i = 1,2,\tag{12}$$

where *σ* = √(2*D*) is the scale parameter related to the diffusion coefficient *D*, *H* ∈ (0, 1) is the Hurst parameter and *B*<sub>*t*</sub><sup>*H*</sup> is a continuous-time, zero-mean Gaussian process starting at zero, with the following covariance function

$$\mathbb{E}\left(B\_t^H B\_s^H\right) = \frac{1}{2} \left( |t|^{2H} + |s|^{2H} - |t-s|^{2H} \right). \tag{13}$$

The Hurst parameter *H* is connected with the anomalous exponent *α* via the relation

$$H = \frac{\alpha}{2}.\tag{14}$$

Since we want to use FBM for subdiffusion (i.e., *α* < 1) only, the values of *H* will be restricted to the interval (0, 1/2) in the simulations.
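The text does not state which FBM generator was used. As an illustration, the sketch below applies the Cholesky method: the covariance matrix of Eq. (13) is factorized and multiplied by a vector of independent standard normal variables. This is exact but scales cubically with the trajectory length, so it is only a reasonable choice for short trajectories. All function names are hypothetical.

```python
import math
import random

def fbm_covariance(n, hurst, dt):
    """Covariance matrix of B^H at times dt, 2*dt, ..., n*dt, from Eq. (13)."""
    h2 = 2.0 * hurst
    times = [(i + 1) * dt for i in range(n)]
    return [[0.5 * (t ** h2 + s ** h2 - abs(t - s) ** h2) for s in times]
            for t in times]

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def simulate_fbm_2d(n, alpha, d, dt, rng):
    """2D FBM with independent coordinates, H = alpha / 2, sigma = sqrt(2 * D)."""
    low = cholesky(fbm_covariance(n, alpha / 2.0, dt))
    sigma = math.sqrt(2.0 * d)
    coords = []
    for _ in range(2):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the Gaussian vector and prepend the starting point 0.
        coords.append([0.0] + [sigma * sum(low[i][k] * z[k] for k in range(i + 1))
                               for i in range(n)])
    return list(zip(coords[0], coords[1]))

path = simulate_fbm_2d(n=50, alpha=0.5, d=1.0, dt=1 / 30, rng=random.Random(0))
```

For *H* = 1/2 the covariance reduces to min(*t*, *s*), i.e., ordinary Brownian motion, which gives a quick consistency check of `fbm_covariance`.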

#### 3.1.5. Creating Noisy Data

Real measurements of particles' positions are usually altered by noise from different sources including localization errors, vibrations of the sample, electronic noise or errors in the postprocessing phase [73]. Different methods of adding noise to synthetic trajectories are possible. One can, for instance, vary the diffusion coefficient of particles or simply add some disturbance to every point of a trajectory. We will go for the latter method and add Gaussian noise with zero mean and standard deviation *σ* to each simulated position.

To easily generate trajectories characterized by different levels of noise, we will proceed in the following way. We first introduce the signal-to-noise ratio:

$$Q = \begin{cases} \dfrac{\sqrt{D\Delta t}}{\sigma} & \text{for ND, CD, and FBM,} \\ \dfrac{\sqrt{D\Delta t + (v \Delta t)^2}}{\sigma} & \text{for DM.} \end{cases} \tag{15}$$

Then, we will randomly set *Q* and use the above formula to determine the standard deviation *σ* appropriate for given *D*, Δ*t*, and *v*.
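In code, this amounts to inverting Eq. (15) for *σ* and perturbing every point independently. The sketch below is illustrative; the function names are hypothetical, and setting `v = 0` covers the models without drift.

```python
import math
import random

def noise_sigma(q, d, dt, v=0.0):
    """Invert Eq. (15): v = 0 covers ND, CD, and FBM; v > 0 covers DM."""
    return math.sqrt(d * dt + (v * dt) ** 2) / q

def add_localization_noise(path, sigma, rng):
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in path]

rng = random.Random(0)
clean = [(0.1 * i, 0.0) for i in range(10)]
sigma = noise_sigma(q=2.0, d=1.0, dt=1 / 30)
noisy = add_localization_noise(clean, sigma, rng)
```

A small *Q* corresponds to a large *σ*, i.e., to trajectories in which the noise is comparable to the typical displacement per time step.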

#### 3.1.6. Simulation Details

For the sake of comparison, our synthetic data set should resemble all characteristics of the one used in Ref. [44]. To recap, we generated 20,000 trajectories, 5000 for each diffusion type. The time lag between consecutive points within a trajectory was set to Δ*t* = 1/30 s, which is a typical value in experimental setups. All other parameters of the diffusion models were chosen randomly from the predefined ranges. Details can be found in Table 1.


**Table 1.** Parameters of the simulation and their values. All values except Δ*t* were randomly chosen from given ranges.

The data set was then divided into three subsets: the training set for fitting the machine learning models, the validation set used to estimate prediction errors for model selection and the test set for assessment of the final model. The stratified sampling method [74] was used for that purpose to guarantee a balanced representation of the diffusion modes in the subsets. Their sizes are presented in Table 2.

**Table 2.** Partition of the synthetic data set.
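The idea of stratified sampling can be sketched in a few lines: indices are grouped by diffusion type, shuffled within each group, and then cut into subsets with the same proportions in every group. The split fractions used below (70/15/15) are placeholders chosen for illustration and are not the subset sizes of the paper.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, fractions, rng):
    """Split sample indices into subsets, preserving class proportions.

    fractions maps a subset name to its share; the shares should sum to 1.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    names = list(fractions)
    subsets = {name: [] for name in names}
    for indices in by_class.values():
        rng.shuffle(indices)
        start = 0
        for pos, name in enumerate(names):
            if pos == len(names) - 1:
                stop = len(indices)  # last subset takes the remainder
            else:
                stop = start + round(fractions[name] * len(indices))
            subsets[name].extend(indices[start:stop])
            start = stop
    return subsets

# 20,000 labels, 5000 per diffusion type, as in the synthetic data set.
labels = ["ND", "DM", "CD", "FBM"] * 5000
parts = stratified_split(labels,
                         {"train": 0.7, "validation": 0.15, "test": 0.15},
                         random.Random(0))
train_counts = Counter(labels[i] for i in parts["train"])
```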


#### *3.2. Real Data*

We will apply our classifier to data from a single particle tracking experiment on G protein-coupled receptors and G proteins, already analyzed in Refs. [38,46,47]. The receptors mediate biological effects of many hormones and neurotransmitters and are also important as pharmacological targets [75]. Their signals are transmitted to the cell interior via interactions with G proteins. The analysis of the dynamics of these two types of molecules is extremely interesting because it may shed more light on how the receptors and G proteins meet, interact, and couple.

#### **4. Results**

The main goal of this work was to find a deep residual network with the simplest possible architecture, which is able to detect types of anomalous diffusion with satisfactory accuracy. In this section, we will first present a series of experiments that allowed us to significantly reduce the number of parameters of the original ResNet architecture. Then, we will apply the resulting model to classify both synthetic and real trajectories. All results were obtained with custom Python codes, available at https://github.com/milySW/NNResearchAPI, accessed on 20 May 2021. The PyTorch library [76] was used to build the neural networks.

#### *4.1. Finding the Optimal Network Architecture*

We performed a series of computer experiments to find a reasonable ResNet architecture. Our goal was to keep the network as small as possible to reduce both the training times and the danger of overfitting. At the same time, we targeted the classification performance on synthetic data beyond the accuracy of 90%.

Before we dive into the results of the most important experiments, we would like to provide one important note. It is usually not worth investing effort and time in more complicated networks for tiny improvements of accuracy because, due to the stochastic nature of the networks, even different instances of the same model may yield slightly different results. Having that in mind, we introduced a (rather arbitrary) threshold equal to 0.2 percentage points as an indicator of improvements worth considering. All changes in accuracy smaller than the threshold were seen as irrelevant.

#### 4.1.1. Impact of XResNet Modifications

Our first attempt was to check if the XResNet modifications [72] to the original architecture are worth considering. We took ResNet18, i.e., the smallest residual network with 18 layers, as the starting point. Results are shown in Table 3. Although the original architecture performs better on the training set, the modified one generalizes better to unseen data (i.e., has higher accuracy on the validation set). This may indicate the tendency of ResNet18 to overfit. The cost we have to pay for the improvement in validation accuracy by 0.34 percentage points is the increase in the number of parameters of the model (by 43,328) and a longer average time needed to complete one epoch (i.e., one cycle through the training data set). Despite the cost, we will keep the modifications in the model and try to reduce the number of parameters by other means.

**Table 3.** Impact of the XResNet modifications [72] on the accuracy of the model. Bold indicates the architecture we chose for further investigations.


#### 4.1.2. Depth of Neural Network

The baseline ResNet architecture consists of four stages, each of which is characterized by a different number of kernels that are convolved with the input [53]. However, ResNet was designed for classification of images, which are usually more complex than our trajectories. Thus, it will be interesting to check how a partial removal of those stages impacts the accuracy of the classifier. Results of our experiments are shown in Table 4. We see that reducing the depth of the network leads to a significant decrease in the number of the parameters in the model and improves its accuracy on the validation data.

**Table 4.** Relationship between the accuracy of the model and its depth. Depth equal to 3 was chosen for further investigations.


As expected, one does not need the full depth of the original ResNet architecture to classify the trajectories. Although the number of the parameters for two stages is very tempting, we decided to go further with depth 3 because it gives a slightly better performance.

#### 4.1.3. Dimension and Size of Convolutions

The original ResNet architecture works with 2D objects and uses convolution kernels of size 3 × 3. It will be interesting to see how the model performs with smaller kernels. Although a 2 × 2 kernel is theoretically possible, one usually tries to avoid kernels of even sizes due to the lack of a well-defined central pixel. Consequently, we will compare only 1 × 1 kernels with the baseline. As follows from Table 5, the accuracy of the model declines significantly with the introduction of the smaller kernels.


**Table 5.** Relationship between the size of the 2D convolution kernels and the performance of the model.

There is also the possibility of flattening the trajectories to 1D vectors and convolving them with 1 × *X* kernels. We have checked the model for kernels with an odd *X* ranging from 3 to 11. Results are shown in Table 6. As we can see, those changes could slightly improve the performance of the model. Moreover, the size of the model was reduced by 44%. Thus, we will keep 1 × 5 kernels and work with 1D input for further investigations.

**Table 6.** Relationship between the size of the 1D convolution kernels and the performance of the model.


#### 4.1.4. Feature Maps

The number of parameters of the model may also be reduced by limiting its "breadth", understood here as the number of feature maps (convolution kernels) at each layer. The latter for the *i*-th block is given by the formula:

$$\begin{cases} \mathbf{x}\_0 = 64, \\ \mathbf{x}\_i = \mathbf{x}\_0 \cdot 2^{i-1}, \qquad \text{for } i = 1, 2, \dots, n. \end{cases} \tag{16}$$

From Table 7, it follows that decreasing *x*<sub>0</sub> from 64 to 32 will not significantly decrease the accuracy of the model, but will reduce the number of parameters by a factor of 4. Moreover, the learning process of the network takes noticeably less time.


**Table 7.** Relationship between the number of feature maps and the accuracy of the model.
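The factor of 4 can be understood from a simple count: the number of weights in a convolution is proportional to the product of its input and output channels, so halving every width divides each layer's weight count by four. The sketch below evaluates Eq. (16) and counts the weights of a simplified chain of 1D convolutions with one layer per block. This chain is an illustrative assumption and not the actual layout of the network.

```python
def feature_maps(x0, n_blocks):
    """Eq. (16): number of feature maps in blocks 1..n."""
    return [x0 * 2 ** (i - 1) for i in range(1, n_blocks + 1)]

def conv_weight_count(widths, kernel_size, in_channels):
    """Weights of a chain of 1D convolutions, one per block, biases ignored."""
    total, prev = 0, in_channels
    for width in widths:
        total += prev * width * kernel_size
        prev = width
    return total

wide = feature_maps(64, 3)    # [64, 128, 256]
narrow = feature_maps(32, 3)  # [32, 64, 128]

# Halving all channel counts (including the input of the first block)
# reduces the number of convolution weights by a factor of 4.
ratio = conv_weight_count(wide, 5, 64) / conv_weight_count(narrow, 5, 32)
```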

#### 4.1.5. Additional Features

One of the advantages of deep networks, at least from the perspective of an end user, is the ability to work with raw experimental data. There is no need for human-engineered features as input because the network extracts its own features automatically from the data. While this is true for ResNet architecture as well, in principle, we could augment the input to the model by some additional attributes, including the ones tailor-made to the problem of diffusion.

A set of features with the potential of distinguishing different diffusion modes from each other was presented in Ref. [44]. Here, we would like to check if adding some of those attributes to the model will have a positive impact on accuracy. We decided to use asymmetry, efficiency, fractal dimension, and TAMSD at lag 20 as additional input (see Refs. [43,44] for definitions). For each trajectory, the values of the attributes were added to the network after the raw data went through all convolutional layers and was flattened.

Results of this series of experiments are shown in Table 8. Although the network was fed with additional information, its accuracy has not improved. To explain that, let us have a look at the distribution of asymmetry among trajectories in our data set. As follows from Figure 5, its values for different types of diffusion overlap to some extent. Thus, classifying them based on the information encoded in asymmetry may be challenging. The same holds for the other attributes. Therefore, we are not going to include them in our final model.


**Table 8.** Impact of additional attributes on the performance of the model.

**Figure 5.** Distribution of asymmetry among trajectories in the synthetic data for different types of diffusion.

#### 4.1.6. Impact of Autocorrelation

Following Ref. [77], we decided to check if the autocorrelation function taken as additional input improves the accuracy of the model. We combined the raw trajectories with their autocorrelations calculated at lags 8, 16, and 24 into a single tensor structure and used it as input to the model. Again, this measure did not improve the accuracy (Table 9).



#### 4.1.7. Selective Backprop

One of the interesting techniques to accelerate the training of deep neural networks is selective backprop [78]. The idea behind this procedure is to prioritize samples with high loss at each iteration. It uses the output of a sample's forward pass in the training phase to decide whether to use that sample to compute gradients and update the parameters of the model or to skip immediately to the next sample.

We carried out an experiment with two selective backprop scenarios. In the first one, a subset of training data covering 98% of the total loss was chosen for back-propagation. In this way, only 50–60% of trajectories were used in every epoch to update the network. In the second scenario, 50% of the training data were always taken, covering between 94% and 99% of the total loss in each epoch. It turned out that this method indeed shortens the training phase of the network (in particular average epoch time). However, it yields worse performance compared to the model utilizing the whole data set for back-propagation (Table 10).
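The two selection rules described above can be sketched as follows. This is a simplified, deterministic illustration based only on the description in the text; the function names are hypothetical, and published variants of selective backprop may choose samples differently (e.g., probabilistically).

```python
import numpy as np


def select_by_loss_share(losses, share=0.98):
    """Indices of the highest-loss samples that together cover `share` of the total loss."""
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(losses)[::-1]  # highest loss first
    cumulative = np.cumsum(losses[order])
    # Smallest prefix whose cumulative loss reaches the requested share.
    n_keep = int(np.searchsorted(cumulative, share * cumulative[-1])) + 1
    return order[:min(n_keep, len(losses))]


def select_top_fraction(losses, fraction=0.5):
    """Indices of the `fraction` of samples with the highest loss."""
    losses = np.asarray(losses, dtype=float)
    n_keep = max(1, int(round(fraction * len(losses))))
    return np.argsort(losses)[::-1][:n_keep]


# Made-up per-sample losses from a forward pass.
losses = [0.01, 2.0, 0.02, 1.5, 0.5, 0.03]
kept_share = select_by_loss_share(losses, share=0.98)
kept_half = select_top_fraction(losses, fraction=0.5)
print(kept_share, kept_half)
```

Only the selected samples would then be passed to the backward pass; the remaining ones are skipped in that iteration.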


**Table 10.** Different scenarios of selective backprop and their impact on the accuracy of the model.

#### 4.1.8. Choice of Hyperparameters

In the last series of experiments, we tried to find optimal values of some hyperparameters of the model. First, we looked at the cost function. Its choice allows us to control the focus in the training phase. Cross entropy, for instance, strongly penalizes misclassification, as it grows without bound when the probability assigned to the true class approaches zero [79]. Mean squared error (MSE) is usually used for regression problems. It does not punish wrong classifications enough, but rather promotes being close to a desired value. Although the cross entropy is the natural choice in classification tasks, the choice of the cost function seems to have no significant impact on the model's validation accuracy (Table 11). We kept MSE for shorter training times.
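The different behavior of the two cost functions for confident wrong predictions can be illustrated with a few numbers. The sketch below assumes a one-hot target and looks only at the output for the true class; the function names are hypothetical.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy loss for a one-hot target, given the probability of the true class."""
    return -math.log(p_true)


def squared_error(p_true):
    """Squared error of the true-class output for a one-hot target."""
    return (1.0 - p_true) ** 2


# The squared error stays below 1, while the cross entropy keeps growing
# as the probability assigned to the true class shrinks.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p={p:<5} CE={cross_entropy(p):.3f}  SE={squared_error(p):.3f}")
```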

**Table 11.** Impact of cost function on the accuracy of the model.


An activation function defines the output of a node for the given input. It usually introduces some nonlinearity to the model. We checked four different functions. Sigmoid [80] is one of the most widely used activation functions today. It nicely mimics the behavior of real neurons; however, it may suffer from vanishing/exploding gradients. ReLU [81] is computationally very cheap, but it is also known to "die" in some situations (weights may update in such a way that the neuron never activates). Leaky ReLU [82] and ELU [83] are modifications of ReLU that mitigate that problem.
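For reference, the four functions can be written down in a few lines. This is a generic NumPy sketch of the standard definitions, not the implementation used in the experiments; the default values of `slope` and `alpha` are common choices and are assumptions here.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, saturating at 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    """Zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)


def leaky_relu(x, slope=0.01):
    """Small non-zero slope for negative inputs keeps a gradient alive."""
    return np.where(x > 0, x, slope * x)


def elu(x, alpha=1.0):
    """Smooth exponential branch for negative inputs, saturating at -alpha."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


x = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, relu, leaky_relu, elu):
    print(f.__name__, f(x))
```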

According to Table 12, the ReLU activation function offers the highest accuracy on the validation set.

**Table 12.** Accuracy of the model for different choices of the activation function.


The batch size is another important hyperparameter in the model. It defines the number of samples to work through before the model's internal parameters are updated. Larger batches should allow for more efficient computation, but may not generalize well to unseen data [84]. Small batches, on the other hand, are known to sometimes have problems with arriving at local minima [79].

Results for three different batch sizes are shown in Table 13; a batch size of 512 turned out to be the best one for our model.


**Table 13.** Accuracy of the model for different batch sizes.

#### 4.1.9. Resulting Model

Based on the results of the above experiments, we were able to reduce the number of parameters in the model from 11,220,420 in ResNet18 with XResNet modifications to 399,556. At the same time, the accuracy of the model on validation data increased by 1.33 percentage points.

The architecture of the final model is summarized in Table 14. Besides the already mentioned parameters and hyperparameters, there are two others that have not been discussed yet. The activation threshold is a boolean flag telling the model whether it should automatically estimate the threshold value, above which the neurons become active. In addition, the learning rate is a tuning parameter that determines the step size at each iteration while moving toward a minimum of the loss function. To find its value, we used a finder algorithm proposed in Ref. [85] and implemented in a PyTorch Lightning module [86].

**Table 14.** Details of the optimal architecture.


#### *4.2. Performance of the Model*

A test set consisting of 3000 samples (750 for each diffusion type) was used to assess the performance of the final model (see Section 3.1.6 for details). In Figure 6, the confusion matrix of the classifier is shown. By definition, an element *C*<sub>*ij*</sub> of the matrix is equal to the number of observations known to be in class *i* (true labels) and predicted to be in class *j* [87].

**Figure 6.** Confusion matrix of the model. Rows correspond to the true labels and columns to the predicted ones.

The model achieves the best performance for subdiffusion. Only 12 out of 750 trajectories have been wrongly classified in the case of FBM and 25 out of 750 in the case of CD. The other two modes are more challenging for the classifier. As for DM, 136 trajectories are misclassified, most of them as normal diffusion. The performance for the latter is slightly better: 109 trajectories got wrong labels.

In Section 4.1.5, we tried to improve the performance of the model with some additional human-engineered features, which were motivated by the characteristics of diffusion itself. This attempt was not successful because the distributions of those features overlap with each other, particularly for DM and ND, contributing to the confusion of the classifier. We suspect that the same holds for features extracted automatically by the ResNet model: they are not specific enough to better distinguish DM from ND.

The confusion matrix may be used to calculate the basic performance metrics of the classifier. They are summarized in Table 15. Accuracy is defined as the number of correct predictions divided by the total number of predictions. Precision is the fraction of correct predictions of a class among all predictions of that class. It indicates how often a classifier is correct if it predicts a given class. Recall is the fraction of correct predictions of a given class over the total number of samples in that class. It measures how many of the samples that truly belong to a class are found by the classifier. Finally, the F1 score is the harmonic mean of precision and recall.
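The definitions above translate directly into a few array operations. In the sketch below, the diagonal of the example matrix follows the error counts quoted in the text (12, 25, 136 and 109 misclassified trajectories out of 750 per class), whereas the distribution of the errors among the off-diagonal cells is made up for illustration only and should not be read as the content of Figure 6. The function name is hypothetical.

```python
import numpy as np


def metrics_from_confusion(C):
    """Accuracy plus per-class precision, recall and F1 from a confusion matrix.

    Rows of C correspond to true labels, columns to predicted labels.
    """
    C = np.asarray(C, dtype=float)
    tp = np.diag(C)
    accuracy = tp.sum() / C.sum()
    precision = tp / C.sum(axis=0)  # correct / all predictions of the class
    recall = tp / C.sum(axis=1)     # correct / all true members of the class
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Class order: FBM, CD, DM, ND. Off-diagonal entries are illustrative only.
C = [
    [738,   6,   2,   4],
    [ 10, 725,   3,  12],
    [  4,   2, 614, 130],
    [  8,  11,  90, 641],
]
accuracy, precision, recall, f1 = metrics_from_confusion(C)
print(f"accuracy = {accuracy:.3f}")
print("recall   =", np.round(recall, 3))
```

The accuracy and the per-class recall depend only on the diagonal and the row sums, so they are not affected by the made-up off-diagonal split; precision and F1 are.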


**Table 15.** Basic performance metrics of the model on test data.

Even though the model apparently has some problems with the DM and ND classes, its overall accuracy on test data is high. It returns many more relevant results than irrelevant ones (high average precision), and it is able to yield most of the relevant results (high average recall). The F1 score simply confirms that.

It could also be interesting to check how the performance metrics of the classifier evolve with the training time (i.e., with the number of epochs). The results are presented in Figure 7. To generate the plots, we trained 50 instances of the model and then averaged the metrics. In this way, we could also estimate the 95% confidence intervals. We see that all metrics reach a satisfactory level already in the third epoch. Further training improves the performance of the model only slightly.
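One common way to obtain such averaged curves with confidence bands is sketched below. It assumes a normal approximation for the mean over independent runs, which is not necessarily the procedure used to produce Figure 7; the function name is hypothetical and the numbers are random placeholders, not results.

```python
import numpy as np


def mean_with_ci(values, z=1.96):
    """Mean over runs (axis 0) and half-width of a normal-approximation 95% interval."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    sem = values.std(axis=0, ddof=1) / np.sqrt(values.shape[0])
    return mean, z * sem


rng = np.random.default_rng(0)
# Placeholder data: 50 runs x 10 epochs of made-up accuracy values.
runs = 0.9 + 0.01 * rng.standard_normal((50, 10))
mean, half_width = mean_with_ci(runs)
print(np.round(mean, 4))
print(np.round(half_width, 4))
```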

**Figure 7.** Performance metrics (on validation data) of the model as functions of the training time.

The same results, but this time broken down into separate diffusion modes, are shown in Figure 8. The measures for DM and ND are not only smaller than the ones for subdiffusion, but they also fluctuate to a higher extent when we look at values after the early epochs. This is due to the fact that these two classes are often confused with each other.

**Figure 8.** Performance metrics (on validation data) for each diffusion mode as functions of the training time.

The metrics for individual classes in the best epoch are shown in Figure 9. Again, we see a small gap between the subdiffusive classes on one hand and the problematic ones (i.e., DM and ND) on the other. However, even in the worst case, the metrics are above 80%, indicating a good performance of the classifier.

**Figure 9.** Performance metrics in the best epoch for each diffusion mode.

#### *4.3. Classification of Real Data*

From the available data on G protein-coupled receptors and G proteins, we took into account only trajectories with at least 50 steps. In this way, the data set was reduced to 1029 G proteins and 1218 receptors. Classification results are shown in Table 16. For the sake of comparison, two other predictions are reported in the table: a gradient boosting method utilizing noisy training data and a set of human-engineered features (reduced Set A trained with noise, see Table 15 in Ref. [47] for details) and a statistical testing procedure based on the maximum distance traveled by the particle (MAX method, see Refs. [38,46] for details).

**Table 16.** Classification of real data: comparison of our model with the feature-based ML method from Ref. [47] (Set A with noise) and the statistical hypothesis testing from Refs. [38,46] (MAX method). "Rec." and "G Prot." stand for G protein-coupled receptors and G proteins, respectively. Due to rounding, the numbers may not add up precisely to 100%.


Despite some differences in the absolute numbers, all three methods classify most of the trajectories as normal diffusion. However, there are significant discrepancies between them in the classification of the remaining time series. While our method labels almost all of them as superdiffusion, the other two predict subdiffusion in most of the cases. Unfortunately, the ground truth for real data is missing and the results cannot be verified. However, it was already pointed out in Ref. [38] that different classification algorithms may provide substantially different results for the same data sets. Averaging of the results from all available methods has been proposed to mitigate the risk of large classification errors.

#### **5. Discussion and Conclusions**

Identifying the type of motion of particles in living cells is crucial to deduce their driving forces and hence to get insight into the mechano-structural characteristics of the cells. With the development of advanced AI methods in the last decades, there is an increasing interest in using them for that purpose. These methods are expected to outperform the well-established statistical approach, in particular for noisy and small data sets.

In this paper, deep residual networks have been used to classify the SPT trajectories. We started from the well-known ResNet architecture [72], which excels in image classification, and carried out a series of numerical experiments to adapt it to the detection of diffusion modes. We managed to find a model that has a better accuracy than the initial network, but contains only a small fraction of its parameters (399,556 vs. 11,177,092 in ResNet18, i.e., the smallest among ResNet networks). The reduced number of parameters had a huge positive impact on the training time of the model. Moreover, the resulting network is less prone to overfitting and generalizes better to unseen data.

The overall accuracy of our model on the synthetic test data with noise is high (90.6%). Breaking down the predictions into individual classes reveals that the model is able to recognize FBM and confined diffusion with a remarkable accuracy (99.6% and 98.53%, respectively). The detection of normal diffusion and directed motion seems to be more challenging and the model mixes up those two categories with each other from time to time.

Regarding the classification of real data, the predictions of our model are somewhat puzzling. Compared to two other methods, i.e., a statistical testing procedure based on the maximum distance traveled by the particle [38,46] and gradient boosting methods with a set of tailor-made features characterizing the trajectories [47], it gives a similar fraction of normal diffusion (the majority class) among the trajectories. However, while our model classifies the remaining data as superdiffusion, the other ones assign most of those trajectories to the subdiffusive class. Moreover, it should be mentioned that some other classifiers provide results different from the ones in Table 16 [38,46]. In light of the above, the authors in Ref. [38] suggested taking a mean of the results of all available methods to minimize the risk of large errors. Therefore, there is still a need to search for new classification methods for SPT data.

**Author Contributions:** Conceptualization, J.S.; methodology, J.S.; software, M.G.; validation, M.G.; investigation, M.G. and J.S.; writing—original draft preparation, J.S.; writing—review and editing, M.G. and J.S.; supervision, J.S. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by core funding for statutory R&D activities. J.S. was also funded by NCN Beethoven Grant No. 2016/23/G/ST1/04083.

**Data Availability Statement:** Codes required to generate training datasets may be found at https://github.com/milySW/NNResearchAPI (accessed on 20 May 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Impact of Feature Choice on Machine Learning Classification of Fractional Anomalous Diffusion**

#### **Hanna Loch-Olszewska \*,† and Janusz Szwabiński \*,†**

Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, 50-370 Wrocław, Poland

**\*** Correspondence: hanna.loch@pwr.edu.pl (H.L.-O.); janusz.szwabinski@pwr.edu.pl (J.S.)

† These authors contributed equally to this work.

Received: 12 November 2020; Accepted: 12 December 2020; Published: 19 December 2020

**Abstract:** The growing interest in machine learning methods has raised the need for a careful study of their application to experimental single-particle tracking data. In this paper, we present the differences in the classification of the fractional anomalous diffusion trajectories that arise from the selection of the features used in random forest and gradient boosting algorithms. Comparing two recently used sets of human-engineered attributes with a new one, which was tailor-made for the problem, we show the importance of a thoughtful choice of the features and parameters. We also analyse the influence of alterations of the synthetic training data set on the classification results. The trained classifiers are tested on real trajectories of G proteins and their receptors on a plasma membrane.

**Keywords:** anomalous diffusion; machine learning classification; feature engineering

#### **1. Introduction**

Starting with the pioneering experiment performed by Perrin [1], the quantitative analysis of microscopy images has become an important technique for various disciplines ranging from physics to biology. Over the last century, it has evolved to what is now known as single-particle tracking (SPT) [2–4]. In recent years, SPT has gained popularity in the biophysical community. The method serves as a powerful tool to study the dynamics of a wide range of particles including small fluorophores, single molecules, macromolecular complexes, viruses, organelles and microspheres [5,6]. Processes such as microtubule assembly and disassembly [7], cell migration [8], intracellular transport [9,10] and virus trafficking [11] have been already successfully studied with this technique.

A typical SPT experiment results in a series of coordinates over time (also known as a "trajectory") for every single particle, but it does not provide any direct insight into the dynamics of the investigated process by itself. Mobility patterns of particles encoded in their trajectories have to be extracted in order to relate individual trajectories to the behavior of the system at hand and the associated biological process [12]. The analysis of SPT trajectories usually starts with the detection of the motion type of a particle, because this information may already provide insights into mechanical properties of the particle's surroundings [13]. However, this initial task usually constitutes a challenge due to the stochastic nature of the particles' movement.

There are already several approaches to analyse the mobility patterns of particles. The most commonly used one is based on the mean square displacement (MSD) of particles [10,14–17]. The idea behind this method is quite simple: an MSD curve (i.e., an average square displacement as a function of the time lag) is quantified from a single experimental trajectory and then fitted with a theoretical expression [18]. A linear best fit indicates normal diffusion (Brownian motion) [19], which corresponds to a particle moving freely in its environment. Such a particle neither interacts with other distant particles nor is hindered by any obstacles. If the fit is sublinear, the particle's movement is referred to as subdiffusion. It is appropriate to represent particles moderated by viscoelastic properties of the environment [20], particles which hit upon obstacles [21,22] or trapped particles [9,23]. Finally, a superlinear MSD curve means superdiffusion, which relates to the motion of particles driven by molecular motors. This type of motion is faster than the linear case and usually in a specific direction [24].
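A minimal sketch of this procedure is given below. It computes the time-averaged MSD of a single trajectory and estimates the anomalous exponent as the slope of a log-log fit, assuming a power-law form of the MSD; the function names, the number of lags and the two test trajectories are illustrative choices, not the settings of any of the cited works.

```python
import numpy as np


def tamsd(traj, max_lag):
    """Time-averaged MSD of one trajectory (array of shape (N, d)) for lags 1..max_lag."""
    traj = np.asarray(traj, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, msd


def anomalous_exponent(lags, msd):
    """Slope of log(MSD) versus log(lag); a value near 1 indicates normal diffusion."""
    slope, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope


rng = np.random.default_rng(1)
brownian = np.cumsum(rng.standard_normal((5000, 2)), axis=0)  # free diffusion
ballistic = np.outer(np.arange(200), [1.0, 0.5])              # pure drift, no noise

alpha_brownian = anomalous_exponent(*tamsd(brownian, 10))
alpha_ballistic = anomalous_exponent(*tamsd(ballistic, 10))
print(alpha_brownian, alpha_ballistic)
```

An exponent below 1 would point to subdiffusion and one above 1 to superdiffusion; the noiseless drift gives the limiting value of 2.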

Although popular in the SPT community, the MSD approach has several drawbacks. First of all, experimental uncertainties introduce a great amount of noise into the data, making the fitting of mathematical models challenging [10,14,25,26]. Moreover, the observed trajectories are often short, limiting the MSD curves to just the first few time lags. In this case, distinguishing between different theoretical models may not be feasible. To overcome these problems, several analytical methods that improve or go beyond MSD have already been proposed. The optimal least-square fit method [10], the trajectory spread in space measured with the radius of gyration [27], the van Hove displacements distributions [28], self-similarity of trajectory using different powers of the displacement [29] or the time-dependent directional persistence of trajectories [30] are examples of methods belonging to the first category. They may be combined with the results of the pure MSD analysis to improve the outcome of classification. The distribution of directional changes [31], the mean maximum excursion method [32] and the fractionally integrated moving average (FIMA) framework [33] belong to the other class. They allow efficient replacement of the MSD estimator for classification purposes. Hidden Markov models (HMM) turned out to be quite useful in heterogeneity checking within single trajectories [34,35] and in the detection of confinement [36]. Classification based on hypothesis testing, both relying on MSD and going beyond this statistic, has been shown to be quite successful as well [26,37].

In the last few years, machine learning (ML) has started to be employed for the analysis of single-particle tracking data. In contrast to standard algorithms, where the user is required to explicitly define the rules of data processing, ML algorithms can learn those rules directly from series of data. Thus, the principle of ML-based classification of trajectories is simple: an algorithm learns by adjusting its behavior to a set of input data (trajectories) and corresponding desired outputs (real motion types, called the ground truth). These input–output pairs constitute the training set. A classifier is nothing but a mapping between the inputs and the outputs. Once trained, it may be used to predict the motion type of a previously unseen sample.

The main factor limiting the deployment of ML to trajectory analysis is the availability of high-quality training data. Since the true motion types behind the data collected in experiments are not known (otherwise, we would not need any new classification method), synthetic sets generated with computer simulations of different diffusion models are usually used for training.

Despite the data-related limitations, several attempts at ML-based analysis of SPT experiments have been already carried out. The applicability of the Bayesian approach [18,38,39], random forests [40–43], neural networks [44] and deep neural networks [41,45,46] was extensively studied. The ultimate goal of those works was the determination of the diffusion modes. However, some of them went beyond the pure classification and focused on extraction of quantitative information about the trajectories (e.g., the anomalous exponent [42,45]).

In one of our previous papers, we compared two different ML approaches to classification [41]. Feature-based methods do not use raw trajectories as input for the classifiers. Instead, they require a set of human-engineered features, which are then used to feed the algorithms. In contrast, deep learning (DL) methods extract features directly from raw data without any effort from human experts. In this case, the representation of data is constructed automatically and there is no need for complex data preprocessing. Deep learning is currently treated as the state-of-the-art technology for automatic data classification and slightly overshadows the feature-based methods. However, from our results, it follows that the latter are still worth considering. Compared to DL, they may arrive at similar accuracies in much shorter training times, are usually easier to interpret, allow working with trajectories of different lengths in a natural way and often do not require any normalisation of data. The only drawback of those methods is that there is no universal set of features that works well for trajectories of any type. Choosing the features is challenging and may have an impact on the classification results.

In this paper, we would like to elaborate on the choice of proper features to represent trajectories. Comparing classifiers trained on the same set of trajectories, but with slightly different features, we will address some of the challenges of feature-based classification.

The paper is structured as follows. In Section 2, we briefly introduce the concept of anomalous diffusion and present the stochastic models that we chose to model it. In Section 3, methods and data sets used in this work are discussed. The results of classification are extensively analysed in Section 4. In the last section, we summarise our findings.

#### **2. Anomalous Diffusion and Its Stochastic Models**

Non-Brownian movements that exhibit non-linear mean squared displacement can be described by multiple models, depending on some specific properties of the corresponding trajectories. The most popular models are the continuous-time random walk (CTRW) [9], random walks on percolating clusters (RWPC) [47,48], fractional Brownian motion (FBM) [49–51], fractional Lévy *α*-stable motion (FLSM) [52], fractional Langevin equation (FLE) [53] and autoregressive fractionally integrated moving average (ARFIMA) [54].

In this paper, we follow the model choice described in [26,37,43]; namely, we use FBM, the directed Brownian motion (DBM) [55] and Ornstein–Uhlenbeck (OU) processes [56]. With a particular choice of the parameters, all these models reduce to the classical Brownian motion (i.e., normal diffusion).

The FBM is the solution of the stochastic differential equation

$$dX\_t^i = \sigma dB\_t^{H,i}, \; i = 1,2,\tag{1}$$

where *σ* > 0 is the scale coefficient, which relates to the diffusion coefficient *D* via *σ* = √(2*D*), *H* ∈ (0, 1) is the Hurst parameter and *B*<sub>*t*</sub><sup>*H*</sup> is a continuous-time, zero-mean Gaussian process starting at zero, with the following covariance function

$$\mathbb{E}\left(B\_t^H B\_s^H\right) = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right). \tag{2}$$

The value of *H* determines the type of diffusion in the process. For *H* < 1/2, FBM produces subdiffusion. It corresponds to a movement of a particle hindered by mobile or immobile obstacles [57]. For *H* > 1/2, FBM generates superdiffusive motion. It reduces to the free diffusion at *H* = 1/2.
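Since FBM is a Gaussian process with the covariance given in Equation (2), one straightforward (though not the fastest) way to generate sample paths is a Cholesky factorisation of the covariance matrix on a time grid. The sketch below illustrates this idea; it is not necessarily the simulation method used in this work, and the function names and parameter values are assumptions.

```python
import numpy as np


def fbm_covariance(times, hurst):
    """Covariance matrix of FBM at the given positive times, following Eq. (2)."""
    t = np.asarray(times, dtype=float)[:, None]
    s = t.T
    return 0.5 * (np.abs(t) ** (2 * hurst) + np.abs(s) ** (2 * hurst)
                  - np.abs(t - s) ** (2 * hurst))


def simulate_fbm(n_steps, hurst, sigma=1.0, dt=1.0, rng=None):
    """One coordinate X_0 = 0, X_1, ..., X_n of sigma * B^H via Cholesky factorisation."""
    rng = np.random.default_rng() if rng is None else rng
    times = dt * np.arange(1, n_steps + 1)
    chol = np.linalg.cholesky(fbm_covariance(times, hurst))
    path = sigma * chol @ rng.standard_normal(n_steps)
    return np.concatenate(([0.0], path))


rng = np.random.default_rng(42)
# A 2D subdiffusive trajectory: two independent coordinates, as in Eq. (1).
trajectory = np.column_stack([simulate_fbm(200, 0.3, rng=rng),
                              simulate_fbm(200, 0.3, rng=rng)])
print(trajectory.shape)
```

The cost of the factorisation grows with the cube of the trajectory length, so for long trajectories faster exact schemes such as the Davies–Harte method are usually preferred.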

The directed Brownian motion, also known as the diffusion with drift, is the solution to

$$dX\_t^i = v\_i dt + \sigma dB\_t^{1/2, i}, \; i = 1, 2,\tag{3}$$

where *v* = (*v*<sub>1</sub>, *v*<sub>2</sub>) ∈ ℝ<sup>2</sup> is the drift parameter and *σ* is again the scale parameter. For *v* = 0, it reduces to normal diffusion. For other choices of *v*, it generates superdiffusion related to an active transport of particles driven by molecular motors.

The Ornstein–Uhlenbeck process is often used as a model of a confined diffusion (a subclass of subdiffusion). It describes the movement of a particle inside a potential well and can be determined as the solution to the following stochastic differential equation:

$$dX\_t^i = -\lambda\_i (X\_t^i - \theta\_i)dt + \sigma dB\_t^{1/2, i}, \; i = 1, 2, \; \theta\_i \in \mathbb{R}. \tag{4}$$

The parameter *θ* = (*θ*<sub>1</sub>, *θ*<sub>2</sub>) is the long-term mean of the process (i.e., the equilibrium position of a particle), *λ* = (*λ*<sub>1</sub>, *λ*<sub>2</sub>) is the mean-reverting speed and *σ* is again the scale parameter. If there is no mean reversion effect, i.e., *λ*<sub>*i*</sub> = 0, OU reduces to normal diffusion.
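Equations (3) and (4) can be discretised on a regular time grid. The sketch below uses exact Gaussian increments for DBM and the Euler–Maruyama scheme for OU; it is an illustration under these assumptions rather than the generator used to build the training sets, and all function names and default values are hypothetical.

```python
import numpy as np


def simulate_dbm(n_steps, v, sigma, dt=1.0, rng=None):
    """2D directed Brownian motion, Eq. (3), started at the origin."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    steps = v * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_steps, 2))
    return np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])


def simulate_ou(n_steps, lam, theta, sigma, dt=1.0, x0=(0.0, 0.0), rng=None):
    """2D Ornstein-Uhlenbeck process, Eq. (4), via the Euler-Maruyama scheme."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.asarray(lam, dtype=float)
    theta = np.asarray(theta, dtype=float)
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    noise = sigma * np.sqrt(dt) * rng.standard_normal((n_steps, 2))
    for k in range(n_steps):
        # Drift pulls the particle back towards theta with speed lam.
        x[k + 1] = x[k] - lam * (x[k] - theta) * dt + noise[k]
    return x


rng = np.random.default_rng(7)
directed = simulate_dbm(100, v=(0.5, 0.0), sigma=1.0, rng=rng)
confined = simulate_ou(100, lam=(0.2, 0.2), theta=(0.0, 0.0), sigma=1.0, rng=rng)
print(directed.shape, confined.shape)
```

Setting `v = (0, 0)` in the first function or `lam = (0, 0)` in the second one yields ordinary Brownian motion, in line with the limiting cases mentioned in the text.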

#### **3. Methods and Used Data Sets**

In this paper, we discuss two feature-based classifiers: random forest (RF) and gradient boosting (GB) [58]. The term feature-based relates to the fact that the corresponding algorithms do not operate on raw trajectories of a process. Instead, for each trajectory a vector of human-engineered features is calculated and then used as input for the classifier. This approach for the diffusion mode classification has already been used in [41–43,45], but here, we propose a new set of features, which gives better results on synthetic data sets.

Both RF and GB are examples of ensemble methods, which combine multiple classifiers to obtain better predictive performance. They use decision trees [59] as base classifiers. A single decision tree is fairly simple to build. The original data set is split into smaller subsets based on values of a given feature. The process is recursively repeated until the resulting subsets are homogeneous (all samples from the same class) or further splitting does not improve the classification performance. A splitting feature for each step is chosen according to Gini impurity or information gain measures [58].
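To make the splitting criterion more concrete, the sketch below evaluates the Gini impurity of a set of labels and of a candidate split on a single feature. It is a didactic illustration with made-up feature values and labels; library implementations such as the one in scikit-learn are considerably more elaborate. The function names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k**2 of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return sum(len(part) / n * gini_impurity(part)
               for part in (left, right) if part)


# Made-up values of a single feature and the corresponding motion types.
feature = [0.2, 0.4, 0.9, 1.1]
motion = ["sub", "sub", "normal", "normal"]

print(gini_impurity(motion))                # impurity before the split
print(split_impurity(feature, motion, 0.5)) # impurity after a perfect split
print(split_impurity(feature, motion, 0.3)) # impurity after a worse split
```

A tree-building algorithm would scan the candidate thresholds of all features and pick the split with the lowest weighted impurity.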

A single decision tree is popular among ML methods due to the ease of its interpretation. However, it has several drawbacks that disqualify it as a reliable classifier: it is sensitive to even small variations of data and prone to overfitting. Ensemble methods combining many decision trees help to overcome those drawbacks while maintaining most of the advantages of the trees. A multitude of independent decision trees is constructed by making use of the bagging idea with the random subspace method [60–62] to form a random forest. Their prediction is aggregated and the mode of the classes of the individual trees is taken as the final output. In contrast, the trees in gradient boosting are built in a stage-wise fashion. At every step, a new tree learns from mistakes committed by the ensemble. GB is usually expected to perform better than RF, but the latter one may be a better choice in case of noisy data.

In this work, we used implementations of RF and GB provided by the scikit-learn Python library [63]. The performance of the classifiers was evaluated with the common measures including accuracy, precision, recall, F1 score and confusion matrices (although the information given by those measures is to some extent redundant, we decided to use all of them due to their popularity). The accuracy is the percentage of correct predictions among all predictions, which gives general information about the performance of a classifier (reliable in the case of a balanced data set). The precision and recall give more detailed information for each class. The precision is the ratio of the correct predictions to all predictions in that class (including the cases falsely assigned to this class). On the other hand, the recall (also called sensitivity or true positive rate) is the ratio of correct predictions of that class to all members of that class (including the ones that were falsely assigned to another class). The F1 score is the harmonic mean of precision and recall, resulting in a high value only if both precision and recall are high. Finally, the confusion matrices show detailed results of classification: element *c*<sub>*i*,*j*</sub> of matrix *C* is the percentage of the observations from class *i* assigned to class *j* (a row represents the actual class, while a column represents the predicted class).

The Python codes for the data simulation, features calculation, models preparation and performance calculation are available at Zenodo (see Supplementary Materials).

#### *3.1. Features Used for Classification*

As already mentioned above, both ensemble methods require vectors of human-engineered features representing the trajectories as input. In some sense, those methods may be treated as a kind of extension to the statistical methods usually used for classification purposes. Instead of conducting a statistical testing procedure of diffusion based on one statistic, as is often the case, we can combine several statistics with each other by turning them into features, which are then used to train a classifier. This could be of particular importance in situations where single statistics yield results differing from each other (cf. [43]). It should be mentioned, however, that choosing the right features is a challenging task. For instance, we have already shown in [41] that classifiers trained with a popular set of features do not generalise well beyond the situations encountered in the training set. Thus, great attention needs to be paid to the choice of the input features to machine learning classifiers as well. They ought to cover all the important characteristics of the process, but at the same time, they should contain the minimal amount of unnecessary information, as each redundant piece of data causes noise in the classification or may lead to overfitting (for a general discussion concerning the choice of features, see, for instance, [64]).

Based on the results in [41,43], we decided to use the following features in our analysis, hereinafter referred to as Set A:


$$\kappa(n\_1, n\_2) = \frac{\frac{1}{N - n\_1} \sum\_{i=1}^{N - n\_1} \left| X\_{i + n\_1} - X\_i \right|^2}{\frac{1}{N - n\_2} \sum\_{i=1}^{N - n\_2} \left| X\_{i + n\_2} - X\_i \right|^2} - \frac{n\_1}{n\_2},$$

where *n*<sub>1</sub> < *n*<sub>2</sub>. In this work, we set *n*<sub>2</sub> = *n*<sub>1</sub> + 1 and averaged the output over *n*<sub>1</sub>. In other words, we used (*n*<sub>1</sub> replaced by *n* for convenience):

$$\kappa = \frac{1}{N-1} \sum\_{n=1}^{N-1} \kappa(n, n+1). \tag{5}$$

• Efficiency, calculated as

$$E = \frac{|X\_{N-1} - X\_0|^2}{(N-1)\sum\_{i=1}^{N-1} |X\_i - X\_{i-1}|^2} \tag{6}$$

which measures the linearity of a trajectory.

• Straightness, a measure of the average direction change between subsequent steps, calculated as:

$$S = \frac{|X\_{N-1} - X\_0|}{(N-1)\sum\_{i=1}^{N-1} |X\_i - X\_{i-1}|}. \tag{7}$$

• The value of the empirical velocity autocorrelation function [65] for lag 1 at point *n* = 1, that is

$$\chi = \frac{1}{N-2} \sum\_{i=1}^{N-2} \left( X\_{i+2} - X\_{i+1} \right) \cdot \left( X\_{i+1} - X\_i \right).$$

• Maximal excursion, given by the formula

$$ME = \frac{\max(X\_{i+1} - X\_i)}{X\_{N-1} - X\_0}.\tag{8}$$

It is inspired by the mean maximal excursion (MME) [32], detecting the jumps that are long as compared to the overall displacement.

• The statistics based on *p*-variation [52]:

$$V\_m^{(p)} = \sum\_{i=0}^{N/m-1} |X\_{(i+1)m} - X\_{im}|^p.$$

The usefulness of this statistic for the recognition of the fractional Lévy stable motion (including fractional Brownian motion) was shown in [52]. We introduce a quantity that verifies whether, for any *p*, the function *V*<sub>*m*</sub><sup>(*p*)</sup> of the variable *m* changes its monotonicity. We also provide the information whether, for the highest value of *p* such that *V*<sub>*m*</sub><sup>(*p*)</sup> does change its monotonicity, it is convex or concave. In short, we analyse *V*<sub>*m*</sub><sup>(*p*)</sup> as a function of *m* to provide one of the following values:

$$P = \begin{cases} 0 & \text{if it does not change the monotonicity,} \\ 1 & \text{if it is convex for the highest } p \text{ for which it is not monotonic,} \\ -1 & \text{if it is concave for the highest } p \text{ for which it is not monotonic.} \end{cases} \tag{9}$$

The first five features were already used in [41]. It should also be mentioned here that three of them are based on MSD curves. There is one important point to consider while calculating the curves, namely the maximum time lag. If not specified otherwise, we will use the lag equal to 10% of each trajectory's length. Since this choice is not obvious and may impact the classification performance, we will discuss the sensitivity of classifiers' accuracies to different choices of the lag in Section 4.5.
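To make some of the definitions above more tangible, a minimal NumPy sketch of four of the quantities is given here. It is an illustration under stated assumptions rather than the implementation used in this work: the function names are hypothetical, a trajectory is assumed to be an array of shape (*N*, 2), Euclidean norms are assumed in the maximal excursion, and the *p*-variation sum runs over all complete blocks of *m* steps available in the trajectory.

```python
import numpy as np

def _mean_sq_displacement(traj, lag):
    """Time-averaged squared displacement of the trajectory at a given lag."""
    return np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))

def msd_ratio(traj, n1, n2):
    """kappa(n1, n2): ratio of the MSDs at lags n1 < n2, minus n1 / n2."""
    traj = np.asarray(traj, dtype=float)
    return _mean_sq_displacement(traj, n1) / _mean_sq_displacement(traj, n2) - n1 / n2

def vacf_lag1(traj):
    """Mean scalar product of consecutive steps (velocity autocorrelation, lag 1)."""
    steps = np.diff(np.asarray(traj, dtype=float), axis=0)
    return np.mean(np.sum(steps[1:] * steps[:-1], axis=1))

def maximal_excursion(traj):
    """Longest single step divided by the net displacement (norms assumed)."""
    traj = np.asarray(traj, dtype=float)
    step_lengths = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return step_lengths.max() / np.linalg.norm(traj[-1] - traj[0])

def p_variation(traj, m, p):
    """Sum of p-th powers of increment lengths over non-overlapping blocks of m steps."""
    increments = np.diff(np.asarray(traj, dtype=float)[::m], axis=0)
    return np.sum(np.linalg.norm(increments, axis=1) ** p)

# Toy input: a straight, evenly spaced path with 11 points (hypothetical data).
line = np.column_stack([np.arange(11.0), np.zeros(11)])
kappa_line = msd_ratio(line, 1, 2)        # 1/4 - 1/2
chi_line = vacf_lag1(line)                # every pair of steps is parallel
me_line = maximal_excursion(line)         # unit step over net displacement 10
v_line = p_variation(line, m=2, p=2)      # five increments of length 2
```

For a process whose MSD grows linearly with the lag, the ratio of the MSDs at lags *n*<sub>1</sub> and *n*<sub>2</sub> is close to *n*<sub>1</sub>/*n*<sub>2</sub>, so the value returned by `msd_ratio` is expected to fluctuate around zero.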

Apart from the set of features presented above, denoted Set A, we are going to analyse two other sets: the one used in [40,41], referred to as Set B, and the one proposed in [43] (Set C). The lists of features used in each set are given in Table 1 (for their exact definitions, please see the mentioned references). Sets A and B have several features in common. The link between Sets A and C is not so apparent, but the maximal excursion and the *p*-variation-based statistic play a role in the description of trajectories similar to that of the standardised maximum distance and the exponent of the power function fitted to the *p*-variation, respectively.

Following [41], we consider four classifiers for each set of features: RF and GB classifiers built with the full set (labelled as "with *D*") and with a reduced one after the removal of the diffusion constant *D* ("no *D*").


**Table 1.** Features used for classification purposes in each of the analysed sets.

#### *3.2. Synthetic Data*

Unlike the explicitly programmed methods, machine learning algorithms are not ready-made solutions for arbitrary data. Instead, an algorithm first needs to be fed with a reasonable amount of data (so-called training data) that should contain the main characteristics of the process under investigation, so that it can find and learn some hidden patterns. As the classifier is not able to extract any additional patterns from previously unseen samples after this stage, its performance is highly dependent on the quality of the training data. Hence, the training set needs to be complete in some sense.

First, we created our main data set, which will be referred to as the base data set for the remainder of this paper. It is analogous to the one used in [43]. We generated a number of 2D trajectories according to the three diffusion models described in Section 2, with no correlations between the coordinates. A single trajectory can be denoted as

$$X\_N = \left(X\_{t\_0}, X\_{t\_1}, \dots, X\_{t\_N}\right), \tag{10}$$

where *X*<sub>*t<sub>i</sub>*</sub> = (*X*<sup>1</sup><sub>*t<sub>i</sub>*</sub>, *X*<sup>2</sup><sub>*t<sub>i</sub>*</sub>) ∈ **R**<sup>2</sup> is the position of the particle at time *t<sub>i</sub>* = *t*<sub>0</sub> + *i*Δ*t*, *i* = 0, 1, ..., *N*. We kept the lag Δ*t* between two consecutive observations constant.

The details of our simulations are summarised in Table 2. In total, 120,000 trajectories have been produced, 40,000 for each diffusion mode, in order to balance the data set. The length of the trajectories was randomly chosen from the range between 50 and 500 steps to mimic typical observations in experiments. We set *σ* = 1 μm s<sup>−1/2</sup> and Δ*t* = 1 s.

**Table 2.** Characteristics of the simulated trajectories used to train the classifiers. For the base training set, the following values were used: *c* = 0.1, *σ* = 1 μm s<sup>−1/2</sup> and Δ*t* = 1 s.


Since the normal diffusion can be generated by a particular choice of the models' parameters (*H* = 0.5 for FBM, *v* = 0 for DBM and *λ* = 0 for OU), it is almost indistinguishable from the anomalous diffusion generated with the parameters in the vicinity of those special values. The addition of the noise complicates the problem even more. Thus, following [43], we introduced a parameter *c* that defines a range in which a weak sub- or superdiffusion should be treated as a normal one. Although introduced here at a different level, it bears resemblance to the cutoff *c* used in [37].
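The role of the cutoff can be sketched for FBM as follows. This is only an illustration: the function name `label_fbm` is hypothetical, and the symmetric interval [0.5 − *c*, 0.5 + *c*] for the Hurst exponent is our assumption made for this sketch, not a statement of the exact parameter ranges used in the simulations.

```python
def label_fbm(hurst, c=0.1):
    """Assign a ground-truth label to an FBM trajectory with Hurst exponent `hurst`.

    Assumption for this sketch: weakly anomalous trajectories with
    |hurst - 0.5| <= c are labelled as normal diffusion.
    """
    if hurst < 0.5 - c:
        return "subdiffusion"
    if hurst > 0.5 + c:
        return "superdiffusion"
    return "normal diffusion"

# The same weakly subdiffusive trajectory gets different labels for different c.
label_wide = label_fbm(0.45, c=0.1)
label_narrow = label_fbm(0.45, c=0.01)
```

The example shows why the choice of *c* matters: a larger cutoff moves more weakly anomalous trajectories into the normal diffusion class at the labelling stage.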

Apart from the base data set, we are going to use several auxiliary ones to elaborate on different aspects of the feature choice. In Section 4.3, we will work with a training set in which the trajectories from the base one are disturbed with Gaussian noise to resemble experimental uncertainties. In Section 4.4, we will analyse the performance of classifiers trained on synthetic data generated with *σ* = 0.38, corresponding to the diffusion coefficient *D* = 0.0715 μm<sup>2</sup> s<sup>−1</sup>, which is adequate for the analysis of real data samples. To study the sensitivity of the classifiers to the value of the cutoff *c* in Section 4.6, we will use three further sets with *c* = 0, *c* = 0.001 and *c* = 0.01. In Section 4.7, a synthetic set with *σ* = √(2*D*), where *D* is drawn from the uniform distribution on [1, 9], will be used to check how the classifiers cope with trajectories characterised by heterogeneous mobilities.

For all data sets, the training and testing subsets were randomly selected with a 70%/30% ratio.

#### *3.3. Empirical Data*

To check how our classifiers work on unseen data, we will apply them to some real data. We decided to use the trajectories of G proteins and G-protein-coupled receptors already analysed in [37,43,66]. To avoid issues related to short time series, we limited ourselves to trajectories with at least 50 steps, obtaining 1037 trajectories of G proteins and 1218 trajectories of receptors. They are visualised in Figure 1.

**Figure 1.** Trajectories of the receptors (**left**) and G proteins (**right**) used as input for the classifiers. Different colors are introduced to indicate different trajectories. The set of the receptors contains 1218 trajectories and the one of G proteins contains 1037 trajectories. The lengths of the trajectories are in the range [50, 401], the time step is equal to 28.4 ms and the recorded positions are given in μm.

#### **4. Results**

The main goal of our work is a comparative analysis of classifiers trained using different sets of features (see Table 1 for their definition). The classifiers were trained and tested on our base data set and the auxiliary data sets, for comparison.

In order to optimise both classification algorithms, we tuned their hyperparameters using the RandomizedSearchCV method from the scikit-learn library. It performs a search over values of hyperparameters sampled from given distributions (in our case, discrete uniform ones). The term hyperparameter in this context means a parameter required for the construction of the classifier, which has to be set by a human expert before the learning process starts. In general, it influences the performance of the classifier; hence, its choice is essential.
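A randomised hyperparameter search of this kind can be sketched as below. The snippet is an illustration only: it runs on stand-in data from `make_classification` instead of trajectory features, and the hyperparameter ranges, the number of sampled candidates and the number of folds are assumptions chosen to keep the example fast, not the settings used in this work.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Stand-in data: 3 classes, 7 features (NOT the trajectory features of this work).
X, y = make_classification(n_samples=600, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=0)

# Random 70%/30% split into training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Discrete uniform distributions; randint(low, high) samples low..high-1.
param_distributions = {
    "n_estimators": randint(20, 101),
    "max_depth": randint(2, 21),
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled hyperparameter combinations
    cv=3,                # folds used to score each combination
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
```

Only the training subset takes part in the search, so the accuracy on the testing subset remains an estimate on data not used for tuning.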

#### *4.1. Classification Results on Base Data Set Using Proposed Set of Features*

We start with the classifiers trained on the base set (see Table 2 for details). We trained four different classifiers: RF and GB for both the full set of attributes ("with *D*") and a reduced one ("no *D*"). Set A of features was used for the representation of trajectories. The performance of these classifiers will be treated as a benchmark in our further analysis.

The hyperparameters of the classifiers are presented in Table 3 (for the detailed explanation of each of these parameters, please see [43,58]). It is worth noticing a difference in the ensemble sizes between the full set and the reduced one: in the case of gradient boosting, we observe a ninefold reduction of the number of trees. However, this difference is not reflected in the performance of the classifiers. Taking the number of features into account, the value of the max\_depth hyperparameter for RF with *D* is surprisingly high. It seems to be an artifact of the hyperparameter tuning procedure via randomised search. From our analysis (not included in this paper), it follows that this value can be set to 20 without a negative impact on accuracy. Nevertheless, we decided to keep the original result of the automatic hyperparameter tuning in order to treat all of the classifiers on the same footing. We should add that the largest tree in RF was 38 levels deep, despite such a high value of the maximum depth.

We begin the analysis of the classifiers by inspecting their accuracies. The results are shown in Table 4. As we can see, both classifiers perform excellently, with more than 95% of correct predictions for the test set. In the case of the training data, GB performs better than RF. However, RF is slightly more accurate on the test set, indicating a small tendency of GB to overfit.


**Table 3.** Hyperparameters of the optimal classifiers built on base data set with Set A of features. The full set of features is labelled as "with *D*". The "no *D*" columns stand for the reduced set of features after the removal of the diffusion coefficient *D*. N/A (i.e., "Not Applicable") indicates hyperparameters specific for random forest.

**Table 4.** Accuracy of the best classifiers trained on the base data set (see Table 2) with Set A of features. The "with *D*" and "no *D*" columns refer to the full and reduced (after removal of *D*) sets of features, respectively. The results are rounded to three decimal digits.


To explain the relatively small differences in the performance between the "with *D*" and "no *D*" versions of the classifiers, we may want to look at the importances of features. There are several ways to calculate those importances. We used a method which defines the importance as the decrease in accuracy after a random shuffling of the values of one of the features. Results are given in Table 5. Just to recall, features with high importances are the drivers of the outcome. The least important ones might often be omitted, making the classification model faster to fit and predict. The results for the node impurity importances (the total decrease in node impurity caused by a given feature, averaged over all trees in the ensemble [67]) are similar.
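Both kinds of importances can be obtained with scikit-learn as in the following sketch. It is an illustration on stand-in data in which, by construction, the first three columns are informative and the last two are pure noise. Evaluating the permutation importance on the testing subset and the number of repeats are our assumptions for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in data; with shuffle=False the informative columns come first
# and the remaining columns are filled with random noise.
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: mean drop in accuracy after shuffling one column.
result = permutation_importance(model, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=0)
permutation_ranking = np.argsort(result.importances_mean)[::-1]  # most important first

# Node impurity importances of the fitted forest (normalised to sum to one).
impurity_importances = model.feature_importances_
```

A feature whose shuffling leaves the accuracy unchanged gets an importance close to zero, which is what one expects for the two noise columns here.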

**Table 5.** Permutation feature importances of the classifiers built on base data set with Set A of features. The "with *D*" and "no *D*" columns refer to the full and reduced (after removal of *D*) sets of features, respectively. The rows are sorted according to the decreasing importances for random forest with *D*. The most and least important features are indicated with bold or underlining, respectively.


It turns out that *D* is the least important feature for the RF classifier trained on the full set and the third least important one for the GB classifier. That is why its removal has only a small impact on the accuracy of prediction on the test set (see Table 4) and why the classifiers trained on the reduced set of features with no *D* are worth considering: we expect them to work better on unseen data having diffusion coefficients different from the one used in the base set. Later in Section 4.7, we will show that in the case of the training set with varying *D*, the situation is different: *D* will become more important and excluding it from the set will reduce the accuracy.

The most informative feature in all cases is the velocity autocorrelation function for lag *δ* = 1 at point *n* = 1. It is worth mentioning that this quantity has been already successfully used for the distinction of subdiffusion models [68], but not in the ML context. The anomalous exponent *α*, which is a standard method for the diffusion mode classification, is the second most important feature for all models, with a significant influence on the results. Thus, it seems that the classifiers distinguish between the models first and then assess the mode of diffusion.

To get more insight into the detailed performance of the classifiers, their normalised confusion matrices are shown in Figure 2. Please note that the percentages may not sum to 1.0 due to rounding. We see that all models have the biggest problems with the classification of normal diffusion. This is simply due to the fact that the differences between normal diffusion and realizations of weak sub- or superdiffusion are negligible and it is challenging to classify it properly even after introduction of the parameter *c* (the role of which will be studied in more detail in Section 4.6).

**Figure 2.** Normalised confusion matrices for classifiers built on base training data (see Table 2) with Set A of features. The "with *D*" (top row) and "no *D*" (bottom row) labels refer to the full and reduced (after removal of *D*) sets of features, respectively. All results are rounded to two decimal digits.

The values presented in Figure 2 may be used to calculate the other popular measures of performance: precision, recall and F1 score (see Section 3). The results, rounded to three decimal digits, are summarised in Table 6. Again, we see that the measures point to the highest error rate for normal diffusion: for the random forest model with *D* as one of the features, only 92.9% of the trajectories classified as normal diffusion were in fact in this class (precision), whereas 94.4% of freely diffusing trajectories were correctly classified (recall). Such a high error rate is related to the already mentioned lack of a clear distinction between the modes: normal diffusion is a kind of buffer between subdiffusion and superdiffusion, thus it can be incorrectly classified as one of these two.


**Table 6.** Precision, recall and F1 scores of the classifiers trained on the base synthetic data set with Set A of features. For each classifier, the testing set consists of 12,000 trajectories per diffusion mode, that is, 36,000 in total.

#### *4.2. Comparison with Other Sets of Features*

Below, we show the comparison of the classification results of all considered classifiers (based on three different sets of features) on our base data set (Table 2).

In Table 7, the accuracies on the test set are shown, calculated using the tenfold cross-validation method [58]. When the accuracy of a classifier is calculated from a single train/test split, the test set can, in an unfortunate case, contain data with characteristics that have not been seen by the classifier during training, and thus the accuracy would be falsely low. The *k*-fold cross-validation is a technique that helps to reduce that bias. The data are randomly split into *k* folds (without replacement) and the model is trained and tested *k* times; each time one fold is the test set, whereas the remaining ones form the training set. The overall accuracy is the mean of the accuracies of all runs. The hyperparameters of the particular models are summarised in Table 8; they were again established using the RandomizedSearchCV method.
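The procedure can be sketched with scikit-learn as follows. The data are again a stand-in generated with `make_classification`, and the small ensemble size is an assumption that only serves to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data: 3 classes, 6 features (NOT the trajectory features of this work).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

# Ten folds drawn at random without replacement; each fold is the test set once.
folds = KFold(n_splits=10, shuffle=True, random_state=0)

scores = cross_val_score(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    X, y, cv=folds, scoring="accuracy")

mean_accuracy = scores.mean()  # overall accuracy: mean over the ten runs
spread = scores.std()          # variation between the folds
```

Besides the mean, the spread of the fold accuracies gives a rough idea of how much a single train/test split could mislead.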

**Table 7.** Accuracy of the classifiers built on the base data set using different sets of features, measured using tenfold cross-validation method. All results are rounded to three decimal digits.


In the comparison of all these classifiers, the ones based on the set of features proposed in this article provide the best results on our base synthetic data set. Actually, the choice of features was inspired by two of our previous articles [41,43]. The new set combines the attributes used in those papers: it contains the anomalous exponent *α*, diffusion coefficient *D*, efficiency, straightness and mean squared displacement ratio that have been used in [41], and the normalised maximal excursion and *p*-variation-based features used in [43].

Nevertheless, we need to underline here that this does not mean that this set of features is the solution for all classification problems; it simply seems to be the best choice for this synthetic data set. The lack of universality of feature-based methods was already shown in [41]: the classifiers did not generalise well to samples generated with slightly altered models.

**Table 8.** Hyperparameters of the optimal classifiers built on base data set used for the calculation of tenfold cross-validation accuracy in Table 7. The "with *D*" and "no *D*" columns refer to the full and reduced (after removal of *D*) sets of features, respectively. N/A stands for "Not Applicable" (the first two parameters are random forest specific). The definitions of the feature sets are given in Table 1.


To compare the performance of these models in more detail, the values of recall, precision and F1 score are given in Table 9. For the sake of clarity, we only compare the random forest classifiers built on the complete feature sets (with the diffusion coefficient *D*). For the remaining cases, the behaviour is alike, except for the fact that all measures for classifiers with features as in Set B but without the diffusion coefficient *D* are significantly lower than for other classifiers. We would like to underline here that the set of features proposed in Section 3.1 provides the best results in all measures used here. For all classifiers, the results for superdiffusion and subdiffusion are better than for the normal diffusion class, which is understandable, as the only kind of error that occurs for them is the misclassification of anomalous diffusion trajectories as normal diffusion. In the case of normal diffusion, one part of the misclassified trajectories is labelled as superdiffusion and another part as subdiffusion.

**Table 9.** Detailed performance comparison of random forest classifiers based on three sets of features, built on the base data set. Metrics are calculated on the test data. All results are rounded to three decimal digits. For each classifier, the test set consists of 12,000 trajectories per diffusion mode—that is, 36,000 in total.


#### *4.3. Adding Noise*

The results on our base data set are promising, but, unfortunately, real data are more challenging to classify, as they usually contain some noise and/or measurement error. Thus, we added a random Gaussian noise with zero mean and standard deviation *σ*<sub>*Gn*</sub> to our trajectories. In order to control the noise amplitude with respect to the standard deviation of a process, we followed the idea used in [40,41,43], namely setting a random signal-to-noise ratio instead of *σ*<sub>*Gn*</sub>. The signal-to-noise ratio is defined as

$$Q = \begin{cases} \frac{\sqrt{D\Delta t + v^2 \Delta t^2}}{\sigma\_{Gn}} & \text{for DBM,} \\ \frac{\sqrt{D\Delta t}}{\sigma\_{Gn}} & \text{otherwise,} \end{cases} \tag{11}$$

where *v* = √(*v*<sub>1</sub><sup>2</sup> + *v*<sub>2</sub><sup>2</sup>). The value of *σ*<sub>*Gn*</sub> was calculated for each trajectory separately, based on the random value of *Q* drawn from the uniform distribution on the interval [1, 9].
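A minimal sketch of this noise model is shown below. The function name `add_gaussian_noise` is hypothetical, and we assume here that the noise is added independently to every coordinate of every position. The value *D* = 0.5 in the usage example follows from *σ* = 1 under the relation *σ* = √(2*D*).

```python
import numpy as np

def add_gaussian_noise(traj, D, dt, rng, v=0.0, q_range=(1.0, 9.0)):
    """Return a noisy copy of `traj`, the drawn ratio Q and the noise amplitude.

    traj : array of shape (N, 2) with the particle positions
    D    : diffusion coefficient, dt : time lag between observations
    v    : drift speed (zero for the models without drift)
    """
    Q = rng.uniform(*q_range)                         # random signal-to-noise ratio
    sigma_gn = np.sqrt(D * dt + v**2 * dt**2) / Q     # definition of Q solved for sigma_Gn
    noisy = traj + rng.normal(0.0, sigma_gn, size=traj.shape)
    return noisy, Q, sigma_gn

# Usage on a placeholder trajectory (all positions at the origin).
rng = np.random.default_rng(0)
clean = np.zeros((100, 2))
noisy, Q, sigma_gn = add_gaussian_noise(clean, D=0.5, dt=1.0, rng=rng)
```

Since *Q* is at least 1, the standard deviation of the noise never exceeds the typical displacement of the particle within one time step.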

The accuracies of the classifiers trained on the data set with noise are given in Table 10. It is worth comparing the results with Table 4—there is a decrease of the accuracy, especially in case of the reduced set of features ("no *D*"), but both methods still classify the diffusion modes well. Nevertheless, in this case, it turns out that the inclusion of the diffusion coefficient *D* as one of the features is important. Still, for our synthetic data set with noise, the features in Set A seem to describe the characteristics of the used processes most precisely.

**Table 10.** Performance of the classifiers trained on data with random Gaussian noise. Accuracies (for test data only) are rounded to three decimal digits.


#### *4.4. Empirical Data*

In order to present the methods in a practical context, we are going to apply the classifiers from Sections 4.1 and 4.3 to real G protein data (see Section 3.3). Additionally, to follow the approach from [43], we will consider further classifiers fed with a data set similar to the base one, but with *σ* = 0.38, since this value corresponds to the mean diffusion coefficient of the real data sample (*D* = 0.0715 μm<sup>2</sup> s<sup>−1</sup>). Accuracies of the additional classifiers are shown in Table 11. Interestingly, they are slightly better than the ones for the base set. It seems that the change of the scale parameter positively influenced the ranges of other characteristics, resulting in an increased accuracy (it worked as implicit feature engineering in the absence of data normalisation).

**Table 11.** Performance of the classifiers trained on data with *σ* = 0.38. Accuracies (for test data only) are rounded to three decimal digits.


Before we start to analyse the results for real data, there are several points to consider. First, it should be emphasised once again that the classification of the data collected in experiments cannot be verified. Since the ground truth is missing, we cannot really choose the best among the classifiers. We can only use some additional information about the G proteins in order to indicate whether the classifiers work reasonably or not. Second, real trajectories are often heterogeneous, meaning that a particle may change its type of motion within a single trajectory [69]. Thus, the classifiers fed with homogeneous synthetic data may not be the best choice to work with such data.

In Tables 12–14, we show the results of classification of real data with the base classifiers, the ones with the noise and the ones with *σ* = 0.38, respectively. In all three cases, we considered only the "with *D*" classifiers (for the justification, see Section 4.7). The results obtained with the classifiers trained on different data sets vary slightly, but they agree on a small percentage of superdiffusive trajectories. This is to some extent expected from the biological background: during their movement, the G proteins and G-protein-coupled receptors pair, spending some amount of time immobilised. At the same time, there is no evidence of any other force that could accelerate the movement.


**Table 12.** Classification results for real trajectories. The base data set (*σ* = 1, no noise; see Section 4.1) with the full sets of features (labelled as "with *D*" in the previous sections) was used for training. The numbers may not add up precisely to 100% due to rounding.

**Table 13.** Classification results for real trajectories. The noisy data set (*σ* = 1, see Section 4.3) with the full sets of features (labelled as "with *D*" in the previous sections) was used for training. The numbers may not add up precisely to 100% due to rounding.


On our base data set, the classifiers based on Set A label most of the trajectories of both G proteins and G-protein-coupled receptors as subdiffusion (64–84%, depending on particle type and method). This is somewhat in between the results of classifiers based on Set B and Set C, where the former point to subdiffusion more frequently, while the latter do so only in 52–59% of cases.

Comparing the behaviour of the classifiers based on the different data sets used for training, we can see that the classifiers built on Set C are the most stable in some sense: they yield similar results independently of the training data, pointing to a significant fraction of subdiffusive and freely diffusing trajectories. For the newly proposed set of features, Set A, as well as for Set B, the introduction of noise does not alter the classification significantly, but the decrease of the scale of the trajectories in the data set (setting *σ* = 0.38) leads to the recognition of more trajectories as normal diffusion, similarly to the *p*-variation-based statistical test proposed in [37]. On the other hand, the GB classifier based on Set B and the scaled data set classifies a significant percentage of trajectories as superdiffusive, which is rather unexpected.

**Table 14.** Classification results for real trajectories. The data set with *σ* = 0.38 (no noise) and with the full sets of features was used for training. The numbers may not add up precisely to 100% due to rounding.


For the full picture, in Table 15, we also include the results for the classifiers built with the reduced Set A—that is, without diffusion coefficient *D* ("no *D*"). Following the results for the synthetic trajectories, where on the noisy data set the accuracy for the classifiers based on the reduced set of features is smaller (see Table 10), we acknowledge that the results on that data set can be biased. Indeed, such classifiers claim that most of the trajectories exhibit the normal diffusion, whereas the classifiers built on the base and the scaled data set classify them as subdiffusion.

**Table 15.** Classification results for real trajectories. The classifiers were trained with the reduced Set A (labelled as "no *D*"). The numbers may not add up precisely to 100% due to rounding.


To sum up, all the classifiers identify most trajectories as normal or subdiffusive, but the fractions of both diffusion modes vary between classifiers. The scaling of trajectories in the training data set has introduced significant changes in the results (please compare Tables 12 and 14), thus the properties of particular features should be further examined (for example, their normalisation). Moreover, in [69], the authors showed that the trajectories in the analysed data set change their character during the time evolution. Different features used in the classifiers probably capture slightly different characteristics of the trajectories; thus, the sensitivity of features to the heterogeneity of movement should be verified.

#### *4.5. Influence of MSD Calculation Methods*

Some of the features used in our set, that is, the diffusion coefficient *D*, the anomalous exponent *α* and the mean squared displacement ratio *κ*, are based on the time-averaged MSD. This quantity can be highly biased for large lags, as then only a few displacements are included in the calculation of the mean value. On the other hand, if we choose to fit the diffusion coefficient or the anomalous exponent to only a few data points (to the MSD calculated for a few lags only), the estimation could be biased as well. This is a known problem in the analysis of biological data and has already been discussed in [26,70,71].

We have considered the influence of the number of lags on the accuracy of the classifiers and trained them on the base data set with the values of features calculated using 50% or 10% of available TAMSD length. In Table 16, the comparison of these accuracies on the test set is shown, using all three sets of features. For each set, only the "with *D*" variant has been considered. The better results are obtained with the shorter TAMSD curve, but the differences are only slight. Thus, we have set the 10% as the fixed value for all our considerations.
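The effect of the maximum lag can be explored with a sketch like the one below. It is an illustration under assumptions: the function names are hypothetical, and the anomalous exponent is estimated here by an ordinary least-squares fit of a straight line to the TAMSD curve in log-log coordinates, which is one common choice and not necessarily the estimator used in this work.

```python
import numpy as np

def tamsd(traj, max_lag_fraction=0.1):
    """Time-averaged MSD for lags 1..max_lag, with max_lag a fraction of the length."""
    traj = np.asarray(traj, dtype=float)
    max_lag = max(2, int(max_lag_fraction * len(traj)))  # at least two points for a fit
    lags = np.arange(1, max_lag + 1)
    values = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1)) for lag in lags
    ])
    return lags, values

def fit_power_law(lags, values, dt=1.0):
    """Fit log(MSD) = log(K) + alpha * log(t); return (alpha, K)."""
    alpha, log_k = np.polyfit(np.log(lags * dt), np.log(values), 1)
    return alpha, np.exp(log_k)

# Check on a ballistic path, for which the MSD grows exactly like t^2.
ballistic = np.column_stack([np.arange(100.0), np.zeros(100)])
lags_10, msd_10 = tamsd(ballistic, max_lag_fraction=0.1)   # 10 lags
lags_50, msd_50 = tamsd(ballistic, max_lag_fraction=0.5)   # 50 lags
alpha_10, k_10 = fit_power_law(lags_10, msd_10)
alpha_50, k_50 = fit_power_law(lags_50, msd_50)
```

For a deterministic path both choices of the lag give the same exponent; for random trajectories the estimates differ, because the values at large lags are averages over only a few displacements.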

**Table 16.** Accuracies on test sets for the classifiers built with the features' sets with 10% or 50% of MSD curve length used for calculation of the MSD-based features. All results are rounded to three decimal digits.


#### *4.6. Sensitivity of the Model to Parameter c*

Up to this point, we used a set of synthetic data generated with *c* = 0.1 (see Table 2 for the meaning of *c*). This parameter was used to define ranges, outside of which weak sub- or superdiffusion should be distinguished from the normal one. It is time to analyse the impact of *c* on the prediction performance of our classification models.

In Table 17, the accuracies of the particular classifiers on the test set are presented. The highest value of this metric for *c* = 0.1 could suggest that it is the best choice, but there is another side of the coin: a higher *c* means that more trajectories in the data set were falsely labelled as normal diffusion at the data set simulation stage, despite the fact that they were generated from models with parameters corresponding to anomalous diffusion. In Table 18, the values of precision, recall and F1 are shown for the random forest classifier ("with *D*") trained on each of the analysed sets. Although the precision for normal diffusion grows with the increasing value of *c*, there is a drop in the recall value between *c* = 0.01 and *c* = 0.1. Conversely, for both modes of anomalous diffusion, the precision drops when changing from *c* = 0.01 to *c* = 0.1. It means that we not only make a basic mistake in labelling, falsely labelling some anomalous trajectories as normal ones at the data set generation stage (which is not visible here), but also add some confusion by setting too high a value of the parameter *c*.

The issue is visualised in Figure 3, where the histograms of predicted labels are shown (please mind the logarithmic scale on the *y*-axis). The ranges defined by the parameter *c* are indicated with black dashed lines. All observations between the dashed lines were treated as normal diffusion by the classifiers (this label was assigned at the data set generation stage as the ground truth). Although for *c* = 0.1 and all diffusion models the major part of the trajectories was classified correctly, the distribution of the assigned normal diffusion label is wider than, for example, for *c* = 0.01, especially in the case of fractional Brownian motion. Thus, to diminish the error (understood as an incorrect label in comparison to the real diffusion mode, not the assigned ground truth label), a smaller value of *c* should be taken, for example, the mentioned *c* = 0.01.

**Figure 3.** The histograms of assigned labels for different diffusion models, as predicted for the test sets by classifiers built on data sets with different values of parameter *c* with Set A of features. Please mind the logarithmic scale on the *y*-axis. The dashed lines bound the regions for which normal diffusion was assigned as ground truth regardless of the real character of the trajectories.



#### *4.7. Role of Diffusion Coefficient D*

Finally, we move to the case in which the parameter *σ* varies between trajectories. The data set for the classification was prepared according to Table 2, but each trajectory was characterised by a random *σ* value equal to √(2*D*), where *D* was drawn from the uniform distribution on the interval [1, 9]. The same set of features was used and an additional regularisation was performed in the classifier training procedure.

The accuracy results for such classifiers are shown in Table 19. As one can see, the classifiers are still correct in more than 90% of cases, so we can still consider them useful. Interestingly, the changes in *D* have a bigger influence on the accuracy than adding noise, introduced in Section 4.3. Thus, our classifiers work better in the case of a homogeneous environment with a constant diffusion coefficient. As could be expected, the difference between the classifiers with the diffusion coefficient *D* as a feature and the ones without it is visible, in favour of the full feature set. Thus, there is no reason to consider the reduced set of features in future research.


**Table 18.** Precision, recall and F1 scores for classifiers trained on data with different values of the cutoff *c*. Set A of features was used. All results are rounded to three decimal digits. For each data set, the support of the testing set is 12,000 trajectories per diffusion mode, giving 36,000 in total.

**Table 19.** Performance of the best classifiers trained on the data set with varying diffusion coefficient *D* and Set A of features. Accuracies are rounded to three decimal digits.


In Figure 4, the confusion matrices of the analysed classifiers are shown. There is definitely more confusion between superdiffusion and free diffusion, in both directions, but there is still no misclassification between super- and subdiffusion (which would point to more serious problems with the classification). We think that these results can be improved further by revising the diffusion coefficient estimation method.

#### *4.8. Beyond Multi-Class Classification*

Up to this point, the classifiers were set to output only one among three available classes. However, both RF and GB classifiers are ensemble methods that determine the final output through voting of their base learners (decision trees). That voting can be exploited to provide probabilities of being assigned to each class. Their analysis can help in understanding the classifiers' behaviour and sources of misclassifications.
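The idea of turning votes into class probabilities can be sketched as follows. This is a minimal illustration of hard voting only; the helper `vote_probabilities` and the vote matrix are hypothetical, and actual library implementations may aggregate the base learners differently (for example, by averaging per-tree probabilities or by summing boosted scores).

```python
import numpy as np


def vote_probabilities(votes: np.ndarray, n_classes: int) -> np.ndarray:
    """Turn hard votes of base learners into per-class probabilities.

    votes has shape (n_learners, n_samples) and holds integer class indices.
    """
    n_learners, n_samples = votes.shape
    probs = np.zeros((n_samples, n_classes))
    for k in range(n_classes):
        # Fraction of base learners that voted for class k.
        probs[:, k] = np.sum(votes == k, axis=0) / n_learners
    return probs


# Hypothetical votes of 5 trees for 2 trajectories
# (0 = subdiffusion, 1 = normal diffusion, 2 = superdiffusion).
votes = np.array([[1, 2],
                  [1, 2],
                  [0, 2],
                  [1, 1],
                  [1, 2]])
probs = vote_probabilities(votes, n_classes=3)
predicted = probs.argmax(axis=1)  # final class = most frequent vote
print(probs)
print(predicted)
```

Each row of `probs` is a point in the probability simplex, which is what a ternary plot displays for a three-class problem.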

In Figure 5, ternary plots for both random forest and gradient boosting classifiers based on the full Set A of features are shown. They complement the results shown in Table 4 and Figure 2. As we can see, the majority of the points are concentrated at the edges of the plots, corresponding to a situation with at most two non-vanishing class probabilities for a given trajectory. The points located near the vertices depict the trajectories with one dominant class. There is much less ambiguity in the case of the gradient boosting classifier: the probability of assigning a trajectory to the finally claimed class is much higher and there are almost no trajectories with non-zero probabilities for all classes. This is clearly linked to the construction of both classifiers. In random forest, each base classifier independently returns a predicted class and the final output is the most frequent class returned. Thus, the spread of the predictions can be high. In gradient boosting, the trees are constructed sequentially: each new one is supposed to correct the predictions of the ensemble and its results have a higher weight in the final aggregation. Thus, the final trees have the greatest impact on the outcome and we expect GB to produce output with one dominant probability in most cases.

In Figure 6, predicted class probabilities for sample trajectories are shown, for random forest (left graph) and gradient boosting (right graph). Indeed, the gradient boosting classifier was more decisive, producing more unambiguous results, even if they were incorrect (please see the first trajectory from the top and the second trajectory from the bottom).

**Figure 4.** Normalised confusion matrices for classifiers built on training data with varying *D* and Set A of features. All results are rounded to two decimal digits.

**Figure 5.** Ternary plots of the class probabilities assigned to the testing data by the classifiers trained on the base data set with Set A of features.

Finally, we can verify the distribution of the class probabilities for our experimental data (see Sections 3.3 and 4.4), where the ground truth for the diffusion type is not known. In Figure 7, the corresponding ternary plots for empirical data are presented, for random forest and gradient boosting classifiers (left and right column, respectively) and for both G-protein-coupled receptors and G proteins (top and bottom row, respectively). These graphs clearly show the trajectories for which the classifiers' decisions were the most vague: all points near the center of the triangle correspond to trajectories with significant probabilities of all three diffusion types. Moreover, we can see that in the case of random forest, the trajectories classified as superdiffusion also had a significant probability of being normal diffusion, whereas the gradient boosting classifier returned a high probability of them belonging to superdiffusion.

**Figure 6.** The class probabilities for exemplary trajectories from the testing set, based on the classifiers trained on the base data set and constructed with Set A of features.

In Figure 8, the predicted class probabilities for several interesting trajectories are shown, for both random forest (left graph) and gradient boosting (right graph). Again, the gradient boosting algorithm is more firm, but in cases of misclassification, it also claims the incorrect diffusion type with less doubt. Such an analysis of the classifiers' decisions is a great starting point for further research: the output of classifiers built on different data sets and with different sets of features can be examined in detail to find the exact source of a given prediction. That can also lead us to a reasonable model for anomaly detection in the trajectories.

**Figure 7.** Ternary plots of the class probabilities assigned to empirical data by the classifiers trained on the base data set with Set A of features.

**Figure 8.** The class probabilities for exemplary trajectories from the empirical data set, based on the classifiers trained on the base data set and constructed with Set A of features.

#### **5. Conclusions**

In this paper, we presented a new set of features (referred to as Set A, see Table 1) for two types of machine learning classifiers, random forest and gradient boosting, which gives good results on the synthetic data set, better than the set used previously in [43]. We analysed the performance of our classifiers trained and tested on multiple versions of the synthetic data set, which allowed us to assess their usefulness, flexibility and robustness. Moreover, we compared the proposed set with the ones already used for this problem in [40,41,43]. Our set gives the best results in terms of the most common metrics.

Although the results on the synthetic data set are promising, we acknowledge the challenge with the application of the classifiers to real data. As discussed in [41], the classifiers trained on particular models for given diffusion modes do not generalise well. In Section 4.4, we show that even classifiers with good accuracy return unclear results when used with data of potentially different characteristics. To some extent, this can be improved by including more models in the training data set.

Thus, we would like to underline the importance of feature selection for a given problem: even for the same task (e.g., diffusion mode classification), both the models chosen for the training data generation and the features chosen for their characterisation have a great influence on the performance of classifiers. Moreover, the assumptions made in the construction of the classifiers, such as the hyperparameter values or simply the choice of classifier type, are also highly important.

**Supplementary Materials:** Python codes for every stage of the classification procedure, together with a short documentation, are publicly available at Zenodo (https://doi.org/10.5281/zenodo.4317214).

**Author Contributions:** Conceptualization, H.L.-O. and J.S.; methodology, J.S.; software, H.L.-O.; validation, H.L.-O. and J.S.; investigation, H.L.-O. and J.S.; writing—original draft preparation, H.L.-O.; writing—review and editing, J.S.; supervision, J.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by NCN-DFG Beethoven Grant No. 2016/23/G/ST1/04083.

**Acknowledgments:** Calculations were carried out using resources provided by the Wroclaw Centre for Networking and Supercomputing (http://wcss.pl).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

DBM directed Brownian motion

DL deep learning


#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Fractional Dynamics Identification via Intelligent Unpacking of the Sample Autocovariance Function by Neural Networks**

**Dawid Szarek <sup>1</sup>, Grzegorz Sikora <sup>1</sup>, Michał Balcerek <sup>1</sup>, Ireneusz Jabłoński <sup>2</sup> and Agnieszka Wyłomańska <sup>1,</sup>\***


Received: 31 October 2020; Accepted: 18 November 2020; Published: 20 November 2020

**Abstract:** Many single-particle tracking data related to the motion in crowded environments exhibit anomalous diffusion behavior. This phenomenon can be described by different theoretical models. In this paper, fractional Brownian motion (FBM) was examined as the exemplary Gaussian process with fractional dynamics. The autocovariance function (ACVF) is a function that determines completely the Gaussian process. In the case of experimental data with anomalous dynamics, the main problem is first to recognize the type of anomaly and then to reconstruct properly the physical rules governing such a phenomenon. The challenge is to identify the process from short trajectory inputs. Various approaches to address this problem can be found in the literature, e.g., theoretical properties of the sample ACVF for a given process. This method is effective; however, it does not utilize all of the information contained in the sample ACVF for a given trajectory, i.e., only values of statistics for selected lags are used for identification. An evolution of this approach is proposed in this paper, where the process is determined based on the knowledge extracted from the ACVF. The designed method is intuitive and it uses information directly available in a new fashion. Moreover, the knowledge retrieval from the sample ACVF vector is enhanced with a learning-based scheme operating on the most informative subset of available lags, which is proven to be an effective encoder of the properties inherited in complex data. Finally, the robustness of the proposed algorithm for FBM is demonstrated with the use of Monte Carlo simulations.

**Keywords:** anomalous diffusion; fractional Brownian motion; estimation; autocovariance function; neural network; Monte Carlo simulations

#### **1. Introduction**

Many single-particle tracking data related to the motion in crowded environments exhibit anomalous diffusion behavior [1,2]. This behavior is also visible in various phenomena such as finance [3,4], ecology [5], hydrology [6], and biology [7], as well as meteorology and geophysics [8,9]. Anomalous diffusion behavior is manifested by deviations from the laws of Brownian motion (BM). One of the most common definitions of the anomalous diffusion process is expressed in the nonlinear behavior of its second moment:

$$\mathbb{E}\left[X^2(t)\right] \sim t^\alpha,\tag{1}$$

*Entropy* **2020**, *22*, 1322; doi:10.3390/e22111322 www.mdpi.com/journal/entropy

where *α* is the so-called anomalous diffusion exponent. When *α* = 1, the process is classified as diffusion, while for *α* ≠ 1, we call it anomalous diffusion. More precisely, for *α* < 1, it is called subdiffusion, while for *α* > 1, it is superdiffusion. It should be mentioned that anomalous diffusion can also be related to the non-Gaussian probability density function of the corresponding process; for instance, see [10–13].

The class of anomalous diffusion processes is very rich. The most classical anomalous diffusion models are fractional Brownian motion (FBM) [8,14], Lévy stable motion [15], continuous-time random walk [16,17], and the subordinated processes (also called time-changed processes) [18–25]. We also mention here the processes with time- or position-dependent diffusion coefficients such as scaled Brownian motion [26,27] or heterogeneous diffusion models [28], as well as the superstatistical process [29] or diffusing diffusivity models (also called Brownian yet non-Gaussian diffusion process) [30]. We also refer the readers to the articles [31–34] and the references therein.

In the case of experimental data with anomalous dynamics, the main problem is first to recognize the type of anomaly and then to reconstruct properly the physical rules governing such a phenomenon. The main challenge is to identify the process from short trajectory inputs. Various approaches to address this problem can be found in the literature; for instance, see [35–42]. One of the simplest and most efficient approaches is based on the theoretical properties of the sample autocovariance function (ACVF) [43]. It is known that the ACVF is the characteristic that determines completely the centered Gaussian process. Thus, its sample version is a proper tool for the testing and estimation of the parameters of this process. The approach presented in the literature [43] is effective; however, it does not utilize all of the information contained in the sample ACVF for a given trajectory, i.e., only values of statistics for selected lags are utilized. Therefore, an evolution of this approach is proposed in this paper.

Herein, we compared three approaches that apply the ACVF for estimation of the anomalous diffusion exponent. The first one, the so-called *naive* method, uses the ACVF in one specific lag to estimate the anomalous diffusion exponent of the given process. The second algorithm is based on the ACVF corresponding to the vector of the selected lags. The last technique is based on the sample ACVF information extracted with the most informative subset of available lags. The designed novel method in the third approach is intuitive and it uses information directly available in a new fashion. Information retrieval from the sample ACVF vector is performed here with a learning-based scheme operating on the most informative lags, i.e., a feedforward neural network (FNN) [44] is designed and applied for solving the regression task. This approach has been proven to be an effective encoder of the properties inherited in complex data [45–47]. The goal is to preliminarily assess (using computer simulations) the predictive properties of an FNN for the estimation of the anomalous diffusion exponent based on a short data set. This exercise provides evidence of the performance of the simple version of the neural network (NN) adapted for the defined regimes (anomalous diffusion) and the ACVF vector, which is a projection of a valid complex and real trajectory, e.g., for particle movement in a solution. The reported results can be further enhanced by the exploitation of adaptive mechanisms inserted into recurrent neural networks (RNNs) and/or the detailed and proper inferences of multiscale pattern(s) for deep learning [1–5]. The advantage of the application of an FNN to the task defined above is that a trained neural network model can be a robust and efficient estimator for anomalous diffusion exponents, e.g., complex relations hidden in ACVF data can be extracted within one-step, concluding in a trained feedforward neural network.

The robustness of the introduced algorithm based on the ACVF and NN methods in comparison to the known ACVF-based techniques is demonstrated herein for the exemplary Gaussian process using Monte Carlo simulations. We considered the FBM as an exemplary model with fractional dynamics; that is, the Gaussian process with stationary increments and the so-called self-similar property parametrized by the Hurst exponent *H* = 0.5*α*, where *α* is the anomalous diffusion exponent given in (1).

The main goal of the paper was to prove that the incorporation of intelligent-based algorithms into classical estimation schemes can shed new light on the investigation of the anomalous diffusion phenomenon. Moreover, the classical tools enhanced by artificial intelligence (AI) methods are more effective in comparison to the known statistical algorithms used for anomalous diffusion parametrization.

The rest of the paper is organized as follows: In Section 2, we outline the definition of the fractional Brownian motion, the exemplary Gaussian process with anomalous diffusion behavior. Next, in Section 3, we discuss two of the estimation methods for the Hurst exponent based on the ACVF that are commonly used in various applications. In Section 4, we present a new approach for the estimation of the *H* index, in which we combine the ACVF and NN methods. To demonstrate the effectiveness of the new approach, in Section 5, we present a simulation study where we compare the three considered algorithms for the estimation of the Hurst exponent. The last section concludes the paper and outlines future work.

#### **2. Fractional Brownian Motion**

Fractional Brownian motion (FBM) {*X*<sub>*H*</sub>(*t*), *t* ≥ 0} with the Hurst index *H* ∈ (0, 1) is a continuous and centered Gaussian process defined through the following Langevin equation [14,48–50]:

$$\frac{dX_H(t)}{dt} = D\xi_H(t),\tag{2}$$

where the parameter *D* is the diffusion coefficient. In Equation (2), {*ξ*<sub>*H*</sub>(*t*), *t* ≥ 0} is the fractional Gaussian noise process with the autocorrelation function satisfying the following:

$$\text{Corr}(\xi_H(0), \xi_H(t)) \sim 2H(2H-1)Dt^{2(H-1)}, \quad t \ge 0. \tag{3}$$

The FBM was introduced by Kolmogorov in 1940 (see [8]). FBM is the only Gaussian self-similar process with stationary increments. Because the FBM is a centered Gaussian process, it can also be defined through the ACVF that, in this case, is given by [14]:

$$\mathbb{E}\left[X_H(t)X_H(s)\right] = \frac{1}{2}D\left(t^{2H} + s^{2H} - |t - s|^{2H}\right), \qquad \text{where } t, s \ge 0. \tag{4}$$

Thus, for a given *t* ≥ 0, *X*<sub>*H*</sub>(*t*) ∼ N(0, *Dt*<sup>2*H*</sup>). The FBM has stationary increments. Moreover, if *H* > 1/2, then the increments of the process are positively correlated, while for *H* < 1/2, they are negatively correlated. In addition, for *H* > 1/2, the FBM exhibits the so-called long range dependence, which means that the following property is satisfied:

$$\sum_{n=1}^{\infty} \mathbb{E}\left[X_H(1)(X_H(n+1) - X_H(n))\right] = \infty. \tag{5}$$

As can be seen, for *H* = 1/2, the FBM reduces to the ordinary Brownian motion (BM). It should be mentioned that the FBM is considered one of the classical processes used to describe the anomalous diffusion phenomenon. Indeed, for *H* < 1/2, it exhibits subdiffusion behavior, while for *H* > 1/2, it shows superdiffusion behavior. To see the differences between the behavior of the trajectories corresponding to different anomalous types, in Figure 1, we demonstrate the exemplary trajectories of FBM for *H* = 0.3 (subdiffusion), *H* = 0.5 (diffusion), and *H* = 0.7 (superdiffusion).
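The reduction to ordinary Brownian motion can be checked numerically from Equation (4). The snippet below is a minimal sketch; the function name `fbm_covariance` and the chosen grid of time points are assumptions made for illustration.

```python
import numpy as np


def fbm_covariance(t, s, H, D=1.0):
    """Covariance E[X_H(t) X_H(s)] of FBM as given by Equation (4)."""
    t, s = np.asarray(t, dtype=float), np.asarray(s, dtype=float)
    return 0.5 * D * (t ** (2 * H) + s ** (2 * H) - np.abs(t - s) ** (2 * H))


# Grid of integer time points 1..5 (illustrative choice).
t, s = np.meshgrid(np.arange(1, 6), np.arange(1, 6), indexing="ij")

# For H = 1/2 the covariance reduces to D * min(t, s), i.e., Brownian motion.
cov_bm = fbm_covariance(t, s, H=0.5, D=2.0)
print(np.allclose(cov_bm, 2.0 * np.minimum(t, s)))

# The diagonal gives the variance D * t^(2H) for any H.
var_sub = np.diag(fbm_covariance(t, s, H=0.3))
print(np.allclose(var_sub, np.arange(1, 6) ** 0.6))
```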

**Figure 1.** Exemplary trajectories for *H* ∈ {0.3, 0.5, 0.7}.

#### **3. ACVF-Based Methods for the Estimation of the Hurst Exponent**

In this article, we consider three approaches that utilize the ACVF in the estimation of the Hurst parameter *H*. Here, we depict two approaches known from the literature. The last technique is described in detail in the next section.

The methods presented in this section are based on the sample version of the ACVF. Let us consider the trajectory of FBM, *X*<sub>*H*</sub> = {*X*<sub>*H*</sub>(1), *X*<sub>*H*</sub>(2), …, *X*<sub>*H*</sub>(*n*)}, and the corresponding sample of increments, *ξ*<sub>*H*</sub> = {*ξ*<sub>*H*</sub>(1), *ξ*<sub>*H*</sub>(2), …, *ξ*<sub>*H*</sub>(*n*)}, where *ξ*<sub>*H*</sub>(*t*) = *X*<sub>*H*</sub>(*t*) − *X*<sub>*H*</sub>(*t* − 1) for *t* = 1, 2, …, *n*. The sample ACVF for *ξ*<sub>*H*</sub> is given by [51]:

$$
\hat{\gamma}_{\xi}(\tau) = \frac{1}{n} \sum_{t=1}^{n-\tau} \xi_H(t) \xi_H(t+\tau), \qquad \tau = 0, 1, \ldots, n-1. \tag{6}
$$

The statistic $\hat{\gamma}_{\xi}(\tau)$ is a rescaled estimator of the theoretical autocovariance function $\gamma_{\xi}(\tau)$ corresponding to lag *τ*, where $\gamma_{\xi}(\tau) = \mathbb{E}\left[\xi_H(1)\xi_H(1+\tau)\right]$. One can easily show that the statistic (6) is a biased estimator of $\gamma_{\xi}(\tau)$, namely:

$$\begin{split} \mathbb{E}\left[\hat{\gamma}_{\xi}(\tau)\right] &= \frac{1}{n} \mathbb{E}\left[\sum_{t=1}^{n-\tau} \xi_{H}(t)\xi_{H}(t+\tau)\right] = \frac{n-\tau}{n} \mathbb{E}\left[\xi_{H}(1)\xi_{H}(1+\tau)\right] \\ &= \frac{D(n-\tau)}{2n} \left(|\tau+1|^{2H} + |\tau-1|^{2H} - 2|\tau|^{2H}\right) = \frac{n-\tau}{n}\gamma_{\xi}(\tau). \end{split}$$

However, in our considerations, we used the biased version of the $\gamma_{\xi}(\tau)$ estimator due to its lower variance in comparison to the unbiased one.
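The biased estimator of Equation (6) can be sketched in a few lines of NumPy. The function name `sample_acvf` and the `max_lag` argument are assumptions introduced for this illustration; note that, following Equation (6), no sample mean is subtracted because the process is centered.

```python
import numpy as np


def sample_acvf(xi, max_lag=None):
    """Biased sample ACVF of Equation (6) for lags 0, 1, ..., max_lag."""
    xi = np.asarray(xi, dtype=float)
    n = xi.size
    if max_lag is None:
        max_lag = n - 1
    # Divide by n (not by n - tau): this is the biased, lower-variance version.
    return np.array([np.dot(xi[: n - tau], xi[tau:]) / n
                     for tau in range(max_lag + 1)])


# Small deterministic example, useful for checking the indexing by hand.
print(sample_acvf([1.0, 2.0, 3.0, 4.0]))
```

Multiplying the value at lag *τ* by *n*/(*n* − *τ*) would give the unbiased version mentioned above.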

In the literature, a few possibilities of estimating the Hurst parameter *H* using the ACVF have been presented. In the simplest approach, we consider only the first lag *τ* = 1 and compare the statistic $\hat{\gamma}_{\xi}(1)$ with the theoretical value $\gamma_{\xi}(1) = \frac{D}{2}\left(2^{2H} - 2\right)$. Thus, using the following relation:

$$
\hat{\gamma}_{\xi}(1) = \frac{D}{2} \left( 2^{2H} - 2 \right),
$$

one can obtain the simple estimator of *H*:

$$
\hat{H} = \frac{1}{2} \log_2 \left( \hat{\gamma}_{\xi}(1) \frac{2}{D} + 2 \right). \tag{7}
$$

As such an approach is simple, we refer to it as *naive* (the M1 method) in the following analyses. In this approach, the diffusion coefficient *D* is assumed to be known. However, in real applications, the *D* parameter can also be estimated as the sample variance of the vector *ξ*. As one can see, the ACVF for lag *τ* = 1 includes the necessary information about the FBM process. Unfortunately, if the corresponding measurements are burdened by an additive error (i.e., some measurement noise), *γ*(1) changes accordingly, resulting in the need to consider more advanced techniques for the estimation of the *H* parameter; for instance, see [52,53] and the discussion therein.
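A minimal sketch of the naive estimator (7) is given below. The function name `hurst_naive` is hypothetical, and the clipping step is an added assumption (it is not part of Equation (7)) that keeps the estimate inside (0, 1) when a noisy lag-1 autocovariance falls outside the theoretically admissible range.

```python
import numpy as np


def hurst_naive(gamma1_hat, D=1.0):
    """M1 estimator of Equation (7) from the lag-1 sample autocovariance."""
    arg = 2.0 * gamma1_hat / D + 2.0  # equals 2^(2H) in the noise-free case
    # Assumption: clip so that the estimate stays inside (0, 1) for noisy inputs.
    eps = 1e-6
    arg = np.clip(arg, 2.0 ** (2 * eps), 2.0 ** (2 * (1 - eps)))
    return 0.5 * np.log2(arg)


# Sanity check with the theoretical lag-1 autocovariance for H = 0.7.
H_true, D = 0.7, 1.0
gamma1 = 0.5 * D * (2 ** (2 * H_true) - 2)
print(hurst_naive(gamma1, D))
```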

In an alternative approach for estimating the *H* parameter, one can simultaneously use more lags *τ*. Thus, we can fit the function $\tau \mapsto \frac{D}{2}\left(|\tau+1|^{2H} + |\tau-1|^{2H} - 2|\tau|^{2H}\right)$ to the empirical ACVF $\hat{\gamma}_{\xi}(\tau)$ for the corresponding lags *τ* = 1, 2, …, *τ*<sub>max</sub> in the least squares sense, i.e., the estimator is calculated as follows:

$$\hat{H} = \arg\min_{H \in \left(0, 1\right)} \sum_{\tau=1}^{\tau_{\text{max}}} \left[ \hat{\gamma}_{\xi}(\tau) - \frac{D}{2} \left( |\tau + 1|^{2H} + |\tau - 1|^{2H} - 2|\tau|^{2H} \right) \right]^2,\tag{8}$$

for some maximum lag *τ*<sub>max</sub>. Again, if the diffusion coefficient is unknown, we can estimate it by considering arg min over (*H*, *D*) ∈ (0, 1) × (0, ∞). This approach is unfortunately much more complex, as it requires nonlinear regression methods. In the further analysis, we refer to this approach as the M2 method.
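The least-squares criterion of Equation (8) can be illustrated with a simple sketch for known *D*. Note that it replaces a proper nonlinear regression solver with a plain grid search over *H*, which is a simplification chosen to keep the example dependency-free; the function names `fgn_acvf` and `hurst_least_squares` and the grid resolution are assumptions.

```python
import numpy as np


def fgn_acvf(tau, H, D=1.0):
    """Theoretical ACVF of the FBM increments at integer lags tau."""
    tau = np.asarray(tau, dtype=float)
    return 0.5 * D * (np.abs(tau + 1) ** (2 * H)
                      + np.abs(tau - 1) ** (2 * H)
                      - 2 * np.abs(tau) ** (2 * H))


def hurst_least_squares(acvf_hat, D=1.0, grid_size=999):
    """M2-style estimate: minimise the objective of Equation (8) on a grid of H.

    acvf_hat[k] is the sample ACVF at lag tau = k + 1.
    """
    acvf_hat = np.asarray(acvf_hat, dtype=float)
    lags = np.arange(1, acvf_hat.size + 1)
    grid = np.linspace(0.001, 0.999, grid_size)  # candidate H values in (0, 1)
    errors = [np.sum((acvf_hat - fgn_acvf(lags, H, D)) ** 2) for H in grid]
    return grid[int(np.argmin(errors))]


# Noise-free check: feed the theoretical ACVF for H = 0.3 at lags 1..31.
lags = np.arange(1, 32)
print(hurst_least_squares(fgn_acvf(lags, H=0.3)))
```

When *D* is unknown, the same objective would have to be minimised jointly over (*H*, *D*), for which a dedicated optimiser is more appropriate than a grid.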

#### **4. ACVF and NN-Based Methods for the Estimation of the Hurst Exponent**

As mentioned above, three methods for the estimation of the Hurst exponent *H* are compared in this article. All of the identification algorithms are based on the ACVF. The first (M1) and the second (M2) methods were described in the previous section. In this section, we present the algorithm based on the ACVF and NN methods, denoted as the M3 method in the simulation study.

One might expect that the two methods described above provide the best estimation results when the data follow the "pure" theoretical model, i.e., FBM. In reality, this is rarely the case: often, the observed trajectories are biased by a measurement error and/or various interleaving processes mask the anomalous diffusion component in the acquired signal. A purely statistical approach, such as in the M1 and M2 methods, can bring about limitations in real-world data applications, whereas artificial intelligence has shown its potential to overcome parasitic conditions in data from numerous fields of applications [45,54–56]. This triggers applications of learning-based schemes that enable weakening of the initial assumptions related to data properties and model building, providing good estimators for a wide range of anomalous diffusion regimes (i.e., for a wide range of *H* values). The assumed architecture for the neural network model and the training process are crucial aspects for efficient data exploration in artificial intelligence schemes, with the latter being of particular importance for NN model performance [44]. As a consequence, the properties of the input data used for the NN model training condition the reliability of the observed outputs, i.e., the value of the estimated *H* and its uncertainty in the reported study.

In the previous section, we assumed that the relationship between the data and the estimated parameter is known and given by Equations (7) and (8) for the M1 and M2 methods, respectively. Now, we propose to use a feedforward neural network as the predictor of this theoretical relationship, i.e., $\mathbb{E}\left[H \mid \{\hat{\gamma}_{\xi}(\tau)\}\right]$. To be more precise, the FNN is proposed as the model of the hidden relationships in the experimental data. This means that obtaining a formal expression for the rules governing the phenomena in a real-world system is not the main subject of interest here; instead, these rules, encoded in the FNN model, are used to enhance the reliability of the Hurst exponent estimation from the data that correspond to the FBM, according to our assumption. It is worth noting that the modeling of $\mathbb{E}[H \mid X_H]$ or $\mathbb{E}[H \mid \xi_H]$ is also possible with the NN-based approach; however, more sophisticated NN topologies are required to reconstruct the long dependency valid for the FBM model (for *H* > 0.5). Thus, dealing with long and varying input vector lengths is required to realize this task; models based on RNNs, long short-term memory (LSTM) neural networks [57], or other forms of intelligent recurrence should be used, which implies higher requirements regarding their training [58].

The information about the underlying process is concentrated in the first couple of lags of the sample ACVF. In this paper, 32 lags (including the 0th lag) were used as the input for the Hurst exponent estimation. In the next section, it is shown that this amount was sufficient for our study.

The proposed architecture of the neural network model consists of three hidden layers of consecutive sizes of 64, 64, and 32. The Swish-1 [59] activation function, defined as:

$$\text{Swish-}\beta(x) := x \times \sigma(\beta x) = \frac{x}{1 + e^{-\beta x}}, \tag{9}$$

was used for the neurons in each layer—in many applications, this expression outperforms the other activation functions [59].

In the designed FNN predictor, the size of the first layer was conditioned by the number of lags corresponding to the ACVF used during the experiment (i.e., 32 neurons were inserted into the first layer) and the output layer consisted of one neuron, which produced the *H* estimator as the model response. Since the estimated value of the Hurst exponent was within the range of *H* ∈ (0, 1) and the FNN model designed for its estimation can produce all real numbers as the output, post-processing transformation needed to be applied to the response of the output neuron. Here, the sigmoid function, defined as:

$$
\sigma(x) := \frac{1}{1 + e^{-x}},
\tag{10}
$$

was used. The function given in (10) projects all real numbers to the interval (0, 1).
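Both functions from Equations (9) and (10) are straightforward to write down in NumPy. The snippet is a minimal sketch for illustration; the function names are assumptions and the code is not taken from the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Equation (10): maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def swish(x, beta=1.0):
    """Equation (9): Swish-beta(x) = x * sigmoid(beta * x)."""
    x = np.asarray(x, dtype=float)
    return x * sigmoid(beta * x)


x = np.linspace(-5.0, 5.0, 11)
print(swish(x))    # hidden-layer activation (Swish-1)
print(sigmoid(x))  # output transformation into (0, 1)
```

Applying `sigmoid` to the raw response of the output neuron guarantees that the estimate lies in (0, 1), the admissible range of the Hurst exponent.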

To boost the FNN training process, the Adam optimization algorithm [60] was applied as it quickly converges to a minimum [61–64]. The mean squared error (MSE) was used to quantify the prediction error of the FNN model.

#### **5. Simulation Study**

The efficiency of the three methods (M1, M2, and M3) designed for *H* estimation is demonstrated in this section using computer simulations. Cholesky decomposition [65] was used for the generation of the FBM trajectories, since it allows simulating outputs with extreme *H* parameter values (unlike Davies–Harte [66], which can fail to generate small samples for *H* parameters close to 1). Regarding the practical usefulness of the designed procedures, their efficiency can be expressed in the context of the length of the input trajectory required to estimate *H* with expected reliability. This study was performed for trajectories of various lengths, and the results are reported below.
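The Cholesky approach can be sketched as follows: the covariance matrix of Equation (4) is built on the time grid 1, …, *n*, factorised, and multiplied by a vector of independent standard normal variables. This is a minimal sketch under these assumptions; the function name `simulate_fbm_cholesky`, the seed, and the chosen parameters are illustrative and do not reproduce the authors' code.

```python
import numpy as np


def simulate_fbm_cholesky(n, H, D=1.0, rng=None):
    """Simulate X_H(1), ..., X_H(n) via Cholesky factorisation of the covariance (4)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, n + 1, dtype=float)
    tt, ss = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * D * (tt ** (2 * H) + ss ** (2 * H) - np.abs(tt - ss) ** (2 * H))
    L = np.linalg.cholesky(cov)        # lower-triangular factor, cov = L @ L.T
    return L @ rng.standard_normal(n)  # Gaussian vector with covariance cov


x = simulate_fbm_cholesky(64, H=0.7, rng=np.random.default_rng(0))
xi = np.diff(x, prepend=0.0)           # increments, using X_H(0) = 0
print(x[:5])
```

The factorisation costs on the order of *n*³ operations, which is acceptable for the short trajectories considered here.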

The two statistical methods described in Section 3 were ready to use, whereas the designed feedforward neural network needed to be trained in advance, for which training and validation datasets were prepared.

The training dataset was formed with 1,572,864 FBM trajectories generated during computer simulations. This set consisted of vectors of different lengths (from 32 up to 1024) and referred to different *H* parameter values. For every trajectory length, *N* ∈ {32, 64, 128, 256, 512, 1024}, 262,144 trajectories were generated using computer simulations (262,144 × 6 = 1,572,864), each with the *H* parameter selected randomly from the uniform distribution U(0, 1). Next, for each trajectory *ξ*<sub>*H*</sub>, the ACVF (biased estimator (6), as introduced earlier) was calculated, resulting in a set of ACVFs of lengths *N* (*N* lags, *τ* ∈ {0, 1, …, *N* − 1}) with corresponding *H* parameters. Using the same procedure, the validation and test subsets were generated, each of a size of 196,608 (32,768 × 6 = 196,608).

The length of the ACVF vectors for the NN training was limited to the first 32 lags (namely, *τ* ∈ {0, 1, …, 31}). This selection was preceded by the calculation of the mean absolute error (MAE) of the prediction, as shown in Figure 2: more input lags do not decrease the error, whereas a smaller input size increases the MAE value.


**Figure 2.** The MAE calculated for the M3 method when a different number of lags is used as the feedforward neural network (FNN) input (determining the size of the input layer of the FNN), depending also on the selected quantile; the number of lags (32) selected for use in the paper are marked with a red line.

To train the FNN (the architecture of the FNN as described in Section 4), input data were gathered into batches of 64 (the number of training examples used to calculate weight updates). The total number of NN parameters was 8385, trained for 13 epochs (the number of times that the training algorithm operated on the entire training dataset), for a total of 93 s × 13 epochs ≈ 20 min. The number of epochs was selected dynamically using the early stopping method [67]. Since the model did not improve the prediction error significantly after the third epoch, the training procedure could be stopped then (then, the training time would be 5 min). Calculations were performed on a PC with Intel Core i7 (3.7 GHz, 6 cores, 12 threads) and RAM of 64 GB.
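The reported number of parameters can be verified with a short calculation. The sketch assumes fully connected layers with one bias per neuron, which is consistent with the stated total of 8385; the variable names are illustrative.

```python
layer_sizes = [32, 64, 64, 32, 1]  # input, three hidden layers, output

# Each dense layer has (inputs x outputs) weights plus one bias per output.
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(n_params)

# Number of weight updates per epoch for 1,572,864 samples in batches of 64.
steps_per_epoch = 1_572_864 // 64
print(steps_per_epoch)
```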

The test dataset was used to compare the M1, M2, and M3 methods: for M1, the first lag was used (*τ* = 1); for M2, the set of lags was *τ* ∈ {0, 1, …, *τ*<sub>max</sub>} (if not stated differently, *τ*<sub>max</sub> = 31); M3 always used 32 lags (*τ* ∈ {0, 1, …, 31}). The metrics used were the absolute error and (for the aggregated results) the mean absolute error (MAE).

Figure 3 provides a comparison of the MAEs for the three methods (i.e., M1, M2, and M3) when dealing with trajectories of different lengths and with the diffusion characteristic (*H* parameter), grouped by the true parameter *H* into the bins [0–0.2), [0.2–0.4), [0.4–0.6), [0.6–0.8), and [0.8–1).
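The grouping into half-open bins of the true *H* can be sketched as follows; the function name `mae_per_bin` and the toy inputs are hypothetical and serve only to show the binning logic.

```python
import numpy as np

H_BIN_EDGES = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])


def mae_per_bin(h_true, h_est, edges=H_BIN_EDGES):
    """Mean absolute error grouped by the true H into the bins [edges[i], edges[i+1])."""
    h_true = np.asarray(h_true, dtype=float)
    errors = np.abs(h_true - np.asarray(h_est, dtype=float))
    # The interior edges give bin indices 0..4 with half-open intervals.
    bin_index = np.digitize(h_true, edges[1:-1])
    result = []
    for b in range(len(edges) - 1):
        mask = bin_index == b
        result.append(errors[mask].mean() if mask.any() else np.nan)
    return np.array(result)


# Toy values, for illustration only.
per_bin = mae_per_bin([0.1, 0.3, 0.3, 0.9], [0.2, 0.3, 0.5, 0.8])
```

Empty bins are reported as `nan` rather than zero, so that a missing group is not mistaken for a perfect estimate.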

**Figure 3.** MAE heatmap for the M1, M2, and M3 methods depending on the length of the input trajectory and the value of the Hurst exponent applied during the computer simulations.

The conclusion is that the NN-based approach (M3) is more efficient than the other schemes studied in the paper, as it did not need long trajectories to estimate the Hurst exponent *H* with a small error. For *H* close to 0 or 1, there was a significant difference in the MAEs between the considered methods: the NN approach fed with samples of size 64 achieved a level of error that M1 and M2 struggled to match even for longer inputs, i.e., up to 16-times longer trajectories, as was shown during the computer simulations. It is worth noting that the estimation error for the Hurst exponent in the normal diffusion case (i.e., *H* ≈ 0.5) was similar for all of the considered approaches. In summary, since M1 estimated *H* reliably when using only the first lag, which was also true for the other methods, it was possible to distinguish between normal and anomalous diffusion using the information contained in the first lag or the first few lags.

To further compare these methods, the distribution of the (absolute) prediction error is shown in Figure 4. A logarithmic scale was applied to the y-axis in the boxplots to distinguish the performance of the following algorithms. The figure is divided into six parts, each reporting on the performance of the M1, M2, and M3 methods. In this way, it was possible to analyze how the distributions of the prediction error varied for each of these methods, the different lengths of the input trajectories, and also the different diffusion types (i.e., superdiffusion and subdiffusion); similarly to Figure 3, the input data were grouped evenly into five bins in reference to the true value of the *H* parameter.

Figure 4 depicts the spread of the estimation error and also reports on the number of outliers. The obtained results indicate that the NN-based algorithm estimates the *H* value more reliably across the considered ranges (and thus regimes of anomalous diffusion behavior) than the two classical methods (M1 and M2).

When it comes to the analysis of the multidimensional spread of the observed distributions, M3 performed similarly to the other methods. However, it is worth noting that M1 could not be used to obtain reliable results for some anomalous diffusion regimes, i.e., in several cases in Figure 4, the prediction errors were over 0.5 (which means that M1 could not distinguish between sub- and superdiffusion in these cases). There were also some cases in the presented study when all of the methods working with the smallest considered sample size (32) struggled to distinguish between subdiffusion and superdiffusion. Nonetheless, M3 applied to two-times longer trajectories (i.e., 64 samples inserted as the input) performed reliably and efficiently through the whole range of *H* values.

**Figure 4.** Absolute error calculated for the M1, M2, and M3 methods when fed with simulated input trajectories of different lengths and representing various modes of anomalous diffusion regimes (encoded with the *H* value).

It would be advantageous here to understand what makes M3 outperform the M2 algorithm. The most straightforward explanation relies on the fact that the input information contributes to outputs in various ways in M2 and M3. Namely, although M2 and M3 use the same input information, M2 explores each lag with the same importance, which is not the case for the FNN. The neural network-based method can learn inhomogeneous relationships contained in historical data, focusing on a specific subset of lags and leaving others with less influence on the observed output. This is exactly how the intelligent unpacking mechanism works in the M3 method.

This phenomenon can be clearly observed in Figure 5, where the MAE is compared for the methods working with trajectories of different lengths, for different *H* parameter values, and various numbers of lags used during the calculations. In the case of the M1 method, the results are presented only for *τ* = 1, as there was no possibility to expand or shrink the set of lags here. M2 used lags up to *τmax* for the Hurst exponent estimation, i.e., (*τ* ∈ {0, 1, ... , *τmax*}) with *τmax* ∈ {2, 3, ... , 31}. The number of used lags could not be reduced for the M3 method since it required exactly 32 lags. To overcome this problem, a selected number of the first values of the calculated lags for the ACVF (this number is later referred to as non-zero lags) were extracted. The remaining lags were reset to zero during the calculations. This means that the relationships valid for the raw data were cut at the level of some lags, resembling the diffusion case *H* = 0.5 (i.e., there is no inter-dependency and the ACVF is equal to 0). All of these contributed to weakening the anomalous behavior (filling further lags with zero diminished/removed the long-range dependencies); thus, the reliability of the used methods for prolonged input data sequences could be improved, especially in the case of the M3 algorithm (see Figures 3 and 4).
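The zero-filling of the higher lags described above can be sketched in a few lines. The function name `keep_first_lags` is hypothetical; the input size of 32 follows the text.

```python
import numpy as np


def keep_first_lags(acvf, n_nonzero, n_inputs=32):
    """Keep the first n_nonzero ACVF lags and set the remaining inputs to zero."""
    acvf = np.asarray(acvf, dtype=float)
    if not 0 <= n_nonzero <= min(n_inputs, len(acvf)):
        raise ValueError("n_nonzero must not exceed the input size or the ACVF length")
    padded = np.zeros(n_inputs)
    padded[:n_nonzero] = acvf[:n_nonzero]
    return padded


# Toy ACVF vector, for illustration only.
masked = keep_first_lags(np.arange(1, 33), n_nonzero=5)
```

The network input keeps its fixed size of 32, while only the first `n_nonzero` entries carry information from the data.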

**Figure 5.** MAE heatmap calculated for the M1, M2, and M3 methods depending on the length of the input trajectory, *τmax*, and for the following ranges of Hurst exponent values.

Figure 5 shows that using at least 20 lags was sufficient to minimize the error of *H* estimation in M3. However, the prediction error decreased only imperceptibly for M2 with the addition of consecutive lags to the input. One of the explanations for this observation might be that the magnitude of the contribution to the observed outputs (here, the value of the Hurst exponent) was smaller for the higher-order lags than for the first few lags. This means that the feedforward neural network more efficiently unpacked the information about the anomalous diffusion process than the M1 and M2 statistical schemes.

#### **6. Summary and Conclusions**

Anomalous diffusion is a complex phenomenon observed in physical systems. This complexity is inherited in recorded data, which, in practice, can be additionally corrupted by measurement noise. Moreover, anomalous diffusion components can emerge from a bunch of other regimes manifested in the observed system (and thus, in recorded data). Unpacking and disentangling the information contained in such data is a challenge, especially because complex interrelations are typically encoded in a small number of data samples. The statistical modeling of anomalous dynamics is quite well established and is concerned with the approximation of nontrivial patterns by the anomalous diffusion exponent. The problem is that for complex systems and the limited a priori knowledge available, the statistical inference can bring about non-robust estimation, especially when noisy components occur in data.

In recent years, there has been increasing interest in the use of AI methods of data exploration in various areas of science and engineering, especially when dealing with complex systems and a limited amount of preliminary information. In this paper, we proposed the application of a simple feedforward neural network to the quantification of anomalous dynamics. Namely, we compared this AI-based approach with statistical modeling, which allowed us to conclude that a combination of these two data exploration methods can decrease the estimation error of the anomalous diffusion exponent.

Herein, we considered the FBM, as it is one of the classical Gaussian processes that can be used for the description of anomalous diffusion behavior. Moreover, the FBM for special cases reduces to the classical BM, and thus is also useful for the analysis of diffusion processes.

Our approach is based on the ACVF, which completely describes the zero-mean Gaussian process. Thus, we selected the sample version of this statistic as a base for the estimation methodology. Via a simulation study, we proved that the classical approaches that utilize the sample ACVF are effective; however, when the process under consideration becomes anomalous diffusion, then their efficiency decreases. Thus, there is a need to incorporate more advanced techniques. In this article, we addressed such problems and proposed a simple modification of the classical ACVF-based approaches through the inclusion of NN-based methodology. Our simulations showed that the introduced estimation method outperformed the other considered approaches, especially for short trajectory lengths. This message is crucial for practical applications, where the real trajectories may be relatively short. In this case, the method based on a combination of ACVF and NN techniques seems to be more effective in contrast to the classical algorithms. It should be highlighted that although the presented approach in this paper is based on the ACVF, this methodology can be applied to any other statistic that is crucial for the estimation of anomalous diffusion processes, e.g., mean squared displacement, p-variation, and ergodicity breaking parameter. Moreover, the potential applicability of the described approach is much wider than the anomalous diffusion field. It could bring new insight into studies of physical systems with various properties reflected and decoded in the corresponding statistical quantities, i.e., long-range dependence, ergodicity breaking, or self-similarity. Additionally, the FBM was considered here only as an exemplary anomalous diffusion process, and the presented approach can be used for any other models with anomalous diffusion behavior. 
The designed method can be used not only for estimation purposes but also for the classification of the anomalous diffusion model governing physical phenomena. It could be directly achieved by the NN learning various ACVF formulas that uniquely define different Gaussian processes. Finally, outside of the Gaussian world, other appropriate statistics that evidently and precisely detect proper non-Gaussian models can be effectively brought into the identification process. Thus, the introduced approach is universal in many areas and can be extended in various directions.

The combination of known statistical algorithms with the deep learning methodology is not a new idea in the area of anomalous diffusion phenomena analysis. Many authors have recognized that simple statistical methods in some cases seem to be inefficient for the proper identification or parametrization of the anomalous diffusion model, especially for recordings with short-length trajectories. In recent years, in the literature regarding anomalous diffusion processes, one can find the application of intelligent methods that enhance the classical approaches. In the physical sciences, deep learning methods have found very interesting applications. We mention here the new approaches based on the artificial NN algorithms (e.g., [68–73]) or the general machine learning methods applied to fractional dynamics analysis (e.g., [74–77]). However, to the best of our knowledge, a combination of ACVF- and NN-based methods has not been presented in the context of anomalous diffusion analysis.

In a future study, we plan to analyze the influence of additive noise on the estimation results for a model disturbed by external force. The same problem has been considered previously (e.g., in [53]), where the effectiveness of the time-averaged mean square displacement-based approach for the *H* parameter estimation was analyzed for the FBM with additive noise. Another future study will be related to the combination of the other time-averaged statistics [78] and deep learning methodology for the estimation problem in the anomalous diffusion regime.

**Author Contributions:** Conceptualization, G.S., I.J., A.W.; methodology, D.S., M.B., I.J., A.W.; software, D.S., G.S., M.B.; writing-original draft preparation, D.S., G.S., M.B., I.J., A.W.; writing-review and editing, D.S., G.S., M.B., I.J., A.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**





**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Extracting Work Optimally with Imprecise Measurements**

**Luis Dinis 1,2,\* and Juan Manuel Rodríguez Parrondo 1,2**


**Abstract:** Measurement and feedback allow an external agent to extract work from a system in contact with a single thermal bath. The maximum amount of work that can be extracted in a single measurement and the corresponding feedback loop is given by the information that is acquired via the measurement, a result that manifests the close relation between information theory and stochastic thermodynamics. In this paper, we show how to reversibly confine a Brownian particle in an optical tweezer potential and then extract the corresponding increase of the free energy as work. By repeatedly tracking the position of the particle and modifying the potential accordingly, we can extract work optimally, even with a high degree of inaccuracy in the measurements.

**Keywords:** confinement; information theory; Brownian particle; stochastic thermodynamics

#### **1. Introduction**

Modern techniques have allowed for the manipulation of objects at the microscale. A paradigmatic example is that of colloidal particles trapped by optical tweezers. At this scale, the scale of Brownian motion, not only the motion of particles but also the energy fluxes, work, or heat, become stochastic. Nevertheless, the combination of manipulation and imaging or other detection techniques allows for some degree of control [1]. For instance, in driven systems, the external driving may be modified based on outcomes of measurements, as in feedback control, leading, for example, to (efficient) confinement in a small region of space [2] or to the reduction of thermal fluctuations, i.e., cooling, a technique that is implemented in both classical and quantum systems [3,4]. Another application of feedback is an increase of the performance of certain motors operating at the microscale, such as Brownian ratchets or micro-motors [5–9].

Feedback exploits the information that is acquired through measurement as a thermodynamic resource. It is now known that the work needed to perform an isothermal feedback process, for a system in contact with an environment at constant temperature *T*, is bounded by the following extension of the second law of thermodynamics [9,10]:

$$W \ge \Delta F - kTI, \tag{1}$$

where Δ*F* is the free energy difference between the final and initial states of the process, *k* is Boltzmann's constant, and *I* is the amount of information that is gained in the measurement, quantified by the mutual information from information theory. Information is always positive (or zero) and, thus, in a cycle (Δ*F* = 0) it is possible to extract work (*W* < 0) from a single thermal bath with measurement and feedback.

Equation (1) also shows that a given level of accuracy in the measurement, quantified by the mutual information, limits the amount of work that can be extracted in one feedback operation. Some especially tailored protocols saturate the bound (1) and may be used to convert all of the information acquired into useful work. These are processes that are reversible under feedback [11–13]. In this article, we first review these protocols and show how to use their special properties in order to extract energy with the same efficiency and even power when operating with higher measurement errors. In order to fix ideas, we use a well-known model that we proceed to describe in the following section.

**Citation:** Dinis, L.; Parrondo, J.M.R. Extracting Work Optimally with Imprecise Measurements. *Entropy* **2021**, *23*, 8. https://dx.doi.org/10.3390/e23010008

Received: 30 October 2020 Accepted: 19 December 2020 Published: 23 December 2020

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Model Description and Cycle Operation**

As our system, we consider an overdamped Brownian particle that is in contact with a thermal bath, which acts as its environment. The particle feels a harmonic potential. This is a well-proven theoretical model for an experimental system formed by a colloidal particle in water at constant temperature and trapped by optical tweezers. The potential *V*<sub>*κ*,*x*<sub>0</sub></sub>(*x*) = ½*κ*(*x* − *x*<sub>0</sub>)<sup>2</sup> has tunable parameters *x*<sub>0</sub>, the position of the center of the trap, and *κ*, the stiffness. As the Brownian particle position fluctuates, the energy transfers and thermodynamic potentials also fluctuate; in fact, they are stochastic variables. The framework to analyze energetics for these fluctuating systems in the mesoscale is stochastic thermodynamics [14–16]. We review the main concepts in the following. The Brownian particle may, due to a collision with the solvent, absorb some energy and climb the potential well. Or it may transfer energy back to the thermal bath via viscosity and go down in the potential. These energy transfers with the thermal bath constitute heat *Q̂*, and since this energy can be stored as potential energy, this is the internal energy *Ê* of the particle. In our system, then, the internal energy is *Ê* = *V*<sub>*κ*,*x*<sub>0</sub></sub>(*x*) = ½*κ*(*x* − *x*<sub>0</sub>)<sup>2</sup> [17–20]. We will use *â* to denote a stochastic variable and the regular letter *a* for the average over realizations, i.e., *E* = ⟨*Ê*⟩. Another form of energy transfer is work *Ŵ*: an external agent may modify the harmonic potential (changing the parameters) and increase or decrease the potential energy of the particle. If the internal energy depends on a parameter *λ* that is modified from *λ*<sub>0</sub> to *λ<sub>f</sub>* then, formally, the definition of work is:

$$
\mathcal{W} = \int\_{\lambda\_0}^{\lambda\_f} \frac{\partial E}{\partial \lambda} d\lambda \tag{2}
$$

This is best seen with an example. For instance, consider a fast increase of the stiffness of the potential from *κ<sub>i</sub>* to *κ<sub>f</sub>*. If the increase is very fast, so that the particle does not modify its position *x* during the time the stiffness is changing, the energy of the particle increases by an amount Δ*V* = ½(*x* − *x*<sub>0</sub>)<sup>2</sup>(*κ<sub>f</sub>* − *κ<sub>i</sub>*). This energy is supplied by the agent controlling the potential, who has then performed a work *Ŵ* = Δ*V* > 0. Consequently, the particle is in a tighter parabola and the equilibrium dispersion of the position of the particle decreases, so that this is commonly referred to as a compression. If the stiffness is decreased, work is exerted on the agent by the system and, since the distribution of particle positions will eventually widen, this corresponds to an expansion.
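As a numerical illustration of this example, the sketch below evaluates the work of a sudden stiffness change for positions drawn from the initial equilibrium (Gibbs) distribution, whose variance is *kT*/*κ* for the initial stiffness, and compares the sample mean with the resulting average, (*kT*/2)(*κ<sub>f</sub>*/*κ<sub>i</sub>* − 1). The parameter values and function names are hypothetical choices made for illustration.

```python
import numpy as np


def sudden_stiffness_work(x, x0, kappa_i, kappa_f):
    """Work of an instantaneous stiffness change at fixed particle position x."""
    return 0.5 * (x - x0) ** 2 * (kappa_f - kappa_i)


def mean_work_from_equilibrium(kT, kappa_i, kappa_f):
    """Average of the work above over the Gibbs state of the initial trap."""
    return 0.5 * kT * (kappa_f / kappa_i - 1.0)


rng = np.random.default_rng(1)
kT, kappa_i, kappa_f, x0 = 1.0, 0.1, 0.2, 0.0  # illustrative values only
x = rng.normal(x0, np.sqrt(kT / kappa_i), size=200_000)
sampled_mean = sudden_stiffness_work(x, x0, kappa_i, kappa_f).mean()
expected_mean = mean_work_from_equilibrium(kT, kappa_i, kappa_f)
```

For a compression the average work is positive, in line with the sign convention of the text, and it changes sign for an expansion.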

With these definitions, energy is conserved and the first law is fulfilled either at the level of trajectories, Δ*Ê* = *Q̂* + *Ŵ*, or as averages, Δ*E* = *Q* + *W* [14–20].

In order to extract energy from the thermal bath, we propose the following cyclic operation in two stages:

1. Confinement of the Brownian particle by (repeated) measuring and feedback
2. Isothermal expansion

The system works as a motor if the work obtained in the isothermal expansion exceeds the work that is needed for confinement. During a compression, the free energy of the system increases (due to the entropy decrease). Using reversible feedback confinement [2], we can minimize the work that is needed for stage 1, which turns out to vanish, and extract all of the free energy increase of stage 1 as work during stage 2. Let us analyze each stage in more detail.

#### *2.1. Optimal Confinement*

The confinement of a system to a small region of the phase space (at constant temperature) implies a decrease of entropy of the system. For the entropy of a Brownian particle, we use the standard choice of Shannon's entropy, *S* = −*k* ∫ *ρ*(*x*) log *ρ*(*x*) d*x*, where *ρ*(*x*) is the probability distribution of the particle position. With this choice, the second law of thermodynamics is fulfilled on average and the thermodynamic relation *F* = *E* − *TS* is recovered for a system in contact with a thermal bath. Although, strictly speaking, this is a generalization of the free energy to non-equilibrium systems, in systems that are in contact with a thermal bath it plays a role similar to that of the standard thermodynamic free energy, and stochastic thermodynamics for our system closely resembles macroscopic thermodynamics [9].

Let us consider, for simplicity, that the internal energy change between the initial and final states of the confinement process vanishes (we will see later that this is the case in our particular system). A reduction of entropy then corresponds to an increase of free energy Δ*F* = Δ*E* − *T*Δ*S*. This increase in free energy could then be extracted as work in an isothermal expansion. However, the whole process cannot operate as a motor, as this would violate the second law (extracting work from a single thermal bath). Indeed, the second law states for the confinement

$$W\_1 \ge \Delta F\_1 \text{ (w/o. feedback)}\tag{3}$$

and then for the isothermal expansion back to the initial state (Δ*F*<sub>2</sub> = −Δ*F*<sub>1</sub>)

$$
W\_2 \ge \Delta F\_2 = -\Delta F\_1 \text{ (w/o. feedback)}\tag{4}
$$

so that *W*<sub>total</sub> = *W*<sub>1</sub> + *W*<sub>2</sub> ≥ 0 and the system dissipates energy into the thermal bath.

However, as explained above, when measuring and feeding back to the system, *W* is bounded by (1) instead. Thus, the work that is needed for the confinement may be reduced and the work output of the cyclic process (Δ*F*<sub>cycle</sub> = 0) may be negative:

$$W\_{\text{total}} \ge -kTI \tag{5}$$

Notice that mutual information is always a non-negative quantity.

Following [2], we propose a reversible feedback confinement that can confine the particle with *W*<sub>1</sub> = 0 and, as will be shown later (see Equation (13)), without dissipating heat to the thermal bath, so that the increase in free energy that is produced by the confinement can later be completely recovered as work during a quasistatic expansion in stage 2.

For a system that is in contact with a thermal bath, a feedback process is reversible if the Hamiltonian is modified after the measurement, so that the probability of the state of the system conditioned on the measurement outcome is the Gibbsian state of the new Hamiltonian. After a measurement, the probability to find a given state changes instantaneously: the new probability distribution takes into account the information obtained, and it must be updated according to Bayesian inference. If the Hamiltonian also changes rapidly and the Gibbs state of the new Hamiltonian matches the posterior probability distribution, the system remains at equilibrium and no further evolution of the probability distribution ensues until a new measurement is taken.

In our model, we take the common assumption of Gaussian measurement errors. If the particle is located at a position *x*, then the measurement outcome *m* is Gaussian distributed around *x* and the dispersion *σ<sub>m</sub>* quantifies the measurement error:

$$q(m|\mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma\_m^2}} e^{-(m-\mathbf{x})^2/2\sigma\_m^2} \tag{6}$$

After a measurement, the probability distribution of the position of the particle updates according to Bayes' theorem from the initial distribution *ρ*:

$$\rho'(\mathbf{x}|m) = \frac{\rho(\mathbf{x})q(m|\mathbf{x})}{\pi(m)}\tag{7}$$

where *π*(*m*) = ∫ d*x* *q*(*m*|*x*) *ρ*(*x*) is the marginal distribution of the measurement outcome.

For a Brownian particle in a time-independent potential, the equilibrium distribution is its corresponding Gibbs distribution:

$$
\rho(\mathbf{x}) \propto e^{-V(\mathbf{x})/kT} \tag{8}
$$

In a harmonic potential, it is a Gaussian, centered in the trap position *x*<sub>0</sub> and with variance given by *σ*<sup>2</sup> = *kT*/*κ*. It can be shown [7] that, after a measurement, the new distribution that is computed according to (7) remains Gaussian. If the initial distribution has mean *x̄* and standard deviation *σ*, then, after a measurement, the distribution updates to a Gaussian with the mean and deviation given by [2]:

$$\overline{x}'(m) = \frac{\sigma\_m^2}{\sigma^2 + \sigma\_m^2} \overline{x} + \frac{\sigma^2}{\sigma^2 + \sigma\_m^2} m \tag{9}$$

$$\frac{1}{\sigma'^2} = \frac{1}{\sigma^2} + \frac{1}{\sigma\_{\rm m}^2} \tag{10}$$

We can make the post-measurement distribution an equilibrium distribution by setting a new trap center position *x*′<sub>0</sub> and stiffness *κ*′ as

$$\kappa'=kT/\sigma'^2\tag{11}$$

$$\mathbf{x}\_0'(m) = \overline{\mathbf{x}}'(m) \tag{12}$$

Notice that *κ*′ > *κ*; hence, the particle is more tightly bound or confined after this change. Additionally, Equation (10) implies *σ*′ < *σ*, so that every measurement and feedback step further reduces the variance of the particle distribution.
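One measurement and feedback step, Equations (9) to (12), can be written as a short function. This is a minimal sketch: the function name `feedback_update` and the default *kT* = 1 are assumptions made for illustration.

```python
import math


def feedback_update(x_mean, sigma, sigma_m, m, kT=1.0):
    """Return the new trap center, stiffness, and posterior dispersion after measuring m."""
    var, var_m = sigma ** 2, sigma_m ** 2
    new_mean = (var_m * x_mean + var * m) / (var + var_m)  # Eq. (9)
    new_var = 1.0 / (1.0 / var + 1.0 / var_m)              # Eq. (10)
    new_kappa = kT / new_var                               # Eq. (11)
    new_x0 = new_mean                                      # Eq. (12)
    return new_x0, new_kappa, math.sqrt(new_var)


# Values sigma = sigma_m = 3.0 as in Figure 1; the outcome m = 0.9 is an arbitrary example.
new_x0, new_kappa, new_sigma = feedback_update(0.0, 3.0, 3.0, m=0.9)
```

With equal prior and measurement dispersions, the trap center moves halfway towards the measurement outcome and the variance is halved.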

**Figure 1.** Reduction of variance after measuring. **Left**: Initial distribution. Histogram of 40,000 random Gaussian numbers centered in 0 with standard deviation *σ* = 3.0 (blue bars) and a theoretical Gaussian distribution with the same parameters (red continuous line). **Right**: Posterior distribution. Histogram of particle positions with measurement outcomes in a given interval (0.89, 0.98) (blue bars) and prediction according to Bayes' theorem (7) (red). Measurement outcomes were generated with measurement error *σ<sub>m</sub>* = 3.0. Using *σ<sub>m</sub>* = 3.0 and *σ* = 3.0 in (10) gives *σ*′ = 2.12, which matches the sample standard deviation of 2.10.

In order to check this reduction of variance in simulation, we have computed the particle distribution after a measurement. For this, we first generate a large number of trajectories, starting from an initial equilibrium distribution for a harmonic potential centered in position *x*<sub>0</sub> = 0 and corresponding dispersion *σ* = 3.0, as depicted in Figure 1 (left). After some time interval, for each trajectory, we measure its position by generating a (Gaussian) random measurement outcome *m* around the actual position *x* with dispersion *σ<sub>m</sub>* = 3.0 (see the details in Section 5). We can then fix a small interval around a given measurement of the position (*m*, *m* + Δ*m*) of our choice, for instance (0.89, 0.98), and only check the realizations that gave a measurement in that interval. The actual positions *x* of these particular realizations are distributed as in (7), which in our case is a Gaussian with the new, reduced standard deviation *σ*′ = 2.12 given by (10). This can be seen in Figure 1 (right).
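The conditioning procedure just described can be reproduced with a few lines of NumPy. This is a sketch under simplifying assumptions: positions are sampled directly from the equilibrium Gaussian instead of being taken from simulated trajectories, and a larger sample than in Figure 1 is used so that the selected subset is well populated.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, sigma_m = 3.0, 3.0
n_samples = 2_000_000  # hypothetical sample size, larger than in Figure 1

x = rng.normal(0.0, sigma, size=n_samples)           # actual positions
m = x + rng.normal(0.0, sigma_m, size=n_samples)     # noisy measurement outcomes

# Keep only realizations whose measurement fell in the chosen interval.
selected = x[(m > 0.89) & (m < 0.98)]

sample_std = selected.std()
predicted_std = (1.0 / sigma ** 2 + 1.0 / sigma_m ** 2) ** -0.5      # Eq. (10)
predicted_mean = sigma ** 2 / (sigma ** 2 + sigma_m ** 2) * 0.935     # Eq. (9) at the interval midpoint
```

The sample standard deviation and mean of `selected` should lie close to `predicted_std` (about 2.12) and `predicted_mean`, up to statistical fluctuations.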

This process of measurement and feedback can be repeated and a new, more confined state could be achieved. Figure 2 (top) shows the confining effect of repeating this procedure.

In every measurement and feedback step, the trapped Brownian particle stays in equilibrium with the thermal bath at temperature *T*. Consequently, the average energy is not modified by the feedback process. The average internal energy *E* of a trapped particle in one dimension is given by the equipartition theorem, as *E* = *kT*/2. Because the process is isothermal, Δ*E* = 0. On the other hand, always being in equilibrium, there is no relaxation of the particle distribution and the heat that is transferred from the heat bath vanishes on average *Q* = 0. Therefore, according to the first law, the average work done on the system also vanishes:

$$
\Delta E\_1 = Q\_1 + W\_1 = 0 \Rightarrow W\_1 = -Q\_1 = 0 \tag{13}
$$

This has been checked in simulations, as shown in Figure 2 (bottom). Details about work computation during measurement and feedback can be found in Section 5.
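A quick Monte Carlo check of Equation (13) for a single feedback step is sketched below. It assumes that the work of the sudden change is the jump in potential energy at fixed particle position, in line with Equation (2); the actual work computation of the paper is described in Section 5 and may differ in detail. The measurement error `sigma_m` is a hypothetical value.

```python
import numpy as np

rng = np.random.default_rng(7)
kT, kappa, x0 = 1.0, 0.1, 0.0
sigma = np.sqrt(kT / kappa)
sigma_m = 2.0  # hypothetical measurement error
n = 200_000

x = rng.normal(x0, sigma, size=n)            # equilibrium positions in the initial trap
m = x + rng.normal(0.0, sigma_m, size=n)     # measurement outcomes

new_var = 1.0 / (1.0 / sigma ** 2 + 1.0 / sigma_m ** 2)                        # Eq. (10)
new_kappa = kT / new_var                                                       # Eq. (11)
new_x0 = (sigma_m ** 2 * x0 + sigma ** 2 * m) / (sigma ** 2 + sigma_m ** 2)    # Eqs. (9), (12)

# Work of the sudden change: new potential minus old potential at the same position.
work = 0.5 * new_kappa * (x - new_x0) ** 2 - 0.5 * kappa * (x - x0) ** 2
mean_work = work.mean()
```

Individual realizations give positive or negative work, while the average is compatible with zero.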

**Figure 2.** Confinement. **Top**: particle trajectory (gray line), measurement outcome (red dots) and trap center position (blue line). **Bottom**: Cumulative work for different realizations (color lines) and its average over 200 realizations (thick black line) for confinement in 10 measurement steps. See the simulation details in Section 5. Initial trap stiffness *κ* = 0.1 and position *x*<sub>0</sub> = 0. Initial condition is equilibrium with trap potential to avoid transient due to equilibration. Particle diffusion coefficient *D* = 1 and friction *γ* = 1.

In general, for other feedback protocols where the stiffness of the trap is suddenly changed, work is performed, on average [21], as in the simple example described after Equation (2). The feedback process used here is different (in addition to a sudden increase of stiffness, trap position is also modified in a precisely combined manner) and it is special in the sense that average work vanishes. As encoded in Equation (1), this can solely be achieved by using information regarding the position through measurement in the feedback (see Equation (9) for the new trap position). To see why this matters, consider our Brownian particle in a harmonic potential, where the observer happens to know that the particle is exactly at the bottom of the well. This would allow for this external agent to increase the stiffness of the potential well with an abrupt change, without performing work, since the energy of the particle is always zero at the bottom of the well, for any stiffness. The confining protocol is a refinement of this idea that works for any position of the particle, by displacing the bottom of the potential well towards the measured particle position and changing the stiffness in a suitable manner.

Furthermore, one can compute the mutual information that is obtained in the process of measurement and evaluate the increase in free energy Δ*F*<sub>1</sub> for the confinement stage. From the definition of mutual information:

$$I(m, x) = \int\!\!\int dx\, dm\, \rho(x)\, q(m|x) \log \frac{q(m|x)}{\pi(m)} \tag{14}$$

When considering that the measurement outcome distribution *q*(*m*|*x*) and the marginal distribution *π*(*m*) are Gaussian with variances *σ<sub>m</sub>*<sup>2</sup> and *σ*<sup>2</sup> + *σ<sub>m</sub>*<sup>2</sup> = *σ<sub>m</sub>*<sup>2</sup>*σ*<sup>2</sup>/*σ*′<sup>2</sup>, respectively, the information that is acquired in a measurement is

$$I(m, x) = -\frac{1}{2} \log \frac{\sigma'^2}{\sigma^2} \ge 0 \tag{15}$$

Mutual information intuitively measures the decrease in uncertainty of variable *x* if we know the value of *m*, or vice versa [22]. In our case, from (10), if the measurement error *σ<sub>m</sub>* is very large, then *σ*′ ≈ *σ* and we extract almost no information from measuring (*I* ≈ 0). Conversely, for an infinitely precise measurement, *σ<sub>m</sub>* → 0, then *σ*′ → 0, and we obtain infinite information from a measurement, as an infinitely precise description of a position would require an infinite number of bits to store it.
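Combining Equations (10) and (15) gives *I* = ½ log(1 + *σ*<sup>2</sup>/*σ<sub>m</sub>*<sup>2</sup>), which makes both limits explicit. The sketch below evaluates this expression in nats (natural logarithm, consistent with the entropy used in the text); the function name is hypothetical.

```python
import math


def information_per_measurement(sigma, sigma_m):
    """Mutual information (in nats) gained in one Gaussian measurement, Eqs. (10) and (15)."""
    return 0.5 * math.log1p(sigma ** 2 / sigma_m ** 2)


info_equal = information_per_measurement(3.0, 3.0)     # equals 0.5 * log(2)
info_coarse = information_per_measurement(3.0, 1e6)    # very imprecise measurement
info_fine = information_per_measurement(3.0, 1e-3)     # very precise measurement
```

The information tends to zero for very large measurement errors and grows without bound, although only logarithmically, as the error goes to zero.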

The entropy of a Gaussian of variance $\sigma^2$ is $S(\rho) = k \log \left(\sigma \sqrt{2\pi e}\right)$. In the measurement process, the distribution changes from a Gaussian of variance $\sigma^2$ to a Gaussian of variance $\sigma'^2$, and we have

$$
\Delta S\_{1\text{step}} = k \log \sigma' \sqrt{2\pi e} - k \log \sigma \sqrt{2\pi e} = k \frac{1}{2} \log \frac{\sigma'^2}{\sigma^2} = -kI(m, \mathbf{x}) \tag{16}
$$

Because Δ*E* = 0, we finally obtain

$$
\Delta F\_{\text{1step}} = \Delta E - T\Delta S\_{\text{1step}} = kT I(m, \mathbf{x}).\tag{17}
$$

This is valid for every measurement and feedback step while using the reversible feedback protocol. In a sequence of confinement steps with successive variances $\sigma\_0^2, \sigma\_1^2, \dots, \sigma\_n^2$, the total information is

$$I\_{\text{total}} = -\frac{1}{2} \sum\_{i=1}^{n} \log \frac{\sigma\_i^2}{\sigma\_{i-1}^2} = -\frac{1}{2} \log \frac{\sigma\_n^2}{\sigma\_0^2}. \tag{18}$$

$\sigma\_n^2$ can be obtained from (10) by recursion, giving:

$$\frac{1}{\sigma\_n^2} = \frac{1}{\sigma\_0^2} + n\left(\frac{1}{\sigma\_m^2}\right) \tag{19}$$

Finally, the free energy difference between the final and initial states in the confinement stage is

$$
\Delta F\_1 = kT I\_{\text{total}} = \frac{kT}{2} \log \frac{\sigma\_0^2}{\sigma\_n^2}.\tag{20}
$$

Every bit of information that is extracted in the measurement is turned into an increase of free energy during the confinement stage and it can be converted into useful work in the subsequent expansion.
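
As a worked check of Equations (18)–(20), the sketch below iterates the variance recursion and compares it with the closed form (19). It assumes the same form of relation (10) as Equation (19) implies, units with $kT = 1$, and the parameters quoted for Figure 5 ($\sigma\_0^2 = 1$, $\sigma\_m^2 = 1$, 10 measurements). The names are hypothetical.

```python
import math

def confinement_variances(var0, var_m, n):
    """Successive variances sigma_0^2 ... sigma_n^2 from the one-step recursion."""
    variances = [var0]
    for _ in range(n):
        variances.append(1.0 / (1.0 / variances[-1] + 1.0 / var_m))
    return variances

def total_information(variances):
    """Equation (18): sum of the per-step informations (telescoping sum)."""
    return sum(-0.5 * math.log(variances[i] / variances[i - 1])
               for i in range(1, len(variances)))

variances = confinement_variances(1.0, 1.0, 10)
closed_form = 1.0 / (1.0 / 1.0 + 10 / 1.0)   # Equation (19)
i_total = total_information(variances)        # equals 0.5 * log(11) here
delta_f1 = 1.0 * i_total                      # Equation (20) with kT = 1
```

For these parameters the free energy stored in the confinement stage is $\frac{1}{2}\log 11$ in units of $kT$.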

#### *2.2. Work Extraction by Isothermal Expansion*

If an external agent changes the stiffness of the optical trap from $\kappa\_i$ to $\kappa\_f < \kappa\_i$, energy is recovered as work, as explained above. In a quasistatic process, the work done by the system is given by the free energy difference. Because stage 2 completes the cycle of operation of the motor, ending in the initial state, we have $\Delta F\_2 = -\Delta F\_1$ and

$$\mathcal{W}\_{\text{total}} = \mathcal{W}\_1 + \mathcal{W}\_2 = 0 - \Delta F\_1 = -kTI\_{\text{total}}, \tag{21}$$

which corresponds to extracted work. In fact, it saturates expression (5) and it is the maximum possible work that can be extracted while using the information that was obtained in the measurements.

This result can also be recovered by direct computation of the work of a process that changes the stiffness from $\kappa\_i$ to $\kappa\_f$, taking into account that, for a quasistatic process, one can use the equipartition theorem, $\langle x^2 \rangle = kT/\kappa(t)$, with $\kappa(t)$ the instantaneous value of the stiffness. The average work during the expansion, according to (2), then reads:

$$\mathcal{W}\_2 = \int\_{\kappa\_i}^{\kappa\_f} d\kappa \frac{\langle \mathbf{x}^2 \rangle}{2} = \frac{kT}{2} \int\_{\kappa\_i}^{\kappa\_f} \frac{d\kappa}{\kappa} = \frac{kT}{2} \log \frac{\kappa\_f}{\kappa\_i} \tag{22}$$

The expansion starts at the end of the confinement process with a distribution of variance $\sigma\_n^2$ and ends at variance $\sigma\_0^2$. Using the relation between stiffness and variance in the confinement stage (11), we have

$$W\_2 = \frac{kT}{2} \log \frac{\kappa\_f}{\kappa\_i} = -\frac{kT}{2} \log \frac{\sigma\_n^2}{\sigma\_0^2} = -kT I\_{\text{total}} \tag{23}$$

Notice that during both the confinement and expansion the system must be at equilibrium in order to transform every bit of information into useful work.
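
The integral in Equation (22) can also be evaluated numerically. The sketch below uses a simple midpoint rule with the equipartition value $\langle x^2 \rangle = kT/\kappa$; the function name, the number of steps, and the choice $kT = 1$ are illustrative assumptions.

```python
import math

def expansion_work(kappa_i, kappa_f, kT=1.0, n_steps=100_000):
    """Midpoint-rule estimate of W_2 = integral of <x^2>/2 d(kappa)."""
    d_kappa = (kappa_f - kappa_i) / n_steps
    work = 0.0
    for step in range(n_steps):
        kappa = kappa_i + (step + 0.5) * d_kappa
        work += 0.5 * (kT / kappa) * d_kappa   # <x^2> = kT / kappa
    return work

# Expansion from kappa_i = 11 (ten unit-error measurements) back to kappa_f = 1.
w2 = expansion_work(11.0, 1.0)
```

With these values `w2` agrees with $\frac{kT}{2}\log(\kappa\_f/\kappa\_i) = -\frac{1}{2}\log 11$, and its negative sign indicates extracted work.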

In practice, though, for a process changing the stiffness of the potential to be approximately quasistatic, it is enough that the duration of the process is large compared to the inverse of the trap frequency $\nu = \kappa/\gamma$. This is the criterion that we have used for simulations. Additionally, it is worth noting that, even though the work in every realization of the expansion may differ in principle in a stochastic system, work is, in this particular example, a self-averaging quantity: for a quasistatic expansion, the total work obtained in any realization is very similar to its average value. The argument for self-averaging of the work is the following: from the work definition (2), the work in a single realization of the expansion is $\hat{W} = \int \frac{x^2}{2}\, d\kappa$. If the expansion is very slow, then during the time in which $\kappa$ is modified by a small amount, the particle position has time to fluctuate and sample the whole quasi-equilibrium distribution, and $x^2$ can approximately be replaced by its average value (see the full computation in [14]).

Figure 3 depicts the complete diagram of the proposed cycle.

Finally, one could also define an efficiency *η* as the ratio between the extracted thermodynamic resource (work) and the thermodynamic resource consumed to make the engine run, in this case information. With this definition, this reversible feedback engine attains the maximum efficiency:

$$
\eta = \frac{-W}{kTI} = 1\tag{24}
$$

as in a similar system [23], with just one measurement per cycle.

**Figure 3.** Cycle for extracting work from a thermal bath with inaccurate measurements.

#### **3. Results**

#### *3.1. Work Is Optimal*

We have performed computer simulations of the model system that is described above. Figure 4 depicts part of two consecutive cycles, each of them with a confinement stage that is composed of 10 measurement and feedback steps, followed by an isothermal expansion. The top panel depicts the particle position (gray line), trap center (blue line), and measurement outcomes (red dots), whereas the bottom panel shows the evolution of the stiffness along the cycle.

Figure 5 shows the cumulative work done on the system over the course of a single cycle. The thick solid line represents the average over 200 cycles. Every cycle consists of a confinement that is achieved by measuring the particle position 10 times and the subsequent isothermal expansion. The average work extracted (*W* < 0) by the end of the cycle approaches the expected result given by Equations (18), (19) and (21), marked with a dashed black line. The shaded area represents the dispersion of the work (one standard deviation), which is substantial. As is apparent from the figure, most of the dispersion comes from the confinement step, with the quasistatic work being a self-averaging quantity. Finally, the work that corresponds to two particular cycles is shown by thin blue lines.

**Figure 4.** Trajectories. (**Top**) Particle trajectory (gray continuous line), trap center (blue continuous line), measurement outcomes (red dots). (**Bottom**) Stiffness evolution during the cycle. Every cycle starts with *κ* = 1, there are 10 measurement steps, followed by quasistatic expansion. *D* = 1, *γ* = 1.

**Figure 5.** Average cumulative work along the confinement-expansion cycle (thick blue line) computed from 200 realizations. The shaded area corresponds to one standard deviation from the average. Thin blue lines represent the cumulative work in two representative cycles. Simulation parameters are: Δ*t* = 0.001, time between measurements *τ* = 0.1, number of measurements before expansion 10, measurement error $\sigma\_m^2 = 1$, initial stiffness of the trap *κ* = 1, diffusion coefficient *D* = *kT*/*γ* = 1, and drag coefficient *γ* = 1.

#### *3.2. Power and Efficiency with Higher Measurement Errors*

Consider two setups, *A* and *B*, with different measurement errors given by the variances $\sigma\_{mA}^2$ and $\sigma\_{mB}^2 = 2\sigma\_{mA}^2$. Suppose that only one measurement step is performed in each system before the expansion. According to our discussion above, the measurement information that can be later transformed into work is smaller in system *B* than in *A*:

$$I\_{B1} = \frac{1}{2} \log \frac{\sigma\_{mB}^2 + \sigma\_0^2}{\sigma\_{mB}^2} = \frac{1}{2} \log \left( 1 + \frac{\sigma\_0^2}{2\sigma\_{mA}^2} \right) < \frac{1}{2} \log \left( 1 + \frac{\sigma\_0^2}{\sigma\_{mA}^2} \right) = I\_{A1} \tag{25}$$

However, we can obtain as much information in system *B* with two measurements as in system *A* with one measurement. After two measurements, while using the reversible confinement protocol, the variance of the equilibrium distribution $\sigma\_{B2}^2$ in system *B* is equal to the variance in system *A* after one measurement, $\sigma\_{A1}^2$:

$$\frac{1}{\sigma\_{B2}^2} = \frac{1}{\sigma\_0^2} + 2\left(\frac{1}{\sigma\_{mB}^2}\right) = \frac{1}{\sigma\_0^2} + \left(\frac{1}{\sigma\_{mA}^2}\right) = \frac{1}{\sigma\_{A1}^2} \tag{26}$$

Using (18), we obtain:

$$I\_B = -\frac{1}{2} \log \frac{\sigma\_{B2}^2}{\sigma\_0^2} = -\frac{1}{2} \log \frac{\sigma\_{A1}^2}{\sigma\_0^2} = I\_A \tag{27}$$

As explained above, this implies that the same work can be extracted in the subsequent quasistatic expansion. In fact, both of the systems run with the same efficiency *η* = 1; hence, every bit of information is turned into work in the expansion. Furthermore, system *B* can also be run in principle at the same power as system *A*. During the confinement process, after the adjustment of the potential, the particle distribution is at equilibrium. No relaxation occurs, as explained previously. Therefore, a new measurement and feedback step could, in principle, be taken immediately after, in rapid succession. Thus, halving the time between measurements in system *B* as compared to system *A* ensures the same cycle time. As the work obtained is also the same, both of the systems operate with the same power. Figure 6 depicts this, where we show the simulation results for system *A* with one measurement and expansion and system *B* with two (faster) measurements and expansion. Approximately the same work is obtained in both systems. For reference, we have also marked the expected extracted work for a system with the measurement error given by $\sigma\_{mB}$, but using just one measurement.
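
The comparison of Equations (25)–(27) can be reproduced in a few lines. The sketch below uses the closed form (19) together with Equation (18); the numerical values $\sigma\_0^2 = 1$ and $\sigma\_{mA}^2 = 1$ are illustrative choices matching Figure 6, and the function name is hypothetical.

```python
import math

def information_after(var0, var_m, n):
    """Total information after n measurements, from Equations (18) and (19)."""
    var_n = 1.0 / (1.0 / var0 + n / var_m)
    return -0.5 * math.log(var_n / var0)

var0, var_mA = 1.0, 1.0
var_mB = 2.0 * var_mA

i_a1 = information_after(var0, var_mA, 1)   # one precise measurement
i_b1 = information_after(var0, var_mB, 1)   # one noisier measurement
i_b2 = information_after(var0, var_mB, 2)   # two noisier measurements
```

Here `i_b1` is smaller than `i_a1`, while `i_b2` coincides with `i_a1`, so the two setups store the same free energy before the expansion.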

**Figure 6.** Average work extraction for two different measurement errors, using one measurement with variance $\sigma\_{mA}^2 = 1$ (blue thick line), and using two measurements with variance $\sigma\_{mB}^2 = 2$ (red thick line). The dashed line represents the expected work extraction and the fine dashed line corresponds to the expected work extraction with just 1 measurement of variance $\sigma\_{mB}^2$. Thin lines represent single realizations of the work in system *A* (blue) and *B* (red).

#### **4. Discussion**

Reversible feedback confinement is an optimal way of reducing the entropy of a system to be later used for work extraction. Nevertheless, it requires a high degree of control over the Hamiltonian, to adapt it to the new probabilistic state after the measurement. This might be a limitation for experimental realizations, although low dissipation is expected even if a similar or approximate protocol is implemented. Theoretically, the dissipation could be accounted for by using the Kullback–Leibler distance between the post-measurement particle distribution and the equilibrium distribution of the potential after feedback [24], if they were different due to a less precise tuning of the potential.

In principle, for a measurement and feedback protocol, imprecision in the measurement, which will inevitably arise in an experimental setup, will limit the work extraction or power. Nevertheless, we have shown here that this limitation can be overcome by adding more measurement steps before the quasistatic expansion, as long as the reversible feedback confinement protocol is used. In principle, the application of this protocol is instantaneous. In practice, this means that the confinement may be applied in a very short time, limited perhaps by the response time of the feedback mechanism or the measurement acquisition time. Thus, if the response times of measurement, feedback, and Hamiltonian modification are fast compared to the system's relaxation time, optimal work extraction is feasible, even with a high degree of inaccuracy in the measurement, while using repeated optimal feedback.

#### **5. Materials and Methods**

The confined Brownian particle evolves according to the Langevin equation:

$$
\gamma \dot{x} = -V\_{\kappa, x\_0}'(x) + \xi(t),
\tag{28}
$$

with $\xi(t)$ Gaussian white noise, $\langle \xi(t)\xi(t') \rangle = 2kT\gamma\delta(t - t')$, $T$ the bath temperature, and $k$ Boltzmann's constant. The potential $V\_{\kappa, x\_0}(x)$ is defined above and is controlled through measurement and feedback. Model simulations were performed in the C language, solving the Langevin evolution equation with the Heun method for stochastic differential equations [25]. We provide, in the following, some details on the work computation, measurement, and feedback steps. For full details, the code is available here: http://seneca.fis.ucm.es/ldinis/code/extract\_optimal\_work.zip.

• Measurement. In order to perform a measurement in the simulation, a Gaussian number "r" with zero average and standard deviation 1 is generated. Subsequently, if the particle position is *x*, the measurement outcome *m* is

$$m = x + \sigma\_m r \tag{29}$$

Notice that *m* is then distributed according to Equation (6).

• Feedback. When the trap parameters are changed suddenly from $(\kappa, x\_0)$ to $(\kappa', x\_0')$ with the particle at position $x$, the work performed equals the change in potential energy:


$$
\Delta W = \frac{1}{2}\kappa'(\mathbf{x} - \mathbf{x}\_0')^2 - \frac{1}{2}\kappa(\mathbf{x} - \mathbf{x}\_0)^2 \tag{30}
$$

This Δ*W* is added to a variable *W* that stores the cumulative work that was done along the whole simulation.

• After the feedback, the evolution equation resumes with the new potential parameters.

• Work during expansion. Work is also performed as a result of the change in $\kappa$ during an expansion. In the simulation, during the expansion stage, $\kappa$ changes by an amount $\Delta\kappa = (\kappa\_f - \kappa\_i)/N\_{\text{exp}}$ in every time step, where $N\_{\text{exp}}$ is the number of time steps of the expansion. Therefore, in a time step, a work

$$
\Delta W = \frac{1}{2} \Delta \kappa (\mathbf{x} - \mathbf{x}\_0)^2 \tag{31}
$$

is performed. Again, this Δ*W* has to be added to the variable *W*, which stores the total or cumulative work of the whole process.
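
The steps above can be sketched as follows. This is an illustrative Python outline rather than the authors' C implementation: the function names, argument order, and the convention of passing the integrated noise (already divided by $\gamma$) are assumptions, and the potential is taken to be harmonic, $V\_{\kappa, x\_0}(x) = \kappa (x - x\_0)^2/2$.

```python
import math
import random

def heun_step(x, x0, kappa, dt, noise, gamma=1.0):
    """One Heun (predictor-corrector) step of gamma*dx/dt = -kappa*(x - x0) + xi(t).

    `noise` is the noise increment over dt, already divided by gamma.
    """
    drift = -kappa * (x - x0) / gamma
    x_pred = x + drift * dt + noise                  # Euler predictor
    drift_pred = -kappa * (x_pred - x0) / gamma
    return x + 0.5 * (drift + drift_pred) * dt + noise

def measure(x, sigma_m, rng=random):
    """Equation (29): noisy measurement of the particle position."""
    return x + sigma_m * rng.gauss(0.0, 1.0)

def simulate_expansion(x, x0, kappa_i, kappa_f, n_exp, dt,
                       kT=1.0, gamma=1.0, rng=random):
    """Linear ramp of kappa over n_exp time steps; returns (final x, work)."""
    d_kappa = (kappa_f - kappa_i) / n_exp
    amplitude = math.sqrt(2.0 * kT * dt / gamma)     # noise strength per step
    kappa, work = kappa_i, 0.0
    for _ in range(n_exp):
        work += 0.5 * d_kappa * (x - x0) ** 2        # Equation (31)
        kappa += d_kappa
        x = heun_step(x, x0, kappa, dt, amplitude * rng.gauss(0.0, 1.0), gamma)
    return x, work
```

Averaging the returned work over many slow expansions should approach the quasistatic value of Equation (22); how slow the ramp must be depends on the criterion discussed in Section 2.2.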

**Author Contributions:** Conceptualization, L.D. and J.M.R.P.; methodology, L.D.; software, L.D.; validation, L.D.; formal analysis, L.D.; investigation, L.D.; writing—original draft preparation, L.D. and J.M.R.P.; writing—review and editing, L.D. and J.M.R.P.; visualization L.D. and J.M.R.P.; supervision, L.D. and J.M.R.P.; project administration, L.D. and J.M.R.P.; funding acquisition, L.D. and J.M.R.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** L.D. and J.M.R.P. acknowledge financial support from Ministerio de Ciencia, Innovación y Universidades grant number FIS2017-83709-R.

**Data Availability Statement:** Data for this study was generated using custom computer code. The code and instructions are available at http://seneca.fis.ucm.es/ldinis/code/extract\_optimal\_work.zip.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Diauxic Growth at the Mesoscopic Scale**

#### **Mirosław Lachowicz \*,† and Mateusz Dębowski †**

Institute of Applied Mathematics and Mechanics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097 Warsaw, Poland; mateusz.debowski@mimuw.edu.pl

**\*** Correspondence: M.Lachowicz@mimuw.edu.pl

† These authors contributed equally to this work.

Received: 27 October 2020; Accepted: 10 November 2020; Published: 12 November 2020

**Abstract:** In the present paper, we study diauxic growth that can be generated by a class of models at the mesoscopic scale. Although diauxic growth can be related to the macroscopic scale, similarly to logistic growth, one may ask whether models on mesoscopic or microscopic scales may lead to such behavior. The present paper is a first step towards developing mesoscopic models that lead to diauxic growth at the macroscopic scale. We propose various nonlinear mesoscopic models, conservative or not, that lead directly to diauxic growth.

**Keywords:** diauxic growth; replicator equation; mesoscopic model; integro-differential equations

#### **1. Introduction**

In various processes in nature and the social sciences, e.g., in artificial neural networks, biology, medicine, and sociology, logistic growth is observed in experiments. Logistic growth describes, at the macroscopic scale, the limited growth of a population. It is a typical way of modeling tumor growth (see, e.g., [1–3] and references therein). It leads to a curve of *S*, or sigmoid, shape. In more mathematical terms, a single inflection point is present. In some cases, however, more complex behavior is observed. This was pointed out in 1949 by Monod (see [4], page 390): "*This phenomenon is characterized by a double growth cycle consisting of two exponential phases separated by a phase during which the growth rate passes through a minimum, even becoming negative in some cases*". Monod attributed such behavior to the growth of bacterial cultures and called it diauxie. A similar effect was hypothesized in the analysis of a role for the CDC6 protein in the entry of cells into mitosis (see [5]). Based on the experimental data in [5], a new hypothesis that CDC6 slows down the activation of inactive complexes of CDK1 and cyclin B upon mitotic entry was formulated and the corresponding mathematical model was developed. Another example is the process of DNA melting in the case when the possible base pairs AT (or TA) and CG (or GC) appear in two separate groups composed only of AT and CG (see Figure 7.14, page 205, in [6]).

In mathematical terms, we can refer to diauxic growth if the corresponding increasing bounded function has more than one inflection point. The first mathematical description of such behavior is contained in [7].

One may note that the data on total cases of COVID-19, according to Johns Hopkins University, in September 2020 show curves with more than one inflection point for various European countries, such as Spain, Italy, France, Germany, and the UK. On the other hand, countries like Brazil, Chile, and South Africa display curves close to logistic growth (with only one inflection point).

The comparison between the logistic curve and the curve with diauxic growth is presented in Figure 1.

*Entropy* **2020**, *22*, 1280; doi:10.3390/e22111280 www.mdpi.com/journal/entropy

**Figure 1.** Comparison between logistic curve and diauxic growth curve.

In the present paper, we apply the following definition.

**Definition 1.** *An increasing bounded and positive-valued real function is said to have a diauxic growth if its number of inflection points is bigger than one.*

Although diauxic growth can be related, similarly to logistic growth, to the macroscopic scale, one may ask whether models on mesoscopic or microscopic scales (cf. [8]) can result in diauxic growth. The present paper is a first step towards developing mesoscopic models that lead to diauxic growth at the macroscopic scale. We propose various nonlinear mesoscopic models, both conservative and not, which lead directly to diauxic growth.

#### **2. Replicator Equation**

We consider the following replicator equations that occur in the multi-player games, see [7,9].

$$\frac{\mathbf{d}\,\mathbf{x}}{\mathbf{d}\,t} = \mathbf{x}(1-\mathbf{x})\mathcal{P}(\mathbf{x})\,,\tag{1}$$

where $\mathcal{P} = \mathcal{P}(x)$ is a polynomial. In [7], the following polynomials were considered:

$$\mathcal{P}(\mathbf{x}) = (\mathbf{x} - \mathbf{a})^2 + \omega \, , \tag{2}$$

where 0 < *a* < 1, *ω* > 0 is a small number, and

$$\mathcal{P}(x) = (x - a)^2 (x - b)^2 + \omega \, , \tag{3}$$

where 0 < *a* < *b* < 1 and *ω* is a (small) number. The former refers to three-player games, whereas the latter refers to five-player games. Both are related to two strategies ↑ and ↓ in an infinitely large population. The variables *x* and 1 − *x* are the frequencies of strategies ↑ and ↓, respectively.
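
The link between these polynomials and Definition 1 can be illustrated numerically. Writing $g(x) = x(1-x)\mathcal{P}(x)$, a solution of Equation (1) with $\mathcal{P} > 0$ increases monotonically and satisfies $\ddot{x} = g'(x)\,g(x)$, so its inflection points correspond to sign changes of $g'$ between the initial value and 1. The sketch below counts these sign changes on a grid; the parameter values ($a = 0.4$, $b = 0.65$, $\omega = 0.001$), the starting value, and the function names are illustrative assumptions.

```python
def count_inflection_points(poly, dpoly, x_start=0.01, n_grid=20_000):
    """Count sign changes of g'(x), with g(x) = x(1-x)P(x), for x in (x_start, 1)."""
    signs = []
    for i in range(1, n_grid):
        x = x_start + (1.0 - x_start) * i / n_grid
        dg = (1.0 - 2.0 * x) * poly(x) + x * (1.0 - x) * dpoly(x)
        if dg != 0.0:                       # ignore exact zeros on the grid
            signs.append(dg > 0.0)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

a, b, omega = 0.4, 0.65, 0.001

# Logistic growth: P(x) = 1.
n_logistic = count_inflection_points(lambda x: 1.0, lambda x: 0.0)

# Polynomial (2), three-player game.
n_three = count_inflection_points(lambda x: (x - a) ** 2 + omega,
                                  lambda x: 2.0 * (x - a))

# Polynomial (3), five-player game, with the first parameter set to 0.3.
a5 = 0.3
n_five = count_inflection_points(
    lambda x: (x - a5) ** 2 * (x - b) ** 2 + omega,
    lambda x: 2.0 * (x - a5) * (x - b) * (2.0 * x - a5 - b))
```

For these illustrative parameters the logistic case gives a single inflection point, whereas polynomials (2) and (3) give three and five, respectively, so both satisfy Definition 1.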

Consider the following payoff matrix in the case of a 3-player game (for the sake of simplicity):

$$
\begin{array}{c|ccc}
 & \uparrow\uparrow & \uparrow\downarrow & \downarrow\downarrow \\
\hline
\uparrow & a\_1 & a\_2 & a\_3 \\
\downarrow & b\_1 & b\_2 & b\_3
\end{array}
$$

where $a\_i$, $b\_i$, $i = 1, 2, 3$, are the corresponding payoffs. The classical way of presentation is used. For example, $a\_2$ is the payoff of the "*first player*" with strategy ↑ against the other players with strategies ↑ and ↓. Again, for the sake of simplicity, we assume that the payoffs are nonnegative.

Let now *μ* = *μ*(*t*) and *ν* = *ν*(*t*) be the densities of players with strategies ↑ and ↓, respectively, cf. [10]. In terms of the average payoffs of the two strategies, their dynamics is defined by the system

$$\begin{array}{rcl} \frac{d}{dt}\mu &=& \mu \Big( a\_1 \frac{\mu^2}{(\mu+\nu)^2} + 2a\_2 \frac{\mu \cdot \nu}{(\mu+\nu)^2} + a\_3 \frac{\nu^2}{(\mu+\nu)^2} - \kappa \Big),\\ \frac{d}{dt}\nu &=& \nu \Big( b\_1 \frac{\mu^2}{(\mu+\nu)^2} + 2b\_2 \frac{\mu \cdot \nu}{(\mu+\nu)^2} + b\_3 \frac{\nu^2}{(\mu+\nu)^2} - \kappa \Big), \end{array} \tag{4}$$

where, in addition to the net growth, we consider a linear death term with rate *κ* > 0. We see that $x(t) = \frac{\mu(t)}{\mu(t)+\nu(t)}$ satisfies Equation (1) with

$$\mathcal{P}(x) = \hat{a}x^2 + 2\hat{b}x + \hat{c},\tag{5}$$

where $\hat{a} = a\_3 - 2a\_2 + a\_1 - b\_3 + 2b\_2 - b\_1$, $\hat{b} = -a\_3 + a\_2 + b\_3 - b\_2$, and $\hat{c} = a\_3 - b\_3$. We refer to these statements throughout the paper.
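
The coefficients of the quadratic polynomial can be checked against the difference of the average payoffs in system (4). In the sketch below, the function names, the name `c_hat` for the constant term $a\_3 - b\_3$, and the sample payoffs are illustrative assumptions.

```python
def replicator_coefficients(a, b):
    """Coefficients of P(x) from payoffs a = (a1, a2, a3), b = (b1, b2, b3)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    a_hat = a3 - 2 * a2 + a1 - b3 + 2 * b2 - b1
    b_hat = -a3 + a2 + b3 - b2
    c_hat = a3 - b3                       # constant term of P(x)
    return a_hat, b_hat, c_hat

def payoff_difference(x, a, b):
    """Average payoff of strategy 'up' minus that of 'down' at frequency x."""
    weights = (x * x, 2 * x * (1 - x), (1 - x) ** 2)
    return sum(w * (ai - bi) for w, ai, bi in zip(weights, a, b))

payoffs_up, payoffs_down = (3, 1, 2), (1, 2, 0)
a_hat, b_hat, c_hat = replicator_coefficients(payoffs_up, payoffs_down)
```

For the sample payoffs this gives $\mathcal{P}(x) = 6x^2 - 6x + 2$, which coincides with the payoff difference at every frequency $x$; the death rate $\kappa$ cancels in the difference.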

#### **3. Mesoscopic Model**

We study the time evolution of the probability density *f*. The function *f* = *f*(*t*, *u*) is the distribution of an internal, microscopic state $u \in \mathbb{U}$ at time $t \ge 0$ of a (*statistical* or *test*) agent, where $\mathbb{U}$ is a domain in $\mathbb{R}^d$, $d \in \mathbb{N} = \{1, 2, \dots\}$. Such a description then has a mesoscopic nature. An arbitrary vector $u \in \mathbb{U}$ can be related to a biological state, activity, opinion (e.g., political opinion), a social state of a test agent, etc.; cf. [8,11–14] and references therein. The model therefore has a wide range of possible applications in various applied sciences, such as biology, medicine, social, or political sciences.

The time evolution is defined by the general nonlinear integro–differential Boltzmann-like equation, see [14] and references therein,

$$\frac{\partial}{\partial t} f(t, u) = Q[f](t, u) \,, \qquad t > 0 \,, \quad u \in \mathbb{U} \,, \tag{6}$$

where

$$Q[f](t,u) = \int\_{\mathbb{U}} \left( f(t,v) \, T[f(t,\cdot)](v,u) - f(t,u) \, T[f(t,\cdot)](u,v) \right) \mathrm{d}v \,.$$

The nonlinear operator *Q* describes interactions between agents causing the change of state. The turning rate *T*[ *f* ](*u*, *v*) measures the rate for an agent with state *u* to change it into *v*. A simpler equation, with two possible states only, was studied in [15]—see also [8].

The modeling process leads to a proper choice of the turning rate.

**Case 1.** *Let*

$$T[f(t, \cdot)](u, v) = \beta(u, v) f^\gamma(t, v) \,, \qquad u, v \in \mathbb{U} \,,$$

*where γ* > 1 *is here a given integer.*

The rate of transition from state *u* to state *v* is proportional to the *γ*-th power of the actual probability of state *v*. The higher the probability, the larger the chance of the change. The interaction kernel *β* corresponds to the tendency of agents to change a state. In particular, it may restrict the interactions only to states that are close to each other (see Ref. [16]). The (sensitivity) parameter *γ* describes the level of sensitivity of interactions: the greater *γ* is, the more sensitive the interactions are.

The models defined by Case 1 were proposed in [14], and then studied in various directions in [16–18]. Ref. [14] proposed results of global existence in the space homogeneous case for 0 < *γ* < 1, whereas *γ* > 1 was considered in [16–18]. Assuming *γ* = 1 for symmetric *β* yields a trivial model. Thus it was excluded as it is stated in Case 1. The detailed information on the modeling leading to Case 1 can be found in [19] (see also references therein), where it was referred to the conformist society.

We consider the following equation

$$\frac{\partial}{\partial t} f(t, u) = Q[f](t, u) \qquad t > 0, \quad u \in \mathbb{U} \tag{7}$$

with

$$Q[f](t,u) = f^\gamma(t,u) \int\_{\mathbb{U}} \beta(v,u) f(t,v) \, \mathrm{d}v - f(t,u) \int\_{\mathbb{U}} \beta(u,v) f^\gamma(t,v) \, \mathrm{d}v. \tag{8}$$
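
The structure of Equation (8) can be illustrated on a finite state space, with sums replacing the integrals. The following is only a discretized sketch under that assumption; the function names, the explicit Euler step, and the sample kernel are hypothetical and are not part of the model in the text.

```python
def collision_operator(f, beta, gamma):
    """Discrete analogue of Equation (8) on n states (sums instead of integrals)."""
    n = len(f)
    q = []
    for u in range(n):
        gain = f[u] ** gamma * sum(beta[v][u] * f[v] for v in range(n))
        loss = f[u] * sum(beta[u][v] * f[v] ** gamma for v in range(n))
        q.append(gain - loss)
    return q

def euler_step(f, beta, gamma, dt):
    """One explicit Euler step of df/dt = Q[f]."""
    q = collision_operator(f, beta, gamma)
    return [fi + dt * qi for fi, qi in zip(f, q)]

f0 = [0.5, 0.3, 0.2]                       # probabilities of three states
beta_sym = [[0.0, 1.0, 2.0],
            [1.0, 0.0, 1.0],
            [2.0, 1.0, 0.0]]               # symmetric interaction kernel

q_gamma2 = collision_operator(f0, beta_sym, 2)
q_gamma1 = collision_operator(f0, beta_sym, 1)
f1 = euler_step(f0, beta_sym, 2, 0.01)
```

In this discrete setting the components of `q_gamma2` sum to zero, so total probability is conserved, and `q_gamma1` vanishes identically, consistent with the remark that $\gamma = 1$ with symmetric $\beta$ yields a trivial model.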

Independently we consider the following two, formally more general, kinetic equations

**Case 2.** *Let*

$$\begin{aligned} T[f(t, \cdot)](u, v) = {} & \underbrace{\int\_{\mathbb{U}} \dots \int\_{\mathbb{U}}}\_{\gamma \times} A\left(v, u, v\_1, \dots, v\_{\gamma}\right) a\left(u, v\_1, \dots, v\_{\gamma}\right) \times \\ & \times f(t, v\_1) \dots f(t, v\_{\gamma}) \, \mathrm{d}v\_1 \dots \mathrm{d}v\_{\gamma} \,, \qquad u, v \in \mathbb{U} \,, \end{aligned}$$

*where γ is an integer.*

Case 2 leads to

$$\frac{\partial}{\partial t} f(t, u) = \vec{Q}[f](t, u) \qquad t > 0, \quad u \in \mathbb{U} \tag{9}$$

with

$$\begin{aligned} \vec{Q}[f](t,u) = {} & \underbrace{\int\_{\mathbb{U}} \dots \int\_{\mathbb{U}}}\_{(\gamma+1) \times} A\left(u,v,v\_1,\dots,v\_{\gamma}\right) a\left(v,v\_1,\dots,v\_{\gamma}\right) \times \\ & \times f(t,v)f(t,v\_1) \dots f(t,v\_{\gamma}) \, \mathrm{d}v \, \mathrm{d}v\_1 \dots \mathrm{d}v\_{\gamma} - \\ & - f(t,u) \underbrace{\int\_{\mathbb{U}} \dots \int\_{\mathbb{U}}}\_{\gamma \times} a\left(u,v\_1,\dots,v\_{\gamma}\right) f(t,v\_1) \dots f(t,v\_{\gamma}) \, \mathrm{d}v\_1 \dots \mathrm{d}v\_{\gamma} \,. \end{aligned} \tag{10}$$

**Case 3.** *Let γ be an integer and*

$$\begin{aligned} T[f(t, \cdot)](u, v) = {} & A\_0\left(v, u\right) a\_0\left(v\right) f(t, v) + \\ & + \sum\_{j=1}^{\gamma} \underbrace{\int\_{\mathbb{U}} \dots \int\_{\mathbb{U}}}\_{j \times} A\_j\left(v, u, v\_1, \dots, v\_j\right) a\_j\left(u, v\_1, \dots, v\_j\right) f(t, v\_1) \dots f(t, v\_j) \, \mathrm{d}v\_1 \dots \mathrm{d}v\_j \,, \\ & u, v \in \mathbb{U} \,. \end{aligned}$$

Case 3 leads to

$$\frac{\partial}{\partial t} f(t, u) = \tilde{Q}[f](t, u) \qquad t \ge 0, \quad u \in \mathcal{U} \tag{11}$$

with

$$\begin{aligned} \tilde{Q}[f](t,u) = {} & \int\_{\mathbb{U}} A\_0\left(u,v\right) a\_0\left(v\right) f(t,v) \, \mathrm{d}v - a\_0(u)f(t,u) + \\ & + \sum\_{j=1}^{\gamma} \Bigg( \underbrace{\int\_{\mathbb{U}} \dots \int\_{\mathbb{U}}}\_{(j+1) \times} A\_j\left(u,v,v\_1,\dots,v\_j\right) a\_j\left(v,v\_1,\dots,v\_j\right) \times \\ & \qquad \times f(t,v)f(t,v\_1) \dots f(t,v\_j) \, \mathrm{d}v \, \mathrm{d}v\_1 \dots \mathrm{d}v\_j - \\ & \qquad - f(t,u) \underbrace{\int\_{\mathbb{U}} \dots \int\_{\mathbb{U}}}\_{j \times} a\_j\left(u,v\_1,\dots,v\_j\right) f(t,v\_1) \dots f(t,v\_j) \, \mathrm{d}v\_1 \dots \mathrm{d}v\_j \Bigg) . \end{aligned} \tag{12}$$

The terms $A\_j(u, v, v\_1, \dots, v\_j)$ can be interpreted as the transition probabilities of changing from state $v$ to $u$ caused by interaction with agents with states $v\_1, v\_2, \dots, v\_j$, whereas $a\_j(v, v\_1, \dots, v\_j)$ can be interpreted as the rate of interaction between the agent with state $v$ and agents with states $v\_1, \dots, v\_j$.

One may note that Equations (9) and (11), under a suitable symmetry assumption, can be directly related to the dynamics of *N* interacting agents in the limit *N* → ∞ (see [8,13]). The former may be related to the interactions between *γ* agents, whereas the latter may be related to interactions between *j* agents, with *j* = 1, 2, ... , *γ*, while *j* = 0 corresponds to a stochastic change without any interaction (see [13]). One may note, however, that Equation (11) can be directly reduced to Equation (9) just by taking $a\_j \equiv 0$ for each $j = 0, 1, \dots, \gamma - 1$. On the other hand, thanks to the conservative properties, Equation (9) results in Equation (11) as well, under a suitable choice of $A$ and $a$. For these reasons we concentrate on Equation (11) only.

The $L\_p$-norm is denoted by $\|\cdot\|\_p$.

We may state the following local existence–uniqueness result for solutions to Equation (7).

**Proposition 1.** *Let γ* > 1 *and*

$$
\beta \in L\_{\infty}(\mathbb{U} \times \mathbb{U})\,.\tag{13}
$$

*If* $f\_0$ *is a probability density such that* $f\_0 \in L\_\infty(\mathbb{U})$*, then there exists* $T > 0$ *such that the solution* $f = f(t)$ *to* (7) *exists and is unique in* $L\_\infty(\mathbb{U}) \cap L\_1(\mathbb{U})$ *on the interval* $[0, T)$*. The solution preserves positivity and the* $L\_1$*-norm (i.e., it is a probability density) on* $[0, T)$*. Moreover,*


The first part of the proof follows from [14] (see [19]) and is based on the Lipschitz property of the corresponding operator. The rest follows by *a priori* estimates.

From [16–20] we see that the behavior of the solution to Equation (7) may be very complex and may lead to various interesting applications in biology, medicine, and social sciences.

In contrast to Equation (7) with *γ* > 1, Equation (7) with *γ* = 1 (for asymmetric *β*), as well as Equations (9) and (11) result in the global existence–uniqueness of solutions.

**Proposition 2.** *Let γ* = 1 *and Equation* (13) *be satisfied. If* $f\_0$ *is a probability density, then for any T* > 0 *the solution f* = *f*(*t*) *to* (7) *exists and is unique in* $L\_1(\mathbb{U})$ *on the interval* [0, *T*]*. The solution preserves positivity and the* $L\_1$*-norm (i.e., it is a probability density) on* [0, *T*]*.*

We consider the following conservative situation

**Assumption 1.** *Let γ be an integer and*

$$\begin{array}{l} A\_j \ge 0, \quad a\_j \ge 0, \quad a\_j \in L\_\infty \left( \mathbb{U}^{j+1} \right), \\ \int A\_j \left( u, v, v\_1, \dots, v\_j \right) \, \mathrm{d}u = 1 \quad \text{for all} \quad \left( v, v\_1, \dots, v\_j \right) \in \mathbb{U}^{j+1} \\ \text{such that} \quad a\_j \left( v, v\_1, \dots, v\_j \right) > 0, \qquad \forall \ j = 1, \dots, \gamma. \end{array} \tag{14}$$

**Proposition 3.** *Let Assumption 1 be satisfied. If* $f\_0$ *is a probability density, then for any T* > 0 *the solution f* = *f*(*t*) *to Equation* (11) *exists and is unique in* $L\_1(\mathbb{U})$ *on the interval* [0, *T*]*. The solution preserves positivity and the* $L\_1$*-norm (i.e., it is a probability density) on* [0, *T*]*.*

**Corollary 1.** *The solutions in Propositions 2 and 3 are in* $L\_\infty(\mathbb{U})$ *on every compact* $[0, T]$ *provided that* $f\_0 \in L\_\infty(\mathbb{U})$*.*

The proofs of Propositions 2 and 3 are standard and based on the Lipschitz property in $L\_1(\mathbb{U})$; cf. [20]. Corollary 1 follows similarly.

Moreover, we need the smoothness of the solutions. Let $W^{m,p}(\mathbb{U})$ and $C\_B^m(\mathbb{U})$ be the Banach spaces: the classical Sobolev space (a subspace of $L\_p(\mathbb{U})$) and the space of $m$-differentiable functions, with the usual norms denoted by $\|\cdot\|\_p^{(m)}$ and $\|\cdot\|\_{[B]}^{(m)}$, respectively (see [21]).

Let *X*<sup>(*m*)</sup> = *W*<sup>*m*,1</sup>(𝕌) ∩ *C*<sup>*m*</sup><sub>*B*</sub>(𝕌), *m* = 0, 1, 2, . . . , and let the norm ‖·‖<sup>(*m*)</sup> be defined by

$$\|\cdot\|^{(m)} = \|\cdot\|\_{1}^{(m)} + \|\cdot\|\_{\left[B\right]}^{(m)}, \qquad m = 0, 1, 2, \dots$$

In particular, for *m* = 0, we write *X* = *X*<sup>(0)</sup> = *L*<sub>1</sub>(𝕌) ∩ *L*<sub>∞</sub>(𝕌) and ‖·‖ = ‖·‖<sup>(0)</sup>.

**Proposition 4.** *Let the assumption of Proposition 1 be satisfied and additionally f*<sub>0</sub> ∈ *X*<sup>(*m*)</sup> *and*

$$\int\_{\mathbb{U}} \beta(u, v) g(v) \,\mathrm{d}v \in X^{(m)} \qquad \text{for each} \quad g \in X^{(m)},\tag{15}$$

*for some m* = 1, 2, 3, . . . *. Then the solution f* = *f*(*t*) *(given by Proposition 1) satisfies f*(*t*, ·) ∈ *X*<sup>(*m*)</sup> *for all t* ∈ [0, *T*)*.*

#### **4. Macroscopic Behavior in the Conservative Case**

In the present section we fix our attention on the behavior of the cumulative distribution function corresponding to the solution of a (mesoscopic) kinetic equation.

For simplicity we assume that 𝕌 = [0, ∞[ ≡ ℝ<sup>1</sup><sub>+</sub> and *γ* = 3; possible generalizations are straightforward.

We show that, under particular assumptions on the parameters *A*<sub>*j*</sub>, *α*<sub>*j*</sub>, *j* ≤ 3, of Equation (11), the solution *f* = *f*(*t*, *u*) leads to the distribution

$$F(t, u) = \int\_0^u f(t, \bar{u}) \, \mathrm{d}\bar{u} \,,\tag{16}$$

that exhibits diauxic growth with respect to *t* > 0, for any sufficiently large *u* > 0.

**Assumption 2.** *We assume that the interactions are such that*

$$\alpha\_j(u, v\_1, \dots, v\_j) = j! \, \eta\_j \, \chi \left(v\_1 \le u \right) \chi \left(v\_2 \le v\_1 \right) \cdots \chi \left(v\_j \le v\_{j-1} \right) \quad \text{for all} \quad u, v\_1, \dots, v\_j \in \mathbb{R}\_+^1,\tag{17}$$

*where χ*(true) = 1*, χ*(false) = 0*, η*<sub>*j*</sub> *are positive constants,*

$$\int\_{0}^{u} A\_{j}\left(\bar{u}, v, w\_{1}, \dots, w\_{j}\right) \,\mathrm{d}\bar{u} = \chi\left(w\_{1} \le u\right) \quad \text{for all} \quad u, v, w\_{1}, \dots, w\_{j} \in \mathbb{R}^{1}\_{+},\tag{18}$$

*for j* = 1, 2, 3 *(recall that γ* = 3*), with the standard convention that w*<sub>1</sub>, … , *w*<sub>*j*</sub> *means w*<sub>1</sub> *if j* = 1*, w*<sub>1</sub>, *w*<sub>2</sub> *if j* = 2*, and w*<sub>1</sub>, *w*<sub>2</sub>, *w*<sub>3</sub> *if j* = 3*, and*

$$\alpha\_0(u) = \eta\_0 \quad \text{for any} \quad u \in \mathbb{R}^1\_+,\tag{19}$$

$$\int\_{0}^{u} A\_{0}\left(\bar{u}, v\right) \,\mathrm{d}\bar{u} = \zeta(u) \quad \text{for all} \quad u \ge u\_{0} \quad \text{and} \quad v \in \mathbb{R}^{1}\_{+},\tag{20}$$

*where u*<sub>0</sub> > 0 *is a given constant and ζ is an increasing function such that ζ*(0) = 0 *and* lim<sub>*u*→∞</sub> *ζ*(*u*) = 1*.*

Note that Assumption 2 implies Assumption 1.

By Equation (17) and simple calculations, we obtain

$$\int\_{0}^{u} \underbrace{\int\_{0}^{\infty} \cdots \int\_{0}^{\infty}}\_{j \times} f(t, \bar{u}) \, \alpha\_{j} \left( \bar{u}, v\_{1}, \dots, v\_{j} \right) f(t, v\_{1}) \cdots f(t, v\_{j}) \, \mathrm{d}v\_{1} \cdots \mathrm{d}v\_{j} \, \mathrm{d}\bar{u} = \frac{\eta\_{j}}{j+1} \left( \int\_{0}^{u} f(t, \bar{u}) \, \mathrm{d}\bar{u} \right)^{j+1}, \tag{21}$$

for *j* equal to 1, 2 and 3, and any *f*(*t*, ·) ∈ *L*<sub>1</sub>(ℝ<sup>1</sup><sub>+</sub>), and

$$\int\_{0}^{u} f(t, \bar{u}) \, \alpha\_{0} \left( \bar{u} \right) \, \mathrm{d}\bar{u} = \eta\_{0} \int\_{0}^{u} f(t, \bar{u}) \, \mathrm{d}\bar{u} \,. \tag{22}$$
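The "simple calculations" behind Equation (21) can be sketched as follows. Here *F*(*t*, *u*) denotes the integral of *f*(*t*, ·) over [0, *u*], as in Equation (16), and *f*(*t*, ·) is assumed to be integrable, so that *F*(*t*, 0) = 0. The indicator functions in Equation (17) restrict the inner variables to the ordered region *v*<sub>*j*</sub> ≤ … ≤ *v*<sub>1</sub> ≤ *ū*. By symmetry, this region carries the fraction 1/*j*! of the product integral over [0, *ū*]<sup>*j*</sup>, which gives

$$\int\_{v\_j \le \dots \le v\_1 \le \bar{u}} f(t, v\_1) \cdots f(t, v\_j) \, \mathrm{d}v\_1 \cdots \mathrm{d}v\_j = \frac{F(t, \bar{u})^{j}}{j!} \,,$$

$$j! \, \eta\_j \int\_0^u f(t, \bar{u}) \, \frac{F(t, \bar{u})^{j}}{j!} \, \mathrm{d}\bar{u} = \eta\_j \int\_0^u \frac{\partial}{\partial \bar{u}} \left( \frac{F(t, \bar{u})^{j+1}}{j+1} \right) \mathrm{d}\bar{u} = \frac{\eta\_j}{j+1} \, F(t, u)^{j+1} \,.$$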

Moreover, for any *f*(*t*, ·) ∈ *L*<sub>1</sub>(ℝ<sup>1</sup><sub>+</sub>) such that ‖*f*(*t*, ·)‖<sub>1</sub> = 1, and for *j* = 1, 2, 3, by Equations (17) and (18), we have

$$\begin{aligned} &\int\_{0}^{u} \underbrace{\int\_{0}^{\infty} \cdots \int\_{0}^{\infty}}\_{(j+1) \times} A\_{j} \left( \bar{u}, v, v\_{1}, \dots, v\_{j} \right) \alpha\_{j} \left( v, v\_{1}, \dots, v\_{j} \right) f(t, v) f(t, v\_{1}) \cdots f(t, v\_{j}) \, \mathrm{d}v \, \mathrm{d}v\_{1} \cdots \mathrm{d}v\_{j} \, \mathrm{d}\bar{u} \\ &\quad = j! \, \eta\_{j} \int\_{0}^{\infty} f(t, v) \int\_{0}^{\infty} f(t, v\_{1}) \chi \left( v\_{1} \le v \right) \chi \left( v\_{1} \le u \right) \int\_{0}^{v\_{1}} f(t, v\_{2}) \cdots \int\_{0}^{v\_{j-1}} f(t, v\_{j}) \, \mathrm{d}v\_{j} \cdots \mathrm{d}v\_{2} \, \mathrm{d}v\_{1} \, \mathrm{d}v \\ &\quad = \mathbb{I}\_{1} + \mathbb{I}\_{2} \,, \end{aligned} \tag{23}$$

where

$$\mathbb{I}\_1 = \frac{\eta\_j}{j+1} \left( \int\_0^u f(t, \bar{u}) \, \mathrm{d}\bar{u} \right)^{j+1}$$

and

$$\mathbb{I}\_2 = \eta\_j \left( \left( \int\_0^u f(t,\bar{u}) \, \mathrm{d}\bar{u} \right)^j - \left( \int\_0^u f(t,\bar{u}) \, \mathrm{d}\bar{u} \right)^{j+1} \right).$$
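The decomposition in Equation (23) can be checked numerically. For a probability density *f*, the second line of Equation (23) with *η*<sub>*j*</sub> = 1 equals *j*! times the probability that independent samples satisfy *V*<sub>*j*</sub> ≤ … ≤ *V*<sub>1</sub> ≤ min(*V*, *u*). The sketch below estimates this probability by Monte Carlo and compares it with the sum of the two terms defined above. The exponential density, the value of *u*, the sample size and the function names are illustrative assumptions, not choices made in the paper.

```python
import math
import random


def gain_monte_carlo(j, u, n_samples, rng):
    """Estimate j! * P(V_j <= ... <= V_1 <= min(V, u)) for i.i.d. Exp(1) samples.

    This is the second line of Equation (23) with eta_j = 1 and the
    (assumed) density f(v) = exp(-v).
    """
    hits = 0
    for _ in range(n_samples):
        bound = min(rng.expovariate(1.0), u)  # min(v, u)
        inside = True
        for _ in range(j):
            w = rng.expovariate(1.0)  # v_1, then v_2, ..., v_j
            if w > bound:
                inside = False
                break
            bound = w  # the next variable must not exceed this one
        if inside:
            hits += 1
    return math.factorial(j) * hits / n_samples


def gain_closed_form(j, u):
    """Sum of the two terms of Equation (23) with eta_j = 1, F(u) = 1 - exp(-u)."""
    F = 1.0 - math.exp(-u)
    return F ** (j + 1) / (j + 1) + F ** j - F ** (j + 1)


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, for reproducibility only
    for j in (1, 2, 3):
        estimate = gain_monte_carlo(j, u=1.5, n_samples=200_000, rng=rng)
        print(j, round(estimate, 3), round(gain_closed_form(j, 1.5), 3))
```

The two printed columns should agree up to the Monte Carlo error, which shrinks like the inverse square root of the sample size.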

Finally, for any *f*(*t*, ·) ∈ *L*<sub>1</sub>(ℝ<sup>1</sup><sub>+</sub>) such that ‖*f*(*t*, ·)‖<sub>1</sub> = 1, by Equations (19) and (20), for any *u* > *u*<sub>0</sub> we have

$$\int\_{0}^{u} \int\_{0}^{\infty} A\_{0}\left(\bar{u}, v\right) \alpha\_{0}\left(v\right) f(t, v) \,\mathrm{d}v \,\mathrm{d}\bar{u} = \eta\_{0} \, \zeta(u) \,. \tag{24}$$

By the above calculations, integrating Equation (11) with respect to *u*, we can see that any solution *f* of Equation (11), corresponding to an initial datum that is a probability density, is such that *x*(*t*) = *F*(*t*, *u*) given by Equation (16), for any fixed *u* > *u*<sub>0</sub>, satisfies the following equation

$$
\dot{x} = \mathcal{W}(x)\,, \tag{25}
$$

where

$$\mathcal{W}(x) = -\eta\_3 x^4 + \left(\eta\_3 - \eta\_2\right) x^3 + \left(\eta\_2 - \eta\_1\right) x^2 + \left(\eta\_1 - \eta\_0\right) x + \eta\_0 \, \zeta(u)\,,\tag{26}$$

and where *u* is treated here as a (fixed) parameter.
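The passage from Equations (21) to (24) to the polynomial in Equation (26) can be made explicit. The following sketch assumes that Equation (11) has the usual gain-minus-loss structure, with the gain terms given by the left-hand sides of Equations (23) and (24) and the loss terms by those of Equations (21) and (22). Equation (11) is not restated in this section, so this is an assumption about its form. With *x* = *F*(*t*, *u*), the net contribution of the interactions of order *j* = 1, 2, 3 is

$$\left( \mathbb{I}\_1 + \mathbb{I}\_2 \right) - \frac{\eta\_j}{j+1} \, x^{j+1} = \eta\_j \left( x^{j} - x^{j+1} \right),$$

while the order *j* = 0 contributes *η*<sub>0</sub>*ζ*(*u*) − *η*<sub>0</sub>*x*. Summing these contributions gives

$$\dot{x} = \eta\_0 \left( \zeta(u) - x \right) + \sum\_{j=1}^{3} \eta\_j \left( x^{j} - x^{j+1} \right),$$

and collecting powers of *x* yields the coefficients −*η*<sub>3</sub>, *η*<sub>3</sub> − *η*<sub>2</sub>, *η*<sub>2</sub> − *η*<sub>1</sub> and *η*<sub>1</sub> − *η*<sub>0</sub> of *x*<sup>4</sup>, *x*<sup>3</sup>, *x*<sup>2</sup> and *x*, together with the constant term *η*<sub>0</sub>*ζ*(*u*).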

Therefore, it is easy to see that the parameters of the model can be chosen in such a way that *t* → *F*(*t*, *u*) possesses a diauxic growth for any fixed sufficiently large *u*. We then obtain

**Corollary 2.** *Let Assumption 2 be satisfied and f*<sub>0</sub> *be a probability density on* 𝕌 = ℝ<sup>1</sup><sub>+</sub>*. The solution f* = *f*(*t*, *u*) *to Equation* (11)*, given by Proposition 3, is such that the corresponding F* = *F*(*t*, *u*) *given by Equation* (16) *has a diauxic growth with respect to t, for any sufficiently large u* ∈ ℝ<sup>1</sup><sub>+</sub>*.*
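As a numerical illustration of Equations (25) and (26), the following sketch integrates the scalar equation with a classical Runge-Kutta scheme and counts the inflection points of *t* → *x*(*t*), which are the points where the derivative of 𝒲 changes sign along the trajectory. All numbers are hypothetical. The positive rates *η*<sub>0</sub>, …, *η*<sub>3</sub> were picked so that the derivative of 𝒲 vanishes at *x* = 0.07, 0.24 and 0.41, and the value used for *ζ*(*u*) is an arbitrary number in (0, 1) chosen only to make the intermediate lag phase pronounced. Neither is taken from the paper or derived from a particular *ζ* or *u*.

```python
# Illustrative sketch only: the rates and zeta_u below are hypothetical values.


def W(x, eta, zeta_u):
    """Right-hand side of Equation (26) for eta = (eta0, eta1, eta2, eta3)."""
    e0, e1, e2, e3 = eta
    return (-e3 * x**4 + (e3 - e2) * x**3 + (e2 - e1) * x**2
            + (e1 - e0) * x + e0 * zeta_u)


def dW(x, eta):
    """Derivative of W in x; its sign changes mark inflection points of x(t)."""
    e0, e1, e2, e3 = eta
    return (-4.0 * e3 * x**3 + 3.0 * (e3 - e2) * x**2
            + 2.0 * (e2 - e1) * x + (e1 - e0))


def integrate(x0, eta, zeta_u, dt, n_steps):
    """Classical fourth-order Runge-Kutta scheme for dx/dt = W(x)."""
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        k1 = W(x, eta, zeta_u)
        k2 = W(x + 0.5 * dt * k1, eta, zeta_u)
        k3 = W(x + 0.5 * dt * k2, eta, zeta_u)
        k4 = W(x + dt * k3, eta, zeta_u)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs


def count_inflections(xs, eta):
    """Count sign changes of W'(x(t)) along the computed trajectory."""
    signs = [dW(x, eta) > 0.0 for x in xs]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical positive rates (eta0, eta1, eta2, eta3).
ETA = (0.300248, 0.3278, 0.04, 1.0)
ZETA_U = 0.001  # assumed value of zeta(u) for the fixed u
X0 = 0.01       # assumed initial value F(0, u)

if __name__ == "__main__":
    xs = integrate(X0, ETA, ZETA_U, dt=0.5, n_steps=3000)
    print("final x:", round(xs[-1], 4))
    print("inflection points:", count_inflections(xs, ETA))
```

With these values the three zeros of the derivative of 𝒲 lie between the initial value and the equilibrium, so the trajectory has three inflection points instead of the single one of a logistic-type curve. The growth rate has two local maxima separated by a slow phase near *x* = 0.24. The derivative of 𝒲 does not involve *ζ*(*u*), so changing `ZETA_U` shifts the growth rate uniformly without moving these critical points.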

#### **5. Macroscopic Behavior in the Nonconservative Case**

To adapt to a situation typical in game theory (cf. Section 2), we replace Assumption 1 by the following more general one.

**Assumption 3.** *Let γ be an integer and*

$$\begin{array}{l} A\_j \ge 0, \qquad \alpha\_j \ge 0, \qquad \alpha\_j \in L\_{\infty}\left(\mathbb{U}^{j+1}\right), \\ A\_j \left( \, \cdot \, , v, v\_1, \dots, v\_j \right) \in L\_{1}(\mathbb{U}) \quad \text{for all} \quad \left( v, v\_1, \dots, v\_j \right) \in \mathbb{U}^{j+1} \\ \text{such that} \quad \alpha\_j \left( v, v\_1, \dots, v\_j \right) > 0, \qquad \forall \, j = 0, \ldots, \gamma. \end{array} \tag{27}$$

In this section, we deal with the macroscopic behavior derived from the mesoscopic structures defined in the previous section.

We decompose 𝕌 = 𝕌<sub>∗</sub> ∪ 𝕌<sup>∗</sup>, where 𝕌<sub>∗</sub> and 𝕌<sup>∗</sup> are arbitrary (Lebesgue) measurable sets such that 𝕌<sub>∗</sub> ∩ 𝕌<sup>∗</sup> = ∅, both with positive (Lebesgue) measure. For a given solution *f* of the mesoscopic equation we are interested in the behavior of

$$\int\_{\mathbb{U}\_\*} f(t, v) \, \mathrm{d}v \qquad \text{and} \qquad \int\_{\mathbb{U}^\*} f(t, v) \, \mathrm{d}v \tag{28}$$

that can be related to *μ*(*t*) and *ν*(*t*), cf. Equation (4), as well as

$$\frac{\int\_{\mathbb{U}\_\*} f(t,v) \, \mathrm{d}v}{\int\_{\mathbb{U}} f(t,v) \, \mathrm{d}v}$$

that can be related to *x*(*t*) in the macroscopic description, cf. Equation (1) with Equation (5).

Similarly to [22], we assume a direct dependence of the rate *α*<sub>2</sub> on the unknown function *f* in Equation (11). This is an Enskog-type assumption known in kinetic theory; cf. [23] and references therein.

**Assumption 4.** *We assume*

*1. α*<sub>0</sub> = *α*<sub>1</sub> *and*

$$\alpha\_2 = \alpha\_2 \left( f(t); v\_1, v\_2, v\_3 \right) = \frac{\kappa}{\left( \int\_{\mathbb{U}} f(t, u) \, \mathrm{d}u \right)^2} \,,$$

	- *(a)* ∫<sub>𝕌<sub>∗</sub></sub> *A*(*u*, *v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>) d*u* = *a*<sub>1</sub>/*κ*, *if v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub> ∈ 𝕌<sub>∗</sub>*;*
	- *(b)* ∫<sub>𝕌<sub>∗</sub></sub> *A*(*u*, *v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>) d*u* = 2*a*<sub>2</sub>/(3*κ*), *if v*<sub>*i*</sub> ∈ 𝕌<sup>∗</sup> *for some i* = 1, 2, 3*, and v*<sub>*j*</sub> ∈ 𝕌<sub>∗</sub> *for each j* = 1, 2, 3 *such that j* ≠ *i;*
	- *(c)* ∫<sub>𝕌<sub>∗</sub></sub> *A*(*u*, *v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>) d*u* = *a*<sub>3</sub>/(3*κ*), *if v*<sub>*i*</sub> ∈ 𝕌<sub>∗</sub> *for some i* = 1, 2, 3*, and v*<sub>*j*</sub> ∈ 𝕌<sup>∗</sup> *for each j* = 1, 2, 3 *such that j* ≠ *i;*
	- *(d)* ∫<sub>𝕌<sup>∗</sup></sub> *A*(*u*, *v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>) d*u* = *b*<sub>1</sub>/(3*κ*), *if v*<sub>*i*</sub> ∈ 𝕌<sup>∗</sup> *for some i* = 1, 2, 3*, and v*<sub>*j*</sub> ∈ 𝕌<sub>∗</sub> *for each j* = 1, 2, 3 *such that j* ≠ *i;*
	- *(e)* ∫<sub>𝕌<sup>∗</sup></sub> *A*(*u*, *v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>) d*u* = 2*b*<sub>2</sub>/(3*κ*), *if v*<sub>*i*</sub> ∈ 𝕌<sub>∗</sub> *for some i* = 1, 2, 3*, and v*<sub>*j*</sub> ∈ 𝕌<sup>∗</sup> *for each j* = 1, 2, 3 *such that j* ≠ *i;*
	- *(f)* ∫<sub>𝕌<sup>∗</sup></sub> *A*(*u*, *v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>) d*u* = *b*<sub>3</sub>/*κ*, *if v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub> ∈ 𝕌<sup>∗</sup>*.*

Assume now that the payoffs *a*<sub>1</sub>, *a*<sub>2</sub>, *a*<sub>3</sub>, *b*<sub>1</sub>, *b*<sub>2</sub>, *b*<sub>3</sub> (see Section 2) are such that the corresponding Equation (1) with Equation (5) results in solutions that exhibit diauxic growth (cf. [7]). Then the kinetic Equation (11) leads to diauxic growth of the quantities in (28) if Assumption 4 is satisfied. In fact, the following holds.

**Theorem 1.** *Let Assumption 4 be satisfied and f*<sub>0</sub> ∈ *L*<sub>1</sub>(𝕌) *be nonnegative and such that*

$$\int\_{\mathbb{U}\_\*} f\_0(u) \,\mathrm{d}u > 0 \,.$$

*Then, for any t* > 0*, there exists a unique solution f* = *f*(*t*) *of Equation* (11) *in L*<sub>1</sub>(𝕌)*. Moreover, it is possible to choose the payoffs a*<sub>1</sub>, *a*<sub>2</sub>, *a*<sub>3</sub>, *b*<sub>1</sub>, *b*<sub>2</sub>, *b*<sub>3</sub> *in such a way that the quantities in* (28) *given by the solution f* = *f*(*t*) *have a diauxic growth.*

**Proof.** It is standard to see that the operator defined by the right-hand side of Equation (11) is locally Lipschitz continuous in *L*<sub>1</sub>(𝕌). Hence a local-in-time solution *f* = *f*(*t*) exists in *L*<sub>1</sub>(𝕌) and it is unique. It is also standard that the solution preserves nonnegativity of the initial datum. We observe that *μ*(*t*) := ∫<sub>𝕌<sub>∗</sub></sub> *f*(*t*, *u*) d*u* and *ν*(*t*) := ∫<sub>𝕌<sup>∗</sup></sub> *f*(*t*, *u*) d*u* satisfy Equation (4) on the interval of time of existence of the solution.

Therefore *μ*(*t*)/(*μ*(*t*) + *ν*(*t*)) satisfies Equation (1) on the same time interval. By the form of Equation (4), we observe that any solution of Equation (4) must be bounded on any compact interval. This yields an *a priori* estimate of the *L*<sub>1</sub>(𝕌)-norm of the solution, so the local solution can be extended to any time interval, which concludes the proof.

**Remark 1.** *For simplicity, we assumed at the beginning that all payoffs were nonnegative. Assumption 4 can easily be modified to cover the case in which some of the payoffs are negative.*

#### **6. Concluding Remarks**

In this paper, we show that some mesoscopic models can produce diauxic behavior at the macroscopic level. In such a case, the macroscopic picture is more complex than the usual logistic-type one, which resembles the curve of the cumulative normal distribution function (and is thus related to the central limit theorem) and has only one inflection point. The paper should be understood as a first step towards describing the relationships between the mesoscopic and macroscopic scales, where new and interesting effects can appear. One may hypothesize that a complex but organized behavior at the micro-scale or meso-scale can lead to diauxic macroscopic growth. This, however, still requires a new mathematical framework.

**Author Contributions:** Conceptualization, M.L. and M.D.; methodology, M.L.; formal analysis, M.L. and M.D.; investigation, M.L. and M.D.; visualization, M.D.; supervision, M.L.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** M. Lachowicz was supported by the National Science Centre, Poland, Grant 2017/25/B/ST1/00051.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Entropy* Editorial Office E-mail: entropy@mdpi.com www.mdpi.com/journal/entropy

ISBN 978-3-0365-3486-2