**Differential Models, Numerical Simulations and Applications**

Editor **Gabriella Bretti**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Gabriella Bretti Istituto per le Applicazioni del Calcolo "M. Picone" Consiglio Nazionale delle Ricerche Italy

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Axioms* (ISSN 2075-1680) (available at: https://www.mdpi.com/journal/axioms/special_issues/differential_models_numerical_simulations).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-2299-9 (Hbk) ISBN 978-3-0365-2300-2 (PDF)**

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**



## **About the Editor**

**Gabriella Bretti** graduated in Mathematics and received a Ph.D. in Applied Mathematics from the University of Rome "La Sapienza", Italy. She is a permanent researcher at IAC-CNR in Rome. Her research interests are focused on modeling and numerical methods for nonlinear PDEs, flows in heterogeneous media, inverse problems, and differential models in mathematical biology, medicine and cultural heritage. In an interdisciplinary framework, her main research interest is the study of the available data coming from laboratory experiments or clinical measures in order to extract the most significant underlying features of the observed phenomena. Her expertise lies in the development of mathematical models and related simulation and optimization tools to describe and forecast the evolution of complex systems.

Research and industrial projects: she has been involved in many national projects and in technology transfer to Italian companies (Autovie Venete, OCTO Telematics) for the development of predictive software for managing vehicular traffic on Italian highways. She has been involved in Italian regional projects for the development of mathematical algorithms for the conservation of Cultural Heritage (CH), the ADAMO and SISMI DTC Latium projects. Currently, she is involved in the ESA project Pomerium for detecting pollution levels in the air and the effects of degradation on the constituent materials of the Colosseum and the Pyramid of Cestius.

Editorial activity: since September 2021, she has been a member of the Editorial Board of Axioms (MDPI) and a reviewer for many international journals of applied mathematics: https://publons.com/researcher/1306628/gabriella-bretti/.

Conference organization: she organized minisymposia for IMACS2018, SIMAI2018, SIAMGS21 and ICNAAM2021. She was the chair of the INdAM workshop MACH2021 and of the MCHBS 2021 virtual workshop.

Research production: she has authored more than 40 papers published in international journals in applied mathematics (https://www.researchgate.net/profile/Gabriella-Bretti) and has been invited to give communications at international conferences (https://orcid.org/0000-0001-5293-2115). Her research interests are focused on numerical methods for nonlinear PDEs, differential models in mathematical biology and flows in heterogeneous media, with application to the chemical damage of monuments for the conservation of cultural heritage. The highlighted topics call for interdisciplinary methodologies and techniques requiring the integration of mathematics, physics and chemistry.

### *Editorial* **Differential Models, Numerical Simulations and Applications**

**Gabriella Bretti**

Istituto per le Applicazioni del Calcolo "M. Picone", Consiglio Nazionale delle Ricerche, via dei Taurini 19, 00185 Rome, Italy; gab.bretti@gmail.com

**Abstract:** Differential models, numerical methods and computer simulations play a fundamental role in applied sciences. Since most of the differential models inspired by real-world applications have no analytical solutions, the development of numerical methods and efficient simulation algorithms plays a key role in the computation of the solutions to many relevant problems. Moreover, since the model parameters in mathematical models have interesting scientific interpretations and their values are often unknown, estimation techniques need to be developed for parameter identification against the measured data of observed phenomena. In this respect, this Special Issue collects some important developments in different areas of application.

**Keywords:** applied mathematics; numerical methods; computational mathematics; differential and integro-differential models; inverse problems

#### **1. Special Issue Overview**

The Special Issue contains 12 contributions, covering a range of methodologies and applications that can be summarized as follows:


#### *1.1. Numerical Methods, Simulations and Control for Particles Dynamics*

In [1], the authors developed a hybrid PDE–ODE mathematical model mimicking the mechanisms observed in cancer-on-chip experiments, where tumor cells are treated with chemotherapy drugs and secrete chemical signals into the environment, attracting multiple immune cell species. The in silico model proposed there goes towards the construction of a "digital twin" of the experimental immune cells and allows the reconstruction of the chemical gradients in the chip environment, in order to better understand the complex mechanisms of immunosurveillance. The development of a trustable simulation algorithm, able to reproduce the dynamics observed in the chip, requires an efficient tool for the calibration of the model parameters. In this respect, the paper represents a first methodological work to test the feasibility and the soundness of the proposed calibration technique, based on a multidimensional spline interpolation technique for the time-varying velocity field surfaces obtained from cell trajectories.

The authors in [2] studied a relaxation limit of the so-called aggregation equation with a pointy potential in one-dimensional space. The aggregation equation is today widely used to model the dynamics of a density of individuals attracting each other through a potential. When this potential is pointy, solutions are known to blow up in finite time; for this reason, measure-valued solutions have been defined. The convergence of this approximation was studied, and a rigorous estimate of the speed of convergence in one dimension with the Newtonian potential was obtained; moreover, the numerical discretization of this relaxation limit by uniformly accurate schemes was investigated.
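At the particle level, the attraction dynamics summarized above can be sketched as follows. This is a minimal illustration with the pointy Newtonian potential W(x) = |x| (so W'(x) = sign(x)) and arbitrary parameters, not the authors' uniformly accurate scheme:

```python
import numpy as np

def aggregation_step(x, dt):
    """One explicit Euler step of the particle form of the 1D aggregation
    equation, dX_i/dt = -(1/N) * sum_j W'(X_i - X_j), with the pointy
    Newtonian potential W(x) = |x|, so that W'(x) = sign(x)."""
    diff = x[:, None] - x[None, :]              # pairwise differences X_i - X_j
    vel = -np.sign(diff).sum(axis=1) / x.size   # net attraction toward the others
    return x + dt * vel

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
spread0 = x.max() - x.min()
mean0 = x.mean()                                # the center of mass is conserved
for _ in range(200):
    x = aggregation_step(x, dt=0.005)
spread = x.max() - x.min()                      # particles collapse into a cluster
```

Running this, the cloud of particles contracts toward a single point while its mean stays fixed, the discrete analogue of the finite-time blow-up of the density.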

**Citation:** Bretti, G. Differential Models, Numerical Simulations and Applications. *Axioms* **2021**, *10*, 260. https://doi.org/10.3390/axioms10040260

Received: 12 October 2021 Accepted: 13 October 2021 Published: 19 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

In [3], a Mean Field Games model where the dynamics of the agents is given by a controlled Langevin equation and the cost is quadratic was addressed. An appropriate change of variables transforms the Mean Field Games system into a system of two coupled kinetic Fokker–Planck equations; an existence result for the latter system is proved, which in turn yields a solution for the Mean Field Games system.

In [4], a tailored version of the Cellular Potts model, a grid-based stochastic approach where cell dynamics are established by a Metropolis algorithm for energy minimization, was developed. The proposed model allows the quantitative analysis of selected cell migratory determinants (e.g., the cell and nuclear speed and deformation, and forces acting at the nuclear membrane) for different experimental setups. Most of the numerical results show a remarkable agreement with the corresponding empirical data.
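The Metropolis energy-minimization step at the core of Potts-type models can be sketched in isolation. The example below applies it to a scalar toy energy rather than a lattice of cell spins, so everything except the acceptance rule is a simplifying assumption:

```python
import numpy as np

def metropolis_step(state, energy, proposal, T, rng):
    """One Metropolis step: propose a new configuration and accept it with
    probability min(1, exp(-(H_new - H_old) / T)), as in Potts-type models."""
    candidate = proposal(state, rng)
    dH = energy(candidate) - energy(state)
    if dH <= 0 or rng.random() < np.exp(-dH / T):
        return candidate
    return state

# Toy example: minimize H(x) = x^2 by small random perturbations.
rng = np.random.default_rng(1)
x = 5.0
for _ in range(2000):
    x = metropolis_step(x, energy=lambda s: s * s,
                        proposal=lambda s, r: s + r.normal(0.0, 0.2),
                        T=0.05, rng=rng)
```

In the full Cellular Potts model the "proposal" is a copy attempt of a lattice spin and the energy includes adhesion, volume and membrane terms; the acceptance rule is the same.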

#### *1.2. Modeling and Numerical Methods for Traffic and Manufacturing Problems*

In [5], two models describing the dynamics of heavy and light vehicles on a road network were introduced, taking into account the interactions between the two classes. Such models are tailored for two-lane highways where heavy vehicles cannot overtake. The first model couples two first-order macroscopic LWR models, while the second model couples a second-order microscopic follow-the-leader model with a first-order macroscopic LWR model. Numerical results show that both models are able to capture some second-order (inertial) phenomena such as stop-and-go waves. The models are calibrated by means of real data measured by fixed sensors placed along the A4 Italian highway Trieste–Venice and its branches, provided by Autovie Venete.
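The single-class LWR building block underlying both coupled models can be sketched with a Godunov scheme and the classical (concave) Greenshields flux; this is textbook material in normalized units, not the two-class schemes of [5]:

```python
import numpy as np

def godunov_flux(rl, rr):
    """Godunov numerical flux for the LWR model with the concave
    Greenshields flux f(rho) = rho * (1 - rho), normalized units."""
    f = lambda r: r * (1.0 - r)
    if rl <= rr:                 # minimum of f over [rl, rr]
        return min(f(rl), f(rr))
    if rl <= 0.5:                # f increasing on [rr, rl]
        return f(rl)
    if rr >= 0.5:                # f decreasing on [rr, rl]
        return f(rr)
    return f(0.5)                # maximum of f, attained at rho = 1/2

def lwr_step(rho, dt, dx):
    """One conservative update with outflow boundary conditions."""
    ext = np.concatenate(([rho[0]], rho, [rho[-1]]))
    flux = np.array([godunov_flux(ext[i], ext[i + 1])
                     for i in range(ext.size - 1)])
    return rho - dt / dx * (flux[1:] - flux[:-1])

# Riemann problem 0.2 | 0.8: the two states have equal flux (0.16),
# so the resulting shock is stationary.
dx, dt = 0.01, 0.004             # CFL: dt/dx * max|f'| = 0.4 <= 1
rho = np.where(np.arange(100) < 50, 0.2, 0.8)
for _ in range(100):
    rho = lwr_step(rho, dt, dx)
```

The scheme is monotone under the CFL condition, so the density stays in [0, 1] and the shock is resolved without spurious oscillations.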

The authors of [6] use empirical traffic data collected from three locations in Europe and the US to reveal a three-phase fundamental diagram with two phases located in the uncongested regime. Model-based clustering, hypothesis testing and regression analyses are applied to the speed–flow–occupancy relationship represented in the three-dimensional space to rigorously validate the three phases and identify their gaps. Accordingly, a three-phase macroscopic traffic-flow model and a characterization of solutions to the Riemann problems are proposed. In this work, critical structures in the fundamental diagram that are typically ignored in first- and higher-order models are identified, which could significantly impact travel-time estimation on highways.

In [7], the input-to-state stability (ISS) of an equilibrium for a scalar conservation law with nonlocal velocity and measurement error arising in a highly re-entrant manufacturing system was studied. A numerical discretization of the scalar conservation law with nonlocal velocity and measurement error was introduced and a suitable discrete Lyapunov function was analyzed to provide ISS of a discrete equilibrium for the proposed numerical approximation.

#### *1.3. Inverse Problems for Biomedical Applications*

In [8], a new framework for optimal design was developed by introducing new protocols for estimating soft tissue parameters in biaxial experiments. The framework is based on the information-theoretic measures of mutual information and conditional mutual information, and a combination of the two is proposed. In particular, the information gain about the parameters from the experiment is taken as the key criterion to be maximized and is used directly for optimal design. Information gain is computed through k-nearest-neighbor algorithms applied to the joint samples of the parameters and measurements produced by the forward and observation models. For biaxial experiments, the results show that low angles have a relatively low information content compared to high angles. The results also show that a smaller number of suitably combined angles can result in higher information gains than a larger number of poorly combined angles.
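The idea of ranking measurements by a k-nearest-neighbor information-gain estimate can be illustrated with scikit-learn's Kraskov-style estimator; this is a stand-in for the paper's estimator, with a made-up scalar parameter and two synthetic "measurement channels":

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
theta = rng.uniform(0.0, 1.0, size=n)               # parameter samples (prior)
y_informative = theta + 0.05 * rng.normal(size=n)   # low-noise measurement
y_noisy = theta + 2.00 * rng.normal(size=n)         # high-noise measurement

# k-NN mutual information estimates (in nats) between each measurement
# channel and the parameter; a higher value means a more informative design.
mi = mutual_info_regression(np.column_stack([y_informative, y_noisy]),
                            theta, n_neighbors=3, random_state=0)
```

An optimal-design loop would evaluate such estimates over candidate experimental configurations (e.g., angle combinations) and keep the one maximizing the information gain.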

The authors of [9] study the problem of functional connectivity by quantifying the statistical dependencies among time series describing the activity of different neural sources, reconstructed from the magnetic field recorded in a magnetoencephalography (MEG) exam. This problem can be addressed by utilizing connectivity measures whose computation in the frequency domain often relies on the evaluation of the cross-power spectrum of the neural time series, estimated by solving the MEG inverse problem. Recent studies have focused on the optimal determination of the cross-power spectrum in the framework of regularization theory for ill-posed inverse problems, providing indications that, rather surprisingly, the regularization process that leads to the optimal estimate of the neural activity does not lead to the optimal estimate of the corresponding functional connectivity. Along these lines, the present paper utilizes synthetic time series, simulating the neural activity recorded by an MEG device, to show that the regularization of the cross-power spectrum depends on the spectral complexity of the neural activity.
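As background on the central quantity, the cross-power spectrum of two time series can be estimated with SciPy's Welch-averaged `csd`; the two "sources" below are synthetic signals sharing a 10 Hz rhythm, not MEG reconstructions:

```python
import numpy as np
from scipy.signal import csd

fs = 256.0                                   # sampling frequency (Hz)
t = np.arange(0.0, 8.0, 1.0 / fs)
rng = np.random.default_rng(0)
shared = np.sin(2 * np.pi * 10.0 * t)        # common 10 Hz rhythm
x = shared + 0.5 * rng.normal(size=t.size)   # two noisy "source" time series
y = shared + 0.5 * rng.normal(size=t.size)

# Welch-averaged cross-power spectrum; its magnitude peaks at the
# frequency where the two signals share power.
f, pxy = csd(x, y, fs=fs, nperseg=512)
peak_freq = f[np.argmax(np.abs(pxy))]
```

In the MEG setting the inputs would be regularized source estimates, and the question studied in [9] is how the regularization parameter should be tuned for this spectral quantity rather than for the activity itself.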

#### *1.4. Theoretical Study and Numerical Solutions for Integro-Differential Equations*

In [10], a prey–predator system with logistic growth of prey and hunting cooperation of predators is studied. The introduction of fractional time derivatives and the related persistent memory strongly characterizes the model behavior, as many dynamical systems in the applied sciences are well described by such fractional-order models. Mathematical analysis and numerical simulations are performed to highlight the characteristics of the proposed model. The existence, uniqueness and boundedness of solutions are proved; the stability of the coexistence equilibrium and the occurrence of a Hopf bifurcation are investigated. Some numerical approximations of the solution are finally considered; the obtained trajectories confirm the theoretical findings.
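One standard way to discretize such fractional-order systems is the explicit Grünwald–Letnikov scheme sketched below; it is not necessarily the method used in [10], and the prey–predator right-hand side and parameter values here are illustrative only:

```python
import numpy as np

def gl_weights(alpha, n):
    """Grunwald-Letnikov coefficients c_j = (-1)^j * binom(alpha, j),
    via the standard recursion c_j = (1 - (1 + alpha)/j) * c_{j-1}."""
    c = np.empty(n + 1)
    c[0] = 1.0
    for j in range(1, n + 1):
        c[j] = (1.0 - (1.0 + alpha) / j) * c[j - 1]
    return c

def solve_fractional(f, u0, alpha, h, steps):
    """Explicit GL scheme for D^alpha u = f(u):
    u_n = h^alpha * f(u_{n-1}) - sum_{j=1..n} c_j * u_{n-j}.
    For alpha = 1 this reduces exactly to the forward Euler method."""
    c = gl_weights(alpha, steps)
    u = np.empty((steps + 1, len(u0)))
    u[0] = u0
    for n in range(1, steps + 1):
        hist = sum(c[j] * u[n - j] for j in range(1, n + 1))  # memory term
        u[n] = h ** alpha * f(u[n - 1]) - hist
    return u

# Illustrative prey-predator right-hand side (logistic prey, simple
# predation); parameters are made up for the demo, not taken from [10].
def rhs(u):
    x, y = u
    return np.array([x * (1.0 - x) - x * y, x * y - 0.5 * y])

traj = solve_fractional(rhs, np.array([0.5, 0.3]), alpha=0.9, h=0.02, steps=500)
```

The growing memory sum is exactly the "persistent memory" mentioned above: every new state depends on the whole history of the trajectory, not only on the previous step.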

The work in [11] is devoted to the study of dynamical models, such as the Rothamsted Carbon (RothC) model, used to predict the long-term behavior of soil carbon content for the achievement of land degradation neutrality, measured in terms of Soil Organic Carbon (SOC), a key indicator of land degradation. Indeed, a reduction in the SOC stock of soil results in degradation, and it may also have potential negative effects on soil-derived ecosystem services. In this paper, continuous and discrete versions of the RothC model were compared, especially with respect to long-term solutions. The original discrete formulation of the RothC model was then compared with a novel nonstandard integrator that represents an alternative to the exponential Rosenbrock–Euler approach in the literature.
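RothC has a linear compartment structure, and for a linear autonomous system dc/dt = Ac + b the exponential (Rosenbrock–)Euler integrator reduces to the exact exponential step. The toy sketch below uses a made-up 2×2 matrix A and forcing b, not RothC's actual pools or rate constants:

```python
import numpy as np
from scipy.linalg import expm

# Toy linear "carbon pool" model dc/dt = A c + b; A (Hurwitz) and b are
# illustrative values standing in for RothC's compartment structure.
A = np.array([[-0.8, 0.1],
              [0.3, -0.2]])
b = np.array([0.5, 0.0])

def exponential_euler_step(c, h):
    """Exact step for the linear system:
    c_{n+1} = e^{Ah} c_n + A^{-1} (e^{Ah} - I) b."""
    E = expm(A * h)
    return E @ c + np.linalg.solve(A, (E - np.eye(A.shape[0])) @ b)

c = np.array([1.0, 1.0])
for _ in range(50):                      # march toward the long-term state
    c = exponential_euler_step(c, h=1.0)
steady = -np.linalg.solve(A, b)          # equilibrium: A c* + b = 0
```

Because each step is exact for the linear problem, the long-term solution is reached without the step-size restrictions a standard explicit scheme would impose; the nonstandard integrator of [11] is compared against precisely this kind of baseline.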

The authors in [12] studied the asymptotic behavior of the numerical solution to Volterra integral equations. In particular, a technique based on an appropriate splitting of the kernel is introduced, which allows one to obtain vanishing asymptotic (transient) behavior in the numerical solution, consistent with the properties of the analytical solution, without having to impose restrictions on the integration steplength.
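For context, a direct trapezoidal solver for a second-kind Volterra equation u(t) = g(t) + ∫₀ᵗ K(t, s) u(s) ds can be sketched as below; it does not implement the kernel splitting of [12], only the baseline discretization whose asymptotics that paper improves:

```python
import numpy as np

def solve_volterra(g, K, h, steps):
    """Trapezoidal method for the second-kind Volterra equation
    u(t) = g(t) + int_0^t K(t, s) u(s) ds on a uniform grid."""
    t = np.arange(steps + 1) * h
    u = np.empty(steps + 1)
    u[0] = g(t[0])
    for n in range(1, steps + 1):
        w = np.full(n + 1, h)            # trapezoid weights on [0, t_n]
        w[0] = w[n] = h / 2.0
        hist = sum(w[j] * K(t[n], t[j]) * u[j] for j in range(n))
        # the j = n term is implicit, so solve for u_n:
        u[n] = (g(t[n]) + hist) / (1.0 - w[n] * K(t[n], t[n]))
    return t, u

# Sanity check: u(t) = 1 + int_0^t u(s) ds has the exact solution u = e^t.
t, u = solve_volterra(g=lambda t: 1.0, K=lambda t, s: 1.0, h=0.01, steps=100)
```

With a decaying kernel, the long-time behavior of such a scheme can deviate from the analytical transient unless the steplength is restricted; splitting the kernel as in [12] removes that restriction.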

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **Estimation Algorithm for a Hybrid PDE–ODE Model Inspired by Immunocompetent Cancer-on-Chip Experiment**

**Gabriella Bretti <sup>1,</sup>\*, Adele De Ninno <sup>2</sup>, Roberto Natalini <sup>1</sup>, Daniele Peri <sup>1</sup> and Nicole Roselli <sup>3</sup>**

\* Correspondence: gabriella.bretti@cnr.it

**Abstract:** The present work is motivated by the development of a mathematical model mimicking the mechanisms observed in lab-on-chip experiments, designed to reproduce the in vivo reality on microfluidic chips. Here we consider a Cancer-on-Chip experiment in which tumor cells are treated with a chemotherapy drug and secrete chemical signals into the environment, attracting multiple immune cell species. The in silico model proposed here goes towards the construction of a "digital twin" of the experimental immune cells in the chip environment, to better understand the complex mechanisms of immunosurveillance. To this aim, we develop a tumor-immune microfluidic hybrid PDE–ODE model to describe the concentration of chemicals in the Cancer-on-Chip environment and the migration of immune cells. The development of a trustable simulation algorithm, able to reproduce the immunocompetent dynamics observed in the chip, requires an efficient tool for the calibration of the model parameters. In this respect, the present paper represents a first methodological work to test the feasibility and the soundness of the calibration technique proposed here, based on a multidimensional spline interpolation technique for the time-varying velocity field surfaces obtained from cell trajectories.

**Keywords:** differential equations; mathematical biology; cell migration; microfluidic chip

**MSC:** 65M06; 92B05; 92C17; 82C22

#### **1. Introduction**

Recruitment of immune cells to a tumor is a key parameter in cancer prognosis and response to therapy, and the complex relationship between cellular components, noncellular components and secreted chemotactic factors plays an essential role in directing the migration of both activating and suppressive immune cell types. In recent years, with the development of the highly multidisciplinary Organ-on-Chip (OOC) field [1,2], microfluidic technologies have been employed as valuable in vitro platform tools to build tumor microenvironments with a modular degree of complexity, and to visualize and quantify immune infiltration in response to anticancer therapies.

The work that set the stage for Organs-on-Chip was published in 2010 [3]; since then, OOC has constituted a dynamic field of research, and substantial effort has been devoted to creating realistic mimics of different organs [4–7].

The coupling with live-cell imaging may enable extraction of single-cell tracking profiles which can be processed with advanced mathematical tools. In this context, the present study is inspired by the modeling of the complex mechanisms behind the cell dynamics and interactions between immune (ICs) and treated tumor (TCs) cells in microfluidic chips. In this framework, several studies [1,7,8] were conducted on immunocompetent cancer-on-chips to assess the effects of therapeutic drugs on TCs and on the possible reactions of the immune system. One of the first studies addresses the role of the formyl peptide receptor 1/annexin A1 axis in the anti-tumor response to anthracycline-based chemotherapy [7]. Timelapse recordings were performed in a microfluidic platform designed for oncoimmunology research [1] to assess physical and chemical contacts between malignant and immune populations. Some of the earliest applications of microfluidic cell culture technology focused on modeling specific steps in the cancer cascade, including tumor growth [9] and expansion [10], angiogenesis [11–16], progression from early- to late-stage lesions involving an epithelial–mesenchymal transition [17,18], tumor cell invasion [19] and metastasis [20]. Although Organs-on-Chip represent a field of research of increasing importance, because it allows experimentalists to exert greater control over the objects of investigation, resulting in more accurate measurements, some aspects still remain difficult to understand. In particular, the average chemical gradients present in the chip environment are difficult to quantify.

**Citation:** Bretti, G.; De Ninno, A.; Natalini, R.; Peri, D.; Roselli, N. Estimation Algorithm for a Hybrid PDE–ODE Model Inspired by Immunocompetent Cancer-on-Chip Experiment. *Axioms* **2021**, *10*, 243. https://doi.org/10.3390/axioms10040243

Academic Editor: Bin Han

Received: 13 June 2021 Accepted: 22 September 2021 Published: 28 September 2021

Motivated by these laboratory experiments, we present an in silico mathematical model to describe IC migration in the case of an efficient interaction with TCs treated with a chemotherapeutic drug.

The main goal of this work is to build a bridge between experimental data coming from Cancer-on-Chip experiments, given as particle trajectories, and macroscopic mathematical models, such as the one proposed in reference [21], in order to gain further insight into the dynamics and short-range interactions between cells. Indeed, the present work goes in the direction of the mean-field limit, in order to unveil the statistical properties of the model. Recently, one of the authors of the present paper rigorously derived, in Wasserstein-type topologies, the mean-field limit (and propagation of chaos) to the Vlasov-type equation, in the framework of generalizations of the kinetic model given by the Cucker–Smale dynamical system; see reference [22]. Here we propose a simulation algorithm based on a hybrid macroscopic–microscopic chemotaxis model, able to reproduce the main features of the phenomena observed in the microfluidic chip, such as the migration of ICs and their short-range interactions with the cancer cells, which represent the sources of the chemical gradients.

Existing methods in the literature deal with the description of particle trajectories and are essentially represented by statistics on cell trajectories, as in the work by Agliari et al. [23]. Another possibility, already applied in other papers dealing with particle trajectories (see for instance reference [24]), is to compute the distance between immune cells and tumor cells across time. However, we stress that our main concern here is to show the feasibility of the proposed approach and to create a connection with macroscopic modeling of the same problem, in order to see what happens in the limit (corresponding to the situation in which millions of cells move in the environment). Our final aim is indeed to be able, in the future, to simulate the behavior of immune cells in the tissue of an organ. In this framework, a novel strategy for the estimation of model parameters is introduced, to perform the validation against real data and the calibration of the mathematical model. Such a strategy involves, on the one hand, the analysis of IC trajectories coming from experimental data, to realistically describe the average speed and the pathways of immune cells, and, on the other hand, the creation of a synthetic dataset for evaluating the effectiveness of the calibration technique for the estimation of model parameters. Moreover, in order to determine the velocity distribution of ICs, a stochastic component is obtained by analyzing real trajectories of cells falling in the area under examination, and is then added to the deterministic velocity field.

The modeling of dynamics and interactions in microfluidic chip environment was already studied in [18,25–27] with particle models and with cellular automata models in [28]. In [21] a fully macroscopic mathematical (PDE) model was considered, both for the chemical gradient which is seen as an average field and for the density of immune cells. With the PDE model we were able to describe long-range interactions in the chip environment and a first estimate of the chemical gradient driving ICs movement was obtained in [29].

Here, instead, we are interested in describing efficient short-range interactions between treated tumor cells and immune cells. As an example of this phenomenon, we considered the study and the experimental setting in [7], where tumor cells treated with a chemotherapy drug release a chemical stimulus sensed by healthy immune cells, thus promoting their migration towards the tumor cells. For the construction of the model, we used a *discrete in continuous* approach in which we coupled a reaction-diffusion partial differential equation (PDE), describing the evolution of the average substances released by the TCs in the tumor microenvironment, with a particle model, where every IC in the system was considered to be a single entity provided with specific properties, as previously done in [30] for a different experiment. Here the hybrid approach proved crucial in providing a proper description of the multiscale phenomenon, which a traditional macroscopic model would not be able to fully decipher. In particular, our model describes the dynamics of each IC by means of ordinary differential equations (ODEs), in such a way that every cell can be followed individually. Since the motion of ICs is driven by chemotaxis, the migratory activity is regulated by a chemotactic term, which allows them to sense the gradient of the chemicals in the tumor neighborhood, and by specific forces, such as adhesion–repulsion, which arise between cells.

The calibration of the mathematical model is supported by the development of an ad hoc parameter estimation procedure based on an interpolation technique for the approximation of the velocity field of ICs across time, i.e., the multidimensional spline method previously introduced in [31] and described in Section 4.1. Such a procedure, applied to a synthetic cell trajectory dataset, provides accurate estimates of the unknown model parameters and proves to be a promising approach for the validation of the proposed model against real data.

#### *1.1. The Geometry of the Microfluidic Chip and the Related Computational Domain*

The immune-oncology chip designed for the experiment consists of three main culture chambers for plating adherent TCs and floating ICs, connected by a bridge of microcapillaries allowing chemical and physical contacts. A picture of the device is shown in the two panels of Figure 1. Regarding the dimensions, the capillaries have a width of 12 μm and a length of 500 μm, while their height is 10 μm; however, since the video footage of the experiment is recorded at a fixed height, the third spatial dimension is neglected in our framework.

**Figure 1.** Microfluidic chip environment. On the (**left**): real microphotograph of microfluidic devices filled with food dye and schematics of the 2D layout with three main culture chambers connected by arrays of microchannels. On the (**right**): Timelapse video frame with detected immune cells in the left and intermediate chambers surrounded, respectively, by red and green circles. Credit: Vacchelli et al. (2015) edited by AAAS.

The cross-sectional dimensions of culture chambers are 1 mm (width) × 100 μm (height). Further details about the chip design are illustrated in [7]. We also underline that here we are considering a 2D culture in the liquid with dying cancer cells adherent to the glass slide and mainly static, and immune cells floating.

In the experimental set-up, the timelapse observation area comprises the left TCs culture chamber and the central one, loaded with ICs. Our goal consists of modeling the migration of ICs towards the TCs, taking into account the different forces acting on the cells. To achieve such a result, we neglected the migration of ICs in the right chamber and through the microchannels, and focused only on the motility patterns of the ICs infiltrated in the left TCs chamber.

Moreover, in order to better analyze the short-range dynamics and interactions between tumor and immune cells, we restricted our study to a subarea of the left chamber with length and height equal to 200 μm. This area corresponds to the subset [400, 600] × [200, 400] μm, and we define it as Ω = [0, *Lx*] × [0, *Ly*]. In Figure 2 we present the setting of the experiment at time *t* = 24 h, with a focus on the area under analysis. The main reasons for considering only a portion of the chip are the following:


Moreover, Ω contains four treated TCs, and this makes the area a good representative of cell dynamics, since in the laboratory experiment there are about 60 TCs in the entire chamber.

**Figure 2.** Timelapse pre-processed image of microfluidic chip environment at time t = 24 h. (**A**) Full chip, in red the domain Ω; (**B**) Focus on the area Ω.

#### *1.2. Original Contribution of the Present Paper*

To describe such complex mechanisms with a multiscale nature, the model here proposed belongs to the category of hybrid models, involving the coupling between macroscopic and microscopic models.

As mentioned above, we assume the presence of forces between cells, which are established due to the chemoattractant and due to the nature of the cells themselves. For this reason, in the current work, we refer to the discrete–continuous model (1)–(2) proposed in [30] to describe the morphogenesis of the posterior lateral line system in zebrafish. Such a model includes a classical reaction-diffusion partial differential equation (PDE), used to describe the evolution of the concentration of chemicals released by TCs in the tumor milieu, coupled with ordinary differential equations (ODEs) for the inter-cellular dynamics of ICs, which arise both from the presence of the chemoattractant and from the forces generated between the cells. In particular, the use of a particle model makes it possible to highlight the role of single cells in space, to analyze short-range interactions and to attribute to cell behavior specific characteristics that a macroscopic model would not allow. As mentioned in Section 1.1, for computational reasons and for the datatype at our disposal, namely cell tracks extracted from video recorded at a fixed level on the z-axis, we neglect the third dimension and consider only the 2D case. The third dimension could easily be taken into account by the model, but we do not expect important changes in the overall dynamics.

Here, chemical signals are described as an average field by means of a reaction-diffusion equation

$$
\partial_t f = D \Delta f + S(t, \mathbf{Y}, f) \tag{1}
$$

with a possible source or degradation term given by *S*. Equation (1) was endowed with Robin boundary conditions to describe the exchange of chemicals between the internal and the external environment. The field *f*, through its gradient, influences the evolution of cell positions according to a second-order equation of the form:

$$
\ddot{\mathbf{X}}_i = \mathbf{F}(t, \mathbf{X}, \dot{\mathbf{X}}, \mathbf{Y}, f, \nabla f) - \mu \dot{\mathbf{X}}_i \tag{2}
$$

where $\mathbf{X}_i$, $i = 1, \dots, N_{tot,I}$, is the position vector of the $i$-th IC, $N_{tot,I}$ is the total number of ICs, $\mathbf{X} = (\mathbf{X}_1, \dots, \mathbf{X}_{N_{tot,I}})$ contains all the positions, and $\dot{\mathbf{X}} = (\dot{\mathbf{X}}_1, \dots, \dot{\mathbf{X}}_{N_{tot,I}})$ the velocities. The vector $\mathbf{Y} = (\mathbf{Y}_1, \dots, \mathbf{Y}_{N_{tot,T}})$ stands for the TCs positions, which are taken as constants in the problem, since dying TCs do not migrate as time evolves. The function $\mathbf{F}$ includes several effects: from the detection of the chemical signal $f$ (chemotaxis) to mutual interactions between ICs (adhesion and repulsion) and between ICs and TCs (adhesion and repulsion). All these effects take into account a non-local sensing radius. The term $\mu \dot{\mathbf{X}}_i$ represents damping due to cell adhesion to the substrate.

We remark that the use of second-order equations to describe cell motion was first introduced in the seminal work [32], in the framework of Newton's equations of motion for particles. Indeed, even if in this case the acceleration is not high, this formulation captures the effects of the presence of an external force causing changes in velocity and direction. Moreover, the friction term for cells immersed in a fluid is also taken into account. In order to have a more complete model, able to represent the randomness characterizing cell motility, we also introduced a stochastic component in the velocity field in the spirit of the Langevin model; see the review [33] and references therein.
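A minimal, hypothetical sketch of how Equation (2) can be advanced in time is given below, keeping only a chemotactic force and the damping term (the adhesion–repulsion forces, the stochastic component, the chemical field and all parameter values are made up for illustration, not taken from the model):

```python
import numpy as np

def cell_step(x, v, grad_f, chi, mu, dt):
    """Explicit Euler step for a reduced form of the second-order cell law
    (Equation (2)): dX/dt = V, dV/dt = chi * grad f(X) - mu * V.
    Adhesion-repulsion and stochastic terms are omitted."""
    a = chi * grad_f(x) - mu * v
    return x + dt * v, v + dt * a

# Hypothetical chemoattractant f(x) = exp(-|x - source|^2) / 2, with a
# single "tumor cell" source at the origin; grad f = -(x - source) * exp(...).
source = np.array([0.0, 0.0])
def grad_f(x):
    d = x - source
    return -d * np.exp(-np.dot(d, d))

x, v = np.array([1.5, 0.0]), np.zeros(2)   # an IC starting away from the TC
d0 = np.linalg.norm(x - source)
for _ in range(4000):                      # integrate up to t = 40
    x, v = cell_step(x, v, grad_f, chi=1.0, mu=2.0, dt=0.01)
d1 = np.linalg.norm(x - source)            # the cell drifts up the gradient
```

With the damping dominating, the velocity relaxes to roughly χ∇f/μ, and the cell migrates toward the source of the chemical, which is the qualitative behavior the full model reproduces for ICs attracted by treated TCs.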

In this context, in order to better reproduce the phenomenon under consideration, such as the presence of the chemical stimulus of treated cancer cells, we applied the following modifications with respect to the model in [30]:


The equations are solved in a square domain Ω, introduced in Section 1.1, a subset of the chamber where the experiment was performed. The approximation was carried out using a classical central difference scheme in space and the Crank–Nicolson scheme in time, although in Appendix A we present the general form of the scheme, based on the *θ*-method.
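As an illustration, a *θ*-method update of the kind mentioned above can be sketched as follows (a one-dimensional Python sketch with homogeneous Neumann boundaries, not the actual 2D implementation; grid size, time step and boundary treatment are placeholders):

```python
import numpy as np

def theta_step(f, D, dx, dt, theta, source):
    """One theta-method step for f_t = D f_xx + source (1D sketch).

    theta = 0.5 gives Crank-Nicolson; theta = 1 gives backward Euler.
    Homogeneous Neumann boundaries are imposed by mirroring the ghost
    nodes in the discrete (central difference) Laplacian."""
    n = f.size
    L = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    L[0, 1] = 2.0       # ghost node f_{-1} = f_1
    L[-1, -2] = 2.0     # ghost node f_{n} = f_{n-2}
    L *= D / dx**2
    A = np.eye(n) - theta * dt * L
    b = (np.eye(n) + (1.0 - theta) * dt * L) @ f + dt * source
    return np.linalg.solve(A, b)
```

With *θ* = 1/2 this reduces to the Crank–Nicolson scheme used in the simulations; the 2D case follows the same pattern with a five-point Laplacian.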

The main novelty of our approach here is mainly represented by:


In particular, such a calibration algorithm, based on the mathematical model, is built to test the possibility of validating the model on real data provided by experimentalists. The main goal of the present work is indeed the introduction of a robust procedure for estimating model parameters from the paths taken by ICs in the chip subarea under examination. This is not an easy task, since to succeed in the model calibration we need to extract a common behavior from the time-varying trajectories of immune cells located at different points of the examined area. For this reason, in this first work we propose a calibration strategy that takes this variability into account, applying it to a synthetic dataset in order to assess the effectiveness of our methodology.

We underline that the synthetic dataset of IC pathways was created by qualitatively reproducing the real trajectories extracted from the video footage of the experiment, in order to have a "realistic" dataset. In more detail, a set of trajectories of cells sharing an average behavior was produced by suitably tuning model parameters so that the upper and lower bounds of cell speeds match the experimentally observed ones. From these synthetic trajectories, we computed the IC velocity field at every observation time as a surface produced by the model itself, assuming a single population of immune cells showing an average behavior. This velocity field was then approximated using a multidimensional spline interpolation technique, described in Section 4.1, to smooth the dataset in time and space before comparison.
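The spline smoothing step can be sketched as follows (an illustrative Python stand-in using a smoothing bivariate spline on synthetic data; the grid, the noise level and the smoothing factor `s` are hypothetical, not the values of Section 4.1):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Hypothetical speed samples on a grid: a smooth bump plus noise,
# standing in for the model-generated velocity surface.
x = np.linspace(0.0, 100.0, 30)          # placeholder grid (um)
y = np.linspace(0.0, 100.0, 30)
X, Y = np.meshgrid(x, y, indexing="ij")
rng = np.random.default_rng(0)
speed = (np.exp(-((X - 50.0)**2 + (Y - 50.0)**2) / 800.0)
         + 0.05 * rng.standard_normal(X.shape))

# Smoothing bivariate spline: s > 0 trades fidelity for smoothness.
spl = RectBivariateSpline(x, y, speed, s=2.0)
smooth = spl(x, y)                        # evaluate back on the grid
```

The same construction, applied frame by frame, yields a velocity surface that is smooth in space; smoothing in time can be obtained analogously along the frame index.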

Finally, this average field was used as the target solution in the calibration procedure, as described in detail in Section 4.2. The error quantification is based on the minimization of a cost functional depending on the difference, evaluated at every time step, between the target velocity field of the cells in the whole domain and the velocity field obtained for another choice of the model parameters during the optimization procedure, which is performed with a local search method. In addition, we also studied the introduction of further terms in the functional, with the aim of improving the optimization results and showing the soundness of our algorithm, which represents a starting point for investigations on experimental data in the near future.
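The structure of such a calibration loop can be illustrated with a toy example (a hypothetical two-parameter "velocity" curve and a derivative-free local search; the actual functional of Section 4.2 compares full velocity fields at every time step):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the model output: a scalar "velocity"
# curve v(t; p) = p0 * exp(-p1 * t). p_true generates the synthetic
# target, playing the role of the parameters behind the synthetic dataset.
t = np.linspace(0.0, 10.0, 50)
p_true = np.array([2.0, 0.3])
v_target = p_true[0] * np.exp(-p_true[1] * t)

def cost(p):
    # Squared mismatch between model and target, summed over observation times.
    v_model = p[0] * np.exp(-p[1] * t)
    return np.sum((v_model - v_target) ** 2)

# Derivative-free local search started from a perturbed initial guess.
res = minimize(cost, x0=[1.0, 0.1], method="Nelder-Mead")
```

A local search of this kind recovers the generating parameters when the initial guess lies in the basin of attraction of the minimum, which is the situation targeted by the synthetic-data study.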

#### *1.3. Main Contents and Plan of the Paper*

We introduce a hybrid model composed of a reaction–diffusion equation, describing the time and space evolution of signaling substances released by treated TCs in a subarea of the main chamber, and of an agent-based model made up of second order differential equations, one for each IC. The boundary conditions associated with the partial differential equation are of Robin type, so that we can control the amount of chemical exchanged with the surrounding microenvironment. In addition, based on experimental observation, an ad hoc parameter estimation technique was developed. To summarize, the mathematical issues faced in this study are the following:


The plan of the paper is as follows. In Section 2 we describe the biological framework that inspired our study, introduce the mathematical formulation of biologically inspired models, and present the adopted model. Section 3 is devoted to the modeling study on the scenarios suggested by the observation of the dynamics in the laboratory experiment, performed with the numerical simulation algorithm reported in the Appendix. Section 4 contains the parameter estimation techniques developed here and the results obtained with the calibration algorithm. A sensitivity analysis is also performed and reported in the Appendix. Finally, in Section 5 we discuss the presented results and the future aims of our work.

#### **2. Materials and Methods**

#### *2.1. Biological Framework*

The OOC technology allows the design and recreation of more sophisticated in vitro cellular microenvironments under physiological or pathological scenarios, making these devices potential candidates for better prediction of human responses than animal testing. This success is also due to compatibility with a variety of microscopy techniques, including live-cell high-content imaging, and to the wide variety of cells and tissues that the chips can host, see [1].

One of the most challenging scenarios for the application of these devices is represented by cancer–immune relations, due to the very complex and still not completely understood signaling modalities between ICs and TCs. Some attempts have been presented with the aim of modeling the effect of drugs on TCs, see for instance [34,35], where microfluidic devices helped to control and evaluate the magnitude of cancer responses.

In particular, here we refer to the study by Vacchelli et al. [7], where timelapse imaging of microfluidic co-cultures was performed to investigate the motility patterns and crosstalk between ICs and TCs in the context of chemotherapy-induced anticancer immune responses.

#### Setting of the Laboratory Experiments

Details on the in vitro microfluidic experiments, the type of cells involved and the loading procedures can be found in [7,36]. Briefly, human breast cancer cells, previously treated with anthracycline-based chemotherapy, were cultivated in the left chamber. This treatment triggers the process of immunogenic cell death [37], during which TCs release danger signals. The right chamber is loaded with unlabeled human peripheral blood mononuclear cells (PBMCs) from healthy donors (with normal expression of FPR1). PBMCs start migrating, biased by the detection of the chemical signal produced by TCs, and after crossing the microchannels come into contact with dying cancer cells, engaging in stable interactions that lead to TC killing events.

In the chip, the chosen culture medium is neutral, meaning that no exogenous substance is introduced. Timelapse imaging was performed using a microscope placed directly inside the CO2 incubator for the whole duration of the recordings. Images were taken every 2 min over a period of 72 h of migration. Immune cell tracks (<400) were extracted in the left chamber (time interval: 24–48 h) using the TrackMate plugin (ref) available in the Fiji/ImageJ software (https://imagej.nih.gov/ij/, accessed on 1 August 2020).

Our aim consists of designing a basic mathematical model able to capture the main aspects related to the migratory movement of the ICs with respect to TCs. Therefore, we only focus on the case of efficient interaction between dying cancer cells exposed to ICD inducers and immune cells from healthy donors. Concurrently, we want to develop an algorithmic calibration procedure to derive model parameter values as close to empirical observations as possible.

#### *2.2. Mathematical Framework*

In this context, it is important to remark that recent years have seen increasing interest in developing techniques to combine experimental data and mathematical models, in order to produce systems, i.e., in silico models, whose solutions are as close as possible to the experimental outcomes. Indeed, the success of informed models is mainly due to consistent improvements in the computational capabilities of machines and in imaging techniques, which have allowed wider access to data.

The involvement of the immune system in all stages of the tumor life cycle, including prevention, maintenance and response to therapy, is now recognized as central to understanding cancer development from a systemic point of view. The increasing availability of experimental data and of treatment options has provided fertile ground for constructing ever more precise models. From a mathematical point of view, equations of different nature can be considered, depending on the type of analysis to be carried out. In macroscopic models, consisting of PDEs, any reference to the single constituents, the cells, is neglected, and macroscopic quantities such as the average cell densities are considered; see for instance the classical models [38,39] and their application to the immunocompetent microfluidic chip experiments in the recent work [21], allowing the simulation of long-range dynamics of immune cells driven by the chemical gradients secreted by cancer cells in the environment.

The evolution of chemical stimuli in the tumor microenvironment can also be described by means of ODE models: the work [25], consisting of three ODEs that model the dynamics of effector cells, tumor cells and the cytokine IL-2, is one of the first to describe tumor–immune cell interactions. Another model in the same direction is due to Lee et al. [18], where changes in velocity and directionality of ICs are modeled with a system of ODEs considering the combined effects of the interstitial flow, the tumor mass, the cytokines IL-8 and CCL-2, and the related receptors. Moreover, ODE models describing the heterogeneity of cell responses to death ligands (cytotoxic drugs), observed in laboratory experiments with single-cell techniques, are proposed in [40]. In [28] the immune system is depicted as a network in the framework of cellular automata, where the nodes are molecules and cells and the arcs connecting them are the influences each node has on the others. Another kind of modeling consists of combining the *macro* approach, expressed by a PDE, with the *micro* approach, expressed by an ODE.

Hybrid models, coupling the macroscopic and microscopic scales, have been studied in depth in recent years, see [41–44]. In particular, in [41] a hybrid discrete–continuous model of metastatic cancer cell migration was developed, while in [45] a hybrid discrete–continuum approach was applied to model Turing pattern formation. In [43] a 3D agent-based model is proposed, using an off-lattice approach to simulate vasculogenesis. For a comprehensive literature review of hybrid models, see the book chapter [46].

Here, in the framework of hybrid deterministic PDE–ODE models, we propose a *discrete–continuous* approach previously applied in [30], where immune cell motion is governed by ODEs and the diffusion of the chemoattractant released by cancer cells in the environment is described by a PDE. The present work is complementary to the fully macroscopic approach proposed in [21], since it allows zooming in on immune system dynamics. Moreover, we enrich the equations of cell motion with a Brownian component to reproduce the zigzag pathways of ICs.

#### The Model

Here we develop a model that mimics cell migration under the effects of chemicals released by TCs on immune system dynamics. The considered TCs are undergoing apoptosis, during which they release damage-associated molecular patterns together with tumor antigens that elicit an immune response. For the sake of simplicity, the alarm concentrations are not differentiated, and we denote them by the chemoattractant *f*, whose evolution is described by a PDE. At the cellular level, the model is discrete and includes the equation of motion of each IC. We also underline that since we refer to an experiment on treated cancer cells, according to the experimental setting we do not need to include TC migration or duplication in our model. Let us summarize the main ingredients which compose the model.

For the IC motion we use a second order dynamic equation, which takes into account the forces acting on the cells arising from the presence of the chemical signals and from the mechanical interactions between cells. The effect of cancer is described by a chemotactic term produced by the gradient of *f*. The cell–cell mechanical interactions due to filopodia consist of a radial attraction and repulsion depending on the relative positions of the cells; see [47] for experimental results in this direction. To describe this effect we refer to the mechanism introduced in [48]. The action of ICs with respect to TCs is described by a radial repulsion term, depending on cell positions. Finally, we introduce a damping term, proportional to the velocities, which models cell adhesion to the substrate [49–51].

For the concentration of the chemoattractant, we use a diffusion equation including a source term, given by the chemoattractant released by TCs, and a natural degradation term.

In the current context, the species under examination are TCs and ICs, but we underline that the setting can be made more complex by introducing a greater number of cell species and an exogenous substance in the environment. Furthermore, in the present framework we do not take into account the killing activity of ICs towards TCs, as we did in [21], since we do not have quantitative data regarding the killing rate of immune cells in the chip environment. However, in the future this feature will also be included in the model.

Let **X***i* be the position of the *i*-th IC and *f*(**x**, *t*) the total chemoattractant concentration; the following equations are introduced:

$$\partial_t f = D \Delta f + \xi F_4(\mathbf{Y}) - \eta f \tag{3}$$

$$\ddot{\mathbf{X}}_i = \gamma F_1(\chi(f)\nabla f) + F_2(\mathbf{X}_i, \mathbf{Y}) + F_3(\mathbf{X}) - \mu \dot{\mathbf{X}}_i \tag{4}$$

where *γ*, *μ*, *ξ* and *η* are given constants and *Fn*(·), *n* = 1, 2, 3, 4 are suitable functions.

#### Function *F*<sup>1</sup>

The term *F*<sup>1</sup> is the chemotactic term; it is related to the detection of a chemical signal by the *i*-th cell in its neighborhood, and it is taken to be a weighted average over a ball of radius *R*¯ centered in **X***i*:

$$F_1(\mathbf{g}(\mathbf{x},t)) = \frac{1}{\mathcal{W}} \int_{\mathbf{B}(\mathbf{X}_i, \bar{R})} \mathbf{g}(\mathbf{x}, t)\, w_i(\mathbf{x})\, d\mathbf{x} \tag{5}$$

where

$$\mathbf{B}(\mathbf{X}_i, \bar{R}) := \{ \mathbf{x} : ||\mathbf{x} - \mathbf{X}_i|| \le \bar{R} \},$$

|| · || being the Euclidean norm,

$$w\_i(\mathbf{x}) = \begin{cases} 2\exp\left(-||\mathbf{x} - \mathbf{X}\_i||^2 \frac{\log 2}{\bar{R}^2}\right), & \text{if } ||\mathbf{x} - \mathbf{X}\_i|| \le \bar{R}, \\ 0, & \text{otherwise,} \end{cases} \tag{6}$$

is a truncated Gaussian weight function, and

$$\mathcal{W} := \int_{\mathbf{B}(\mathbf{X}_i, \bar{R})} w_i(\mathbf{x})\, d\mathbf{x},$$

independently of *i*. A similar definition holds for the vector quantity *F*1.

In addition, the function *F*<sup>1</sup> contains the gradient of the chemical substance and a chemotaxis function *χ*(*f*), representing the chemotactic sensitivity of ICs. Such a chemotactic function, suggested by Lapidus and Schiller (1976) [52], has the form:

$$\chi(f) = \frac{k\_1}{(k\_2 + f)^2},\tag{7}$$

where *k*<sup>1</sup> represents the cellular drift velocity, while *k*<sup>2</sup> is the receptor dissociation constant, which indicates how many molecules are necessary to bind the receptors. Such a function is known as a receptor saturation function and has the effect of reducing the chemotaxis of cells in areas of high chemoattractant concentration. In the integral, *R*¯ will be chosen larger than the IC radius *RI*, so that relation (5) describes a chemical signal that is sensed more at the center of the cell and less at the edge of the cell extensions.
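A discrete version of the weighted average (5), with the truncated Gaussian weights (6) and the sensitivity (7), can be sketched as follows (an illustrative Python sketch on a uniform grid with midpoint quadrature; the grid spacing and parameter values in the usage are placeholders):

```python
import numpy as np

def chemotactic_force(f, dx, Xi, Rbar, k1, k2):
    """Discrete version of (5)-(7): weighted average of chi(f) * grad f
    over the ball B(Xi, Rbar), with truncated Gaussian weights (6).

    f is a 2D array sampled on a uniform grid with spacing dx;
    Xi is the cell center in physical coordinates."""
    gy, gx = np.gradient(f, dx)               # grad f on the grid
    chi = k1 / (k2 + f) ** 2                  # receptor saturation (7)
    ny, nx = f.shape
    X, Y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx)
    r2 = (X - Xi[0]) ** 2 + (Y - Xi[1]) ** 2
    # Truncated Gaussian weight (6), zero outside the sensing ball.
    w = np.where(r2 <= Rbar ** 2,
                 2.0 * np.exp(-r2 * np.log(2.0) / Rbar ** 2), 0.0)
    W = w.sum() * dx ** 2                     # normalization constant
    Fx = (chi * gx * w).sum() * dx ** 2 / W
    Fy = (chi * gy * w).sum() * dx ** 2 / W
    return np.array([Fx, Fy])
```

For a concentration increasing in one direction, the resulting force points up the gradient, as expected from (4).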

#### Function *F*<sup>2</sup>

Function **F**<sup>2</sup> includes a repulsion effect between TCs and ICs. In particular, repulsion occurs when the distance between the centers of two cells is less than *R*1, where *R*<sup>1</sup> is defined as the sum of the two radii, *RI* and *RT*, of the immune and tumor cells, respectively. With this formulation, repulsion occurs when an IC and a TC start to effectively overlap. We assume

$$\mathbf{F}_2(\mathbf{X}_i, \mathbf{Y}) = \sum_{j:\,\mathbf{Y}_j \in \mathbf{B}(\mathbf{X}_i, R_2)} \mathbf{K}(\mathbf{Y}_j - \mathbf{X}_i), \tag{8}$$

where the function **K** depends on the relative positions **Y***<sup>j</sup>* − **X***i*, namely:

$$\mathbf{K}(\mathbf{Y}_j - \mathbf{X}_i) = -\omega_{repTI} \left( \frac{1}{||\mathbf{Y}_j - \mathbf{X}_i||} - \frac{1}{R_1} \right) \frac{\mathbf{Y}_j - \mathbf{X}_i}{||\mathbf{Y}_j - \mathbf{X}_i||}, \quad \text{if } ||\mathbf{Y}_j - \mathbf{X}_i|| \le R_1, \tag{9}$$

where *ωrepTI* is a constant. The quantity *R*<sup>2</sup> > *R*<sup>1</sup> indicates the radius of the ball centered in **X***i* in which the centers of the tumor cells can fall. Thus, we consider all the tumor cells in proximity of the center of an immune cell.

#### Function *F*<sup>3</sup>

Function **F**<sup>3</sup> includes adhesion–repulsion effects between ICs. In particular, repulsion occurs when the distance between the centers of two ICs is less than *R*3, and takes into account the effects of a possible cell deformation. Conversely, adhesion occurs at a distance greater than *R*<sup>3</sup> and less than *R*<sup>4</sup> > *R*3, and it is due to a mechanical interaction between cells via filopodia. We assume

$$\mathbf{F}_3(\mathbf{X}) = \sum_{j:\,\mathbf{X}_j \in \mathbf{B}(\mathbf{X}_i, R_4)} \mathbf{K}(\mathbf{X}_j - \mathbf{X}_i), \tag{10}$$

where the function **K** depends on the relative positions **X***<sup>j</sup>* − **X***i*, namely:

$$\mathbf{K}(\mathbf{X}_j - \mathbf{X}_i) = \begin{cases} -\omega_{rep} \left( \dfrac{1}{||\mathbf{X}_j - \mathbf{X}_i||} - \dfrac{1}{R_3} \right) \dfrac{\mathbf{X}_j - \mathbf{X}_i}{||\mathbf{X}_j - \mathbf{X}_i||}, & \text{if } ||\mathbf{X}_j - \mathbf{X}_i|| \le R_3, \quad (11) \\[2ex] \omega_{adh} \left( ||\mathbf{X}_j - \mathbf{X}_i|| - R_3 \right) \dfrac{\mathbf{X}_j - \mathbf{X}_i}{||\mathbf{X}_j - \mathbf{X}_i||}, & \text{if } R_3 < ||\mathbf{X}_j - \mathbf{X}_i|| \le R_4, \quad (12) \end{cases}$$

where *ωrep*, *ωadh* are constants. We will choose *R*<sup>3</sup> = 2*RI*, so that the repulsion occurs when two cells start to effectively overlap. Please note that function (11) gives a repulsion which goes as 1/*r*, *r* being the distance between the centers of two cells, as can be found in [53,54]. The function (12) is Hooke's law of elasticity. The last term in Equation (4) is due to cell adhesion to the substrate (see for example [49–51]).
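The pairwise kernel defined by (11) and (12) can be written compactly as follows (an illustrative Python sketch; the radii and weights in the usage are placeholders):

```python
import numpy as np

def interaction_kernel(Xj, Xi, R3, R4, w_rep, w_adh):
    """Adhesion-repulsion kernel of (11)-(12): 1/r-type repulsion below
    R3, Hooke-type adhesion between R3 and R4, zero beyond R4."""
    d = Xj - Xi
    r = np.linalg.norm(d)
    if r == 0.0 or r > R4:
        return np.zeros_like(d)
    u = d / r                                  # unit vector from cell i to cell j
    if r <= R3:
        return -w_rep * (1.0 / r - 1.0 / R3) * u   # relation (11)
    return w_adh * (r - R3) * u                    # relation (12)
```

The sign convention follows the text: below *R*3 the force pushes cell *i* away from its neighbor, while between *R*3 and *R*4 it pulls the cells together.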

#### Friction

The term *μ***X**˙*i* in the equation of motion is due to cell adhesion to the substrate, with damping coefficient equal to *μ*.

#### Function *F*<sup>4</sup>

In the diffusion equation, only cancer cells are responsible for the production of the chemoattractant, so that

$$F_4(\mathbf{Y}) = \sum_{j=1}^{N_{tot,c}} \chi_{\mathbf{B}(\mathbf{Y}_j, R_T)}, \tag{13}$$

where *Ntot*,*c* is the total number of cancer cells, and

$$\chi_{\mathbf{B}(\mathbf{Y}_j, R_T)}(\mathbf{x}) = \begin{cases} 1, & \text{if } \mathbf{x} \in \mathbf{B}(\mathbf{Y}_j, R_T), \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

In the previous formula *RT* is the radius of a cancer cell, considering that the source of chemoattractant is defined by the dimension of a single cell.
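On the computational grid, the source term (13)–(14) amounts to summing indicator functions of balls centered at the TC positions (an illustrative sketch; the grid and cell positions below are hypothetical):

```python
import numpy as np

def chemo_source(xs, ys, Y_centers, RT):
    """Discrete source (13)-(14): indicator of the ball of radius RT
    around each TC center, summed over the cancer cells so that
    overlapping balls add up, as in the sum of (13)."""
    X, Yg = np.meshgrid(xs, ys)
    S = np.zeros_like(X, dtype=float)
    for cy in Y_centers:
        S += ((X - cy[0]) ** 2 + (Yg - cy[1]) ** 2 <= RT ** 2).astype(float)
    return S
```

This array multiplies the constant *ξ* in the right-hand side of Equation (3) at every time step.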

#### Initial Conditions

Initial data for Equation (4) are given by the position and velocity of each IC:

$$\mathbf{X}_i(0) = \mathbf{X}_{i,0}, \quad \dot{\mathbf{X}}_i(0) = \mathbf{V}_{i,0}, \quad i = 1, \dots, N_{tot,I}.$$

Moreover, since TCs do not migrate and maintain their initial position **Y** during the whole time, the initial data for Equation (3) is provided by the chemoattractant produced by TCs at time *t* = 0:

$$f(\mathbf{x},0) = F\_4(\mathbf{Y}).$$

#### Boundary Conditions

Now, let Ω = [0, *Lx*] × [0, *Ly*] be our domain; for the chemoattractant we require the inhomogeneous Robin boundary condition:

$$D\frac{\partial f}{\partial \mathbf{n}} + af = b, \quad \text{on } \partial\Omega, \tag{15}$$

where the datum *b* plays a role analogous to that of an inhomogeneous Neumann boundary condition and regulates the exchange with the external environment.

The system of Equations (3) and (4) can be now rewritten as:

$$\begin{cases} \partial_t f = D \Delta f + \xi \sum\limits_{j=1}^{N_{tot,c}} \chi_{\mathbf{B}(\mathbf{Y}_j, R_T)} - \eta f, \\[2ex] \ddot{\mathbf{X}}_i = \dfrac{\gamma}{\mathcal{W}} \int_{\mathbf{B}(\mathbf{X}_i, \bar{R})} \chi(f(\mathbf{x}, t)) \nabla f(\mathbf{x}, t)\, w_i(\mathbf{x})\, d\mathbf{x} + \sum\limits_{j:\,\mathbf{Y}_j \in \mathbf{B}(\mathbf{X}_i, R_2)} \mathbf{K}(\mathbf{Y}_j - \mathbf{X}_i) \\[1ex] \qquad\quad + \sum\limits_{j:\,\mathbf{X}_j \in \mathbf{B}(\mathbf{X}_i, R_4)} \mathbf{K}(\mathbf{X}_j - \mathbf{X}_i) - \mu \dot{\mathbf{X}}_i. \end{cases} \tag{16}$$
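For the particle part of system (16), a possible time-stepping sketch treats the damping implicitly (a semi-implicit Euler sketch, not the scheme of Appendix A; the force evaluation, which gathers the chemotactic and cell–cell terms, is abstracted into a callable):

```python
import numpy as np

def cell_step(X, V, total_force, mu, dt):
    """Semi-implicit Euler step for the cell equations in (16),
    written as the first-order system V' = F - mu*V, X' = V.

    `total_force` maps the position array to the summed chemotactic
    and cell-cell forces; treating the damping term implicitly keeps
    the update stable for stiff values of mu."""
    F = total_force(X)
    V_new = (V + dt * F) / (1.0 + dt * mu)   # implicit damping
    X_new = X + dt * V_new
    return X_new, V_new
```

In the force-free case the velocities simply decay at rate *μ*, which is the expected behavior of the damping term.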

#### *2.3. Stochastic Model*

The model (16) is composed of deterministic equations, but a stochastic version of Equation (4) can be formulated. In fact, in recent years, several studies have shown that ICs exhibit an intermittent motion composed of a walking phase and a zigzag phase. The walk is characterized by pause steps between the run steps, while during the zigzag phase, cells tend to turn away from their last turn direction and prefer to move forward in a zigzag manner [27,55].

This characteristic walk is described here by Brownian motion [56] as a first approach to the problem, which proves effective in reproducing the randomness of cell trajectories. The stochastic equation for IC motion is:

$$\ddot{\mathbf{X}}_i = \gamma F_1(\chi(f)\nabla f) + F_2(\mathbf{X}_i, \mathbf{Y}) + F_3(\mathbf{X}) - \mu(\dot{\mathbf{X}}_i - \sigma\psi). \tag{17}$$

Equation (17) contains the stochastic contribution, where *ψ*(*t*) is a Gaussian white noise and *σ* is the standard deviation of IC trajectories. With this formulation, ICs are subjected not only to mechanical forces such as adhesion or repulsion, but also to random factors that might be related to unknown cell mechanisms. Some estimates of the parameter *σ* based on experimental data are provided in Section 3.1.3.
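A standard way to discretize (17) is the Euler–Maruyama method, in which the white noise *ψ* contributes an increment proportional to √Δ*t* (an illustrative sketch of the velocity update only; the *μσ* coupling follows the form of (17)):

```python
import numpy as np

def stochastic_velocity_step(V, F, mu, sigma, dt, rng):
    """Euler-Maruyama update of the velocity in (17): the white noise
    psi contributes mu * sigma * sqrt(dt) * xi with xi ~ N(0, I), so
    that sigma = 0 recovers the deterministic damped dynamics."""
    xi = rng.standard_normal(V.shape)
    return V + dt * (F - mu * V) + mu * sigma * np.sqrt(dt) * xi
```

Setting *σ* = 0 recovers the deterministic update, which provides a simple consistency check between the two model versions.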

#### **3. Study on Different Scenarios: Numerical Tests**

In this section, we look at some of the different scenarios the model can produce. We remark that the numerical simulations are performed in MATLAB©. The computational time for a simulation on the total number of frames *Tf* = 681 (*N*Δ*t* = 680) is about 260 s on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz.

#### *3.1. Scenarios Representing Relevant Features of ICs Dynamics and Interactions*

One of the motivations of the present study is to show the capabilities of the proposed hybrid model in describing a Cancer-on-Chip experiment. To this aim, we explore the dynamics in three significant situations, represented by the scenarios proposed here. During apoptosis, TCs release alarm substances, which remain localized in proximity of the cells and become *attractors* for ICs. In the absence of an incoming chemical flow through the boundary, if the immune cells fall in the basin of attraction of the tumor cells, they would remain trapped in their proximity for all time. From the experiments, however, we observed that ICs also migrate towards the bottom (where a reservoir of TCs was located) and towards the left side of Ω, due to the presence of other cells in the surrounding microenvironment, generating a diagonal motion, as illustrated in Figure 2.

To allow ICs migration towards the sides of the domain, we added an incoming chemical flow through the boundaries, which caused changes in ICs orientation.

Moreover, to have control over the changes in orientation and direction of the migratory activity, it is necessary to find proper parameter values attaining a balance between the internal chemical concentration due to TCs and the concentration present at the boundary. For instance, if the chemotactic concentration at the boundary is higher than the internal concentration, ICs will sense the resulting gradient and move towards the boundary; in the opposite case, ICs will not sense the gradient at the boundary. Thus, a balance between the internal and the boundary concentrations must be reached to have an IC attracted both by the TCs and by the concentration on the sides.

For the numerical simulations we consider the domain Ω where four TCs are present (see Figure 2). The numerical positions of the TCs and the initial concentration of the chemoattractant released by the TCs are shown in Figure 3. We numbered the TCs from 1 to 4 to distinguish them.

**Figure 3.** TCs locations (**left**) and TCs chemoattractant (**right**) at time *t* = 0. The grid dimensions are in μm.

In our preliminary study, we also changed the number of TCs and their positions, with the aim of assessing the effect of their presence and their localization on the overall dynamics of ICs.

We choose the coefficients *a* and *b* in the Robin boundary condition (15) in such a way as to have an incoming flux through the sides *y* = 0, *x* = *Lx* and *x* = 0.

The laboratory experiment recording immune–cancer cell interactions has an overall duration of 24 h, but after data analysis we identified the presence of 97 ICs crossing the area in a time interval of 681 frames, which corresponds to 1360 min (22 h 40 min) of video recording. To provide an idea of how the IC dynamics changes in time, we divided the time interval into six equal parts, so that we can take a picture of IC positions at six different instants *Tk* for *k* = 0, ..., 5 (*T*<sup>0</sup> = 1 frame, *T*<sup>1</sup> = 136 frames (4 h 32 min), *T*<sup>2</sup> = 272 frames (9 h 4 min), *T*<sup>3</sup> = 408 frames (13 h 36 min), *T*<sup>4</sup> = 544 frames (18 h 8 min), *T*<sup>5</sup> = 681 frames (22 h 40 min)).

The initial conditions for Equation (4) are taken as the initial positions and speeds of the experimental cells.

In the following we consider three different scenarios that describe the interactions between TCs and ICs in a subarea Ω of the chip. Specifically, we consider:


The first scenario is assumed to be a prototypical case, where the reciprocal mechanical forces between cells and the chemotactic stimuli are the only forces guiding IC migration; the second case also takes into account the possible effects of TC death on IC motion; and the third case contemplates the addition of the stochastic component of Equation (A11), which modifies the deterministic IC trajectories.

#### 3.1.1. Scenario 1: Deterministic Motion

In Figure 4 we show the final concentration of the chemoattractant, plotted as a 2D surface with a focus on the contour lines (left), and at the boundary of the domain and in correspondence with the centers of the TCs, as a function of the longitudinal (center) and transverse (right) distance. Cutting the surface in this way highlights the maximum concentrations related to each TC.

**Figure 4.** Final Concentration. (**Left**) Concentration plotted in 2D, with contour lines. (**Center**) Concentration profiles as a function of the longitudinal distance. (**Right**) Concentration profiles as a function of the transverse distance. The grid dimensions are in μm.

In Figure 5 the time evolution of the migratory activity of ICs at six different times is depicted. At the initial time, only one IC enters the domain, while at time *T*<sup>1</sup> the number of ICs increases, and they are directed towards the tumor. As time goes on, most of the ICs approach the TCs and stay nearby, accumulating around them, while the others, guided by the inflow of chemical signal, move towards the left and bottom boundaries of the domain.

#### 3.1.2. Scenario 2: Deterministic Motion including Cell Death

The biological experiment inspiring our work is related to the phase of immunogenic cell death, during which the TCs, previously treated with a drug, release alarm molecules sensed by the immune system. The latter reacts by attacking the tumor.

In the current scenario we focus on the description of the effects of cell death on the IC dynamics. We remark that in this preliminary study we did not insert a specific term in the model to describe the death of TCs due to ICs, but we directly *turned off* some TCs. Of course, this represents a simplification of what happens in reality, but it is applied here for illustrative purposes. However, in the near future we will include in our model the death of cancer cells as a consequence of the killing activity of ICs, similar to the modeling proposed in [21].

**Figure 5.** ICs dynamics photographed at six consecutive times. ICs meet all TCs and move towards the left and the right sides of Ω. The grid dimensions are in μm.

Please note that in this simplified setting we let dead TCs physically remain in the domain; thus the repulsion effects on ICs are still active, but dead TCs no longer release alarm molecules. This causes a decrease of the internal concentration, due to the absence of the previous sources. In the following, we turn off the third TC during the time interval [*T*2, *T*3], and then we also turn off the fourth TC in the time interval [*T*3, *T*4]. Figure 4 shows the level of the initial concentration (interval [*T*0, *T*2]), while Figure 6 presents the evolution of the concentration after cell death. Comparing Figure 6 with Figure 4, we can observe how the overall concentration decreases after the two cells are killed.

Figure 7 shows the IC dynamics. At time *T*<sup>2</sup> both the third and the fourth cells are approached by ICs, while at time *T*<sup>3</sup> ICs start moving away from the third cell, which was previously killed and no longer releases chemicals. Between times *T*<sup>3</sup> and *T*<sup>4</sup> the fourth cell is also turned off, and from time *T*<sup>4</sup> onwards it is not approached anymore. The effect of the death of some tumor cells (cells 3 and 4 depicted in Figure 3) on the IC dynamics is depicted in Figure 7. Please note that in the present case a different behavior with respect to the one depicted in Figure 5 is observed, since here the accumulation around dead cancer cells does not occur, as expected.

**Figure 6.** Chemoattractant concentration in the time interval [*T*2, *T*3] (**Top**) and at the final time (**Bottom**). (**A1**,**B1**) Concentration plotted in 2D; concentration profiles as a function of the longitudinal (**A2**,**B2**) and transverse (**A3**,**B3**) distance.

**Figure 7.** ICs dynamics photographed at six consecutive times. Cell 3 dies during the interval [*T*2, *T*3], while cell 4 dies during the interval [*T*3, *T*4]. The grid dimensions are in μm.

3.1.3. Scenario 3: Stochastic Motion

Several chemical signals are produced at the sites of dying TCs and then diffuse into the surrounding environment. ICs sense these chemoattractants and move in the direction where their concentration is greatest, thereby locating the source of the chemoattractants and the associated targets. If on the one hand the deterministic model allows us to control the direction of motion of the ICs by acting on the concentration of chemicals, on the other hand the complex migratory activity of ICs is better described by a stochastic model, which takes into account the variability of cells on top of the preset deterministic mechanisms. For this reason, this last scenario seems the most suitable to qualitatively describe the real problem.

**Estimates of standard deviations.** Here we preliminarily determine an appropriate value for the standard deviation *σ* in Equation (17). To this end, we consider the trajectory *Pi* of the *i*-th IC, for *i* = 1, ..., *Ntot*,*I*, and its smoothed counterpart *Si*, obtained with a moving average (the Matlab® *smooth* function), and we compute the variance of each IC trajectory:

$$
\sigma\_i^2 = \frac{1}{T\_i} \sum\_{k=1}^{T\_i} \left( P\_i^k - S\_i^k \right)^2,\tag{18}
$$

where *Ti* is the number of frames the *i*-th IC spends in the domain Ω, and we obtain the sequence Σ = {*σ*<sup>2</sup><sub>*i*</sub>}, *i* = 1, ..., *Ntot*,*I*. Successively, we average the values of Σ and divide by 120 s, which corresponds to the time interval between two consecutive frames:

$$
\sigma^2 = \left(\frac{1}{N\_{\text{tot},I}} \sum\_{i=1}^{N\_{\text{tot},I}} \sigma\_i^2\right) / 120,\tag{19}
$$

obtaining a variance per unit time. As a last step, we take the square root of (19) to obtain the standard deviation *σ* of the real trajectories. From the analysis conducted on the total number of ICs, we observed cells with variances very far from the average, probably because of the presence of different cell species within the group of ICs. We remark that the immune cell population is heterogeneous, since it is composed of T-lymphocytes, monocytes and dendritic cells; however, a clear distinction among cell species is currently not possible. For this reason, in order to obtain homogeneous data, we neglect these cell trajectories in the computation of the standard deviation. Indeed, our mathematical model aims at describing an average behavior, and therefore the trajectories whose variance is too far from the average are discarded.

In the presence of more data, we will try to classify cells based on their pathways (length, stops, tortuosity, etc.). This will be the subject of a future study.

The cleaned standard deviations computed from experimental trajectories are reported in Table 1.
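A minimal sketch of the estimate (18)–(19) in Python, assuming trajectories are given as arrays of frame-by-frame positions; the moving-average window and the outlier threshold below are illustrative choices, not the values used in the paper:

```python
import numpy as np

def estimate_sigma(trajectories, frame_dt=120.0, window=5, outlier_factor=3.0):
    """Estimate the standard deviation sigma of Eq. (17) from IC trajectories.

    trajectories: list of (T_i, 2) arrays, one per immune cell (positions in um).
    frame_dt: seconds between two consecutive video frames (120 s here).
    window: moving-average window (analogue of the Matlab 'smooth' call);
            edge effects of the convolution are ignored in this sketch.
    outlier_factor: trajectories whose variance exceeds this multiple of the
            mean variance are discarded (heterogeneous cell species).
    """
    kernel = np.ones(window) / window
    variances = []
    for P in trajectories:
        # moving-average smoothing S_i of the path P_i, then Eq. (18)
        S = np.column_stack([np.convolve(P[:, d], kernel, mode="same")
                             for d in range(2)])
        variances.append(np.mean(np.sum((P - S) ** 2, axis=1)))
    variances = np.asarray(variances)
    # data cleaning: drop cells whose variance is far from the average
    keep = variances <= outlier_factor * variances.mean()
    # Eq. (19): average the retained variances, rescale by the frame interval
    sigma2 = variances[keep].mean() / frame_dt
    return np.sqrt(sigma2)
```

The square root of the rescaled average variance is returned directly, matching the last step described above.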

**ICs dynamics.** Figure 8 captures ICs at six different times, while Figure 4 shows the evolution of the chemoattractant. Differences from the deterministic dynamics shown in Figure 5 can be highlighted: in the stochastic case, ICs are more spread out.

The computations for this case are performed with a finer time spacing of Δ*t* = 10 s, so that the model better describes the complex mechanisms actually happening to cells at smaller time scales; the plots are then taken every 2 min, which is the timeframe of the video recordings.
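The stochastic update at this finer time step can be sketched with an Euler–Maruyama-type step; here `drift` stands in for the deterministic right-hand side of (4), and placing the noise on the positions is an assumption of this sketch, not necessarily the exact form of (17):

```python
import numpy as np

def euler_maruyama_step(X, V, drift, sigma, dt=10.0, rng=None):
    """One step of the stochastic IC dynamics (illustrative sketch).

    X, V: (N, 2) arrays of IC positions and velocities.
    drift: callable (X, V) -> (N, 2) deterministic acceleration, Eq. (4).
    sigma: standard deviation estimated via Eqs. (18)-(19).
    """
    rng = np.random.default_rng() if rng is None else rng
    dW = rng.normal(scale=np.sqrt(dt), size=X.shape)  # Brownian increments
    V_new = V + dt * drift(X, V)                      # explicit velocity update
    X_new = X + dt * V_new + sigma * dW               # position update + noise
    return X_new, V_new
```

With `sigma = 0` the step reduces to the deterministic scheme, which makes the comparison between the ODE and SDE trajectories straightforward.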

As an example, in order to evaluate the effects of the stochastic component on the dynamics and establish a qualitative comparison with the deterministic model, we depict the trajectories of six randomly selected cells. In Figure 9 we take the starting point of a given experimental cell and show the trajectories obtained with the ODE model (blue) and the SDE model (red), while the corresponding trajectory extracted from the video footage is plotted in black. As can be observed, the deterministic trajectories are smooth and tend, in most cases, to stop at the TCs, showing that once the immune cells have fallen into the basin of attraction of the tumor, they tend to be trapped there. The stochastic trajectories, instead, behave more similarly to the real ones, showing that after some time spent near the tumor, the immune cells move to the boundary.

**Figure 8.** Brownian motion. ICs dynamics photographed at six consecutive times. ICs meet all TCs and move towards the left and the right sides of Ω. The grid dimensions are in μm.

So far the Brownian motion yields satisfactory results, as shown in Figure 9, but we do not exclude, in the near future, dealing with a more homogeneous and richer dataset to better identify the probability distribution and then apply a general Lévy walk.

**Figure 9.** ICs Dynamics. (Blue) Deterministic Motion. (Red) Brownian Motion. (Black) Experimentally observed motion.


**Table 1.** Estimates of physical parameter values.

#### **4. Parameter Estimation on Synthetic Data**

In this Section we present the calibration algorithm for the estimation of some model parameters and the results obtained with it. All the model parameters are reported in Table 1. We underline that some of them are given by the experimentalists, since they are taken from the laboratory experiment (size of the cropped area, number of TCs and ICs across time, standard deviation of cell pathways). Other parameters, such as the diffusion coefficient of the chemoattractant, the cellular drift velocity, the receptor dissociation constant or the radii of cells, are taken from the literature. Of the model parameters whose values still need to be assigned, we fixed most of them, such as the radii of action between cells or the adhesion/repulsion coefficients among cells, setting them in a phenomenological way through the observation of their effects on the overall dynamics.

Then, in order to reduce the computational cost of the optimization procedure, we apply our calibration procedure only to 4 model parameters, namely the coefficient of the chemotactic effect *γ* and the chemical inflows at the left, right and bottom boundaries, *b*1, *b*2, *b*4, since we qualitatively observed a great effect of these parameters on the overall dynamics; as a simplification of the model, we assume *b*3 = 0. In particular, the coefficient *γ* affects the speed of ICs, while *b*1, *b*2, *b*4 affect the directionality of the immune cells.

We underline that the parameter estimation procedure described here is applied to the deterministic version of the model, i.e., (3) and (4). An extension of this procedure to calibrate the stochastic model (3)–(17) will be included in a future work.

In this methodological study we want to show the feasibility and applicability of such an estimation technique for some significant parameters of the hybrid model describing the Cancer-on-Chip experiment. Our final goal is indeed not the exact fitting of model parameters, but to show that we are able to qualitatively reproduce the dynamics observed experimentally, even though the quantification of the chemical gradients in the environment is currently impossible for biologists in this kind of experiment. In the near future, we will complement this methodology with macroscopic models derived from it, in order to simulate immunocompetent behavior in organs and tissues, where millions of cells are present. To this aim, we applied the proposed procedure to a synthetic dataset produced by the model itself—but strongly inspired by the experimental one—to assess the soundness of this strategy. In the future, we aim at gathering more data so as to apply the proposed strategy to a real biological dataset.

#### **Data preparation.**

To succeed in the calibration of the model parameters, we need to extract a common behavior from the time-varying trajectories of immune cells located at different points of the observed area. Thus, here we propose a calibration algorithm that takes this variability into account by constructing a "realistic" synthetic dataset of ICs pathways, which qualitatively reproduces the real trajectories extracted from the video footage of the experiment. In particular, a set of trajectories of cells sharing an average behavior has been produced by suitably tuning the model parameters, with upper and lower bounds on cell speeds taken from the experimentally observed ones. Then, the ICs velocity field, given as a 2D surface, is computed by the model itself at every observation time of 2 min (the video footage timeframe). It is worth noting that this strategy implicitly assumes a single population of immune cells showing an average behavior.

The velocity field is then approximated using the multidimensional spline interpolation technique described in Section 4.1, in order to smooth the compared datasets in time and space.

#### **Main steps of the calibration algorithm.**

The calibration algorithm applied to estimate crucial model parameters can be summarized as follows:


The results obtained with the procedure above are reported in Section 4.3.

#### *4.1. Multidimensional Interpolation*

The position and speed of the ICs, variable in time, provide punctual information about the velocity field in which the ICs are immersed. As a consequence, they can be considered to be training points for the calibration of an interpolation algorithm able to provide information over the full space. Here we consider time as the third dimension, *x* and *y* being the first two. As the interpolation scheme we use a multidimensional spline, described in [31].

Each training point is supposed to influence the value of the interpolating function over a (limited) portion of the space in its proximity. The degree of influence of the *i*-th training point, *wi*(*x*, *y*, *t*), is a function of time and space, driven by a compact-support radial basis function decaying to zero outside the influence area. The interpolated value is then obtained as a weighted sum of the values at the training points:

$$
\tilde{w}(x, y, t) = \sum\_{i=1}^{n} w\_i(x, y, t)\, h(i)\, c(i), \tag{20}
$$

where *c*(*i*) represents the value of the interpolating function at the *i*-th training point and *n* is the total number of data points available, corresponding to the discrete times *t*1, ... , *tn*. Specifically, we have two interpolated functions, *v*˜*x*(*x*, *y*, *t*) and *v*˜*y*(*x*, *y*, *t*). Moreover, the vector *c*(*i*) differs according to the velocity under exam, with training points corresponding alternatively to the *x*- and *y*-velocities. Since different influence areas may overlap, the value of *wi*(*x*, *y*, *t*) needs to be adjusted to obtain a correct fit. To this aim, we solve a linear system where the *n* Equations (20) are collocated at the training points, generating an *n* × *n* system. The *n* coefficients *h*(*i*) are computed only once, and then applied in the interpolation procedure. In this specific case, we use a linear function for the influence coefficients *wi*(*x*, *y*, *t*).
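The scheme can be sketched as follows, with a linear compact-support influence function; the specific radius and the collocation layout are illustrative assumptions of this sketch:

```python
import numpy as np

def fit_rbf(points, values, radius):
    """Fit the weights h(i) of Eq. (20) by collocation at the n training points.

    points: (n, 3) array of (x, y, t) training coordinates.
    values: (n,) array c(i) of velocities at the training points.
    radius: influence radius outside which the basis vanishes (assumed here).
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # linear compact-support influence function w_i(x, y, t)
    W = np.clip(1.0 - d / radius, 0.0, None)
    # collocating Eq. (20) at each training point j gives the n x n system
    #   sum_i w_i(p_j) h(i) c(i) = c(j)
    A = W * values[None, :]
    return np.linalg.solve(A, values)

def interpolate_rbf(query, points, values, h, radius):
    """Evaluate Eq. (20) at the (m, 3) query points."""
    d = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=-1)
    W = np.clip(1.0 - d / radius, 0.0, None)
    return (W * (h * values)[None, :]).sum(axis=1)
```

By construction, evaluating the fitted interpolant at the training points reproduces the training values exactly, which is the collocation property the text describes.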

#### *4.2. The Calibration Algorithm*

With Formula (20) we have generated a velocity field *v*˜ that from now on we denote by *V<sup>e</sup>*, to point out that the velocity field is produced from the punctual velocities *v* that the immune cells assume at every time step, coming from experimental data (in this specific case, from synthetic data). We then use this field to define an appropriate functional to be minimized.

To assess the goodness and soundness of our strategy, the analysis which follows is performed on synthetic data, i.e., numerical data produced by the PDE–ODEs system (3) and (4). Future work will be directed towards the application of this methodology to the experimental outcomes.

As a first step we have interpolated the immune cell velocities *v* obtained from synthetic data, with the aim of generating a velocity field to be used as a target for every frame. Thus, we have produced *Tf* = 681 velocity fields *V<sup>e</sup><sub>x</sub>* using the punctual velocities in *x*, and *Tf* = 681 velocity fields *V<sup>e</sup><sub>y</sub>* using the velocities in *y*. In Figure 10 we present an example of the surfaces resulting from the interpolation at a certain time step. Figure 10A shows the interpolation of the *x*-velocities, while Figure 10B shows the interpolation of the *y*-velocities. The green points have coordinates (*Px*, *Py*, *vx*) in (A) and (*Px*, *Py*, *vy*) in (B). For more details about the interpolation technique see Section 4.1.

**Figure 10.** Interpolated surfaces *V<sup>e</sup><sub>x</sub>* (**A**) and *V<sup>e</sup><sub>y</sub>* (**B**) at a fixed time. The green points indicate the values of the velocities at the corresponding positions. (**A**) Velocities in the *x*-direction; (**B**) velocities in the *y*-direction.

Successively, we launch the optimization algorithm and, at every run, compute the numerical solutions of the deterministic model with the approximation scheme described in Appendix A. We have then built a routine that, given the numerical positions, finds the corresponding velocities on the surface *V<sup>e</sup>*. The resulting interpolated velocities *V<sup>n</sup>* are then used to construct the objective function:

$$J\_V(\theta) = \left[ \frac{1}{T\_f} \sum\_{k=1}^{T\_f} \left( \frac{1}{N\_{tot,I}^k} \sum\_{i=1}^{N\_{tot,I}^k} \frac{||V\_i^{e,k} - V\_i^{n,k}(\theta)||}{||V\_i^{e,k}||} \right) \right]^2. \tag{21}$$

In (21) we compare, at every time step, the punctual interpolated velocities with the punctual target velocities, i.e., the velocities used to generate the field. With *θ* we indicate the vector of the parameter values we search for.

In addition, we have also included a Tikhonov regularization term in the functional, as usually done for the regularization of linear inverse problems. In particular, we consider the following term to be added to the functional to be minimized:

$$P\_{\lambda}(\theta) = \lambda^2 ||\theta - \theta\_0||^2,\tag{22}$$

where *θ*<sup>0</sup> is the a priori estimate of the target parameter values and *θ* contains the parameter values we are optimizing. The constant *λ* is a regularization parameter that helps the algorithm in the search for the optimal values of the model parameters by reducing the number of local minima of the functional, see [62]. Tests for different values of *λ* showed better results of the calibration algorithm for *λ*<sup>2</sup> = 0.1. Thus, with relation (22) we aim to reduce the error between the target values and the searched ones.
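A sketch of the evaluation of the regularized functional, combining the velocity mismatch of (21) with the penalty (22); in an actual run the model velocities would be recomputed from the numerical solution for each candidate *θ*, whereas here they are passed in precomputed:

```python
import numpy as np

def objective_J(theta, theta0, V_e, V_n, lam2=0.1):
    """Evaluate J(theta) = J_V(theta) + P_lambda(theta), Eqs. (21)-(23).

    V_e, V_n: lists (one entry per frame k) of (N_k, 2) arrays holding the
    target and the model-interpolated IC velocities.
    theta0: a priori estimate of the target parameter values.
    """
    # inner average over ICs of the relative velocity error, per frame
    frame_means = [np.mean(np.linalg.norm(Ve - Vn, axis=1) /
                           np.linalg.norm(Ve, axis=1))
                   for Ve, Vn in zip(V_e, V_n)]
    J_V = np.mean(frame_means) ** 2                                   # Eq. (21)
    P = lam2 * np.sum((np.asarray(theta) - np.asarray(theta0)) ** 2)  # Eq. (22)
    return J_V + P
```

When the candidate velocities coincide with the targets and *θ* = *θ*0, the functional vanishes, which is the minimum sought in (23).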

With functionals (21) and (22) we can define the minimization problem:

$$\min\_{\theta \in \Theta} J(\theta) = \min\_{\theta \in \Theta} (J\_V(\theta) + P\_\lambda(\theta)), \tag{23}$$

where Θ is the space to be explored in the search for the unknown parameter values. For completeness, we introduce another estimator we have used in our simulations, based on the idea of comparing the distances between the tumor and immune cells at every time step. We indicate with *Cj* the position of the *j*-th TC (the temporal index *k* is omitted since TC positions do not evolve in time). We call *d<sup>k</sup><sub>i,j</sub>* the distance between the *i*-th IC and the *j*-th TC at time *k*:

$$d\_{i,j}^k = ||P\_i^k - C\_j||,\tag{24}$$

and then, having fixed the *i*-th IC, we compare the distances *d<sup>n,k</sup><sub>i,j</sub>* obtained from the numerical positions with the distances *d<sup>e,k</sup><sub>i,j</sub>* obtained from the synthetic positions:

$$D\_i^k = \frac{1}{N\_{tot,C}} \sum\_{j=1}^{N\_{tot,C}} \frac{||d\_{i,j}^{e,k} - d\_{i,j}^{n,k}||}{||d\_{i,j}^{e,k}||},$$

Successively, we average over all ICs:

$$D^k = \frac{1}{N\_{tot,I}^k} \sum\_{i=1}^{N\_{tot,I}^k} D\_i^k,$$

and finally we average over all times. To summarize, we have the functional:

$$J\_{D\_{TI}} = \left[\frac{1}{T\_f} \sum\_{k=1}^{T\_f} \left(\frac{1}{N\_{tot,I}^k} \sum\_{i=1}^{N\_{tot,I}^k} \left(\frac{1}{N\_{tot,C}} \sum\_{j=1}^{N\_{tot,C}} \frac{||d\_{i,j}^{e,k} - d\_{i,j}^{n,k}||}{||d\_{i,j}^{e,k}||}\right)\right)\right]^2. \tag{25}$$
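The distance-based functional can be evaluated along the same lines (a sketch; the array shapes are assumptions of this illustration):

```python
import numpy as np

def distance_functional(P_e, P_n, C):
    """Evaluate the distance-based functional of Eq. (25).

    P_e, P_n: lists over frames k of (N_k, 2) arrays of synthetic (target)
              and numerical IC positions.
    C: (N_tot_C, 2) array of fixed TC positions.
    """
    frame_means = []
    for Pe, Pn in zip(P_e, P_n):
        # IC-to-TC distance matrices, Eq. (24)
        d_e = np.linalg.norm(Pe[:, None, :] - C[None, :, :], axis=-1)
        d_n = np.linalg.norm(Pn[:, None, :] - C[None, :, :], axis=-1)
        D_i = np.mean(np.abs(d_e - d_n) / d_e, axis=1)  # average over TCs
        frame_means.append(D_i.mean())                  # average over ICs
    return np.mean(frame_means) ** 2                    # square of time average
```

As with the velocity functional, identical numerical and target positions yield a zero value.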

In conclusion, to evaluate the error committed by the optimization with respect to the synthetic data, we indicate with *θ*<sup>0</sup> the target value and with *θ*<sup>∗</sup> the corresponding optimized parameter, and we define:

$$RE = \frac{||\theta^\* - \theta\_0||}{||\theta^\*||} \times 100,\tag{26}$$

the approximation error of each parameter.

The minimization of the functionals above is performed with Particle Swarm Optimization (PSO) [63] as the search method, using the Matlab® toolbox. For the approximation of the target and computed velocity fields, resulting in a 2D time-varying surface, we use the multidimensional spline interpolation described in Section 4.1.

#### *4.3. Results on Parameters Estimation*

Tests on the calibration algorithm are performed to assess the goodness and robustness of the methodology shown in Section 4.2. Please note that the total number of parameters to be assigned is 14, since the other parameters are given by the experimentalists or taken from the literature, as reported in Table 1. Of course, it is possible to estimate all of them. However, in the present work, in order to reduce the computational cost, we apply our calibration procedure only to the 4 model parameters that we consider most significant, since they strongly affect the dynamics of the ICs, i.e., *γ*, *b*1, *b*2 and *b*4.

We recall that *γ* is the coefficient that enhances the effect of the chemotactic function (5), while *bi*, for *i* = 1, 2, 4, are the parameters that control the chemical inflow at the boundaries. In order to perform the tests, we produced synthetic solutions choosing specific values for the listed parameters:

$$
\gamma = 5 \cdot 10^{-3}, b\_1 = 22, b\_2 = 12, b\_4 = 18,\tag{27}
$$

which correspond to those used to create Scenario 1 (see Section 3.1.1). The synthetic solutions are used to generate the velocity fields *V<sup>e</sup>* to be used as targets.

To assess our strategy, we start by testing one parameter and then add one more at a time. The range for the parameter variations is chosen by varying the initial guess by ±50%. Specifically, we searched for *γ* in *I*50% = [3.5 × 10−3, 6.5 × 10−3], *b*1 in *I*50% = [11, 33], *b*2 in *I*50% = [6, 18] and *b*4 in *I*50% = [9, 27]. As initial guess we choose a perturbation of the model parameters by +30%. We tested the algorithm with the following two different functionals:

$$J\_1 = J\_V + P\_{\lambda} \tag{28}$$

and

$$J\_2 = J\_V + J\_{D\_{TI}} + P\_\lambda. \tag{29}$$

Results with the functional *J*1 are reported in Table 2, while results with *J*2 are in Table 3. Each row of Tables 2 and 3 reports the errors obtained with the parameter estimation algorithm, listed from top to bottom in increasing order of the number of parameters involved in the estimation procedure. We can notice that in both cases the errors in the parameter estimation are quite low, since their order of magnitude is around 10−3 in the worst case, meaning that the calibration was successful. We also underline that the norm of the total error of the functional is low, around 10−7, given by the *JV* component.

**Table 2.** *J*1. Deterministic Model Calibration. Parameters *γ* = 5 × 10−3, *b*1 = 22, *b*2 = 12 and *b*4 = 18 are varied by +30% and optimized in the range ±50%.


**Table 3.** *J*2. Deterministic Model Calibration. Parameters *γ* = 5 × 10−3, *b*1 = 22, *b*2 = 12 and *b*4 = 18 are varied by +30% and optimized in the range ±50%.


In conclusion, as a way to quantify the goodness of our reconstruction, we compute the percentage of cells passing through the cropped area of the chip representing the computational domain. In particular, we depict with histograms the statistics obtained from the trajectories reconstructed by the calibrated model and the statistics computed from the trajectories of the synthetic dataset, i.e., the target solutions of the calibration algorithm. The percentage of outgoing cells *N<sup>k</sup><sub>o</sub>* over the number of cells *N<sup>k</sup><sub>p</sub>* present in the domain Ω is computed by the formula:

$$N^k = \frac{N\_o^k}{N\_p^k} \times 100,\tag{30}$$

for each timeframe *k* = 1, ... , 680, with a time spacing of 2 min. Then, the optimized parameters (27) obtained with the calibration procedure described in Section 4.2 are given in input to both the ODE model (3) and (4) and the SDE model (3)–(17) for the computation of the statistics on cell trajectories. More precisely, we perform this computation for the target case, i.e., the synthetic trajectories obtained by the PDE–ODEs system (yellow bars in Figure 11). Then, assuming randomly placed initial positions of the ICs, we compute the same statistics for the PDE–ODEs system (red bars in Figure 11) and for the PDE–SDEs system (blue bars in Figure 11). In particular, the initial positions *P<sup>0</sup><sub>i</sub>*, for *i* = 1, ... , *Ntot*,*I*, are randomly perturbed in a range of (0, 5] μm. As can be seen in Figure 11, the statistics of the target trajectories (ODE-target) and of the reconstructed trajectories (ODE-modified and SDE-modified) are quite similar in terms of order of magnitude. Please note that the ODE-modified case also reproduces well the timing of the exits from the domain, while a slight variability, as expected, can be observed in the SDE-modified case in terms of the times at which cells exit the domain.

**Figure 11.** Percentage of outgoing cells over the number of cells present in the domain. The plot shows results starting from frame 150, since no relevant information emerged in the previous frames.
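The statistic of Equation (30) can be computed, for instance, as follows, assuming that exited cells remain tracked on the ghost grid outside Ω:

```python
import numpy as np

def outgoing_percentage(positions, Lx, Ly):
    """Percentage N^k of Eq. (30): cells leaving the domain per frame.

    positions: list over frames of (N, 2) arrays of IC positions (cells may
    lie outside [0, Lx] x [0, Ly] after exiting, on the ghost grid).
    """
    def inside(P):
        return ((P[:, 0] >= 0) & (P[:, 0] <= Lx) &
                (P[:, 1] >= 0) & (P[:, 1] <= Ly))
    stats = []
    for k in range(1, len(positions)):
        in_prev = inside(positions[k - 1])
        in_curr = inside(positions[k])
        N_p = in_prev.sum()                    # cells present at frame k-1
        N_o = (in_prev & ~in_curr).sum()       # cells that exited by frame k
        stats.append(100.0 * N_o / N_p if N_p > 0 else 0.0)
    return stats
```

Applied to the three trajectory sets (target, ODE-modified, SDE-modified), this yields the per-frame percentages plotted as histogram bars in Figure 11.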

#### **5. Conclusions and Future Work**

In the present work we have developed a mathematical model to describe the interactions between ICs and treated TCs in a microfluidic chip, together with a strategy to estimate unknown parameter values.

Regarding the modeling part, we have introduced a hybrid model, composed of a reaction-diffusion equation to describe the evolution of chemicals released by TCs, coupled with an equation for the motion of each IC, driven by the chemical gradient. The deterministic system represents a first microscopic model of ICs dynamics in a subarea of the chip.

First, we qualitatively analyzed the overall dynamics of ICs, varying important parameters of the system, such as the chemotactic coefficient *γ* and the boundary parameters. Indeed, according to their magnitude, the direction of the motion can be controlled and decided *a priori*. Then, in order to reproduce some interesting features characterizing cell behavior, we suitably adjusted the model parameters and presented three different scenarios that can occur in the chip:


Regarding the stochastic scenario, it is important to highlight that Brownian motion is not the most appropriate process to describe ICs movement; in the future we aim at replacing it with a Lévy walk. Recent works [64,65] have indeed shown that a heavy-tailed process is more efficient and realistic than Brownian motion. However, in the present work, the shortage of data at our disposal did not provide us with information on the nature of the different ICs involved in the experiment. To this end, in a forthcoming paper we are working on a classification strategy to identify the different categories of ICs and then differentiate the model parameters according to the corresponding cell type.

In addition to the pure modeling part, we have developed a model calibration procedure based on the comparison of the velocity fields related to ICs at every time step.

The calibration procedure with synthetic data proved successful, thus representing a first step towards the model calibration on experimental data.


#### **Future perspectives.**

Our future aim is to extend this framework to validate our model against the experimental velocity fields computed from the real IC trajectories extracted from the video footage of the experiment.

To face the problem of calibrating the model parameters against real data, we first must produce an interpolation model able to take into account the non-deterministic nature of the cell motions. A possible strategy is to produce a stochastic interpolator, able to determine the expected value and variance of the local speed: the final interpolated value is then obtained by adding to the interpolated expected value a stochastic part, computed using the interpolated variance and the same probability density function as the experimental data. With this approach, we obtain a non-deterministic interpolated value with the same statistical qualities as the interpolating dataset. Implicitly, we are considering as deterministic the local values of the expected value and of the variance of our experimental dataset, and we are also assuming that we can compute the statistical distribution of the real data (otherwise, we can adopt a prescribed distribution, e.g., Gaussian). It is then evident that the comparison cannot be made on a single path: we need to observe the trajectories from a statistical standpoint, i.e., comparing the average path over many simulations, whose output is now non-deterministic. At the same time, in order to derive the statistical properties of the real IC trajectories and speeds, a large number of experiments is needed, with a time resolution of the same order of magnitude as the time scale of the physical phenomenon under investigation. This implies a huge experimental and numerical effort, one or two orders of magnitude larger than the present work.
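The proposed stochastic interpolator can be given as a minimal sketch; `mean_interp` and `var_interp` are hypothetical callables standing in for the locally interpolated expected value and variance of the speed, and the Gaussian draw is the prescribed fallback distribution mentioned above:

```python
import numpy as np

def stochastic_interpolate(mean_interp, var_interp, query, rng=None):
    """Interpolated speed = deterministic expected value + stochastic part
    drawn from the interpolated variance (Gaussian assumed as fallback)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = mean_interp(query)
    var = np.maximum(var_interp(query), 0.0)  # clip spurious negative variances
    return mu + rng.normal(scale=np.sqrt(var))
```

With zero interpolated variance the output reduces to the deterministic expected value, so the deterministic interpolator of Section 4.1 is recovered as a special case.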

**Author Contributions:** Methodology, G.B. and R.N. (modeling and numerical framework) and D.P. (multidimensional spline interpolation); Software, N.R. (numerical simulation algorithm and calibration procedure) and D.P. (spline based approximation routine); Supervision, G.B. and R.N.; Validation, G.B, N.R. and A.D.N.; Visualization, N.R.; Data curation, A.D.N. and N.R.; Conceptualization, G.B. and R.N.; Investigation, N.R. Writing, G.B., N.R., A.D.N., D.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research of D. Peri was partially funded by the Italian Ministry of Education, University and Research (MIUR) with funds coming from the PRIN Project 2017 (No. 2017KKJP4X) entitled "Innovative numerical methods for evolutionary partial differential equations and applications".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data are contained within the article.

**Acknowledgments:** We are grateful to Francesca Romana Bertani, Luca Businaro, Annamaria Gerardino from IFN-CNR and Antonella Sistigu from Istituto Regina Elena for the interesting and clarifying discussions about the microfluidic chip experiment and to Davide Vergni from IAC-CNR for his support in the comprehension of stochastic behavior of cell trajectories.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

In this section we discuss the numerical approximation scheme employed for Equation (3) coupled with (4) or (17) in the numerical simulations. The method is a 2*D* finite difference scheme. We consider the spatial domain Ω = [0, *Lx*] × [0, *Ly*] and we discretize [0, *Lx*] into *N* − 1 subintervals of length Δ*x* = *Lx*/(*N* − 1) and [0, *Ly*] into *M* − 1 subintervals of length Δ*y* = *Ly*/(*M* − 1). Then we introduce a Cartesian grid ΩΔ consisting of the grid points (*xn*, *ym*), where *xn* = *n*Δ*x*, for *n* = 0, ..., *N* − 1, and *ym* = *m*Δ*y*, for *m* = 0, ..., *M* − 1. The same can be done for the time interval [0, *T*]: if Δ*t* is the time step, *tk* = *k*Δ*t* is the *k*-th temporal step, for *k* = 0, ..., *N*<sub>Δ*t*</sub>. Note that the spatial steps were chosen as half the radius of an IC, Δ*x* = Δ*y* = 2 μm, and the temporal step as 1/6 of the video footage timeframe (of 2 min), Δ*t* = 10 s. With the notation *u<sup>k</sup><sub>n,m</sub>* we denote the approximation of a function *u*(*x*, *y*, *t*) at the grid point (*xn*, *ym*, *tk*).

Moreover, since experimentally it was observed that ICs leave the domain Ω, to manage the entrance and exit of cells and avoid numerical instabilities, we added a ghost grid to ΩΔ, where the cells lie after having left the main domain. To construct this grid, we considered the extended *x* interval [−*Lx*<sup>∗</sup>, *Lx* + *Lx*<sup>∗</sup>], with *Lx*<sup>∗</sup> > 0, with nodes *xn* = *n*Δ*x*, for *n* = −*N*∗, ..., 0, ..., *N*∗, and the extended *y* interval [−*Ly*<sup>∗</sup>, *Ly* + *Ly*<sup>∗</sup>], with *Ly*<sup>∗</sup> > 0, with nodes *ym* = *m*Δ*y*, for *m* = −*M*∗, ..., 0, ..., *M*∗. The equations were solved only on ΩΔ.

#### *Appendix A.1. Discretization of the PDE*

We now discuss the approximation of the parabolic Equation (3), whose right-hand side is composed of the diffusion term, the source term, and the stiff degradation term −*η f* .

To eliminate this last quantity, we perform the classical exponential transformation:

$$f(\mathbf{x}, t) = e^{-\eta t} u(\mathbf{x}, t),\tag{A1}$$

which leads to the diffusion equation with source for *u*(**x**, *t*):

$$
\partial\_t u = D \Delta u + e^{\eta t} \xi \sum\_{j=1}^{N\_{tot,C}} \chi\_{\mathbf{B}(\mathbf{Y}\_j, R\_T)}. \tag{A2}
$$
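For completeness, substituting (A1) into (3) shows how the stiff term is removed. Writing the source of (3) as $S(\mathbf{x},t) = \xi \sum_j \chi_{\mathbf{B}(\mathbf{Y}_j,R_T)}$ (a shorthand for this sketch), one gets

```latex
\partial_t f = -\eta e^{-\eta t} u + e^{-\eta t}\,\partial_t u
             = D e^{-\eta t}\,\Delta u + S(\mathbf{x},t) - \eta e^{-\eta t} u ,
```

so the degradation terms $-\eta f = -\eta e^{-\eta t} u$ cancel and, multiplying both sides by $e^{\eta t}$, the rescaled unknown $u$ satisfies (A2), with the source amplified by the factor $e^{\eta t}$.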

For this equation we apply a central difference scheme in space, i.e., the 5-point stencil for the Laplacian, and the parabolic Crank–Nicolson scheme in time.

The numerical scheme can be written as:

$$\begin{split} \frac{u\_{n,m}^{k+1} - u\_{n,m}^{k}}{\Delta t} &= \frac{D}{2} \left( D\_x^2 u^{k+1} + D\_y^2 u^{k+1} \right) + \frac{D}{2} \left( D\_x^2 u^k + D\_y^2 u^k \right) + \\ &+ \frac{1}{2} e^{\eta (k+1)\Delta t}\, \xi \sum\_{j=1}^{N\_{tot,C}} \chi\_{\mathbf{B}(\mathbf{Y}\_j^{k+1}, R\_T)} + \frac{1}{2} e^{\eta k\Delta t}\, \xi \sum\_{j=1}^{N\_{tot,C}} \chi\_{\mathbf{B}(\mathbf{Y}\_j^k, R\_T)}, \end{split} \tag{A3}$$

where the second-order finite difference *D*<sup>2</sup><sub>*x*</sub>*u* is defined as the central difference:

$$D\_x^2 u^k = \frac{u\_{n-1,m}^k - 2u\_{n,m}^k + u\_{n+1,m}^k}{\Delta x^2}$$

and *D*<sup>2</sup><sub>*y*</sub>*u* is defined analogously.
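As a sketch of how (A3) translates into code, the following routine performs one Crank–Nicolson step with the 5-point Laplacian; the boundary rows use reflecting ghost nodes as a placeholder, whereas the actual model imposes the Robin conditions (A5) and (A6). The source terms are assumed to be already rescaled by the exponential factors:

```python
import numpy as np

def crank_nicolson_step(u, source_k, source_k1, D, dt, dx, dy):
    """One Crank-Nicolson step for Eq. (A3) on an N x M grid.

    u: (N, M) current solution u^k.
    source_k, source_k1: (N, M) rescaled source terms at times k and k+1.
    """
    N, M = u.shape

    def lap1d(n, h):
        # 1D second-difference operator (5-point stencil in each direction)
        L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
        L[0, 1] = 2.0; L[-1, -2] = 2.0  # reflecting ghost nodes (placeholder BC)
        return L / h**2

    # 2D Laplacian acting on u.ravel() (row-major, index n*M + m)
    A = np.kron(lap1d(N, dx), np.eye(M)) + np.kron(np.eye(N), lap1d(M, dy))
    I = np.eye(N * M)
    rhs = (I + 0.5 * dt * D * A) @ u.ravel() \
          + 0.5 * dt * (source_k + source_k1).ravel()
    return np.linalg.solve(I - 0.5 * dt * D * A, rhs).reshape(N, M)
```

Dense matrices keep the sketch short; for the grids used here a sparse solver would be the practical choice.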

#### *Appendix A.2. Boundary Conditions*

Using the transformation (A1), the associated boundary conditions are:

$$D\frac{\partial u}{\partial \mathbf{n}} + au = e^{\eta t}b,\tag{A4}$$

which we rewrite as ∂*u*/∂**n** + *pu* = *q*(*t*), with *p* = *a*/*D* and *q*(*t*) = *e<sup>ηt</sup>b*/*D*. Moreover, to distinguish the values of *p* and *q* on the different sides of Ω, we number them as follows: *p*1 and *q*1 are assumed on *y* = 0, *p*2 and *q*2 on *x* = *Lx*, *p*3 and *q*3 on *y* = *Ly*, and *p*4 and *q*4 on *x* = 0.

For the discretization of the boundary conditions, we use a central finite difference scheme. On $y = 0$ and $y = L\_y$, we have:

$$\frac{\partial u}{\partial y} + p\_s u - q\_s = \frac{u\_{n,m+1}^k - u\_{n,m-1}^k}{2\Delta y} + r\_s^k u\_{n,m}^k - h\_s^k, \tag{A5}$$

with $s = 1, 3$, while on $x = 0$ and $x = L\_x$, we have:

$$\frac{\partial u}{\partial x} + p\_s u - q\_s = \frac{u\_{n+1,m}^k - u\_{n-1,m}^k}{2\Delta x} + r\_s^k u\_{n,m}^k - h\_s^k,\tag{A6}$$

for *s* = 2, 4 and *k* ≥ 0.

The signs of $r\_s$ and $h\_s$ depend on the incoming/outgoing flow. For instance, on $y = 0$ we have $r\_1 = -p\_1$ and $h\_1 = -q\_1$; on $y = L\_y$, $r\_3 = p\_3$ and $h\_3 = q\_3$; on $x = L\_x$, $r\_2 = p\_2$ and $h\_2 = q\_2$; and on $x = 0$, $r\_4 = -p\_4$ and $h\_4 = -q\_4$, so as to have an incoming chemical flux through the boundary. The previous discretizations (A5) and (A6) are used with the well-known ghost node technique [66].
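The ghost-node elimination amounts to solving the centered discretization for the fictitious value outside the domain; for instance, on $y = 0$ (a small sketch with our own naming, where `u_inner` is $u\_{n,1}^k$ and the returned value is the ghost $u\_{n,-1}^k$):

```python
def ghost_value(u_inner, u_boundary, dh, r, h):
    """Ghost value from the centered Robin discretization on y = 0:
    (u_inner - u_ghost) / (2*dh) + r*u_boundary - h = 0
      =>  u_ghost = u_inner + 2*dh*(r*u_boundary - h)."""
    return u_inner + 2.0 * dh * (r * u_boundary - h)
```

The ghost value is then substituted into the 5-point stencil at the boundary row, so the interior scheme applies unchanged up to the boundary.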

#### *Appendix A.3. Discretization of the ODE*

The equation of the motion (4) is reduced to the first order system

$$\begin{cases} \dot{\mathbf{V}}\_{i} = \frac{\gamma}{\mathcal{W}} \int\_{\mathbf{B}(\mathbf{X}\_{i},\bar{R})} \chi(f(\mathbf{x},t)) \nabla f(\mathbf{x},t)\, w\_{i}(\mathbf{x})\, d\mathbf{x} + \sum\_{j:\,\mathbf{Y}\_{j} \in \mathbf{B}(\mathbf{X}\_{i},R\_{2})} \mathbf{K}(\mathbf{Y}\_{j} - \mathbf{X}\_{i}) + \sum\_{j:\,\mathbf{X}\_{j} \in \mathbf{B}(\mathbf{X}\_{i},R\_{4}) \backslash \{\mathbf{X}\_{i}\}} \mathbf{K}(\mathbf{X}\_{j} - \mathbf{X}\_{i}) - \mu \mathbf{V}\_{i}, \quad \text{(A7)} \\ \dot{\mathbf{X}}\_{i} = \mathbf{V}\_{i}, \quad \text{(A8)} \end{cases}$$

for $i = 1, \ldots, N\_{\text{tot},I}$. Equation (A7) is discretized with a one-step IMEX method, treating implicitly the damping term $-\mu \mathbf{V}\_i$ and explicitly the other addends, see [67]. Equation (A8) is solved with the forward Euler method. The two-dimensional integral in (A7) can be computed by a 2D quadrature formula which, due to the truncated Gaussian weight function $w\_i(\mathbf{x})$ given in (6), reduces to a sum of the discretized integrand over the grid points belonging to the ball $\mathbf{B}(\mathbf{X}\_i, \bar{R})$. For an integrand function $g(\mathbf{x}, t)$, we have:

$$\int\_{\mathbf{B}(\mathbf{X}\_i,\bar{R})} g(\mathbf{x},t)\, w\_i(\mathbf{x})\, d\mathbf{x} \approx \sum\_{n,m:\,(x\_n, y\_m)\in\mathbf{B}(\mathbf{X}\_i^k,\bar{R})} g\_{n,m}^k\, (w\_i)\_{n,m}^{(k)},$$

where $(w\_i)^{(k)}$ is the weight function centered in $\mathbf{X}\_i^k$, at time step $t\_k$. The same holds for the integral $\mathcal{W}$ defined in (5), which is approximated by:

$$\widetilde{\mathcal{W}} := \sum\_{n,m:\,(x\_n, y\_m) \in \mathbf{B}(\mathbf{X}\_i^k,\bar{R})} (w\_i)\_{n,m}^{(k)}.$$

The gradients in Equation (A7) are approximated with the first-order difference:

$$\nabla g(x\_n, y\_m, t\_k) \approx \left( \frac{g\_{n+1,m}^{k} - g\_{n,m}^{k}}{\Delta x}, \frac{g\_{n,m+1}^{k} - g\_{n,m}^{k}}{\Delta y} \right).$$

Equation (A7) is discretized as follows:

$$\begin{split} \frac{\mathbf{V}\_{i}^{k+1} - \mathbf{V}\_{i}^{k}}{\Delta t} &= \frac{\gamma}{\widetilde{\mathcal{W}}} \sum\_{n,m:\,(x\_n, y\_m) \in \mathbf{B}(\mathbf{X}\_{i}^{k}, \bar{R})} \chi(f^{k}) (\nabla\_{n,m} f^{k}) (w\_{i})\_{n,m}^{(k)} + \sum\_{j:\,\mathbf{Y}\_{j}^{k} \in \mathbf{B}(\mathbf{X}\_{i}^{k}, R\_{2})} \mathbf{K} (\mathbf{Y}\_{j}^{k} - \mathbf{X}\_{i}^{k}) \\ &\quad + \sum\_{j:\,\mathbf{X}\_{j}^{k} \in \mathbf{B}(\mathbf{X}\_{i}^{k}, R\_{4}) \backslash \{\mathbf{X}\_{i}^{k}\}} \mathbf{K} (\mathbf{X}\_{j}^{k} - \mathbf{X}\_{i}^{k}) - \mu \mathbf{V}\_{i}^{k+1}, \end{split} \tag{A9}$$

and Equation (A8) is discretized as $\frac{\mathbf{X}\_i^{k+1} - \mathbf{X}\_i^k}{\Delta t} = \mathbf{V}\_i^{k+1}$, with $\mathbf{V}\_i^{k+1}$ computed before Equation (A8) is solved.
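The IMEX update can be sketched compactly as follows (names and shapes are our assumptions; `F` collects the chemotactic and interaction forces of (A9), evaluated explicitly at time $t\_k$):

```python
import numpy as np

def imex_step(X, V, F, mu, dt):
    """One-step IMEX update for (A7)-(A8): the friction -mu*V is implicit,
    the remaining forces F are explicit, then forward Euler for the position
    with the freshly computed velocity, cf. (A9)."""
    V_new = (V + dt * F) / (1.0 + mu * dt)  # solves V_new = V + dt*(F - mu*V_new)
    X_new = X + dt * V_new
    return X_new, V_new
```

Note that for $F = 0$ the velocity contracts by $1/(1 + \mu \Delta t)$ at each step, so the friction term is handled without a stability restriction on $\Delta t$.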

#### *Appendix A.4. Discretization of the SDE*

The stochastic equation of the motion (17) can be decoupled as follows:

$$\begin{cases} \dot{\mathbf{V}}\_{i} = \frac{\gamma}{\mathcal{W}} \int\_{\mathbf{B}(\mathbf{X}\_{i},\bar{R})} \chi(f(\mathbf{x},t)) \nabla f(\mathbf{x},t)\, w\_{i}(\mathbf{x})\, d\mathbf{x} + \sum\_{j:\,\mathbf{Y}\_{j} \in \mathbf{B}(\mathbf{X}\_{i},R\_{2})} \mathbf{K}(\mathbf{Y}\_{j} - \mathbf{X}\_{i}) + \sum\_{j:\,\mathbf{X}\_{j} \in \mathbf{B}(\mathbf{X}\_{i},R\_{4}) \backslash \{\mathbf{X}\_{i}\}} \mathbf{K}(\mathbf{X}\_{j} - \mathbf{X}\_{i}) - \mu \mathbf{V}\_{i}, \quad \text{(A10)} \\ \dot{\mathbf{X}}\_{i} = \mathbf{V}\_{i} + \sigma \psi(t), \quad \text{(A11)} \end{cases}$$

for $i = 1, \ldots, N\_{\text{tot},I}$. The discretization of Equation (A10) coincides with the one for Equation (A7), while Equation (A11) requires the application of the Euler–Maruyama method [68].

Equation (A11) can be written in differential form, using $dW(t) = \psi(t)\,dt$, where $dW(t)$ denotes the differential of the Brownian motion:

$$d\mathbf{X}\_i(t) = \mathbf{V}\_i dt + \sigma dW(t),\tag{A12}$$

where each component of $\mathbf{X}\_i(t)$ is a one-dimensional Wiener process with drift $\mathbf{V}\_i$ and diffusion $\sigma$. This equation is discretized with the Euler–Maruyama scheme, which is the stochastic version of the deterministic Euler scheme. The increments of the Wiener process are defined as:

$$
\Delta W = W^{k+1} - W^k,
$$

with $0 \le k \le N\_{\Delta t} - 1$. The increment $\Delta W$ is a random variable with zero mean and variance equal to $\Delta t$:

$$
\Delta W \sim \mathcal{N}(0, \Delta t),
$$

and with this increment, we can construct approximations by drawing normally distributed numbers from a random generator. We approximate the process (A12) at the discrete time points $t\_k$, $0 \le k \le N\_{\Delta t} - 1$, by

$$\mathbf{X}\_{i}^{k+1} = \mathbf{X}\_{i}^{k} + \mathbf{V}\_{i}^{k+1} \Delta t + \sigma \Delta W, \tag{A13}$$

where <sup>Δ</sup>*<sup>W</sup>* <sup>=</sup> <sup>√</sup>Δ*tZk*, with *<sup>Z</sup><sup>k</sup>* being standard normal variables with mean 0 and variance 1 for all *k*.

#### **References**


## *Article* **Relaxation Limit of the Aggregation Equation with Pointy Potential**

**Benoît Fabrèges <sup>1</sup>, Frédéric Lagoutière <sup>1</sup>, Sébastien Tran Tien <sup>1</sup> and Nicolas Vauchelet <sup>2,</sup>\***


**Abstract:** This work was devoted to the study of a relaxation limit of the so-called aggregation equation with a pointy potential in one-dimensional space. The aggregation equation is today widely used to model the dynamics of a density of individuals attracting each other through a potential. When this potential is pointy, solutions are known to blow up in finite time. For this reason, measure-valued solutions have been defined. In this paper, we investigated an approximation of such measure-valued solutions thanks to a relaxation limit in the spirit of Jin and Xin. We studied the convergence of this approximation and gave a rigorous estimate of the speed of convergence in one dimension with the Newtonian potential. We also investigated the numerical discretization of this relaxation limit by uniformly accurate schemes.

**Keywords:** aggregation equation; relaxation limit; scalar conservation law; finite volume scheme

**MSC:** 35L65; 65M12; 35D30

#### **1. Introduction**

The so-called aggregation equation has been widely used to model the dynamics of a population of individuals in interaction. Let *<sup>W</sup>* : <sup>R</sup> <sup>→</sup> <sup>R</sup>, sufficiently smooth, be the interaction potential governing the population. Then, in one dimension in space, the dynamics of the density of individuals, denoted by *ρ*, is governed by the following equation, for *<sup>t</sup>* <sup>&</sup>gt; 0 and *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>:

$$
\partial\_t \rho + \partial\_x (a[\rho] \rho) = 0, \qquad \text{with} \quad a[\rho] = -W' \ast \rho. \tag{1}
$$

Such equations appear in many applications in population dynamics: for instance, to describe the collective migration of cells by swarming, the motion of bacteria by chemotaxis, crowd motion, the flocking of birds, or fish schools, see, e.g., [1–7]. From a mathematical point of view, these equations have been widely studied. When the potential *W* is not smooth enough, it is known that weak solutions may blow up in finite time [8,9]. Thus, the existence of weak (measure) solutions has been investigated in, e.g., [10,11].

In this paper, we consider a relaxation limit in the spirit of Jin–Xin [12] of the aggregation equation in one space dimension on R. It is now well-established that such modifications allow regularizing the solutions. For a given *c* > *a*∞, we introduce the system:

**Citation:** Fabrèges, B.; Lagoutière, F.; Tran Tien, S.; Vauchelet, N. Relaxation Limit of the Aggregation Equation with Pointy Potential. *Axioms* **2021**, *10*, 108. https://doi.org/10.3390/axioms10020108

Academic Editor: Giampiero Palatucci

Received: 9 April 2021 Accepted: 26 May 2021 Published: 28 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

$$
\partial\_t \rho + \partial\_x \sigma = 0,\tag{2a}
$$

$$
\partial\_t \sigma + c^2 \partial\_x \rho = \frac{1}{\varepsilon} (a[\rho]\rho - \sigma) \tag{2b}
$$

$$a[\rho] = -W' \ast \rho \tag{2c}$$

This system is complemented by initial data $\rho\_0$ and $\sigma\_0 := a[\rho\_0]\rho\_0$. It is clear, at least formally, that when *ε* → 0, the solution *ρ* of system (2) converges to the one of the aggregation equation (1) (and it is actually only true if $c > a\_\infty$). We mention that the aggregation equation may also be derived thanks to a hydrodynamical limit of kinetic equations [6,7,13].

The aim of this work was to study the convergence as *ε* → 0 of the relaxation system (2) towards the aggregation equation. More precisely, we establish a precise estimate of the speed of convergence, and we also illustrate it with some numerical simulations. These estimates are obtained only in the case of the Newtonian potential in one dimension, $W(x) = \frac{1}{2}|x|$. Indeed, in this particular case, we may link the aggregation equation to a scalar conservation law [14,15]. The same link holds for the relaxation system (2): denoting

$$u(t, \mathbf{x}) = \frac{1}{2} - \int\_{-\infty}^{\mathbf{x}} \rho(t, dy), \qquad v(t, \mathbf{x}) = \frac{1}{2} - \int\_{-\infty}^{\mathbf{x}} \sigma(t, dy),$$

where the notation *ρ*(*t*, *dy*) stands for the integral with respect to the probability measure *ρ*(*t*), then we verify easily that:

$$u = -W' \ast \rho, \qquad \rho = -\partial\_x u,$$

so that *a*[*ρ*] = *u*. Then, integrating (2), we deduce that (*u*, *v*) is a solution to:

$$
\partial\_t u + \partial\_x v = 0, \tag{3a}
$$

$$
\partial\_t v + c^2 \partial\_x u = \frac{1}{\varepsilon} \left(\frac{1}{2}u^2 - v\right), \tag{3b}
$$

which is complemented with the initial data $u\_0 = \frac{1}{2} - \int\_{-\infty}^{x} \rho\_0(dy)$ and $v\_0 = \frac{1}{2} - \int\_{-\infty}^{x} \sigma\_0(dy)$. Clearly, as *ε* → 0, we expect that the solution of the above system converges to the solution of the following Burgers equation:

$$
\partial\_t u + \frac{1}{2} \partial\_x u^2 = 0.
$$

Introducing the quantities $a = v - cu$ and $b = v + cu$, (3) is equivalent to the diagonalized system:

$$
\partial\_t a - c \partial\_x a = \frac{1}{\varepsilon} \left( \frac{1}{2} \left( \frac{b-a}{2c} \right)^2 - \frac{a+b}{2} \right) \tag{4a}
$$

$$
\partial\_t b + c \partial\_x b = \frac{1}{\varepsilon} \left( \frac{1}{2} \left( \frac{b-a}{2c} \right)^2 - \frac{a+b}{2} \right). \tag{4b}
$$

We will adapt the techniques developed in [16] to obtain convergence estimates for our system.

In order to illustrate this convergence result, numerical discretizations of the relaxation system (2) are investigated. The schemes we propose are uniform with respect to *ε*; that is, they satisfy the so-called asymptotic preserving (AP) property [17]. Therefore, in the limit *ε* → 0, such schemes must be consistent with the aggregation equation. The numerical simulation of solutions of the aggregation equation for pointy potentials has been studied by several authors, see, e.g., [11,13,18–22]. In particular, some authors pay attention to recovering the correct behavior of the numerical solutions after the blow-up

time. To do so, particular attention must be paid to the definition of the product *a*[*ρ*]*ρ* when *ρ* is a measure.

In this article, we propose two discretizations of the relaxation system which satisfy the AP property. In the first approach, we propose a simple splitting algorithm, where we split the transport part and the right-hand side of system (2). This results in a numerical scheme which is very simple to implement and for which we easily verify the AP property. The second approach relies on a well-balanced discretization in the spirit of [20,23]. This scheme is more expensive to implement than the first one, but its numerical solution is less diffusive, as illustrated by our numerical results.
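The relaxation half of such a splitting can be sketched as follows (a schematic stand-in, not the authors' implementation: `a_of_rho` is a user-supplied discretization of $a[\rho]$, and the transport half of the splitting is omitted):

```python
import numpy as np

def relaxation_substep(rho, sigma, a_of_rho, dt, eps):
    """Implicit treatment of the stiff right-hand side of (2b):
    sigma_new = sigma + (dt/eps)*(a[rho]*rho - sigma_new), rho unchanged.
    As eps -> 0, sigma_new -> a[rho]*rho: the step projects sigma onto the
    equilibrium, which is the asymptotic-preserving mechanism."""
    equilibrium = a_of_rho(rho) * rho
    sigma_new = (sigma + (dt / eps) * equilibrium) / (1.0 + dt / eps)
    return rho, sigma_new
```

Because the update is a convex combination of the old flux and the equilibrium, it is stable uniformly in $\varepsilon$ and recovers the limit flux $\sigma = a[\rho]\rho$ when $\varepsilon \ll \Delta t$.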

The outline of the paper is the following. In Section 2, after recalling some useful notations, we prove our main result: an estimate of the speed of convergence, in the Wasserstein distance $W\_1$ and with respect to *ε*, of the solutions of the relaxation system (2) towards the solution of the aggregation Equation (1) in the case $W(x) = \frac{1}{2}|x|$. The numerical discretization is investigated in Section 3. Two numerical schemes verifying the AP property are proposed. The first scheme is based on a splitting algorithm, whereas the second scheme relies on a well-balanced discretization. Numerical results and comparisons are provided in Section 4.

#### **2. Convergence Result**

#### *2.1. Notations*

Before stating and proving our main results, we first recall some useful notations and results. Since we are dealing with conservation laws (in which the total mass is conserved), we will work in some space of probability measures, namely the Wasserstein space of order *p* ≥ 1, which is the space of probability measures with a finite order *p* moment:

$$\mathcal{P}\_{p}(\mathbb{R}^N) = \left\{ \mu \text{ nonnegative Borel measure},\ \mu(\mathbb{R}^N) = 1,\ \int |\mathbf{x}|^p \mu(d\mathbf{x}) < \infty \right\}.$$

This space is endowed with the Wasserstein distance defined by (see, e.g., [24,25])

$$W\_{p}(\mu, \nu) = \inf\_{\gamma \in \Gamma(\mu, \nu)} \left\{ \int |y - x|^p \, \gamma(dx, dy) \right\}^{1/p},\tag{5}$$

where <sup>Γ</sup>(*μ*, *<sup>ν</sup>*) is the set of measures on <sup>R</sup>*<sup>N</sup>* <sup>×</sup> <sup>R</sup>*<sup>N</sup>* with marginals *<sup>μ</sup>* and *<sup>ν</sup>*, meaning that:

$$\Gamma(\mu, \nu) = \left\{ \gamma \in \mathcal{P}\_p(\mathbb{R}^N \times \mathbb{R}^N) \colon \forall\, f \in C\_0(\mathbb{R}^N),\ \int\_{\mathbb{R}^{2N}} f(y\_0) \, \gamma(dy\_0, dy\_1) = \int\_{\mathbb{R}^N} f(y\_0) \mu(dy\_0),\ \int\_{\mathbb{R}^{2N}} f(y\_1) \, \gamma(dy\_0, dy\_1) = \int\_{\mathbb{R}^N} f(y\_1) \nu(dy\_1) \right\},$$

with $C\_0(\mathbb{R}^N)$ the set of continuous functions on $\mathbb{R}^N$ that vanish at infinity. From a simple minimization argument, we know that in the definition of $W\_p$ the infimum is actually a minimum. A map that realizes the minimum in the definition (5) of $W\_p$ is called an optimal transport plan, the set of which is denoted by $\Gamma\_0(\mu, \nu)$.

In the one-dimensional framework, we may simplify these definitions. Indeed, any probability measure *μ* on the real line $\mathbb{R}$ can be described in terms of its cumulative distribution function $F\_\mu(x) = \mu((-\infty, x))$, which is a right-continuous and non-decreasing function with $F\_\mu(-\infty) = 0$ and $F\_\mu(+\infty) = 1$. Then, we can define the generalized inverse $F\_\mu^{-1}$ of $F\_\mu$ (or monotone rearrangement of *μ*) by $F\_\mu^{-1}(z) := \inf\{x \in \mathbb{R} : F\_\mu(x) > z\}$; it is a right-continuous and non-decreasing function as well, defined on [0, 1]. We have for every non-negative Borel map *ξ*:

$$\int\_{\mathbb{R}} \xi(x)\mu(dx) = \int\_0^1 \xi(F\_{\mu}^{-1}(z)) \, dz.$$

In particular, $\mu \in \mathcal{P}\_p(\mathbb{R})$ if and only if $F\_\mu^{-1} \in L^p(0, 1)$. Moreover, in the one-dimensional setting, there exists a unique optimal transport plan realizing the minimum in (5). More precisely, if *μ* and *ν* belong to $\mathcal{P}\_p(\mathbb{R})$, with monotone rearrangements $F\_\mu^{-1}$ and $F\_\nu^{-1}$, then $\Gamma\_0(\mu, \nu) = \{(F\_\mu^{-1}, F\_\nu^{-1})\# \mathcal{L}\_{(0,1)}\}$, where $\mathcal{L}\_{(0,1)}$ is the restriction of the Lebesgue measure to (0, 1). Thus, we have the explicit expression of the Wasserstein distance (see [24,26,27]):

$$W\_{p}(\mu, \nu) = \left( \int\_0^1 |F\_{\mu}^{-1}(z) - F\_{\nu}^{-1}(z)|^p \, dz \right)^{1/p}, \tag{6}$$

and the map $\mu \mapsto F\_\mu^{-1}$ is an isometry between $\mathcal{P}\_p(\mathbb{R})$ and the convex subset of (essentially) non-decreasing functions of $L^p(0, 1)$.
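Formula (6) gives a direct way to compute $W\_p$ between two empirical measures with the same number of atoms: sort the samples and take the discrete $L^p$ distance between the resulting quantiles (a small sketch; the function name is ours):

```python
import numpy as np

def wasserstein_1d(samples_mu, samples_nu, p=1):
    """Discrete version of (6): in one dimension the optimal plan is the
    monotone rearrangement, so W_p is the L^p distance between the sorted
    samples, i.e. between the empirical quantile functions."""
    x = np.sort(np.asarray(samples_mu, dtype=float))
    y = np.sort(np.asarray(samples_nu, dtype=float))
    assert x.shape == y.shape, "equal numbers of atoms assumed"
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)
```

For example, translating a measure by a constant $c$ gives $W\_1 = |c|$, as expected from (6).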

#### *2.2. Convergence Estimates*

Let us first consider the limit *ε* → 0 for the system (3). Compactness methods were used in [28] to get $L^1\_{loc}$ convergence in space. However, in order to pass to the aggregation equation, one may want global $L^1$ convergence, which we prove in the following theorem, along the lines of Katsoulakis and Tzavaras [16].

**Theorem 1.** *Let $u\_0 \in L^\infty \cap BV(\mathbb{R})$, $c > \|u\_0\|\_{L^\infty}$, and set $v\_0 = \frac{u\_0^2}{2}$. There exists a constant C* > 0 *such that, for any ε* > 0*, denoting by $(u^\varepsilon, v^\varepsilon)$ the solution to* (3) *with initial data* $(u\_0, v\_0)$*, the following estimate holds:*

$$\forall T > 0, \qquad \|u(T) - u^{\varepsilon}(T)\|\_{L^{1}} \le C\, TV(u\_{0})\left(\sqrt{\varepsilon T} + \varepsilon\right),$$

*where u is the entropy solution to the Burgers equation with initial datum u*0*.*

**Proof.** Denote by $(a^\varepsilon, b^\varepsilon)$ the solution to (4), and let $G(a, b) = \frac{1}{2}\left(\frac{b-a}{2c}\right)^2 - \frac{a+b}{2}$.

So as to obtain entropy inequalities on $(a^\varepsilon, b^\varepsilon)$, we need monotonicity properties on *G*. One can check that $G(a^\varepsilon, b^\varepsilon)$ is decreasing with respect to $a^\varepsilon$ and $b^\varepsilon$ if the so-called subcharacteristic condition $|u^\varepsilon| < c$ holds. Up to a slight modification of the nonlinear term $f(u^\varepsilon) = \frac{(u^\varepsilon)^2}{2}$ in (3), which does not affect the value of $(a^\varepsilon, b^\varepsilon)$:

$$f(u) := \begin{cases} -\|u\_0\|\_{L^\infty} u - \frac{\|u\_0\|\_{L^\infty}^2}{2}, & \text{if } u \le -\|u\_0\|\_{L^\infty}, \\ \frac{u^2}{2}, & \text{if } -\|u\_0\|\_{L^\infty} \le u \le \|u\_0\|\_{L^\infty}, \\ \|u\_0\|\_{L^\infty} u - \frac{\|u\_0\|\_{L^\infty}^2}{2}, & \text{if } \|u\_0\|\_{L^\infty} \le u, \end{cases}$$

the choice $c > \|u\_0\|\_{L^\infty}$ ensures that the subcharacteristic condition and the bound $\|u^\varepsilon(t)\|\_{L^\infty} \le \|u\_0\|\_{L^\infty}$ hold for all time.

Now, obtaining entropy inequalities on $(a^\varepsilon, b^\varepsilon)$ consists of making a comparison with constant state solutions to (4). Namely, letting $m = \|u\_0\|\_{L^\infty}\left(\frac{\|u\_0\|\_{L^\infty}}{2} - c\right)$, $M = \|u\_0\|\_{L^\infty}\left(\frac{\|u\_0\|\_{L^\infty}}{2} + c\right)$ and $h(a) = a + 2c^2 - 2c\sqrt{c^2 + 2a}$, we have $G(k, h(k)) = 0$ for all $k \in [m, M]$, and therefore $(k, h(k))$ is a solution to (4). Thus, the following system holds:
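One can check symbolically that $h$ indeed parametrizes the zero set of $G$ (a quick verification, assuming `sympy` is available; the symbols are taken positive only for the purposes of this check):

```python
import sympy as sp

a, c = sp.symbols('a c', positive=True)
h = a + 2*c**2 - 2*c*sp.sqrt(c**2 + 2*a)
G = sp.Rational(1, 2) * ((h - a) / (2*c))**2 - (a + h) / 2
assert sp.simplify(G) == 0  # G(a, h(a)) = 0 on the equilibrium curve
```

The root $h(a) = a + 2c^2 - 2c\sqrt{c^2 + 2a}$ is the branch of the quadratic $\left(\frac{b-a}{2c}\right)^2 = a + b$ selected by the equilibrium relation $b - a = 2cu$.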

$$
\partial\_t(a^\varepsilon - k) - c \partial\_x(a^\varepsilon - k) = \frac{1}{\varepsilon} \left( G(a^\varepsilon, b^\varepsilon) - G(k, h(k)) \right),
\tag{7a}
$$

$$
\partial\_t \left( b^\varepsilon - h(k) \right) + c \partial\_x \left( b^\varepsilon - h(k) \right) = \frac{1}{\varepsilon} \left( G(a^\varepsilon, b^\varepsilon) - G(k, h(k)) \right). \tag{7b}
$$

Multiplying (7a) by $\operatorname{sgn}(a^\varepsilon - k)$, (7b) by $\operatorname{sgn}(b^\varepsilon - h(k))$, and summing yields:

$$\begin{aligned} &\partial\_t \left( |a^\varepsilon - k| + |b^\varepsilon - h(k)| \right) - c \partial\_x \left( |a^\varepsilon - k| - |b^\varepsilon - h(k)| \right) \\ &= \frac{1}{\varepsilon} \Big( \text{sgn}(a^\varepsilon - k) + \text{sgn}(b^\varepsilon - h(k)) \Big) \Big( G(a^\varepsilon, b^\varepsilon) - G(k, h(k)) \Big). \end{aligned}$$

Hence, using the monotonicity of *G*, we obtain the following entropy inequalities on (*a<sup>ε</sup>* , *bε* ):

$$
\partial\_t \left( |a^\varepsilon - k| + |b^\varepsilon - h(k)| \right) - c \partial\_x \left( |a^\varepsilon - k| - |b^\varepsilon - h(k)| \right) \le 0. \tag{8}
$$

We now turn to proving the entropy inequalities on $u^\varepsilon$. Straightforward computations yield the existence of a constant *C* > 0 such that, for all $a, b \in [m, M]$, one has $|h(a) - b| \le C|G(a, b)|$. We therefore work on the variable $w^\varepsilon := \frac{h(a^\varepsilon) - a^\varepsilon}{2c}$ in the first place. Let $\kappa \in \left[-\|u\_0\|\_{L^\infty}, \|u\_0\|\_{L^\infty}\right]$ and $k \in [m, M]$ such that $\kappa = \frac{h(k) - k}{2c}$. We have:

$$|w^{\varepsilon} - \kappa| = \frac{1}{2c} \left( |h(a^{\varepsilon}) - h(k)| + |a^{\varepsilon} - k| \right) = \frac{1}{2c} \left( |a^{\varepsilon} - k| + |b^{\varepsilon} - h(k)| + r\_1^{\varepsilon} \right), \tag{9}$$

where $r\_1^\varepsilon = |h(a^\varepsilon) - h(k)| - |b^\varepsilon - h(k)|$ verifies $|r\_1^\varepsilon| \le |h(a^\varepsilon) - b^\varepsilon| \le C|G(a^\varepsilon, b^\varepsilon)|$. Thus, we are left to control $|G(a^\varepsilon, b^\varepsilon)|$. To do so, we formally differentiate this quantity and use (4):

$$\begin{split} \partial\_{t} |G(a^{\varepsilon}, b^{\varepsilon})| &= \left( \partial\_{t} a^{\varepsilon}\, \partial\_{a} G(a^{\varepsilon}, b^{\varepsilon}) + \partial\_{t} b^{\varepsilon}\, \partial\_{b} G(a^{\varepsilon}, b^{\varepsilon}) \right) \operatorname{sgn}(G(a^{\varepsilon}, b^{\varepsilon})) \\ &= \frac{1}{\varepsilon} \Big( \partial\_{a} G(a^{\varepsilon}, b^{\varepsilon}) + \partial\_{b} G(a^{\varepsilon}, b^{\varepsilon}) \Big) |G(a^{\varepsilon}, b^{\varepsilon})| \\ &\quad + c \operatorname{sgn}(G(a^{\varepsilon}, b^{\varepsilon})) \Big( \partial\_{x} a^{\varepsilon}\, \partial\_{a} G(a^{\varepsilon}, b^{\varepsilon}) - \partial\_{x} b^{\varepsilon}\, \partial\_{b} G(a^{\varepsilon}, b^{\varepsilon}) \Big) \\ &\leq \frac{1}{\varepsilon} \sup\_{[m,M]^2} \Big( \partial\_{a} G + \partial\_{b} G \Big) |G(a^{\varepsilon}, b^{\varepsilon})| + c \sup\_{[m,M]^2} \Big( |\partial\_{a} G| + |\partial\_{b} G| \Big) \Big( |\partial\_{x} a^{\varepsilon}| + |\partial\_{x} b^{\varepsilon}| \Big). \end{split}$$

Integrating in space gives:

$$\frac{\mathbf{d}}{\mathbf{d}t} \| G(a^{\varepsilon}, b^{\varepsilon}) \|\_{L^{1}} \le -\frac{A}{\varepsilon} \| G(a^{\varepsilon}, b^{\varepsilon}) \|\_{L^{1}} + B \left( TV(a\_{0}) + TV(b\_{0}) \right),$$

where $A = -\sup\_{[m,M]^2}(\partial\_a G + \partial\_b G)$ and $B = c \sup\_{[m,M]^2}(|\partial\_a G| + |\partial\_b G|)$ are positive constants which depend neither on *ε* nor on time. A Gronwall lemma then gives:

$$\|G(a^{\varepsilon}(t), b^{\varepsilon}(t))\|\_{L^{1}} \le C \left(TV(a\_{0}) + TV(b\_{0})\right)\varepsilon,\tag{10}$$

where we still denote *C* = *B*/*A* as a constant independent of time and of *ε*.
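For the reader's convenience, the Gronwall step can be made explicit. Since $v\_0 = \frac{u\_0^2}{2}$, the initial data sit on the equilibrium, $G(a\_0, b\_0) = \frac{u\_0^2}{2} - v\_0 = 0$, and integrating the differential inequality above yields:

$$\|G(a^{\varepsilon}(t), b^{\varepsilon}(t))\|\_{L^1} \le e^{-At/\varepsilon} \|G(a\_0, b\_0)\|\_{L^1} + \frac{B\varepsilon}{A}\left(1 - e^{-At/\varepsilon}\right)(TV(a\_0) + TV(b\_0)) \le \frac{B}{A}(TV(a\_0) + TV(b\_0))\, \varepsilon,$$

which is exactly (10) with $C = B/A$.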

In addition, since $G(a, h(a)) = 0$, one has $\frac{1}{2}\left(\frac{h(a)-a}{2c}\right)^2 = \frac{1}{2}(h(a) + a)$ and therefore:

$$\begin{split} \operatorname{sgn}(w^{\varepsilon} - \kappa) \left( \frac{(w^{\varepsilon})^2}{2} - \frac{\kappa^2}{2} \right) &= \frac{1}{2} \operatorname{sgn} \left( h(a^{\varepsilon}) - h(k) - (a^{\varepsilon} - k) \right) \left( h(a^{\varepsilon}) + a^{\varepsilon} - (h(k) + k) \right) \\ &= \frac{1}{2} \left( |h(a^{\varepsilon}) - h(k)| - |a^{\varepsilon} - k| \right) \\ &= \frac{1}{2} \left( |b^{\varepsilon} - h(k)| - |a^{\varepsilon} - k| + r\_2^{\varepsilon} \right), \end{split} \tag{11}$$

with $|r\_2^\varepsilon| \le C|G(a^\varepsilon, b^\varepsilon)|$. Differentiating (9) in time and (11) in space, and using (8), thus yields:

$$
\partial\_t|w^\varepsilon-\kappa|+\partial\_x\left(\operatorname{sgn}(w^\varepsilon-\kappa)\left(\frac{(w^\varepsilon)^2}{2}-\frac{\kappa^2}{2}\right)\right)\leq\frac{1}{2c}\left(\partial\_t r\_1^\varepsilon+c\,\partial\_x r\_2^\varepsilon\right).\tag{12}
$$

Then, we estimate $\|u(T) - w^\varepsilon(T)\|\_{L^1}$ using Kuznetsov's doubling of variables technique (see, e.g., [29] for scalar conservation laws with viscosity and [30] for a more general formalism) in order to combine (12) with Kruzkov inequalities on the entropy solution *u*, which read:

$$
\partial\_t |u - \kappa| + \partial\_x \operatorname{sgn}(u - \kappa)(f(u) - f(\kappa)) \le 0. \tag{13}
$$

Writing, respectively, (13) at point (*s*, *x*) for *κ* = *w<sup>ε</sup>* (*t*, *y*) and (12) at point (*t*, *y*) for *κ* = *u*(*s*, *x*), we obtain:

$$\partial\_{s}|u(s,x)-w^{\varepsilon}(t,y)|+\partial\_{x}\left(\operatorname{sgn}(u(s,x)-w^{\varepsilon}(t,y))\left(\frac{u(s,x)^{2}}{2}-\frac{(w^{\varepsilon}(t,y))^{2}}{2}\right)\right)\leq 0,\tag{14a}$$

$$\partial\_{t}|w^{\varepsilon}(t,y)-u(s,x)|+\partial\_{y}\left(\operatorname{sgn}(w^{\varepsilon}(t,y)-u(s,x))\left(\frac{(w^{\varepsilon}(t,y))^{2}}{2}-\frac{u(s,x)^{2}}{2}\right)\right)\tag{14b}$$

$$\leq\frac{1}{2c}\Big(\partial\_{t}r\_{1}^{\varepsilon}(t,y)+c\partial\_{y}r\_{2}^{\varepsilon}(t,y)\Big).$$

Now, let $\omega\_\alpha(t) = \frac{1}{\alpha}\,\omega\!\left(\frac{t}{\alpha}\right)$ and $\Omega\_\beta(x) = \frac{1}{\beta}\,\Omega\!\left(\frac{x}{\beta}\right)$ be two mollifying kernels. Setting $g(s, t, x, y) = \omega\_\alpha(s - t)\Omega\_\beta(x - y)$ and testing (14a) and (14b) against $g(\cdot, t, \cdot, y)$ and $g(s, \cdot, x, \cdot)$, respectively, and integrating over $[0, T] \times \mathbb{R}$, we obtain on the one hand:

$$\begin{aligned} &\iiiint \partial\_{s} g(s, t, x, y)\, |u(s, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &+ \iiiint \partial\_{x} g(s, t, x, y) \operatorname{sgn}(u(s, x) - w^{\varepsilon}(t, y)) \left( \frac{u(s, x)^{2}}{2} - \frac{(w^{\varepsilon}(t, y))^{2}}{2} \right) \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &- \iiint g(T, t, x, y) |u(T, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y + \iiint g(0, t, x, y) |u(0, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \ge 0, \end{aligned} \tag{15}$$

and on the other hand:

$$\begin{aligned} &\iiiint \partial\_t g(s, t, x, y) |w^\varepsilon(t, y) - u(s, x)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &+ \iiiint \partial\_y g(s, t, x, y) \operatorname{sgn}(w^\varepsilon(t, y) - u(s, x)) \left( \frac{(w^\varepsilon(t, y))^2}{2} - \frac{u(s, x)^2}{2} \right) \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &- \iiint g(s, T, x, y) |w^\varepsilon(T, y) - u(s, x)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y + \iiint g(s, 0, x, y) |w^\varepsilon(0, y) - u(s, x)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \\ &\ge \frac{1}{2c} \left( \iiiint \partial\_t g(s, t, x, y)\, r\_1^\varepsilon(t, y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y + c \iiiint \partial\_y g(s, t, x, y)\, r\_2^\varepsilon(t, y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \right. \\ &\quad \left. - \iiint g(s, T, x, y)\, r\_1^\varepsilon(T, y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y + \iiint g(s, 0, x, y)\, r\_1^\varepsilon(0, y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \right) =: \text{RHS}. \end{aligned} \tag{16}$$

Now, since $|\cdot|$ is even, and $\partial\_s g = -\partial\_t g$ and $\partial\_x g = -\partial\_y g$, we deduce by adding (15) and (16):

$$\begin{aligned} &- \iiint g(T, t, x, y) |u(T, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y + \iiint g(0, t, x, y) |u(0, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &- \iiint g(s, T, x, y) |u(s, x) - w^{\varepsilon}(T, y)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y + \iiint g(s, 0, x, y) |u(s, x) - w^{\varepsilon}(0, y)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \ge \mathrm{RHS}. \end{aligned} \tag{17}$$

Then, we write:

$$\begin{split} \|u(T) - w^{\varepsilon}(T)\|\_{L^{1}} &= \iiint \omega\_{\alpha}(T - t)\Omega\_{\beta}(x - y)|u(T, y) - w^{\varepsilon}(T, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &\quad + \iiint \omega\_{\alpha}(s - T)\Omega\_{\beta}(x - y)|u(T, y) - w^{\varepsilon}(T, y)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \\ &=: I\_{1} + I\_{2}. \end{split} \tag{18}$$

A triangle inequality gives for *I*1:

$$\begin{split} I_{1} &\leq \iiint \omega_{\alpha}(T-t)\Omega_{\beta}(x-y)|u(T,y)-u(T,x)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &\quad + \iiint \omega_{\alpha}(T-t)\Omega_{\beta}(x-y)|u(T,x)-w^{\varepsilon}(t,y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &\quad + \iiint \omega_{\alpha}(T-t)\Omega_{\beta}(x-y)|w^{\varepsilon}(t,y)-w^{\varepsilon}(T,y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &=: T_{1} + T_{2} + T_{3}, \end{split}$$

with $T_1 \le C\beta \cdot TV(u_0)$; the second term $T_2$ appears in (17), and for the last one we write:

$$T_3 \le \int_{\mathbb{R}} \Omega_{\beta}(x - y) \int_0^T \omega_{\alpha}(T - t) \int_{\mathbb{R}} |w^\varepsilon(t, y) - w^\varepsilon(T, y)| \, \mathrm{d}y \, \mathrm{d}t \, \mathrm{d}x,$$

and then we use the fact that $w^\varepsilon$ is uniformly Lipschitz in $L^1(\mathbb{R})$ with respect to $\varepsilon$. Indeed, one has $\partial_t w^\varepsilon = \partial_t a^\varepsilon \,\frac{h'(a^\varepsilon) - 1}{2c}$, with $h'(a^\varepsilon) - 1$ uniformly bounded with respect to $\varepsilon$ since $a^\varepsilon$ stays in the compact set $[m, M]$ for all time. In addition, $\|\partial_t a^\varepsilon(t)\|_{L^1}$ can be estimated by reusing (4) and (10):

$$\|\partial_t a^\varepsilon(t)\|_{L^1} \le c \|\partial_x a^\varepsilon(t)\|_{L^1} + \frac{1}{\varepsilon} \|G(a^\varepsilon(t), b^\varepsilon(t))\|_{L^1} \le C(TV(a_0) + TV(b_0)),$$

with $C > 0$ still independent of time and of $\varepsilon$. Hence, $\|\partial_t w^\varepsilon(t)\|_{L^1} \le C(TV(a_0) + TV(b_0))$ and $T_3 \le \alpha C(TV(a_0) + TV(b_0))$. All in all, we get for $I_1$:

$$\begin{aligned} I_1 &\le \iiint \omega_{\alpha}(T - t) \Omega_{\beta}(x - y) |u(T, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y + C\beta \cdot TV(u_0) \\ &\quad + \alpha C(TV(a_0) + TV(b_0)). \end{aligned}$$

Similarly, for $I_2$:

$$I_2 \le \iiint \omega_{\alpha}(s - T) \Omega_{\beta}(x - y) |u(s, x) - w^\varepsilon(T, y)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y + C(\alpha + \beta)TV(u_0).$$

Returning to (18), we obtain:

$$\|u(T) - w^{\varepsilon}(T)\|_{L^{1}} \le \iiint \omega_{\alpha}(t)\Omega_{\beta}(x - y)|u(0, x) - w^{\varepsilon}(t, y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \tag{19}$$

$$+\iiint \omega_{\alpha}(s)\Omega_\beta(x-y)|u(s,x) - w^\varepsilon(0,y)| \,\mathrm{d}s \,\mathrm{d}x \,\mathrm{d}y - \mathrm{RHS} \tag{20}$$

$$+\alpha C(TV(a_0) + TV(b_0)) + C(\alpha + \beta)TV(u_0). \tag{21}$$

However, using a triangle inequality, one can show that:

$$\iiint \omega_{\alpha}(t)\Omega_\beta(x-y)|u_0(x) - w^\varepsilon(t,y)| \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \leq C\beta \cdot TV(u_0) + \alpha C(TV(a_0) + TV(b_0)),$$

and similarly:

$$\iiint \omega_{\alpha}(s) \Omega_\beta(x - y) |u(s, x) - w^\varepsilon(0, y)| \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \le C(\alpha + \beta)TV(u_0).$$

We then bound the term RHS from above using the inequality $\|r_i^\varepsilon(t)\|_{L^1} \le C(TV(a_0) + TV(b_0))\varepsilon$ for $i = 1, 2$:

$$\begin{aligned} |\mathrm{RHS}| &= \frac{1}{2c} \left| \frac{1}{\alpha} \iiint \omega' \left( \frac{s-t}{\alpha} \right) \Omega_{\beta}(x-y)\, r_{1}^{\varepsilon}(t,y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \right. \\ &\quad + \frac{c}{\beta} \iiint \omega_{\alpha}(s-t)\, \Omega'\left(\frac{x-y}{\beta}\right) r_{2}^{\varepsilon}(t,y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}t \, \mathrm{d}y \\ &\quad - \iiint \omega_{\alpha}(s-T) \Omega_{\beta}(x-y)\, r_{1}^{\varepsilon}(T,y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \\ &\quad \left. + \iiint \omega_{\alpha}(s) \Omega_{\beta}(x-y)\, r_{1}^{\varepsilon}(0,y) \, \mathrm{d}s \, \mathrm{d}x \, \mathrm{d}y \right|, \\ &\leq C \left( \frac{T}{\alpha} + \frac{T}{\beta} + 1 \right) (TV(a_{0}) + TV(b_{0})) \varepsilon. \end{aligned}$$

Finally, we obtain:

$$\begin{aligned} \|u(T) - w^\varepsilon(T)\|_{L^1} &\le C \left(\frac{T}{\alpha} + \frac{T}{\beta} + 1\right) (TV(a_0) + TV(b_0))\varepsilon \\ &\quad + C(\alpha + \beta)TV(u_0) + \alpha C(TV(a_0) + TV(b_0)), \end{aligned}$$

which, after optimizing the values of *α* and *β* and noticing that *TV*(*a*0), *TV*(*b*0) ≤ *C* · *TV*(*u*0), gives:

$$\|u(T) - w^{\varepsilon}(T)\|_{L^{1}} \leq C\, TV(u_{0})(\sqrt{\varepsilon T} + \varepsilon),$$

and this inequality, along with $|h(a) - b| \le C|G(a, b)|$ and (10), gives in turn the result.

Denoting $\rho = -\partial_x u$, the convergence of $u^\varepsilon(t)$ towards $u(t)$ in $L^1(\mathbb{R})$ ensures that $\rho(t)$ is a probability measure. Indeed, since for all $\varepsilon > 0$, $\rho^\varepsilon = -\partial_x u^\varepsilon$ is a non-negative distribution, so is $\rho$. The Riesz–Markov theorem then ensures that $\rho$ can be represented by a non-negative Borel measure. In addition, for almost every $t \ge 0$, $u^\varepsilon(t)$ is a non-increasing function taking values in $[0, 1]$, and hence converges to a certain limit as $x$ goes to $+\infty$. The same holds true for the limit function $u(t)$. However, since $u^\varepsilon(t) - u(t) \in L^1(\mathbb{R})$, $u^\varepsilon(t, x) - u(t, x)$ must vanish as $x$ goes to $+\infty$. Therefore, the total mass of $\rho(t)$ is 1.

Passing to the relaxation system (2) for the aggregation Equation (1) can then be done by using (6) with $p = 1$. As a consequence, Theorem 1 translates as follows for the aggregation equation.

**Theorem 2.** *Let $\rho_0 \in \mathcal{P}_2(\mathbb{R})$, $c > 1/2$, and set $\sigma_0 = a[\rho_0]\rho_0$. There exists a constant $C > 0$ such that, for any $\varepsilon > 0$, denoting $(\rho^\varepsilon, \sigma^\varepsilon)$ the solution to (2) with initial data $(\rho_0, \sigma_0)$, one has:*

$$\forall T > 0, \qquad W_1(\rho(T), \rho^\varepsilon(T)) \le C(\sqrt{\varepsilon T} + \varepsilon),$$

*where $\rho \in C([0, +\infty), \mathcal{P}_2(\mathbb{R}))$ is the unique solution of (1) with initial datum $\rho_0$.*
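Theorem 2 measures the relaxation error in the Wasserstein distance $W_1$. In one dimension, $W_1$ between two probability densities is simply the $L^1$ distance between their cumulative distribution functions, which makes it cheap to evaluate on a grid. A minimal sketch (the uniform grid and rectangle quadrature are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

def w1_distance(p, q, dx):
    """Wasserstein-1 distance between two probability densities p, q sampled
    on a uniform 1-D grid of step dx: the L^1 norm of the CDF difference."""
    Fp = np.cumsum(p) * dx  # piecewise-constant CDF of p
    Fq = np.cumsum(q) * dx
    return np.sum(np.abs(Fp - Fq)) * dx
```

For two unit masses concentrated in single cells a distance $d$ apart, this returns $d$, as expected for transporting one point mass onto the other.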

#### **3. Numerical Discretization**

Hereafter, we denote by $\Delta t$ the time step and introduce a Cartesian mesh of size $\Delta x$. We set $t^n = n\Delta t$ for $n \in \mathbb{N}$ and $x_j = j\Delta x$ for $j \in \mathbb{Z}$. In this section, we extend our framework and consider the aggregation Equation (1) with arbitrary pointy potentials $W$, which satisfy the following conditions:


In this framework, the convergence of *ρ<sup>ε</sup>* towards *ρ* for a slightly different problem has also been studied in [7]. Adapting the argument, the convergence still holds provided the sub-characteristic condition *c* > *a*<sup>∞</sup> is verified. However, for such general potentials, the authors were not able to obtain the estimates of the speed of convergence as stated in Theorem 2.

In this section, we propose some numerical schemes able to capture the limit *ε* → 0, thus satisfying the so-called asymptotic preserving (AP) property. We consider two approaches, the first one based on a splitting algorithm, and the second one based on a well-balanced discretization.

#### *3.1. A Splitting Algorithm*

A first simple approach to discretizing the system (2) is to use a splitting method. Such a method is known to be convergent and easy to implement, but it introduces numerical diffusion. Notice that the system (2) can be rewritten, with $\mu = \sigma - c\rho$ and $\nu = \sigma + c\rho$, as

$$
\partial\_t \mu - c \partial\_x \mu = \frac{1}{\varepsilon} \left( a \left[ \frac{\nu - \mu}{2c} \right] \left( \frac{\nu - \mu}{2c} \right) - \frac{\mu + \nu}{2} \right) \tag{22a}
$$

$$
\partial\_t \nu + c \partial\_x \nu = \frac{1}{\varepsilon} \left( a \left[ \frac{\nu - \mu}{2c} \right] \left( \frac{\nu - \mu}{2c} \right) - \frac{\mu + \nu}{2} \right). \tag{22b}
$$

The idea of the method is to solve, in a first step, on $(t^n, t^n + \Delta t)$ the system:

$$\begin{aligned} \partial\_t \mu &= \frac{1}{\varepsilon} \left( a \left[ \frac{\nu - \mu}{2c} \right] \left( \frac{\nu - \mu}{2c} \right) - \frac{\mu + \nu}{2} \right) \\ \partial\_t \nu &= \frac{1}{\varepsilon} \left( a \left[ \frac{\nu - \mu}{2c} \right] \left( \frac{\nu - \mu}{2c} \right) - \frac{\mu + \nu}{2} \right) , \end{aligned}$$

with initial data $(\mu(t^n), \nu(t^n)) = (\mu^n, \nu^n)$. We obtain $\mu_j^{n+\frac{1}{2}} = \mu(t^n + \Delta t, x_j)$ and $\nu_j^{n+\frac{1}{2}} = \nu(t^n + \Delta t, x_j)$. Notice that this system may be solved explicitly. Indeed, by adding and subtracting the two equations, we deduce after an integration:

$$\nu\_j^{n+\frac{1}{2}} - \mu\_j^{n+\frac{1}{2}} = \nu\_j^n - \mu\_j^n \tag{23a}$$

$$
\mu_{j}^{n+\frac{1}{2}} + \nu_{j}^{n+\frac{1}{2}} = (\mu_{j}^{n} + \nu_{j}^{n})e^{-\Delta t/\varepsilon} + 2\, a\left[\frac{\nu^{n} - \mu^{n}}{2c}\right]\left(\frac{\nu_j^{n} - \mu_j^{n}}{2c}\right)(1 - e^{-\Delta t/\varepsilon}).\tag{23b}
$$
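The first step can be cross-checked numerically: since $\nu - \mu$ (hence $\rho$, hence $a[\rho]$) is conserved during the relaxation step, the ODEs are linear and their closed form can be compared against brute-force time stepping. A sketch with a frozen scalar velocity $a$ (the sample values are illustrative):

```python
import numpy as np

def relax_exact(mu0, nu0, a, dt, eps, c):
    # closed form: nu - mu is conserved and mu + nu relaxes toward 2*a*rho
    rho = (nu0 - mu0) / (2 * c)
    decay = np.exp(-dt / eps)
    s = (mu0 + nu0) * decay + 2 * a * rho * (1 - decay)
    return (s - (nu0 - mu0)) / 2, (s + (nu0 - mu0)) / 2

def relax_euler(mu0, nu0, a, dt, eps, c, nsub=20000):
    # brute force: explicit Euler on d/dt mu = d/dt nu = (a*rho - (mu+nu)/2)/eps
    mu, nu, h = mu0, nu0, dt / nsub
    for _ in range(nsub):
        H = (a * (nu - mu) / (2 * c) - (mu + nu) / 2) / eps
        mu, nu = mu + h * H, nu + h * H
    return mu, nu
```

Both routines agree up to the Euler discretization error, and both keep $\nu - \mu$ exactly constant.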

Then, in a second step, we discretize by a classical finite volume upwind scheme the system:

$$
\partial\_t \mu - c \partial\_x \mu = 0, \qquad \partial\_t \nu + c \partial\_x \nu = 0.
$$

That is:

$$
\mu\_j^{n+1} = \mu\_j^{n+\frac{1}{2}} + c \frac{\Delta t}{\Delta x} (\mu\_{j+1}^{n+\frac{1}{2}} - \mu\_j^{n+\frac{1}{2}}),
\tag{24a}
$$

$$\nu\_{j}^{n+1} = \nu\_{j}^{n+\frac{1}{2}} - c \frac{\Delta t}{\Delta x} (\nu\_{j}^{n+\frac{1}{2}} - \nu\_{j-1}^{n+\frac{1}{2}}) . \tag{24b}$$

Coming back to the variables *ρ* and *σ*, we obtain:

$$\begin{aligned} \nu_j^{n+\frac{1}{2}} &= c \rho_j^n + \sigma_j^n e^{-\Delta t/\varepsilon} + a_j^n \rho_j^n (1 - e^{-\Delta t/\varepsilon}), \\ \mu_j^{n+\frac{1}{2}} &= -c \rho_j^n + \sigma_j^n e^{-\Delta t/\varepsilon} + a_j^n \rho_j^n (1 - e^{-\Delta t/\varepsilon}), \end{aligned}$$

with $a_j^n = -\sum_{k \neq j} W'(x_j - x_k)\rho_k^n$. Then, the splitting algorithm reads:

$$\begin{split} \rho\_{j}^{n+1} &= \rho\_{j}^{n} - \frac{1}{2} \frac{\Delta t}{\Delta x} (\mu\_{j+1}^{n+\frac{1}{2}} + \nu\_{j}^{n+\frac{1}{2}} - \mu\_{j}^{n+\frac{1}{2}} - \nu\_{j-1}^{n+\frac{1}{2}}) \\ &= \rho\_{j}^{n} - \frac{1}{2} \frac{\Delta t}{\Delta x} \Big( (\sigma\_{j+1}^{n} - \sigma\_{j-1}^{n}) e^{-\Delta t/\varepsilon} \\ &\quad + (1 - e^{-\Delta t/\varepsilon}) (a\_{j+1}^{n} \rho\_{j+1}^{n} - a\_{j-1}^{n} \rho\_{j-1}^{n}) - c(\rho\_{j+1}^{n} - 2\rho\_{j}^{n} + \rho\_{j-1}^{n}) \Big), \end{split} \tag{25}$$

and:

$$\begin{split} \boldsymbol{\sigma}\_{j}^{n+1} &= \boldsymbol{\sigma}\_{j}^{n+\frac{1}{2}} + \frac{c}{2} \frac{\Delta t}{\Delta x} (\boldsymbol{\sigma}\_{j+1}^{n} - 2\boldsymbol{\sigma}\_{j}^{n} + \boldsymbol{\sigma}\_{j-1}^{n}) \boldsymbol{e}^{-\Delta t/\varepsilon} \\ &+ \frac{c}{2} \frac{\Delta t}{\Delta x} \Big( (\boldsymbol{a}\_{j+1}^{n}\boldsymbol{\rho}\_{j+1}^{n} - 2\boldsymbol{a}\_{j}^{n}\boldsymbol{\rho}\_{j}^{n} + \boldsymbol{a}\_{j-1}^{n}\boldsymbol{\rho}\_{j-1}^{n}) (1 - e^{-\Delta t/\varepsilon}) - \boldsymbol{c} (\boldsymbol{\rho}\_{j+1}^{n} - \boldsymbol{\rho}\_{j-1}^{n}) \Big) \\ &= \boldsymbol{\sigma}\_{j}^{n} e^{-\Delta t/\varepsilon} + \boldsymbol{a}\_{j}^{n} \boldsymbol{\rho}\_{j}^{n} (1 - e^{-\Delta t/\varepsilon}) + \frac{c}{2} \frac{\Delta t}{\Delta x} (\boldsymbol{\sigma}\_{j+1}^{n} - 2\boldsymbol{\sigma}\_{j}^{n} + \boldsymbol{\sigma}\_{j-1}^{n}) e^{-\Delta t/\varepsilon} \\ &+ \frac{c}{2} \frac{\Delta t}{\Delta x} \Big( (\boldsymbol{a}\_{j+1}^{n}\boldsymbol{\rho}\_{j+1}^{n} - 2\boldsymbol{a}\_{j}^{n}\boldsymbol{\rho}\_{j}^{n} + \boldsymbol{a}\_{j-1}^{n}\boldsymbol{\rho}\_{j-1}^{n}) (1 - e^{-\Delta t/\varepsilon}) - \boldsymbol{c} (\boldsymbol{\rho}\_{j+1}^{n} - \boldsymbol{\rho}\_{j-1}^{n}) \Big) . \end{split} \tag{26}$$
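One full time step of the splitting scheme, written in the diagonal variables $(\mu, \nu)$, can be sketched as follows. The periodic boundary conditions and the midpoint quadrature for the convolution $a[\rho]$ are assumptions of this sketch, not prescribed by the text:

```python
import numpy as np

def splitting_step(rho, sigma, x, dx, dt, c, eps, Wprime):
    """One step of the splitting scheme: exact relaxation (23) followed by
    upwind transport (24). Periodic boundary conditions are assumed."""
    # velocity a_j^n = -sum_{k != j} W'(x_j - x_k) rho_k^n (midpoint rule)
    K = -Wprime(x[:, None] - x[None, :])
    np.fill_diagonal(K, 0.0)
    a = K @ (rho * dx)
    mu, nu = sigma - c * rho, sigma + c * rho
    # relaxation step: nu - mu is conserved, mu + nu relaxes toward 2*a*rho
    e = np.exp(-dt / eps)
    s = (mu + nu) * e + 2.0 * a * rho * (1.0 - e)
    d = nu - mu
    mu_h, nu_h = (s - d) / 2, (s + d) / 2
    # transport step (24): mu travels at speed -c, nu at speed +c
    lam = c * dt / dx
    mu_new = mu_h + lam * (np.roll(mu_h, -1) - mu_h)
    nu_new = nu_h - lam * (nu_h - np.roll(nu_h, 1))
    return (nu_new - mu_new) / (2 * c), (mu_new + nu_new) / 2
```

With periodic boundaries the total mass $\sum_j \rho_j^n \Delta x$ is conserved exactly, and the $L^1$ contraction stated in the lemma below can be observed numerically.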

**Lemma 1.** *For any $\varepsilon > 0$, if both the CFL condition $c\frac{\Delta t}{\Delta x} \le 1$ and the subcharacteristic condition $c \ge a_\infty$ hold, then the splitting scheme (23)–(24) is $L^1$-stable:*

$$\forall n \in \mathbb{N}, \qquad \sum\_{j \in \mathbb{Z}} \left( |\mu\_j^{n+1}| + |\nu\_j^{n+1}| \right) \le \sum\_{j \in \mathbb{Z}} \left( |\mu\_j^n| + |\nu\_j^n| \right).$$

**Proof.** We have:

$$\begin{aligned} \mu_{j}^{n+\frac{1}{2}} &= \frac{1}{2} \Big( e^{-\Delta t/\varepsilon} \Big( 1 + \frac{a_{j}^{n}}{c} \Big) + 1 - \frac{a_{j}^{n}}{c} \Big) \mu_{j}^{n} - \frac{1 - e^{-\Delta t/\varepsilon}}{2} \Big( 1 - \frac{a_{j}^{n}}{c} \Big) \nu_{j}^{n}, \\ \nu_{j}^{n + \frac{1}{2}} &= -\frac{1 - e^{-\Delta t/\varepsilon}}{2} \Big( 1 + \frac{a_{j}^{n}}{c} \Big) \mu_{j}^{n} + \frac{1}{2} \Big( e^{-\Delta t/\varepsilon} \Big( 1 - \frac{a_{j}^{n}}{c} \Big) + 1 + \frac{a_{j}^{n}}{c} \Big) \nu_{j}^{n}. \end{aligned}$$

Under the condition $c \ge a_\infty$, in the expression of $\mu_j^{n+\frac{1}{2}}$, the coefficient in front of $\mu_j^n$ is non-negative and the one in front of $\nu_j^n$ is non-positive. Similarly, in $\nu_j^{n+\frac{1}{2}}$, the coefficient of $\mu_j^n$ is non-positive and the one in front of $\nu_j^n$ is non-negative. Taking absolute values and adding up therefore yields:

$$\left| \mu\_j^{n+\frac{1}{2}} \right| + \left| \nu\_j^{n+\frac{1}{2}} \right| \le \left| \mu\_j^n \right| + \left| \nu\_j^n \right|.$$

It remains to remark that, provided the CFL condition $c\frac{\Delta t}{\Delta x} \le 1$ is verified, (24) gives:

$$\begin{split} \sum_{j \in \mathbb{Z}} \left( |\mu_{j}^{n+1}| + |\nu_{j}^{n+1}| \right) &\leq \left( 1 - \frac{c\Delta t}{\Delta x} \right) \sum_{j \in \mathbb{Z}} \left( \left| \mu_{j}^{n + \frac{1}{2}} \right| + \left| \nu_{j}^{n + \frac{1}{2}} \right| \right) + \frac{c\Delta t}{\Delta x} \sum_{j \in \mathbb{Z}} \left| \mu_{j+1}^{n + \frac{1}{2}} \right| + \frac{c\Delta t}{\Delta x} \sum_{j \in \mathbb{Z}} \left| \nu_{j-1}^{n + \frac{1}{2}} \right|, \\ &\leq \left( 1 - \frac{c\Delta t}{\Delta x} \right) \sum_{j \in \mathbb{Z}} \left( |\mu_{j}^{n}| + |\nu_{j}^{n}| \right) + \frac{c\Delta t}{\Delta x} \sum_{j \in \mathbb{Z}} \left| \mu_{j}^{n + \frac{1}{2}} \right| + \frac{c\Delta t}{\Delta x} \sum_{j \in \mathbb{Z}} \left| \nu_{j}^{n + \frac{1}{2}} \right|, \\ &\leq \sum_{j \in \mathbb{Z}} \left( |\mu_{j}^{n}| + |\nu_{j}^{n}| \right). \end{split}$$

Note that similar schemes have also been studied in [31] and proved convergent at a rate of $\sqrt{\Delta x}$.

Let us now verify the AP property. When *ε* → 0, we verify that the equation on *ρ* (25) converges to the following Rusanov discretization of (1) (see [21] for numerical simulations using the Rusanov scheme):

$$
\rho_{j}^{n+1} = \rho_{j}^{n} - \frac{1}{2} \frac{\Delta t}{\Delta x} \left( a_{j+1}^{n} \rho_{j+1}^{n} - a_{j-1}^{n} \rho_{j-1}^{n} \right) + \frac{c \Delta t}{2 \Delta x} (\rho_{j+1}^{n} - 2\rho_{j}^{n} + \rho_{j-1}^{n}), \tag{27a}
$$

$$a_{j}^{n} = -\sum_{k \neq j} W'(x_{j} - x_{k}) \rho_{k}^{n}. \tag{27b}$$
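For reference, one step of the limiting scheme (27) in vectorized form (a sketch; periodic boundary conditions and the midpoint quadrature for $a[\rho]$ are assumptions):

```python
import numpy as np

def rusanov_step(rho, x, dx, dt, c, Wprime):
    """One step of (27): centered flux for a[rho]*rho plus Rusanov-type
    numerical diffusion of strength c/2. Periodic BCs are assumed."""
    K = -Wprime(x[:, None] - x[None, :])  # kernel -W'(x_j - x_k)
    np.fill_diagonal(K, 0.0)              # excludes k = j, as in (27b)
    a = K @ (rho * dx)
    lam = dt / dx
    f = a * rho
    return (rho - 0.5 * lam * (np.roll(f, -1) - np.roll(f, 1))
            + 0.5 * c * lam * (np.roll(rho, -1) - 2.0 * rho + np.roll(rho, 1)))
```

Under $c\Delta t/\Delta x \le 1$ and $c \ge a_\infty$, each update is a combination with non-negative weights, so positivity and total mass are preserved.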

This limiting scheme provides a consistent discretization of (1). Indeed, a similar scheme has been extensively studied in [11] using compactness arguments and the following convergence result was proven:

**Lemma 2.** *Assume $\rho_0 \in \mathcal{P}_2(\mathbb{R})$ and that the stability conditions $c\frac{\Delta t}{\Delta x} \le 1$ and $c \ge a_\infty$ are satisfied. Let $T > 0$ and suppose we initialize the scheme (27) with $\rho_j^0 = \frac{1}{\Delta x}\rho_0(C_j)$, where $C_j = [x_{j-\frac{1}{2}}, x_{j+\frac{1}{2}})$. Then, denoting $\rho_{\Delta x}$ the reconstruction given by the scheme (27), that is:*

$$\rho_{\Delta x}(t) = \sum_{n \in \mathbb{N}} \sum_{j \in \mathbb{Z}} \rho_j^n \mathbf{1}_{[t^n, t^{n+1})}(t)\, \delta_{x_j},$$

*then <sup>ρ</sup>*Δ*<sup>x</sup> converges weakly in the sense of measures on* [0, *<sup>T</sup>*] <sup>×</sup> <sup>R</sup> *towards the solution <sup>ρ</sup> of Equation* (1)*, as* Δ*x goes to 0.*

It has also been proven in [32] that the scheme (27) converges at a rate of $\sqrt{\Delta x}$.

#### *3.2. Well-Balanced Discretization*

Although the splitting method provides a simple way to obtain a discretization which is uniform with respect to the parameter $\varepsilon$, the resulting scheme suffers from strong numerical diffusion and may not have good large-time behavior. For this reason, well-balanced schemes have been introduced: a scheme is said to be well-balanced when it preserves equilibria. The method proposed in this section comes from [20].

Let us assume that, for some $n \in \mathbb{N}$, the approximation $(\mu_j^n, \nu_j^n)_{j \in \mathbb{Z}}$ of the solution $(\mu(t^n, x_j), \nu(t^n, x_j))_{j \in \mathbb{Z}}$ of (22) is known. We construct an approximation at time $t^{n+1}$ using a finite volume upwind discretization of (22), with the discretizations $H_{\mu,j}^n$, $H_{\nu,j}^n$ of the source terms to be prescribed right afterwards:

$$
\mu\_{j}^{n+1} = \mu\_{j}^{n} + c \frac{\Delta t}{\Delta x} (\mu\_{j+1}^{n} - \mu\_{j}^{n}) + \frac{\Delta t}{\varepsilon} H\_{\mu,j}^{n} \tag{28a}
$$


$$\nu_{j}^{n+1} = \nu_{j}^{n} - c \frac{\Delta t}{\Delta x} (\nu_{j}^{n} - \nu_{j-1}^{n}) + \frac{\Delta t}{\varepsilon} H_{\nu,j}^{n}. \tag{28b}$$

In order to preserve equilibria, we set:

$$H^{n}_{\nu,j} = \frac{1}{\Delta x} \int_{x_{j-1}}^{x_{j}} H(\overline{\mu}, \overline{\nu}) \, \mathrm{d}x, \qquad H(\mu, \nu) = a \left[ \frac{\nu - \mu}{2c} \right] \left( \frac{\nu - \mu}{2c} \right) - \frac{\mu + \nu}{2}, \tag{29}$$

where $(\overline{\mu}, \overline{\nu})$ solves the stationary system with incoming boundary conditions on $(x_{j-1}, x_j)$:

$$-c\partial_{x} \overline{\mu} = \frac{1}{\varepsilon} H(\overline{\mu}, \overline{\nu}) \tag{30a}$$

$$c\partial_{x}\overline{\nu} = \frac{1}{\varepsilon}H(\overline{\mu}, \overline{\nu})\tag{30b}$$

$$
\overline{\mu}(x_{j}) = \mu_{j}^{n}, \qquad \overline{\nu}(x_{j-1}) = \nu_{j-1}^{n}. \tag{30c}
$$

In the same fashion, $H_{\mu,j}^n = \frac{1}{\Delta x}\int_{x_j}^{x_{j+1}} H(\tilde{\mu}, \tilde{\nu}) \, \mathrm{d}x$, where $(\tilde{\mu}, \tilde{\nu})$ is the solution of the stationary system on $(x_j, x_{j+1})$:

$$-c\partial_{x}\tilde{\mu} = \frac{1}{\varepsilon}H(\tilde{\mu}, \tilde{\nu})\tag{31a}$$

$$c\partial_{x}\tilde{\nu} = \frac{1}{\varepsilon}H(\tilde{\mu}, \tilde{\nu})\tag{31b}$$

$$
\tilde{\mu}(x_{j+1}) = \mu_{j+1}^{n}, \qquad \tilde{\nu}(x_{j}) = \nu_{j}^{n}. \tag{31c}
$$

Reporting Equations (30b) and (31a) into the discretization of the source terms, we obtain $H_{\nu,j}^n = \frac{c\varepsilon}{\Delta x}(\overline{\nu}(x_j) - \nu_{j-1}^n)$ and $H_{\mu,j}^n = -\frac{c\varepsilon}{\Delta x}(\mu_{j+1}^n - \tilde{\mu}(x_j))$. Hence, one may rewrite the scheme (28) as

$$
\mu_{j}^{n+1} = \mu_{j}^{n} + c \frac{\Delta t}{\Delta x} (\tilde{\mu}(x_{j}) - \mu_{j}^{n}) \tag{32a}
$$

$$\nu_{j}^{n+1} = \nu_{j}^{n} - c \frac{\Delta t}{\Delta x} (\nu_{j}^{n} - \overline{\nu}(x_{j})).\tag{32b}$$

Remark that the stationary system:

$$
-c \partial_x \mu = \frac{1}{\varepsilon} H(\mu, \nu), \qquad c \partial_x \nu = \frac{1}{\varepsilon} H(\mu, \nu), \tag{33}
$$

is equivalent to:

$$
\partial_{x} \sigma = 0, \qquad c^2 \partial_{x} \rho = \frac{1}{\varepsilon} (a[\rho]\rho - \sigma). \tag{34}
$$

Therefore, denoting $\sigma_{j+\frac{1}{2}} = \frac{\tilde{\mu} + \tilde{\nu}}{2}$ and $\sigma_{j-\frac{1}{2}} = \frac{\overline{\mu} + \overline{\nu}}{2}$, which are constant, respectively, on $(x_j, x_{j+1})$ and $(x_{j-1}, x_j)$, one has:

$$
\tilde{\mu}(x_{j}) = 2\sigma_{j+\frac{1}{2}} - \nu_{j}^{n}, \qquad \overline{\nu}(x_{j}) = 2\sigma_{j-\frac{1}{2}} - \mu_{j}^{n}. \tag{35}
$$

Thus, it turns out that the scheme can be rewritten only in terms of the discretized unknowns and of $\sigma_{j\pm\frac{1}{2}}$:

$$
\mu_{j}^{n+1} = \mu_{j}^{n} - c \frac{\Delta t}{\Delta x} (\mu_{j}^{n} + \nu_{j}^{n}) + \frac{2c\Delta t}{\Delta x} \sigma_{j + \frac{1}{2}}, \tag{36a}
$$

$$\nu_{j}^{n+1} = \nu_{j}^{n} - c \frac{\Delta t}{\Delta x} (\mu_{j}^{n} + \nu_{j}^{n}) + \frac{2c \Delta t}{\Delta x} \sigma_{j - \frac{1}{2}}.\tag{36b}$$

Or equivalently:

$$
\rho_{j}^{n+1} = \rho_{j}^{n} - \frac{\Delta t}{\Delta x} (\sigma_{j + \frac{1}{2}} - \sigma_{j - \frac{1}{2}}), \tag{37a}
$$

$$
\sigma_j^{n+1} = \sigma_j^n - c \frac{\Delta t}{\Delta x} (2\sigma_j^n - \sigma_{j+\frac{1}{2}} - \sigma_{j-\frac{1}{2}}).\tag{37b}
$$

However, solving the stationary systems (30) and (31) involves the resolution of a nonlinear and nonlocal ODE. Instead, we propose an approximation in the spirit of [20].

We replace the nonlinear term in (30a)–(30b) by $a_{j-\frac{1}{2}}^n \cdot \frac{\overline{\nu}-\overline{\mu}}{2c}$, where $a_{j-\frac{1}{2}}^n$ stands for a fixed and consistent discretization of $a\left[\frac{\nu-\mu}{2c}\right]$ on the interval $(x_{j-1}, x_j)$, to be specified afterwards. Similarly, we replace the nonlinear term in (31a)–(31b) by $a_{j+\frac{1}{2}}^n \cdot \frac{\tilde{\nu}-\tilde{\mu}}{2c}$, with $a_{j+\frac{1}{2}}^n$ defined accordingly. In the following, we detail the construction for the problem (30a)–(30b) on $(x_{j-1}, x_j)$.

Obviously, the definition of $a_{j-\frac{1}{2}}^n$ should be taken with care [11,20]. In [32], the authors showed that, when discretizing the product $a[\rho]\rho$, if $a[\rho]$ and $\rho$ are not evaluated at the same point, then the resulting scheme produces the wrong dynamics. To take this into account, we split $\rho$ into one contribution coming from the left and one coming from the right; i.e., we set $\overline{\rho} = \rho_L + \rho_R$ and $\sigma = \sigma_L + \sigma_R$, where $\rho_L(\Delta x) = 0$ and $\rho_R(0) = 0$. This implies that $\overline{\rho}(\Delta x) = \rho_R(\Delta x)$ and $\overline{\rho}(0) = \rho_L(0)$.

More precisely, we solve the two following boundary value problems on $(0, \Delta x)$:

$$
\varepsilon c^2 \frac{\mathrm{d}}{\mathrm{d}x} \rho_{L} = a^n_{j - \frac{1}{2}, L}\, \rho_{L} - \sigma_{L}, \qquad \rho_{L}(\Delta x) = 0,\tag{38a}
$$

$$
\varepsilon c^2 \frac{\mathrm{d}}{\mathrm{d}x} \rho_{R} = a^n_{j - \frac{1}{2}, R}\, \rho_{R} - \sigma_{R}, \qquad \rho_{R}(0) = 0.\tag{38b}
$$

We may solve these linear systems explicitly and, since $\rho_L(0) = \overline{\rho}(0)$ and $\rho_R(\Delta x) = \overline{\rho}(\Delta x)$, we obtain the relations:

$$
\sigma_{L} = \overline{\rho}(0)\, \kappa_{j-\frac{1}{2}, L}^{n}, \qquad \sigma_{R} = \overline{\rho}(\Delta x)\, \kappa_{j-\frac{1}{2}, R}^{n}, \tag{39}
$$

with:

$$\kappa_{j-\frac{1}{2},L}^{n} = \frac{a_{j-\frac{1}{2},L}^{n}}{1 - \exp\left(-a_{j-\frac{1}{2},L}^{n} \Delta x/\left(\varepsilon c^{2}\right)\right)}, \qquad \kappa_{j-\frac{1}{2},R}^{n} = \frac{a_{j-\frac{1}{2},R}^{n}}{1 - \exp\left(a_{j-\frac{1}{2},R}^{n} \Delta x/\left(\varepsilon c^{2}\right)\right)}.\tag{40}$$
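The coefficients in (40) are simple scalar formulas, and their stiff behavior can be checked directly: as $\varepsilon \to 0$, $\kappa_L$ tends to the positive part of $a_L$ while $\kappa_R$ tends to minus the negative part of $a_R$. A quick sanity check (the sample values are illustrative):

```python
import math

def kappa_L(a, dx, eps, c):
    # kappa^n_{j-1/2,L} from (40)
    return a / (1.0 - math.exp(-a * dx / (eps * c * c)))

def kappa_R(a, dx, eps, c):
    # kappa^n_{j-1/2,R} from (40)
    return a / (1.0 - math.exp(a * dx / (eps * c * c)))
```

For instance, with $\Delta x = 0.1$, $c = 1$ and $\varepsilon = 10^{-3}$, `kappa_L(0.8, ...)` equals $0.8$ up to machine precision, while `kappa_R(0.8, ...)` essentially vanishes.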

Notice that we have:

$$\kappa\_{j-\frac{1}{2},L}^{n} \rightarrow (a\_{j-\frac{1}{2},L}^{n})\_{+}, \qquad \kappa\_{j-\frac{1}{2},R}^{n} \rightarrow -(a\_{j-\frac{1}{2},R}^{n})\_{-}, \quad \text{when } \varepsilon \rightarrow 0,\tag{41}$$

where we denote $a_+ = \max(0, a) \ge 0$ and $a_- = \max(0, -a) \ge 0$ the positive and negative parts of $a$. Using the boundary conditions in (30), we have:

$$
\overline{\rho}(0) = \frac{\nu\_{j-1}^n - \overline{\mu}(0)}{2c}, \qquad \overline{\rho}(\Delta x) = \frac{\overline{\nu}(\Delta x) - \mu\_j^n}{2c}. \tag{42}
$$

With (39) and the fact that $\sigma = \sigma_L + \sigma_R$ is constant on $[0, \Delta x]$, we obtain the following $2 \times 2$ system on the unknowns $\overline{\mu}(0)$, $\overline{\nu}(\Delta x)$:

$$
\mu_j^n + \overline{\nu}(\Delta x) = \overline{\mu}(0) + \nu_{j-1}^n, \tag{43a}
$$

$$
\frac{\mu_j^n + \overline{\nu}(\Delta x)}{2} = \frac{\nu_{j-1}^n - \overline{\mu}(0)}{2c} \kappa_{j-\frac{1}{2}, L}^n + \frac{\overline{\nu}(\Delta x) - \mu_j^n}{2c} \kappa_{j-\frac{1}{2}, R}^n. \tag{43b}
$$

Solving this system yields:

$$\overline{\mu}(0) = -\nu_{j-1}^n \frac{c - \kappa_{j-\frac{1}{2}, R}^n - \kappa_{j-\frac{1}{2}, L}^n}{c - \kappa_{j-\frac{1}{2}, R}^n + \kappa_{j-\frac{1}{2}, L}^n} - \mu_j^n \frac{2\kappa_{j-\frac{1}{2}, R}^n}{c - \kappa_{j-\frac{1}{2}, R}^n + \kappa_{j-\frac{1}{2}, L}^n},\tag{44a}$$

$$\overline{\nu}(\Delta x) = \nu_{j-1}^n \frac{2\kappa_{j-\frac{1}{2},L}^n}{c - \kappa_{j-\frac{1}{2},R}^n + \kappa_{j-\frac{1}{2},L}^n} - \mu_j^n \frac{c + \kappa_{j-\frac{1}{2},R}^n + \kappa_{j-\frac{1}{2},L}^n}{c - \kappa_{j-\frac{1}{2},R}^n + \kappa_{j-\frac{1}{2},L}^n}. \tag{44b}$$

From which we deduce with (42):

$$\rho_{j-\frac{1}{2},L}^{n} := \overline{\rho}(0) = \frac{1}{c} \left( \frac{(c - \kappa_{j-\frac{1}{2},R}^{n})\nu_{j-1}^{n} + \kappa_{j-\frac{1}{2},R}^{n}\mu_{j}^{n}}{c + \kappa_{j-\frac{1}{2},L}^{n} - \kappa_{j-\frac{1}{2},R}^{n}} \right), \tag{45a}$$

$$\rho_{j-\frac{1}{2},R}^{n} := \overline{\rho}(\Delta x) = \frac{1}{c} \left( \frac{\kappa_{j-\frac{1}{2},L}^{n} \nu_{j-1}^{n} - (c + \kappa_{j-\frac{1}{2},L}^{n}) \mu_{j}^{n}}{c + \kappa_{j-\frac{1}{2},L}^{n} - \kappa_{j-\frac{1}{2},R}^{n}} \right), \tag{45b}$$

and with (39):

$$\sigma\_{j-\frac{1}{2}} := \sigma\_{\mathcal{L}} + \sigma\_{\mathcal{R}} = \rho\_{j-\frac{1}{2},\mathcal{L}}^{n} \boldsymbol{\kappa}\_{j-\frac{1}{2},\mathcal{L}}^{n} + \rho\_{j-\frac{1}{2},\mathcal{R}}^{n} \boldsymbol{\kappa}\_{j-\frac{1}{2},\mathcal{R}}^{n} = \frac{\nu\_{j-1}^{n} \boldsymbol{\kappa}\_{j-\frac{1}{2},\mathcal{L}}^{n} - \mu\_{j}^{n} \boldsymbol{\kappa}\_{j-\frac{1}{2},\mathcal{R}}^{n}}{c - \boldsymbol{\kappa}\_{j-\frac{1}{2},\mathcal{R}}^{n} + \boldsymbol{\kappa}\_{j-\frac{1}{2},\mathcal{L}}^{n}},\tag{46}$$

(the above quantities are well-defined since $\kappa_{j-\frac{1}{2},L}^n \ge 0$ and $\kappa_{j-\frac{1}{2},R}^n \le 0$). Injecting into (36), this gives the following scheme:

$$\mu\_{j}^{n+1} = \left(1 - \frac{c\Delta t}{\Delta x}\right)\mu\_{j}^{n} - \frac{c\Delta t}{\Delta x}\frac{c - \kappa\_{j+\frac{1}{2},R}^{n} - \kappa\_{j+\frac{1}{2},L}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}\nu\_{j}^{n} - \frac{2c\Delta t}{\Delta x}\frac{\kappa\_{j+\frac{1}{2},R}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}\mu\_{j+1}^{n},\tag{47a}$$
 
$$\nu\_{j}^{n+1} = \left(1 - \frac{c\Delta t}{\Delta x}\right)\nu\_{j}^{n} - \frac{c\Delta t}{\Delta x}\frac{c + \kappa\_{j-\frac{1}{2},R}^{n} + \kappa\_{j-\frac{1}{2},L}^{n}}{c - \kappa\_{j-\frac{1}{2},R}^{n} + \kappa\_{j-\frac{1}{2},L}^{n}}\mu\_{j}^{n} + \frac{2c\Delta t}{\Delta x}\frac{\kappa\_{j-\frac{1}{2},L}^{n}}{c - \kappa\_{j-\frac{1}{2},R}^{n} + \kappa\_{j-\frac{1}{2},L}^{n}}\nu\_{j-1}^{n},\tag{47b}$$

where the coefficients $\kappa_{j-\frac{1}{2},L/R}^n$ are defined in (40). Equivalently, for the variables $(\rho, \sigma)$, the scheme reads:

$$\rho\_{j}^{n+1} = \rho\_{j}^{n} - \frac{\Delta t}{\Delta x} \left( \frac{\nu\_{j}^{n} \kappa\_{j+\frac{1}{2},L}^{n} - \mu\_{j+1}^{n} \kappa\_{j+\frac{1}{2},R}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}} - \frac{\nu\_{j-1}^{n} \kappa\_{j-\frac{1}{2},L}^{n} - \mu\_{j}^{n} \kappa\_{j-\frac{1}{2},R}^{n}}{c - \kappa\_{j-\frac{1}{2},R}^{n} + \kappa\_{j-\frac{1}{2},L}^{n}} \right) \tag{48a}$$

$$
\sigma\_{j}^{n+1} = \sigma\_{j}^{n} - c \frac{\Delta t}{\Delta x} \left( 2\sigma\_{j}^{n} - \frac{\nu\_{j}^{n}\kappa\_{j+\frac{1}{2},L}^{n} - \mu\_{j+1}^{n}\kappa\_{j+\frac{1}{2},R}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}} - \frac{\nu\_{j-1}^{n}\kappa\_{j-\frac{1}{2},L}^{n} - \mu\_{j}^{n}\kappa\_{j-\frac{1}{2},R}^{n}}{c - \kappa\_{j-\frac{1}{2},R}^{n} + \kappa\_{j-\frac{1}{2},L}^{n}} \right), \tag{48b}
$$

where we recall that $\mu_j^n = \sigma_j^n - c\rho_j^n$ and $\nu_j^n = \sigma_j^n + c\rho_j^n$.
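One step of the well-balanced scheme in the $(\rho, \sigma)$ variables ((37) combined with the interface values (46)) can be sketched as follows. Here the interface velocities are assumed to be already available (their construction is discussed next), the boundary conditions are periodic, and no guard against vanishing velocities is included; all of these are assumptions of the sketch:

```python
import numpy as np

def wb_step(rho, sigma, dx, dt, c, eps, aL, aR):
    """One step of (37) with sigma_{j-1/2} from (46). aL[j], aR[j] are the
    frozen interface velocities a^n_{j-1/2,L} and a^n_{j-1/2,R} (nonzero)."""
    mu, nu = sigma - c * rho, sigma + c * rho
    # interface coefficients (40)
    kL = aL / (1.0 - np.exp(-aL * dx / (eps * c**2)))
    kR = aR / (1.0 - np.exp(aR * dx / (eps * c**2)))
    # sigma_{j-1/2} from (46): built from nu_{j-1} and mu_j
    sig_m = (np.roll(nu, 1) * kL - mu * kR) / (c - kR + kL)
    sig_p = np.roll(sig_m, -1)  # sigma_{j+1/2}
    rho_new = rho - dt / dx * (sig_p - sig_m)
    sigma_new = sigma - c * dt / dx * (2.0 * sigma - sig_p - sig_m)
    return rho_new, sigma_new
```

Since the update of $\rho$ is in conservation form, the total mass is preserved exactly with periodic boundaries.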

It remains to define the velocities $a_{j-\frac{1}{2},L/R}^n$ used in (38) and in (40). We take:

$$a\_{j-\frac{1}{2},L/R}^n = -\sum\_{k \neq j} \mathcal{W}'(\mathbf{x}\_j - \mathbf{x}\_k) \rho\_{k-\frac{1}{2},L/R}^n.$$

However, this discretization implies the resolution of a nonlinear problem, since the quantities $\rho_{k-\frac{1}{2},L/R}^n$ depend nonlinearly on $a_{j-\frac{1}{2},L/R}^n$.

Then, we implement a fixed-point method initialized with $a\_{j-\frac{1}{2},L}^{n,(0)} := a\_{j-1}^n$ and $a\_{j-\frac{1}{2},R}^{n,(0)} := a\_j^n$. Solving, on each cell $(x\_{j-1}, x\_j)$, the system of ODEs (38) with these values for the velocities gives two sequences, $(\rho\_{j-\frac{1}{2},L}^{(1)})\_{j\in\mathbb{Z}}$ and $(\rho\_{j-\frac{1}{2},R}^{(1)})\_{j\in\mathbb{Z}}$. Then, we assign the next value of the velocity to $a\_{j-\frac{1}{2},L/R}^{n,(1)} := -\sum\_{k\neq j} \mathcal{W}'(x\_j - x\_k)\rho\_{k-\frac{1}{2},L/R}^{(1)}$, which allows us to compute new values for the left and right densities, $(\rho\_{j-\frac{1}{2},L}^{(2)})\_{j\in\mathbb{Z}}$ and $(\rho\_{j-\frac{1}{2},R}^{(2)})\_{j\in\mathbb{Z}}$, through (38). We iterate until $W\_2(\rho\_L^{(i)}, \rho\_L^{(i+1)})$ and $W\_2(\rho\_R^{(i)}, \rho\_R^{(i+1)})$ pass below a certain threshold. Notice that the velocities $a\_{j-\frac{1}{2},L/R}^{n,(i)}$ always remain bounded by $a\_\infty$. In practice, only a few iterations are needed.
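The structure of this fixed-point loop can be sketched in a few lines of Python. The grid, the interaction kernel and, in particular, the inner step (which in the actual scheme amounts to solving the ODE system (38) on each cell) are illustrative stand-ins; here the inner step is taken as the identity, so the loop converges immediately.

```python
import numpy as np

def velocities(x, rho, Wprime):
    """a_j = -sum_{k != j} W'(x_j - x_k) rho_k, as in the definition above."""
    K = Wprime(x[:, None] - x[None, :])   # K[j, k] = W'(x_j - x_k)
    np.fill_diagonal(K, 0.0)              # exclude k = j
    return -K @ rho

def fixed_point(x, rho, Wprime, inner_step, tol=1e-12, itmax=50):
    """Outer loop: velocities -> densities -> velocities, until stagnation."""
    a = velocities(x, rho, Wprime)
    for _ in range(itmax):
        rho = inner_step(rho, a)          # stand-in for solving (38) per cell
        a_new = velocities(x, rho, Wprime)
        if np.max(np.abs(a_new - a)) < tol:
            return a_new, rho
        a = a_new
    return a, rho

# toy data: two masses of 1/2 at x = -0.5 and x = 0.5, W(x) = |x|/2
x = np.linspace(-1.0, 1.0, 5)
rho = np.array([0.0, 0.5, 0.0, 0.5, 0.0])
Wp = lambda z: 0.5 * np.sign(z)
a, _ = fixed_point(x, rho, Wp, inner_step=lambda r, a: r)
# |a| stays bounded by a_inf = sup|W'| * total mass = 1/2
```

With these data, the two masses are assigned the velocities $\pm 1/4$, and every velocity remains bounded by $a\_\infty = 1/2$, consistently with the remark above.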

The resulting scheme is consistent for any *ε* > 0 and stable under standard stability conditions, as shown by the following lemmas.

**Lemma 3** ($L^1$ stability)**.** *Under the CFL condition $\frac{c\Delta t}{\Delta x} \leq 1$ and the subcharacteristic condition $c \geq a\_\infty$, the sequence $(\mu\_j^n, \nu\_j^n)\_{j,n}$ defined by the scheme (47) verifies the following $L^1$ stability property:*

$$\forall n \in \mathbb{N}, \qquad \sum\_{j \in \mathbb{Z}} \left( |\mu\_{j}^{n+1}| + |\nu\_{j}^{n+1}| \right) \leq \sum\_{j \in \mathbb{Z}} \left( |\mu\_{j}^{n}| + |\nu\_{j}^{n}| \right).$$

**Proof.** In each combination of (47), the first coefficient is non-negative under the CFL condition $\frac{c\Delta t}{\Delta x} \leq 1$, and so is the last one since $\kappa\_{j\pm\frac{1}{2},L}^n \geq 0$ and $\kappa\_{j\pm\frac{1}{2},R}^n \leq 0$. Moreover, under the subcharacteristic condition $c \geq a\_\infty$, it holds that $-c \leq \kappa\_{j\pm\frac{1}{2},R} + \kappa\_{j\pm\frac{1}{2},L} \leq c$, so the remaining coefficients are non-negative as well. Thus, applying the triangle inequality and re-indexing the sums appropriately:

$$\begin{split}
\sum\_{j \in \mathbb{Z}} |\mu\_{j}^{n+1}| + |\nu\_{j}^{n+1}| &\leq \sum\_{j \in \mathbb{Z}} \left(1 - \frac{c\Delta t}{\Delta x}\right)|\mu\_{j}^{n}| + \sum\_{j \in \mathbb{Z}} \frac{c\Delta t}{\Delta x}\, \frac{c - \kappa\_{j+\frac{1}{2},R}^{n} - \kappa\_{j+\frac{1}{2},L}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}\, |\nu\_{j}^{n}| - \sum\_{j \in \mathbb{Z}} \frac{2c\Delta t}{\Delta x}\, \frac{\kappa\_{j+\frac{1}{2},R}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}\, |\mu\_{j+1}^{n}| \\
&\quad + \sum\_{j \in \mathbb{Z}} \left(1 - \frac{c\Delta t}{\Delta x}\right)|\nu\_{j}^{n}| + \sum\_{j \in \mathbb{Z}} \frac{c\Delta t}{\Delta x}\, \frac{c + \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}\, |\mu\_{j+1}^{n}| + \sum\_{j \in \mathbb{Z}} \frac{2c\Delta t}{\Delta x}\, \frac{\kappa\_{j+\frac{1}{2},L}^{n}}{c - \kappa\_{j+\frac{1}{2},R}^{n} + \kappa\_{j+\frac{1}{2},L}^{n}}\, |\nu\_{j}^{n}|, \\
&\leq \left(1 - \frac{c\Delta t}{\Delta x}\right) \sum\_{j \in \mathbb{Z}} \left(|\mu\_{j}^{n}| + |\nu\_{j}^{n}|\right) + \frac{c\Delta t}{\Delta x} \sum\_{j \in \mathbb{Z}} |\mu\_{j+1}^{n}| + \frac{c\Delta t}{\Delta x} \sum\_{j \in \mathbb{Z}} |\nu\_{j}^{n}|, \\
&\leq \sum\_{j \in \mathbb{Z}} \left(|\mu\_{j}^{n}| + |\nu\_{j}^{n}|\right).
\end{split}$$

This concludes the proof.
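The weight structure used in the proof can be checked numerically. The sketch below samples values $\kappa\_L \in [0, c]$ and $\kappa\_R \in [-c, 0]$ (these ranges are an assumption chosen so that the sign conditions and the subcharacteristic bound of the proof hold) and verifies that the four weights appearing in the estimate are non-negative and sum to one per variable, which is exactly what makes the telescoping argument work.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 1.0
for _ in range(1000):
    kL = rng.uniform(0.0, c)        # kappa_{j+1/2,L} >= 0
    kR = rng.uniform(-c, 0.0)       # kappa_{j+1/2,R} <= 0, so -c <= kR+kL <= c
    denom = c - kR + kL             # common denominator, always >= c > 0
    w_nu = [(c - kR - kL) / denom, 2.0 * kL / denom]   # weights of |nu_j^n|
    w_mu = [-2.0 * kR / denom, (c + kR + kL) / denom]  # weights of |mu_{j+1}^n|
    assert min(w_nu + w_mu) >= 0.0                     # convex weights
    assert abs(sum(w_nu) - 1.0) < 1e-12                # nu-weights sum to 1
    assert abs(sum(w_mu) - 1.0) < 1e-12                # mu-weights sum to 1
```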

**Lemma 4** (Consistency for smooth solutions)**.** *Assume that, for all $j \in \mathbb{Z}$, we have $a\_{j-\frac{1}{2},L/R}^n = -\sum\_{k \neq j} \mathcal{W}'(x\_j - x\_k)\rho\_{k-\frac{1}{2},L/R}$. Then, for any $\varepsilon > 0$, the scheme (37) is consistent with (2) provided that the solutions are smooth enough.*

**Proof.** For $j \in \mathbb{Z}$, one has, using Taylor expansions as $\Delta x \to 0$:

$$\begin{split} \frac{\kappa^{n}\_{j-\frac{1}{2},L}}{c - \kappa^{n}\_{j-\frac{1}{2},R} + \kappa^{n}\_{j-\frac{1}{2},L}} &= \frac{1}{2} - \frac{1}{4\epsilon c^{2}} \left( c - \frac{a^{n}\_{j-\frac{1}{2},L} + a^{n}\_{j-\frac{1}{2},R}}{2} \right) \Delta x + O(\Delta x^{2}),\\ \frac{\kappa^{n}\_{j-\frac{1}{2},R}}{c - \kappa^{n}\_{j-\frac{1}{2},R} + \kappa^{n}\_{j-\frac{1}{2},L}} &= -\frac{1}{2} + \frac{1}{4\epsilon c^{2}} \left( c + \frac{a^{n}\_{j-\frac{1}{2},L} + a^{n}\_{j-\frac{1}{2},R}}{2} \right) \Delta x + O(\Delta x^{2}). \end{split}$$

Thus:

$$\begin{split} \sigma\_{j-\frac{1}{2}} = \frac{\sigma\_{j-1}^{\boldsymbol{n}} + \sigma\_{j}^{\boldsymbol{n}}}{2} + c \frac{\rho\_{j-1}^{\boldsymbol{n}} - \rho\_{j}^{\boldsymbol{n}}}{2} - \frac{1}{4\varepsilon c^{2}} \Biggl( \left( c - \frac{a\_{j-\frac{1}{2},L}^{\boldsymbol{n}} + a\_{j-\frac{1}{2},R}^{\boldsymbol{n}}}{2} \right) (\sigma\_{j-1}^{\boldsymbol{n}} + c\rho\_{j-1}^{\boldsymbol{n}}) \\ \quad + \left( c + \frac{a\_{j-\frac{1}{2},L}^{\boldsymbol{n}} + a\_{j-\frac{1}{2},R}^{\boldsymbol{n}}}{2} \right) (\sigma\_{j}^{\boldsymbol{n}} - c\rho\_{j}^{\boldsymbol{n}}) \Biggr) \Delta x + O(\Delta x^{2}). \end{split}$$

In particular, $\sigma\_{j-\frac{1}{2}}$ is clearly consistent with $\sigma(t^n, x\_{j-\frac{1}{2}})$ as long as the solution $(\rho, \sigma)$ is smooth enough to perform standard consistency analysis for finite differences. This shows that (37a) is consistent with $\partial\_t\rho + \partial\_x\sigma = 0$. As for the consistency of (37b) with $\partial\_t\sigma + c^2\partial\_x\rho = \frac{1}{\varepsilon}(a[\rho]\rho - \sigma)$, we write:

$$\begin{split} \sigma\_{j+\frac{1}{2}} + \sigma\_{j-\frac{1}{2}} - 2\sigma\_{j}^{n} &= \frac{\sigma\_{j+1}^{n} - 2\sigma\_{j}^{n} + \sigma\_{j-1}^{n}}{2} + c\frac{\rho\_{j-1}^{n} - \rho\_{j+1}^{n}}{2} - \frac{\Delta x}{4\varepsilon c^{2}} \Biggl[ c(\sigma\_{j-1}^{n} + 2\sigma\_{j}^{n} + \sigma\_{j+1}^{n}) \\ &\quad + \frac{a\_{j-\frac{1}{2},L}^{n} + a\_{j-\frac{1}{2},R}^{n}}{2}(\sigma\_{j}^{n} - \sigma\_{j-1}^{n}) + \frac{a\_{j+\frac{1}{2},L}^{n} + a\_{j+\frac{1}{2},R}^{n}}{2}(\sigma\_{j+1}^{n} - \sigma\_{j}^{n}) + c^{2}(\rho\_{j-1}^{n} - \rho\_{j+1}^{n}) \\ &\quad - c\left(\frac{a\_{j-\frac{1}{2},L}^{n} + a\_{j-\frac{1}{2},R}^{n}}{2}\rho\_{j-1}^{n} + \frac{a\_{j-\frac{1}{2},L}^{n} + a\_{j-\frac{1}{2},R}^{n} + a\_{j+\frac{1}{2},L}^{n} + a\_{j+\frac{1}{2},R}^{n}}{2}\rho\_{j}^{n} + \frac{a\_{j+\frac{1}{2},L}^{n} + a\_{j+\frac{1}{2},R}^{n}}{2}\rho\_{j+1}^{n} \right) \Biggr] \\ &\quad + O(\Delta x^{2}). \end{split}$$

Using Taylor expansions, we have, for smooth solutions, $\sigma(t^n, x\_{j+1}) - 2\sigma(t^n, x\_j) + \sigma(t^n, x\_{j-1}) = O(\Delta x^2)$, $\rho(t^n, x\_{j-1}) - \rho(t^n, x\_{j+1}) = O(\Delta x)$, $\sigma(t^n, x\_j) - \sigma(t^n, x\_{j-1}) = O(\Delta x)$ and $\sigma(t^n, x\_{j+1}) - \sigma(t^n, x\_j) = O(\Delta x)$. Along with the bound $|a\_{j\pm\frac{1}{2},L/R}^n| \leq a\_\infty$, this implies:

$$\begin{split} \sigma\_{j+\frac{1}{2}} + \sigma\_{j-\frac{1}{2}} - 2\sigma\_{j}^{n} &= c \frac{\rho\_{j-1}^{n} - \rho\_{j+1}^{n}}{2} - \frac{1}{4\varepsilon c^{2}} \Biggl[ c(\sigma\_{j-1}^{n} + 2\sigma\_{j}^{n} + \sigma\_{j+1}^{n}) \\ &\quad - c \left( \frac{a\_{j-\frac{1}{2},L}^{n} + a\_{j-\frac{1}{2},R}^{n}}{2} \rho\_{j-1}^{n} + \frac{a\_{j-\frac{1}{2},L}^{n} + a\_{j-\frac{1}{2},R}^{n} + a\_{j+\frac{1}{2},L}^{n} + a\_{j+\frac{1}{2},R}^{n}}{2} \rho\_{j}^{n} \right. \\ &\quad \left. + \frac{a\_{j+\frac{1}{2},L}^{n} + a\_{j+\frac{1}{2},R}^{n}}{2} \rho\_{j+1}^{n} \right) \Biggr] \Delta x + O(\Delta x^{2}). \end{split}$$

Clearly, $c\frac{\rho\_{j-1}^n - \rho\_{j+1}^n}{2}$ and $c(\sigma\_{j-1}^n + 2\sigma\_j^n + \sigma\_{j+1}^n)$ are consistent, with accuracies of $O(\Delta x^2)$ and $O(\Delta x)$ respectively, with $-c\,\partial\_x\rho(t^n, x\_j)$ and $4c\,\sigma(t^n, x\_j)$. For the remaining terms, let us recall that, with the notations of (42):

$$\rho\_{j-\frac{1}{2},L} = \frac{\nu\_{j-1}^n - \overline{\mu}(0)}{2c} = \frac{\nu\_{j-1}^n - \sigma\_{j-\frac{1}{2}}}{c}, \qquad \rho\_{j-\frac{1}{2},R} = \frac{\overline{\nu}(\Delta x) - \mu\_j^n}{2c} = \frac{\sigma\_{j-\frac{1}{2}} - \mu\_j}{c}.$$

Hence, $\rho\_{j-\frac{1}{2},L} + \rho\_{j-\frac{1}{2},R} = \frac{\nu\_{j-1}^n - \mu\_j^n}{c} = \frac{\sigma\_{j-1}^n - \sigma\_j^n}{c} + \rho\_{j-1}^n + \rho\_j^n$. Since $\sigma(t^n, x\_{j-1}) - \sigma(t^n, x\_j) = O(\Delta x)$, and assuming that:

$$a\_{j-\frac{1}{2},L/R}^n = -\sum\_{k \neq j} \mathcal{W}'(x\_j - x\_k) \rho\_{k-\frac{1}{2},L/R}$$

we deduce that $a\_{j-\frac{1}{2},L}^n + a\_{j-\frac{1}{2},R}^n$ is consistent with $a[\rho(t^n)](x\_{j-1}) + a[\rho(t^n)](x\_j)$ with accuracy $O(\Delta x)$. It follows that $\sigma\_{j+\frac{1}{2}} + \sigma\_{j-\frac{1}{2}} - 2\sigma\_j^n$ is consistent with $-\partial\_x\rho(t^n, x\_j) - \frac{1}{\varepsilon}\left(\sigma(t^n, x\_j) - a[\rho(t^n)](x\_j)\rho(t^n, x\_j)\right)$, again with accuracy $O(\Delta x)$, and this concludes the proof.

Since the stability conditions in Lemma 3 are independent of $\varepsilon$, we recover in the limit $\varepsilon \to 0$, using (41), the scheme of [20]:

$$\rho\_{j}^{n+1} = \rho\_{j}^{n} - \frac{\Delta t}{\Delta x} \left( \frac{v\_{j}^{n}(a\_{j+\frac{1}{2},L}^{n})\_{+} + \mu\_{j+1}^{n}(a\_{j+\frac{1}{2},R}^{n})\_{-}}{c + (a\_{j+\frac{1}{2},R}^{n})\_{-} + (a\_{j+\frac{1}{2},L}^{n})\_{+}} - \frac{v\_{j-1}^{n}(a\_{j-\frac{1}{2},L}^{n})\_{+} + \mu\_{j}^{n}(a\_{j-\frac{1}{2},R}^{n})\_{-}}{c + (a\_{j-\frac{1}{2},R}^{n})\_{-} + (a\_{j-\frac{1}{2},L}^{n})\_{+}} \right) \tag{49a}$$

$$\sigma\_{j}^{n+1} = \sigma\_{j}^{n} - c\frac{\Delta t}{\Delta x} \left( 2\sigma\_{j}^{n} - \frac{\nu\_{j}^{n}(a\_{j+\frac{1}{2},L}^{n})\_{+} + \mu\_{j+1}^{n}(a\_{j+\frac{1}{2},R}^{n})\_{-}}{c + (a\_{j+\frac{1}{2},R}^{n})\_{-} + (a\_{j+\frac{1}{2},L}^{n})\_{+}} - \frac{\nu\_{j-1}^{n}(a\_{j-\frac{1}{2},L}^{n})\_{+} + \mu\_{j}^{n}(a\_{j-\frac{1}{2},R}^{n})\_{-}}{c + (a\_{j-\frac{1}{2},R}^{n})\_{-} + (a\_{j-\frac{1}{2},L}^{n})\_{+}} \right), \tag{49b}$$

which is stable under the conditions $\frac{c\Delta t}{\Delta x} \leq 1$ and $c \geq a\_\infty$. Notice that, with the notation in (46), Equation (49a) may be rewritten as

$$\begin{split} \rho\_{j}^{n+1} &= \rho\_{j}^{n} - \frac{\Delta t}{\Delta x} \Big( \rho\_{j+\frac{1}{2},L}^{n} (a\_{j+\frac{1}{2},L}^{n})\_{+} - \rho\_{j+\frac{1}{2},R}^{n} (a\_{j+\frac{1}{2},R}^{n})\_{-} \\ &\quad - \rho\_{j-\frac{1}{2},L}^{n} (a\_{j-\frac{1}{2},L}^{n})\_{+} + \rho\_{j-\frac{1}{2},R}^{n} (a\_{j-\frac{1}{2},R}^{n})\_{-} \Big). \end{split}$$
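A minimal Python sketch of one step of the limit scheme (49) is given below, under the assumptions (made only for this sketch) that the grid is periodic, that the left and right interface velocities are collapsed into a single value per interface, and that $(a)\_\pm$ denote the positive/negative parts $\max(\pm a, 0)$:

```python
import numpy as np

def limit_scheme_step(rho, sigma, a, c, dt, dx):
    """One step of the limit scheme (49) on a periodic grid, with a single
    interface velocity a[j] ~ a_{j+1/2} (collapsing L/R is an assumption)."""
    mu = sigma - c * rho                  # diagonal variables
    nu = sigma + c * rho
    ap = np.maximum(a, 0.0)               # (a)_+, positive part
    am = np.maximum(-a, 0.0)              # (a)_-, negative part
    # interface quantity built from nu_j and mu_{j+1}, as in (49)
    sig_half = (nu * ap + np.roll(mu, -1) * am) / (c + am + ap)
    rho_new = rho - dt / dx * (sig_half - np.roll(sig_half, 1))          # (49a)
    sigma_new = sigma - c * dt / dx * (2 * sigma - sig_half
                                       - np.roll(sig_half, 1))           # (49b)
    return rho_new, sigma_new

# sanity check: the scheme is in conservation form, so total mass is kept
rng = np.random.default_rng(0)
rho = rng.uniform(0.1, 1.0, 100)
sigma = rng.uniform(-0.5, 0.5, 100)
a = rng.uniform(-0.5, 0.5, 100)           # |a| <= a_inf = 0.5 < c = 1
rho1, sigma1 = limit_scheme_step(rho, sigma, a, c=1.0, dt=0.009, dx=0.01)
```

The update of $\rho$ telescopes over the periodic grid, so the total mass is conserved exactly (up to round-off), in line with the remark on conservation form in the next section.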

#### **4. Numerical Experiments**

We present some numerical illustrations for the two schemes described in the previous section. In addition to the potential $W(x) = \frac{|x|}{2}$, we also consider the smooth potential $W(x) = \frac{x^2}{2}$.

Numerical tests are conducted on the domain $[-1, 1]$ with the initial data $\rho\_0 = \frac{1}{2}\delta\_{-0.5} + \frac{1}{2}\delta\_{0.5}$, $\sigma\_0 = a[\rho\_0]\rho\_0$, and both schemes are initialized with:

$$
\rho\_j^0 = \frac{1}{\Delta x} \rho\_0(\mathcal{C}\_j), \qquad \sigma\_j^0 = \frac{1}{\Delta x} \sigma\_0(\mathcal{C}\_j).
$$

Figure 1 shows that both schemes recover the correct dynamics in the limit $\varepsilon \to 0$: for the potential $W(x) = \frac{|x|}{2}$, one can compute the exact velocity of both Dirac masses for the aggregation Equation (1) and see that they should be located, respectively, at $x = -0.2$ and $x = 0.2$ at the final time $T = 1.2$.
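The reference positions can be checked by hand: for $W(x) = \frac{|x|}{2}$ one has $W'(x) = \operatorname{sign}(x)/2$ away from the origin, so each mass moves with velocity $-\sum\_{k\neq i} W'(x\_i - x\_k) m\_k = \pm\frac{1}{4}$ and travels a distance $0.3$ in time $T = 1.2$. A few lines confirm the arithmetic:

```python
# Exact dynamics of two Dirac masses for W(x) = |x|/2 (W'(x) = sign(x)/2
# away from the origin): x_i' = -sum_{k != i} W'(x_i - x_k) m_k.
m = [0.5, 0.5]                      # masses
x = [-0.5, 0.5]                     # initial positions
T = 1.2                             # final time

sign = lambda z: (z > 0) - (z < 0)
v = [-0.5 * sign(x[0] - x[1]) * m[1],   # = +1/4: left mass moves right
     -0.5 * sign(x[1] - x[0]) * m[0]]   # = -1/4: right mass moves left
final = [x[0] + v[0] * T, x[1] + v[1] * T]   # = [-0.2, 0.2]
```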

This test is set up with $\varepsilon = 10^{-7}$, on a Cartesian mesh of $[-1, 1]$ with 1500 cells, $c = 1$ and the CFL number $c\frac{\Delta t}{\Delta x} = 0.9$. Both schemes (27) and (49) display the correct velocity for the Dirac masses, but one can notice that the Rusanov scheme (27) shows more numerical diffusion. Note that, since both schemes are written in conservation form, they preserve the total mass of $\rho$, which is also verified numerically.

We then investigated the order of convergence as $\Delta x$ goes to 0 with $\varepsilon$ fixed, in the Wasserstein distance $W\_1$ (the numerical results are the same for $W\_2$).
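On a common grid, the $W\_1$ distance between two discrete measures of equal total mass reduces to the $L^1$ distance between their cumulative distribution functions, which makes the error computation straightforward; a sketch (the grid and the two measures below are illustrative):

```python
import numpy as np

def wasserstein1(x, p, q):
    """W1 distance between two discrete measures of equal total mass on the
    common grid x: the integral of |CDF_p - CDF_q|."""
    dx = np.diff(x)
    cdf_p = np.cumsum(p)[:-1]
    cdf_q = np.cumsum(q)[:-1]
    return float(np.sum(np.abs(cdf_p - cdf_q) * dx))

x = np.array([0.0, 1.0, 2.0])
p = np.array([1.0, 0.0, 0.0])   # Dirac mass at 0
q = np.array([0.0, 0.0, 1.0])   # Dirac mass at 2
d = wasserstein1(x, p, q)       # mass 1 transported over a distance 2
```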

After performing tests for several values of $\varepsilon$, it appears that the convergence rate does not depend on the size of $\varepsilon$. Therefore, as an example, we propose simulations at the final time $T = 0.5$, with the same initial data and stability parameters as above, with $\varepsilon = 2 \times 10^{-6}$ for Figure 2 and $\varepsilon = 10^{-2}$ for Figure 3.

For a fixed value of $\varepsilon$, both schemes seem to converge with order 1/2 with respect to $\Delta x$ for the smooth potential $W(x) = \frac{x^2}{2}$ (see Figure 2), whereas they seem to be of order 1 for the potential $W(x) = \frac{|x|}{2}$ (see Figure 3). This can be explained by the fact that both schemes possess some numerical diffusion, which is counterbalanced by the aggregation phenomenon in the case of a pointy potential, as already observed in [21]. Due to the link with the Burgers equation, this superconvergence phenomenon is directly related to the results of Després [33], which remain to be rigorously extended to our case (the mere extension to the upwind scheme of [11] for the aggregation equation is not straightforward).

**Figure 1.** Dynamics of two Dirac masses for the potential $W(x) = \frac{|x|}{2}$ at time $T = 1.2$.

**Figure 2.** Order of convergence of the splitting scheme and the well-balanced scheme for the smooth potential $W(x) = \frac{x^2}{2}$.

**Figure 3.** Order of convergence of the splitting scheme and the well-balanced scheme for the pointy potential $W(x) = \frac{|x|}{2}$.

Finally, we also verified the well-balanced property of the scheme (48) by computing the *W*<sup>1</sup> distance between the approximated solution at time *T* = 0.5 and the stationary solution of (2) given by

$$\rho(t, x) = \rho\_0(x) := \frac{1}{8\varepsilon c^2} \left( 1 - \tanh^2\left(\frac{x}{4\varepsilon c^2}\right) \right).$$

The test is conducted with $\varepsilon = 2 \times 10^{-4}$, with the exact boundary conditions given by the above formula, and for several values of $\Delta x$. As shown in Figure 4, the scheme (48) preserves the above equilibrium for any $\Delta x$ (even though we replaced the resolution of the systems (30) and (31) with linear systems, see (38)), while for the splitting scheme, we recover the linear convergence towards $\rho\_0$, which is, in this case, the exact solution.
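As a sanity check on the stationary profile above, note that its exact antiderivative is $\frac{1}{2}\tanh\left(\frac{x}{4\varepsilon c^2}\right)$, so the profile carries unit mass. This is easy to confirm numerically (the grid resolution below is an arbitrary choice):

```python
import numpy as np

eps, c = 2e-4, 1.0
x = np.linspace(-1.0, 1.0, 200001)
rho = (1.0 / (8 * eps * c**2)) * (1.0 - np.tanh(x / (4 * eps * c**2)) ** 2)
# trapezoidal rule; the exact antiderivative is tanh(x/(4*eps*c^2))/2,
# which goes from -1/2 to +1/2 across the (very narrow) bump
mass = float(np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(x)))
```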

**Figure 4.** Distance to the equilibrium for the splitting scheme and the well-balanced scheme, for the pointy potential $W(x) = \frac{|x|}{2}$.

**Author Contributions:** B.F., F.L., S.T.T. and N.V. contributed equally in writing this article. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


## *Article* **Macroscopic and Multi-Scale Models for Multi-Class Vehicular Dynamics with Uneven Space Occupancy: A Case Study**

**Maya Briani 1,\*, Emiliano Cristiani 1,\* and Paolo Ranut <sup>2</sup>**


**Abstract:** In this paper, we propose two models describing the dynamics of heavy and light vehicles on a road network, taking into account the interactions between the two classes. The models are tailored for two-lane highways where heavy vehicles cannot overtake. This means that heavy vehicles cannot saturate the whole road space, while light vehicles can. In these conditions, the creeping phenomenon can appear, i.e., one class of vehicles can proceed even if the other class has reached the maximal density. The first model we propose couples two first-order macroscopic LWR models, while the second model couples a second-order microscopic follow-the-leader model with a first-order macroscopic LWR model. Numerical results show that both models are able to catch some second-order (inertial) phenomena such as stop and go waves. Models are calibrated by means of real data measured by fixed sensors placed along the A4 Italian highway Trieste–Venice and its branches, provided by Autovie Venete S.p.A.

**Keywords:** LWR model; follow-the-leader model; phase transition; creeping; seepage; fundamental diagram; lane discipline; networks

**MSC:** 35L65; 35F25; 90B20; 76T99

#### **1. Introduction**

In this paper, we deal with macroscopic and multi-scale modeling of traffic flow on a road network, focusing on multi-class dynamics which couple light and heavy vehicles (in the following, cars and trucks). The proposed models are characterized by the fact that cars and trucks interact with each other and that trucks are confined to a part of the road space (slow lane) and cannot overtake. As a consequence, when trucks saturate the space and form a queue, cars can still move, although at reduced speed.

#### *1.1. State of the Art*

The literature about traffic flow is very large and many different aspects of traffic dynamics were described through mathematical models. Let us start from classic approaches: in a single-lane *microscopic* (agent-based) framework with *N* vehicles and no overtaking, each vehicle *k* ∈ {1, ... , *N*} is singularly identified by its position *Xk*(*t*) and its velocity *Vk*(*t*). By assumption, the (*k* + 1)-th vehicle is always in front of the *k*-th one. Further, each vehicle is assumed to adjust its acceleration based on the difference in positions and velocities between the vehicle itself and the vehicle in front of it. This approach leads to the following system of ordinary differential equations:

$$\begin{cases} \dot{X}\_k = V\_k\\ \dot{V}\_k = A(X\_k, X\_{k+1}, V\_k, V\_{k+1}) \end{cases} \qquad k = 1, \dots, N-1 \tag{1}$$

**Citation:** Briani, M.; Cristiani, E., Ranut, P. Macroscopic and Multi-Scale Models for Multi-Class Vehicular Dynamics with Uneven Space Occupancy: A Case Study. *Axioms* **2021**, *10*, 102. https:// doi.org/10.3390/axioms10020102

Academic Editor: Angel R. Plastino

Received: 31 March 2021 Accepted: 15 May 2021 Published: 24 May 2021



where *A* is a given acceleration function. The first vehicle in the row (*k* = *N*), called the *leader*, has independent dynamics. Since the whole dynamics is determined by the leader's through a domino effect, these kinds of models are known as *follow-the-leader* models.
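A system of the form (1) is straightforward to integrate numerically. The acceleration function below (relaxation toward the speed of the vehicle in front, with sensitivity inversely proportional to the gap) and all parameter values are illustrative assumptions for this sketch, not the calibrated model of this paper:

```python
import numpy as np

def follow_the_leader(X0, V0, A, v_leader, dt, steps):
    """Explicit Euler integration of system (1); the last vehicle is the
    leader and keeps the prescribed constant speed v_leader."""
    X, V = np.array(X0, float), np.array(V0, float)
    for _ in range(steps):
        acc = A(X[:-1], X[1:], V[:-1], V[1:])  # followers k = 1, ..., N-1
        X += dt * V
        V[:-1] += dt * acc
        V[-1] = v_leader                       # independent leader dynamics
    return X, V

# illustrative acceleration (an assumption of this sketch): relax toward the
# speed of the vehicle in front, more strongly when the gap is small
def A(Xk, Xk1, Vk, Vk1, C=1.0):
    return C * (Vk1 - Vk) / (Xk1 - Xk)

X, V = follow_the_leader(X0=[0.0, 10.0, 20.0, 30.0],
                         V0=[22.0, 22.0, 22.0, 20.0],
                         A=A, v_leader=20.0, dt=0.01, steps=2000)
```

With these data, the followers decelerate through the domino effect described above and the platoon settles near the leader's speed, with the vehicle ordering preserved.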

Adopting, instead, a *macroscopic* (fluid dynamics) point of view, we describe the mass of vehicles by means of their density *ρ*(*x*, *t*) only. The celebrated LWR model [1,2] is based on the observation that the density *ρ* evolves in time, ruled by the following conservation law:

$$
\partial\_t \rho + \partial\_x f(\rho) = 0, \qquad x \in \mathbb{R}, \quad t > 0 \tag{2}
$$

where the function *f*(*ρ*), called *fundamental diagram*, is given and represents the flux of vehicles as a function of the density itself. The velocity of the vehicles can be recovered from *ρ* thanks to the relation

$$v(\rho) = \frac{f(\rho)}{\rho}, \qquad (\rho \neq 0). \tag{3}$$

It is important to note that the microscopic model (1) is second-order, i.e., acceleration-based, while the macroscopic model (2) is first-order, i.e., velocity-based. The difference is important because velocity-based models, allowing nonphysical instantaneous accelerations, are not able to catch effects caused by inertia, such as stop and go waves. Due to this difference, it is plain that the model (2) is not the many-particle limit of the model (1). We refer the interested reader to [3] for a review of various many-particle limits (i.e., micro-to-macro correspondences) and the existing multi-scale models.
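For completeness, the LWR model (2) can be discretized with the classical Godunov scheme; for a concave fundamental diagram, the interface flux takes the usual demand–supply form. The Greenshields flux used below is a standard illustrative choice for this sketch, not the diagram calibrated later in the paper:

```python
import numpy as np

def godunov_step(rho, dt, dx, f, rho_c):
    """One Godunov step for the LWR model (2) with periodic boundaries.
    For a concave flux with maximum at rho_c, the interface flux is
    min(demand(left cell), supply(right cell))."""
    rho_r = np.roll(rho, -1)                    # right neighbour
    demand = f(np.minimum(rho, rho_c))          # sending capacity of cell j
    supply = f(np.maximum(rho_r, rho_c))        # receiving capacity of j+1
    F = np.minimum(demand, supply)              # flux at interface j+1/2
    return rho - dt / dx * (F - np.roll(F, 1))

# Greenshields fundamental diagram (an assumed, standard choice)
rho_max, v_max = 1.0, 1.0
f = lambda r: v_max * r * (1.0 - r / rho_max)

N = 200
x = np.linspace(0.0, 1.0, N, endpoint=False)
rho = 0.5 + 0.4 * np.sin(2 * np.pi * x)         # initial density in (0, 1)
rho_init = rho.copy()
for _ in range(100):                            # CFL: v_max * dt / dx = 0.4
    rho = godunov_step(rho, dt=0.002, dx=1.0 / N, f=f, rho_c=rho_max / 2)
```

Being in conservation form on a periodic grid, the scheme conserves the total mass exactly and keeps the density in the admissible interval $[0, \rho\_{\max}]$.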

The first generalization of the models (1) and (2) of our interest is that of *road networks*. While managing junctions in microscopic models is relatively easy, doing the same in a macroscopic setting is more challenging. The reason is that, in general, the conservation of the mass alone is not sufficient to characterize a unique solution at junctions. We refer the reader to the book by Garavello and Piccoli [4] for more details about the ill-posedness of the problem at junctions. Multiple workarounds for such ill-posedness have been suggested in the literature: (i) maximization of the flux across junctions and introduction of priorities among the incoming roads [4–6]; (ii) introduction of a *buffer* to model the junctions by means of an additional ordinary differential equation coupled with (2) [7–9]; (iii) reformulation of the problem on all possible paths on the network rather than on roads and junctions. The last approach has both a global formulation [10–12] and a more manageable local formulation, described in [13], which is the one we will adopt in this paper. All these approaches allow determining a unique solution for the traffic evolution on the network, but the solutions might be different.

The second generalization of our interest is that of *multi-class* dynamics. "Multiclass" is a very generic term used in the literature to refer to the case in which the road is populated by different groups of vehicles/drivers, and tracking each group separately is desired. Again, doing this in a microscopic framework is easy since it is sufficient to label each vehicle on the basis of the class it belongs to. In the macroscopic setting, instead, we need to introduce as many density functions as there are classes and then establish the interactions between classes. This leads to a system of conservation laws of the form

$$
\partial\_t \rho\_c + \partial\_x f\_c(\rho\_1, \dots, \rho\_C) = 0, \qquad c = 1, \dots, C, \quad x \in \mathbb{R}, \quad t > 0 \tag{4}
$$

where *C* is the number of classes, *c* ∈ {1, ... , *C*}, *ρ<sup>c</sup>* is the density of class *c* and *fc* is the flux of class *c* which depends on all densities (usually the dependence is on the sum of all densities ∑*<sup>c</sup> ρc*). Multi-class models are used to describe very different situations, such as the co-presence of vehicles with


• Reserved roads or reserved entry/exit lanes.

A complete review of multi-class models is out of the scope of this paper. We refer to [14–16] and to the recent books [17,18] for an overview of the most used multi-class models.

Before introducing our contributions, let us introduce the *creeping or seepage effect* [14,19] which will be useful to describe the features of the proposed models. This term denotes the situations where the road space is shared by small and large vehicles, and small vehicles are able to move (at reduced velocity) even if large vehicles have reached the maximal density. This is in contrast to classical models such as the one proposed by Benzoni-Gavage and Colombo [20], in which the saturation of a class of vehicles immediately stops all the other classes. It is useful to note that the creeping phenomenon is typically considered in a context of *disordered traffic*, i.e., traffic with no lane discipline: smaller vehicles (e.g., two wheels) slip into the empty spaces left by large vehicles, similar to motion through porous media. This is not the case considered here, since we assume a strict lane discipline.

Finally, let us recall some important contributions about the fundamental diagram and its properties. It is well known that a single function *f* = *f*(*ρ*) is not able, alone, to describe real data correctly. Indeed, by (3), we deduce that for any given density value *ρ*, only one velocity *v*(*ρ*) is possible. This is not what happens in reality, where a scattered fundamental diagram is observed instead, due to the fact that different drivers respond in a different way to the same traffic conditions. Many papers investigated this phenomenon from different points of view, trying to explain its features, including instabilities; see, e.g., [21–31].

#### *1.2. Case Study*

In this paper, we consider the Italian motorway A4 Trieste–Venice and its branches to/from Udine, Pordenone and Gorizia, managed by Autovie Venete S.p.A.; see Figure 1.

**Figure 1.** The Italian motorway A4 Trieste–Venice and its branches to/from Udine, Pordenone and Gorizia, managed by Autovie Venete S.p.A.

At the time of the present study (2019), the motorway had two lanes per direction, except for the leftmost segment near Venice (Venice–San Donà). To avoid heterogeneous conditions, we dropped the three-lane segment of the road to focus exclusively on the parts with *two lanes per direction*. In those segments, cars can use both lanes at any time, while trucks can use only the slow lane and cannot overtake. Due to the large flow of heavy vehicles, it happens sometimes that a queue of trucks is formed. In this case, cars move into the fast lane and keep going, although at moderate speed. When traffic conditions are sustained, the two classes of vehicles interact with each other: on the one hand, trucks act as moving bottlenecks for cars (cf. [32]), which are forced to slow down due to the restricted space; on the other hand, trucks must slow down when cars find it convenient to occupy part of the slow lane.

#### *1.3. Our Contribution*

In this paper, we propose two models for describing multi-class traffic flow on networks in which vehicles belonging to different classes share the road space only partially. More precisely, light vehicles can occupy the whole road, while heavy vehicles can occupy only a part of it. To align with the case study, we will assume that the road has two lanes in total and trucks can occupy only the slow one, without overtaking.


Let us finally mention that the idea of coupling first- and second-order models was already exploited in [3] in a single-class scenario.

**Remark 1.** *Both models distinguish classes, but not lanes. The fact that trucks cannot use the fast lane while cars can occupy both slow and fast lanes is encapsulated in the choice of the fundamental diagrams.*

#### **2. Dataset**

Autovie Venete constantly monitors traffic conditions by means of video cameras, mobile sensors and fixed sensors. In this paper, we focus on the last kind of data. Fixed sensors are located along the motorway, on each lane, and measure the flux and velocity of all vehicles passing in front of them, also distinguishing the class of vehicles. Data are aggregated per minute and are stored in a database for later analysis. For light vehicles, we further aggregated data coming from slow and fast lanes. For heavy vehicles, instead, we considered the slow lane only. In Figures 2–4, we show some flux and velocity data coming from some fixed sensors, used to conceive and calibrate the models presented in this paper. For better readability, flux data are plotted both raw (as is) and smoothed by a Gaussian filter. Note that the flux data are always a multiple of 60 since they are evaluated every minute, but they are expressed in terms of vehicles per hour.
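The smoothing step can be reproduced with a simple truncated Gaussian kernel; the kernel width and the synthetic per-minute counts below are illustrative stand-ins for the real sensor data:

```python
import numpy as np

def gaussian_smooth(y, sigma=5.0):
    """Smooth a per-minute flux series with a truncated Gaussian kernel
    (a stand-in for the filter used for the plotted data)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    k /= k.sum()                               # normalize the kernel
    # 'same'-mode convolution; edges are slightly damped in this sketch
    return np.convolve(y, k, mode="same")

# per-minute counts expressed in vehicles per hour (hence multiples of 60)
counts = np.random.default_rng(1).poisson(10, size=1440)   # one day, fake data
flux = 60 * counts
smoothed = gaussian_smooth(flux)
```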

**Figure 2.** Typical weekly (from Monday to Sunday) flux data on the A4 motorway of (**a**) light and (**b**) heavy vehicles collected in March 2019 near Redipuglia. Smoothed data are plotted in black. Note the flux drop of cars in the middle of the day and of trucks on the weekend.

**Figure 3.** Typical daily (Thursday) flux and velocity data on the A28 motorway of (**a**–**c**) light and (**b**–**d**) heavy vehicles collected in May 2019 near Sesto al Reghena.

**Figure 4.** Creeping phenomenon registered in May 2019 near Portogruaro: (**a**) Light vehicles move in the fast lane even if (**b**) heavy vehicles queue in the slow lane. (**c**) Light vehicles' velocity drops from ∼140 to ∼60 km/h and then to ∼20 km/h while (**d**) heavy vehicles are completely stopped.

#### **3. Models**

In this section, we present the two models. As already stated in the Introduction, the models are not meant to provide the same results or to be the many-particle limit of the other. Nevertheless, they share the most important constitutive assumptions, and for this reason, they are expected to provide the same qualitative results. The most important common modeling assumption is that the car dynamics is influenced, at any time, by the presence of trucks, while the truck dynamics is affected by cars only if the density of cars exceeds a certain threshold, which corresponds to the fact that cars cannot be confined to the fast lane any longer and must invade the slow lane where trucks live. This assumption comes from an important piece of evidence: cars tend to avoid being trapped between two trucks in the slow lane and prefer moving to the fast lane. Doing this, cars move to the side of trucks (overtaking them if possible) and do not affect their dynamics, unless the density of cars is so high that they must necessarily occupy the slow lane too.

#### *3.1. Macroscopic Model*

We denote by $\ell\_{\rm L}$ and by $\ell\_{\rm H}$ the average length of light vehicles (cars) and heavy vehicles (trucks), respectively, and we define

$$
\beta := \frac{\ell\_{\rm L}}{\ell\_{\rm H}} < 1. \tag{5}
$$

We also denote by $\rho\_{\rm L}$ the density of cars and by $\rho\_{\rm H}$ the density of trucks. Similarly, we denote by $\rho\_{\rm L}^{\max}$ and $\rho\_{\rm H}^{\max}$ the maximal densities for cars and trucks, respectively. They are defined as

$$
\rho\_{\rm L}^{\rm max} = \frac{2}{\ell\_{\rm L}} \quad \text{and} \quad \rho\_{\rm H}^{\rm max} = \frac{1}{\ell\_{\rm H}} \tag{6}
$$

having assumed that there are two available lanes for cars and only one for trucks. Note that density values are expressed in terms of number of vehicles per unit of space. Considering that trucks occupy more space than cars, a direct comparison of the two densities is not meaningful. For this reason, the two classes are typically compared in terms of occupied space.

The two-class dynamics is physically admissible if the two densities fall in the set

$$\mathcal{D} := \left\{ (\rho\_{\mathcal{L}}, \rho\_{\mathcal{H}}) : 0 \le \rho\_{\mathcal{L}} \le \rho\_{\mathcal{L}}^{\max}, \, 0 \le \rho\_{\mathcal{H}} \le \rho\_{\mathcal{H}}^{\max}, \, 0 \le \rho\_{\mathcal{L}} + \frac{\rho\_{\mathcal{H}}}{\beta} \le \rho\_{\mathcal{L}}^{\max} \right\}, \tag{7}$$

which is well defined if $\rho\_{\rm L}^{\max} - \rho\_{\rm H}^{\max}/\beta \ge 0$. In the following, in order to cope with the uneven space occupancy, we assume that this condition is verified with the strict inequality

$$
\rho\_{\rm L}^{\rm max} - \frac{\rho\_{\rm H}^{\rm max}}{\beta} > 0. \tag{8}
$$
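As a quick sanity check, the constraints (5)–(8) can be coded directly. The vehicle lengths below are illustrative assumptions, not the calibrated values of Table 1:

```python
# Sketch of the admissibility check for the two-class domain D of eq. (7).
# Vehicle lengths are illustrative assumptions (in km).
ELL_L = 0.005   # average car length, 5 m -- assumption
ELL_H = 0.015   # average truck length, 15 m -- assumption

BETA = ELL_L / ELL_H          # eq. (5), < 1
RHO_L_MAX = 2.0 / ELL_L       # eq. (6): two lanes available for cars
RHO_H_MAX = 1.0 / ELL_H       # eq. (6): one lane available for trucks

def is_admissible(rho_l, rho_h):
    """Return True iff (rho_l, rho_h) lies in the set D of eq. (7)."""
    return (0.0 <= rho_l <= RHO_L_MAX
            and 0.0 <= rho_h <= RHO_H_MAX
            and rho_l + rho_h / BETA <= RHO_L_MAX)

# The strict inequality (8) holds with these lengths:
assert RHO_L_MAX - RHO_H_MAX / BETA > 0.0
```

Densities are in veh/km, so `rho_h / BETA` converts the truck density into the equivalent amount of car-occupied space, as in (7).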

We consider the following two-class model for (*ρ*L, *ρ*H) ∈ D:

$$\begin{cases} \partial\_t \rho\_{\rm L} + \partial\_x f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) = 0 \\ \partial\_t \rho\_{\rm H} + \partial\_x f\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H}) = 0 \end{cases} \qquad x \in \mathbb{R}, \quad t > 0, \tag{9}$$

where

$$f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) := \rho\_{\rm L}\, v\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}), \qquad f\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H}) := \rho\_{\rm H}\, v\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H})$$

define the two fundamental diagrams, and $v\_{\rm L}$, $v\_{\rm H}$ are the speed functions for light and heavy vehicles, respectively. We then have a family of flow–density curves $\rho\_{\rm L} \mapsto f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H})$ for cars, parameterized by the truck density $\rho\_{\rm H}$, and, analogously, a family of flow–density curves $\rho\_{\rm H} \mapsto f\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H})$ for trucks, parameterized by $\rho\_{\rm L}$.

We assume that the flux and speed functions satisfy the following properties:

(L1) $v\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) \ge 0$ for all $(\rho\_{\rm L}, \rho\_{\rm H}) \in \mathcal{D}$ and $v\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) = 0$ iff $\rho\_{\rm L} = \rho\_{\rm L}^{*}(\rho\_{\rm H})$, where

$$
\rho\_{\mathbb{L}}^{\*} (\rho\_{\mathbb{H}}) := \rho\_{\mathbb{L}}^{\max} - \rho\_{\mathbb{H}} / \beta \tag{10}
$$

is the maximum admissible car density given the truck density *ρ*H;


(L4) $f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H})$ is concave with respect to $\rho\_{\rm L}$ for any $\rho\_{\rm H}$. We define

$$\sigma\_{\rm L}(\rho\_{\rm H}) := \arg\max\_{\rho\_{\rm L}} f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) \tag{11}$$

which represents, as usual, the interface between the freeflow and congested regimes;

(L5) $f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H})$ is a decreasing function with respect to $\rho\_{\rm H}$ for any $\rho\_{\rm L}$.

Similarly,

(H1) $v\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H}) \ge 0$ for all $(\rho\_{\rm L}, \rho\_{\rm H}) \in \mathcal{D}$ and $v\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H}) = 0$ iff $\rho\_{\rm H} = \rho\_{\rm H}^{*}(\rho\_{\rm L})$, where

$$\rho\_{\mathbb{H}}^{\*}(\rho\_{\mathbb{L}}) := \min \{ \rho\_{\mathbb{H}}^{\max}, \beta(\rho\_{\mathbb{L}}^{\max} - \rho\_{\mathbb{L}}) \} \tag{12}$$

is the maximum admissible truck density given the car density *ρ*L;


(H4) $f\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H})$ is concave with respect to $\rho\_{\rm H}$ for any $\rho\_{\rm L}$. We define

$$\sigma\_{\mathbb{H}}(\rho\_{\mathbb{L}}) := \arg\max\_{\rho\_{\mathbb{H}}} f\_{\mathbb{H}}(\rho\_{\mathbb{L}\prime}\rho\_{\mathbb{H}}) \tag{13}$$

which represents, as usual, the interface between the freeflow and congested regimes;

(H5) $f\_{\rm H}(\rho\_{\rm L}, \rho\_{\rm H})$ is a decreasing function with respect to $\rho\_{\rm L}$ for any $\rho\_{\rm H}$.

To cope with the peculiarities of the dynamics, we consider a *phase transition* (cf. [34–36]) caused by the presence of two states of the system:

• The *partial coupling phase* is in place when

$$(\rho\_{\rm L}, \rho\_{\rm H}) \in \mathcal{D}\_1 := \left\{ 0 \le \rho\_{\rm H} \le \rho\_{\rm H}^{\max} \; ; \; 0 \le \rho\_{\rm L} \le \rho\_{\rm L}^{\max} - \rho\_{\rm H}^{\max}/\beta \right\}, \tag{14}$$

see Figure 5.

In this phase, we assume that cars are mainly in the fast lane and do not affect the truck dynamics. Trucks are then independent of cars.

For trucks, we choose a triangular fundamental diagram with

$$v\_{\rm H}(\rho\_{\rm H}) = V\_{\rm H}^{\max} \quad \text{for all } \rho\_{\rm H} \le \sigma\_{\rm H}, \tag{15}$$

where $V\_{\rm H}^{\max}$ is the maximum speed of trucks, see Figure 6b.

Cars do not interfere with trucks but adapt their dynamics to the presence of them. Moreover, for cars, we choose (a family of) triangular fundamental diagrams, see Figure 6a. Specifically, we set

$$v\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) = \begin{cases} V\_{\rm L}^{*}(\rho\_{\rm H}) & \text{if } \quad \rho\_{\rm L} \le \sigma\_{\rm L}(\rho\_{\rm H}), \\\\ \dfrac{V\_{\rm L}^{*}(\rho\_{\rm H})\,\sigma\_{\rm L}(\rho\_{\rm H})}{\rho\_{\rm L}^{*}(\rho\_{\rm H}) - \sigma\_{\rm L}(\rho\_{\rm H})} \left( \dfrac{\rho\_{\rm L}^{*}(\rho\_{\rm H})}{\rho\_{\rm L}} - 1 \right) & \text{if } \quad \sigma\_{\rm L}(\rho\_{\rm H}) < \rho\_{\rm L} \le \rho\_{\rm L}^{\max} - \rho\_{\rm H}^{\max}/\beta, \end{cases} \tag{16}$$

where $V\_{\rm L}^{*}(\rho\_{\rm H})$ is the maximum speed of cars given the truck density. We also define $V\_{\rm L}^{*}(0) = V\_{\rm L}^{\max}$ as the maximum speed of cars in the absence of trucks. Then, $V\_{\rm L}^{*}(\rho\_{\rm H}) \ge 0$ and $\sigma\_{\rm L}(\rho\_{\rm H}) \ge 0$ are continuous linear decreasing functions of $\rho\_{\rm H}$. For $(\rho\_{\rm L}, \rho\_{\rm H}) \in \mathcal{D}\_1$, the model (9) then becomes
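As an illustration of this family of triangular diagrams, the following sketch implements a speed function in the spirit of (16). The linear shapes of $V\_{\rm L}^{*}$ and $\sigma\_{\rm L}$ and all parameter values are assumptions, not the calibrated functions; the congested branch is written so that the speed joins $V\_{\rm L}^{*}$ continuously at $\sigma\_{\rm L}$ and the flux vanishes at $\rho\_{\rm L}^{*}$.

```python
# Sketch of a family of triangular car speed functions in the spirit of (16).
# All numerical values and linear shapes below are illustrative assumptions.
RHO_L_MAX = 400.0   # veh/km -- assumption
RHO_H_MAX = 65.0    # veh/km -- assumption
BETA = 0.3          # length ratio, eq. (5) -- assumption
V_L_MAX = 130.0     # km/h -- assumption

def V_star(rho_h):
    """Maximum car speed given the truck density (assumed linear decrease)."""
    return V_L_MAX * (1.0 - 0.5 * rho_h / RHO_H_MAX)

def sigma_L(rho_h):
    """Critical car density separating free flow and congestion (assumed)."""
    return 0.25 * RHO_L_MAX * (1.0 - rho_h / RHO_H_MAX)

def rho_star(rho_h):
    """Maximum admissible car density, eq. (10)."""
    return RHO_L_MAX - rho_h / BETA

def v_L(rho_l, rho_h):
    s, r = sigma_L(rho_h), rho_star(rho_h)
    if rho_l <= s:
        return V_star(rho_h)          # free-flow branch: constant speed
    # congested branch: flux decreases linearly to 0 at rho_star
    return V_star(rho_h) * s / (r - s) * (r / rho_l - 1.0)
```

One can check that the resulting flux $\rho\_{\rm L} v\_{\rm L}$ is continuous and piecewise linear, i.e., triangular, for each fixed $\rho\_{\rm H}$.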

$$\begin{cases} \partial\_t \rho\_{\rm L} + \partial\_x f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) = 0 \\ \partial\_t \rho\_{\rm H} + \partial\_x f\_{\rm H}(\rho\_{\rm H}) = 0 \end{cases} \tag{17}$$

where *f*L(*ρ*L, *ρ*H) = *ρ*L*v*L(*ρ*L, *ρ*H) and *f*H(*ρ*H) = *ρ*H*v*H(*ρ*H), as described in Figure 6.

• The *full coupling phase* is in place when $(\rho\_{\rm L}, \rho\_{\rm H}) \in \mathcal{D}\_2 := \mathcal{D} \setminus \mathcal{D}\_1$, see Figure 5. In this case, we assume that there are too many cars for them to remain conveniently confined to the fast lane. For this reason, they invade the slow lane, thus influencing the dynamics of trucks. The two equations in system (9) are then fully coupled.

As before, we choose for both classes a family of triangular fundamental diagrams which extend, by continuity, those defined in D1, as shown in Figure 7.

We define the *transition level* as the threshold density of light vehicles which acts as an interface between the two phases, see Figure 5. In our setting, trucks are confined to one of the two available lanes, and then the transition level is equal to $\rho\_{\rm L}^{\max} - \rho\_{\rm H}^{\max}/\beta = \rho\_{\rm L}^{\max}/2$.

Note also that the fundamental diagrams we use in this work verify all the properties (L1)–(L5) and (H1)–(H5).

**Figure 5.** Domains D<sup>1</sup> and D<sup>2</sup> of the macroscopic model (9).

**Figure 6.** Fundamental diagrams of the macroscopic model in the partial coupling phase, i.e., (*ρ*L, *ρ*<sup>H</sup> ) ∈ D1. (**a**) Light vehicles, (**b**) heavy vehicles.

**Figure 7.** Fundamental diagrams of the macroscopic model in the full coupling phase, i.e., (*ρ*L, *ρ*<sup>H</sup> ) ∈ D2. (**a**) Light vehicles, (**b**) heavy vehicles.

#### *3.2. Multi-Scale Model*

In this section, we describe the multi-scale model. Here, cars are described by a first-order LWR model of type (2), and trucks are described by a second-order microscopic follow-the-leader model of type (1). Let us describe the microscopic model first, dropping, for the moment, the coupling with light vehicles.

#### 3.2.1. Microscopic Model for Heavy Vehicles

The microscopic model is the one presented in [3], which is, in turn, inspired by the model originally proposed by Zhao and Zhang in [33].

In the following, we denote by $\Delta\_k$ the gap between truck $k$ and truck $k+1$ at any time $t$:

$$
\Delta\_k(t) := X\_{k+1}(t) - X\_k(t).
$$

It is plain that this gap is inversely proportional to the density of heavy vehicles. We define in (1)

$$A\left(X\_k, X\_{k+1}, V\_k, V\_{k+1}\right) = \begin{cases} \dfrac{1}{\tau\_{\text{acc}}} \left(v^{ZZ}(\Delta\_k) - V\_k\right), & \text{if } v^{ZZ}(\Delta\_k) \ge V\_k, \\\\ \dfrac{1}{\tau\_{\text{dec}}} \left(v^{ZZ}(\Delta\_k) - V\_k\right), & \text{if } v^{ZZ}(\Delta\_k) < V\_k, \end{cases} \tag{18}$$

where the function $v^{ZZ}$ represents the equilibrium velocity all drivers tend to, which depends on the gap $\Delta\_k$. The parameters $\tau\_{\rm acc}, \tau\_{\rm dec} > 0$ are the usual relaxation times, differentiated between the acceleration and the deceleration phase. Differentiating the relaxation times proved crucial for fitting real data.

The velocity function *v*ZZ is defined by

$$v^{ZZ}(\Delta) := \begin{cases} 0, & \text{if } \Delta \le \Delta\_{\text{close}} \\ \dfrac{V\_{\rm H}^{\max}}{\Delta\_{\text{far}} - \Delta\_{\text{close}}} (\Delta - \Delta\_{\text{close}}), & \text{if } \Delta\_{\text{close}} < \Delta < \Delta\_{\text{far}} \\ V\_{\rm H}^{\max}, & \text{if } \Delta \ge \Delta\_{\text{far}} \end{cases} \tag{19}$$

where $\Delta\_{\rm close}$, $\Delta\_{\rm far}$, $V\_{\rm H}^{\max}$ are positive parameters, see Figure 8.

**Figure 8.** The shape of the velocity function *v*ZZ(Δ) defined in (19).

The plateau in Δ ∈ [0, Δclose] is crucial for correctly reproducing stop and go waves. Indeed, once the relaxation times *τ*acc, *τ*dec are fixed, the capability of the model to trigger stop and go waves is ruled precisely by Δclose.
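A minimal sketch of (18) and (19) reads as follows; all parameter values are illustrative assumptions, not the calibrated values of Table 2.

```python
# Sketch of the equilibrium velocity (19) and the relaxation-type
# acceleration (18). Parameter values are illustrative assumptions.
D_CLOSE = 0.020    # km (20 m) -- assumption
D_FAR = 0.100      # km (100 m) -- assumption
V_H_MAX = 90.0     # km/h -- assumption
TAU_ACC = 30.0     # s -- assumption
TAU_DEC = 10.0     # s -- assumption

def v_zz(gap):
    """Piecewise-linear equilibrium velocity of eq. (19)."""
    if gap <= D_CLOSE:
        return 0.0                 # plateau: the follower stops
    if gap >= D_FAR:
        return V_H_MAX             # free flow at maximal speed
    return V_H_MAX * (gap - D_CLOSE) / (D_FAR - D_CLOSE)

def accel(x_k, x_k1, v_k, v_k1):
    """Acceleration of truck k following truck k+1, eq. (18).

    v_k1 is kept for signature fidelity with (18); the leader enters
    the acceleration only through the gap in this formulation.
    """
    gap = x_k1 - x_k                               # Delta_k
    target = v_zz(gap)
    tau = TAU_ACC if target >= v_k else TAU_DEC    # asymmetric relaxation
    return (target - v_k) / tau
```

The plateau below `D_CLOSE` is what lets a small slowdown grow into a full stop, consistently with the role of $\Delta\_{\rm close}$ discussed above.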

#### 3.2.2. Full Model

First of all, given the parameter *δ* > 0, we assume that cars located at *x* are influenced by a truck iff the distance between the truck and *x* is less than *δ*. We denote the number of trucks falling in the road interval [*x* − *δ*, *x* + *δ*) at any time *t* by

$$\mathcal{N}\_{\mathsf{H}}^{\delta}(\mathsf{x},t) := \# \{ k : \mathsf{X}\_{k}(t) \in [\mathsf{x} - \delta, \mathsf{x} + \delta) \}. \tag{20}$$
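The counter (20) is a plain half-open interval count; a one-line sketch:

```python
# Sketch of the truck counter N_H^delta of eq. (20) around a point x.
def n_trucks(x, truck_positions, delta):
    """Number of truck positions X_k falling in [x - delta, x + delta)."""
    return sum(1 for xk in truck_positions if x - delta <= xk < x + delta)
```

Note the half-open convention of (20): the left endpoint is included and the right endpoint excluded, so a truck sitting exactly at a cell boundary is counted once.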

Second, we denote by $\rho\_{\rm L}$ the density of light vehicles and by $v\_{\rm L}$ their velocity. To couple the dynamics of the two classes, we assume that $v\_{\rm L}$ depends on both $\rho\_{\rm L}$ (as in the classical LWR model) and $\mathcal{N}\_{\rm H}^{\delta}$. Following standard assumptions, we assume that $v\_{\rm L}$ is decreasing with respect to both arguments.

Finally, we couple the dynamics of heavy vehicles with those of light vehicles. The interaction is obtained by introducing the dependence on *ρ*<sup>L</sup> in the parameters Δclose and Δfar. More precisely, we introduce the increasing functions Δclose = Δclose(*ρ*L) and Δfar = Δfar(*ρ*L), and we denote by *A*<sup>C</sup> = *A*<sup>C</sup>(*Xk*, *Xk*+1, *Vk*, *Vk*+1, *ρ*L) the coupled acceleration defined as *A* in (18) and (19), with the new dependence on *ρ*L.

We are now ready to present the fully coupled multi-scale model which reads as

$$\begin{cases} \dot{X}\_k = V\_k, \quad \dot{V}\_k = A^{\rm C}(X\_k, X\_{k+1}, V\_k, V\_{k+1}, \rho\_{\rm L}), & k = 1, \ldots, N-1, \\\\ \partial\_t \rho\_{\rm L} + \partial\_x \left( \rho\_{\rm L}\, v\_{\rm L}(\rho\_{\rm L}, \mathcal{N}\_{\rm H}^{\delta}) \right) = 0, & x \in \mathbb{R}, \quad t > 0. \end{cases} \tag{21}$$

To be coherent with our modeling assumptions, the functions $\Delta\_{\rm close}$ and $\Delta\_{\rm far}$ are constant for car densities below the transition level, i.e., $\rho\_{\rm L} \le \rho\_{\rm L}^{\max}/2$. In this case, the dynamics of trucks is independent of that of cars. Conversely, for $\rho\_{\rm L} > \rho\_{\rm L}^{\max}/2$, we assume that the distances $\Delta\_{\rm close}$ and $\Delta\_{\rm far}$ increase linearly with respect to the average number of cars positioned between two trucks. This number is easily computed from the average number of cars in a road segment of length $\ell$ (equal to $\rho\_{\rm L}\ell$) and the number of trucks in the same segment (assuming that all vehicles are uniformly distributed). We were unable to precisely calibrate the shape of the functions $\Delta\_{\rm close}$ and $\Delta\_{\rm far}$ from real data because it rarely happens that many cars are found between trucks: indeed, trucks tend to "push" cars into the fast lane rather than reacting to their presence.
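The piecewise behaviour of $\Delta\_{\rm close}$ described above can be sketched as follows; the baseline value, the slope and the maximal density are illustrative assumptions, not calibrated values (and $\Delta\_{\rm far}$ would be treated analogously).

```python
# Sketch of the coupling function Delta_close(rho_L): constant below the
# transition level rho_L_max / 2, then increasing linearly with the car
# surplus. Baseline, slope and maximal density are assumptions.
RHO_L_MAX = 400.0   # veh/km -- assumption
D_CLOSE0 = 0.020    # km, baseline distance -- assumption
SLOPE = 0.005       # km of extra spacing per (veh/km) of surplus -- assumption

def delta_close(rho_l):
    transition = 0.5 * RHO_L_MAX
    if rho_l <= transition:
        return D_CLOSE0                      # trucks do not react to cars
    # assumed linear growth above the transition level
    return D_CLOSE0 + SLOPE * (rho_l - transition)
```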

#### *3.3. Extension of the Models to General Road Networks*

In order to perform a complete simulation on a generic network of highways, some important generalizations are needed.

#### 3.3.1. Any Number of Lanes

Highways often have more than two lanes. Consider a road with $n$ lanes, of which $n\_{\rm H}$ can be occupied by trucks. To allow the creeping phenomenon, we assume that $n\_{\rm H} < n$, which corresponds to $\rho\_{\rm L}^{\max}\ell\_{\rm L} - \rho\_{\rm H}^{\max}\ell\_{\rm H} > 0$ in terms of space occupied, cf. (8).

In the macroscopic approach, the model is easily generalized. Fundamental diagrams are modified in such a way that trucks start interacting with cars when the density of cars exceeds the generalized transition level $\rho\_{\rm L}^{\max} - \rho\_{\rm H}^{\max}/\beta$, cf. (14).

In the microscopic model, instead, an important modification is needed if $n\_{\rm H} > 1$. Indeed, in this case, trucks can overtake, and the microscopic model must be able to handle this. Typically, new parameters are introduced to establish when a truck decides to overtake and whether it can actually do so, considering suitable safety constraints. From the computational point of view, an additional difficulty arises when one has to find the truck in front of any other truck, since the ordering is lost whenever a truck overtakes. To make the search for the preceding vehicle computationally feasible, one can keep track, in a specific list, of all trucks located in each numerical cell and then update the list whenever a truck leaves or enters the cell.
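A minimal sketch of this per-cell bookkeeping follows; the grid width and the data layout are assumptions, not the implementation used by the authors.

```python
# Sketch of the cell-list bookkeeping: each numerical cell keeps the set of
# trucks currently inside it, so the truck ahead is found by scanning only
# a few cells instead of the whole fleet. Layout is an assumption.
DX = 0.1  # cell width in km, matching the 100 m numerical grid

def build_cells(positions, n_cells):
    """Distribute truck indices into the cells containing them."""
    cells = [set() for _ in range(n_cells)]
    for k, x in enumerate(positions):
        cells[int(x / DX)].add(k)
    return cells

def truck_ahead(k, positions, cells):
    """Index of the nearest truck strictly ahead of truck k, or None."""
    x = positions[k]
    for i in range(int(x / DX), len(cells)):   # scan cells downstream
        ahead = [j for j in cells[i] if positions[j] > x]
        if ahead:
            return min(ahead, key=lambda j: positions[j])
    return None
```

When a truck overtakes or changes road, only the two affected cell sets need updating, which keeps the neighbour search cheap even without a global ordering.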

#### 3.3.2. Junctions

In order to perform a full simulation on a network of highways, both theoretical and numerical treatments of junctions are needed. Typically, highways do not have roundabouts, traffic lights or complex junctions; therefore, we can limit ourselves to handling simple merges (2 incoming roads and 1 outgoing road) and diverges (1 incoming road and 2 outgoing roads). We adopted the approach detailed in [13], in which the dynamics is reformulated along paths and junctions "disappear". The price to pay is that the number of equations is multiplied by the number of possible paths the drivers can follow at junctions. In both merges and diverges, we have only two possible paths: for example, in the case of a diverge, one can choose between the first and the second outgoing road, while in a merge, one can come from the first or the second incoming road.

Following this approach in the macroscopic model, the densities of each class of vehicles are split around every junction, ending up with a system of four conservation laws (two paths for each of the two classes of vehicles) with a discontinuous flux. After the junctions, densities are gathered together again, and the two-equation system (9) is restored.

In the multi-scale model, instead, the path-based approach is applied only for car dynamics since managing trucks is much simpler. Indeed, in the microscopic model, one can just move vehicles from one road to another on the basis of their destination, see [37]. Unfortunately, the ordering of trucks is lost every time a change of road takes place. In order to reduce the computational effort needed for the computation of the preceding truck of every truck, the same solution proposed in Section 3.3.1 can be applied.

#### **4. Numerical Approximation and Calibration**

In this section, we describe how the models introduced above can actually be implemented. First, we briefly recall the numerical methods we have adopted, and then we describe how we used real data to set the models' parameters.

#### *4.1. Macroscopic Model*

For the numerical approximation of the macroscopic model (9), we employ the extension of the *cell transmission model* (CTM) to the heterogeneous multi-class model proposed in [14]. Let $\Delta x$ and $\Delta t$ be the space and time steps, respectively, and let $(\rho\_{\rm L}^{n,i}, \rho\_{\rm H}^{n,i})$ be the traffic densities in the $i$th cell at the $n$th time step. The finite volume numerical scheme reads

$$\rho\_{\rm L}^{n+1,i} = \rho\_{\rm L}^{n,i} + \frac{\Delta t}{\Delta x} \left( \mathcal{F}\_{\rm L}^{n,i-1/2} - \mathcal{F}\_{\rm L}^{n,i+1/2} \right) \tag{22a}$$

$$\rho\_{\rm H}^{n+1,i} = \rho\_{\rm H}^{n,i} + \frac{\Delta t}{\Delta x} \left( \mathcal{F}\_{\rm H}^{n,i-1/2} - \mathcal{F}\_{\rm H}^{n,i+1/2} \right) \tag{22b}$$

where

$$\mathcal{F}\_{\mathsf{L}}^{n,i+1/2} := \min \left\{ \mathcal{S}\_{\mathsf{L}}(\rho\_{\mathsf{L}}^{n,i}, \rho\_{\mathsf{H}}^{n,i}), \mathcal{R}\_{\mathsf{L}}(\rho\_{\mathsf{L}}^{n,i+1}, \rho\_{\mathsf{H}}^{n,i+1}) \right\},\tag{23}$$

$$\mathcal{F}\_{\rm H}^{n,i+1/2} := \min \left\{ \mathcal{S}\_{\rm H}(\rho\_{\rm L}^{n,i}, \rho\_{\rm H}^{n,i}), \mathcal{R}\_{\rm H}(\rho\_{\rm L}^{n,i+1}, \rho\_{\rm H}^{n,i+1}) \right\}, \tag{24}$$

and (*S*L, *R*L), (*S*H, *R*H) represent the sending and receiving functions of the two vehicle classes, respectively, defined by

$$\begin{split} S\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) &:= \begin{cases} f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}), & \text{if } \rho\_{\rm L} \le \sigma\_{\rm L}(\rho\_{\rm H}), \\ f\_{\rm L}(\sigma\_{\rm L}(\rho\_{\rm H}), \rho\_{\rm H}), & \text{if } \rho\_{\rm L} > \sigma\_{\rm L}(\rho\_{\rm H}), \end{cases} \\ R\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}) &:= \begin{cases} f\_{\rm L}(\sigma\_{\rm L}(\rho\_{\rm H}), \rho\_{\rm H}), & \text{if } \rho\_{\rm L} \le \sigma\_{\rm L}(\rho\_{\rm H}), \\ f\_{\rm L}(\rho\_{\rm L}, \rho\_{\rm H}), & \text{if } \rho\_{\rm L} > \sigma\_{\rm L}(\rho\_{\rm H}), \end{cases} \end{split} \tag{25}$$

and similarly for (*S*H, *R*H).
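The min-of-sending-and-receiving construction of (23)–(25) can be sketched for a single class with a triangular fundamental diagram; the two-class scheme applies the same construction per class with the coupled diagrams. All parameter values below are illustrative assumptions.

```python
# Sketch of one CTM step (22)-(25) for a single class with a triangular
# fundamental diagram. Parameters are illustrative assumptions.
V = 90.0          # free-flow speed, km/h -- assumption
SIGMA = 20.0      # critical density, veh/km -- assumption
RHO_MAX = 80.0    # jam density, veh/km -- assumption
W = V * SIGMA / (RHO_MAX - SIGMA)   # congested wave speed

def flux(rho):
    return V * rho if rho <= SIGMA else W * (RHO_MAX - rho)

def sending(rho):        # demand: flux, capped at capacity when congested
    return flux(min(rho, SIGMA))

def receiving(rho):      # supply: capacity in free flow, flux when congested
    return flux(max(rho, SIGMA))

def ctm_step(rho, dt, dx):
    """One update of (22): rho_i += dt/dx * (F_{i-1/2} - F_{i+1/2}).

    Stable provided V * dt <= dx (CFL); dt in hours, dx in km here.
    """
    f = [min(sending(a), receiving(b)) for a, b in zip(rho[:-1], rho[1:])]
    new = list(rho)
    for i in range(1, len(rho) - 1):
        new[i] = rho[i] + dt / dx * (f[i - 1] - f[i])
    return new  # boundary cells left unchanged for simplicity
```

With a jam downstream, the supply `receiving` drops to zero and the queue fills the upstream cells, which is exactly the mechanism the two-class scheme uses per class.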

The numerical grid is chosen as Δ*x* = 100 m and Δ*t* = 2.6 s. The choice of the space step comes from the fact that the company Autovie Venete finds such granularity convenient for sharing traffic information with drivers, while the time step is dictated by the CFL condition.

Calibration of the fundamental diagrams was performed by fitting real data. We used all data measured in 2019 by one fixed sensor located near Cessalto, see Figures 9 and 10. Note that for high densities, the velocities drop rapidly to zero. Since we have no data for completely stationary vehicles under the sensor, we are not able to reconstruct the high-density regime from measurements. For this reason, the maximal densities $\rho\_{\rm L}^{\max}$ and $\rho\_{\rm H}^{\max}$ are estimated by simply computing the ratio between the number of available lanes for each class and the average length of vehicles of that class, see Equation (6).

**Figure 9.** (**a**) Flux–density and (**b**) velocity–density relationships for cars with real data superimposed.

**Figure 10.** (**a**) Flux–density and (**b**) velocity–density relationships for trucks with real data superimposed.

Model parameters are summarized in Table 1. All functions which rule the dependence of *ρ*, *v*, *f* on the density of the other class are linear.


**Table 1.** Parameters for the macroscopic model.

#### *4.2. Multi-Scale Model*

For the numerical approximation of the macroscopic part of the multi-scale model (21), we again employ scheme (22a), where $\mathcal{N}\_{\rm H}^{\delta}$ plays the role of $\rho\_{\rm H}$ in the obvious manner. The numerical grid is chosen as $\Delta x = 100$ m and $\Delta t = 2$ s.

The dependence of the flux on $\mathcal{N}\_{\rm H}^{\delta}$ can generate some issues. For example, consider the case of no trucks and a car density $\hat\rho\_{\rm L}$ close to $\rho\_{\rm L}^{\max}$. When a truck enters the road, the maximal density allowed in the cell occupied by the truck drops to $\rho\_{\rm L}^{*}(\mathcal{N}\_{\rm H}^{\delta})$ according to (10). Now, if $\rho\_{\rm L}^{*}(\mathcal{N}\_{\rm H}^{\delta}) < \hat\rho\_{\rm L}$, the current density $\hat\rho\_{\rm L}$ is no longer compatible with the new maximal density. Although the entering truck perceives the cars, it is not guaranteed that compatibility with the maximal density is respected at all times. To avoid this problem, trucks must be prevented from entering cells if the new maximal density caused by the presence of the truck itself is not compatible with the current traffic conditions.

For the numerical approximation of the microscopic part, we used a standard Euler scheme with a time step of *δt* = 0.1 s. Note that this time step is much smaller than the time step Δ*t* used for the Godunov scheme, meaning that the updates of the trucks and cars are asynchronous.
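Since $\delta t = 0.1$ s divides $\Delta t = 2$ s, each macroscopic update can be matched by a fixed number of Euler substeps. A minimal sketch of this asynchronous loop, with the truck dynamics stubbed out via a user-supplied acceleration function, is:

```python
# Sketch of the asynchronous coupling: each macroscopic step (2 s) is
# covered by 20 explicit Euler substeps of the microscopic model.
# Only the substep loop is shown; the dynamics is a user-supplied stub.
DT_MACRO = 2.0     # s, CTM step
DT_MICRO = 0.1     # s, Euler step for the trucks

def advance_one_macro_step(x, v, accel_fn):
    """Advance truck positions x and speeds v over one macroscopic step.

    accel_fn(i, x, v) returns the acceleration of truck i (an assumption
    on the interface; any A^C-like function fits here).
    """
    n_sub = round(DT_MACRO / DT_MICRO)       # 20 substeps
    for _ in range(n_sub):
        x = [xi + DT_MICRO * vi for xi, vi in zip(x, v)]
        v = [vi + DT_MICRO * accel_fn(i, x, v) for i, vi in enumerate(v)]
    return x, v
```

Between two macroscopic updates, the car density seen by the trucks is frozen, which is what makes the two time grids independent.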

Regarding the parameters, the macroscopic part of the model is treated as in Section 4.1 (Table 1). For the microscopic model, some parameters are easily calibrated by using real data and considering physical constraints. For example, $V\_{\rm H}^{\max}$ was defined as in the macroscopic model. $\Delta\_{\rm far}$ was set in order to guarantee that trucks do not collide even in the event that a truck suddenly brakes with full power until it stops (note that our model allows, in principle, collisions since deceleration is bounded). $\Delta\_{\rm close}$, instead, was set to the distance which reproduces the maximal observed density of trucks. In other words, when a queue of trucks is formed, the model predicts the correct maximal density.

Parameters $\tau\_{\rm acc}$ and $\tau\_{\rm dec}$ are, instead, more difficult to calibrate since they are not easily measurable. For those values, we considered a real stop and go wave observed by the company staff on 12 June 2017, generated by the slowdown of a truck near a bottleneck. The initial perturbation (slowdown) was amplified and, in a short time, generated a queue which propagated backwards. We ran the microscopic model using real inflow data as the left boundary condition, and then we fitted the parameters in order to catch the real queue as measured in the field, see Figure 11.

The role of the parameters *τ*acc and *τ*dec is to adjust the points/times of the start and the end of the queue. We noted a strong sensitivity of the model to those parameters. As a consequence, it is quite difficult to catch the correct speed of the backward propagation of a queue when inertia comes into play. We summarize the values of the parameters in Table 2.


**Table 2.** Parameters for the microscopic model.

#### **5. Numerical Results**

In this section, we present the numerical results obtained with models (9) and (21).

#### *5.1. Macroscopic Model*

Here, we present three tests which highlight how the macroscopic model reproduces some interesting phenomena arising from the coupled dynamics of cars and trucks. In particular, we focus on the creeping phenomenon, the shared occupancy and the stop and go waves.

#### 5.1.1. Test 1A: Creeping

In this simple test, we observe the creeping phenomenon, see Figure 12. The simulation starts with a constant density $(\rho\_{\rm L}, \rho\_{\rm H}) = (10, 13)$ veh/km all along the road. At the end of the road (right boundary), trucks are stopped by fixing their density at its maximum value $\rho\_{\rm H}^{\max} = 56$ veh/km: a queue of trucks propagates backward from the end of the road, while a constant flux of cars approaches the beginning of the queue. Once cars reach the trucks' queue, they have to slow down but do not stop completely. More precisely, the cars' velocity drops to 65 km/h. Note that the car density remains under the transition level, so the dynamics stays in the partial coupling phase all the time. Moreover, cars are always in the freeflow regime and thus move at maximal speed, but the maximal speed changes as a function of the truck density.

**Figure 12.** Test 1A: (**a**) Density and (**b**) velocity of light and heavy vehicles as a function of space at final time. (**c**) Density of light and (**d**) heavy vehicles in space–time.

#### 5.1.2. Test 2A: Cars' Congestion Affects Truck Dynamics

In this test, we observe the effect of congestion of cars, see Figure 13. The simulation starts with a constant density (*ρ*L, *ρ*H)=(10, 8) veh/km all along the road. At the end of the road (right boundary), the density of cars is fixed to 186 veh/km to create the slowdown. The car density is larger than the transition level, so cars have to invade the slow lane. Trucks facing the car congestion slow down but do not just occupy the space left to them by cars; rather, they conquer some extra space, thus decreasing the car density. As a result, both cars and trucks proceed slowly without stopping, and the initial car congestion propagates backward with a density lower than the transition level.

**Figure 13.** Test 2A: (**a**) Density and (**b**) velocity of light and heavy vehicles as a function of space at final time. (**c**) Density of light and (**d**) heavy vehicles in space–time.

#### 5.1.3. Test 3A: Stop and Go Wave

In this test, we study the evolution of a small perturbation in the truck density, see Figure 14. At the initial time, the truck density is constant and equal to 12 veh/km, except for a small perturbation at the end of the road where the density is equal to 30 veh/km. The car density instead oscillates just above the transition level. It is plain that a single-class LWR model for trucks only would flatten the perturbation in a short time. Conversely, in this case, the coupling with car dynamics causes the perturbation to propagate backward without vanishing. This second-order-type effect is obtained thanks to the fact that the fundamental diagram of trucks is continuously modified by the oscillating car density.

**Figure 14.** *Cont*.

**Figure 14.** Test 3A: (**a**) Density and (**b**) velocity of light and heavy vehicles as a function of space at *t* = Δ*t* (i.e., just after the initial time). (**c**) Density of light and (**d**) heavy vehicles in space–time. The evolution of the initial perturbation in the truck density starting at 9 km is perfectly visible, which creates, in turn, a perturbation in the car density.

#### *5.2. Multi-Scale Model*

Here, we replicate, with the multi-scale model, the first two scenarios already investigated in Section 5.1. The third scenario was already considered in Figure 11, where the second-order microscopic model is able to reproduce stop and go waves on its own, without the coupling with car dynamics. Finally, we consider the case of a merge.

#### 5.2.1. Test 1B: Creeping Effect

Similar to Test 1A in Section 5.1.1, here, one truck stops completely and creates a long queue of trucks behind, which saturates the slow lane. When cars reach the truck queue, they all move to the fast lane staying at the (new, reduced) maximal velocity of 65 km/h, see Figure 15.

**Figure 15.** Test 1B: (**a**) Trajectories of trucks in space–time (for visualization purposes, not all trucks are actually plotted). When the first truck stops, a queue is formed behind. (**b**) Car density, car velocity and car maximal density given the number of trucks at final time. Creeping is visible between 7 and 9 km.

#### 5.2.2. Test 2B: Cars' Congestion Affects Truck Dynamics

Similar to Test 2A in Section 5.1.2, congestion of cars at the end of the road slows down trucks, see Figure 16. The results are similar to those obtained by the macroscopic model, but here, trucks stop completely, forming a queue.

**Figure 16.** Test 2B: (**a**) Trajectories of trucks in space–time (for visualization purposes, not all trucks are actually plotted). They stop for a while and then accelerate. (**b**) Car density, car velocity and car maximal density given the number of trucks at final time.

#### 5.2.3. Test 3B: Merge

In this test, we consider a merge (two incoming roads and one outgoing road). At time *t* = 0, the three roads are empty. A constant inflow of trucks (one every 4 s) comes from the left boundary of both incoming roads, while a constant density of cars ($\rho\_{\rm L} = 32$ veh/km) is imposed as a Dirichlet left boundary condition on the second incoming road only. The first incoming road has no cars. When trucks reach the junction and merge, they suddenly brake and rapidly form a queue which propagates backward along both incoming roads, see Figure 17a,b.

**Figure 17.** Test 3B: (**a**–**c**) Trajectories of trucks in space–time on the first incoming road, second incoming road and outgoing road, respectively (for visualization purposes, not all trucks are actually plotted). (**d**,**e**) Car density on second incoming road and outgoing road, respectively.

Queues are not identical due to the presence of cars along the second incoming road. One can note that when the trucks downstream of the queue start moving again, their flux is not maximal: indeed, if the flow were maximal, a queue at the junction would immediately reform as it happened in the first place. This is the well-known *capacity drop* phenomenon, ruled by $\tau\_{\rm acc}$, cf. [38]. As a consequence, trucks are able to cross the junction without spillback. Cars, instead, move at the maximal flux until they encounter the truck queue. The queue acts as a moving bottleneck and drops the road capacity; therefore, the car traffic immediately enters the congested state, and the density increases. Downstream, the density remains in the freeflow state, and cars cross the junction without spillback, see Figure 17d,e.

#### **6. Conclusions and Future Work**

In this paper, we presented two models for two-class traffic flow. Although the models are tailored for a specific case study, they are sufficiently general to be useful in other motorways. Moreover, both models can be easily generalized to more than two classes of vehicles and a different ratio between the number of lanes used by trucks and the number of lanes used by cars.

We have shown that the models are able to reproduce, both qualitatively and quantitatively, some notable traffic phenomena arising from the interactions of the two classes. Interestingly, the macroscopic model, although purely first-order, is able to reproduce stop and go waves thanks to the coupling of the two classes.

After this preliminary analysis, it is possible to draw some conclusions about the advantages and drawbacks of the two models. The multi-scale model has greater potential, since the second-order microscopic part makes it more realistic and thus suitable for quantitative predictions. Nevertheless, the macroscopic model is simpler and more manageable, thus representing a valid alternative if one wants to avoid tracking all single vehicles, especially for saving computational time.

In conclusion, we believe that both the proposed models represent the best compromise between accuracy and implementability. In fact, decoupling the dynamics of the different classes oversimplifies the problem and does not allow accurate forecasts; conversely, moving to second-order macroscopic models or including multi-lane features notably increases the complexity of the code as well as the number of parameters to be tuned. These generalizations would allow, in principle, easily catching inertia-based phenomena in all classes of vehicles and tracking the density of *each class* of vehicle in *each lane*, but, in our opinion, they make the models unfeasible for practical applications.

In the future, we plan to improve the models including the possibility that they are fed by both Lagrangian (GPS-like) and Eulerian data coming from mobile and fixed sensors, respectively, cf. [39]. Moreover, we plan to estimate, in real time, the difference between predicted and measured densities using the machinery developed in [13], hopefully creating an algorithm for the auto-calibration of the models in real time.

**Author Contributions:** Conceptualization, M.B. and E.C.; data curation, P.R.; funding acquisition, M.B., E.C. and P.R.; investigation, M.B., E.C. and P.R.; methodology, M.B. and E.C.; visualization, M.B. and E.C.; writing—original draft, M.B. and E.C.; writing—review and editing, M.B., E.C. and P.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially funded by the company Autovie Venete S.p.A. This work was also carried out within the research project "SMARTOUR: Intelligent Platform for Tourism" (No. SCN\_00166) funded by the Ministry of University and Research with the Regional Development Fund of European Union (PON Research and Competitiveness 2007–2013). The authors also acknowledge the Italian Minister of Instruction, University and Research for supporting this research with funds coming from the project entitled *Innovative numerical methods for evolutionary partial differential equations and applications* (PRIN Project 2017, No. 2017KKJP4X). M.B. and E.C. are members of the INdAM Research group GNCS.

**Data Availability Statement:** Data are not publicly available.

**Acknowledgments:** The authors want to thank all the Autovie Venete staff as well as Gabriella Bretti, Matteo Piu, Elisa Iacomini, Caterina Balzotti and Elia Onofri for valuable help.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **An Information-Theoretic Framework for Optimal Design: Analysis of Protocols for Estimating Soft Tissue Parameters in Biaxial Experiments**

**Ankush Aggarwal 1,†, Damiano Lombardi 2,3,† and Sanjay Pant 4,\*,†**


**Abstract:** A new framework for optimal design based on the information-theoretic measures of mutual information, conditional mutual information and their combination is proposed. The framework is tested on the analysis of protocols—a combination of angles along which strain measurements can be acquired—in a biaxial experiment of soft tissues for the estimation of hyperelastic constitutive model parameters. The proposed framework considers the information gain about the parameters from the experiment as the key criterion to be maximised, which can be directly used for optimal design. Information gain is computed through *k*-nearest neighbour algorithms applied to the joint samples of the parameters and measurements produced by the forward and observation models. For biaxial experiments, the results show that low angles have a relatively low information content compared to high angles. The results also show that a smaller number of angles with suitably chosen combinations can result in higher information gains when compared to a larger number of angles which are poorly combined. Finally, it is shown that the proposed framework is consistent with classical approaches, particularly D-optimal design.

**Keywords:** optimal design; soft tissue mechanics; mutual information; biaxial experiment; inverse problems; information theory

**MSC:** 62K05; 94A15; 92C10

#### **1. Introduction**

Soft tissues exhibit complex biomechanical behaviour, including nonlinearity, anisotropy and heterogeneity [1]. Moreover, the tissues also demonstrate inelastic properties, such as rate-dependence, hysteresis and permanent set. The important link between biomechanics and physiological function has motivated a large number of ex-vivo studies aimed at characterising the biomechanical properties of tissues. Given the complex interplay between the different aspects of these properties, the design of ex-vivo soft-tissue experiments is extremely challenging and has itself been a subject of investigation, with a variety of experiments proposed [2–6].

Since a variety of soft tissues are thin (e.g., blood vessels, heart valves and skin), biaxial testing is a widely used experimental technique that allows the independent stretching of the tissue in two orthogonal directions and for the corresponding forces to be measured [7,8]. Applying different stretches in the two directions allows the characterization of the in-plane anisotropic behavior of a given tissue, while a range of stretches provides its nonlinear elastic response. However, even with this relatively simple set of options, the choices of which stretches to apply are unclear. Moreover, it is not obvious upon what these choices will depend.

**Citation:** Aggarwal, A.; Lombardi, D.; Pant, S. An Information-Theoretic Framework for Optimal Design: Analysis of Protocols for Estimating Soft Tissue Parameters in Biaxial Experiments. *Axioms* **2021**, *10*, 79. https://doi.org/10.3390/axioms10020079

Academic Editor: Gabriella Bretti

Received: 31 March 2021; Accepted: 22 April 2021; Published: 1 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A variety of hyperelastic models have been developed to describe the anisotropic and nonlinear elastic properties of specific soft tissues [4,9–11]. Biaxial experimental data are commonly fit to these models in order to determine the model parameters. As the unknown parameters depend on specific models, the choice of experimental setup—the problem of optimal design—might depend on the choice of model. However, in practice, a predetermined set of experimental protocols is used.

In the present work, an optimal design problem is defined to find the most suitable protocol in view of estimating the parameters of the material model. A comprehensive overview of the optimal design problem can be found in [12,13], and several criteria for optimal design have been proposed in the literature, often based on the minimisation of the variance of the parameters and sensitivities [14,15]. In the present work, we investigate a criterion based on information-theoretic quantities, in the spirit of what has been proposed in [16,17] (from a Bayesian point of view) and [18]. Several works have recently proposed information-based criteria to better define experimental protocols. In [19], the authors proposed the maximisation of the mutual information between the parameters and the observations under the assumption that the model error is a Gaussian process. In [20], the authors proposed a framework based on mutual information maximisation to deal with the design of chemistry experiments. The same criterion is proposed in [21], where the authors maximise the mutual information by using a stochastic gradient ascent method. An application to systems biology is investigated in [22]. In [23], the maximisation of the information is exploited in order to choose high-fidelity model resolutions in a multi-fidelity modelling framework.

While mutual information has been used for optimal design in previous studies, the novelty of this work lies in the proposed combination of the information-theoretic quantities of mutual information and conditional mutual information. A further novelty is the application of this framework to the optimal design of soft tissue experiments. Estimating information-theoretic quantities is in general a challenging problem, especially in high-dimensional settings. In the present work, a model reduction method is coupled with non-parametric, sample-based mutual information estimation in order to provide a pertinent estimation of the information-theoretic quantities involved in the optimal design problem, and this is then applied to the biaxial testing of soft tissues.

The structure of the work is as follows. In Section 2, the model and the information-theoretic aspects of the problem are introduced. In particular, in Section 2.1, we detail the mathematical model of the biaxial experiments for soft tissues: after introducing the notation and the non-linear elasticity model, in Section 2.1.1 we apply it to the biaxial testing experimental setup, and in Section 2.1.2 we introduce the definition of the experimental protocol. The second part of the section is devoted to the description of the information-theoretic framework used to solve the optimal design problem: in Section 2.2.1, we introduce the problem, and in Sections 2.2.2 and 2.2.3, the information-theoretic quantities and their numerical estimation are detailed. We then present the reduced-order modelling method used and how the results obtained by the proposed approach are validated, and the section ends with an overview of the method. The results and the discussion are presented in Section 3, followed by the conclusions and perspectives on future work.

#### **2. Methods**

The methodological aspects are divided into two broad categories: the mathematical model of the biaxial experiments and the information-theoretic optimal design framework.

#### *2.1. Mathematical Model of the Biaxial Experiments*

We begin by defining the notation: a material point at its reference position *X* ∈ R<sup>3</sup> moves to *x* ∈ R<sup>3</sup> after deformation. The elastic behaviour of soft tissues is described using the hyperelastic strain energy density Ψ, which depends on the deformation gradient tensor **F** = ∇<sub>*X*</sub>*x*. The ratio of the volume after deformation to that before deformation is given by *J* = det(**F**). Soft tissues are commonly regarded as incompressible due to their high water content; i.e., *J* is constrained to be unity.

We consider the hyperelastic model proposed by Gasser et al. [24], which defines the strain energy density as

$$\Psi = \frac{k\_1}{2k\_2} \left[ e^{k\_2\left(\kappa I\_1 + (1 - 3\kappa)I\_4 - 1\right)^2} - 1 \right] + \mu (I\_1 - 3),\tag{1}$$

where *I*<sub>1</sub> = tr(**C**) is the first invariant of the right Cauchy–Green strain tensor **C** = **F**<sup>⊤</sup>**F** and *I*<sub>4</sub> = *M* · **C***M* is the fourth invariant, representing the square of the stretch along the fiber direction *M*. The resulting Cauchy stress is given by

$$
\sigma = 2\mathbf{F} \cdot \frac{\partial \Psi}{\partial \mathbf{C}} \cdot \mathbf{F}^\top - p\mathbf{I}, \tag{2}
$$

where *p* acts as the Lagrange multiplier to enforce incompressibility and **I** is the identity matrix.

For this model, the set of unknown parameters can be written as {*k*<sub>1</sub>, *k*<sub>2</sub>, *κ*, *μ*}, assuming that the fiber direction *M* is known a priori (based on another experiment; e.g., light scattering [6]). *κ* represents the dispersion of the collagen fibers, which is usually measured from optical experiments; its value lies between 0 (perfectly anisotropic) and 1/3 (perfectly isotropic). The value of *μ* corresponds to the shear modulus of the neo-Hookean term in (1), which represents the amorphous, non-fibrous extracellular matrix. Its role in the mechanics of soft tissues is limited to small strains and is largely constant across different tissues. In this paper, in order to simplify the problem, we assume that *κ* = 0.1 and *μ* = 1 kPa are known and fixed. Thus, the aim of an ex-vivo biomechanical experiment is to determine the parameters *k*<sub>1</sub> ∈ [5, 100] kPa and *k*<sub>2</sub> ∈ [5, 80] robustly and with high confidence [25,26]. A commonly used experiment, called biaxial testing, is described below.
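As a concrete check of the model's behaviour, the energy of Equation (1) for an incompressible biaxial deformation can be evaluated in a few lines. The Python sketch below is ours (function names and the sample parameter values are illustrative assumptions, not the paper's calibration); it assumes the fiber direction lies along the first coordinate axis.

```python
import math

def invariants(lam1, lam2):
    """Invariants I1, I4 for an incompressible biaxial deformation with
    F = diag(lam1, lam2, 1/(lam1*lam2)) and fibers along the first axis."""
    lam3 = 1.0 / (lam1 * lam2)          # incompressibility: J = det(F) = 1
    I1 = lam1**2 + lam2**2 + lam3**2    # tr(C), with C = F^T F
    I4 = lam1**2                        # M . C M with M = (1, 0, 0)
    return I1, I4

def strain_energy(lam1, lam2, k1=20.0, k2=30.0, kappa=0.1, mu=1.0):
    """Strain energy density of Equation (1), in kPa (parameters illustrative)."""
    I1, I4 = invariants(lam1, lam2)
    E = kappa * I1 + (1.0 - 3.0 * kappa) * I4 - 1.0   # vanishes at lam1 = lam2 = 1
    return k1 / (2.0 * k2) * (math.exp(k2 * E**2) - 1.0) + mu * (I1 - 3.0)
```

Note that the exponent argument vanishes in the reference configuration (*I*<sub>1</sub> = 3, *I*<sub>4</sub> = 1), so the energy is zero at zero strain, as required.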

#### 2.1.1. Biaxial Experiments for Soft-Tissues

Many soft tissue types are planar with a small thickness. In a biaxial experiment, a square-shaped tissue sample is mounted via clamps or rakes and stretched along two orthogonal directions aligned with the sample edges (Figure 1a). If these directions are used as the two coordinate axes and incompressibility is assumed, the stretching results in a diagonal deformation gradient tensor:

$$\mathbf{F} = \text{diag}\left[\lambda\_1, \lambda\_2, \frac{1}{\lambda\_1 \lambda\_2}\right],\tag{3}$$

where *λ*<sup>1</sup> is the stretch along the first in-plane direction and *λ*<sup>2</sup> is the stretch along the second in-plane direction. The fiber direction *M* is generally aligned with the first coordinate axis, which results in only normal stress components. As no force is applied along the thickness of the tissue, *σ*<sup>33</sup> = 0 is used to determine the Lagrange multiplier *p*. Thus, we obtain

$$
\sigma\_{11} = 2 \frac{\partial \Psi}{\partial I\_1} \left[ \lambda\_1^2 - \frac{1}{\lambda\_1^2 \lambda\_2^2} \right] + 2 \frac{\partial \Psi}{\partial I\_4} \lambda\_1^2 \tag{4}
$$

$$
\sigma\_{22} = 2 \frac{\partial \Psi}{\partial I\_1} \left[ \lambda\_2^2 - \frac{1}{\lambda\_1^2 \lambda\_2^2} \right]. \tag{5}
$$

The applied stresses *σ*<sub>11</sub>, *σ*<sub>22</sub> are controlled using load cells. The resulting strains, defined as *e*<sub>1</sub> := *λ*<sub>1</sub> − 1 and *e*<sub>2</sub> := *λ*<sub>2</sub> − 1, are measured from the marker positions (although *e*<sub>1</sub> and *e*<sub>2</sub> are not the usual strain measures, we use them as our observations). It is important to note that a homogeneous stress and strain state is assumed in the middle of the sample (Figure 1a); an implicit assumption is therefore that the material properties and the sample thickness are homogeneous. Moreover, these measurement techniques carry an error due to the limitations of the measurement tools and/or deviations from homogeneity, incompressibility and the assumed material direction.
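The normal stresses of Equations (4) and (5) follow directly from the energy derivatives of Equation (1). A minimal Python sketch under the same illustrative assumptions as above (parameter values and function names are ours):

```python
import math

def biaxial_stresses(lam1, lam2, k1=20.0, k2=30.0, kappa=0.1, mu=1.0):
    """Cauchy stresses (sigma11, sigma22) of Equations (4)-(5), in kPa,
    for F = diag(lam1, lam2, 1/(lam1*lam2)) with fibers along axis 1."""
    lam3 = 1.0 / (lam1 * lam2)
    I1 = lam1**2 + lam2**2 + lam3**2
    I4 = lam1**2
    E = kappa * I1 + (1.0 - 3.0 * kappa) * I4 - 1.0
    # derivatives of the strain energy of Equation (1)
    dPsi_dI1 = k1 * kappa * E * math.exp(k2 * E**2) + mu
    dPsi_dI4 = k1 * (1.0 - 3.0 * kappa) * E * math.exp(k2 * E**2)
    s11 = 2.0 * dPsi_dI1 * (lam1**2 - lam3**2) + 2.0 * dPsi_dI4 * lam1**2
    s22 = 2.0 * dPsi_dI1 * (lam2**2 - lam3**2)   # no I4 term: fibers along axis 1
    return s11, s22
```

Under an equibiaxial stretch the fiber term loads only *σ*<sub>11</sub>, so *σ*<sub>11</sub> > *σ*<sub>22</sub>, reflecting the anisotropy described above.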

**Figure 1.** (**a**) A schematic of a biaxial experimental setup in which a thin planar tissue sample (in light gray) is mounted via rakes and two orthogonal forces are applied to induce stresses *σ*<sup>11</sup> and *σ*22, and the resulting strains are measured by tracking the locations of the markers (in dark gray). (**b**) The *σ*<sup>11</sup> − *σ*<sup>22</sup> space, where the applied stresses lie on the dotted line with a finite number of protocol angles *φ* used.

#### 2.1.2. Protocol Definition

In practice, there are two approaches to the biaxial experiment: (1) displacement-controlled, where known stretches are imposed and forces are measured; and (2) force-controlled, where known forces are applied and stretches are measured. Generally, the force-controlled approach is used, as it is easier to implement. In the force-controlled approach, different values of the stresses *σ*<sub>11</sub> and *σ*<sub>22</sub> can be applied.

A single-angle biaxial protocol is defined as a straight line in the *σ*11-*σ*<sup>22</sup> space (Figure 1b). That is, the ratio between the two stresses is kept constant while the applied forces are increased until a maximum value *σ*max = 200 kPa. Thus, for a chosen angle *φ*, we apply

$$
\sigma\_{11} = \begin{cases}
\sigma & \text{if } \phi \le \frac{\pi}{4} \\
\cot(\phi)\,\sigma & \text{else}
\end{cases}
\tag{6}
$$

$$
\sigma\_{22} = \begin{cases}
\tan(\phi)\,\sigma & \text{if } \phi \le \frac{\pi}{4} \\
\sigma & \text{else}
\end{cases},
\tag{7}
$$

where *σ* ∈ [0, *σ*<sub>max</sub>]. For *σ*, 100 linearly spaced observation points between zero and the maximum stress (*σ*<sub>max</sub> = 200 kPa) are used. The resulting strains are calculated by iteratively solving Equations (4) and (5) for *λ*<sub>1,2</sub>, thereby obtaining *e*<sub>1,2</sub>. In practice, a combination of angles can be tested successively. We refer to this combination as the experimental protocol that needs to be optimally designed.

For each angle, it is easy to acquire large numbers of points as the sample is continuously stretched. However, to vary between angles, it is essential to restart the experiment at zero applied force, which further requires the "pre-conditioning" of the sample by cyclically applying small stretches. This makes it practically difficult to apply an arbitrarily large number of angles. Therefore, in practice, usually only five angles are tested.
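The loading path of a single-angle protocol can be sketched as follows (a minimal Python illustration, with names of our choosing, in which the ratio *σ*<sub>22</sub>/*σ*<sub>11</sub> = tan *φ* is held fixed while the larger stress component is ramped to *σ*<sub>max</sub> over 100 levels):

```python
import math

SIGMA_MAX = 200.0  # kPa

def protocol_stresses(phi, n_points=100):
    """Stress pairs (sigma11, sigma22) along a single-angle protocol:
    a straight line of angle phi in the sigma11-sigma22 plane, with the
    larger component ramped from 0 to SIGMA_MAX over n_points levels."""
    path = []
    for i in range(1, n_points + 1):
        s = SIGMA_MAX * i / n_points
        if phi <= math.pi / 4:
            path.append((s, math.tan(phi) * s))        # sigma11 dominates
        else:
            path.append((s / math.tan(phi), s))        # sigma22 dominates
    return path
```

With this parameterisation, *φ* = 0 is a pure *σ*<sub>11</sub> test, *φ* = *π*/2 a pure *σ*<sub>22</sub> test, and *φ* = *π*/4 the equibiaxial case.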

#### *2.2. Information-Theoretic Framework for Optimal Design*

The problem of optimal design typically refers to the choice of a design of experiments such that the design is optimal with respect to a pre-determined statistical criterion. We propose that the information-theoretic measures naturally define such statistical criteria. The central idea is that information gain [27,28] from an experiment or protocol—as quantified by the information-theoretic quantities of mutual information and conditional mutual information—can be directly used as a reasonable statistical criterion for optimal design. These quantities are described next after presenting the framework for optimal design.

2.2.1. Optimal Design Problem

Consider the following general model:

$$\mathbf{y} = \mathcal{M}(\boldsymbol{\theta}),\tag{8}$$

where M denotes a forward model that takes *θ* ∈ R<sup>*m*</sup> and outputs **y** ∈ R<sup>*n*</sup>. Note that *θ* may contain initial and boundary conditions of the model and that **y** may subsume the output at many time-points in the case of a dynamic system. Subsequently, consider the following measurement model:

$$\mathbf{z} = \mathcal{H}\_{p}(\mathbf{y}, \boldsymbol{\theta}) + \boldsymbol{\varepsilon}, \tag{9}$$

where H<sub>*p*</sub> represents the observation operator, **z** ∈ R<sup>*d*</sup> represents the measurement vector, and *ε* represents the vector of measurement error/noise. Note that the observation operator H<sub>*p*</sub> depends on the design of experiments, which specifies which quantities are measured. Given a set of possible H<sub>*p*</sub> = {H<sub>1</sub>, H<sub>2</sub>, ··· , H<sub>*h*</sub>}, and a statistical criterion S(H<sub>*p*</sub>) to be maximised, the optimal design is given by

$$\hat{\mathcal{H}}\_p = \underset{\mathcal{H}\_p}{\text{arg}\,\text{max}}\,\mathcal{S}(\mathcal{H}\_p). \tag{10}$$

In the case of the biaxial experiments, the model M represents the model for the force-controlled experiment (Sections 2.1.1 and 2.1.2), and H<sub>*p*</sub> essentially denotes the experimental protocol (see Section 2.1.2): the combination of angles—each representing a straight line in the *σ*<sub>11</sub>–*σ*<sub>22</sub> plane—along which the strain measurements of *e*<sub>1</sub> and *e*<sub>2</sub> are acquired. With each angle allowed to vary between 0 and *π*/2, the set Φ of possible angles *φ* is constructed through a uniform discretisation of this range into *α* levels; thus,

$$\Phi = \{\phi\_1, \phi\_2, \dots, \phi\_{\alpha}\}.\tag{11}$$

The possible set of protocols is then given by any combination of elements in Φ, with the restriction that the number of elements in a protocol is limited to C. Thus, if Φ̃ ⊂ Φ is a subset of angles representing a protocol, our set of protocols is given by

$$\mathcal{H}\_{p} = \{ \tilde{\Phi} \subset \Phi \mid 1 \le |\tilde{\Phi}| \le C \},\tag{12}$$

where |·| represents the number of elements in the set. In other words, we choose at least 1 and up to C elements from Φ, so the total number of elements in H<sub>*p*</sub> is

$$|\mathcal{H}\_{p}| = \binom{\alpha}{1} + \binom{\alpha}{2} + \cdots + \binom{\alpha}{C}.\tag{13}$$
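With the values used later in the paper (*α* = 16 discretisation levels and at most C = 5 angles per protocol), Equation (13) can be checked by direct enumeration, which reproduces the 6884 protocols quoted in Section 2.2.6 (a short Python sketch):

```python
from itertools import combinations
from math import comb

alpha, C = 16, 5                                          # levels and max angles
angles = [90.0 * i / (alpha - 1) for i in range(alpha)]   # uniform grid, 0..90 deg

# closed form of Equation (13) ...
n_closed = sum(comb(alpha, c) for c in range(1, C + 1))

# ... and explicit enumeration of all admissible protocols
protocols = [p for c in range(1, C + 1) for p in combinations(angles, c)]

assert n_closed == len(protocols) == 6884
```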

2.2.2. Information-Theoretic Quantities for Optimal Design

In the framework of Section 2.2.1, we propose that information-theoretic quantities of mutual information and conditional mutual information are a natural choice for the statistical criterion S. Denoting the random variables associated with *θ* and **z** as **Θ** and **Z**, respectively, the mutual information (MI) between the parameters **Θ** and the measurements **Z** is defined as [27]

$$\mathcal{I}(\boldsymbol{\Theta}; \mathbf{Z}) = \int\_{\mathcal{X}\_{\boldsymbol{\Theta}} \times \mathcal{X}\_{\mathbf{Z}}} p\_{\boldsymbol{\Theta}, \mathbf{Z}}(\boldsymbol{\theta}, \mathbf{z}) \log \frac{p\_{\boldsymbol{\Theta}, \mathbf{Z}}(\boldsymbol{\theta}, \mathbf{z})}{p\_{\boldsymbol{\Theta}}(\boldsymbol{\theta})\, p\_{\mathbf{Z}}(\mathbf{z})}\, d\boldsymbol{\theta}\, d\mathbf{z}, \tag{14}$$

where *p<sub>X</sub>*(*x*) represents the probability density of a random variable *X* with realisation *X* = *x* and support X<sub>*X*</sub>. The mutual information I(**Θ**; **Z**) quantifies the amount of information that can be gained, on average, about one random variable, e.g., **Θ**, by knowing the other, e.g., **Z**. With this interpretation, MI is a natural candidate for the statistical criterion S for optimal design. For an individual parameter Θ<sub>*i*</sub>, or indeed for any combination of parameters {Θ<sub>*i*</sub>, Θ<sub>*j*</sub>}, the corresponding information gains can be computed analogously through I(Θ<sub>*i*</sub>; **Z**) and I({Θ<sub>*i*</sub>, Θ<sub>*j*</sub>}; **Z**), respectively. Thus, while I(Θ<sub>*i*</sub>; **Z**) quantifies the information gain for the parameter Θ<sub>*i*</sub> individually, I({Θ<sub>*i*</sub>, Θ<sub>*j*</sub>}; **Z**) quantifies the information gain for the pair {Θ<sub>*i*</sub>, Θ<sub>*j*</sub>} jointly. A measure of the correlation between the parameters Θ<sub>*i*</sub> and Θ<sub>*j*</sub>, however, is still missing; it is provided by the conditional mutual information (CMI), defined as

$$\underbrace{\mathcal{I}(\boldsymbol{\Theta}\_{i};\boldsymbol{\Theta}\_{j}|\mathbf{Z})}\_{\mathbf{I}} = \underbrace{\mathcal{I}(\boldsymbol{\Theta}\_{i};\{\boldsymbol{\Theta}\_{j},\mathbf{Z}\})}\_{\mathbf{II}} - \underbrace{\mathcal{I}(\boldsymbol{\Theta}\_{i};\mathbf{Z})}\_{\mathbf{III}}.\tag{15}$$

The CMI I(Θ<sub>*i*</sub>; Θ<sub>*j*</sub>|**Z**) represents the additional information gained about the parameter Θ<sub>*i*</sub> when both Θ<sub>*j*</sub> and **Z** are known (term II) relative to when only the measurements **Z** are known (term III). Note that the CMI is symmetrical, i.e., I(Θ<sub>*i*</sub>; Θ<sub>*j*</sub>|**Z**) = I(Θ<sub>*j*</sub>; Θ<sub>*i*</sub>|**Z**), and can be interpreted as a measure of dependence between the parameters given the measurements **Z**. It should also be noted that both MI and CMI are non-negative.

With the above background, many statistical measures can be constructed. For example:


$$\mathcal{S} = \sum\_{i=1}^{m} \mathcal{I}(\Theta\_{i}; \mathbf{Z}) - \tau \sum\_{i=1}^{m} \sum\_{j=i+1}^{m} \mathcal{I}(\Theta\_{i}; \Theta\_{j}|\mathbf{Z}), \tag{16}$$

where *τ* > 0 is a regularisation parameter. Note that high CMI implies that a large amount of information can be gained only about a combination of the two parameters (for instance, their sum or product), but not for each parameter individually. Thus, we seek to minimise the CMI.

Note that the above list is not exhaustive, and based on the interpretations of MI and CMI, other criteria may be constructed based on the desired sense of optimality.

#### 2.2.3. Estimating Mutual Information

In general, the forward model in Equation (8) is non-linear, and thus, even if the observation operator is linear (implying that linear combinations of the state are measured), the analytical computation of mutual information is intractable. The information-theoretic quantities of MI and CMI must therefore be estimated. A common method is to generate samples of **Θ** through the specification of an appropriate prior probability density *p*<sub>**Θ**</sub>(*θ*). Denoting these *N<sub>s</sub>* samples as *θ*<sup>(*i*)</sup>, *i* = {1, 2, ··· , *N<sub>s</sub>*}, each *θ*<sup>(*i*)</sup> can be propagated through the forward and observation models of Equations (8) and (9) to produce corresponding samples of **Z**, denoted **z**<sup>(*i*)</sup>. The samples *θ*<sup>(*i*)</sup> and **z**<sup>(*i*)</sup> can subsequently be used with non-parametric estimators of MI and CMI. Such estimators can broadly be classified into two categories: kernel density estimators (KDE) [29] and *k*-nearest neighbour (kNN) estimators [30,31]; for an overview of such methods, we refer to [32]. While the estimator proposed by Kraskov et al. [30] is widely used and performs very well across a range of scenarios, one of its drawbacks is that it suffers from higher errors when extreme correlations are present between the variables and/or when the data effectively lie on a lower-dimensional manifold. Since we are working with models that specify explicit relationships between the variables through the forward and observation models, this is likely to be true for the data set (*θ*<sup>(*i*)</sup>, **z**<sup>(*i*)</sup>). Thus, in this study, we employ the local non-uniformity correction (LNC) proposed in [33], which adds a correction term to the original estimator of Kraskov et al. [30]; this term accounts for strong dependencies between the variables through local principal component analysis [33]. The method of [33] is used for the estimation of all MIs, and CMIs are estimated as the difference of two MIs; see Equation (15).
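For illustration, a minimal kNN estimator of MI in the spirit of Kraskov et al. [30] (without the LNC correction actually used in the paper) can be sketched as follows; the implementation details and names are ours, not the authors':

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=3):
    """Kraskov-style kNN estimate of I(X; Y) in nats from N paired samples.

    x, y: arrays of shape (N, dx) and (N, dy); Chebyshev metric, no LNC term.
    """
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # distance to the k-th nearest neighbour in the joint space
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # number of strictly closer points in each marginal space (self excluded)
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# sanity check on synthetic data: near zero for independent variables,
# clearly positive for strongly dependent ones
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 1))
mi_indep = ksg_mutual_information(x, rng.standard_normal((2000, 1)))
mi_dep = ksg_mutual_information(x, x + 0.5 * rng.standard_normal((2000, 1)))
```

Under Equation (15), a CMI estimate is then the difference of two such MI estimates, e.g., estimating I(Θ<sub>*i*</sub>; {Θ<sub>*j*</sub>, **Z**}) and I(Θ<sub>*i*</sub>; **Z**) and subtracting.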

#### 2.2.4. Dimensionality Reduction for the Biaxial Experiment

One of the main difficulties in estimating information-theoretic quantities is related to the dimension of the data. Non-parametric estimation is particularly challenging whenever the data lie close to manifolds embedded in high-dimensional spaces. This is indeed the case when a physical model relates parameters and observable quantities. One possible way to overcome this difficulty, or at least to mitigate it, is dimension (or model) reduction, which aims at discovering the underlying low-dimensional structure of a set of data (a comprehensive review of the topic can be found in [34–37]). A large spectrum of methods has been proposed in the literature. In the present contribution, we adopt a local reduced-basis method (similar in spirit to the methods proposed in [38,39]). Let the strains computed by the model be *e*<sub>1,2</sub>(*σ*; *φ*; *k*<sub>1</sub>, *k*<sub>2</sub>), where *k*<sub>1</sub> and *k*<sub>2</sub> are the model parameters, (*k*<sub>1</sub>, *k*<sub>2</sub>) ∈ Ω<sub>*k*</sub> ⊂ R<sup>2</sup>, and *σ* ∈ Ω<sub>*σ*</sub> ⊂ R is the variable defined in Section 2.1.2. Let *n* ∈ N<sup>∗</sup>; we then introduce the following approximation:

$$e\_{1,2} \approx \sum\_{i=1}^{n} \eta\_i\, r\_i(\sigma, \phi)\, s\_i(k\_1, k\_2, \phi),\tag{17}$$

which is well defined by virtue of the Eckart–Young theorem. First, let us observe that a given protocol consists of a set of known angles Φ. An efficient way to construct the local reduced basis is therefore to introduce a Proper Orthogonal Decomposition (POD) for each of the angles *φ<sup>j</sup>* ∈ Φ. This corresponds to the search for an approximation of the form

$$
e\_{1,2}(\sigma; \phi\_j; k\_1, k\_2) \approx \sum\_{i=1}^{n} \eta\_i^{(j)} r\_i^{(j)}(\sigma)\, s\_i^{(j)}(k\_1, k\_2), \tag{18}
$$

where ⟨*r*<sub>*i*</sub><sup>(*j*)</sup>, *r*<sub>*k*</sub><sup>(*j*)</sup>⟩<sub>Ω<sub>*σ*</sub></sub> = *δ*<sub>*ik*</sub> and ⟨*s*<sub>*i*</sub><sup>(*j*)</sup>, *s*<sub>*k*</sub><sup>(*j*)</sup>⟩<sub>Ω<sub>*k*</sub></sub> = *δ*<sub>*ik*</sub>, with ⟨·, ·⟩<sub>Ω<sub>*σ*</sub></sub> and ⟨·, ·⟩<sub>Ω<sub>*k*</sub></sub> being the standard *L*<sup>2</sup> scalar products. The error in the approximation is related to the number *n* of modes retained:

$$\left\| e\_{1,2}^{(j)} - \sum\_{i=1}^{n} \eta\_i^{(j)} r\_i^{(j)}(\sigma)\, s\_i^{(j)}(k\_1, k\_2) \right\|\_{L^2(\Omega\_\sigma \times \Omega\_k)}^2 = \sum\_{i=n+1}^{\infty} \left(\eta\_i^{(j)}\right)^2. \tag{19}$$

In the present work, *n* = 4 modes proved sufficient to obtain errors smaller than 10<sup>−3</sup> in the *L*<sup>2</sup> norm in the solution reconstruction. This means that the set of elements *e*<sub>1,2</sub>(*σ*; *φ<sub>j</sub>*; *k*<sub>1</sub>, *k*<sub>2</sub>) was close to the linear subspace spanned by the first *n* = 4 modes *r*<sub>*i*</sub><sup>(*j*)</sup>. Henceforth, instead of considering the discretised *e*<sub>1,2</sub>, we consider their coordinates in this subspace, given by

$$
\left(z\_{1,2}^{(j)}\right)\_i = \langle e\_{1,2},\, r\_i^{(j)} \rangle\_{\Omega\_\sigma} = \eta\_i^{(j)} s\_i^{(j)}(k\_1, k\_2). \tag{20}
$$
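On a discretised snapshot matrix, the per-angle decomposition of Equation (18) and the error identity of Equation (19) reduce to a truncated SVD. The sketch below uses a synthetic snapshot function as a stand-in for the biaxial model (an assumption for illustration only):

```python
import numpy as np

# synthetic snapshots e(sigma; k1, k2): rows = 100 stress levels,
# columns = 400 parameter samples (a smooth stand-in, not the paper's model)
rng = np.random.default_rng(0)
sigma = np.linspace(0.0, 200.0, 100)[:, None]
k1 = rng.uniform(5.0, 100.0, 400)[None, :]
k2 = rng.uniform(5.0, 80.0, 400)[None, :]
E = np.log1p(sigma / k1) / np.sqrt(k2)     # smooth, hence fast-decaying spectrum

# POD via SVD: E ~ sum_i eta_i r_i(sigma) s_i(k1, k2)
U, s, Vt = np.linalg.svd(E, full_matrices=False)
n = 4
E_n = (U[:, :n] * s[:n]) @ Vt[:n]

# Eckart-Young / Equation (19): squared truncation error = sum of discarded eta_i^2
err2 = np.linalg.norm(E - E_n, "fro") ** 2
assert np.isclose(err2, np.sum(s[n:] ** 2))

# reduced coordinates of Equation (20): projection onto the first n modes
z = U[:, :n].T @ E          # one 4-vector per parameter sample
```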

#### 2.2.5. Validation of Results against Existing Methods

Several methods and criteria to define and reach an optimal design of experiments have been proposed [12]. Among them, the D-optimality criterion maximises the determinant of the information matrix. In the present case, this is equivalent to minimising the determinant of the inverse of the average Hessian of the loss function that would be introduced in a classical parameter estimation method. In a noisy setting, and in particular when the noise is Gaussian, this cost function is equivalent to minus the logarithm of the likelihood function. Let the misfit function be *f*(*θ*) and let E<sub>Θ</sub> denote the expectation operator. The average Hessian reads

$$H = \mathbb{E}\_{\Theta}\left[\partial\_{\theta}^{2} f \,\big|\_{\theta\_\*}\right], \tag{21}$$

where *θ*<sup>∗</sup> is the value of the parameter minimising the loss function.
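For a least-squares misfit, the Hessian at the minimiser is commonly approximated by the Gauss–Newton matrix J<sup>⊤</sup>J, so a D-optimality comparison amounts to comparing det(J<sup>⊤</sup>J) across candidate designs. The Python sketch below uses a toy linear model as a stand-in for the biaxial model (all names and the model itself are our assumptions):

```python
import numpy as np

def gauss_newton_hessian(model, theta, h=1e-6):
    """Finite-difference Jacobian J of `model` at `theta`, returning the
    Gauss-Newton approximation J^T J of the misfit Hessian."""
    y0 = model(theta)
    J = np.empty((len(y0), len(theta)))
    for j in range(len(theta)):
        tp = theta.copy()
        tp[j] += h
        J[:, j] = (model(tp) - y0) / h
    return J.T @ J

def make_model(phis):
    """Toy observation model: one reading theta0*cos(phi) + theta1*sin(phi)
    per design angle phi (illustrative stand-in, not the biaxial model)."""
    phis = np.asarray(phis)
    return lambda th: th[0] * np.cos(phis) + th[1] * np.sin(phis)

theta_star = np.array([20.0, 30.0])
H_good = gauss_newton_hessian(make_model([0.0, np.pi / 2]), theta_star)
H_poor = gauss_newton_hessian(make_model([0.0, 0.1]), theta_star)
# well-spread angles give a larger information-matrix determinant (D-better)
assert np.linalg.det(H_good) > np.linalg.det(H_poor)
```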

#### 2.2.6. Overview of Approach for the Biaxial Experiments

In the context of the biaxial experiments, the parameters are *k*<sub>1</sub> and *k*<sub>2</sub>, represented as random variables *K*<sub>1</sub> and *K*<sub>2</sub>, respectively. The variability in these parameters is considered to be uniform (thus imposing a uniform prior distribution) in the following intervals: *k*<sub>1</sub> ∈ [5, 100] kPa and *k*<sub>2</sub> ∈ [5, 80]. For a single value of the angle *φ*, the measurements are the strain values *e*<sub>1</sub> and *e*<sub>2</sub>, measured at 100 points along the line defined by the angle *φ*. Here, we consider *α* = 16 discrete values of possible measurement angles *φ*, uniformly distributed between, and including, 0° and 90°. For each angle *φ*, separate reduced bases of four modes for *e*<sub>1</sub> and *e*<sub>2</sub> are constructed through POD over 400 values of (*K*<sub>1</sub>, *K*<sub>2</sub>) sampled uniformly in the aforementioned parametric space. Thus, for any angle *φ*, the dimensionality reduction approach projects *e*<sub>1</sub> and *e*<sub>2</sub>, measured at 100 points along the line defined by *φ*, onto a basis of 4 + 4 modes. For a given protocol consisting of multiple angles, the measurement vector **z** (with corresponding random variable **Z**) is the collection of all the reduced-basis representations of *e*<sub>1</sub> and *e*<sub>2</sub> along the angles in the protocol. Lastly, the maximum number of angles in a protocol is restricted to C = 5, giving a total of 6884 unique combinations of the *α* = 16 angles.

For the estimation of MI and CMI, a total of *N* = 10,000 values of (*K*<sub>1</sub>, *K*<sub>2</sub>) are sampled uniformly in the parametric space. For each sample (*k*<sub>1</sub><sup>(*i*)</sup>, *k*<sub>2</sub><sup>(*i*)</sup>), the numerical model of the biaxial experiment is run to produce *e*<sub>1</sub><sup>(*i*)</sup> and *e*<sub>2</sub><sup>(*i*)</sup>, which are then projected onto the reduced basis, giving **z**<sup>(*i*)</sup>. The *N* triplets (*k*<sub>1</sub><sup>(*i*)</sup>, *k*<sub>2</sub><sup>(*i*)</sup>, **z**<sup>(*i*)</sup>) are subsequently used for the estimation of MI and CMI through the LNC estimator (see Section 2.2.3). In Equation (16), we use *τ* = 1.

#### **3. Results and Discussion**

For all the 6884 combinations of angles, three statistical criteria are evaluated: (i) I(*K*<sub>1</sub>; **Z**), (ii) I(*K*<sub>2</sub>; **Z**) and (iii) I(*K*<sub>1</sub>; **Z**) + I(*K*<sub>2</sub>; **Z**) − I(*K*<sub>1</sub>; *K*<sub>2</sub>|**Z**). While the first two criteria aim to maximise the information gain about *K*<sub>1</sub> and *K*<sub>2</sub> individually, the third criterion aims to maximise the information gain about *K*<sub>1</sub> and *K*<sub>2</sub> simultaneously while minimising the information dependence between them. Figures 2–4 show the variation in these three criteria when grouped by the number of angles in a protocol. In these figures, the criterion values obtained with two approaches to uniformly discretising the angular space within protocols are also presented. These plots show that, in most cases, the performance of uniform discretisation is close to the mean information gain observed across all the angle combinations.


**Figure 2.** The variation of information criterion S = I(*K*1; **Z**) across the 6884 combinations grouped by the number of angles in a protocol. The vertical lines represent the variation around the mean value, which is shown in black circles. Black text shows the combinations that produce maximum and minimum values of S. The red and blue pointers show S for angle combinations that follow a uniform discretisation of the angular space between 0 and 90 degrees. Red and blue texts show the associated angle combinations.

From this point onward, we present results only for the balanced information criterion S = I(*K*1; **Z**) + I(*K*2; **Z**) − I(*K*1; *K*2|**Z**). Figure 6 shows the variation in S across all the combinations (x-axis, in log scale to capture the spread), grouped by the number of angles in a protocol and sorted in increasing order of S within each group. Within each group, comparing the minimum and maximum values of S shows that a good choice of angles can yield more than a 100% increase in information gain over a poor choice. Furthermore, a well-chosen combination of fewer angles can be more informative than a poorly chosen combination of more angles. For example, the maximum S when only one angle is used is higher than that of many combinations with two to four angles. This emphasises the utility of optimal design and the proposed framework.
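The protocol count itself is easy to reproduce: choosing up to five angles from a grid of 16 candidates gives C(16,1) + C(16,2) + C(16,3) + C(16,4) + C(16,5) = 6884 protocols. The sketch below assumes a uniform 6° grid on [0°, 90°], which is consistent with the 16 single-angle values and the angles 18°, 24° and 30° discussed later, but the exact candidate set is an assumption on our part.

```python
from itertools import combinations

# assumed candidate grid: 16 angles from 0 to 90 degrees in 6-degree steps
angles = list(range(0, 91, 6))

# all protocols with one to five measurement angles
protocols = [c for m in range(1, 6) for c in combinations(angles, m)]
print(len(protocols))  # 6884 protocols, matching the text
```

The largest group, protocols with five angles, contributes C(16,5) = 4368 combinations, matching the count quoted in the caption of Figure 6.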

**Figure 3.** The variation of information criterion S = I(*K*2; **Z**) across the 6884 combinations grouped by the number of angles in a protocol. The vertical lines represent the variation around the mean value, which is shown in black circles. Black text shows the combinations that produce maximum and minimum values of S. The red and blue pointers show S for angle combinations that follow a uniform discretisation of the angular space between 0 and 90 degrees. Red and blue texts show the associated angle combinations.

**Figure 4.** The variation of information criterion S = I(*K*1; **Z**) + I(*K*2; **Z**) − I(*K*1; *K*2|**Z**) across the 6884 combinations grouped by the number of angles in a protocol. The vertical lines represent the variation around the mean value, which is shown in black circles. Black text shows the combinations that produce maximum and minimum values of S. The red and blue pointers show S for angle combinations that follow a uniform discretisation of the angular space between 0 and 90 degrees. Red and blue texts show the associated angle combinations.

**Figure 5.** Representative observations from the model with *k*1 = 40 kPa and *k*2 = 40. (**a**) The observations using angles *φ* = 0 and 90 degrees, with the latter covering a significantly larger range. (**b**) The change in observations as the angle is changed from 18 to 24 to 30 degrees shows a transition in *e*1 from positive to negative values, indicating a coupling between the two directions. Note that *e*1 is shown here in solid lines (left y-axis) and *e*2 is shown in dashed lines (right y-axis).

**Figure 6.** The variation of information criterion S = I(*K*1; **Z**) + I(*K*2; **Z**) − I(*K*1; *K*2|**Z**) across the 6884 combinations. The vertical red lines show the groupings with respect to the number of angles in a protocol, and S values are sorted in increasing order within each such grouping. The x-axis represents the index associated with the protocol and is in logarithmic scale to capture the spread between one angle in a protocol (16 values) vs. five angles in a protocol (4368 values).

Figure 7 shows S for all the 6884 combinations in increasing order of magnitude, and Figure 8 shows a zoomed plot for the first 150 combinations along with the corresponding combinations of angles. Observing the index values 26 (red) and 28 (blue) in Figure 8 shows that even though four angles are used in the index-26 protocol, it produces a lower S than the index-28 protocol, which uses only a single angle. Furthermore, since Figure 8 shows the first 150 of the 6884 combinations of Figure 7 (which is sorted in increasing order of S), all combinations here are relatively low S-producing protocols. The high density of angles in the region *φ* < 24° indicates that lower angles, in particular those less than 24°, are relatively less informative when compared to higher angles. This behaviour is also apparent in Figure 9, which shows S values for protocols that use only one angle, and where a sharp jump can be observed in the transition from 18° to 24°. This peculiar behaviour may be explained by the physics of the biaxial experiment. Looking at the resulting strains *e*1 and *e*2 across this transition (Figure 5b), we observe that *e*1 changes from positive to negative values. This behaviour captures the important coupling between the two normal stresses and strains and is also related to the fiber dispersion in our constitutive model (Equation (1), [24]). It is remarkable and encouraging that the information-theoretic framework captures the physics of the problem without explicitly considering it in the framework. While for simpler low-dimensional models the association between physics and optimal design may be relatively easy to see, inferring such behaviour is, in general, not trivial for more complex and higher-dimensional models.

Similarly to Figure 9, the results of the information-theoretic optimal design are further analysed for a higher number of angles. When two angles are considered, the S values in increasing order of magnitude and the corresponding angle combinations are shown in Figure 10. This figure reiterates observations made previously: (i) the choice of combinations significantly affects the information gain, with the best combination giving approximately 20 nats more information than the worst combination; and (ii) generally speaking, higher angles are more informative than lower angles, in particular those below 24°. While a similar analysis for more than two angle combinations can easily be performed, the efficient visual representation of such results is cumbersome and is avoided in this manuscript.

**Figure 7.** The variation of information criterion S = I(*K*1; **Z**) + I(*K*2; **Z**) − I(*K*1; *K*2|**Z**) across the 6884 combinations sorted in increasing order of S.

**Figure 8.** Zoomed view of the first 150 protocols from Figure 7. The upper panel shows S and the lower panel shows the angles (by circles) in the corresponding protocol. The red points showcase a protocol with four measurement angles that nevertheless produces a lower S, implying a poorer protocol, than the blue points, which show a protocol with only one measurement angle.

**Figure 9.** Information criterion against the angle when the protocols are restricted to a maximum of one angle.

To further illustrate the validity of the information-theoretic approach, a comparison with a classical method (see Section 2.2.5) is presented. For one and two angles in a protocol, Figure 11 shows a comparison between S and the log of the determinant of the inverse Fisher Information Matrix, $\log|H^{-1}|$. It is encouraging that a high correspondence between the two metrics is observed. In particular, increases in S, implying higher information gains, are accompanied by corresponding decreases in $\log|H^{-1}|$, implying a smaller volume of the parameter posteriors. A Pearson correlation coefficient of $r = -0.76$ is observed between S and $\log|H^{-1}|$, implying a high similarity between the two metrics and validating the information-theoretic approach in part. We note that, when the number of parameters becomes large, evaluating the Hessian implies a non-negligible computational cost. In contrast, the method used to evaluate the mutual information, being primarily a Monte Carlo-based estimation, depends less severely on the number of parameters. Furthermore, the computation of derivatives (either numerically or through adjoint-based methods) may be cumbersome for certain types of models. Finally, we note that the effect of noise on information gain, and hence on optimal design, can easily be assessed in the proposed framework by adding noise to the samples of **Z** (see Equation (9)).
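For a Gaussian likelihood, the Fisher information matrix can be approximated as $H = J^T J / \sigma^2$, where $J$ is the sensitivity (Jacobian) of the observations with respect to the parameters, and $\log|H^{-1}|$ then measures the posterior volume. A minimal sketch follows; the function name and the unit-noise default are our assumptions, not the paper's.

```python
import numpy as np

def log10_det_inv_fim(jacobian, noise_var=1.0):
    """log10 |H^{-1}| for H = J^T J / sigma^2 (Gaussian-noise FIM).

    Smaller values indicate a smaller posterior volume, i.e. a more
    informative design (D-optimality in classical optimal design).
    """
    J = np.asarray(jacobian, dtype=float)
    H = J.T @ J / noise_var
    sign, logdet = np.linalg.slogdet(H)  # numerically stable log-determinant
    if sign <= 0:
        raise ValueError("FIM is singular: the design cannot identify all parameters")
    return -logdet / np.log(10.0)
```

Scaling the sensitivities up (a more informative design) decreases the returned value, mirroring the inverse relationship between S and $\log|H^{-1}|$ observed in Figure 11.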

**Figure 10.** Information criterion against the angles when the protocols are restricted to only two angles. The upper panel shows S and the lower panel shows the angles (with circles) in the corresponding protocol.

**Figure 11.** Information criterion S (in blue) and the log of the determinant of the inverse Fisher Information Matrix $\log_{10}|H^{-1}|$ (in red) against the angles when the protocols are restricted to a maximum of two angles. The upper panel shows S and $\log_{10}|H^{-1}|$, while the lower panel shows the angles (with circles) in the corresponding protocol.

#### **4. Conclusions**

A framework for optimal design based on the information-theoretic quantities of mutual information and conditional mutual information is proposed. The framework treats information gain as the central criterion for inverse problems and proposes several information-theoretic criteria for a desired sense of optimality. The capabilities of this framework are tested on the optimal design problem for biaxial experiments, where the effect of the angle combinations along which the strains are measured is assessed in terms of parameter estimation through information gain. Without including any physics-based reasoning, and purely through the information-theoretic measures, it is found that low angles (≤ 24°) are less informative about the parameters than high angles. These observations are then found to be consistent with physics-based reasoning, thereby showing the efficacy of the proposed framework. Furthermore, it is demonstrated that measurements along a small number of carefully chosen angles can be more informative than measurements along a large number of poorly chosen angles, thus highlighting both the importance of optimal design for biaxial experiments and the utility of the proposed framework in determining good angle combinations. Finally, a comparison with classical optimal design is performed, and the results produced by the new framework are shown to be consistent with classical approaches.

#### **5. Limitations and Future Work**

While the proposed framework is shown to perform well on a two-parameter problem, its performance on higher-dimensional parameter problems is not assessed. This assessment represents the primary limitation of this work and an area of future work. In particular, the anticipated difficulties are largely related to the performance of the MI and CMI estimators in higher dimensions of both the parameters and the measurements. While a dimensionality reduction approach was adopted in this study to minimise the adverse effects of the latter, this may not be possible in many forward and inverse problems. Thus, a large area of future work is the development of efficient and robust MI and CMI estimators. Note that several approaches have been proposed by researchers to solve this problem; see, for example, [40–46]. Lastly, a thorough comparison against classical optimal design methods (C-, E-, T- and V-optimal designs, etc.) needs to be performed, along with the construction and analysis of corresponding information-theoretic metrics.

**Author Contributions:** All authors contributed equally to writing, editing, and reviewing the manuscript. S.P. and D.L. conceptualised the information-theoretic framework. A.A. and S.P. conceptualised the application to the biaxial experiments. A.A. wrote the numerical code for the biaxial experiment. S.P. and D.L. wrote the code for dimensionality reduction and the estimation of information-theoretic measures. All authors contributed equally to the analysis of the results. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Engineering and Physical Sciences Research Council of the UK (Grant reference EP/R010811/1 to SP and grant reference EP/P018912/1 and EP/P018912/2 to AA).

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Fractional-in-Time Prey–Predator Model with Hunting Cooperation: Qualitative Analysis, Stability and Numerical Approximations**

**Maria Francesca Carfora \* and Isabella Torcicollo**

Istituto per le Applicazioni del Calcolo "Mauro Picone" CNR, 80131 Napoli, Italy; i.torcicollo@iac.cnr.it **\*** Correspondence: f.carfora@iac.cnr.it

**Abstract:** A prey–predator system with logistic growth of prey and hunting cooperation of predators is studied. The introduction of fractional time derivatives and the related persistent memory strongly characterize the model behavior, as many dynamical systems in the applied sciences are well described by such fractional-order models. Mathematical analysis and numerical simulations are performed to highlight the characteristics of the proposed model. The existence, uniqueness and boundedness of solutions are proved; the stability of the coexistence equilibrium and the occurrence of Hopf bifurcation are investigated. Some numerical approximations of the solution are finally considered; the obtained trajectories confirm the theoretical findings. It is observed that the fractional-order derivative has a stabilizing effect and can be useful in controlling the coexistence between species.

**Keywords:** Caputo fractional derivative; Allee effect; existence and stability; Hopf bifurcation; implicit schemes

**MSC:** 34A08; 34D20; 65R10

#### **1. Introduction**

Population dynamics are regulated by several factors: availability of resources, predation, diseases, etc. Among these factors, the interaction between prey and predators is probably the most studied one in ecology, tracing back to the works of Lotka and Volterra in the early 20th century. Since then, a number of predator–prey models have been proposed and studied (see the excellent reviews [1,2]), considering different extensions of the original one: the realistic assumption that prey populations are limited by food resources, and not just by predation, leads to the inclusion of terms representing a carrying capacity for the prey population; the characterization of specific behaviors of the predator population results in the introduction of several functional responses; and the complexity and inhomogeneity of the environment often require a spatial description [3]. It is well known that species diffusing at different rates can generate spatial patterns, observed in several biological contexts. Such Turing patterns in spatial predator–prey models have been deeply investigated in the literature. In addition, the inclusion of group defense has a significant impact on the dynamics of the predator–prey system. Of course, this movement is conditioned by the abundance or scarcity of the other species, so spatio-temporal population models can include cross-diffusion terms in addition to self-diffusion ones.

Cooperation is a common behavior in many biological groups and can sensibly affect the growth rate of populations in an ecological community. Limiting our interest to the predator–prey dynamics, many mechanisms have been identified, all capable of facilitating reproduction, breeding, foraging and defense in the prey population. All of them can induce a demographic Allee effect [4]. In predators, hunting cooperation, among other interactions, can also result in Allee effects; different intensities of such a cooperative behavior impact on the survival of both species and modify the stability of the ecological system (see [5] and references therein).

**Citation:** Carfora, M.F.; Torcicollo, I. A Fractional-in-Time Prey–Predator Model with Hunting Cooperation: Qualitative Analysis, Stability and Numerical Approximations. *Axioms* **2021**, *10*, 78. https://doi.org/10.3390/axioms10020078

Academic Editor: Clemente Cesarano

Received: 31 March 2021 Accepted: 27 April 2021 Published: 30 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Even if interest in fractional calculus traces back at least to the 1970s (without considering the seminal suggestions given by Leibniz and Euler more than 300 years ago), in the last decades there has been an explosion of research activity on its application to several areas. Fractional-order systems are not only an extension of conventional integer-order systems in mathematics but also have memory and hereditary properties that integer-order systems do not have. In the last decades, there has been a huge interest in such models, due to their specific feature of describing memory effects in dynamical systems: many nonlinear models with time-fractional derivatives have been considered, not only in population dynamics [6,7], but also in chemistry and biochemistry, medicine, mechanics, engineering, finance and psychology (see, among the recent surveys, [8,9]). In some situations, indeed, fractional-order models, enabling the description of the memory and hereditary properties inherent in various materials, processes and biological systems, seem more consistent with the real phenomena than integer-order models. Models with fractional-order derivatives can take different forms depending on the system considered; among them, fractional differential equations and fractional partial differential equations are widely applied to describe continuous systems, both deterministic and stochastic [10]. In addition, heterogeneous non-ergodic diffusion processes [11] and the effect of anomalous diffusion on population survival [12] have been investigated.

In previous studies, the authors considered some interacting population models that explored the mentioned characteristics (limited resources, nonlinear growth, fear effect, cooperative behavior), also in the presence of spatial inhomogeneity [13–17]. In the above-mentioned research, systems of both ordinary and partial differential equations have been studied. Precisely, ODE descriptions for the dynamics of an intraguild predator–prey model [13] and for the spreading of waterborne diseases [15,17] have been considered, and the stability of the model solutions discussed. Furthermore, in order to highlight how spatial diffusion can both play an important role in population evolution and lead to the formation of spatial patterns, a reaction–diffusion system modeling hunting cooperation [14] and a predator–prey system with fear and group defense [16] have been analyzed. In that research, the effect of spatial diffusion on the stability of the equilibria has been highlighted, and the conditions on the parameters ensuring the formation of patterns (stripes, spots, mixed, etc.) have been found.

In the present paper, the authors generalize the model introduced in [5] by replacing the ordinary time derivative with a fractional one, so as to investigate how the fractional-in-time derivative impacts the system dynamics. It is worth underlining, for the sake of completeness, that this model has already been extended in [14] to include spatio-temporal dynamics and the related occurrence of Turing patterns. Here, instead, to better highlight the impact of the fractional derivative on the population dynamics, the authors take into account the original ODE model and consider the corresponding fractional-in-time system.

Such modeling provides challenges and ideas in many other fields of applied mathematics in which nonlinear mathematical models with a similar structure are considered [18–22]. After reviewing in Section 2 the main prerequisites on fractional calculus, the model is formulated in Section 3, and the existence and boundedness of solutions are proved. Section 4 discusses the stability of the coexistence equilibrium, while Section 5 shows some numerical approximations by different schemes. Section 6 concludes the manuscript.

#### **2. Preliminaries**

In this section, we recall the fundamental definitions, concepts and results that we will use throughout the paper. For further details, we refer the reader to [23–25].

**Definition 1.** *The fractional integral of order $\theta > 0$ for a Lebesgue integrable function $f : \mathbb{R}^+ \to \mathbb{R}$ is defined by*

$$I^{\theta}f(t) = \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} f(\tau) d\tau,$$

*and the Caputo fractional derivative of order θ* ∈ (*m* − 1, *m*) *of a sufficiently differentiable function f*(*t*) *is defined by*

$$D\_t^\theta f(t) = \frac{1}{\Gamma(m-\theta)} \int\_{t\_0}^t (t-\tau)^{m-\theta-1} f^{(m)}(\tau) d\tau,$$

*where* Γ(*θ*) *is the Euler's Gamma function.*

We underline that, under natural conditions on *f*(*t*), the Caputo derivative coincides with the classical derivative whenever $\theta \in \mathbb{N}$.
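Definition 1 can be checked numerically. For $f(t) = t^2$ one has the closed form $D_t^\theta t^2 = 2t^{2-\theta}/\Gamma(3-\theta)$. The sketch below (function name ours, for illustration) evaluates the Caputo integral with a weighted quadrature that treats the singular kernel exactly:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def caputo_derivative(f_m, theta, t, t0=0.0, m=1):
    """Caputo derivative of order theta in (m-1, m) at time t.

    f_m must be the m-th classical derivative of f.  The algebraic
    weight (t - tau)^(m - theta - 1) is passed to quad so the weakly
    singular kernel is integrated accurately.
    """
    val, _ = quad(f_m, t0, t, weight="alg", wvar=(0.0, m - theta - 1.0))
    return val / gamma(m - theta)

# check against the closed form for f(t) = t^2, theta = 0.5, at t = 1
approx = caputo_derivative(lambda tau: 2.0 * tau, 0.5, 1.0)
exact = 2.0 / gamma(2.5)  # 2 t^{2-theta} / Gamma(3 - theta) at t = 1
```

The agreement is to quadrature precision, confirming that the formula above reproduces the known fractional derivative of a power function.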

**Lemma 1.** *Let $x(t)$ be a continuous function on $[t_0, \infty)$ satisfying $D^\theta x(t) \le -Ax(t) + B$, $x(t_0) = x_0$, with $0 < \theta < 1$ and $A, B \in \mathbb{R}$, $A > 0$, $t_0 \ge 0$. Then it holds that*

$$x(t) \le \left(x\_0 - \frac{B}{A}\right) E\_\theta[-A(t - t\_0)^\theta] + \frac{B}{A}, \tag{1}$$

*where $E_\theta(\cdot)$ is the Mittag–Leffler function defined by*

$$E\_{\theta}(z) = \sum\_{k=0}^{\infty} \frac{z^k}{\Gamma(\theta k + 1)}. \tag{2}$$
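The series (2) can be evaluated directly for moderate arguments; useful sanity checks are $E_1(z) = e^z$ and $E_2(z) = \cosh(\sqrt{z})$. A minimal truncated-series sketch (function name ours; for large negative arguments the series loses precision and dedicated algorithms should be used):

```python
import numpy as np
from scipy.special import gamma

def mittag_leffler(z, theta, terms=50):
    """Truncated series (2) for the Mittag-Leffler function E_theta(z).

    Adequate for moderate |z|; the alternating series for large
    negative z is numerically unstable and needs specialised methods.
    """
    k = np.arange(terms)
    return float(np.sum(np.power(z, k) / gamma(theta * k + 1.0)))
```

For example, `mittag_leffler(1.0, 1.0)` reproduces $e$ to machine precision, and `mittag_leffler(1.0, 2.0)` reproduces $\cosh(1)$.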

**Lemma 2.** *Let $D^\theta x(t) = \varphi(t, x)$, with $t > t_0$, be the system with the initial condition $x(t_0)$ and $0 < \theta \le 1$, $\varphi : [t_0, \infty) \times \Omega \to \mathbb{R}^m$, $\Omega \subseteq \mathbb{R}^m$. If $\varphi(t, x)$ satisfies the Lipschitz condition with respect to $x$, then there exists a unique solution of the above system on $[t_0, \infty) \times \Omega$.*

**Lemma 3.** *Let $D_t^\theta \mathbf{x} = \varphi(\mathbf{x})$, with $\mathbf{x}(t_0) = \mathbf{x}_0$ and $0 < \theta < 1$, be an autonomous nonlinear fractional-order system. A point $\mathbf{x}^*$ is called an equilibrium point of the system if it satisfies $\varphi(\mathbf{x}^*) = \mathbf{0}$. This equilibrium point is locally asymptotically stable if all eigenvalues $\lambda_i$ of the Jacobian matrix $J = \partial\varphi/\partial\mathbf{x}$ evaluated at $\mathbf{x}^*$ satisfy the Matignon condition $|\arg(\lambda_i)| > \theta\pi/2$.*
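Matignon's condition in Lemma 3 is straightforward to check numerically, and doing so already illustrates the stabilizing effect of the fractional order: an equilibrium that is an unstable focus for the classical system ($\theta = 1$) can satisfy the condition for small enough $\theta$. A short sketch (function name ours):

```python
import numpy as np

def matignon_stable(J, theta):
    """True if all eigenvalues of the Jacobian J satisfy |arg(lambda)| > theta*pi/2."""
    eigenvalues = np.linalg.eigvals(np.asarray(J, dtype=float))
    return bool(np.all(np.abs(np.angle(eigenvalues)) > theta * np.pi / 2.0))

# example Jacobian with eigenvalues 1 +/- 2i: an unstable focus classically
J = np.array([[1.0, -2.0],
              [2.0, 1.0]])
```

For this matrix, $\arg(\lambda) \approx \pm 1.107$ rad, so the equilibrium is Matignon-stable exactly when $\theta < 2 \cdot 1.107/\pi \approx 0.70$, despite the positive real parts.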

#### **3. Fractional Model with Caputo Derivative: Statement of the Problem, Boundedness, Existence and Uniqueness**

We present the following fractional-order prey–predator model corresponding to the non-dimensional model introduced in [5]

$$\begin{cases} D\_t^\theta n = \sigma n \left( 1 - \frac{n}{k} \right) - (1 + \alpha p) np, \\ D\_t^\theta p = (1 + \alpha p) np - p \end{cases} \tag{3}$$

where $D_t^\theta$ represents the Caputo fractional derivative, given in Definition 1, with 0 < *θ* < 1, and *n*, *p* are the non-dimensional variables corresponding to the prey and predator densities. All the non-dimensional parameters are positive constants. Precisely, *k* comprises the dimensional carrying capacity, the conversion efficiency, the per-capita predator mortality and the attack rate, while *σ* is linked to the per-capita intrinsic growth rate of prey and the per-capita mortality rate of predators, and *α* is linked to the predator cooperation in hunting rate, the attack rate and the per-capita predator mortality rate. Details on both the derivation of the model and the biological meaning of the parameters can be found in [5,14]. To (3) we append the initial conditions

$$n(t\_0) = n\_0, \quad p(t\_0) = p\_0. \tag{4}$$
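Trajectories of (3)–(4) can be approximated by rewriting the system in its Volterra integral form and applying a product-rectangle rule, which yields an explicit fractional Euler scheme. This is an illustrative sketch with names of our choosing; the schemes actually used later in the paper may differ (the keywords mention implicit schemes):

```python
import numpy as np
from scipy.special import gamma

def fractional_euler(theta, n0, p0, sigma, k, alpha, T, steps):
    """Explicit product-rectangle (fractional Euler) scheme for system (3)-(4).

    n[j] = n0 + h^theta/Gamma(theta+1) * sum_i w_{j,i} F1(n_i, p_i), with
    memory weights w_{j,i} = (j-i)^theta - (j-i-1)^theta; for theta = 1
    the scheme reduces to the classical forward Euler method.
    """
    h = T / steps
    F1 = lambda n, p: sigma * n * (1.0 - n / k) - (1.0 + alpha * p) * n * p
    F2 = lambda n, p: (1.0 + alpha * p) * n * p - p
    c = h ** theta / gamma(theta + 1.0)
    n = np.empty(steps + 1); p = np.empty(steps + 1)
    n[0], p[0] = n0, p0
    f1 = np.empty(steps); f2 = np.empty(steps)
    for j in range(1, steps + 1):
        f1[j - 1], f2[j - 1] = F1(n[j - 1], p[j - 1]), F2(n[j - 1], p[j - 1])
        i = np.arange(j)
        w = (j - i) ** theta - (j - 1 - i) ** theta  # persistent-memory weights
        n[j] = n0 + c * np.dot(w, f1[:j])
        p[j] = p0 + c * np.dot(w, f2[:j])
    return n, p
```

As a sanity check, the predator-free equilibrium $(n, p) = (k, 0)$ is preserved exactly by the scheme, since both right-hand sides vanish there.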

#### *3.1. Boundedness*

In this subsection, we prove the positivity and boundedness of the solution of system (3). Let $\mathbb{R}^2_+ = \{\mathbf{x}(t) \in \mathbb{R}^2 : \mathbf{x}(t) \ge \mathbf{0}\}$ and $\mathbf{x}(t) = (n(t), p(t))^T$.

**Theorem 1.** *The solution of the fractional-order system (3) and (4) is bounded in $\mathbb{R}^2_+$. Moreover, the density of the population remains in a nonnegative region.*

**Proof.** Let us define the function

$$
\mathcal{U}(t) = n(t) + p(t) \tag{5}
$$

and let *ξ* be a positive constant. Applying the Caputo fractional derivative to both sides of (5) and using (3), it follows that

$$\begin{split} D\_t^\theta U(t) + \xi U(t) &= D\_t^\theta n(t) + D\_t^\theta p(t) + \xi n(t) + \xi p(t) \\ &= \sigma n \Big( 1 - \frac{n}{k} \Big) - p + \xi n + \xi p \\ &= -\frac{\sigma n^2}{k} + (\sigma + \xi) n + (\xi - 1) p \\ &\leq -\frac{\sigma}{k} \Big( n - \frac{k(\sigma + \xi)}{2\sigma} \Big)^2 + \frac{k(\sigma + \xi)^2}{4\sigma} \\ &\leq \frac{k(\sigma + \xi)^2}{4\sigma}, \end{split}$$

where *ξ* < 1. Using Lemma 1, it follows that

$$\begin{split} U(t) &\leq \left( U(t\_0) - \frac{k(\sigma + \xi)^2}{4\sigma\xi} \right) E\_\theta[-\xi t^\theta] + \frac{k(\sigma + \xi)^2}{4\sigma\xi} \\ &\leq U(t\_0) E\_\theta[-\xi t^\theta] + \frac{k(\sigma + \xi)^2}{4\sigma\xi} \left[1 - E\_\theta[-\xi t^\theta] \right]. \end{split} \tag{6}$$

Then, for $t \to \infty$ it follows that $U(t) \le \frac{k(\sigma + \xi)^2}{4\sigma\xi}$. Hence, all the solutions to (3) starting in $\mathbb{R}^2_+$ are confined to the following domain $\mathcal{A} \subseteq \mathbb{R}^2_+$

$$\mathcal{A} = \left\{ (n(t), p(t)) \in \mathbb{R}\_+^2 : n(t) + p(t) \le \frac{k(\sigma + \xi)^2}{4\sigma\xi} + \varepsilon, \forall \varepsilon > 0 \right\}$$

which is positively invariant.

#### *3.2. Existence and Uniqueness*

In this subsection, we find the conditions for the existence and uniqueness of the solution to the fractional-order prey–predator model (3) in the region $\Theta \times (t_0, T]$, $T < \infty$, where

$$\Theta = \{(n, p) \in \mathbb{R}^2 : \max(|n|, |p|) \le \mathcal{K}\},$$

by using the fixed point technique. The existence of *K* is guaranteed by the boundedness of the solution. The model can be reformulated in the fractional integral form, which gives

$$\begin{split} n(t) - n(t\_0) &= \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} \left( \sigma n(\tau) \left( 1 - \frac{n(\tau)}{k} \right) - (1 + \alpha p(\tau)) n(\tau) p(\tau) \right) d\tau, \\ p(t) - p(t\_0) &= \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} ((1 + \alpha p(\tau)) n(\tau) p(\tau) - p(\tau)) d\tau. \end{split} \tag{7}$$

Denoting by *F*<sup>1</sup> and *F*<sup>2</sup> the following kernels

$$\begin{aligned} F\_1(t, n(t), p(t)) &= \sigma n(t) \left( 1 - \frac{n(t)}{k} \right) - [1 + \alpha p(t)] n(t) p(t), \\ F\_2(t, n(t), p(t)) &= [1 + \alpha p(t)] n(t) p(t) - p(t), \end{aligned} \tag{8}$$

the following theorem holds.

**Theorem 2.** *Let $M_1 = \sigma + (1 + \alpha K)K + \frac{2\sigma}{k} K$ and $M_2 = |K - 1 + 2\alpha K^2|$. If $0 < M_i < 1$ $(i = 1, 2)$, then the kernels $F_i(t, n, p)$ $(i = 1, 2)$ satisfy the Lipschitz condition and are contractions in the region $\Theta \times (t_0, T]$.*

**Proof.** Let $n(t_1) = n_1$ and $n(t_2) = n_2$ be two functions for the kernel $F_1$, and $p(t_1) = p_1$ and $p(t_2) = p_2$ be two functions for the kernel $F_2$. Then we have

$$\begin{aligned} & \left\| F\_1(t, n\_1, p) - F\_1(t, n\_2, p) \right\| \\ &= \left\| \sigma \left( 1 - \frac{n\_1}{k} \right) n\_1 - (1 + \alpha p) p n\_1 - \sigma \left( 1 - \frac{n\_2}{k} \right) n\_2 + (1 + \alpha p) p n\_2 \right\| \\ &= \left\| \left[ \sigma - (1 + \alpha p) p \right] n\_1 - \left[ \sigma - (1 + \alpha p) p \right] n\_2 - \frac{\sigma}{k} (n\_1^2 - n\_2^2) \right\| \\ &= \left\| \left[ \sigma - (1 + \alpha p) p \right] (n\_1 - n\_2) - \frac{\sigma}{k} (n\_1 - n\_2) (n\_1 + n\_2) \right\| \\ &\leq \left( \sigma + (1 + \alpha K) K + \frac{2\sigma}{k} K \right) \left\| n\_1 - n\_2 \right\| \\ &\leq M\_1 \left\| n\_1 - n\_2 \right\| \end{aligned} \tag{9}$$

and

$$\begin{aligned} & \| F\_2(t, n, p\_1) - F\_2(t, n, p\_2) \| \\ &= \| (1 + \alpha p\_1)np\_1 - p\_1 - (1 + \alpha p\_2)np\_2 + p\_2 \| \\ &= \| (n - 1)(p\_1 - p\_2) + \alpha n(p\_1^2 - p\_2^2) \| \\ &= \| [(n - 1) + \alpha n(p\_1 + p\_2)](p\_1 - p\_2) \| \\ &\le |K - 1 + 2\alpha K^2| \| p\_1 - p\_2 \| \\ &\le M\_2 \| p\_1 - p\_2 \| \end{aligned} \tag{10}$$

where $M_1 = \sigma + (1 + \alpha K)K + \frac{2\sigma}{k} K$ and $M_2 = |K - 1 + 2\alpha K^2|$. Therefore, the Lipschitz conditions are satisfied for the kernels $F_1$ and $F_2$, and if $0 < M_i < 1$ $(i = 1, 2)$, then $F_1$ and $F_2$ are also contractions, with constants $M_1$ and $M_2$, respectively. Assume that the conditions (9) and (10) hold and consider the kernels $F_1$ and $F_2$. Then (7) can be written as

$$\begin{aligned} n(t) &= n(t\_0) + \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} F\_1(\tau, n, p) d\tau, \\ p(t) &= p(t\_0) + \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} F\_2(\tau, n, p) d\tau. \end{aligned} \tag{11}$$

The initial conditions and the recurrence form of the model (11) are, respectively

$$n\_0(t) = n(t\_0), \quad p\_0(t) = p(t\_0) \tag{12}$$

and

$$\begin{split} n\_m(t) &= n(t\_0) + \frac{1}{\Gamma(\theta)} \int\_{t\_0}^t (t - \tau)^{\theta - 1} F\_1(\tau, n\_{m-1}, p) d\tau, \\ p\_m(t) &= p(t\_0) + \frac{1}{\Gamma(\theta)} \int\_{t\_0}^t (t - \tau)^{\theta - 1} F\_2(\tau, n, p\_{m-1}) d\tau. \end{split} \tag{13}$$
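The successive approximations (12)–(13) can also be observed converging numerically. The sketch below discretises the fractional integral with a product-rectangle rule and performs full Picard updates (updating $n$ and $p$ simultaneously, a slight simplification of (13)); names, grid and parameter values are illustrative choices of ours:

```python
import numpy as np
from scipy.special import gamma

def picard_iterates(theta, n0, p0, sigma, k, alpha, T, grid=64, iters=10):
    """Successive approximations for system (3) on [0, T].

    Returns the final iterates and the sup-norm differences between
    consecutive iterates, which shrink when M_i T^theta / Gamma(theta+1) < 1.
    """
    t = np.linspace(0.0, T, grid)
    F1 = lambda n, p: sigma * n * (1.0 - n / k) - (1.0 + alpha * p) * n * p
    F2 = lambda n, p: (1.0 + alpha * p) * n * p - p
    # w[j, i] = integral over [t_i, t_{i+1}] of (t_j - tau)^(theta - 1) d tau
    w = np.zeros((grid, grid))
    for j in range(1, grid):
        i = np.arange(j)
        w[j, i] = ((t[j] - t[i]) ** theta - (t[j] - t[i + 1]) ** theta) / theta
    n, p = np.full(grid, float(n0)), np.full(grid, float(p0))  # iterate 0, cf. (12)
    diffs = []
    for _ in range(iters):
        n_new = n0 + (w @ F1(n, p)) / gamma(theta)
        p_new = p0 + (w @ F2(n, p)) / gamma(theta)
        diffs.append(max(np.max(np.abs(n_new - n)), np.max(np.abs(p_new - p))))
        n, p = n_new, p_new
    return n, p, diffs
```

On a short horizon the successive differences decay rapidly, mirroring the geometric bound obtained in the existence proof below.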

The successive differences between the terms are defined as

$$\begin{split} \Phi\_{1m}(t) &= n\_m(t) - n\_{m-1}(t) = \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t-\tau)^{\theta-1} [F\_1(\tau, n\_{m-1}, p) - F\_1(\tau, n\_{m-2}, p)] d\tau, \\ \Phi\_{2m}(t) &= p\_m(t) - p\_{m-1}(t) = \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t-\tau)^{\theta-1} [F\_2(\tau, n, p\_{m-1}) - F\_2(\tau, n, p\_{m-2})] d\tau, \end{split} \tag{14}$$

where

$$n\_m(t) = n\_0(t) + \sum\_{i=1}^m \Phi\_{1i}(t), \quad p\_m(t) = p\_0(t) + \sum\_{i=1}^m \Phi\_{2i}(t). \tag{15}$$

Taking the norm of (14), it follows from the conditions (9)–(10) that

$$\begin{split} \left\| \Phi\_{1m}(t) \right\| &= \left\| n\_m(t) - n\_{m-1}(t) \right\| \leq \frac{M\_1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t-\tau)^{\theta-1} \left\| \Phi\_{1(m-1)}(\tau) \right\| d\tau, \\ \left\| \Phi\_{2m}(t) \right\| &= \left\| p\_m(t) - p\_{m-1}(t) \right\| \leq \frac{M\_2}{\Gamma(\theta)} \int\_{t\_0}^{t} (t-\tau)^{\theta-1} \left\| \Phi\_{2(m-1)}(\tau) \right\| d\tau. \end{split} \tag{16}$$

The following theorem holds.

**Theorem 3.** *Assume that the conditions (9)–(10) hold. If*

$$\frac{M\_i T^\theta}{\Gamma(\theta + 1)} < 1, \ i = 1, 2,\tag{17}$$

*then the solution of the fractional model given in (3)–(4) exists and is unique.*

**Proof.** Let $n(t)$, $p(t)$ be bounded functions and let the kernels $F_1$ and $F_2$ satisfy the Lipschitz condition. From (15) and (16), it follows that

$$\begin{aligned} \|\Phi\_{1m}(t)\| &\le \|n\_0(t)\| \left\{ \frac{M\_1 T^{\theta}}{\Gamma(\theta+1)} \right\}^{m}, \\ \|\Phi\_{2m}(t)\| &\le \|p\_0(t)\| \left\{ \frac{M\_2 T^{\theta}}{\Gamma(\theta+1)} \right\}^{m}. \end{aligned} \tag{18}$$

This proves the existence of the solutions. Moreover, in order to prove that they are solutions to (3) and (4), we consider

$$\begin{aligned} n(t) - n(t\_0) &= n\_m(t) - r\_{1m}(t), \\ p(t) - p(t\_0) &= p\_m(t) - r\_{2m}(t), \end{aligned} \tag{19}$$

where *r*<sub>1*m*</sub>, *r*<sub>2*m*</sub> are the remainder terms. We show that *r*<sub>1*m*</sub>(*t*) → 0 and *r*<sub>2*m*</sub>(*t*) → 0 as *m* → ∞. Now, we consider the conditions

$$\begin{split} \|r\_{1m}(t)\| &\leq \left\|\frac{1}{\Gamma(\theta)}\int\_{t\_0}^t (t-\tau)^{\theta-1} [F\_1(\tau, n, p) - F\_1(\tau, n\_{m-1}, p)] d\tau\right\| \\ &\leq \frac{1}{\Gamma(\theta)} \int\_{t\_0}^t (t-\tau)^{\theta-1} \|F\_1(\tau, n, p) - F\_1(\tau, n\_{m-1}, p)\| d\tau \\ &\leq \frac{M\_1 T^{\theta}}{\Gamma(\theta+1)} \|n - n\_{m-1}\|. \end{split} \tag{20}$$

Applying this estimate recursively, we obtain

$$||r\_{1m}(t)|| \le \left\{ \frac{T^{\theta}}{\Gamma(\theta+1)} \right\}^{m+1} M\_1^{m+1}.$$

For *m* → ∞, it follows that *r*<sub>1*m*</sub>(*t*) → 0. In a similar way, we conclude that *r*<sub>2*m*</sub>(*t*) → 0. In order to prove the uniqueness of the solution to the model (3) and (4), let us suppose that there exists another solution of the system, *n*¯(*t*) and *p*¯(*t*). Then

$$\begin{split} \|n(t) - \bar{n}(t)\| &\leq \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} \|F\_1(\tau, n, p) - F\_1(\tau, \bar{n}, p)\| d\tau, \\ \|p(t) - \bar{p}(t)\| &\leq \frac{1}{\Gamma(\theta)} \int\_{t\_0}^{t} (t - \tau)^{\theta - 1} \|F\_2(\tau, n, p) - F\_2(\tau, n, \bar{p})\| d\tau. \end{split} \tag{21}$$

In view of the Lipschitz condition, it follows that

$$\begin{aligned} \|n(t) - \bar{n}(t)\| &\le \frac{M\_1 t^{\theta}}{\Gamma(\theta + 1)} \|n(t) - \bar{n}(t)\|, \\ \|p(t) - \bar{p}(t)\| &\le \frac{M\_2 t^{\theta}}{\Gamma(\theta + 1)} \|p(t) - \bar{p}(t)\|, \end{aligned} \tag{22}$$

and hence

$$\begin{aligned} \|n(t) - \bar{n}(t)\| \left(1 - \frac{M\_1 t^{\theta}}{\Gamma(\theta + 1)}\right) &\leq 0, \\ \|p(t) - \bar{p}(t)\| \left(1 - \frac{M\_2 t^{\theta}}{\Gamma(\theta + 1)}\right) &\leq 0. \end{aligned} \tag{23}$$

In view of (17), it follows that *n*(*t*) − *n*¯(*t*) = 0 and *p*(*t*) − *p*¯(*t*) = 0, which completes the proof.
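As a quick numerical illustration of Theorem 3, condition (17) can be checked in a few lines; the Lipschitz constants, horizon and order below are illustrative values, not quantities taken from the paper.

```python
import math

# Check the contraction condition (17): M_i * T^theta / Gamma(theta+1) < 1.
# M1, M2, T and theta are hypothetical, purely illustrative values.
def contraction_factor(M, T, theta):
    return M * T**theta / math.gamma(theta + 1)

M1, M2, T, theta = 0.4, 0.3, 2.0, 0.8
factors = [contraction_factor(M, T, theta) for M in (M1, M2)]
unique_solution = all(q < 1 for q in factors)
print(factors, unique_solution)
```

When both factors are below one, the Picard-type iterates (13) form contractions and the solution of (3)–(4) is unique.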

#### **4. Stability Analysis**

The equilibrium points of system (3) are obtained by considering

$$D\_t^\theta n(t) = 0, \quad D\_t^\theta p(t) = 0.$$

According to [5,14], besides the trivial equilibrium *E*<sub>0</sub> ≡ (0, 0), the system admits the predator-free equilibrium *E<sub>b</sub>* ≡ (*k*, 0) and the coexistence equilibrium

$$E^\* = (n^\*, p^\*) \equiv \left( \frac{1}{1 + \alpha p^\*}, \; p^\* \right),$$

where *p*<sup>∗</sup> satisfies

$$\alpha^2 k p^3 + 2\alpha k p^2 + k(1 - \alpha\sigma)p + \sigma(1 - k) = 0.$$

As shown in [14], if *k* > 1, *E*<sup>∗</sup> is unique, while if

$$\frac{-1 + \sqrt{1 + 3\sigma\alpha}}{\alpha\sigma} < k \leq 1 \quad \text{and} \quad \sigma > \frac{1}{\alpha},$$

there exist two coexistence equilibria. In all other cases, no coexistence equilibria are admissible.
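The cubic above is easy to solve numerically; the following sketch finds the admissible coexistence equilibria for the illustrative setting *σ* = 3, *α* = 0.3, *k* = 5, where *k* > 1 so a unique equilibrium is expected.

```python
import numpy as np

# Coexistence equilibria as positive real roots p* of the cubic
# alpha^2*k*p^3 + 2*alpha*k*p^2 + k*(1-alpha*sigma)*p + sigma*(1-k) = 0,
# with n* = 1/(1 + alpha*p*).
def coexistence_equilibria(sigma, alpha, k):
    coeffs = [alpha**2 * k, 2 * alpha * k, k * (1 - alpha * sigma), sigma * (1 - k)]
    ps = [r.real for r in np.roots(coeffs) if abs(r.imag) < 1e-10 and r.real > 0]
    return [(1.0 / (1.0 + alpha * p), p) for p in sorted(ps)]

eq = coexistence_equilibria(sigma=3.0, alpha=0.3, k=5.0)
print(eq)   # one equilibrium, (n*, p*) close to (0.66, 1.72)
```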

In the following, we will discuss the stability properties of the coexistence equilibrium by using the linearization method.

The Jacobian matrix of system (3) evaluated at the equilibrium point *E*∗ = (*n*∗, *p*∗) is given by

$$J(E^\*) = \begin{pmatrix} -\dfrac{\sigma}{k(1+\alpha p^\*)} & -\dfrac{1+2\alpha p^\*}{1+\alpha p^\*} \\[2ex] (1+\alpha p^\*)p^\* & \dfrac{\alpha p^\*}{1+\alpha p^\*} \end{pmatrix} . \tag{24}$$

Now, the characteristic equation of the Jacobian matrix is *<sup>λ</sup>*<sup>2</sup> <sup>−</sup> *<sup>I</sup><sup>λ</sup>* <sup>+</sup> *<sup>A</sup>* <sup>=</sup> 0 and the roots of the characteristic equation are

$$
\lambda\_{1,2} = \frac{I \pm \sqrt{I^2 - 4A}}{2},
\tag{25}
$$

where *I* and *A* are the trace and determinant of the Jacobian matrix *J*(*E*∗) given by

$$I = \frac{k\alpha p^\* - \sigma}{k(1 + \alpha p^\*)},$$

$$A = \frac{p^\*}{k(1 + \alpha p^\*)^2} [k(1 + \alpha p^\*)^2 (1 + 2\alpha p^\*) - \alpha \sigma].$$

The stability analysis of *E*<sup>∗</sup> will be divided into the following cases:

	- (a) If *I* > 0, then *Re*(*λ*<sub>1</sub>) = *Re*(*λ*<sub>2</sub>) > 0 and *Im*(*λ*<sub>1</sub>) = −*Im*(*λ*<sub>2</sub>) = √(4*A* − *I*<sup>2</sup>)/2 > 0. Then, when √(4*A* − *I*<sup>2</sup>)/*I* > tan(*θπ*/2), the coexistence equilibrium is asymptotically stable; otherwise, it is unstable.
	- (b) If *I* < 0, then *Re*(*λ*<sub>1</sub>) = *Re*(*λ*<sub>2</sub>) < 0. Then | arg(*λ<sub>i</sub>*)| > *π*/2 and the equilibrium is asymptotically stable.
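The condition used in cases (a)–(b) is the Matignon-type criterion | arg(*λ*)| > *θπ*/2 for every eigenvalue. A minimal sketch (the toy Jacobian below is illustrative, not the one of system (3)):

```python
import cmath
import math
import numpy as np

# Matignon-type criterion: an equilibrium of a fractional-order system of
# order theta is asymptotically stable iff |arg(lambda)| > theta*pi/2
# for every eigenvalue lambda of the Jacobian.
def fractional_stable(J, theta):
    eigs = np.linalg.eigvals(np.asarray(J, dtype=float))
    return all(abs(cmath.phase(complex(lam))) > theta * math.pi / 2 for lam in eigs)

# Toy Jacobian with eigenvalues 0.2 +/- i, so |arg(lambda)| ~ 1.373:
# stable for small enough theta, unstable once theta*pi/2 exceeds that angle
print(fractional_stable([[0.2, -1.0], [1.0, 0.2]], 0.8),
      fractional_stable([[0.2, -1.0], [1.0, 0.2]], 0.95))
```

This illustrates how, unlike in the integer-order case, an equilibrium with eigenvalues of positive real part can still be stable for sufficiently small *θ*.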

A Hopf bifurcation occurs when a pair of complex eigenvalues of the Jacobian matrix at an equilibrium point exists and the stability of the equilibrium changes from stable to unstable when a bifurcation parameter crosses a critical value. We can choose the order of derivation *θ* to be the bifurcation parameter and by using the following well-known result we find the conditions for the Hopf bifurcation to appear.

**Lemma 4.** *([26]) When the bifurcation parameter θ passes through the critical value θ*<sup>∗</sup> ∈ (0, 1)*, the fractional-order system (3) undergoes a Hopf bifurcation at the equilibrium point if the following conditions hold:*

*(i) the Jacobian matrix at the equilibrium point admits a pair of complex conjugate eigenvalues λ*<sub>1,2</sub> = *γ* ± *iω with γ* > 0*;*

*(ii) m*(*θ*<sup>∗</sup>) = 0*, where m*(*θ*) = *θπ*/2 − min<sub>1≤*i*≤2</sub> | arg *λ<sub>i</sub>*|*;*

*(iii) dm*(*θ*)/*dθ*|<sub>*θ*=*θ*<sup>∗</sup></sub> ≠ 0*.*
We find the conditions for the model (3) to undergo a Hopf bifurcation at the equilibrium point *E*<sup>∗</sup> ≡ (*n*<sup>∗</sup>, *p*<sup>∗</sup>) when the order of derivation passes through the critical value

$$\theta^\* = \frac{2}{\pi} \arctan\left( \frac{\sqrt{|I^2 - 4A|}}{I} \right).$$

Let us assume that *I*<sup>2</sup> − 4*A* < 0 and *I* > 0, and consider the critical value *θ*<sup>∗</sup> = (2/*π*) arctan(√|*I*<sup>2</sup> − 4*A*|/*I*). Let us define *γ* = *I*/2 and *ω* = √|*I*<sup>2</sup> − 4*A*|/2. Then, the eigenvalues are a pair of complex conjugates *λ*<sub>1,2</sub> = *γ* ± *iω* with *γ* > 0. In addition, let *m*(*θ*) = *θπ*/2 − min<sub>1≤*i*≤2</sub> | arg *λ<sub>i</sub>*|; then, it follows that

$$\begin{array}{rcl} m(\theta^\*) &= \theta^\* \frac{\pi}{2} - \min\_{1 \le i \le 2} |\arg \lambda\_i| = \theta^\* \frac{\pi}{2} - \arctan \frac{\omega}{\gamma} \\ &= \arctan \frac{\omega}{\gamma} - \arctan \frac{\omega}{\gamma} = 0. \end{array}$$

Finally,

$$\frac{dm(\theta)}{d\theta}|\_{\theta=\theta^\*} = \frac{\pi}{2} \neq 0.$$

Therefore, we can conclude that a Hopf bifurcation of (3) will appear at *E*∗.
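The verification that *m*(*θ*<sup>∗</sup>) = 0 can be replayed numerically; the values of *I* and *A* below are illustrative (with *I* > 0 and *I*<sup>2</sup> − 4*A* < 0), not quantities from the model.

```python
import math

# theta* = (2/pi)*arctan(sqrt(|I^2-4A|)/I) and m(theta) as defined above;
# I and A are illustrative values with I > 0 and I^2 - 4A < 0.
def theta_critical(I, A):
    return (2 / math.pi) * math.atan(math.sqrt(abs(I**2 - 4 * A)) / I)

def m(theta, I, A):
    gamma, omega = I / 2, math.sqrt(abs(I**2 - 4 * A)) / 2
    return theta * math.pi / 2 - math.atan(omega / gamma)

I, A = 0.5, 4.0
ts = theta_critical(I, A)
print(ts, m(ts, I, A))   # m vanishes at theta*
```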

#### **5. Numerical Solution**

Even though interest in fractional calculus traces back at least to the 1970s (without considering the seminal suggestions given by Leibniz and Euler more than 300 years ago), in the last decades there has been an explosion of research activity on its application to several areas. Such growing interest in fractional-order models has led to the development of specific algorithms devoted to the numerical approximation of the solution of fractional-order differential problems. Several solvers have been proposed in recent years [27–29], all trying to balance efficiency and accuracy while guaranteeing the reliability of the numerical approximation. It is well known, indeed, that the persistent memory of fractional-order operators affects the numerical evaluation of the solution: as a natural consequence, at each new timestep all the past history of the solution has to be taken into account. The number of involved steps increases with time, and so does the computational burden.

In a sense, numerical methods for fractional-order differential systems are then naturally multi-step. They generalize some of the ODE methods, but with significant differences in both complexity and accuracy. We follow here the excellent survey [30] and consider some of the schemes reported there. The simplest schemes are derived by approximating the integrand function **x**(*t*)=(*n*(*t*), *p*(*t*)) in (7) by a piecewise polynomial and proceeding to its exact integration. This leads to "rectangular" (explicit or implicit) product integration formulas when a zero-order approximation of the integrand function in each subinterval is assumed, or "trapezoidal" formulas where first-order approximation is chosen. Of course, implicit formulas are more stable, but they require solving the nonlinear equations that generally arise. To avoid this additional burden, a predictor–corrector approach can be useful.

The convergence order of the rectangular formulas is one: the distance between the numerical approximation and the exact solution at any time point *t<sub>n</sub>* decreases linearly with the timestep, and this result replicates the well-known one for the ODE formula. Unfortunately, this is not always true for the trapezoidal rule: its convergence order is limited by the minimum between 1 + *θ* and 2 [28], so that for 0 < *θ* < 1 the rate of the corresponding ODE method cannot be reached. Similarly, higher-order formulas for fractional systems do not lead to significant improvements in accuracy and convergence rate, and they are not worth considering. As an alternative to product integration formulas, Fractional Linear Multistep Methods have been proposed to generalize the standard multistep methods for ODEs. They are very robust and can reach higher convergence order, but their convolution weights are not known explicitly in advance and have to be evaluated. This computation can, however, be performed very efficiently by FFT-derived algorithms.

Numerical approximations of the solution of (3) can then be obtained by applying several different schemes. Even developed from the same ODE methods, they present distinguishing characteristics that deserve to be investigated. For this reason, we considered and compared the following schemes:

• Implicit Rectangular Product Integration rule (PI1)

$$\mathbf{x}\_n = \mathbf{x}\_0 + h^{\theta} \sum\_{i=1}^n b\_{n-i}^{(\theta)} F(t\_i, \mathbf{x}\_i), \qquad b\_n^{(\theta)} = \frac{(n+1)^{\theta} - n^{\theta}}{\Gamma(\theta+1)};$$

• Implicit Trapezoidal Product Integration rule (PI2)

$$\mathbf{x}\_{n} = \mathbf{x}\_{0} + h^{\theta} \frac{1}{\Gamma(\theta + 2)} \left( \tilde{a}\_{n}^{(\theta)} F(t\_{0}, \mathbf{x}\_{0}) + \sum\_{i=1}^{n} a\_{n-i}^{(\theta)} F(t\_{i}, \mathbf{x}\_{i}) \right),$$

with *a*<sub>0</sub><sup>(*θ*)</sup> = 1 and

$$\tilde{a}\_n^{(\theta)} = (n-1)^{\theta+1} - n^{\theta}(n-\theta-1), \qquad a\_n^{(\theta)} = (n-1)^{\theta+1} - 2n^{\theta+1} + (n+1)^{\theta+1};$$

• Predictor–Corrector Product Integration rule (PI12)

$$\mathbf{x}\_n^P = \mathbf{x}\_0 + h^\theta \sum\_{i=0}^{n-1} b\_{n-i-1}^{(\theta)} F(t\_i, \mathbf{x}\_i),$$

$$\mathbf{x}\_n = \mathbf{x}\_0 + h^\theta \frac{1}{\Gamma(\theta + 2)} \left( \tilde{a}\_n^{(\theta)} F(t\_0, \mathbf{x}\_0) + \sum\_{i=1}^{n-1} a\_{n-i}^{(\theta)} F(t\_i, \mathbf{x}\_i) + a\_0^{(\theta)} F(t\_n, \mathbf{x}\_n^P) \right);$$

• Fractional Backward Differentiation Formula (FLMM2)

$$\mathbf{x}\_{n} = \mathbf{x}\_{0} + h^{\theta} \sum\_{i=0}^{n} (-1)^{n-i} \binom{-\theta}{n-i} F(t\_{i}, \mathbf{x}\_{i}), \qquad \text{where} \quad \binom{-\theta}{n-i} = \frac{\Gamma(1-\theta)}{\Gamma(n-i+1)\Gamma(1-\theta-n+i)}.$$
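To make the structure of these schemes concrete, here is a minimal Python sketch (not the Matlab routines used in the paper) of the implicit rectangular rule PI1 applied to the scalar linear test problem *D*<sup>*θ*</sup>*x* = −*x*, *x*(0) = 1; for a linear right-hand side the implicit step can be solved in closed form.

```python
import math

# Implicit rectangular PI rule on D^theta x = lam*x, x(0) = x0.
# For a linear F the implicit equation at each step is solved exactly.
def pi1_linear(lam, x0, theta, h, N):
    b = [((n + 1)**theta - n**theta) / math.gamma(theta + 1) for n in range(N + 1)]
    x = [x0]
    for n in range(1, N + 1):
        s = sum(b[n - i] * lam * x[i] for i in range(1, n))   # known history
        x.append((x0 + h**theta * s) / (1 - h**theta * b[0] * lam))
    return x

x = pi1_linear(lam=-1.0, x0=1.0, theta=0.8, h=0.05, N=60)
print(x[0], x[-1])   # monotone decay, as for the Mittag-Leffler solution
```

Note how every step requires the whole past history through the convolution weights *b*<sub>*n*−*i*</sub><sup>(*θ*)</sup>: this is the persistent memory discussed above.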

All these numerical schemes have been implemented in Matlab routines as given in [30,31]. For the problem at hand, the performance of these schemes can be evaluated only qualitatively, due to the lack of an exact (closed form) solution, by comparing their behavior for decreasing timesteps; however, we refer the reader to the above cited literature for an exhaustive assessment of their results on several test cases.
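The binomial weights appearing in the FLMM2 formula above, namely (−1)<sup>*k*</sup>C(−*θ*, *k*), are the Taylor coefficients of (1 − *z*)<sup>−*θ*</sup> and can be generated without evaluating Gamma functions by a one-term recurrence, as this sketch shows.

```python
import math

# Weights (-1)^k * binom(-theta, k) of (1-z)^(-theta), generated by the
# recurrence c_0 = 1, c_k = c_(k-1)*(theta + k - 1)/k, and cross-checked
# against the equivalent Gamma-function formula Gamma(theta+k)/(Gamma(theta)*k!).
def gl_weights(theta, N):
    c = [1.0]
    for k in range(1, N + 1):
        c.append(c[-1] * (theta + k - 1) / k)
    return c

c = gl_weights(0.8, 5)
ref = [math.gamma(0.8 + k) / (math.gamma(0.8) * math.gamma(k + 1)) for k in range(6)]
print(c)
print(ref)   # the two lists agree
```

The recurrence is the standard way such convolution weights are tabulated in practice, since direct Gamma evaluations overflow for large *k*.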

#### *5.1. Preliminary Assessment*

As a preliminary test, we checked the results of all methods in reproducing trajectories when *θ* = 1, because these results can be directly compared with the reference solutions given by classical ODE solvers. Even if the considered methods are all devised for fractional-order systems, they are expected to reproduce a good numerical approximation of the solution also in the integer-order case. When the parameter settings (*σ* = 3, *α* = 0.3, *k* = 5) lead to a stable coexistence equilibrium, all methods are able to reproduce the expected trajectories: the left panel of Figure 1 shows the convergence of all numerical approximations of *n*(*t*) towards the equilibrium value *n*<sup>∗</sup> ≈ 0.66. On the other hand, when Hopf instability occurs (*σ* = 3, *α* = 10, *k* = 0.8), the simplest PI1 method shows a noticeable damping of the oscillations around the equilibrium *n*<sup>∗</sup> ≈ 0.188, as shown in the right panel of the same Figure, while the other methods' trajectories are practically indistinguishable from the reference one (as reported in [14]).

**Figure 1.** Trajectories for the numerical approximations PI1 and PI2 of *n*(*t*) solution of system (3) for *θ* = 1 in case of a stable equilibrium (**left panel**) and in the presence of Hopf instability (**right panel**). The lowest order approximation PI1 clearly shows damped oscillations, while the trajectories of the higher order approximations are in excellent agreement with the ones obtained by classical ODE solvers.
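The *θ* = 1 consistency check has a discrete counterpart at the level of the quadrature weights: for *θ* = 1 the PI2 weights collapse to those of the classical composite trapezoidal rule, as the following sketch verifies.

```python
# Weights of the implicit trapezoidal PI rule: a_0 = 1 and, for n >= 1,
# the convolution weights a_n and the modified first weight atilde_n.
def pi2_weights(theta, N):
    a = [1.0] + [(n - 1)**(theta + 1) - 2 * n**(theta + 1) + (n + 1)**(theta + 1)
                 for n in range(1, N + 1)]
    atilde = [(n - 1)**(theta + 1) - n**theta * (n - theta - 1)
              for n in range(1, N + 1)]
    return a, atilde

# For theta = 1: a_n = 2 (n >= 1) and atilde_n = 1; together with the
# overall factor 1/Gamma(theta+2) = 1/2, this is the composite trapezoid.
a, atilde = pi2_weights(1.0, 6)
print(a, atilde)
```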

#### *5.2. Stabilizing Effect of the Persistent Memory*

As a second test case, we consider a parameter setting for which the corresponding ODE system shows instability: if we choose as before *σ* = 3, *α* = 10, *k* = 0.8, simply considering a slightly lower value for *θ* (*θ* = 0.9) has a powerful stabilizing effect. As Figure 2 shows, all the numerical approximations agree in a fast damping of the oscillations for both the variables. Again, the damping is more evident in the lower order method PI1, while the other considered schemes agree in accurately reconstructing the system trajectories. To further analyze the different performance of the considered numerical methods, we present in Figure 3 a detail of the first part of the same trajectories as reconstructed by each of the numerical schemes for decreasing time steps *h*<sub>1</sub> = 2<sup>−4</sup>, *h*<sub>2</sub> = 2<sup>−5</sup>, *h*<sub>3</sub> = 2<sup>−6</sup>, *h*<sub>4</sub> = 2<sup>−7</sup>. The less accurate results of PI1 with respect to the other schemes can be clearly seen. Of course, lower values of the fractional order *θ* result in an even faster damping of the oscillations in the trajectories, confirming the strong stabilizing effect of the persistent memory in the model. For the same test case, Figure 4 compares the numerical approximations by PI2 of the populations' trajectories for different values of the order *θ* (0.95, 0.9, 0.85, 0.8), confirming the strong stabilizing effect of the fractional derivative.

**Figure 2.** Trajectories for all the numerical approximations of the solutions of system (3) in case *θ* = 0.9. While the corresponding ODE systems show instability (see the right panel of Figure 1), the trajectories of the fractional system rapidly stabilizes for both variables (*n*(*t*) shown in the left panel, *p*(*t*) in the right one). The lowest order approximation PI1 clearly shows more damped oscillations, while the higher order approximations are all in excellent agreement.

**Figure 3.** Details of the initial part of the trajectories shown in the left panel of Figure 2 as reconstructed by all the numerical methods for decreasing timesteps *h*1, *h*2, *h*3, *h*4. The lowest order approximation PI1 clearly shows more damped oscillations, and only for the smallest timestep it agrees with the other methods, whose results are practically identical.

**Figure 4.** Trajectories for the numerical approximations PI2 of the solutions of system (3) for different *θ* values (0.95, 0.9, 0.85, 0.8) for both populations (*n*(*t*) shown in the left panel, *p*(*t*) in the right one). The stabilizing effect of the fractional operators can be clearly seen.
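The predictor–corrector structure of PI12, which avoids nonlinear solves at each step, can be sketched on the same scalar linear test problem *D*<sup>*θ*</sup>*x* = −*x* (illustrative step sizes; this is not the predator–prey system of the paper).

```python
import math

# PI12: explicit rectangular predictor, trapezoidal corrector (one
# correction per step), with the weights b, a, atilde defined as above.
def pi12(F, x0, theta, h, N):
    g1, g2 = math.gamma(theta + 1), math.gamma(theta + 2)
    b = [((n + 1)**theta - n**theta) / g1 for n in range(N + 1)]
    a = [1.0] + [(n - 1)**(theta + 1) - 2 * n**(theta + 1) + (n + 1)**(theta + 1)
                 for n in range(1, N + 1)]
    at = [0.0] + [(n - 1)**(theta + 1) - n**theta * (n - theta - 1)
                  for n in range(1, N + 1)]
    x, fv = [x0], [F(0.0, x0)]
    for n in range(1, N + 1):
        xp = x0 + h**theta * sum(b[n - i - 1] * fv[i] for i in range(n))   # predictor
        xn = x0 + h**theta / g2 * (at[n] * fv[0]
                                   + sum(a[n - i] * fv[i] for i in range(1, n))
                                   + a[0] * F(n * h, xp))                   # corrector
        x.append(xn)
        fv.append(F(n * h, xn))
    return x

x = pi12(lambda t, y: -y, x0=1.0, theta=0.9, h=0.05, N=40)
print(x[0], x[-1])
```

Only evaluations of *F* are required, so the scheme extends verbatim to nonlinear systems such as (3).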

#### *5.3. Hopf Instability for the Fractional System*

As a third example, we show how the instability can still occur for the fractional system with a suitable choice of the parameters. We start by considering a parameter setting for which, as stated in Section 4, the conditions for the Hopf instability to appear are met: we set *σ* = 3, *α* = 3.5, *k* = 5. In this case, *I* > 0 and *I*<sup>2</sup> − 4*A* < 0. The eigenvalues of the Jacobian are a pair of complex conjugate numbers with a positive real part, so that for *θ* > *θ*<sup>∗</sup> = (2/*π*) arctan(√(4*A* − *I*<sup>2</sup>)/*I*) ≈ 0.89 the coexistence equilibrium *E*<sup>∗</sup> ≈ (0.27, 0.77) loses its stability. Figure 5 shows the trajectories of (*n*(*t*), *p*(*t*)) when *θ* = 0.92 as reconstructed by the more accurate numerical algorithms and the corresponding phase plane portrait, clearly showing the appearance of a limit cycle.

**Figure 5.** Numerical approximation PI12 of the solutions of system (3) confirming the Hopf instability predicted by the theory in case *θ* = 0.92 with *σ* = 3, *α* = 3.5, *k* = 5. In the left panel, the trajectories for both variables are reported as functions of time; the right panel shows the limit cycle in the phase plane.
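The quantities behind this example can be recomputed from the formulas of Section 4; the sketch below evaluates the equilibrium, the trace *I*, the determinant *A* and the critical order for *σ* = 3, *α* = 3.5, *k* = 5, and checks only the sign conditions for the Hopf bifurcation (the printed *θ*<sup>∗</sup> depends on the rounding of *p*<sup>∗</sup>).

```python
import math
import numpy as np

# Equilibrium from the cubic of Section 4, then I, A and theta* for the
# Hopf example sigma = 3, alpha = 3.5, k = 5.
sigma, alpha, k = 3.0, 3.5, 5.0
roots = np.roots([alpha**2 * k, 2 * alpha * k, k * (1 - alpha * sigma), sigma * (1 - k)])
p = max(r.real for r in roots if abs(r.imag) < 1e-10 and r.real > 0)
n = 1.0 / (1.0 + alpha * p)
I = (k * alpha * p - sigma) / (k * (1 + alpha * p))
A = p / (k * (1 + alpha * p)**2) * (k * (1 + alpha * p)**2 * (1 + 2 * alpha * p) - alpha * sigma)
theta_star = (2 / math.pi) * math.atan(math.sqrt(4 * A - I**2) / I)
print((n, p), I, A, theta_star)
```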

#### **6. Conclusions**

Fractional calculus, implying non-locality and memory effects, allows the description of numerous phenomena in a wide variety of scientific domains. Fractional-order operators have proved to be a very powerful modeling instrument for representing a variety of processes and biological systems. In this framework, we studied in this paper the dynamic behavior of a fractional-in-time prey–predator model with hunting cooperation. The existence and uniqueness of a non-negative solution have been proved, and the local stability of the coexistence equilibrium point has been analyzed by using the linearization technique. The conditions for the occurrence of a Hopf bifurcation have been found. Finally, numerical simulations have been presented to confirm through some selected examples how the order of derivation *θ* affects the dynamical behavior of prey and predator density. The findings of the present study, in our opinion, can provide hints for the investigation of the dynamics of other processes describing real-world phenomena. As a final consideration, we point out that several authors have investigated pattern formation in the related spatial fractional-order systems. Recently, Yin and Wen [32] have shown that fractional-derivative systems can produce persistent spatial patterns even though their ODE counterparts do not induce any steady pattern. Such mechanisms of pattern formation, along with the ones induced by anomalous diffusion, suggest further directions of research in spatial generalizations of the considered model.

**Author Contributions:** Conceptualization, M.F.C. and I.T.; formal analysis, I.T.; numerical simulations, M.F.C.; writing (original draft preparation, review and editing), M.F.C. and I.T.; funding acquisition, M.F.C. and I.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially supported by Regione Campania Projects REMIAM, ADVISE and MEDIA.

**Acknowledgments:** This paper has been performed under the auspices of the G.N.F.M. and G.N.C.S. of INdAM.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Quadratic Mean Field Games Model for the Langevin Equation**

**Fabio Camilli**

Dipartimento di Scienze di Base e Applicate per l'Ingegneria, Sapienza Università di Roma, Via Scarpa 16, 00161 Roma, Italy; fabio.camilli@uniroma1.it

**Abstract:** We consider a Mean Field Games model where the dynamics of the agents is given by a controlled Langevin equation and the cost is quadratic. An appropriate change of variables transforms the Mean Field Games system into a system of two coupled kinetic Fokker–Planck equations. We prove an existence result for the latter system, obtaining consequently existence of a solution for the Mean Field Games system.

**Keywords:** Langevin equation; Mean Field Games system; kinetic Fokker–Planck equation; hypoelliptic operators

**MSC:** 35K40; 91A16

#### **1. Introduction**

The Mean Field Games (MFG in short) theory concerns the study of differential games with a large number of rational, indistinguishable agents and the characterization of the corresponding Nash equilibria. In the original model introduced in [1,2], an agent can typically act on its velocity (or other first order dynamical quantities) via a control variable. Mean Field Games where agents control the acceleration have been recently proposed in [3–5].

A prototype of stochastic process involving acceleration is given by the Langevin diffusion process, which can be formally defined as

$$
\ddot{X}(t) = -b(X(t)) + \sigma \dot{B}(t),
\tag{1}
$$

where *X*¨ is the second time derivative of the stochastic process *X*, *B* a Brownian motion and *σ* a positive parameter. The solution of (1) can be rewritten as a Markov process (*X*, *V*) solving

$$\begin{cases} \begin{aligned} \dot{X}(t) &= V(t), \\ \dot{V}(t) &= -b(X(t)) + \sigma \dot{B}(t). \end{aligned} \end{cases}$$
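The pair (*X*, *V*) is easy to sample. The sketch below uses an Euler–Maruyama discretization in the toy case *b* ≡ 0, where *V* is a scaled Brownian motion and *X* its time integral, so the known Gaussian moments Var *V*(*T*) = *σ*<sup>2</sup>*T* and Var *X*(*T*) = *σ*<sup>2</sup>*T*<sup>3</sup>/3 (from Kolmogorov's explicit fundamental solution) can be checked empirically.

```python
import numpy as np

# Euler-Maruyama sample of the degenerate Langevin pair (X, V) with b = 0:
# V is sigma times a Brownian motion, X its integral in time, hence
# Var V(T) = sigma^2*T and Var X(T) = sigma^2*T^3/3.
rng = np.random.default_rng(0)
sigma, T, M, K = 1.0, 1.0, 20000, 200     # diffusion, horizon, paths, steps
dt = T / K
X = np.zeros(M)
V = np.zeros(M)
for _ in range(K):
    X += V * dt
    V += sigma * np.sqrt(dt) * rng.standard_normal(M)
print(V.var(), X.var())                   # close to 1 and 1/3
```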

The probability density function of the previous process satisfies the kinetic Fokker–Planck equation

$$
\partial\_t p - \frac{\sigma^2}{2} \Delta\_\upsilon p - b(\mathbf{x}) \cdot D\_\upsilon p + \upsilon \cdot D\_\mathbf{x} p = 0 \qquad \text{in} \quad (0, \infty) \times \mathbb{R}^d \times \mathbb{R}^d.
$$

The previous equation, in the case *b* ≡ 0, was first studied by Kolmogorov [6], who provided an explicit formula for its fundamental solution. It was then considered by Hörmander [7] as a motivating example for the general theory of hypoelliptic operators (see also [8–10]).

**Citation:** Camilli, F. A Quadratic Mean Field Games Model for the Langevin Equation. *Axioms* **2021**, *10*, 68. https://doi.org/10.3390/axioms10020068

Academic Editor: Gabriella Bretti

Received: 10 January 2021 Accepted: 16 April 2021 Published: 19 April 2021


**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

We consider a Mean Field Games model where the dynamics of the single agent is given by a controlled Langevin diffusion process, i.e.,

$$\begin{cases} \dot{X}(s) = V(s), & s \ge t, \\ \dot{V}(s) = -b(X(s)) + \alpha(s) + \sigma \dot{B}(s), & s \ge t, \\ X(t) = x, \; V(t) = v, \end{cases} \tag{2}$$

for (*t*, *x*, *v*) ∈ [0, *T*] × ℝ<sup>*d*</sup> × ℝ<sup>*d*</sup>. In (2), the control law *α* : [*t*, *T*] → ℝ<sup>*d*</sup>, which is a progressively measurable process with respect to a fixed filtered probability space such that

$$\mathbb{E}\left[\int\_t^T |\alpha(s)|^2 ds\right] < +\infty,$$

is chosen to *maximize* the functional

$$J(t, x, v; \alpha) = \mathbb{E}\_{t,(x,v)} \left\{ \int\_{t}^{T} \left[ f(X(s), V(s), m(s)) - \frac{1}{2} |\alpha(s)|^{2} \right] ds + u\_{T}(X(T), V(T)) \right\},$$

where *m*(*s*) is the distribution of the agents at time *s*. Let *u* be the value function associated with the previous control problem, i.e.,

$$u(t, x, v) = \sup\_{\alpha \in \mathcal{A}\_t} J(t, x, v; \alpha),$$

where A<sub>*t*</sub> is the set of the control laws. Formally, the couple (*u*, *m*) satisfies the MFG system (see Section 4.1 in [3] for more details)

$$\begin{cases} \partial\_t u + \frac{\sigma^2}{2} \Delta\_v u - b(x) \cdot D\_v u + v \cdot D\_x u + \frac{1}{2} |D\_v u|^2 = -f(x, v, m), \\[1ex] \partial\_t m - \frac{\sigma^2}{2} \Delta\_v m - b(x) \cdot D\_v m + v \cdot D\_x m + \operatorname{div}\_v(m D\_v u) = 0, \\[1ex] m(0, x, v) = m\_0(x, v), \quad u(T, x, v) = u\_T(x, v), \end{cases} \tag{3}$$

for (*t*, *x*, *v*) ∈ (0, *T*) × ℝ<sup>*d*</sup> × ℝ<sup>*d*</sup>. The first equation is a backward Hamilton–Jacobi–Bellman equation, degenerate in the *x*-variable and with a quadratic Hamiltonian in the *v*-variable, and the second equation is a forward kinetic Fokker–Planck equation. In the standard setting, MFG systems with quadratic Hamiltonians have been extensively considered in the literature, both as a reference model for the general theory and because, thanks to the Hopf–Cole change of variable, the nonlinear Hamilton–Jacobi–Bellman equation can be transformed into a linear equation, allowing the use of all the tools developed for this type of problem (see for example [2,11–15]). Recently, a similar procedure has been used for ergodic hypoelliptic MFG with quadratic cost in [16] and for a flocking model involving kinetic equations in Section 4.7.3 of [17].
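The mechanism behind the Hopf–Cole change of variable can be verified symbolically. The sketch below (dimension *d* = 1, generic smooth *u* and *b*) checks that applying the linear kinetic operator to *φ* = *e*<sup>*u*/*σ*<sup>2</sup></sup> and multiplying by *σ*<sup>2</sup>/*φ* reproduces the full HJB operator, quadratic gradient term included.

```python
import sympy as sp

# Symbolic check (d = 1) of the quadratic change of variables
# phi = exp(u/sigma^2): sigma^2 * (linear kinetic operator applied to phi)/phi
# equals the HJB operator applied to u, including the term |D_v u|^2 / 2.
t, x, v, s = sp.symbols('t x v sigma', positive=True)
u = sp.Function('u')(t, x, v)
b = sp.Function('b')(x)
phi = sp.exp(u / s**2)

linear = (sp.diff(phi, t) + s**2 / 2 * sp.diff(phi, v, 2)
          - b * sp.diff(phi, v) + v * sp.diff(phi, x))
hjb = (sp.diff(u, t) + s**2 / 2 * sp.diff(u, v, 2)
       - b * sp.diff(u, v) + v * sp.diff(u, x)
       + sp.Rational(1, 2) * sp.diff(u, v)**2)
residual = sp.simplify(s**2 * linear / phi - hjb)
print(residual)   # 0
```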

We study (3) by means of a change of variables introduced in [11,14] for the standard case. By defining the new unknowns *φ* = *e*<sup>*u*/*σ*<sup>2</sup></sup> and *ψ* = *me*<sup>−*u*/*σ*<sup>2</sup></sup>, the system (3) is transformed into the system of two kinetic Fokker–Planck equations

$$\begin{cases} \partial\_t \phi + \frac{\sigma^2}{2} \Delta\_v \phi - b(x) \cdot D\_v \phi + v \cdot D\_x \phi = -\frac{1}{\sigma^2} f(x, v, \psi\phi) \phi, \\[1ex] \partial\_t \psi - \frac{\sigma^2}{2} \Delta\_v \psi - b(x) \cdot D\_v \psi + v \cdot D\_x \psi = \frac{1}{\sigma^2} f(x, v, \psi\phi) \psi, \\[1ex] \psi(0, x, v) = \frac{m\_0(x, v)}{\phi(0, x, v)}, \quad \phi(T, x, v) = e^{\frac{u\_T(x, v)}{\sigma^2}}, \end{cases} \tag{4}$$

for (*t*, *x*, *v*) ∈ (0, *T*) × ℝ<sup>*d*</sup> × ℝ<sup>*d*</sup>. In the previous problem, the coupling between the two equations appears only in the source terms. Following [14], we prove the existence of a weak solution to (4) by showing the convergence of an iterative scheme defined, starting from *ψ*<sup>(0)</sup> ≡ 0, by solving alternately the backward problem

$$\begin{cases} \partial\_t \phi^{(k+\frac{1}{2})} + \frac{\sigma^2}{2} \Delta\_v \phi^{(k+\frac{1}{2})} - b(x) \cdot D\_v \phi^{(k+\frac{1}{2})} + v \cdot D\_x \phi^{(k+\frac{1}{2})} = -\frac{1}{\sigma^2} f(\psi^{(k)} \phi^{(k+\frac{1}{2})}) \phi^{(k+\frac{1}{2})}, \\[1ex] \phi^{(k+\frac{1}{2})}(T, x, v) = e^{\frac{u\_T(x, v)}{\sigma^2}}, \end{cases} \tag{5}$$

and the forward one

$$\begin{cases} \partial\_t \psi^{(k+1)} - \frac{\sigma^2}{2} \Delta\_v \psi^{(k+1)} - b(x) \cdot D\_v \psi^{(k+1)} + v \cdot D\_x \psi^{(k+1)} = \frac{1}{\sigma^2} f(\psi^{(k+1)} \phi^{(k+\frac{1}{2})}) \psi^{(k+1)}, \\[1ex] \psi^{(k+1)}(0, x, v) = \frac{m\_0(x, v)}{\phi^{(k+\frac{1}{2})}(0, x, v)}. \end{cases} \tag{6}$$

We show that the resulting sequence (*φ*<sup>(*k*+1/2)</sup>, *ψ*<sup>(*k*+1)</sup>), *k* ∈ ℕ, monotonically converges to the solution of (4). Hence, by the inverse change of variables (see again [11,14] for details)

$$
u = \sigma^2 \ln(\phi), \qquad m = \phi \psi,\tag{7}
$$

we obtain a solution of the original problem (3). We have

**Theorem 1.** *The sequence* (*φ*<sup>(*k*+1/2)</sup>, *ψ*<sup>(*k*+1)</sup>) *defined by* (5) *and* (6) *converges in L*<sup>2</sup>([0, *T*] × ℝ<sup>*d*</sup> × ℝ<sup>*d*</sup>) *and a.e. to a weak solution* (*φ*, *ψ*) *of* (4)*. Moreover, the couple* (*u*, *m*) *defined by* (7) *is a weak solution to* (3)*.*

The main difficulty in the study of problems (3) and (4) lies both in the degeneracy of the second-order operator with respect to *x* and in the unbounded dependence of the coefficients of the first-order terms on *v*. To overcome these difficulties, we rely on the results for linear kinetic Fokker–Planck equations developed in [18]. We mention that the existence of weak solutions for the standard MFG problem, possibly degenerate, has been studied in [19], but the results in that paper do not cover the present setting. The previous iterative procedure also suggests a monotone numerical method for the approximation of (4), and hence for (3). Indeed, by approximating (5) and (6) by finite differences and solving the resulting discrete equations alternately, we obtain an approximation of the sequence (*φ*<sup>(*k*+1/2)</sup>, *ψ*<sup>(*k*+1)</sup>). A corresponding procedure for the standard quadratic MFG system was studied in [14], where the convergence of the method is proved. We plan to study the properties of this numerical procedure in a future work.
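The alternating structure of the scheme can be illustrated on a drastically simplified, zero-dimensional caricature of (5)–(6): all *x*- and *v*-dependence (transport and diffusion) is dropped and *σ* = 1, keeping only the time coupling through the source term. The coupling cost *f*(*m*) = −*m*/(1 + *m*) is an illustrative choice consistent with the standing assumptions (bounded, nonpositive, strictly decreasing); this is a sketch of the iteration pattern, not of the kinetic solver.

```python
import numpy as np

# 0-D caricature of the alternating scheme (5)-(6): backward sweep for phi
# with the previous psi frozen in the source, then forward sweep for the
# new psi; f(m) = -m/(1+m) is illustrative, sigma = 1.
T, K = 1.0, 400
dt = T / K
m0, uT = 1.0, 0.0

def f(m):
    return -m / (1.0 + m)

psi = np.zeros(K + 1)                      # psi^(0) = 0
changes = []
for k in range(12):
    # backward sweep: phi' = -f(psi*phi)*phi with phi(T) = exp(uT)
    phi = np.empty(K + 1)
    phi[K] = np.exp(uT)
    for n in range(K, 0, -1):
        phi[n - 1] = phi[n] + dt * f(psi[n] * phi[n]) * phi[n]
    # forward sweep: psi' = f(psi*phi)*psi with psi(0) = m0/phi(0)
    new = np.empty(K + 1)
    new[0] = m0 / phi[0]
    for n in range(K):
        new[n + 1] = new[n] + dt * f(new[n] * phi[n]) * new[n]
    changes.append(float(np.abs(new - psi).max()))
    psi = new

print(phi[0], psi[0], changes[-1])   # successive iterates settle down
```

Replacing the two Euler sweeps with finite-difference solvers for the kinetic operators in (5) and (6) yields the monotone numerical method mentioned above.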

#### **2. Well Posedness of the Kinetic Fokker–Planck System**

In this section, we study the existence of a solution to system (4). The proof follows the strategy implemented in Section 2 of [14] for the case of a standard MFG system with quadratic Hamiltonian and relies on the results for linear kinetic Fokker–Planck equations in Appendix A of [18]. We remark that the model studied here does not exactly fit the problem treated in [18] because of the presence of a zero-order term in the Fokker–Planck equation. Hence, some technical aspects should be analyzed in more detail; however, the present paper is mainly intended to give some ideas on the change of variables for the kinetic MFG system.

We now fix the assumptions that will hold throughout the paper. The vector field *b* : ℝ<sup>*d*</sup> → ℝ<sup>*d*</sup> and the coupling cost *f* : ℝ<sup>*d*</sup> × ℝ<sup>*d*</sup> × ℝ → ℝ are assumed to satisfy

$$b \in L^{\infty}(\mathbb{R}^d),$$

$$f \in L^{\infty}(\mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}), f \le 0 \text{ and } f(x, v, \cdot) \text{ strictly decreasing.}$$

Moreover, the diffusion coefficient *σ* is positive and the initial and terminal data satisfy

$$\begin{aligned} m\_0 \in L^\infty(\mathbb{R}^d \times \mathbb{R}^d), \; m\_0 \ge 0, \; \iint m\_0(\mathbf{x}, \mathbf{v}) d\mathbf{x} d\mathbf{v} = 1, \\ \text{and } \exists \; R\_0 > 0 \,\text{s.t. } \text{supp}\{m\_0\} \subset \mathbb{R}^d \times B(0, R\_0) \end{aligned} \tag{8}$$

and

$$\begin{aligned} u_T \in C^0(\mathbb{R}^d\times\mathbb{R}^d) \text{ and } \exists\, C_0, C_1>0 \text{ s.t. } \forall (x,v)\in\mathbb{R}^d\times\mathbb{R}^d \\ -C_0(|v|^2+|x|)-C_0 \le u_T(x,v) \le -C_1(|v|^2+|x|)+C_1. \end{aligned} \tag{9}$$
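The role of the upper bound in (9) can be made quantitative: since $\sigma>0$,

$$e^{u_T(x,v)/\sigma^2} \le e^{C_1/\sigma^2}\, e^{-C_1(|v|^2+|x|)/\sigma^2} \qquad \forall (x,v)\in\mathbb{R}^d\times\mathbb{R}^d,$$

and the right-hand side is bounded, with Gaussian decay in $v$ and exponential decay in $x$, hence square-integrable.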

Note that (9) implies that $e^{u_T/\sigma^2}\in L^\infty(\mathbb{R}^d\times\mathbb{R}^d)\cap L^2(\mathbb{R}^d\times\mathbb{R}^d)$. We denote by $(\cdot,\cdot)$ the scalar product in $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ and by $\langle\cdot,\cdot\rangle$ the pairing between $\mathcal{X}=L^2([0,T]\times\mathbb{R}^d_x;H^1(\mathbb{R}^d_v))$ and its dual $\mathcal{X}'=L^2([0,T]\times\mathbb{R}^d_x;H^{-1}(\mathbb{R}^d_v))$. We define the following functional space

$$\mathcal{Y} = \left\{ g \in L^2([0,T]\times\mathbb{R}^d_x; H^1(\mathbb{R}^d_v)) :\; \partial_t g + v\cdot D_x g \in L^2([0,T]\times\mathbb{R}^d_x; H^{-1}(\mathbb{R}^d_v)) \right\}$$

and we set $\mathcal{Y}_0=\{g\in\mathcal{Y}: g\ge 0\}$. If $g\in\mathcal{Y}$, then it admits (continuous) trace values $g(0,x,v), g(T,x,v)\in L^2(\mathbb{R}^d\times\mathbb{R}^d)$ (see [18], Lemma A.1) and therefore the initial/terminal conditions for (4) are well defined in the $L^2$ sense. We first prove the well posedness of problems (5) and (6).

#### **Proposition 2.** *We have*

*(i) For any ψ* ∈ Y0*, there exists a unique solution φ* ∈ Y<sup>0</sup> *to*

$$\begin{cases} \partial_t\varphi + \frac{\sigma^2}{2}\Delta_v\varphi - b(x)\cdot D_v\varphi + v\cdot D_x\varphi = -\frac{1}{\sigma^2} f(x,v,\psi\varphi)\,\varphi \\ \varphi(T,x,v) = e^{\frac{u_T(x,v)}{\sigma^2}}. \end{cases} \tag{10}$$

*Moreover, $\varphi\in L^\infty([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ and, for any $R>0$, there exist $\delta_R\in\mathbb{R}$ and $\rho>0$ such that*

$$\varphi(t,x,v) \ge C_R := e^{\frac{1}{\sigma^2}(\delta_R - \rho T)} \quad \forall t\in[0,T],\; (x,v)\in B(0,R)\subset\mathbb{R}^d\times\mathbb{R}^d. \tag{11}$$

*(ii) Let* Φ : Y<sup>0</sup> → Y<sup>0</sup> *be the map which associates to ψ the unique solution of* (10)*. Then, if ψ*<sup>2</sup> ≤ *ψ*1*, we have* Φ(*ψ*2) ≥ Φ(*ψ*1)*.*

**Proof.** We first prove existence of a solution to the nonlinear problem (10) by a fixed point argument, exploiting the results for the corresponding linear problem proved in [18]. Given $\psi\in\mathcal{Y}_0$, consider the map $F=F(\phi)$ from $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ into itself that associates with $\phi$ the weak solution $\varphi\in L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ of the linear problem

$$\begin{cases} \partial_t\varphi + \frac{\sigma^2}{2}\Delta_v\varphi - b(x)\cdot D_v\varphi + v\cdot D_x\varphi = -\frac{1}{\sigma^2} f(\psi\phi)\,\varphi \\ \varphi(T,x,v) = e^{\frac{u_T(x,v)}{\sigma^2}}. \end{cases} \tag{12}$$

By Prop. A.2 of [18], *φ* belongs to Y and it coincides with the unique solution of (12) in this space. Moreover, the following estimate

$$\|\varphi\|_{L^2([0,T]\times\mathbb{R}^d_x;H^1(\mathbb{R}^d_v))} + \|\partial_t\varphi + v\cdot D_x\varphi\|_{L^2([0,T]\times\mathbb{R}^d_x;H^{-1}(\mathbb{R}^d_v))} \le C \tag{13}$$

holds for some constant $C$ which depends only on $\|e^{u_T/\sigma^2}\|_{L^2}$, $\|f\|_{L^\infty}$ and $\sigma$. Hence $F$ maps $B_C$, the closed ball of radius $C$ in $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, into itself.

To show that the map $F$ is continuous on $B_C$, consider $\{\phi_n\}_{n\in\mathbb{N}}$, $\phi\in L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ such that $\|\phi_n-\phi\|_{L^2}\to 0$ and set $\varphi_n=F(\phi_n)$. Then $\varphi_n\in\mathcal{Y}$ and, by the estimate (13), we get that, up to a subsequence, there exists $\bar\varphi\in\mathcal{Y}$ such that $\varphi_n\rightharpoonup\bar\varphi$ and $D_v\varphi_n\rightharpoonup D_v\bar\varphi$ in $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, and $\partial_t\varphi_n+v\cdot D_x\varphi_n\rightharpoonup \partial_t\bar\varphi+v\cdot D_x\bar\varphi$ in $L^2([0,T]\times\mathbb{R}^d_x;H^{-1}(\mathbb{R}^d_v))$. Moreover, $\phi_n\to\phi$ almost everywhere. By the definition of weak solution to (12), we have that

$$\langle \partial_t\varphi_n + v\cdot D_x\varphi_n, w\rangle - \frac{\sigma^2}{2}(D_v\varphi_n, D_v w) - (b\cdot D_v\varphi_n, w) = \Big(-\frac{1}{\sigma^2}\,\varphi_n f(\phi_n\psi), w\Big), \tag{14}$$

for any $w\in\mathcal{D}([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, the space of infinitely differentiable functions with compact support in $[0,T]\times\mathbb{R}^d\times\mathbb{R}^d$. Employing weak convergence for the left hand side of (14) and the Dominated Convergence Theorem for the right hand side, we get, for $n\to\infty$,

$$\langle \partial_t\bar\varphi + v\cdot D_x\bar\varphi, w\rangle - \frac{\sigma^2}{2}(D_v\bar\varphi, D_v w) - (b\cdot D_v\bar\varphi, w) = \Big(-\frac{1}{\sigma^2}\,\bar\varphi f(\phi\psi), w\Big)$$

for any $w\in\mathcal{D}([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$. Hence $\bar\varphi=F(\phi)$ and $F(\phi_n)\to F(\phi)$ for $n\to\infty$ in $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$. The compactness of the map $F$ in $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ follows from the compactness of the set of solutions to (12); see Theorem 1.2 of [20]. We conclude, by Schauder's Theorem, that there exists a fixed point of the map $F$ in $L^2$, hence in $\mathcal{Y}$, and therefore a solution to the nonlinear parabolic Equation (10).

Observe that, if $\varphi$ is a solution of (10), then $\tilde\varphi = e^{\lambda t}\varphi$ is a solution of

$$
\partial_t\tilde\varphi + \frac{\sigma^2}{2}\Delta_v\tilde\varphi - b(x)\cdot D_v\tilde\varphi + v\cdot D_x\tilde\varphi - \lambda\tilde\varphi = -\frac{1}{\sigma^2} f(e^{-\lambda t}\psi\tilde\varphi)\,\tilde\varphi \tag{15}
$$

with the corresponding final condition. In the following, we assume that $\lambda>0$. To show that $\varphi$ is non-negative, we will exploit the following property (see Lemma A.3 of [18]): given $\varphi\in\mathcal{Y}$ and defining $\varphi^\pm=\max(\pm\varphi,0)$, then $\varphi^\pm\in\mathcal{X}$ and

$$\langle \partial_t\varphi + v\cdot D_x\varphi, \varphi^-\rangle = \frac{1}{2}\left(\iint |\varphi(0,x,v)^-|^2\,dx\,dv - \iint |\varphi(T,x,v)^-|^2\,dx\,dv\right). \tag{16}$$

Let *φ* be a solution of (15), multiply the equation by *φ*− and integrate. Then, since *φ*(*T*, *x*, *v*) is non-negative, by (16) we get

$$\begin{aligned} -\frac{1}{\sigma^2}\big(\varphi f(e^{-\lambda t}\varphi\psi), \varphi^-\big) &= \langle \partial_t\varphi + v\cdot D_x\varphi, \varphi^-\rangle - \frac{\sigma^2}{2}(D_v\varphi, D_v\varphi^-) - (b\cdot D_v\varphi, \varphi^-) - \lambda(\varphi,\varphi^-) \\ &= \frac{1}{2}\iint |\varphi(0,x,v)^-|^2\,dx\,dv + \frac{\sigma^2}{2}(D_v\varphi^-, D_v\varphi^-) + \lambda(\varphi^-,\varphi^-) \\ &\ge \lambda(\varphi^-,\varphi^-), \end{aligned} \tag{17}$$

where we used that, by integration by parts, $(b\cdot D_v\varphi, \varphi^-)=0$. Since $f\le 0$ and therefore

$$-\big(\varphi f(e^{-\lambda t}\varphi\psi), \varphi^-\big) = \big(\varphi^- f(e^{-\lambda t}\varphi\psi), \varphi^-\big) \le 0,$$

we get $(\varphi^-,\varphi^-)\equiv 0$, hence $\varphi\ge 0$.
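For completeness, the integration by parts used for the drift term relies only on the fact that $b$ does not depend on $v$: where $\varphi^->0$ we have $D_v\varphi=-D_v\varphi^-$, so

$$(b\cdot D_v\varphi, \varphi^-) = -\frac{1}{2}\int_0^T\!\!\iint b(x)\cdot D_v\big((\varphi^-)^2\big)\,dx\,dv\,dt = \frac{1}{2}\int_0^T\!\!\iint \operatorname{div}_v\big(b(x)\big)\,(\varphi^-)^2\,dx\,dv\,dt = 0.$$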

To prove the uniqueness of the solution to (10), consider two solutions $\varphi_1,\varphi_2$ of (15) and set $\bar\varphi=\varphi_1-\varphi_2$. Multiplying the equation for $\bar\varphi$ by $\bar\varphi$, integrating and using $\bar\varphi(T,x,v)=0$, we get

$$\begin{aligned} -\frac{1}{\sigma^2}\big(f(e^{-\lambda t}\psi\varphi_1)\varphi_1 - f(e^{-\lambda t}\psi\varphi_2)\varphi_2,\, \varphi_1-\varphi_2\big) &= \langle \partial_t\bar\varphi + v\cdot D_x\bar\varphi, \bar\varphi\rangle - \frac{\sigma^2}{2}(D_v\bar\varphi, D_v\bar\varphi) - (b\cdot D_v\bar\varphi, \bar\varphi) - \lambda(\bar\varphi,\bar\varphi) \\ &= -\frac{1}{2}\iint |\bar\varphi(0,x,v)|^2\,dx\,dv - \frac{\sigma^2}{2}(D_v\bar\varphi, D_v\bar\varphi) - \lambda(\bar\varphi,\bar\varphi) \\ &\le -\lambda(\varphi_1-\varphi_2, \varphi_1-\varphi_2) \end{aligned}$$

and, by the strict monotonicity of *f* , we conclude that *φ*<sup>1</sup> = *φ*2.

To prove that $\varphi$ is bounded from above, we observe that the function $\bar\varphi(t,x,v)=e^{(C_1+(T-t)\|f\|_\infty)/\sigma^2}$, where $C_1$ is as in (9), is a supersolution of the linear problem (12) for any $\phi\in L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, i.e., $\bar\varphi(T,x,v)\ge e^{u_T(x,v)/\sigma^2}$ and

$$
\partial_t\bar\varphi + \frac{\sigma^2}{2}\Delta_v\bar\varphi - b(x)\cdot D_v\bar\varphi + v\cdot D_x\bar\varphi \le -\frac{1}{\sigma^2} f(\psi\phi)\,\bar\varphi.
$$

By the Maximum Principle (see Prop. A.3 (i) in [18]), we get that $\bar\varphi\ge\varphi$, where $\varphi$ is the solution of (12). Since the previous property holds for any $\phi\in L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, we conclude that $\bar\varphi\ge\varphi$, where $\varphi$ is now the solution of the nonlinear problem (10).

A similar argument shows that $\underline\varphi(t,x,v)=e^{(-C_0(|v|^2+|x|+1)-\rho(T-t))/\sigma^2}$, where $C_0$ is as in (9) and $\rho$ is sufficiently large, is a subsolution of (12) for any $\phi\in L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$. Indeed, replacing $\underline\varphi$ in the equation, we get that the inequality

$$\partial_t\underline\varphi + \frac{\sigma^2}{2}\Delta_v\underline\varphi - b(x)\cdot D_v\underline\varphi + v\cdot D_x\underline\varphi = \frac{\underline\varphi}{\sigma^2}\Big(\rho - C_0 d\,\sigma^2 + 2C_0^2|v|^2 + 2C_0\, b(x)\cdot v - C_0\, v\cdot\frac{x}{|x|}\Big) \ge -\frac{1}{\sigma^2} f(\psi\phi)\,\underline\varphi$$

is satisfied for $\rho$ large enough and, moreover, $\underline\varphi(T,x,v)\le e^{u_T(x,v)/\sigma^2}$. Hence $\underline\varphi\le\varphi$, where $\varphi$ is the solution of the nonlinear problem (10), and from this estimate we deduce (11).

We finally prove the monotonicity of the map $\Phi$. Set $\varphi_i=\Phi(\psi_i)$, $i=1,2$, and consider the equation satisfied by $\bar\varphi=e^{\lambda t}\varphi_1-e^{\lambda t}\varphi_2$; multiply it by $\bar\varphi^+$ and integrate. Performing a computation similar to (17), we get

$$-\frac{1}{\sigma^2}\big(f(\varphi_1\psi_1)\varphi_1 - f(\varphi_2\psi_2)\varphi_2,\, \bar\varphi^+\big) \le -\lambda(\bar\varphi^+, \bar\varphi^+).$$

Since, by monotonicity of *f* and non-negativity of *φi*, we have

$$-\big(f(\varphi_1\psi_1)\varphi_1 - f(\varphi_2\psi_2)\varphi_2,\, \bar\varphi^+\big) = -\big(f(\varphi_1\psi_1)(\varphi_1-\varphi_2),\, \bar\varphi^+\big) - \big((f(\varphi_1\psi_1)-f(\varphi_2\psi_2))\varphi_2,\, \bar\varphi^+\big) \ge 0,$$

we get $(\bar\varphi^+,\bar\varphi^+)=0$ and therefore $\varphi_1\le\varphi_2$.

We set

$$\mathcal{Y}_R = \{\phi\in\mathcal{Y}_0 : \phi\ge C_R \quad \forall (x,v)\in B(0,R),\ t\in[0,T]\},$$

where $C_R$ is defined as in (11).

**Proposition 3.** *Given R* > *R*0*, where R*<sup>0</sup> *as in* (8)*, we have*

*(i) For any φ* ∈ Y*R, there exists a unique solution ψ* ∈ Y<sup>0</sup> *to*

$$\begin{cases} \partial_t\psi - \frac{\sigma^2}{2}\Delta_v\psi - b(x)\cdot D_v\psi + v\cdot D_x\psi = \frac{1}{\sigma^2} f(x,v,\psi\phi)\,\psi \\ \psi(0,x,v) = \dfrac{m_0(x,v)}{\phi(0,x,v)}. \end{cases} \tag{18}$$

*Moreover*

$$\psi(t,x,v) \le \frac{\|m_0\|_{L^\infty}}{C_R} \qquad \forall t\in[0,T],\ (x,v)\in\mathbb{R}^d\times\mathbb{R}^d, \tag{19}$$

*where $C_R$ is as in* (11)*.*

*(ii) Let* Ψ : Y*<sup>R</sup>* → Y<sup>0</sup> *be the map which associates with φ* ∈ Y*<sup>R</sup> the unique solution of* (18)*. Then, if φ*<sup>2</sup> ≤ *φ*1*, we have* Ψ(*φ*2) ≥ Ψ(*φ*1)*.*

**Proof.** First observe that, since $R>R_0$, $\psi(0,x,v)$ is well defined for $\phi\in\mathcal{Y}_R$. The proof of the first part of *(i)* is very similar to that of the corresponding result in Proposition 2, hence we only prove the bound (19). If $\psi$ is a solution of (18), then $\tilde\psi=e^{-\lambda t}\psi$ is a solution of

$$
\partial_t\tilde\psi - \frac{\sigma^2}{2}\Delta_v\tilde\psi - b(x)\cdot D_v\tilde\psi + v\cdot D_x\tilde\psi + \lambda\tilde\psi = \frac{1}{\sigma^2} f(x,v,e^{\lambda t}\tilde\psi\phi)\,\tilde\psi. \tag{20}
$$

Let $\psi$ be a solution of (20), set $\bar\psi=\psi-e^{-\lambda t}\|m_0\|_{L^\infty}/C_R$ and observe that $\bar\psi(0)\le 0$. Multiply the equation for $\bar\psi$ by $\bar\psi^+$ and integrate to obtain

$$\begin{aligned} \frac{1}{\sigma^2}\big(\psi f(e^{\lambda t}\psi\phi), \bar\psi^+\big) &= \langle\partial_t\bar\psi + v\cdot D_x\bar\psi, \bar\psi^+\rangle + \frac{\sigma^2}{2}(D_v\bar\psi, D_v\bar\psi^+) - (b(x)\cdot D_v\bar\psi, \bar\psi^+) + \lambda(\bar\psi,\bar\psi^+) \\ &= \frac{1}{2}\iint |\bar\psi^+(T,x,v)|^2\,dx\,dv + \frac{\sigma^2}{2}(D_v\bar\psi^+, D_v\bar\psi^+) + \lambda(\bar\psi^+,\bar\psi^+) \ge \lambda(\bar\psi^+,\bar\psi^+). \end{aligned}$$

Since *ψ* ≥ 0 and *f* ≤ 0, we have

$$\big(\psi f(e^{\lambda t}\psi\phi), \bar\psi^+\big) \le 0$$

and therefore $\bar\psi^+\equiv 0$, which gives the upper bound (19).

Now we prove *(ii)*. Set $\psi_i=\Psi(\phi_i)$, $i=1,2$, and $\bar\psi=e^{-\lambda t}\psi_1-e^{-\lambda t}\psi_2$. Multiply the equation satisfied by $\bar\psi$ by $\bar\psi^+$ and integrate. Since, by monotonicity and negativity of $f$, we have

$$\begin{aligned} \big(f(e^{\lambda t}\phi_1\psi_1)\psi_1 - f(e^{\lambda t}\phi_2\psi_2)\psi_2,\, \bar\psi^+\big) &= \big(f(e^{\lambda t}\phi_1\psi_1)(\psi_1-\psi_2),\, \bar\psi^+\big) \\ &\quad + \big(\psi_2(f(e^{\lambda t}\phi_1\psi_1) - f(e^{\lambda t}\phi_2\psi_2)),\, \bar\psi^+\big) \le 0. \end{aligned}$$

Then

$$0 \ge \langle\partial_t\bar\psi + v\cdot D_x\bar\psi, \bar\psi^+\rangle + \frac{\sigma^2}{2}(D_v\bar\psi, D_v\bar\psi^+) - (b(x)\cdot D_v\bar\psi, \bar\psi^+) + \lambda(\bar\psi,\bar\psi^+) = \frac{1}{2}\iint |\bar\psi^+(T,x,v)|^2\,dx\,dv + \frac{\sigma^2}{2}(D_v\bar\psi^+, D_v\bar\psi^+) + \lambda(\bar\psi^+,\bar\psi^+) \ge \lambda(\bar\psi^+,\bar\psi^+).$$

Hence $\bar\psi^+\equiv 0$ and therefore $\psi_1\le\psi_2$.

**Proof of Theorem 1.** Given $\psi^{(0)}\equiv 0$, consider the sequence $(\varphi^{(k+\frac12)}, \psi^{(k+1)})$, $k\in\mathbb{N}$, defined in (5) and (6). It can be rewritten as

$$\begin{cases} \varphi^{(k+\frac12)} = \Phi(\psi^{(k)}) \\ \psi^{(k+1)} = \Psi(\varphi^{(k+\frac12)}), \end{cases} \tag{21}$$

where the maps $\Phi,\Psi$ are as in Propositions 2 and 3, respectively. Observe that, by (11), we have $\varphi^{(k+\frac12)}\in\mathcal{Y}_R$ for $R>R_0$ and $\psi^{(k+1)}\ge 0$ for any $k$. Hence the sequence $(\varphi^{(k+\frac12)}, \psi^{(k+1)})$ is well defined. We first prove by induction the monotonicity of the components of $(\varphi^{(k+\frac12)}, \psi^{(k+1)})$. By non-negativity of solutions to (18), we have $\psi^{(1)}=\Psi(\varphi^{(\frac12)})\ge 0$ and therefore $\psi^{(1)}\ge\psi^{(0)}$. Moreover, by the monotonicity of $\Phi$, $\varphi^{(\frac32)}=\Phi(\psi^{(1)})\le\Phi(\psi^{(0)})=\varphi^{(\frac12)}$. Now assume that $\psi^{(k+1)}\ge\psi^{(k)}$. Then

$$\phi^{(k+\frac{3}{2})} = \Phi(\psi^{(k+1)}) \le \Phi(\psi^{(k)}) = \phi^{(k+\frac{1}{2})}$$

and

$$
\psi^{(k+2)} = \Psi(\phi^{(k+\frac{3}{2})}) \ge \Psi(\phi^{(k+\frac{1}{2})}) = \psi^{(k+1)}.
$$

and therefore the monotonicity of the two sequences follows.

Since $\varphi^{(k+\frac12)}\ge 0$ and, by (19), $\psi^{(k+1)}\le\|m_0\|_{L^\infty}/C_R$, the sequence $(\varphi^{(k+\frac12)}, \psi^{(k+1)})$ converges, for $k\to\infty$, a.e. and in $L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ to a couple $(\varphi,\psi)$. Taking into account the estimate (13), the a.e. convergence of the two sequences and repeating an argument similar to the one employed for the continuity of the map $F$ in Proposition 2, we get that the couple $(\varphi,\psi)$ satisfies, in the weak sense, the first two equations in (4). The terminal condition for $\varphi$ is obviously satisfied, while the initial condition for $\psi$, in the $L^2$ sense, follows by the convergence of $\varphi^{(k+\frac12)}(0)$ to $\varphi(0)$.

We now consider the couple $(u,m)$ given by the change of variable in (7). We first observe that, by Theorem 1.5 of [10], we have $\partial_t\varphi + v\cdot D_x\varphi,\ D_v\varphi,\ \Delta_v\varphi \in L^2([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, and a corresponding regularity holds for $\psi$. Taking into account the boundedness of $\varphi$ and the estimate in (11), we have that $u,\ \partial_t u + v\cdot D_x u,\ D_v u,\ \Delta_v u \in L^2_{loc}([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$. Hence we can write the equation for $u$ in weak form, i.e.,

$$\langle\partial_t u + v\cdot D_x u, w\rangle - \frac{\sigma^2}{2}(D_v u, D_v w) - (b\cdot D_v u, w) + \frac{1}{2}\big(|D_v u|^2, w\big) = -(f(m), w),$$

for any $w\in\mathcal{D}([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$, with the final datum in the trace sense. In a similar way, since $m,\ \partial_t m + v\cdot D_x m,\ D_v m,\ \Delta_v m \in L^2_{loc}([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ and $m$ is locally bounded, we can rewrite also the equation for $m$ in weak form, i.e.,

$$\langle\partial_t m + v\cdot D_x m, w\rangle + \frac{\sigma^2}{2}(D_v m, D_v w) - (b\cdot D_v m, w) - (m D_v u, D_v w) = 0,$$

for any $w\in\mathcal{D}([0,T]\times\mathbb{R}^d\times\mathbb{R}^d)$ with the initial datum in the trace sense.

**Funding:** This research received no external funding.

**Acknowledgments:** The author wishes to thank Alessandro Goffi (Univ. di Padova) and Sergio Polidoro (Univ. di Modena e Reggio Emilia) for useful discussions.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **Non-Standard Discrete RothC Models for Soil Carbon Dynamics**

**Fasma Diele 1,\*, Carmela Marangi <sup>1</sup> and Angela Martiradonna 1,2**


**Abstract:** Soil Organic Carbon (SOC) is one of the key indicators of land degradation. SOC positively affects soil functions with regard to habitats, biological diversity and soil fertility; therefore, a reduction in the SOC stock of soil results in degradation, and it may also have potential negative effects on soil-derived ecosystem services. Dynamical models, such as the Rothamsted Carbon (RothC) model, may predict the long-term behaviour of soil carbon content and may suggest optimal land use patterns suitable for the achievement of land degradation neutrality as measured in terms of the SOC indicator. In this paper, we compared continuous and discrete versions of the RothC model, especially to achieve long-term solutions. The original discrete formulation of the RothC model was then compared with a novel non-standard integrator that represents an alternative to the exponential Rosenbrock–Euler approach in the literature.

**Keywords:** soil organic carbon; RothC; non-standard integrators; Exponential Rosenbrock–Euler

**MSC:** 34C60; 65L05; 65D30

#### **1. Introduction**

The United Nations Convention to Combat Desertification (UNCCD) is an international agreement, established in 1994, that links the environment and development with sustainable land management. The first objective indicated in the UNCCD 2018–2030 Strategic Framework is to improve the conditions of affected ecosystems, combat desertification/land degradation and promote sustainable land management [1]. For each country, the commitment is to achieve no net loss of land-based natural capital by 2030 [2]. No net loss means that the quantity and quality of land-based natural capital are maintained or increased, despite the impacts of global environmental change, whether due to human or natural causes. Land degradation is monitored through the changes of the values of a specific set of consistently measured indicators from their baseline quantities, conventionally identified as their initial values. The deviations from the baseline values of these indicators are the basis for monitoring land degradation.

Soil Organic Carbon (SOC) is one key indicator of land degradation [3]. Monitoring the SOC stocks and the loss of soil organic carbon due to land use changes is fundamental for maintaining the physical, chemical and biological quality of soil [4]. Soil organic carbon positively affects soil functions with regard to habitats, biological diversity, soil fertility, crop production potential, erosion control and water retention. A high SOC content improves the processes of soil formation, nutrient storage, water holding capacity and the absorption of organic or inorganic pollutants. Thus, a reduction of the SOC stock not only indicates soil degradation, but may also have potential negative effects on soil-derived ecosystem services.

**Citation:** Diele, F.; Marangi, C.; Martiradonna, A. Non-Standard Discrete RothC Models for Soil Carbon Dynamics. *Axioms* **2021**, *10*, 56. https://doi.org/10.3390/axioms10020056

Academic Editors: Ioannis Dassios and Clemente Cesarano

Received: 5 February 2021; Accepted: 17 March 2021; Published: 8 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Starting from the SOC baseline, predictive spatial modelling can simulate the carbon dynamics, estimate carbon sequestration under the actual land use and evaluate the

deviation from the baseline average value of total carbon [5]. Moreover, a dynamical model may determine the optimal potential land use pattern that is suitable to achieve land degradation neutrality in terms of SOC indicator values. Well-validated models, such as Rothamsted Carbon (RothC) [6], CENTURY [7] and MOMOS [8], which take into account the interactions among the climate, pedology, cropping systems and soil and crop management, can be used to predict SOC changes under different management practices and climatic conditions. These models are essentially compartmental, meaning that they represent soil organic matter as a few discrete compartments (generally two to five) characterized by different chemical characteristics of the soil's degradation. The decomposition rates, applied to each compartment, are governed by kinetic and stoichiometric laws and are mainly ruled by the environmental conditions (e.g., soil moisture level, aeration and soil temperature). These models are used in a variety of ways and often for long-term studies. Indeed, being able to compute and predict long-term solutions is extremely valuable for various reasons: it gives a synthetic view of the system in the given agro-climatic conditions; it makes it possible to test if a studied soil has reached equilibrium or not; and it allows envisioning what would be the consequences of specific events on a given soil [9]. In this paper, we compared continuous and discrete versions of the RothC model, especially regarding long-term solutions. Moreover, since the discrete RothC version can be interpreted as a first-order approximation of the continuous model, we introduced a non-standard discrete approximation that can be interpreted as a novel discrete model of soil carbon dynamics. 
The original discrete formulation of the RothC model was then compared to the novel non-standard integrator, which represents a different approach with respect to the Exponential Rosenbrock–Euler (ERE) discretization [10].

#### **2. Rothamsted Carbon Model—Continuous Formulation**

The SOC indicator is considered to be the result of the equilibrium between the inputs and outputs of the soil system. SOC contained in the organic matter is constantly built up and decomposed and is then released into the atmosphere as CO2 and recaptured through photosynthesis. Inputs in the soil organic matter decomposition model consist of two major components: living organisms' biomass (mainly plant roots and microbial biomass) and plant and animal residues at various stages of decomposition. Outputs result from the heterotrophic respiration processes when soil organic carbon is used as an energy source by soil organisms and returned to the atmosphere as CO2 fluxes. The Rothamsted Carbon model [6,11] (RothC) is a model of carbon turnover in non-waterlogged soils [12]. Initially developed for arable soils, it was later expanded to grasslands and forests. It takes into account the effects of temperature, moisture content and soil type. The RothC model divides the soil carbon into four active compartments and one inactive compartment, characterized by different chemical decomposition rates of degradability (see Figure 1). The Inert Organic Matter (IOM) represents the inactive pool, resistant to decomposition, which does not receive carbon (C) inputs. At each time step, incoming plant residues are split between easily Decomposable Plant Material (DPM) and Resistant Plant Material (RPM), depending on the ratio $\gamma/(1-\gamma)$, which estimates the decomposability of the particular plant material inputs, which in turn depends on the specific cultivation being considered. The fraction $2\eta$ of the input of Farmyard Manure (FYM), if any, is equally split between the DPM and RPM compartments; the remaining part $1-2\eta$ enters the system directly as Humified organic matter (HUM). Both DPM and RPM decompose to form CO2, microbial Biomass (BIO) and more HUM.
The fraction $\alpha+\beta$ of metabolised C incorporated into the sum of compartments BIO+HUM is determined by the clay content of the soil, while the remaining part $1-\alpha-\beta$ is released as CO2 and lost by the system. The BIO+HUM carbon content is then split into a fraction $\alpha/(\alpha+\beta)$ of BIO and $\beta/(\alpha+\beta)$ of HUM. Finally, both BIO and HUM decompose to form more CO2, BIO and HUM.

**Figure 1.** Flow diagram of the Rothamsted Carbon (RothC) model. FYM, Farmyard Manure; DPM, Decomposable Plant Material; RPM, Resistant Plant Material; BIO, microbial Biomass; HUM, Humified organic matter; IOM, Inert Organic Matter.
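The partitioning arithmetic described above can be sketched in a few lines; the numerical values of $\gamma$, $\eta$, $\alpha$, $\beta$ below are illustrative placeholders, not RothC calibrations:

```python
# Sketch of one RothC partitioning step: how a plant-residue input g and a
# farmyard-manure input f are distributed over the compartments, and how
# metabolised carbon is split between BIO+HUM and the CO2 loss.
gamma, eta = 0.6, 0.3          # illustrative input fractions (0<=gamma<=1, 0<=eta<=1/2)
alpha, beta = 0.2, 0.3         # illustrative retention fractions (alpha+beta<=1)

def partition_inputs(g, f):
    """Return the carbon added to (DPM, RPM, BIO, HUM) by inputs g and f."""
    dpm = gamma * g + eta * f          # decomposable plant material
    rpm = (1 - gamma) * g + eta * f    # resistant plant material
    hum = (1 - 2 * eta) * f            # FYM share entering HUM directly
    return (dpm, rpm, 0.0, hum)

def split_metabolised(c):
    """Split an amount c of metabolised carbon into (BIO, HUM, CO2 lost)."""
    retained = (alpha + beta) * c
    bio = alpha / (alpha + beta) * retained
    hum = beta / (alpha + beta) * retained
    return bio, hum, c - retained

added = partition_inputs(g=1.0, f=0.5)
bio, hum, co2 = split_metabolised(1.0)
```

Note that the carbon entering the soil is conserved across the compartments, while the metabolised carbon splits into a retained part ($\alpha+\beta$) and a CO2 loss ($1-\alpha-\beta$), exactly as in the flow diagram.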

The four active compartments undergo decomposition as a function of different rate constants, which correspond to the entries of the vector **k** = [*kdpm*, *krpm*, *kbio*, *khum*] and of the rate modifier *ρ*(*t*), which depends on the clay content of the soil, on climatic variables (rainfall, temperature, open pan evaporation) and land cover.

In real soil systems, processes involved in the RothC model are continuous in time, and thus, in [10], the author proposed the following continuous formulation:

$$
\dot{\mathbf{c}} = \rho(t)\,A\,\mathbf{c} + \mathbf{b}(t), \qquad \mathbf{c}(t_0) = \mathbf{c}_0 \ge 0 \tag{1}
$$

where **c**(*t*) = [*c<sub>dpm</sub>*(*t*), *c<sub>rpm</sub>*(*t*), *c<sub>bio</sub>*(*t*), *c<sub>hum</sub>*(*t*)]*<sup>T</sup>* and **c**<sub>0</sub> ≥ 0 denotes the vector of the initial concentrations. The matrix *A* is given by:

$$A = \begin{pmatrix} -k_{dpm} & 0 & 0 & 0 \\ 0 & -k_{rpm} & 0 & 0 \\ \alpha\,k_{dpm} & \alpha\,k_{rpm} & (\alpha-1)\,k_{bio} & \alpha\,k_{hum} \\ \beta\,k_{dpm} & \beta\,k_{rpm} & \beta\,k_{bio} & (\beta-1)\,k_{hum} \end{pmatrix}.$$

The vector **b**(*t*) represents the carbon amount entering the system at time *t*. It considers both the input of plant residues *g*(*t*) **a**(*g*) and the input of FYM *f*(*t*) **a**(*f*), so that:

$$\mathbf{b}(t) := g(t)\,\mathbf{a}^{(g)} + f(t)\,\mathbf{a}^{(f)}.$$

The entries of the vectors **a**(*g*) := [*γ*, 1 − *γ*, 0, 0]*<sup>T</sup>* and **a**(*f*) := [*η*, *η*, 0, 1 − 2*η*]*<sup>T</sup>* are the input fractions, with 0 ≤ *γ* ≤ 1 and 0 ≤ *η* ≤ 1/2, so that the entries of each vector sum up to one; the carbon of plant residues enters the soil only through the DPM and RPM compartments, while the carbon amount of FYM enters through the DPM, RPM and HUM compartments.
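As a concrete illustration (our own sketch, not part of the paper), the matrix *A* and the input vector **b** can be assembled as follows; the parameter values are those used later in the numerical tests, while the constant input rates `g` and `f` are hypothetical:

```python
import numpy as np

# Assemble the RothC matrix A and input vector b of the continuous model (1).
# Parameter values from the numerical tests; g and f are illustrative only.
alpha, beta, gamma, eta = 0.1, 0.12, 0.59, 0.49
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])  # [k_dpm, k_rpm, k_bio, k_hum]
kd, kr, kb, kh = k

A = np.array([
    [-kd,       0.0,       0.0,          0.0],
    [0.0,       -kr,       0.0,          0.0],
    [alpha*kd,  alpha*kr,  (alpha-1)*kb, alpha*kh],
    [beta*kd,   beta*kr,   beta*kb,      (beta-1)*kh],
])

a_g = np.array([gamma, 1 - gamma, 0.0, 0.0])    # plant-residue split
a_f = np.array([eta, eta, 0.0, 1 - 2*eta])      # FYM split
g, f = 1.0, 0.5                                 # hypothetical constant inputs
b = g * a_g + f * a_f                           # carbon input vector b

# Each column of A sums to -delta * k_i, i.e. e^T A c = -delta k^T c.
delta = 1 - alpha - beta
print(np.allclose(A.sum(axis=0), -delta * k))
```

The column-sum identity is exactly the mass-balance relation used in the proof of Theorem 1 below.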

It can be shown that, under the hypotheses 0 < *α* + *β* ≤ 1 and *ρ*(*t*) > 0, the solution of the homogeneous part of the RothC system (1) with non-negative initial data verifies the more general assumptions stated in [13,14] for biochemical systems, which ensure the well-posedness and positivity of the solution. Comparison theorems guarantee the positivity of the solution of the complete RothC system when *g*(*t*), *f*(*t*) are assumed positive.

**Definition 1.** *We define as the SOC indicator of the continuous RothC model (1) the function SOC*(*t*) = *ciom* + *cdpm*(*t*) + *crpm*(*t*) + *cbio*(*t*) + *chum*(*t*) *for t* ≥ *t*0*, where ciom denotes the constant carbon content in the compartment IOM.*

**Theorem 1.** *Let α*, *β* > 0 *and δ* := 1 − *α* − *β* > 0*; suppose that* 0 < *g*(*t*) + *f*(*t*) ≤ *B and that ρ*(*t*) *is uniformly bounded from below by a constant μ* > 0*. The set:*

$$\Omega = \left\{ (c_{dpm}, c_{rpm}, c_{bio}, c_{hum}) \in \mathbb{R}_+^4 \,:\, 0 \le c_{dpm} + c_{rpm} + c_{bio} + c_{hum} \le \frac{B}{\mu\,\delta\,k_{min}} \right\}$$

*with k<sub>min</sub>* = min<sub>*i*</sub> **k**(*i*) *is positively invariant and globally attractive for Model (1).*

**Proof.** By denoting *ω*(*t*) = *SOC*(*t*) − *c<sub>iom</sub>*, from System (1), and recalling that **e***<sup>T</sup>* **a**(*g*) = **e***<sup>T</sup>* **a**(*f*) = 1, we can show that:

$$\dot{\omega} = \mathbf{e}^T\,\dot{\mathbf{c}} = \rho(t)\,\mathbf{e}^T A\,\mathbf{c} + g\,\mathbf{e}^T\mathbf{a}^{(g)} + f\,\mathbf{e}^T\mathbf{a}^{(f)} = g(t) + f(t) - \rho(t)\,\delta\,\mathbf{k}^T\,\mathbf{c} \tag{2}$$

where **e** = [1, 1, 1, 1] *<sup>T</sup>*. Consequently,

$$
\dot{\omega} \le g(t) + f(t) - \rho(t)\,\delta\,k_{min}\,\omega \le B - \mu\,\delta\,k_{min}\,\omega
$$

so that:

$$
\omega(t) \le \frac{B}{\mu\,\delta\,k_{min}} - \left[\frac{B}{\mu\,\delta\,k_{min}} - \omega(t_0)\right] e^{-\mu\,\delta\,k_{min}\,(t-t_0)}.
$$

Therefore, if *ω*(*t*<sub>0</sub>) ≤ *B*/(*μ δ k<sub>min</sub>*), then *ω*(*t*) ≤ *B*/(*μ δ k<sub>min</sub>*), so that Ω results in a positively invariant set for Model (1). Moreover, if *ω*(*t*<sub>0</sub>) ≥ *B*/(*μ δ k<sub>min</sub>*), then lim<sub>*t*→∞</sub> *ω*(*t*) ≤ *B*/(*μ δ k<sub>min</sub>*) and Ω is globally attractive for Model (1).
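The bound of Theorem 1 can be checked numerically; the following sketch (ours, with assumed constant forcing so that *μ* = *ρ* and *B* = *g* + *f*) integrates the system from zero initial data and verifies that *ω* = **e***<sup>T</sup>***c** never leaves Ω:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical check of Theorem 1: omega(t) stays below B / (mu delta k_min).
alpha, beta, gamma, eta = 0.1, 0.12, 0.59, 0.49
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
delta = 1 - alpha - beta
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
A = (Lam - np.eye(4)) @ np.diag(k)               # A = (Lambda - I) D
rho, g, f = 1.0, 1.0, 0.5                        # constant forcing, mu = rho
b = g * np.array([gamma, 1-gamma, 0, 0]) + f * np.array([eta, eta, 0, 1-2*eta])
B, mu, kmin = g + f, rho, k.min()
bound = B / (mu * delta * kmin)

sol = solve_ivp(lambda t, c: rho * A @ c + b, (0.0, 200.0), np.zeros(4),
                rtol=1e-8, atol=1e-10)
omega = sol.y.sum(axis=0)                        # omega = e^T c
print(omega.max() <= bound)                      # trajectory stays inside Omega
```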

Under the hypothesis of Theorem 1, the set:

$$
\Omega_{SOC} = \left\{ SOC \in \mathbb{R}_+ \,:\, c_{iom} \le SOC \le c_{iom} + \frac{B}{\mu\,\delta\,k_{min}} \right\},
$$

is globally attractive for the SOC model:

$$\dot{SOC} = \mathbf{e}^T\dot{\mathbf{c}} = \rho(t)\,\mathbf{e}^T A\,\mathbf{c} + g(t) + f(t). \tag{3}$$

Finally, to complete the understanding of the mathematical features of the RothC model, we provide the following theorem:

**Theorem 2.** *Suppose g*(*t*) *and f*(*t*) *are integrable on every finite subinterval of* [*t*<sub>0</sub>, +∞)*. If α* + *β* = 1*, then SOC*(*t*) = *SOC*(*t*<sub>0</sub>) + $\int_{t_0}^{t}(g(s)+f(s))\,ds$*.*

**Proof.** Under the assumption *α* + *β* = 1, the unit vector **e** ∈ ker(*A<sup>T</sup>*). We have $\dot{SOC} = \mathbf{e}^T\dot{\mathbf{c}} = \rho\,\mathbf{e}^T A\,\mathbf{c} + g\,\mathbf{e}^T\mathbf{a}^{(g)} + f\,\mathbf{e}^T\mathbf{a}^{(f)} = g + f$.

#### *Long-Term Solutions*

When the RothC model is used in real applications, the first step is to run the model to equilibrium to calculate the carbon inputs needed to match the initial measured SOC content. Hence, being able to compute and predict the long-term solution is extremely valuable in order to avoid long-run simulations, which can lead to numerical artefacts if numerical tools are not properly used [15].

We first consider the (unrealistic) case in which no CO2 is released (i.e., *α* + *β* = 1). Theorem 2 indicates that, in this case, the SOC indicator simply accumulates the whole carbon input over time. In the realistic case *α* + *β* < 1, the following holds:


**Theorem 3.** *Suppose α*, *β*, *δ* > 0 *with α* + *β* < 1 *and* **k**(*i*) > 0*, i* = 1, ... , 4*. For ρ*, *g*, *f not varying with time, the RothC model admits a unique positive globally stable equilibrium, which has the following expression:*

$$\mathbf{c}_{cont}^{*} = \frac{1}{\rho}\left[\frac{g\,\gamma + f\,\eta}{k_{dpm}},\; \frac{g\,(1-\gamma) + f\,\eta}{k_{rpm}},\; \frac{(g+f)\,\alpha}{k_{bio}\,\delta},\; \frac{f\,(1-\alpha)(1-2\eta) + \beta\,(g+2f\eta)}{k_{hum}\,\delta}\right]^{T}. \tag{4}$$

Consequently, the *SOC* indicator has *SOC*<sup>∗</sup><sub>*cont*</sub> = *c<sub>iom</sub>* + **e***<sup>T</sup>* **c**<sup>∗</sup><sub>*cont*</sub> as the equilibrium, which satisfies *SOC*<sup>∗</sup><sub>*cont*</sub> ≤ *c<sub>iom</sub>* + (*f* + *g*)/(*ρ k<sub>min</sub> δ*).
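A small sketch (parameter values from the numerical tests; the constant inputs `g` and `f` are assumed) confirms that the closed-form equilibrium (4) coincides with the direct linear-algebra solution of Theorem 3 and respects the SOC bound:

```python
import numpy as np

# Compare the closed-form equilibrium (4) with c* = -(1/rho) A^{-1} b.
alpha, beta, gamma, eta = 0.1, 0.12, 0.59, 0.49
kd, kr, kb, kh = k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
delta = 1 - alpha - beta
rho, g, f = 1.0, 1.0, 0.5                          # assumed constant forcing

Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
A = (Lam - np.eye(4)) @ np.diag(k)
b = g * np.array([gamma, 1-gamma, 0, 0]) + f * np.array([eta, eta, 0, 1-2*eta])

c_solve = -np.linalg.solve(A, b) / rho             # equation (5)
c_closed = np.array([                              # equation (4), entrywise
    (g*gamma + f*eta) / kd,
    (g*(1-gamma) + f*eta) / kr,
    (g + f) * alpha / (kb * delta),
    (f*(1-alpha)*(1-2*eta) + beta*(g + 2*f*eta)) / (kh * delta),
]) / rho
soc_bound = (f + g) / (rho * k.min() * delta)      # bound on e^T c*_cont
print(np.allclose(c_solve, c_closed), c_solve.sum() <= soc_bound)
```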

**Proof.** From the assumptions, it follows that 0 < *α*, *β* < 1. The eigenvalues of the matrix *A* are given by *λ*<sub>1</sub> = −*k<sub>dpm</sub>* < 0, *λ*<sub>2</sub> = −*k<sub>rpm</sub>* < 0, and *λ*<sub>3,4</sub> are the roots of the second-order polynomial:

$$
\lambda^2 + \lambda\left(k_{bio}\,(1-\alpha) + k_{hum}\,(1-\beta)\right) + \delta\,k_{bio}\,k_{hum}
$$

with discriminant Δ = (*α* − 1)<sup>2</sup> *k*<sup>2</sup><sub>*bio*</sub> + (*β* − 1)<sup>2</sup> *k*<sup>2</sup><sub>*hum*</sub> + 2 *k<sub>bio</sub> k<sub>hum</sub>*(*α*(*β* + 1) + *β* − 1) > 0. Consequently, from Descartes' rule of signs, *λ*<sub>3</sub>, *λ*<sub>4</sub> < 0. The matrix *A* turns out to be negative definite and admits an inverse. It is trivial to check that det(*A*) = *δ* ∏<sup>4</sup><sub>*i*=1</sub> **k**(*i*) > 0. For constant (positive) values of *ρ*, *g*, *f*, the equilibrium is achieved by solving the linear system *ρ A* **c** = −**b**, i.e.,

$$\mathbf{c}_{cont}^{*} = -\frac{1}{\rho}\,A^{-1}\mathbf{b}, \tag{5}$$

with:

$$-A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} \delta\,k_{2,3,4} & 0 & 0 & 0 \\ 0 & \delta\,k_{1,3,4} & 0 & 0 \\ \alpha\,k_{1,2,4} & \alpha\,k_{1,2,4} & (1-\beta)\,k_{1,2,4} & \alpha\,k_{1,2,4} \\ \beta\,k_{1,2,3} & \beta\,k_{1,2,3} & \beta\,k_{1,2,3} & (1-\alpha)\,k_{1,2,3} \end{pmatrix}$$

where we adopted the convention that *k<sub>i,j,m</sub>* := **k**(*i*) **k**(*j*) **k**(*m*) with *i*, *j*, *m* ∈ {1, 2, 3, 4}.

Finally, the stability of the equilibrium is guaranteed as *A* is negative definite.

As concerns the equilibrium of the *SOC* indicator, first notice that, from (2), it results that **k***<sup>T</sup>* **c**<sup>∗</sup><sub>*cont*</sub> = (*f* + *g*)/(*ρ δ*); then,

$$SOC_{cont}^{*} = c_{iom} + \mathbf{e}^{T}\mathbf{c}_{cont}^{*} \le c_{iom} + \frac{1}{k_{min}}\mathbf{k}^{T}\mathbf{c}_{cont}^{*} = c_{iom} + \frac{f+g}{k_{min}\,\rho\,\delta}.$$

When climatic and agricultural variables are considered, the functions *ρ*(*t*), *g*(*t*) and *f*(*t*) are chosen to be time varying on a periodical basis, and the system defined by **c**(*t*) is expected to tend toward an oscillatory state as *t* → +∞. If we introduce *ξ*(*t*) := $\int_{t_0}^{t}\rho(s)\,ds$ for all *t* ≥ *t*<sub>0</sub>, then the solution of (1) is given by:

$$\mathbf{c}(t) = e^{\xi(t)\,A}\,\mathbf{c}_0 + e^{\xi(t)\,A}\int_{t_0}^{t} e^{-\xi(s)\,A}\,\mathbf{b}(s)\,ds. \tag{6}$$

The study of the eigenvalues of *ξ*(*t*) *A* enables characterizing the solution behaviour. We can first observe that, if the eigenvalues of *ξ*(*t*) *A* are negative, **c**(*t*) as *t* → +∞ does not depend on the initial conditions **c**<sub>0</sub>. Secondly, if there exists a periodic solution **c**(*t*) with period *T*, then **c**(*t*<sub>0</sub> + *T*) = **c**<sub>0</sub>, and the following theorem holds:

**Theorem 4.** *Assume that ρ*(*t*)*, g*(*t*) *and f*(*t*) *are periodic with period T. If α* + *β* ≠ 1 *and* **k**(*i*) ≠ 0 *for all i* ∈ {*dpm*, *rpm*, *bio*, *hum*}*, then I* − *e*<sup>*ξ*(*t*<sub>0</sub>+*T*) *A*</sup> *is not singular. Starting from:*

$$\mathbf{c}_0 := \left(I - e^{\xi(t_0+T)\,A}\right)^{-1} e^{\xi(t_0+T)\,A}\int_{t_0}^{t_0+T} e^{-\xi(s)\,A}\,\mathbf{b}(s)\,ds$$

*the RothC model admits a (unique) periodic solution.*

**Proof.** From the periodicity of *ρ*(*t*), it follows that:

$$
\xi(t+T) = \int_{t_0}^{t+T}\rho(s)\,ds = \int_{t_0}^{t_0+T}\rho(s)\,ds + \int_{t_0+T}^{t+T}\rho(s)\,ds = \xi(t_0+T) + \xi(t).
$$

By imposing in (6) that **c**(*t*<sup>0</sup> + *T*) = **c**0, it follows that:

$$\mathbf{c}_0 = e^{\xi(t_0+T)\,A}\left(\mathbf{c}_0 + \int_{t_0}^{t_0+T} e^{-\xi(s)\,A}\,\mathbf{b}(s)\,ds\right). \tag{7}$$

Under the assumed hypothesis, *A* is not singular, and consequently, the matrix *I* − *e*<sup>*ξ*(*t*<sub>0</sub>+*T*) *A*</sup> is invertible. Exploiting the periodicity of **b**(*t*), inherited from the periodicity of both *g*(*t*) and *f*(*t*), we have that:

$$\begin{aligned}
\mathbf{c}(t+T) &= e^{\xi(t+T)A}\,\mathbf{c}_0 + e^{\xi(t+T)A}\int_{t_0}^{t+T} e^{-\xi(s)A}\,\mathbf{b}(s)\,ds \\
&= e^{\xi(t)A}\,e^{\xi(t_0+T)A}\left(\mathbf{c}_0 + \int_{t_0}^{t_0+T} e^{-\xi(s)A}\,\mathbf{b}(s)\,ds\right) + e^{\xi(t)A}\,e^{\xi(t_0+T)A}\int_{t_0+T}^{t+T} e^{-\xi(s)A}\,\mathbf{b}(s)\,ds \\
&= e^{\xi(t)A}\,\mathbf{c}_0 + e^{\xi(t)A}\,e^{\xi(t_0+T)A}\int_{t_0}^{t} e^{-\xi(s+T)A}\,\mathbf{b}(s)\,ds \\
&= e^{\xi(t)A}\,\mathbf{c}_0 + e^{\xi(t)A}\int_{t_0}^{t} e^{-\xi(s)A}\,\mathbf{b}(s)\,ds = \mathbf{c}(t).
\end{aligned}$$

This proves the periodicity of **c**(*t*).
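The fixed-point formula of Theorem 4 can be tested numerically; the following sketch (our own, with an assumed sinusoidal *ρ*(*t*) and a constant plant-residue input, taking *t*<sub>0</sub> = 0) computes **c**<sub>0</sub> by quadrature and verifies that integrating (1) over one period returns to **c**<sub>0</sub>:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

alpha, beta = 0.1, 0.12
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
A = (Lam - np.eye(4)) @ np.diag(k)               # A = (Lambda - I) D
T = 12.0
rho = lambda t: 1.0 + 0.5 * np.sin(2 * np.pi * t / T)           # assumed rho(t)
# xi(t) = int_0^t rho(s) ds, available in closed form for this rho
xi = lambda t: t + 0.5 * (T / (2 * np.pi)) * (1 - np.cos(2 * np.pi * t / T))
b = np.array([0.59, 0.41, 0.0, 0.0])             # constant input vector b

# c_0 = (I - e^{xi(T)A})^{-1} e^{xi(T)A} int_0^T e^{-xi(s)A} b ds
s = np.linspace(0.0, T, 4001)
vals = np.array([expm(-xi(si) * A) @ b for si in s])
h = s[1] - s[0]
integral = h * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)  # trapezoid
E = expm(xi(T) * A)
c0 = np.linalg.solve(np.eye(4) - E, E @ integral)

sol = solve_ivp(lambda t, c: rho(t) * A @ c + b, (0.0, T), c0,
                rtol=1e-10, atol=1e-12)
print(np.allclose(sol.y[:, -1], c0, rtol=1e-4, atol=1e-6))   # c(T) = c_0
```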

The condition *α* + *β* < 1 is always true for the RothC model due to the definition of *α* and *β* (see [6]); thus, the stock of carbon in each compartment tends towards a periodic solution for large times, whatever the input values are.

Although the continuous approach (1) gives a simple explicit solution as a function of both the input variables and the parameters of the model, in real applications, first-order discretized versions are applied. We analyse the original discrete formulation of the RothC model, the Exponential Rosenbrock–Euler (ERE) version and a novel first-order non-standard procedure, closer to the classical discrete RothC procedure than the ERE model.

#### **3. Non-Standard RothC Discrete Models**

*3.1. Original Discrete RothC*

By denoting with *I* the identity matrix and with

$$\Lambda = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \alpha & \alpha & \alpha & \alpha \\ \beta & \beta & \beta & \beta \end{pmatrix}, \quad D = \begin{pmatrix} k_{dpm} & 0 & 0 & 0 \\ 0 & k_{rpm} & 0 & 0 \\ 0 & 0 & k_{bio} & 0 \\ 0 & 0 & 0 & k_{hum} \end{pmatrix},$$

the discrete (monthly) formulation of the RothC Version 26.3 [11] in vectorial form is given by:

$$\mathbf{c}_{n+1} = \left(\Lambda + (I - \Lambda)\,e^{-\Delta t\,\rho(t_n)\,D}\right)\mathbf{c}_n + \Delta t\,\mathbf{b}(t_n) \tag{8}$$

where **c**<sub>*n*</sub> ≈ **c**(*t<sub>n</sub>*) and the discrete temporal grid *t<sub>n</sub>* = *t*<sub>0</sub> + *n* Δ*t* advances with stepsize Δ*t* = 1 (month).
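A minimal sketch of the monthly stepping rule (8) (ours, not the official RothC code; the forcing values are illustrative) reads:

```python
import numpy as np

# One step of the discrete RothC recurrence (8); rho_n and b_n are the monthly
# rate modifier and carbon input, here taken as hypothetical constants.
def rothc_step(c, rho_n, b_n, dt, alpha, beta, k):
    Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
    E = np.diag(np.exp(-dt * rho_n * k))       # e^{-dt rho(t_n) D}, D diagonal
    F = Lam + (np.eye(4) - Lam) @ E            # propagation matrix F_n
    return F @ c + dt * b_n

k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
b_n = np.array([0.59, 0.41, 0.0, 0.0])         # illustrative monthly input
c = np.zeros(4)
for _ in range(120):                           # ten years of monthly steps
    c = rothc_step(c, 1.0, b_n, 1.0, 0.1, 0.12, k)
print((c >= 0).all())                          # concentrations stay non-negative
```

Since *D* is diagonal, the matrix exponential reduces to entrywise exponentials, which is what makes the original discrete scheme so cheap.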

Discrete RothC has been applied using data from long-term experiments across several ecosystems, climate conditions and Land Use (LU) classes. It has been extensively applied in Europe for SOC modelling, and applications of RothC to a long-term experiment in semi-arid conditions in Italy can be found in [16,17].

In the case when *g*(*t*), *f*(*t*) and *ρ*(*t*) do not depend on time *t*, then **b** = *g* **a**(*g*) + *f* **a**(*f*); by setting *F*(Δ*t*) = Λ + (*I* − Λ)*e*<sup>−Δ*t ρ D*</sup>, one can demonstrate that, whenever **k**(*i*) ≠ 0, (*I* − *F*(Δ*t*)) has an inverse, and the system yields a steady-state solution.

**Theorem 5.** *The steady-state solution* **c**∗ *cont in (5) of the continuous model* (1) *and the steady-state solution* **c**∗ *RothC*(Δ*t*) *of the discrete RothC model (8) satisfy the following relation:*

$$\mathbf{c}_{cont}^{*} = \varphi(-\Delta t\,\rho\,D)\,\mathbf{c}_{RothC}^{*}(\Delta t)$$

*where <sup>ϕ</sup>*(*z*) = *<sup>z</sup>*−1(*e<sup>z</sup>* <sup>−</sup> <sup>1</sup>)*.*

**Proof.** Firstly, evaluate *I* − *F*(Δ*t*) = (Λ − *I*)(*e*<sup>−Δ*t ρ D*</sup> − *I*). From **c**<sup>∗</sup><sub>*RothC*</sub> = *F*(Δ*t*) **c**<sup>∗</sup><sub>*RothC*</sub> + Δ*t* **b**, one can write:

$$\begin{aligned}\mathbf{c}_{RothC}^{*}(\Delta t) &= \Delta t\,(I - F(\Delta t))^{-1}\,\mathbf{b} \\ &= \Delta t\,(e^{-\Delta t\,\rho\,D} - I)^{-1}\,(\Lambda - I)^{-1}\,\mathbf{b}\end{aligned} \tag{9}$$

From the definition of **c**∗ *cont* in (5) and by noticing that the matrix *A* that defines the continuous problem (1) verifies the relation *A* = (Λ − *I*) *D*, it follows that:

$$\mathbf{c}_{RothC}^{*}(\Delta t) = \varphi(-\Delta t\,\rho\,D)^{-1}\,\mathbf{c}_{cont}^{*}$$

As the original discrete RothC model is applied with Δ*t* = 1, we can estimate the deviation of the equilibrium of the continuous model with respect to the equilibrium **c**<sup>∗</sup><sub>*RothC*</sub>(1) of the original discrete RothC model. Given the matrix norm ‖·‖ induced by a vector norm, as −*ρ D* is negative definite, it results that ‖*ϕ*(−*ρ D*)‖ ≤ ‖*I*‖ = 1 and ‖**c**<sup>∗</sup><sub>*cont*</sub>‖ ≤ ‖*ϕ*(−*ρ D*)‖ ‖**c**<sup>∗</sup><sub>*RothC*</sub>(1)‖ ≤ ‖**c**<sup>∗</sup><sub>*RothC*</sub>(1)‖. This indicates that the equilibrium **c**<sup>∗</sup><sub>*RothC*</sub>(1) is, in norm, an overestimation of the theoretical equilibrium **c**<sup>∗</sup><sub>*cont*</sub>. The relative error depends on the stepsize Δ*t* according to:

$$\frac{\|\mathbf{c}\_{Roth\mathbb{C}}^{\*}(\Delta t) - \mathbf{c}\_{cont}^{\*}\|}{\|\mathbf{c}\_{cont}^{\*}\|} \le \|\varphi(-\Delta t \,\boldsymbol{\rho} \,\boldsymbol{D})^{-1} - I\|. \tag{10}$$

As concerns the equilibrium of the *SOC* indicator, it results that:

$$SOC_{RothC}^{*}(\Delta t) = c_{iom} + \mathbf{e}^{T}\mathbf{c}_{RothC}^{*}(\Delta t) = c_{iom} + \mathbf{e}^{T}\,\varphi(-\Delta t\,\rho\,D)^{-1}\,\mathbf{c}_{cont}^{*}.$$

and consequently,

$$SOC_{RothC}^{*}(\Delta t) - SOC_{cont}^{*} = \mathbf{e}^{T}\left(\varphi(-\Delta t\,\rho\,D)^{-1} - I\right)\mathbf{c}_{cont}^{*} = \mathbf{w}^{T}(\Delta t)\,\mathbf{c}_{cont}^{*}$$

where:

$$\mathbf{w}(\Delta t) = \left[\frac{1-\varphi(-\Delta t\,\rho\,k_{dpm})}{\varphi(-\Delta t\,\rho\,k_{dpm})}, \frac{1-\varphi(-\Delta t\,\rho\,k_{rpm})}{\varphi(-\Delta t\,\rho\,k_{rpm})}, \frac{1-\varphi(-\Delta t\,\rho\,k_{bio})}{\varphi(-\Delta t\,\rho\,k_{bio})}, \frac{1-\varphi(-\Delta t\,\rho\,k_{hum})}{\varphi(-\Delta t\,\rho\,k_{hum})}\right] \tag{11}$$

a vector with positive entries that verifies lim<sub>Δ*t*→0</sub> **w**(Δ*t*) = **0**.

Since the original discrete RothC model is applied with Δ*t* = 1, the deviation of the *SOC*<sup>∗</sup><sub>*cont*</sub> of the continuous model with respect to the equilibrium *SOC*<sup>∗</sup><sub>*RothC*</sub>(1) of the original discrete RothC model is given by *SOC*<sup>∗</sup><sub>*RothC*</sub>(1) = *SOC*<sup>∗</sup><sub>*cont*</sub> + **w***<sup>T</sup>*(1) **c**<sup>∗</sup><sub>*cont*</sub>, thus indicating that the evaluation of soil organic carbon by means of the discrete RothC model is an overestimation of the theoretical value *SOC*<sup>∗</sup><sub>*cont*</sub>.
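This overestimation can be verified directly; the sketch below (ours, with assumed constant forcing) computes both equilibria, the weight vector **w**(1) of (11), and checks that the SOC gap is positive and equals **w***<sup>T</sup>*(1) **c**<sup>∗</sup><sub>*cont*</sub>:

```python
import numpy as np

# Verify SOC*_RothC(1) = SOC*_cont + w(1)^T c*_cont, with phi(z) = (e^z - 1)/z.
alpha, beta, gamma, eta = 0.1, 0.12, 0.59, 0.49
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
rho, g, f, dt = 1.0, 1.0, 0.5, 1.0                 # assumed constant forcing
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
A = (Lam - np.eye(4)) @ np.diag(k)
b = g * np.array([gamma, 1-gamma, 0, 0]) + f * np.array([eta, eta, 0, 1-2*eta])

c_cont = -np.linalg.solve(A, b) / rho              # continuous equilibrium (5)

# Discrete RothC equilibrium: c* = (I - F)^{-1} dt b
F = Lam + (np.eye(4) - Lam) @ np.diag(np.exp(-dt * rho * k))
c_rothc = np.linalg.solve(np.eye(4) - F, dt * b)

phi = lambda z: (np.exp(z) - 1) / z                # scalar phi, applied entrywise
w = (1 - phi(-dt * rho * k)) / phi(-dt * rho * k)  # equation (11)
gap = c_rothc.sum() - c_cont.sum()                 # SOC* gap (c_iom cancels)
print(np.isclose(gap, w @ c_cont), gap > 0)        # overestimation confirmed
```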

Usually, *ρ*, *g* and *f* vary through time, but it can be assumed that they have a periodic behaviour. Typically, if the agricultural practices are cyclic and if the weather conditions can be considered periodic, then *ρ* = *ρ*(*t*), *g* = *g*(*t*) and *f* = *f*(*t*) will also behave periodically. Assuming that the periodicity of these variables is *T* = *N*Δ*t*, one looks for a solution of **c** such that **c**<sup>0</sup> = **c***N*. Then, we can write:

$$\mathbf{c}_{n+1} = \left(\Lambda + (I - \Lambda)\,e^{-\Delta t\,\rho(t_n)\,D}\right)\mathbf{c}_n + \Delta t\,\mathbf{b}(t_n),$$

for *n* = 0, . . . *N* − 2 and then impose the periodic condition **c***<sup>N</sup>* = **c**0:

$$\mathbf{c}_0 = \left(\Lambda + (I - \Lambda)\,e^{-\Delta t\,\rho(t_{N-1})\,D}\right)\mathbf{c}_{N-1} + \Delta t\,\mathbf{b}(t_{N-1}).$$

Setting *F<sub>n</sub>* := *F<sub>n</sub>*(Δ*t*) := Λ + (*I* − Λ)*e*<sup>−Δ*t ρ*(*t<sub>n</sub>*)*D*</sup>, the above relations can be reformulated as:

$$
\begin{pmatrix} 0 & I & 0 & \dots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \dots & \dots & 0 & I \\ I & 0 & \dots & \dots & 0 \end{pmatrix} \begin{pmatrix} \mathbf{c}_0 \\ \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_{N-2} \\ \mathbf{c}_{N-1} \end{pmatrix} = \begin{pmatrix} F_0 & 0 & \dots & \dots & 0 \\ 0 & F_1 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \dots & \dots & 0 & F_{N-1} \end{pmatrix} \begin{pmatrix} \mathbf{c}_0 \\ \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_{N-2} \\ \mathbf{c}_{N-1} \end{pmatrix} + \Delta t \begin{pmatrix} \mathbf{b}_0 \\ \mathbf{b}_1 \\ \vdots \\ \mathbf{b}_{N-2} \\ \mathbf{b}_{N-1} \end{pmatrix}
$$

which yields:

$$
\begin{pmatrix} \mathbf{c}\_{0} \\ \mathbf{c}\_{1} \\ \vdots \\ \mathbf{c}\_{N-2} \\ \mathbf{c}\_{N-1} \end{pmatrix} = -\Delta t \begin{pmatrix} F\_{0} & -I & \dots & \dots & 0 \\ 0 & F\_{1} & -I & & \dots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & -I \\ -I & \dots & \dots & 0 & F\_{N-1} \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{b}\_{0} \\ \mathbf{b}\_{1} \\ \vdots \\ \mathbf{b}\_{N-2} \\ \mathbf{b}\_{N-1} \end{pmatrix}, \tag{12}
$$

where **b**<sub>*n*</sub> := **b**(*t<sub>n</sub>*), for *n* = 0, ... , *N* − 1. Equation (12) provides a vector of dimension 4 × *N* and a sequence of states **c**<sub>*n*</sub> = [*c*<sup>(*dpm*)</sup><sub>*n*</sub>, *c*<sup>(*rpm*)</sup><sub>*n*</sub>, *c*<sup>(*bio*)</sup><sub>*n*</sub>, *c*<sup>(*hum*)</sup><sub>*n*</sub>]*<sup>T</sup>*, *n* = 0, ... , *N* − 1, which characterizes the oscillatory state of the carbon stock in each compartment, keeping track of the temporal variability of the forcing variables over the period.
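The block system (12) can be assembled and solved in a few lines; the following sketch (ours, with an assumed periodic monthly rate modifier and constant input) also verifies that one full sweep of the recurrence (8) over the period maps **c**<sub>0</sub> back onto itself:

```python
import numpy as np

# Assemble the block matrix of (12): F_n on the diagonal, -I on the
# superdiagonal, -I in the lower-left corner; then solve for the c_n.
alpha, beta = 0.1, 0.12
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
I4 = np.eye(4)
N, dt = 12, 1.0
rho = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(N) / N)   # assumed periodic rho
b = np.tile(np.array([0.59, 0.41, 0.0, 0.0]), (N, 1))    # constant monthly input

M = np.zeros((4 * N, 4 * N))
for n in range(N):
    Fn = Lam + (I4 - Lam) @ np.diag(np.exp(-dt * rho[n] * k))
    M[4*n:4*n+4, 4*n:4*n+4] = Fn
    j = 4 * ((n + 1) % N)                                # wraps at n = N-1
    M[4*n:4*n+4, j:j+4] -= I4
c = -dt * np.linalg.solve(M, b.ravel()).reshape(N, 4)    # equation (12)

# One discrete RothC sweep over the period must return to c_0.
x = c[0].copy()
for n in range(N):
    Fn = Lam + (I4 - Lam) @ np.diag(np.exp(-dt * rho[n] * k))
    x = Fn @ x + dt * b[n]
print(np.allclose(x, c[0]))
```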

#### *3.2. Exponential Rosenbrock–Euler Model*

If we regard the stepping procedure in (8), which defines the original discrete RothC model as a first-order approximation of the solution of the continuous model (1), then different discrete RothC models can be formulated. As an example, the discretization of Equation (1) from *tn* = *t*<sup>0</sup> + *n* Δ*t* to *tn*+<sup>1</sup> = *tn* + Δ*t* by means of the exponential Rosenbrock–Euler model (for non-autonomous systems) [10,18,19] leads to:

$$\begin{aligned}\mathbf{c}_{n+1} &= e^{\Delta t\,\rho(t_n)\,A}\,\mathbf{c}_n + \Delta t\,\varphi(\Delta t\,\rho(t_n)\,A)\,\mathbf{b}(t_n) \\ &= \mathbf{c}_n + \Delta t\,\varphi(\Delta t\,\rho(t_n)\,A)\,\mathbf{f}(t_n; \mathbf{c}_n)\end{aligned} \tag{13}$$

where **f**(*t*; **c**) := *ρ*(*t*) *A* **c**(*t*) + **b**(*t*). Notice that *A* is negative definite and, for *z* < 0, it results that 0 < *ϕ*(*z*) < 1. In [10], it was shown that *A* = (Λ − *I*) *D* and:

$$F_n(\Delta t) = \Lambda + (I - \Lambda)\,e^{-\Delta t\,\rho(t_n)\,D} \approx e^{\Delta t\,\rho(t_n)\,A}, \qquad \Delta t\,\varphi(\Delta t\,\rho(t_n)\,A) = \mathcal{O}(\operatorname{diag}(\Delta t)).$$

Of course, the approximated solutions via the Exponential Rosenbrock–Euler (ERE) method (13) differ from the values given by the original discrete RothC model (8); the major consequence is that the constant discrete steady-state solution, which for the ERE method coincides with the continuous equilibrium **c**∗ *cont*, does not depend on Δ*t*. As concerns its stability, given *A* negative definite, it is enough to notice that the eigenvalues of *e*Δ*<sup>t</sup> <sup>ρ</sup>*(*tn*) *<sup>A</sup>* are positive, but all less than one.
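A sketch of one ERE step (13) (our own, with the test parameters and assumed constant forcing) makes the last property explicit: the continuous equilibrium is a fixed point of the ERE map for any stepsize:

```python
import numpy as np
from scipy.linalg import expm

# One ERE step: c_{n+1} = e^Z c_n + dt phi(Z) b, with Z = dt rho A and
# phi(Z) = Z^{-1}(e^Z - I) as a matrix function.
alpha, beta, gamma, eta = 0.1, 0.12, 0.59, 0.49
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
A = (Lam - np.eye(4)) @ np.diag(k)
rho, g, f = 1.0, 1.0, 0.5                         # assumed constant forcing
b = g * np.array([gamma, 1-gamma, 0, 0]) + f * np.array([eta, eta, 0, 1-2*eta])

def ere_step(c, dt):
    Z = dt * rho * A
    phi = np.linalg.solve(Z, expm(Z) - np.eye(4))  # phi(Z) = Z^{-1}(e^Z - I)
    return expm(Z) @ c + dt * phi @ b

c_star = -np.linalg.solve(A, b) / rho              # continuous equilibrium
print(np.allclose(ere_step(c_star, 1.0), c_star),
      np.allclose(ere_step(c_star, 0.25), c_star)) # fixed point for any dt
```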

To find periodic solutions via the ERE method, we can generalize the approach followed for the discrete original RothC model as follows. Suppose that *T* = *N* Δ*t*, and let us impose that **c**<sup>0</sup> = **c***N*:

$$\mathbf{c}_{n+1} = e^{\Delta t\,A\,\rho(t_n)}\,\mathbf{c}_n + \Delta t\,\varphi(\Delta t\,A\,\rho(t_n))\,\mathbf{b}(t_n)$$

for *n* = 0, . . . *N* − 2 and:

$$\mathbf{c}_0 = e^{\Delta t\,A\,\rho(t_{N-1})}\,\mathbf{c}_{N-1} + \Delta t\,\varphi(\Delta t\,A\,\rho(t_{N-1}))\,\mathbf{b}(t_{N-1}),$$

which yields:

$$
\begin{pmatrix} \mathbf{c}_0 \\ \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_{N-2} \\ \mathbf{c}_{N-1} \end{pmatrix} = -\Delta t \begin{pmatrix} e^{\Delta t\,A\,\rho(t_0)} & -I & \dots & \dots & 0 \\ 0 & e^{\Delta t\,A\,\rho(t_1)} & -I & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & -I \\ -I & \dots & \dots & 0 & e^{\Delta t\,A\,\rho(t_{N-1})} \end{pmatrix}^{-1} \begin{pmatrix} \tilde{\mathbf{b}}_0 \\ \tilde{\mathbf{b}}_1 \\ \vdots \\ \tilde{\mathbf{b}}_{N-2} \\ \tilde{\mathbf{b}}_{N-1} \end{pmatrix}, \tag{14}
$$

with $\tilde{\mathbf{b}}_n := \varphi(\Delta t\,\rho(t_n)\,A)\,\mathbf{b}_n$, for *n* = 0, ... , *N* − 1.

The sequence of states **c***n*, *n* = 0, ... *N* − 1 characterizes the oscillatory state of the carbon stock in each compartment. Of course, the solution differs from the periodic solution provided in (12).

#### *3.3. Novel Non-Standard Discrete RothC Model*

The ERE procedure described in (13) belongs to the more general class of non-standard finite difference schemes [14,20]. Different first-order non-standard approximations can be used, depending on the function of the stepsize used to advance the first-order procedure. Each of them can be considered a discrete RothC model, alternative to the ERE model.

A discrete model, closer to the original formulation of the RothC model, can be introduced as follows. Consider the discrete formulation of RothC (8). Simple evaluations lead to the equivalent expressions:

$$\begin{aligned}\mathbf{c}_{n+1} &= F_n(\Delta t)\,\mathbf{c}_n + \Delta t\,\mathbf{b}(t_n) \\ &= \mathbf{c}_n + \Delta t\,\varphi(\Delta t\,\rho(t_n)\,\tilde{A})\,\rho(t_n)\,A\,\mathbf{c}_n + \Delta t\,\mathbf{b}(t_n)\end{aligned} \tag{15}$$

where the eigenvalues of the matrix $\tilde{A} = A\,(I - \Lambda)^{-1} = -(I - \Lambda)\,D\,(I - \Lambda)^{-1}$ are the entries of the vector −**k** and the corresponding eigenvectors are the columns of (*I* − Λ), with explicit inverse:

$$(I - \Lambda)^{-1} = \frac{1}{\delta}\begin{pmatrix} \delta & 0 & 0 & 0 \\ 0 & \delta & 0 & 0 \\ \alpha & \alpha & 1-\beta & \alpha \\ \beta & \beta & \beta & 1-\alpha \end{pmatrix}. \tag{16}$$
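A quick numerical sanity check of this spectral structure (our own sketch, using the parameter values of the numerical tests) confirms both the similarity relation and the explicit inverse:

```python
import numpy as np

# A_tilde = A (I - Lam)^{-1} = -(I - Lam) D (I - Lam)^{-1}: eigenvalues -k,
# eigenvectors the columns of P = I - Lam.
alpha, beta = 0.1, 0.12
delta = 1 - alpha - beta
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
D = np.diag(k)
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
P = np.eye(4) - Lam                                # eigenvector matrix
A = -P @ D                                         # A = (Lambda - I) D
A_tilde = A @ np.linalg.inv(P)

print(np.allclose(A_tilde @ P, P @ (-D)))          # A_tilde P = P (-D)

# The explicit inverse (16):
Pinv = np.array([[delta, 0, 0, 0],
                 [0, delta, 0, 0],
                 [alpha, alpha, 1 - beta, alpha],
                 [beta, beta, beta, 1 - alpha]]) / delta
print(np.allclose(np.linalg.inv(P), Pinv))
```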

A novel Non-Standard (NS) discrete RothC model is given by:

$$\begin{aligned}\mathbf{c}_{n+1} &= \mathbf{c}_n + \Delta t\,\varphi(\Delta t\,\rho(t_n)\,\tilde{A})\,\mathbf{f}(t_n; \mathbf{c}_n) \\ &= F_n(\Delta t)\,\mathbf{c}_n + \Delta t\,\varphi(\Delta t\,\rho(t_n)\,\tilde{A})\,\mathbf{b}(t_n)\end{aligned} \tag{17}$$

Indeed, we can prove that:

**Theorem 6.** *The NS model (17) is a non-standard first-order approximation of the continuous model (1), i.e.,*

$$\Delta t\,\varphi(\Delta t\,\rho(t_n)\,\tilde{A}) = \mathcal{O}(\operatorname{diag}(\Delta t)), \quad \text{for } \Delta t \to 0.$$

**Proof.** From the definition of the exponential matrix, we have that:

$$\begin{aligned}\Delta t\,\varphi(\Delta t\,\rho(t_n)\,\tilde{A}) &= \frac{1}{\rho(t_n)}\,\tilde{A}^{-1}\left(e^{\Delta t\,\rho(t_n)\,\tilde{A}} - I\right) \\ &= \frac{1}{\rho(t_n)}\,\tilde{A}^{-1}\sum_{j=1}^{\infty}\frac{1}{j!}\left(\Delta t\,\rho(t_n)\,\tilde{A}\right)^{j} \\ &= \operatorname{diag}(\Delta t) + \frac{\Delta t^2}{2}\,\rho(t_n)\,\tilde{A} + \sum_{j=2}^{\infty}\frac{\Delta t^{j+1}}{(j+1)!}\left(\rho(t_n)\,\tilde{A}\right)^{j}\end{aligned}$$

It follows that $\Delta t\,\varphi(\Delta t\,\rho(t_n)\,\tilde{A}) = \mathcal{O}(\operatorname{diag}(\Delta t))$, for Δ*t* → 0.

By comparing the ERE flow (13) with the above non-standard formulation (17), we notice that the main difference is in the replacement of the matrix *A* with the matrix $\tilde{A}$, which has a simple representation in Jordan form. This simplifies the evaluation of the matrix function $\varphi(\Delta t\,\rho(t_n)\,\tilde{A})$, because it can be evaluated through the known eigenvalues −**k**(*i*) and the explicit similarity transform in (16).

Moreover, the comparison of the original discrete formulation of the RothC model (8) with the novel NS procedure (17) shows that, unlike in the ERE method (13), the decomposition dynamic is now treated in the same way as in the original discrete RothC model. The updating related to the carbon amount entering the system at time *t* + Δ*t* is now evaluated up to a factor $\varphi(\Delta t\,\rho(t_n)\,\tilde{A})$, whereas the ERE approximation advances by evaluating $\varphi(\Delta t\,\rho(t_n)\,A)$.

Like the ERE method, in the case of constant functions *ρ*, *g*, *f*, the novel NS method has **c**<sup>∗</sup><sub>*cont*</sub> as its steady-state solution.
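A sketch of one NS step (17) (our own, with assumed constant forcing) illustrates both claims: the homogeneous propagator coincides with the discrete RothC matrix *F<sub>n</sub>* of (15), and the continuous equilibrium is a fixed point of the map:

```python
import numpy as np
from scipy.linalg import expm

# NS step (17): c_{n+1} = c_n + dt phi(dt rho A_tilde) (rho A c_n + b).
alpha, beta, gamma, eta = 0.1, 0.12, 0.59, 0.49
k = np.array([0.8333, 0.0250, 0.0550, 0.0017])
Lam = np.zeros((4, 4)); Lam[2, :] = alpha; Lam[3, :] = beta
I4 = np.eye(4)
P = I4 - Lam
A = -P @ np.diag(k)                                # A = (Lambda - I) D
A_tilde = A @ np.linalg.inv(P)
rho, g, f, dt = 1.0, 1.0, 0.5, 1.0                 # assumed constant forcing
b = g * np.array([gamma, 1-gamma, 0, 0]) + f * np.array([eta, eta, 0, 1-2*eta])

Z = dt * rho * A_tilde
phi = np.linalg.solve(Z, expm(Z) - I4)             # phi(Z) = Z^{-1}(e^Z - I)

def ns_step(c):
    return c + dt * phi @ (rho * A @ c + b)        # c + dt phi(Z) f(t, c)

c_star = -np.linalg.solve(A, b) / rho              # continuous equilibrium
F = Lam + P @ np.diag(np.exp(-dt * rho * k))       # discrete RothC propagator
c_test = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(ns_step(c_star), c_star),                    # fixed point
      np.allclose(ns_step(c_test), F @ c_test + dt * phi @ b)) # matches (15)/(17)
```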

**Theorem 7.** *Under the hypothesis* 0 < *α* + *β* < 1*, the equilibrium* **c**∗ *cont of the NS method (17) is globally stable.*

**Proof.** It is enough to prove that the eigenvalues of *F<sub>n</sub>*(Δ*t*) = Λ + (*I* − Λ)*e*<sup>−Δ*t ρ*(*t<sub>n</sub>*)*D*</sup> are in modulus less than one. It is easy to see that two eigenvalues are given by *e*<sup>−Δ*t ρ*(*t<sub>n</sub>*) *k<sub>dpm</sub>*</sup> and *e*<sup>−Δ*t ρ*(*t<sub>n</sub>*) *k<sub>rpm</sub>*</sup>. The others are the eigenvalues of the sub-matrix:

$$F_{3,4} = \begin{pmatrix} \alpha - e^{-\Delta t\,\rho(t_n)\,k_{bio}}\,(\alpha - 1) & \alpha - \alpha\,e^{-\Delta t\,\rho(t_n)\,k_{hum}} \\ \beta - \beta\,e^{-\Delta t\,\rho(t_n)\,k_{bio}} & \beta - e^{-\Delta t\,\rho(t_n)\,k_{hum}}\,(\beta - 1) \end{pmatrix}.$$

The eigenvalues lie in the union of the two Gershgorin column sets:

$$K_1 = \left\{ z \in \mathbb{C} \,:\, |z - \alpha + e^{-\Delta t\,\rho(t_n)\,k_{bio}}(\alpha - 1)| \le |\beta - \beta\,e^{-\Delta t\,\rho(t_n)\,k_{bio}}| \right\},$$

$$K_2 = \left\{ z \in \mathbb{C} \,:\, |z - \beta + e^{-\Delta t\,\rho(t_n)\,k_{hum}}(\beta - 1)| \le |\alpha - \alpha\,e^{-\Delta t\,\rho(t_n)\,k_{hum}}| \right\}.$$

Notice that *F*<sub>3,4</sub> has positive entries. Set *M* = max(*F*<sub>3,4</sub>(1, 1) + *F*<sub>3,4</sub>(2, 1), *F*<sub>3,4</sub>(1, 2) + *F*<sub>3,4</sub>(2, 2)); if *z* ∈ *K*<sub>1</sub> ∪ *K*<sub>2</sub>, then |*z*| ≤ *M*. Evaluate:

$$F_{3,4}(1,1) + F_{3,4}(2,1) = (\alpha + \beta)\left(1 - e^{-\Delta t\,\rho(t_n)\,k_{bio}}\right) + e^{-\Delta t\,\rho(t_n)\,k_{bio}} < 1.$$

Similarly,

$$F_{3,4}(1,2) + F_{3,4}(2,2) = (\alpha + \beta)\left(1 - e^{-\Delta t\,\rho(t_n)\,k_{hum}}\right) + e^{-\Delta t\,\rho(t_n)\,k_{hum}} < 1.$$

Hence, it results that the eigenvalues of *F*<sub>3,4</sub> are less than one in modulus.

To search for periodic solutions via the novel non-standard model, we need to solve the linear system:

$$-\begin{pmatrix} F_0 & -I & \dots & \dots & 0 \\ 0 & F_1 & -I & & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & -I \\ -I & \dots & \dots & 0 & F_{N-1} \end{pmatrix} \begin{pmatrix} \mathbf{c}_0 \\ \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_{N-2} \\ \mathbf{c}_{N-1} \end{pmatrix} = \Delta t \begin{pmatrix} \hat{\mathbf{b}}_0 \\ \hat{\mathbf{b}}_1 \\ \vdots \\ \hat{\mathbf{b}}_{N-2} \\ \hat{\mathbf{b}}_{N-1} \end{pmatrix},$$

where $\hat{\mathbf{b}}_n := \varphi(\Delta t\,\rho(t_n)\,\tilde{A})\,\mathbf{b}_n$, for *n* = 0, ... , *N* − 1. As already observed, we have the same coefficient matrix as in system (12), while the knowledge of the explicit Jordan form of the matrix $\tilde{A}$ simplifies the evaluation of the coefficients $\hat{\mathbf{b}}_n$.

In the next two sections, we compare the behaviour of the different discrete models on the evaluation of steady-state and long-term periodic solutions. Firstly, the comparison is made on a theoretical basis, comparing the accuracy of the methods in approximating the solutions of the continuous model; secondly, we test both the ERE and the novel non-standard model against the original discrete RothC model on a classical monthly time-scale experiment where real measurements were available.

A freely accessible MATLAB routine named NSRothC [21] was implemented to replicate all the simulations presented in the next sections. The package includes two versions. The first one (contNSRothC) allowed us to run the model with different stepsizes, when the continuous periodic input functions *ρ*(*t*) and **b**(*t*) have the particular form presented in Section 4. In the second version (monNSRothC), the stepsize was fixed to one month, and the discrete monthly values of input residuals and FYM were required, as well as the monthly values of weather variables (temperature, rainfall and moisture). As an example, data from the Hoosfield spring barley experiment [6] were used.

#### **4. Numerical Tests**

Let us compare long-term solutions obtained by using the original discrete RothC model, the exponential Rosenbrock–Euler method and the novel non-standard first-order scheme. We set parameters *α* = 0.1, *β* = 0.12, *γ* = 0.59 and *η* = 0.49. The vector of the decomposition rates **k** = [0.8333, 0.0250, 0.0550, 0.0017] and the functions *ρ*(*t*), *g*(*t*) and *f*(*t*) are assumed to be expressed on a monthly scale. They vary with the same period *T* = 12 and have Gaussian-shaped profiles according to:

$$\rho(t) = a\_{\rho} + \frac{G(t, \mu\_{\rho}^{(1)}, \sigma\_{\rho}^{(1)})}{\int\_{t\_{0}}^{t\_{0} + T} G(s, \mu\_{\rho}^{(1)}, \sigma\_{\rho}^{(1)}) \, ds} b\_{\rho}^{(1)} + \frac{G(t, \mu\_{\rho}^{(2)}, \sigma\_{\rho}^{(2)})}{\int\_{t\_{0}}^{t\_{0} + T} G(s, \mu\_{\rho}^{(2)}, \sigma\_{\rho}^{(2)}) \, ds} b\_{\rho}^{(2)}$$

$$g(t) = a_g + \frac{G(t, \mu_g, \sigma_g)}{\int_{t_0}^{t_0 + T} G(s, \mu_g, \sigma_g) \, ds} \, b_g, \qquad f(t) = a_f + \frac{G(t, \mu_f, \sigma_f)}{\int_{t_0}^{t_0 + T} G(s, \mu_f, \sigma_f) \, ds} \, b_f,$$

where
$$G(t, \mu, \sigma) := \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( \frac{-\left(t - t_0 - \mu - \left\lfloor \frac{t - t_0}{T} \right\rfloor T \right)^2}{2\sigma^2} \right) \qquad \text{for } t \ge t_0.$$

Values $a_\rho = 0.3561$, $\mu_\rho^{(1)} = \frac{T}{2}$, $\mu_\rho^{(2)} = \frac{5T}{6}$ were set, amplitudes $b_\rho^{(1)} = 0.9596$ and $b_\rho^{(2)} = 1.4996$ were chosen, while dispersion coefficients were given by $\sigma_\rho^{(1)} = \sigma_\rho^{(2)} = 0.6738$. The function *ρ*(*t*) in a time-span of three years is shown in Figure 2 on the left.

Values $a_g = 0$, $\mu_g = \frac{T}{2}$, $b_g = 2.7996$, $\sigma_g = 0.9974$ define the function *g*(*t*), while values $a_f = 0$, $\mu_f = \frac{T}{6}$, $b_f = 1.5$, $\sigma_f = 0.1995$ define the function *f*(*t*). The four components of the input function $\mathbf{b}(t) := g(t) \, \mathbf{a}^{(g)} + f(t) \, \mathbf{a}^{(f)}$ are depicted in Figure 2, on the right.
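The periodised-Gaussian inputs above can be sketched in Python. The helper names `G`, `bump` and `rho` are ours, while the numerical parameters are those listed for *ρ*(*t*); a useful check is that, since each normalised bump integrates to its amplitude over a period, the mean of *ρ* over one period equals $a_\rho + (b_\rho^{(1)} + b_\rho^{(2)})/T = 0.5610$:

```python
import numpy as np

t0, T = 0.0, 12.0
tt = t0 + T * np.arange(20000) / 20000     # uniform grid over one period

def G(t, mu, sigma):
    """Periodised Gaussian bump: the floor term maps t back into [t0, t0 + T)."""
    s = t - t0 - mu - np.floor((t - t0) / T) * T
    return np.exp(-s**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def bump(t, mu, sigma, b):
    """One normalised term b * G / int G ds of rho(t), g(t) or f(t)."""
    Z = G(tt, mu, sigma).mean() * T        # Riemann sum of the normaliser
    return b * G(t, mu, sigma) / Z

a_rho = 0.3561                             # parameters of rho(t) (Section 4)

def rho(t):
    return (a_rho + bump(t, T / 2, 0.6738, 0.9596)
                  + bump(t, 5 * T / 6, 0.6738, 1.4996))

mean_rho = rho(tt).mean()                  # should be close to 0.5610
```

The same `bump` helper reproduces *g*(*t*) and *f*(*t*) with their respective parameters.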

**Figure 2.** The function *ρ*(*t*) (on the left) and the vectorial function **b**(*t*) (on the right) in a three-year time-span.

#### *4.1. Steady-State Solution*

To compare long-time periodic and steady-state solutions, we consider the average values of *ρ*(*t*) and **b**(*t*) over one period *T*:

$$\begin{aligned} \hat{\rho} \;&=\; a_{\rho} + \frac{b_{\rho}^{(1)}}{T} + \frac{b_{\rho}^{(2)}}{T} = 0.5610, \\ \hat{\mathbf{b}} \;&=\; \left( a_{g} + \frac{b_{g}}{T} \right) \mathbf{a}^{(g)} + \left( a_{f} + \frac{b_{f}}{T} \right) \mathbf{a}^{(f)} = 0.2333 \, \mathbf{a}^{(g)} + 0.125 \, \mathbf{a}^{(f)} = [0.1989,\ 0.1569,\ 0,\ 0.025]^T, \end{aligned}$$

and we evaluate the steady-state solution (4) of the continuous RothC model (1):

$$\mathbf{c}\_{cont}^{\*} = [0.4254, \ 11.1867, \ 1.4887, \ 61.6253]^T. \tag{18}$$

We recall that both the ERE and the NS method have $\mathbf{c}^*_{cont}$ as the steady-state solution of their discrete flows and, consequently, they provide the value $SOC^*_{cont}$ as the equilibrium for soil organic carbon; differently, the steady-state solution of the discrete RothC model (9) depends on the chosen stepsize Δ*t*. In Table 1, we report the equilibrium solutions $\mathbf{c}^*_{RothC}(\Delta t)$ for decreasing values of the stepsize Δ*t*. When we proceed with Δ*t* = 1, we obtain the original monthly time-stepping procedure, while Δ*t* = 1/30 represents a daily temporal updating. Notice that, as predicted by the theoretical results, $\|\mathbf{c}^*_{cont}\|_2 = 62.6516 \le \|\mathbf{c}^*_{RothC}(1)\|_2 = 62.695$, i.e., the original discrete RothC model overestimates in norm the theoretical equilibrium value.

The reduction of the temporal stepsize by a factor of 1/30 causes approximately the same reduction of the absolute errors, as $\|\mathbf{c}^*_{cont} - \mathbf{c}^*_{RothC}(1)\|_2 \, / \, \|\mathbf{c}^*_{cont} - \mathbf{c}^*_{RothC}(1/30)\|_2 = 31.534$, confirming the first-order accuracy of the approximation of the steady-state solution. In Figure 3a, we report the relative errors $\|\mathbf{c}^*_{cont} - \mathbf{c}^*_{RothC}(\Delta t)\|_2 \, / \, \|\mathbf{c}^*_{cont}\|_2$ in correspondence of halved stepsizes starting from Δ*t* = 1, together with their bounds evaluated in (10). In the same figure, we report the error on the equilibrium value of SOC normalized with respect to $\|\mathbf{c}^*_{cont}\|_2$, i.e.,

$$\frac{SOC^*_{RothC}(\Delta t) - SOC^*_{cont}}{\|\mathbf{c}^*_{cont}\|_2} = \mathbf{1}^T \mathbf{w}(\Delta t),$$

where **w**(Δ*t*) is defined in (11), for the same sequence of halved stepsizes Δ*t*.
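The error ratio quoted above translates into an observed order of accuracy via $e(\Delta t) / e(\Delta t / r) \approx r^p$. A minimal helper (the function name is ours) applied to the text's numbers:

```python
import math

def estimated_order(error_ratio, step_ratio):
    """Observed order p of a scheme from e(dt) / e(dt/r) ≈ r^p."""
    return math.log(error_ratio) / math.log(step_ratio)

# Ratio of steady-state errors quoted in the text for dt = 1 vs dt = 1/30:
p = estimated_order(31.534, 30)   # ≈ 1.01, consistent with first order
```

The same helper, applied to successive halvings, yields the convergence estimates reported in Figure 3.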

**Table 1.** Dependence of the steady-state solution of the discrete RothC model on the temporal stepsize Δ*t*. The values in each compartment converge, with first-order accuracy, to the entries of $\mathbf{c}^*_{cont} = [0.4254,\ 11.1867,\ 1.4887,\ 61.6253]^T$.


#### *4.2. Periodic Solutions*

In these experiments, we compared the periodic solutions provided by the three discrete models, obtained for the functions *ρ*(*t*) and **b**(*t*) plotted in Figure 2. Unlike in the evaluation of steady-state solutions, the ERE and the novel NS procedures, as well as the discrete RothC model, provide long-term periodic solutions affected by their own numerical errors, which also depend on the chosen stepsize Δ*t*. As shown in Figure 3b, the RothC discrete model provides the worst performance when compared with the other procedures in terms of errors with respect to the reference solution. In Figure 4, we plot the related SOC indicator (1) in a one-period time span obtained with the three discrete procedures applied with reduced stepsizes Δ*t* = 1, 1/2, 1/4, 1/8. In the same figure, the reference solution obtained in the long run with the ode45 MATLAB function, with tolerance set at machine precision, is plotted. Notice that all the methods need to be applied with Δ*t* ≪ 1 in order to reach an acceptable level of accuracy when compared to the reference solution. Again, the original discrete RothC model was confirmed as a less accurate integrator of the continuous model (1) with respect to both the ERE and NS models.

**Figure 3.** (**a**) Relative errors of $\mathbf{c}^*_{RothC}$ (blue) and $SOC^*_{RothC}$ (orange) with halved stepsizes, starting from Δ*t* = 1. The bounds $\|\phi(\Delta t \, \rho \, D)^{-1} - I\|_2$ on the relative error in (10) are also plotted in black. The estimated order of convergence is 1.0037 for the relative error on $\mathbf{c}^*_{RothC}$ and 1.0022 for the relative error on SOC. (**b**) The log–log plot of the 2-norm errors of the periodic solutions of the discrete RothC, Exponential Rosenbrock–Euler (ERE) and Non-Standard (NS) schemes, with respect to the reference solution obtained in the long run with the ode45 MATLAB code, at Δ*t* = 1/2<sup>*i*</sup>, *i* = 2, . . . , 8. The estimated orders of convergence are 1.0042, 1.0017 and 1.0018 for the discrete RothC, ERE and NS schemes, respectively.

In our second experiment, we started at $t_0 = 1$ with $\mathbf{c}_0 = \mathbf{c}^*_{cont}$ in (18), and we proceeded until the final time $T_f = t_0 + 15\,T$ was reached. We recall that the period was set at *T* = 12, and we used the monthly, weekly and daily update by setting Δ*t* = 1, Δ*t* = 1/4 and Δ*t* = 1/30, respectively. In Figure 5, we report the values of SOC obtained with the three different procedures. Two main observations have to be made: first, the initial value $\mathbf{c}^*_{cont}$, corresponding to the equilibrium solution with mean values $\hat{\rho}$ and $\hat{\mathbf{b}}$, was higher than the SOC reference value attained at $T_f$ when *ρ*(*t*) and **b**(*t*) were not averaged. This indicates that the temporal oscillations of *ρ*(*t*) and **b**(*t*) around their mean values cannot be neglected when evaluating, through the SOC indicator, the achievement of land degradation neutrality in the fifteen years from 2015 to 2030. Second, whatever discrete model is chosen, a qualitative long-term accordance between an approximate value of the SOC indicator and its theoretical solution needs, at least, a weekly update procedure.

**Figure 4.** Periodic dynamics of the SOC indicator in a one-period span: convergence of approximated solutions to the reference solution (plotted as a continuous red line) obtained with the MATLAB ode45 code, by progressively reducing the stepsize Δ*t*.

**Figure 5.** Long-term simulation of the SOC indicator over 15 years with the discrete RothC, ERE and NS methods. Stepsizes are set as Δ*t* = 1 (1 month), Δ*t* = 1/4 (1 week) and Δ*t* = 1/30 (1 day). The reference solution obtained by ode45 with tolerance at the machine precision is also plotted.

#### **5. The Hoosfield Spring Barley Experiment**

We illustrate the model and test the three methods using data from the Hoosfield spring barley experiment, one of the classical long-term experiments carried out at the Rothamsted Experimental Station [22]. The same dataset was used as an example of the use of the RothC model in [6]. The Hoosfield experiment was conducted from 1852 to 2000. Spring barley was grown continuously over the whole period, except in the years 1912, 1933, 1943 and 1967, when the field was fallowed to control weeds. The initial SOC content in the soil in 1852 was measured as 33.8632 *t C* ha−<sup>1</sup>, distributed among the soil compartments as follows: 2.7 *t C* ha−<sup>1</sup> in IOM, 0.1533 *t C* ha−<sup>1</sup> in DPM, 4.4852 *t C* ha−<sup>1</sup> in RPM, 0.6671 *t C* ha−<sup>1</sup> in BIO and 25.8576 *t C* ha−<sup>1</sup> in HUM.

The *ρ* function was estimated on a monthly basis from the weather dataset in [6], including average monthly air temperature, monthly open pan evaporation and monthly rainfall. It was also affected by the percentage of clay in the soil (23.4% in Rothamsted soil) and by the monthly soil cover factor (covered or fallow). Assuming that the soil was covered only from April to July for each crop year, the *ρ* function assumed the values in Table 2. The other parameters of the model were set as *α* = 0.10, *β* = 0.12, *γ* = 0.59 and *η* = 0.49. Moreover, the monthly decomposition rate vector **k** = [0.8333, 0.0250, 0.0550, 0.0017] was considered.

**Table 2.** Monthly values for *ρ* for the Hoosfield spring barley experiment, estimated from weather data in [6], clay content of 23.4% and soil cover factors in crop and fallow years.


Three different scenarios were simulated with the three numerical methods analysed in Section 3, and the results were compared with direct observations of SOC quantity in the soil in 1882, 1913, 1946, 1975, 1982 and 1987:


**Figure 6.** Hoosfield barley experiment. Scenario 1: unmanured treatment.


**Figure 7.** Hoosfield barley experiment. Scenario 2: farmyard manure annually (two times in 1931).

**Figure 8.** Hoosfield barley experiment. Scenario 3: farmyard manure 1852–1871 and nothing thereafter.

Figures 9–11 show the modelled data for total soil organic C in the three treatments, together with the measured data. The modelled results for Scenario 3, in which the treatment considered FYM for only the first twenty years, were considerably lower than the measurements; agreement was closer for the other two treatments (Scenarios 1 and 2). Regarding the models' ability to predict the achievement of land degradation neutrality, all the simulations gave results in agreement with the measured data, i.e., the loss of neutrality in 2000 for Scenario 1 and the achievement of neutrality for Scenario 2 with respect to the initial value in 1852. Measurements indicated the achievement of land degradation neutrality also for Scenario 3; in this case, however, the three models failed, predicting for *SOC*(2000) a value lower than the initial one, i.e., *SOC*(1852) = 33.80 *t C* ha−<sup>1</sup>.

**Figure 9.** Hoosfield barley experiment. Scenario 1: unmanured treatment. Organic C in soil data (red bullets) modelled by the RothC (8), ERE (13) and NS (17) methods in the temporal horizon [1852, 2000] with Δ*t* = 1. According to Table 3, the ERE approximation shows the best performance with respect to both the statistical indicators RMSE and modelling Efficiency (EF).

**Figure 10.** Hoosfield barley experiment. Scenario 2: farmyard manure annually. Organic C in soil data (red bullets) modelled by the RothC (8), ERE (13) and NS (17) methods in the temporal horizon [1852, 2000] with Δ*t* = 1. According to Table 3, RothC approximation shows the best performance with respect to both the statistical indicators RMSE and EF.

**Figure 11.** Hoosfield barley experiment. Scenario 3: farmyard manure 1852–1871 and nothing thereafter. Organic C in soil data (red bullets) modelled by the RothC (8), ERE (13) and NS (17) methods in the temporal horizon [1852, 2000] with Δ*t* = 1. According to Table 3, RothC approximation shows the best performance with respect to both the statistical indicators RMSE and EF.

To assess the performance of the discrete models and compare measured and simulated values, as in [16], we used two well-known statistical indices: the RMSE (Root Mean Squared Error) and the EF (modelling Efficiency):

$$RMSE = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} (O\_i - S\_i)^2} \qquad EF = 1 - \frac{\sum\_{i=1}^{N} (O\_i - S\_i)^2}{\sum\_{i=1}^{N} (O\_i - \overline{O})^2}$$

where $O_i$ and $S_i$ are the observed and simulated SOC at the *i*-th measurement, $\overline{O}$ is the mean of the observed data and *N* is the number of observations. The closer the RMSE is to zero, the better the simulated solution describes the data. EF can range from −∞ to one, with the best performance at EF = 1. Negative values of EF indicate that the observed mean is a better predictor than the model.
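Both indices are straightforward to implement. The sketch below uses illustrative numbers only, not the Hoosfield measurements:

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated SOC values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2))

def efficiency(obs, sim):
    """Modelling efficiency EF in (-inf, 1]; EF = 1 is a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Illustrative values only (not the Hoosfield data):
obs = [33.9, 31.2, 30.5, 29.8, 29.1, 28.7]
sim = [33.5, 31.8, 30.1, 29.9, 29.4, 28.2]
r, e = rmse(obs, sim), efficiency(obs, sim)   # r ≈ 0.414, e ≈ 0.942
```

A model reproducing the observations exactly gives RMSE = 0 and EF = 1, while EF < 0 signals that the observed mean outperforms the model.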

The results of the evaluation of the above indicators for the three discrete models applied to the Hoosfield barley experiment data, for the three different scenarios, are reported in Table 3. The obtained values confirm that Scenario 3 was the worst case, i.e., the case in which the simulated data differed most from the measurements. The three discrete models provided similar values for the simulation of the three different treatments, and we could not select a method that clearly outperformed the others. However, from Table 3, the ERE model can be identified as the best model for Scenario 1, while the RothC model provided better results for both Scenarios 2 and 3. Comparing the indicators for all scenarios with respect to the approximation of the experimental data, the minimum value *RMSE* = 1.1596 was attained by the ERE model in Scenario 1, while the maximum value *EF* = 0.9609 was attained by the RothC method in Scenario 2.

**Table 3.** RMSE and EF statistical indices for evaluating and comparing the performances of the RothC, ERE and novel non-standard NS models.


#### **6. Conclusions**

Soil organic carbon is one of the key indicators of land degradation status. In this paper, we analysed the RothC model, a simple tool developed for predicting the dynamic evolution of the content of soil organic carbon under the effect of weather conditions and land use data. Both the continuous version, based on a linear, non-autonomous differential system, and the original discrete monthly time-stepping procedure were considered. The aim of our study was to compare the qualitative analysis of both continuous and discrete dynamics in approximating steady-state and long-term periodic solutions. We focused on these aspects since they provide the information concerning the possible achievement of land degradation neutrality. We found that the steady-state solutions of the original discrete model were first-order accurate approximations of the steady-state solutions of the continuous differential model. This accuracy represents the main weakness of the discrete RothC model when considered as a first-order approximation of its continuous version. We point out that well-established numerical procedures, such as the Exponential Rosenbrock–Euler (ERE) method, have, by construction, the same equilibria as the differential system they approximate. Conversely, the discrete RothC model overestimates the steady-state equilibrium of the continuous flow, so that, in order to obtain the correct estimate with first-order accuracy, we have to reduce the stepsize of the updating time procedure. This fact may have a negative effect in real applications, where it is necessary to run the RothC model to equilibrium first in order to align the initial content of soil organic carbon with the measured one [6].
Nevertheless, the good matching of real data and numerical approximations, together with the available implementation in the RothC 26.3 open-source interface [6], justifies the wide use in the literature of the original discrete RothC model and explains the lack of use of more accurate numerical integrators. For this reason, an additional objective of this paper was to propose a novel non-standard first-order procedure, the so-called NS discrete model, able to approximate the decomposition dynamics in the same way as the original discrete RothC model. This procedure features the same steady-state solution as the continuous model and exhibits a computational cost lower than that required by the ERE time-stepping procedure. We also provided a simple code that implements the original discrete RothC model and the monthly time-stepping NS and ERE models in a MATLAB environment. This allows the reproduction of our results and facilitates future comparisons among all the approaches. The discrete models were first tested on a hypothetical example as first-order accurate numerical integrators; second, they were tested on the classical Hoosfield barley experiment to evaluate their ability to reproduce observed data. If we consider the discrete RothC model as a difference scheme approximating the continuous RothC differential system, numerical tests showed its coarseness with respect to both the ERE and NS procedures. When applied to the Hoosfield barley experiment, the three discrete models provided very similar results. However, the evaluation of the statistical error indicators still revealed the discrete RothC model as the best scheme for approximating real data in both Scenarios 2 and 3, while the ERE model outperformed the discrete RothC and the novel procedure in Scenario 1. In predicting the long-term behaviours, which are useful to establish the achievement of land degradation neutrality, the three discrete models failed to give results in accordance with theoretical values in our hypothetical test, as well as with real data in the case of Scenario 3 of the Hoosfield spring barley experiment. This motivates our future research to consider more complex SOC dynamics. In particular, we will analyse the MOMOS model [23,24], which puts the microbial community at the centre of all transformation processes in the cycle of organic matter, from assimilation by degradation to loss by mineralization. Further analysis will also consider nonlinear SOC dynamics [25,26] and spatially explicit models described by partial differential equations [27,28] in order to deal with real domains, such as protected areas, covered by non-homogeneous land use patterns. Finally, using field measurements and remote sensing information for modelling land degradation will significantly improve the obtained results [29].
The final aim is to select and provide a robust modelling tool for evaluating the trends of SOC stocks in the Alta Murgia National Park, where the achievement of land degradation neutrality is hampered by a combination of anthropic pressures and climate change [30–32].

**Author Contributions:** The authors F.D., C.M. and A.M. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was carried out within the Innonetwork Project COHECO, No. 8Q2LH28, POR Puglia FESR-FSE 2014–2020 and within the eLTER PLUS project. The eLTER PLUS project received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 871128.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We thank the anonymous referees for their helpful comments. We thank also Nicholas Tavalion for having carefully read the paper and for his help in improving its revision.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

#### **References**


### *Article* **The Role of Spectral Complexity in Connectivity Estimation**

**Elisabetta Vallarino 1, Alberto Sorrentino 1,2,\*, Michele Piana 1,2 and Sara Sommariva <sup>1</sup>**


**Abstract:** The study of functional connectivity from magnetoencephalographic (MEG) data consists of quantifying the statistical dependencies among time series describing the activity of different neural sources from the magnetic field recorded outside the scalp. This problem can be addressed by utilizing connectivity measures whose computation in the frequency domain often relies on the evaluation of the cross-power spectrum of the neural time series estimated by solving the MEG inverse problem. Recent studies have focused on the optimal determination of the cross-power spectrum in the framework of regularization theory for ill-posed inverse problems, providing indications that, rather surprisingly, the regularization process that leads to the optimal estimate of the neural activity does not lead to the optimal estimate of the corresponding functional connectivity. Along these lines, the present paper utilizes synthetic time series simulating the neural activity recorded by an MEG device to show that the regularization of the cross-power spectrum is significantly correlated with the signal-to-noise ratio of the measurements and that, as a consequence, this regularization correspondingly depends on the spectral complexity of the neural activity.

**Keywords:** regularization theory; multivariate stochastic processes; cross-power spectrum; magnetoencephalography; MEG; functional connectivity; spectral complexity

#### **1. Introduction**

Magnetoencephalography (MEG) provides high temporal resolution measurements of the magnetic field associated with neural currents. The MEG device relies on superconducting sensors, named SQUIDs, organized in a helmet array close to and around the scalp. MEG experimental time series can be used essentially to address two neuroscientific problems, whose solution requires both an accurate mathematical modelling based on Maxwell's equations and the numerical reduction of such formal models [1].

The first problem is concerned with the dynamical ill-posed inverse problem of estimating parameters associated with the neural sources inducing the magnetic field signal [2–8]. The second problem is concerned with the quantification of the interactions among neural sources located in different cortical areas and intertwined by means of either anatomical or functional connectivity [9–14].

In particular, the connectivity problem can be addressed by either computing proper connectivity metrics directly from the experimental time series provided by the MEG sensors or searching for connections in the source space, i.e., among the neural time series estimated as solutions of the inversion process. This second approach has the advantages of reducing the impact of volume conduction and providing results that can be more easily interpreted in the framework of neuroscientific models [15–17]. Several approaches for identifying connectivity paths rely on physiological models assuming that the functional communication between different brain areas is regulated by the synchronization of their activity at specific temporal frequencies [18,19]. This implies that, for these models, the frequency domain represents the natural computational framework in which to perform the connectivity analysis. This is the reason why, in the present paper, we focus on the analysis of the cross-power spectrum, which is the mathematical quantity of reference for the computation of most frequency-domain connectivity measures [11,20,21]. From an operational viewpoint, the computation of the cross-power spectrum in the source space typically relies on a two-step procedure: first the neural activity is estimated by applying a regularized inversion method on the recorded time series and then the cross-power spectrum is computed from the Fourier transform of the estimated neural time series [22].

**Citation:** Vallarino, E.; Sorrentino, A.; Piana, M.; Sommariva, S. The Role of Spectral Complexity in Connectivity Estimation. *Axioms* **2021**, *10*, 35. https://doi.org/10.3390/axioms10010035

Academic Editors: Gabriella Bretti and Luigi Brugnano

Received: 30 November 2020; Accepted: 13 March 2021; Published: 16 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

This paper investigates how to optimally select the regularization parameter in the inversion procedure in order to obtain the best possible estimate of the neural cross-power spectrum. Specifically, we consider the Tikhonov method (better known as Minimum Norm Estimation (MNE) in the MEG world [2]), as it is one of the most commonly used inverse methods in connectivity studies [23–25]; we study the interplay between the regularization parameter providing the reconstructed neural time series minimizing the relative error in $\ell_2$-norm, and the one that allows the optimal estimate of the cross-power spectrum according to the normalized Frobenius norm. The conceptual motivation of this problem is illustrated in Figure 1, which tentatively sketches the result of recent investigations in MEG-based connectivity research, i.e., that the regularization parameter leading to the optimal estimate of the neural activity may not lead to the optimal estimate of the cross-power spectrum and vice versa. In fact, in [26] the authors used numerical simulations to compare the parameter that provides the best estimate of the power spectrum with the one that provides the best estimate of coherence and showed that the latter is in general two orders of magnitude smaller than the former. More recently, Vallarino et al. [27] addressed an analogous problem via analytical computations, considering a simplified model. Specifically, under the assumption that the neural time series are realizations of white Gaussian processes, the authors proved that the parameter providing the best neural activity estimate is more than twice as large as the one providing the best estimate of the cross-power spectrum.
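The first step of the two-step pipeline can be sketched with the Tikhonov/MNE estimate in its standard form, $\hat{\mathbf{x}} = G^T (G G^T + \lambda I)^{-1} \mathbf{y}$. In the snippet below, the dimensions, the forward matrix and the noise level are hypothetical placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
M_sens, N_src, T_samp = 20, 50, 200          # hypothetical problem sizes

G = rng.standard_normal((M_sens, N_src))     # stand-in forward (lead-field) matrix
x = rng.standard_normal((N_src, T_samp))     # true source time series
y = G @ x + 0.1 * rng.standard_normal((M_sens, T_samp))  # noisy recordings

def mne(G, y, lam):
    """Tikhonov / minimum-norm estimate: argmin ||G x - y||^2 + lam ||x||^2."""
    M = G.shape[0]
    return G.T @ np.linalg.solve(G @ G.T + lam * np.eye(M), y)

x_hat = mne(G, y, lam=1.0)
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
```

Sweeping `lam` and recording the error on the time series versus the error on the cross-power spectrum reproduces the comparison sketched in Figure 1.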

**Figure 1.** Schematic representation of the differences between the regularization parameter providing the best time series estimate (*λ***x**) and the one providing the best cross-power spectrum estimate (*λ***S**). The first one provides an optimal reconstruction of the neural activity, but it may not lead to an optimal estimate of the cross-power spectrum; vice versa, *λ***<sup>S</sup>** provides an optimal reconstruction of the cross-power spectrum at the expense of a sub-optimal estimate of the time series.

The present paper focuses on an analysis of the impact of spectral complexity of the actual neural signal on the value of the two regularization parameters. Specifically, we simulate synthetic MEG signals and discuss how the optimal parameter for the reconstruction of the cross-power spectrum depends on its signal-to-noise ratio and how this latter quantity is related to the spectral richness of the neural sources. To this aim, we considered a simulation setting in which the signal is modeled as a multivariate autoregressive process.

This paper is structured as follows. Section 2 introduces the problem in a formal way. Section 3 describes how the synthetic data are simulated and analyzed. Section 4 presents the results of the analysis. Our conclusions are offered in Section 5.

#### **2. Definition of the Problem**

#### *2.1. Forward Model*

Let $\mathbf{X}(t) = (X_1(t), \dots, X_N(t)) \in \mathbb{R}^N$ be a multivariate stationary stochastic process whose realizations $\mathbf{x}(t)$ cannot be observed, and let $\mathbf{Y}(t) = (Y_1(t), \dots, Y_M(t)) \in \mathbb{R}^M$ be the process whose realizations $\mathbf{y}(t)$ are used to infer information on $\mathbf{x}(t)$. Let $\mathbf{Y}(t)$ and $\mathbf{X}(t)$ be related by the following equation

$$\mathbf{Y}(t) = \mathbf{G}\,\mathbf{X}(t) + \mathbf{N}(t), \tag{1}$$

where $\mathbf{G} \in \mathbb{R}^{M \times N}$ is the forward matrix and $\mathbf{N}(t) = (N_1(t), \dots, N_M(t)) \in \mathbb{R}^M$ is the measurement noise, which is here assumed to be a white Gaussian process with zero mean and covariance matrix $\alpha^2 \mathbf{I}$, i.e., $\mathbf{N}(t) \sim \mathcal{N}(0, \alpha^2 \mathbf{I})$, independent of $\mathbf{X}(t)$.

#### *2.2. Cross-Power Spectrum*

We are interested in reconstructing the cross-power spectrum of $\mathbf{X}(t)$, which describes the statistical dependencies between each pair of time series $(X_j(t), X_k(t))$, $j, k \in \{1, \dots, N\}$. The cross-power spectrum is a one-parameter family of $N \times N$ matrices $\mathbf{S}^{\mathbf{X}}(f)$, whose $(j, k)$-th element is defined as

$$S^{\mathbf{X}}_{j,k}(f) = \lim_{T \to \infty} \frac{1}{T}\, E\!\left[\hat{X}_j(f, T)\, \hat{X}_k(f, T)^H\right], \tag{2}$$

where $\hat{X}_j(f, T)$ is the Fourier transform of $X_j(t)$ over the interval $[0, T]$, defined as

$$
\hat{X}\_j(f, T) = \int\_0^T X\_j(t) e^{-2\pi i f t} \,\mathrm{d}t \tag{3}
$$

and $X^H$ is the Hermitian transpose of $X$ [28].

Given a realization $\mathbf{x}(t)$ of the process $\mathbf{X}(t)$, the cross-power spectrum $\mathbf{S}^{\mathbf{X}}(f)$ can be estimated via the Welch method [29], which consists in partitioning the data into $P$ overlapping segments multiplied by a window function, $\{w(t)\mathbf{x}^p(t)\}_{p=1}^{P}$, computing their discrete Fourier transforms $\hat{\mathbf{x}}^p(f) = \frac{1}{L}\sum_{t=0}^{L-1} \mathbf{x}^p(t)\, w(t)\, e^{-2\pi i t f / L}$, and averaging:

$$\mathbf{S}^{\mathbf{x}}(f) = \frac{L}{\text{PW}} \sum\_{p=1}^{P} \mathbf{\hat{x}}^{p}(f) \mathbf{\hat{x}}^{p}(f)^{H}, \quad f = 0, \ldots, L - 1,\tag{4}$$

where $L$ is the length of each segment and $W = \frac{1}{L}\sum_{t=0}^{L-1} w(t)^2$.
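Equation (4) can be transcribed in a few lines of Python; the following sketch (the function name and the choice of a Hann window are our own illustrative assumptions, not the paper's) returns one Hermitian $N \times N$ matrix per discrete frequency:

```python
import numpy as np

def welch_cross_spectrum(x, L, overlap=0.5):
    """Cross-power spectrum of a multivariate signal via the Welch method.

    x : (N, T) array, one row per time series.
    L : segment length; consecutive segments overlap by `overlap` * L samples.
    Returns S with shape (L, N, N): one Hermitian N x N matrix per frequency.
    """
    N, T = x.shape
    w = np.hanning(L)                        # window function w(t)
    W = np.mean(w ** 2)                      # W = (1/L) sum_t w(t)^2
    step = max(1, int(L * (1 - overlap)))
    starts = range(0, T - L + 1, step)
    S = np.zeros((L, N, N), dtype=complex)
    for s0 in starts:
        seg = x[:, s0:s0 + L] * w            # windowed segment w(t) x^p(t)
        xf = np.fft.fft(seg, axis=1) / L     # discrete Fourier transform
        # accumulate the outer products x^p(f) x^p(f)^H for each frequency f
        S += np.einsum('jf,kf->fjk', xf, np.conj(xf))
    P = len(starts)
    return L * S / (P * W)                   # Equation (4)
```

Each accumulated outer product is Hermitian, so the estimate inherits the Hermitian symmetry exploited later in Definition 1.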

It is often the case that the data are high-dimensional, and visual inspection of the cross-power spectrum is not feasible. In such cases a metric that summarizes the spectral properties of the signals would be useful. Here we use the spectral complexity coefficient, defined as follows.

**Definition 1.** *Given a realization* $\mathbf{x}(t)$ *of the process* $\mathbf{X}(t)$*, and the corresponding cross-power spectrum* $\mathbf{S}^{\mathbf{x}}(f)$*, we define the spectral complexity coefficient as the average of the elements of the upper triangular part of the matrix obtained by computing the squared* $\ell^2$*-norm over the frequencies of* $S^{\mathbf{x}}_{j,k}(f)$*, $j, k = 1, \dots, N$, that is*

$$\mathcal{L} = \frac{2}{N(N+1)} \sum\_{j=1}^{N} \sum\_{k=j}^{N} \sum\_{f} \left| S\_{j,k}^{\mathbf{x}}(f) \right|^2. \tag{5}$$

The spectral complexity coefficient assumes small values if the elements of the cross-power spectrum are flat, that is, when the time series do not present any periodic trend and no dependencies among the pairs of time series are present. On the contrary, it assumes large values if the elements of the cross-power spectrum are peaked, that is, when the time series present periodic trends and complex relations among them. Finally, we observe that in Definition 1 only the elements of the upper triangular part of $\mathbf{S}^{\mathbf{x}}(f)$ are considered, because $\mathbf{S}^{\mathbf{x}}(f)$ is Hermitian.
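Definition 1 translates directly into code; a minimal numpy sketch (names are illustrative):

```python
import numpy as np

def spectral_complexity(S):
    """Spectral complexity coefficient of Definition 1 (Equation (5)).

    S : (F, N, N) array of cross-power spectrum matrices, one per frequency.
    Averages sum_f |S_jk(f)|^2 over the upper triangular entries (j <= k),
    of which there are N(N+1)/2.
    """
    F, N, _ = S.shape
    power = np.sum(np.abs(S) ** 2, axis=0)   # sum over frequencies f
    j, k = np.triu_indices(N)                # upper triangle incl. diagonal
    return 2.0 / (N * (N + 1)) * power[j, k].sum()
```

For instance, three flat identity spectra over two channels give diagonal sums of 3 each and a coefficient of $(3 + 0 + 3)/3 = 2$.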

#### *2.3. Two-Step Approach for Cross-Power Spectrum Estimation*

Let us now consider a realization of Equation (1). In addition to an estimate of the hidden time series $\mathbf{x}(t)$, an estimate of the cross-power spectrum can be obtained from $\mathbf{y}(t)$. Such an estimate can be achieved through a two-step process [22]:

i. First, a regularized estimate $\mathbf{x}_\lambda(t)$ of $\mathbf{x}(t)$ is obtained by solving the inverse problem associated with Equation (1). Here we consider the Tikhonov regularized solution [30] of the problem, which is defined as

$$\mathbf{x}\_{\lambda}(t) = \arg\min\_{\mathbf{x}(t)} \left\{ \|\mathbf{G}\mathbf{x}(t) - \mathbf{y}(t)\|\_{2}^{2} + \lambda \|\mathbf{x}(t)\|\_{2}^{2} \right\};\tag{6}$$

where $\lambda$ is a proper regularization parameter and $\|\cdot\|_2$ is the $\ell^2$-norm.

ii. Then, the corresponding estimate of the cross-power spectrum $\mathbf{S}^{\mathbf{x}_\lambda}(f)$ is computed from the reconstructed time series using the Welch method, as described in the previous section.
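Step i has the closed-form solution $\mathbf{x}_\lambda(t) = (\mathbf{G}^\top\mathbf{G} + \lambda\mathbf{I})^{-1}\mathbf{G}^\top\mathbf{y}(t)$, which is conveniently computed through the SVD of $\mathbf{G}$ so that one factorization can be reused for many values of $\lambda$; a minimal sketch (not the paper's code):

```python
import numpy as np

def tikhonov(G, y, lam):
    """Tikhonov-regularized estimate x_lambda(t) of Equation (6).

    G : (M, N) forward matrix; y : (M, T) data, one column per time point.
    Equivalent to solving (G^T G + lam I) x = G^T y, written via the SVD
    G = U diag(s) V^T with filter factors s_i / (s_i^2 + lam).
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    filt = s / (s ** 2 + lam)            # spectral filter factors
    return Vt.T @ (filt[:, None] * (U.T @ y))
```

The SVD form makes explicit how larger $\lambda$ damps the small-singular-value components that amplify noise.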

**Remark 1.** *In many applied fields, Tikhonov regularization with an $\ell^2$ penalty term has been superseded by more modern techniques that use sparsity-inducing penalty terms such as $\ell^1$ or $\ell^p$ with $0 < p < 1$. Indeed, also in the M/EEG literature there has been considerable effort in developing $\ell^1$ solutions [31,32] and mixed-norm solutions [33]; both approaches have proved to provide superior performance in terms of localization of neural activity. However, these newer methods are seldom used in connectivity studies, for good reasons: $\ell^1$ solutions computed independently at each time point produce extremely jittery reconstructions, resulting in highly sparse time courses that are not suitable for computing connectivity metrics. Mixed norms, which have been developed precisely to overcome this jittering problem, are computationally very expensive, and this effectively prevents their use with the large datasets typically involved in connectivity studies.*

When applying the described two-step process, the regularization parameter $\lambda$ in Equation (6) has to be set for the computation of $\mathbf{x}_\lambda(t)$. Thus, the question naturally arises of how to choose this parameter, which can be set so as to optimally reconstruct either $\mathbf{x}_\lambda(t)$ or $\mathbf{S}^{\mathbf{x}_\lambda}(f)$. We define optimality through the minimization of the normalized norm of the discrepancy between the true and the reconstructed time series and cross-power spectra, as follows.

**Definition 2.** *Given the regularized solution (6) and the cross-power spectrum (4), we define the optimal regularization parameter for the reconstruction of* **x**(*t*) *as*

$$\lambda\_\mathbf{x}^\* = \arg\min\_\lambda \varepsilon\_\mathbf{x}(\lambda) \quad \text{with} \quad \varepsilon\_\mathbf{x}(\lambda) = \frac{\sum\_t \|\mathbf{x}\_\lambda(t) - \mathbf{x}(t)\|\_2^2}{\sum\_t \|\mathbf{x}\_\lambda(t)\|\_2^2 + \sum\_t \|\mathbf{x}(t)\|\_2^2} \quad ; \tag{7}$$

*and the optimal parameter for the reconstruction of* **Sx**(*f*) *as*

$$
\lambda\_\mathbf{S}^\* = \arg\min\_\lambda \varepsilon\_\mathbf{S}(\lambda) \quad \text{with} \quad \varepsilon\_\mathbf{S}(\lambda) = \frac{\sum\_f \left\| \mathbf{S}^{\mathbf{x}\_\lambda}(f) - \mathbf{S}^\mathbf{x}(f) \right\|\_F^2}{\sum\_f \left\| \mathbf{S}^{\mathbf{x}\_\lambda}(f) \right\|\_F^2 + \sum\_f \left\| \mathbf{S}^\mathbf{x}(f) \right\|\_F^2} \tag{8}
$$

*where* ·*<sup>F</sup> is the Frobenius norm; ε***x**(*λ*) *and ε***S**(*λ*) *will be called reconstruction errors.*

The reconstruction errors range from 0 to 1 and penalize both too small and too large values of $\lambda$. In fact, they assume their maximum value either when $\lambda$ is very high, and thus $\mathbf{x}_\lambda(t)$ is negligible with respect to $\mathbf{x}(t)$, or when $\lambda$ is too small and, vice versa, $\mathbf{x}(t)$ is negligible with respect to $\mathbf{x}_\lambda(t)$. This definition may appear overly complex compared to, e.g., a mere $\ell^2$-norm of the difference; however, in the presence of sparse data where only a few time series are non-zero, the simple $\ell^2$-norm would prefer a very high regularization parameter in order to minimize the error on the null time series, at the expense of the error on the non-zero ones; our definition aims to cope with this limitation of the $\ell^2$-norm. A similar definition has been introduced in [34].
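Both error functionals of Definition 2 reduce, once the time points (or frequencies) are stacked into a single array, to the same ratio of squared entrywise norms; a small sketch (names are our own):

```python
import numpy as np

def recon_error(est, true):
    """Normalized discrepancy of Definition 2 (Equations (7) and (8)).

    est, true : arrays of matching shape. For time series, stack the
    columns x(t); for cross-power spectra, stack the matrices S(f) --
    the sums of squared l2 / Frobenius norms both become plain entrywise
    sums of squared moduli over the stacked array.
    """
    num = np.sum(np.abs(est - true) ** 2)
    den = np.sum(np.abs(est) ** 2) + np.sum(np.abs(true) ** 2)
    return num / den
```

A perfect reconstruction gives 0, while an estimate that vanishes entirely (the over-regularized limit) gives 1, matching the range discussed above.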

In experimental contexts, where $\mathbf{x}(t)$ is not known, the choice of the optimal regularization parameter is crucial. This matter is widely discussed in the literature [35–38], and many criteria have been proposed. Such criteria apply to Equation (1) and can be used to set the regularization parameter $\lambda_{\mathbf{x}}$. A possibility is to set the regularization parameter as a function of the signal-to-noise ratio (SNR), which describes the level of the desired signal with respect to that of the measurement noise; for Equation (1) the SNR is defined as follows.

**Definition 3.** *Consider the linear model (1). We define the signal-to-noise ratio of* **X**(*t*) *related to such model as*

$$\text{SNR}^{\mathbf{X}} = 10 \log\_{10} \left( \frac{\sum\_{t} \|\mathbf{G}\mathbf{X}(t)\|\_{2}^{2}}{\sum\_{t} \|\mathbf{N}(t)\|\_{2}^{2}} \right) . \tag{9}$$

To the best of our knowledge, the choice of the optimal regularization parameter for the reconstruction of the cross-power spectrum has never been related to the signal-to-noise ratio. This relation is presented in Section 4; however, we first need to relate the cross-power spectrum of the unknown, $\mathbf{S}^{\mathbf{X}}(f)$, with that of the data, $\mathbf{S}^{\mathbf{Y}}(f)$.

By computing the cross-power spectrum of both sides of Equation (1) and from the linearity of the Fourier transform it follows that

$$\mathbf{S}^{\mathbf{Y}}(f) = \mathbf{G}\mathbf{S}^{\mathbf{X}}(f)\mathbf{G}^{\top} + \mathbf{S}^{\mathbf{N}}(f),\tag{10}$$

where the mixed terms $\mathbf{S}^{\mathbf{XN}}(f)$ and $\mathbf{S}^{\mathbf{NX}}(f)$ are negligible thanks to the independence between $\mathbf{X}(t)$ and $\mathbf{N}(t)$. Just as for Equation (1), we can define the signal-to-noise ratio for Equation (10) as follows.

**Definition 4.** *Consider the linear model (10). We define the signal-to-noise ratio of* **SX**(*f*) *related to such model as*

$$\text{SNR}^{\mathbf{S}} = 10 \log\_{10} \left( \frac{\sum\_{f} \left\| \mathbf{G} \mathbf{S}^{\mathbf{X}}(f) \mathbf{G}^{\top} \right\|\_{F}^{2}}{\sum\_{f} \left\| \mathbf{S}^{\mathbf{N}}(f) \right\|\_{F}^{2}} \right). \tag{11}$$

This definition is in line with the definition of $\mathrm{SNR}^{\mathbf{X}}$ for the signal, the main difference being in the use of the Frobenius norm rather than the $\ell^2$-norm, motivated by the fact that we are working with matrices rather than vectors.

#### *2.4. Multivariate Autoregressive Models*

To model the statistical relationships between the different components of the stochastic process **X**(*t*), in this work the latter is assumed to follow a stable multivariate autoregressive model of order *P* [39].

**Definition 5.** *A zero-mean stochastic process* $\mathbf{X}(t) \in \mathbb{R}^N$ *is said to follow a multivariate autoregressive (MVAR) model of order P if*

$$\mathbf{X}(t) = \sum\_{k=1}^{P} \mathbf{A}(k)\mathbf{X}(t-k) + \boldsymbol{\varepsilon}(t) \qquad \forall t, \tag{12}$$

*where* $\mathbf{A}(k) \in \mathbb{R}^{N \times N}$ *are fixed coefficient matrices, and* $\boldsymbol{\varepsilon}(t) \in \mathbb{R}^N$ *is a white Gaussian noise process.*

*Moreover, the MVAR model described by Equation (12) is said to be stable if*

$$\det\Big(\mathbf{I} - \sum\_{k=1}^{P} \mathbf{A}(k)z^{k}\Big) \neq 0 \qquad \forall \; z \in \mathbb{C} \;\; s.t. \;\; |z| \le 1,\tag{13}$$

*where* **I** *is the identity matrix of size N.*
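Condition (13) is equivalent to requiring that all eigenvalues of the $NP \times NP$ companion matrix of the model lie strictly inside the unit circle [39], which gives a simple numerical check (a sketch, with illustrative names):

```python
import numpy as np

def is_stable(A):
    """Check the stability condition (13) for an MVAR(P) model.

    A : sequence of P coefficient matrices A(1), ..., A(P), each N x N.
    Builds the companion (block) matrix of the model; stability holds
    iff all its eigenvalues have modulus strictly less than one.
    """
    P, N = len(A), A[0].shape[0]
    C = np.zeros((N * P, N * P))
    C[:N, :] = np.hstack(A)              # top block row: A(1) ... A(P)
    C[N:, :-N] = np.eye(N * (P - 1))     # shifted identity blocks below
    return bool(np.all(np.abs(np.linalg.eigvals(C)) < 1))
```

For $P = 1$ the companion matrix is just $\mathbf{A}(1)$, recovering the eigenvalue condition of Remark 2.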

**Remark 2.** *From Equation (12) it can be easily seen that the process* **X**(*t*) *is uniquely determined by the process ε*(*t*) *and by the first P time points,* **X**(0)*,* ... *,* **X**(*P* − 1)*. Indeed, consider for example an MVAR model of order 1 (a similar proof holds for the general case P* > 1 *and can be found in [39]); then, for each time point t*

$$\begin{aligned} \mathbf{X}(t) &= \mathbf{A}(1)\mathbf{X}(t-1) + \boldsymbol{\varepsilon}(t) \\ &= \mathbf{A}(1)^2 \mathbf{X}(t-2) + \mathbf{A}(1)\boldsymbol{\varepsilon}(t-1) + \boldsymbol{\varepsilon}(t) \\ &= \mathbf{A}(1)^t \mathbf{X}(0) + \sum\_{k=0}^{t-1} \mathbf{A}(1)^k \boldsymbol{\varepsilon}(t-k) \end{aligned}$$

*Such a model satisfies the stability condition defined in Equation (13) if all the eigenvalues of the coefficient matrix* $\mathbf{A}(1)$ *have modulus less than one, the condition that guarantees that the sequence of matrix powers* $\{\mathbf{A}(1)^k\}_k$ *is absolutely summable.*

According to Equation (12), if the process $\mathbf{X}(t)$ follows an MVAR model, then at each time point the value of $\mathbf{X}(t)$ can be derived as a weighted sum of the values of the process at the previous $P$ time points, $\mathbf{X}(t-1), \dots, \mathbf{X}(t-P)$, plus a random perturbation $\boldsymbol{\varepsilon}(t)$. In particular, the $(i,j)$-th elements of the coefficient matrices, $a_{ij}(1), \dots, a_{ij}(P)$, describe how the value of the $i$-th component of the process depends on the past of the $j$-th component. Different connectivity patterns, with various levels of complexity, can thus be obtained by tuning the off-diagonal values of the coefficient matrices. Due to their flexibility and simplicity, MVAR models have been used by various authors in the framework of MEG functional connectivity estimation as a benchmark for testing and comparing different connectivity metrics [21,23,34,40–43]. Other models have been proposed to simulate different connectivity patterns, such as coherent sinusoidal time series [26], neural mass models [44,45], or Kuramoto models [46,47]. However, a comprehensive comparison of all possible generative models is beyond the scope of this work.

#### **3. Generation and Analysis Pipeline of the MEG Simulated Data**

In this section we describe the numerical simulation that led to the main results of our study. First we introduce the continuous MEG forward problem and its discretized version, then we describe how we generated the data and, finally, we describe the inverse model and how we numerically computed the optimal regularization parameters.

#### *3.1. MEG Forward Model*

The MEG forward problem aims at computing the magnetic field produced outside the head by an electric current that flows inside the brain. The quasi-static approximation of Maxwell's equations provides the local relationship between the recorded magnetic field and the neural currents [1,48,49]. The two equations of interest here read as

$$\nabla \times \mathbf{E}(\mathbf{r}, t) = 0 \tag{14}$$

$$\nabla \times \mathbf{B}(\mathbf{r}, t) = \mu\_0 \mathbf{J}(\mathbf{r}, t); \tag{15}$$

where **E**(**r**, *t*) and **B**(**r**, *t*) are the electric and magnetic fields at location **r** and time *t*, *μ*<sup>0</sup> is the magnetic permeability in vacuum and **J**(**r**, *t*) is the total electric current that flows inside the brain. The latter is the sum of two contributions

$$\mathbf{J}(\mathbf{r},t) = \mathbf{J}^p(\mathbf{r},t) + \mathbf{J}^v(\mathbf{r},t),\tag{16}$$

$\mathbf{J}^p(\mathbf{r},t)$ being the primary current, directly related to the brain activity, while $\mathbf{J}^v(\mathbf{r},t) = -\sigma(\mathbf{r})\nabla V(\mathbf{r},t)$ is the induced volume current due to the non-null conductivity $\sigma(\mathbf{r})$ of the brain, $V(\mathbf{r},t)$ being the electric scalar potential.

The manipulation of Maxwell's Equations leads to the Biot–Savart Equation

$$\mathbf{B}(\mathbf{r},t) = \mathbf{B}\_0(\mathbf{r},t) - \frac{\mu\_0}{4\pi} \int\_{\Omega} \sigma(\mathbf{r}') \nabla' V(\mathbf{r}',t) \times \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \,\mathrm{d}v' \tag{17}$$

where $\Omega$ is the volume occupied by the brain, the first term $\mathbf{B}_0(\mathbf{r},t) = \frac{\mu_0}{4\pi} \int_\Omega \mathbf{J}^p(\mathbf{r}',t) \times \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,\mathrm{d}v'$ is the magnetic field induced by the primary current, whereas the second term is related to the volume current.

Solving the forward problem requires the computation of these two contributions knowing the primary current. Although for the first one straightforward numerical integration is feasible, for the second one it is common to model the head as the union of nested homogeneous volumes {Ω*j*}*j*=1,...,*<sup>J</sup>* and to replace volume integration with surface integration. In this way, the Biot–Savart equation becomes

$$\mathbf{B}(\mathbf{r},t) = \mathbf{B}\_0(\mathbf{r},t) + \frac{\mu\_0}{4\pi} \sum\_{i,j} (\sigma\_i - \sigma\_j) \int\_{\partial\Omega\_{i,j}} V(\mathbf{r}',t) \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \times \mathbf{n}\_{i,j}(\mathbf{r}')\,\mathrm{d}s',\tag{18}$$

where $\partial\Omega_{i,j}$ is the contact surface between regions $\Omega_i$ and $\Omega_j$, and $\mathbf{n}_{i,j}(\mathbf{r}')$ is the unit vector normal to the surface $\partial\Omega_{i,j}$ at $\mathbf{r}'$, pointing from region $i$ to region $j$.

The forward problem can now be solved by computing the second term at the right hand side of Equation (18) after having computed *V*(**r**, *t*) by solving the equation

$$\nabla \cdot \mathbf{J}^p(\mathbf{r}, t) - \nabla \cdot (\sigma(\mathbf{r}) \nabla V(\mathbf{r}, t)) = 0 \,, \tag{19}$$

which follows from Equation (15) by applying the divergence.

For further details on the MEG forward problem, we refer the reader to [1].

#### *3.2. The Leadfield Matrix*

Experimental contexts require the discretization of the forward problem. This involves a discretization of both the volume occupied by the brain and the volume outside the head.

When using a distributed model for the primary current $\mathbf{J}^p$, the brain volume is uniformly divided into $N$ small parcels. If $N$ is sufficiently large, and thus each parcel has a sufficiently small volume, the activity in each brain parcel can be approximated by a point-like source, henceforth denoted as a dipole. From a mathematical point of view, each dipole is a vector whose strength and direction represent the intensity and orientation of the primary current in the corresponding brain area [1].

As for the volume outside the brain, it is natural to discretize it at the locations of the MEG sensors. Let us denote the measured magnetic field as $\mathbf{y}(t) = (y_1(t), \dots, y_M(t))$. Now, observing that the magnetic field $\mathbf{B}$ depends linearly on the primary current $\mathbf{J}^p$, the magnetic field at the sensor locations is

$$\mathbf{y}(t) = \sum\_{k=1}^{N} G(\mathbf{r}\_k) \mathbf{q}\_k(t) + \mathbf{n}(t),\tag{20}$$

where $\mathbf{r}_k$, $k = 1, \dots, N$, is the location of the $k$-th brain parcel, $G(\mathbf{r}_k) \in \mathbb{R}^{M \times 3}$ is the corresponding leadfield matrix, $\{\mathbf{q}_k(t)\}_{k=1,\dots,N}$ are the electric current intensities along the three orthogonal directions of the $N$ dipoles within the brain at time $t$, and $\mathbf{n}(t)$ is the measurement noise. The $l$-th column of $G(\mathbf{r}_k)$ contains the measurements at sensor level when a unit current dipole is placed at location $\mathbf{r}_k$ and oriented along the $l$-th orthogonal direction.

In this work, we assume the dipoles to be located only on the brain cortical mantle and their orientation to be normal to the local cortical surface [50]. In this case, the electric current intensities are scalar quantities (we refer to them as $\{q_k\}_{k=1,\dots,N}$) and the leadfield matrices are column vectors (we refer to them as $\{G_k\}_{k=1,\dots,N}$).

Let us define

$$\mathbf{x}(t) := (q\_1(t), \dots, q\_N(t))\tag{21}$$

and

$$\mathbf{G} := [\mathbf{G}\_1, \dots, \mathbf{G}\_N] \in \mathbb{R}^{M \times N} \text{ ;} \tag{22}$$

substituting Equations (21) and (22) into Equation (20), we get

$$\mathbf{y}(t) = \mathbf{G}\mathbf{x}(t) + \mathbf{n}(t),\tag{23}$$

which can be interpreted as a realization of Equation (1). From now on we refer to **G** as to the leadfield matrix.

For the simulation presented in this work, we used the leadfield matrix available in the sample dataset of MNE Python [51]. We selected magnetometers and set a fixed orientation. For computational reasons, the available source space, containing 1884 sources, was uniformly down-sampled to obtain 274 sources. Thus, our model has *M* = 102 sensors and *N* = 274 dipole sources.

#### *3.3. Data Generation*

We simulated $N_{mod}$ = 10 pairs of active sources, $(z_1(t), z_2(t))$, with unidirectional coupling from the first to the second; their time series follow a multivariate autoregressive (MVAR) model of order $P = 5$ [39,43]

$$\begin{pmatrix} z\_1(t) \\ z\_2(t) \end{pmatrix} = \sum\_{k=1}^{P} \begin{pmatrix} a\_{1,1}(k) & 0 \\ a\_{2,1}(k) & a\_{2,2}(k) \end{pmatrix} \begin{pmatrix} z\_1(t-k) \\ z\_2(t-k) \end{pmatrix} + \begin{pmatrix} \varepsilon\_1(t) \\ \varepsilon\_2(t) \end{pmatrix}, \quad t = P, \dots, T. \tag{24}$$

The non-zero elements $a_{i,j}(k)$ of the coefficient matrices were drawn from a normal distribution with zero mean and standard deviation $\gamma$, and $T$ = 10,000. We retained only coefficient matrices providing (i) a stable MVAR model [39] and (ii) pairs of signals $(z_1(t), z_2(t))$ such that the $\ell^2$-norm of the strongest one was less than three times the $\ell^2$-norm of the weakest one. In order to obtain time series with different spectral complexity coefficients, we set $\gamma$ to $N_{mod}$ different values randomly drawn in the interval [0.1, 1]. The values of the spectral complexity coefficient of the $N_{mod}$ simulated time series are reported in Table 1. Finally, the resulting time series $(z_1(t), z_2(t))$ were normalized by the mean of their standard deviations over time, so that pairs of time series drawn from different models had similar magnitude. Figure 2 shows a sample of the cross-power spectra of the simulated pairs of time series. The figure shows that, for increasing values of the spectral complexity coefficient, the cross-power spectrum of the corresponding time series becomes more peaked. Each pair of simulated time series was then assigned to $N_{loc}$ = 20 pairs of point-like sources randomly chosen in the source space, so that the ratio of the norms of the corresponding columns of the leadfield matrix was close to one, i.e., they had similar intensity at sensor level, and their distance was greater than 7 cm. The remaining $N - 2$ sources were set to have null activity.
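The generation step above can be sketched as follows (the retry loop, seed handling, and default $\gamma$ are illustrative choices; the paper's additional constraint on the ratio of the two signal norms is omitted for brevity):

```python
import numpy as np

def simulate_pair(P=5, T=10_000, gamma=0.5, seed=0):
    """Sketch of the generation of one coupled pair (z1, z2), Equation (24).

    Coefficients are redrawn until the MVAR(P) model is stable; coupling
    is unidirectional (a_{1,2}(k) = 0 for all lags k). Returns a (T, 2)
    array normalized by the mean of the two standard deviations.
    """
    rng = np.random.default_rng(seed)
    while True:
        A = rng.normal(0.0, gamma, size=(P, 2, 2))
        A[:, 0, 1] = 0.0                     # no coupling from z2 to z1
        # stability check via the companion matrix of the MVAR model
        C = np.zeros((2 * P, 2 * P))
        C[:2, :] = np.hstack(A)
        C[2:, :-2] = np.eye(2 * (P - 1))
        if np.all(np.abs(np.linalg.eigvals(C)) < 1):
            break
    z = np.zeros((T, 2))
    eps = rng.standard_normal((T, 2))         # white Gaussian innovations
    for t in range(P, T):
        z[t] = sum(A[k] @ z[t - k - 1] for k in range(P)) + eps[t]
    return z / z.std(axis=0).mean()           # normalization step
```

The lower-triangular coefficient matrices realize the unidirectional coupling: $z_1$ evolves autonomously while $z_2$ is driven by the past of $z_1$.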

Source space activity was then projected to sensor level by multiplying the simulated source activity by the leadfield matrix, and white Gaussian noise was added to obtain $N_{snr} = 6$ levels of $\mathrm{SNR}^{\mathbf{X}}$ evenly spaced in the interval [−20 dB, 5 dB].
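Scaling the additive white noise to hit a prescribed $\mathrm{SNR}^{\mathbf{X}}$ follows directly from Definition 3; a small sketch (names are our own):

```python
import numpy as np

def add_noise_for_snr(clean, snr_db, rng):
    """Add white Gaussian noise to sensor data to reach a target SNR^X.

    clean : (M, T) noiseless sensor signal G x(t).
    Solves 10 log10( sum_t ||G x(t)||^2 / sum_t ||n(t)||^2 ) = snr_db
    (Equation (9)) for the per-sample noise standard deviation alpha.
    """
    M, T = clean.shape
    signal_power = np.sum(clean ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)   # required sum ||n||^2
    alpha = np.sqrt(noise_power / (M * T))             # per-sample std
    return clean + alpha * rng.standard_normal((M, T))
```

The realized SNR matches the target only in expectation, with fluctuations that vanish as $MT$ grows.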

Summarizing, we generated $N_{mod} \cdot N_{loc} \cdot N_{snr} = 1200$ different sensor level configurations. The green box in Figure 3 shows a visual representation of the simulation pipeline.

**Table 1.** The table reports the values of the spectral complexity coefficients, $c_j$, associated with each simulated multivariate autoregressive (MVAR) model, $m_j$, $j = 1, \dots, N_{mod}$.

**Figure 2.** Real and imaginary part of the cross–power spectra of three simulated time series. Higher values of spectral complexity correspond to more peaked spectra.

**Figure 3.** Pipeline of the simulation of the data (green box) and of the estimation of the cross–power spectrum (blue box).

#### *3.4. Inverse Model*

Source space time series were reconstructed using the Tikhonov method, also known as the minimum norm estimate (MNE) [2] within the MEG community. For each combination of source time series, source locations, and $\mathrm{SNR}^{\mathbf{X}}$ level, we computed the optimal regularization parameters $\lambda^*_{\mathbf{x}}$ and $\lambda^*_{\mathbf{S}}$ by minimizing the reconstruction errors $\varepsilon_{\mathbf{x}}(\lambda)$ and $\varepsilon_{\mathbf{S}}(\lambda)$ defined in Definition 2. The minimization was carried out using the Matlab built-in function fminsearch, which implements an iterative procedure based on the simplex method developed by Lagarias and colleagues [52]. In more detail, $\lambda^*_{\mathbf{x}}$ and $\lambda^*_{\mathbf{S}}$ were obtained by applying this procedure to $\varepsilon_{\mathbf{x}}(\lambda)$ and $\varepsilon_{\mathbf{S}}(\lambda)$, respectively; in both cases the starting point of the simplex method was set equal to $10^{-\mathrm{SNR}^{\mathbf{X}}/10}$, which corresponds to the optimal value of $\lambda_{\mathbf{x}}$ in the case of white Gaussian signals [27]. The blue box in Figure 3 describes the inverse procedure to obtain an estimate of the cross-power spectrum and stresses the role of the regularization parameter in the two-step process.
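The paper's Nelder-Mead search (Matlab's fminsearch) can be mimicked with a simple numpy-only log-grid scan around the same starting value $10^{-\mathrm{SNR}^{\mathbf{X}}/10}$; the grid width and resolution below are arbitrary sketch choices:

```python
import numpy as np

def optimal_lambda(err_fn, snr_db, n_grid=61):
    """Find the regularization parameter minimizing a reconstruction error.

    A stand-in for the simplex search used in the paper: scans a
    log-spaced grid of three decades on either side of the starting
    value 10^(-SNR/10), the optimal lambda for white Gaussian signals [27].
    err_fn : callable lambda -> reconstruction error (eps_x or eps_S).
    """
    lam0 = 10 ** (-snr_db / 10)
    grid = lam0 * np.logspace(-3, 3, n_grid)   # six decades around lam0
    errs = [err_fn(lam) for lam in grid]
    return grid[int(np.argmin(errs))]
```

In practice `err_fn` would evaluate the two-step pipeline (Tikhonov inversion followed by the Welch estimate) for each candidate $\lambda$.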

#### **4. Results**

In this section we illustrate the results of our analysis. We begin with the description of the analytical dependence between $\mathrm{SNR}^{\mathbf{X}}$ and $\mathrm{SNR}^{\mathbf{S}}$; then we highlight how the optimal parameter for the reconstruction of the cross-power spectrum depends on $\mathrm{SNR}^{\mathbf{S}}$, and how this implies that the spectral complexity of the signal is behind such dependence. Finally, we show how the reconstruction error $\varepsilon_{\mathbf{S}}(\lambda)$ behaves for different values of the regularization parameter. As a byproduct, this analysis also confirms the results of Vallarino et al. [27] in a more complex setting.

#### *4.1. Analytical Relation between $\mathrm{SNR}^{\mathbf{X}}$ and $\mathrm{SNR}^{\mathbf{S}}$*

From Equations (9) and (11), and recalling that $\mathbf{N}(t) \sim \mathcal{N}(0, \alpha^2\mathbf{I})$, it follows that

$$\text{SNR}^{\mathbf{X}} = 10 \log\_{10} \left( \frac{\sum\_{t} \|\mathbf{G}\mathbf{X}(t)\|\_{2}^{2}}{M T \alpha^{2}} \right);\tag{25}$$

and

$$\text{SNR}^{\mathbf{S}} = 10 \log\_{10} \left( \frac{\sum\_{f} \left\| \mathbf{G} \mathbf{S}^{\mathbf{X}}(f) \mathbf{G}^{\top} \right\|\_{F}^{2}}{M N\_{f} \alpha^{4}} \right), \tag{26}$$

where $T$ is the number of time points and $N_f$ is the number of frequencies used to compute the cross-power spectrum. Observe that to derive Equation (26) we used the fact that the cross-power spectrum of a zero-mean white Gaussian process with covariance matrix $\alpha^2\mathbf{I}$ is $\mathbf{S}^{\mathbf{N}}(f) = \alpha^2\mathbf{I}$, whose squared Frobenius norm is $M\alpha^4$.

By isolating *α*<sup>2</sup> from Equation (25) and substituting in Equation (26) we obtain

$$\text{SNR}^{\mathbf{S}} = 10 \log\_{10} \left( \frac{T^2 M \sum\_{f} \left\| \mathbf{G} \mathbf{S}^{\mathbf{X}}(f) \mathbf{G}^{\top} \right\|\_{F}^{2}}{N\_{f} \left( \sum\_{t} \left\| \mathbf{G} \mathbf{X}(t) \right\|\_{2}^{2} \right)^{2}} \right) + 2\, \text{SNR}^{\mathbf{X}}. \tag{27}$$

Equation (27) relates the signal-to-noise ratio of $\mathbf{X}(t)$ with that of $\mathbf{S}^{\mathbf{X}}(f)$. It shows that, for the same level of $\mathrm{SNR}^{\mathbf{X}}$, $\mathrm{SNR}^{\mathbf{S}}$ changes with the spectral complexity coefficient of the signals. In fact, the higher the spectral complexity coefficient, the higher the quantity $\|\mathbf{G}\mathbf{S}^{\mathbf{X}}(f)\mathbf{G}^{\top}\|_F^2$. Intuitively, this happens because, when the signal has a higher spectral complexity coefficient, its cross-power spectrum is more peaked and thus stands out more over the cross-power spectrum of the noise than that of a signal with a lower spectral complexity coefficient.
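The algebra behind Equation (27) can be verified numerically: with $\mathrm{SNR}^{\mathbf{X}}$ and $\mathrm{SNR}^{\mathbf{S}}$ computed exactly from Equations (25) and (26), isolating $\alpha^2$ and substituting gives an identity that holds to machine precision. A sketch (all sizes and spectra below are toy stand-ins, not MEG quantities):

```python
import numpy as np

def snr_identity_gap(seed=0):
    """Numerically check Equation (27): SNR^S from its definition must
    equal the right-hand side built from SNR^X. Returns |difference|."""
    rng = np.random.default_rng(seed)
    M, N, T, Nf, alpha = 6, 8, 512, 64, 0.3
    G = rng.standard_normal((M, N))
    X = rng.standard_normal((N, T))
    Z = rng.standard_normal((Nf, N, N)) + 1j * rng.standard_normal((Nf, N, N))
    SX = Z + np.conj(np.transpose(Z, (0, 2, 1)))       # Hermitian S^X(f)
    sig = np.sum((G @ X) ** 2)                          # sum_t ||G X(t)||_2^2
    num = np.sum(np.abs(G @ SX @ G.T) ** 2)             # sum_f ||G S^X G^T||_F^2
    snr_x = 10 * np.log10(sig / (M * T * alpha ** 2))   # Eq. (25)
    # Eq. (26); the denominator uses ||alpha^2 I||_F^2 = M alpha^4 per frequency
    snr_s = 10 * np.log10(num / (M * Nf * alpha ** 4))
    rhs = 10 * np.log10(T ** 2 * M * num / (Nf * sig ** 2)) + 2 * snr_x
    return abs(snr_s - rhs)
```

The $2\,\mathrm{SNR}^{\mathbf{X}}$ term is what makes the noise variance $\alpha$ cancel exactly between the two sides.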

#### *4.2. Dependence of $\lambda^*_{\mathbf{S}}$ on $\mathrm{SNR}^{\mathbf{S}}$*

As described in Section 3, we simulated several sensor level configurations, based on different combinations of spectral complexity coefficients, source locations, and $\mathrm{SNR}^{\mathbf{X}}$ levels. For each configuration, we collected the two optimal parameters $\lambda^*_{\mathbf{x}}$ and $\lambda^*_{\mathbf{S}}$ and investigated their dependence on the signal-to-noise ratio. In accordance with classical results from inverse theory [36], we found that $\lambda^*_{\mathbf{x}}$ depends on the signal-to-noise ratio. What is novel here is the relation between $\lambda^*_{\mathbf{S}}$ and both $\mathrm{SNR}^{\mathbf{X}}$ and $\mathrm{SNR}^{\mathbf{S}}$. Indeed, for increasing $\mathrm{SNR}^{\mathbf{X}}$, less regularization is needed, but this dependence varies with the MVAR model. On the other hand, the dependence of $\lambda^*_{\mathbf{S}}$ on $\mathrm{SNR}^{\mathbf{S}}$ is neater and does not depend on the model. Figure 4 shows this result; on the left, the regularization parameters for the cross-power spectrum reconstruction are plotted versus $\mathrm{SNR}^{\mathbf{X}}$, while on the right the same parameters are plotted versus $\mathrm{SNR}^{\mathbf{S}}$. For ease of presentation the figure shows the parameters related to one source location; while on the left the lines corresponding to different MVAR models have different heights, on the right they overlap.

**Figure 4.** Optimal regularization parameters for the reconstruction of the cross-power spectrum ($\lambda^*_{\mathbf{S}}$) as a function of $\mathrm{SNR}^{\mathbf{X}}$ (**left**) and $\mathrm{SNR}^{\mathbf{S}}$ (**right**). Different colors correspond to different MVAR models. On the left, the lines have different heights, while on the right they overlap, meaning that the dependence of $\lambda^*_{\mathbf{S}}$ on $\mathrm{SNR}^{\mathbf{S}}$ is neater than that on $\mathrm{SNR}^{\mathbf{X}}$.

#### *4.3. $\lambda^*_{\mathbf{S}} < \lambda^*_{\mathbf{x}}$ and Dependence on the Spectral Complexity*

We also investigated the relation between the two optimal regularization parameters. Figure 5 shows the ratio $\lambda^*_{\mathbf{S}}/\lambda^*_{\mathbf{x}}$ versus $\mathrm{SNR}^{\mathbf{X}}$ for the simulated MVAR models. The ratio between the two parameters is always smaller than $\frac{1}{2}$, meaning that $\lambda^*_{\mathbf{S}} < \frac{1}{2}\lambda^*_{\mathbf{x}}$, as was analytically proved in a simplified case in [27]. Furthermore, the figure shows that for increasing spectral complexity coefficients this ratio gets smaller. This latter result is directly related to Equation (27): for the same level of $\mathrm{SNR}^{\mathbf{X}}$, signals with higher spectral complexity have higher $\mathrm{SNR}^{\mathbf{S}}$ and thus need less regularization.

#### *4.4. The Reconstruction Errors*

To show the benefit of using a value of the regularization parameter different from $\lambda^*_{\mathbf{x}}$ when estimating the cross-power spectrum, in Figure 6 we plot the reconstruction error $\varepsilon_{\mathbf{S}}(\lambda)$ as a function of the regularization parameter (normalized by $\lambda^*_{\mathbf{x}}$) obtained for two illustrative realizations of the simulated sensor data. Specifically, we fixed the locations and time courses of the pair of interacting sources and considered the corresponding simulated MEG data for two levels of $\mathrm{SNR}^{\mathbf{X}}$, namely $\mathrm{SNR}^{\mathbf{X}} = -20$ dB and $\mathrm{SNR}^{\mathbf{X}} = 5$ dB. Similar results were obtained when considering the other source configurations.

**Figure 5.** Ratio between the optimal parameters ($\lambda^*_{\mathbf{S}}/\lambda^*_{\mathbf{x}}$) as a function of $\mathrm{SNR}^{\mathbf{X}}$. Different colors correspond to MVAR models with different spectral complexities. Dashed lines are the mean of the ratio over the different source locations; solid colors correspond to the standard deviation of the mean.

**Figure 6.** Reconstruction error $\varepsilon_{\mathbf{S}}(\lambda)$ for two simulated data sets mimicking MEG signals with $\mathrm{SNR}^{\mathbf{X}} = -20$ dB (lowest considered signal-to-noise ratio (SNR), **left** panel) and $\mathrm{SNR}^{\mathbf{X}} = 5$ dB (highest considered SNR, **right** panel). In each panel, black and red vertical lines highlight the values of $\varepsilon_{\mathbf{S}}(\lambda)$ in correspondence of $\lambda^*_{\mathbf{S}}$ and $\lambda^*_{\mathbf{x}}$, respectively.

As shown by Figure 6, for both values of $\mathrm{SNR}^{\mathbf{X}}$ the reconstruction error significantly decreases when $\lambda^*_{\mathbf{S}}$ is used instead of $\lambda^*_{\mathbf{x}}$. Specifically, in this simulation $\varepsilon_{\mathbf{S}}(\lambda)$ drops from 0.99 to 0.96 when $\mathrm{SNR}^{\mathbf{X}} = -20$ dB, and from 0.92 to 0.77 when $\mathrm{SNR}^{\mathbf{X}} = 5$ dB.

Notably, one may observe that the relative reconstruction errors shown in Figure 6 are rather large, being above 90% in the low-SNR case and remaining above 75% even in the high-SNR scenario. We point out that this is mainly due to the combined effect of two factors: first, Tikhonov regularization tends to produce reconstructions that are small but non-zero almost everywhere, as it reduces but does not entirely cancel the backprojection of noise; second, in our simulations the true activity is zero everywhere but at two points. These two facts inevitably lead to large relative errors that, however, decrease for increasing values of $\mathrm{SNR}^{\mathbf{X}}$.

#### **5. Discussion and Conclusions**

In the present work, we investigated the role of the spectral complexity of a time series, $\mathbf{x}(t)$, in the design of an optimal inverse technique for estimating its cross-power spectrum, $\mathbf{S}^{\mathbf{x}}(f)$, from indirect measurements of the time series itself. Motivated by an analysis pipeline widely used for estimating brain functional connectivity from MEG data, we reconstructed the cross-power spectrum in two steps: first, we estimated the unknown time series using the Tikhonov method; then, we computed the cross-power spectrum of the reconstructed time series. We used numerical simulations to study how the spectral complexity of $\mathbf{x}(t)$ impacts the value of the regularization parameter that provides the best reconstruction of the cross-power spectrum.

As a first analytical result, we related $\mathrm{SNR}^{\mathbf{X}}$ to $\mathrm{SNR}^{\mathbf{S}}$, i.e., the signal-to-noise ratio of the time series and that of the corresponding cross-power spectra. The obtained formula suggests that, for a fixed level of $\mathrm{SNR}^{\mathbf{X}}$, $\mathrm{SNR}^{\mathbf{S}}$ depends on the spectral complexity of $\mathbf{x}(t)$: the higher the spectral complexity coefficient, the higher $\mathrm{SNR}^{\mathbf{S}}$. Intuitively, this happens because a higher value of the spectral complexity coefficient corresponds to a more peaked cross-power spectrum that emerges over the cross-power spectrum of the noise.

To test the effect of this result on the choice of the Tikhonov regularization parameter in a practical scenario, we simulated a large set of MEG data and applied the described two-step approach for estimating the cross-power spectrum of the underlying neural sources. In detail, we simulated 1200 synthetic MEG datasets with varying SNR$_{\mathbf{X}}$, generated by pairs of coupled point-like sources at varying locations and with different spectral complexities. For each simulated dataset, we computed the two parameters providing the best estimates of the time series ($\lambda^*_{\mathbf{x}}$) and of the cross-power spectrum ($\lambda^*_{\mathbf{S}}$), defined as the ones minimizing the relative $\ell^2$ norm of the difference between the true and the reconstructed time series/cross-power spectrum, according to Definition 2. As shown in Figure 4, the results of our simulations highlight a high correlation between the values of $\lambda^*_{\mathbf{S}}$ and of SNR$_{\mathbf{S}}$.
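The selection of the optimal parameter by minimization of the relative $\ell^2$ error can be sketched as a simple grid search. The grid of candidate values and the noiseless test data below are made-up stand-ins for the simulation protocol of the paper:

```python
import numpy as np

def optimal_lambda(Y, G, X_true, lambdas):
    """Grid search for the regularization parameter minimizing the relative l2
    error of the reconstructed time series; the same loop, applied to the
    cross-power spectra, would yield the spectral optimum."""
    errs = []
    for lam in lambdas:
        W = G.T @ np.linalg.inv(G @ G.T + lam * np.eye(G.shape[0]))
        X_hat = W @ Y
        errs.append(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
    return lambdas[int(np.argmin(errs))], errs
```

With noiseless data the error grows monotonically with the regularization parameter, so the smallest candidate wins; with noisy data the trade-off between noise suppression and bias produces an interior optimum.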

Eventually, we focused on the relationship between the two parameters $\lambda^*_{\mathbf{x}}$ and $\lambda^*_{\mathbf{S}}$, whose ratio is shown in Figure 5. The figure points out that this ratio depends on the spectral complexity of the simulated time series. This fact may be understood in light of the previous results, as $\lambda^*_{\mathbf{S}}$ depends on SNR$_{\mathbf{S}}$, which in turn depends on the spectral complexity coefficient. Additionally, we found that, for all the simulated data, $\lambda^*_{\mathbf{S}}/\lambda^*_{\mathbf{x}} < 1/2$, in line with the results shown in [27] for a simplified model where the neural time series were assumed to be white Gaussian processes. Moreover, when the spectral complexity coefficient increases (*c* > 5 in our simulations), the ratio between the two parameters approaches 0.01. This agrees with the results shown in [26], where, by simulating sinusoidal signals, the authors suggested using for connectivity estimation a regularization parameter two orders of magnitude lower than the one optimal for time-series reconstruction. In fact, our numerical results indicate that the use of $\lambda^*_{\mathbf{S}}$ yields a substantially lower reconstruction error on the cross-power spectrum, particularly when the data have a high SNR.

The present work focuses on the cross-power spectrum as a connectivity metric. Even though the cross-power spectrum is the starting point for the computation of many connectivity metrics, it would be interesting to directly investigate the behavior of the Tikhonov regularization parameters when using such metrics; future work will be devoted to this. It is also worth noticing that the definition of optimality for the regularization parameters is not unique, since many metrics can be used. A common example is the area under the curve (AUC), which is the metric used in [26]. The use of different metrics would firstly strengthen our results and would also allow a more straightforward comparison with the results of [26]. Finally, the dependence of $\lambda^*_{\mathbf{S}}$ on SNR$_{\mathbf{S}}$ suggests that an analysis of such dependence could be considered for the definition of a rule for choosing $\lambda^*_{\mathbf{S}}$ in practical scenarios.

**Author Contributions:** Conceptualization: M.P., S.S., and A.S.; methodology: S.S. and E.V.; software: E.V.; validation and writing: all authors. All authors have read and agreed to the published version of the manuscript.

**Funding:** E.V., A.S., S.S. and M.P. have been partially supported by Gruppo Nazionale per il Calcolo Scientifico.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Cellular Potts Model for Analyzing Cell Migration across Constraining Pillar Arrays**

**Marco Scianna\* and Luigi Preziosi**

Department of Mathematical Sciences, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy; luigi.preziosi@polito.it

**\*** Correspondence: marco.scianna@polito.it

**Abstract:** Cell migration in highly constrained environments is fundamental in a wide variety of physiological and pathological phenomena. In particular, it has been experimentally shown that the migratory capacity of most cell lines depends on their ability to transmigrate through narrow constrictions, which in turn relies on their deformation capacity. In this respect, the nucleus, which occupies a large fraction of the cell volume and is substantially stiffer than the surrounding cytoplasm, imposes a major obstacle. This aspect has also been investigated with the use of microfluidic devices formed by dozens of arrays of aligned polymeric pillars that limit the available space for cell movement. Such experimental systems, in particular in the designs developed by the groups of Denais and of Davidson, were here reproduced with a tailored version of the Cellular Potts model, a grid-based stochastic approach where cell dynamics are established by a Metropolis algorithm for energy minimization. The proposed model allowed us to quantitatively analyze selected cell migratory determinants (e.g., cell and nuclear speed and deformation, and the forces acting at the nuclear membrane) for different experimental setups. Most of the numerical results show remarkable agreement with the corresponding empirical data.

**Keywords:** Cellular Potts model; cell migration; nucleus deformation; microchannel device

**MSC:** 34K34; 37N25; 92C17

#### **1. Introduction**

The ability of cells to move within different environments is crucial in a diverse array of processes. For instance, during development, the coordinated movement of cells of different origin is fundamental for both shaping the growing embryo and organogenesis: migratory defects at all stages may in fact lead to severe malformations [1]. In mature organisms, immune cells are mobilized from the bloodstream to enter sites of infection, and then into the lymph nodes for effector functions [2]. Moreover, the migration of epithelial cells and fibroblasts is vital for proper wound healing and the repair of basement membranes and connective tissues. In pathological conditions, cell migration is involved in chronic inflammatory diseases, such as arteriosclerosis, and in cancer invasion and metastasization [3]. The process of cell migration is finally exploited in biomedical engineering applications for the regeneration of various tissues, such as cartilage, skin, or peripheral nerves in vivo or in vitro [4–7].

The migratory efficacy of cells is determined, to a large extent, by their capacity to squeeze through strictly confined environments. For instance, tissue membranes and vessel walls, as well as dense regions of structural extracellular matrices (ECMs), represent physical barriers characterized by significantly small openings and pores [8,9]. Under these conditions, cells can achieve substantial movement by degrading/modifying their surroundings to create sufficient space, for example, by the secretion of matrix metalloproteinases (MMPs) or by squeezing to fit through the available space [10–12]. In the latter option, the elasticity of the cell becomes an important factor. In this respect, the cytoplasm is very flexible and can undergo large deformations. On the other hand, the voluminous nucleus is much stiffer: it is therefore a hampering factor for cell movement [13–16].

**Citation:** Scianna, M.; Preziosi, L. A Cellular Potts Model for Analyzing Cell Migration across Constraining Pillar Arrays. *Axioms* **2021**, *10*, 32. https://doi.org/10.3390/axioms10010032

Academic Editor: Gabriella Bretti

Received: 27 January 2021 Accepted: 9 March 2021 Published: 12 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The relation between cell mobility and remodeling ability is efficiently studied with two experimental systems. On the one hand, cultured cells are stimulated to move in engineered fibrous scaffolds, which mimic highly confined in vivo connective tissues [17]. On the other hand, they are seeded and allowed to locomote within microfluidic-based devices, characterized by the presence of constrictions formed by fixed and insoluble polymeric structures (e.g., extended walls or arrays of pillars) [18–21].

This second type of experimental system, in particular in the design versions proposed by Denais and coworkers [19] and by Davidson and colleagues [18], was reproduced here and simulated using an extended Cellular Potts model (CPM, [22–27]). This is a grid-based Monte Carlo technique that employs a stochastic energy minimization principle to determine system evolution. The approach proposed in this work is similar to those proposed in [28–30], which employed a compartmental representation of cells to analyze selected aspects of their movement within matrix environments. However, it is characterized by some relevant novelties and features, namely:


As an outcome, we focused on experimentally addressable characteristics of cell shape and locomotion (e.g., the velocity and transit time within a constriction, and the nucleus deformation ratio). In particular, we predicted how these quantities were affected by selected manipulations of either cell properties or channel layout. We thereby successfully replicated most of the experimental results proposed in the two reference papers [18,19]. For the sake of completeness, we also investigated the rationale underlying the few discrepancies that emerged between the computational and in vitro outcomes.

The rest of the paper is organized as follows. In Section 2, we clarify the assumptions on which our approach was based and describe each model component. In this part of the work, we also introduce and provide an estimation of the model parameters and define how we quantified and characterized cell movement. The computational findings are then presented in Section 3, where we separately deal with simulations relative to the channel layouts proposed in [18,19], respectively. Finally, the proposed results are discussed in the last Section 4, which also contains some hints for future perspectives.

#### **2. Mathematical Model**

The Cellular Potts model (CPM) is a grid-based stochastic approach that realistically preserves the identity of cell-scale elements and describes their behavior and mutual interactions in energetic terms and constraints. An extended version of the CPM was employed here to schematically reproduce cell migratory behavior within the two types of microfluidic-based devices developed in [18,19]. As shown in Figure 1A, both experimental systems are composed of dozens of arrays of aligned polymeric pillars, which form a series of parallel channels. Such structural elements are fixed and represent a constriction for cell movement because they limit the free space and form bottlenecks: their dimensions and distributions can be varied to mimic different patterns of spatial limitations. An extended group of cells, initially disposed just outside the entrance of the channels, then individually moves up a chemical gradient, which is established by the diffusion of a molecular factor from a sink reservoir located on the opposite side of the devices; see, again, Figure 1A.

**Figure 1.** (**A**) Schematic representation of the microfluidic devices proposed in [18,19]. (**B**) Simulation domain Ω replicating a representative channel of the migration device employed in [19]. (**C**) Simulation domain Ω reproducing a representative migratory channel of the migration device employed in [18]. (**D**) Portion of a 2D Cellular Potts model (CPM) lattice with a generic lattice site **x**, its border *∂***x**, and its first-nearest neighbors $\mathbf{x}'$. (**E**) The virtual cell is compartmentalized into a central nuclear cluster, the object $\Sigma_1$ of type N (in yellow), and the surrounding cytosolic region, the element $\Sigma_2$ of type C (in green). The rigid pillars are reproduced by CPM objects $\Sigma_{\sigma \geq 3}$ of type P (in gray). The rest of the domain is formed by an extended undifferentiated element $\Sigma_0$ of type M (in black).

In order to reduce the computational complexity of the problem, a two-dimensional CPM domain $\Omega \subset \mathbb{R}^2$ is used to reproduce a planar section of a single migratory channel, taken as representative of each of the two reference devices; see Figure 1B,C. In both cases, Ω is a regular lattice formed by identical square grid sites that, with an abuse of notation, are identified by their center $\mathbf{x} = (x, y) \in \mathbb{R}^2$. Each grid site is assigned a unique index, i.e., an integer number $\sigma(\mathbf{x}) \in \mathbb{N}$, that can be interpreted as a degenerate *spin*, a name originally inherited from statistical physics [31,32]. The border of a lattice site **x** is identified as *∂***x**, one of its neighbors as $\mathbf{x}'$, and its overall Moore neighborhood as $\Omega'_{\mathbf{x}}$, i.e., $\Omega'_{\mathbf{x}} = \{\mathbf{x}' \in \Omega : \mathbf{x}' \text{ is a neighbor of } \mathbf{x}\}$; see Figure 1D. The subdomains of contiguous sites with identical spin form discrete objects $\Sigma_\sigma$ (i.e., $\Sigma_\sigma = \{\mathbf{x} \in \Omega : \sigma(\mathbf{x}) = \sigma\}$), which have an associated type $\tau(\Sigma_\sigma)$.

A single representative cell is then included in Ω. Following the approach proposed in [27], the simulated agent, labeled by the integer *η* = 1, is defined as a compartmentalized element. It is composed of two subregions, which, in turn, are classical CPM objects $\Sigma_\sigma$: the nucleus, a central cluster $\Sigma_{\sigma=1}$ of type *τ* = N, and the surrounding cytosol $\Sigma_{\sigma=2}$ of type *τ* = C; see Figure 1E. Both cell compartments share, as an additional attribute, the individual identification number *η* = 1. In other words, the entire cell is identified by *η* = 1, whereas its internal compartments are identified by the pairs (*η* = 1, *σ* = 1) (the nucleus) and (*η* = 1, *σ* = 2) (the cytoplasm).

The extracellular environment is then differentiated into a polymeric component, *τ* = P, and a medium component, *τ* = M, as done in [30,33,34]. The polymeric state is assigned to identify the rigid pillars forming the migratory channel. In particular, each of them is a disconnected CPM element $\Sigma_\sigma$, identified by its own identification number *σ* ≥ 3 (since *σ*-values ∈ {1, 2} are used for the intracellular compartments; see above). The dimensions and positions of such structures are specified later on. The medium-like state instead identifies the free surface of the microchannel, i.e., where the cell moves and the chemical substances diffuse: it is conventionally assumed to be a single object $\Sigma_{\sigma=0}$ isotropically distributed throughout the simulation domain, as shown in panel (E) of Figure 1.
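The labeling convention above can be set up on a small lattice as follows. This is an illustrative sketch, not the authors' code: all dimensions, positions, and radii are invented for demonstration.

```python
import numpy as np

# Labeling convention of the text: sigma = 0 medium (M), sigma = 1 nucleus (N),
# sigma = 2 cytosol (C), sigma >= 3 fixed pillars (P).
H, W = 60, 40
sigma = np.zeros((H, W), dtype=int)               # medium everywhere (Sigma_0)

yy, xx = np.mgrid[0:H, 0:W]
sigma[(yy - 50)**2 + (xx - 20)**2 <= 8**2] = 2    # round cell body (cytosol)
sigma[(yy - 50)**2 + (xx - 20)**2 <= 4**2] = 1    # nucleus centered in the cell

sigma[25:30, 5:15] = 3                            # two pillars forming
sigma[25:30, 25:35] = 4                           # an illustrative bottleneck

type_of = {0: 'M', 1: 'N', 2: 'C'}
tau = lambda s: type_of.get(s, 'P')               # tau(Sigma_sigma) of the text
```

Each object $\Sigma_\sigma$ is simply the set of sites sharing a spin value, so connected-component bookkeeping comes for free from the integer array.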

Cell movement results from an iterative and stochastic minimization of a free energy, defined by a *Hamiltonian* functional *H* (whose components are defined below). The employed algorithm consists of a series of elementary steps of a modified Metropolis method for Monte Carlo–Boltzmann dynamics [25,35], which is able to implement the natural exploratory behavior of biological individuals. Procedurally, at each time step *t*, called a Monte Carlo step (MCS, the basic unit of time of the model), a lattice site $\mathbf{x}_{\rm so}$ (so for *source*) belonging to a cell compartment $\Sigma_{\sigma(\mathbf{x}_{\rm so})}$ is selected at random and attempts to copy its spin, $\sigma(\mathbf{x}_{\rm so})$, into one of its unlike neighbors, $\mathbf{x}_{\rm ta} \in \Omega'_{\mathbf{x}_{\rm so}} : \mathbf{x}_{\rm ta} \notin \Sigma_{\sigma(\mathbf{x}_{\rm so})}$ (ta for *target*), also randomly selected. In particular, if $\sigma(\mathbf{x}_{\rm so}) = 2$ and $\sigma(\mathbf{x}_{\rm ta}) = 0$, the cell is protruding, i.e., extending its motile membrane structures within the extracellular space, whereas if $\sigma(\mathbf{x}_{\rm so}) = 0$ and $\sigma(\mathbf{x}_{\rm ta}) = 2$, the cell is retracting. Finally, if $\sigma(\mathbf{x}_{\rm so}) = 1$ and $\sigma(\mathbf{x}_{\rm ta}) = 2$ or if $\sigma(\mathbf{x}_{\rm so}) = 2$ and $\sigma(\mathbf{x}_{\rm ta}) = 1$, the cell is reorganizing, i.e., it is undergoing internal remodeling. Polymeric pillars are instead fixed and immutable, i.e., they are not allowed to move or be deformed by the cell. The proposed trial of spin update is finally accepted with a Boltzmann-like probability function $P(\sigma(\mathbf{x}_{\rm so}) \rightarrow \sigma(\mathbf{x}_{\rm ta}))$, whose form is slightly changed here with respect to the general version given in [27]:

$$P(\sigma(\mathbf{x}_{\rm so}) \rightarrow \sigma(\mathbf{x}_{\rm ta})) = \tanh(T(\sigma(\mathbf{x}_{\rm so}), \sigma(\mathbf{x}_{\rm ta})))\, \min\left\{1, \exp\left(\frac{-\Delta H}{T(\sigma(\mathbf{x}_{\rm so}), \sigma(\mathbf{x}_{\rm ta}))}\right)\right\}.\tag{1}$$

In Equation (1), Δ*H* is the net difference of the system energy due to the proposed change of domain configuration, whereas the parameter *T*(*σ*(**x**so), *σ*(**x**ta)) ∈ *R*<sup>+</sup> is a Boltzmann temperature that accounts for the trial cell behavior. The retraction/protrusion dynamics are in fact dictated by the intensity and the frequency of plasma membrane (PM) ruffles, which, on a molecular level, are determined by polarization/depolarization processes in the actin cytoskeleton (refer to [36–38] and references therein). Cell internal reorganization instead depends on the agitation rate for the nuclear cluster. According to the above considerations, we indeed have

$$T(\sigma(\mathbf{x}_{\rm so}), \sigma(\mathbf{x}_{\rm ta})) = \begin{cases} T_{\tau(\sigma(\mathbf{x}_{\rm so}))}, & \text{if } \sigma(\mathbf{x}_{\rm so}) = 2 \text{ and } \sigma(\mathbf{x}_{\rm ta}) = 0; \\[1ex] T_{\tau(\sigma(\mathbf{x}_{\rm ta}))}, & \text{if } \sigma(\mathbf{x}_{\rm so}) = 0 \text{ and } \sigma(\mathbf{x}_{\rm ta}) = 2; \\[1ex] T_{\tau(\sigma(\mathbf{x}_{\rm so}))}, & \text{if } \sigma(\mathbf{x}_{\rm so}) = 1 \text{ and } \sigma(\mathbf{x}_{\rm ta}) = 2; \\[1ex] T_{\tau(\sigma(\mathbf{x}_{\rm ta}))}, & \text{if } \sigma(\mathbf{x}_{\rm so}) = 2 \text{ and } \sigma(\mathbf{x}_{\rm ta}) = 1, \end{cases} \tag{2}$$

with the stochastic law regulating cell movement that can be finally specified as

$$P(\sigma(\mathbf{x}_{\rm so}) \rightarrow \sigma(\mathbf{x}_{\rm ta})) = \begin{cases} \tanh(T_{\rm C}) \min\left\{1, \exp\left(\frac{-\Delta H}{T_{\rm C}}\right)\right\}, & \text{if the cell is protruding or retracting;} \\[2ex] \tanh(T_{\rm N}) \min\left\{1, \exp\left(\frac{-\Delta H}{T_{\rm N}}\right)\right\}, & \text{if the cell is reorganizing.} \end{cases} \tag{3}$$

In particular, we set a sufficiently high $T_{\rm C} > 1$ since cells moving in confined environments are widely shown to have an active fluid-like cytoplasm. A lower $T_{\rm N} < 1 < T_{\rm C}$ is instead fixed since the nucleus does not have active movement dynamics (i.e., self-propulsion), but it only displaces passively, i.e., is dragged by the surrounding cytoskeleton elements; see, also, [39] for a more detailed mechanical explanation.
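Under these conventions, a single trial update can be sketched as follows. The temperature values and helper names are illustrative placeholders, not the calibrated parameters of the paper; the acceptance rule transcribes Equations (1)–(3).

```python
import numpy as np

T_C, T_N = 5.0, 0.5   # illustrative Boltzmann temperatures, T_N < 1 < T_C

def acceptance_probability(sigma_so, sigma_ta, delta_H):
    """Modified Metropolis rule: tanh(T) * min(1, exp(-dH/T)), with T chosen
    according to the type of trial, as in Equations (2) and (3)."""
    if (sigma_so, sigma_ta) in [(2, 0), (0, 2)]:      # protrusion / retraction
        T = T_C
    elif (sigma_so, sigma_ta) in [(1, 2), (2, 1)]:    # internal reorganization
        T = T_N
    else:
        return 0.0                                    # pillars are immutable
    return np.tanh(T) * min(1.0, np.exp(-delta_H / T))

def attempt_flip(sigma_so, sigma_ta, delta_H, rng):
    # accept the spin copy with the Boltzmann-like probability above
    return rng.random() < acceptance_probability(sigma_so, sigma_ta, delta_H)
```

Note that the `tanh(T)` prefactor caps the acceptance rate even for energetically favorable moves, so the nucleus (low `T_N`) remodels far more rarely than the cytosol.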

*Remark*. The acceptance probability resulting from Equations (2) and (3) differs from the general version given in [27], and used in [28–30], as a consequence of the fact that the Boltzmann temperature *T* depends both on the type of moving cell compartment, as in our previous papers, and on the characteristics of the target grid element. In our opinion, this is a significant improvement of the CPM algorithm. For instance, it allows differentiating cases of the same cell cytosolic element that tries either to extend into the free extracellular domain or to occupy the space belonging to the nucleus. In fact, in the former case, the possibility of cell morphological updates is biologically determined by the cytoskeletal agitation rate (i.e., by a determinant of the source site), whereas, in the latter case, it relies on the resistance to movement exerted by the stiff organelle (i.e., on a determinant of the target site). The transition probability functions employed in our previous papers did not allow capturing such aspects.

The *Hamiltonian* functional establishing the system energy is then given by the sum of three contributions:

$$H(t) = H\_{\text{adhesion}}(t) + H\_{\text{shape}}(t) + H\_{\text{chemotaxis}}(t). \tag{4}$$

*H*adhesion is the general extension of Steinberg's differential adhesion hypothesis (DAH) [25,40,41]. In particular, it is differentiated in the contributions due to either the generalized contact tension between the nucleus and the cytoplasm within the cell, or the effective adhesion between the migrating individual and an extracellular component:

$$H_{\text{adhesion}}(t) = H^{\text{int}}_{\text{adhesion}}(t) + H^{\text{ext}}_{\text{adhesion}}(t) = \sum_{\substack{\partial\mathbf{x} \in \partial\Sigma_{1}, \\ \partial\mathbf{x}' \in \partial\Sigma_{2}}} J^{\text{int}}_{\rm N,C} + \sum_{\substack{\partial\mathbf{x} \in \partial\Sigma_{2}, \\ \partial\mathbf{x}' \in \partial\Sigma_{0}}} J^{\text{ext}}_{\rm C,M} + \sum_{\substack{\partial\mathbf{x} \in \partial\Sigma_{2}, \\ \partial\mathbf{x}' \in \partial\Sigma_{\sigma \geq 3}}} J^{\text{ext}}_{\rm C,P}, \tag{5}$$

where **x** and $\mathbf{x}'$ are two neighboring sites (i.e., $\mathbf{x}' \in \Omega'_{\mathbf{x}}$) and $\Sigma_{\sigma(\mathbf{x})}$ and $\Sigma_{\sigma(\mathbf{x}')}$ are two neighboring elements. $\partial\Sigma_\sigma$ is instead intended as the border of $\Sigma_\sigma$ (i.e., $\partial\Sigma_\sigma = \bigcup_{\mathbf{x} \in \Sigma_\sigma} \partial\mathbf{x}$). The coefficients $J \in \mathbb{R}$ are the binding forces per unit area and are obviously symmetric w.r.t. their indices. In particular, $J^{\text{int}}_{\rm N,C}$ implicitly models the forces exerted by intermediate actin filaments and microtubules to anchor the nucleus to the cell cytoskeleton. In the perspective of energy minimization, $J^{\text{int}}_{\rm N,C} < 0$ is set to prevent cell fragmentation, as done in [28–30]. $J^{\text{ext}}_{\rm C,M}$ and $J^{\text{ext}}_{\rm C,P}$, in principle, evaluate the heterophilic contact interactions between the cell and the extracellular elements; however, both are here fixed equal to zero. This choice, successfully employed in [29,39], was made to directly analyze the influence of cell deformability on the motile behavior. The experimental literature also demonstrates that most cell lines display a sustained ameboid movement, characterized by a poorly adhesive mode, when crawling in confined environments [21,42,43].

*H*shape models the geometrical attributes of the subcellular compartments, which are written as nondimensional relative deformations in the following quadratic form:

$$H_{\text{shape}}(t) = H_{\text{surface}}(t) + H_{\text{perimeter}}(t) = \sum_{\sigma=1,2} \left[ \kappa_{\Sigma_\sigma} \left( \frac{s_{\Sigma_\sigma}(t) - s_{\Sigma_\sigma}(0)}{s_{\Sigma_\sigma}(t)} \right)^{2} + \nu_{\Sigma_\sigma} \left( \frac{p_{\Sigma_\sigma}(t) - p_{\Sigma_\sigma}(0)}{p_{\Sigma_\sigma}(t)} \right)^{2} \right]. \tag{6}$$

Such an energy term indeed depends on the actual surface and perimeter of the subcellular units, i.e., *s*Σ*<sup>σ</sup>* (*t*) and *p*Σ*<sup>σ</sup>* (*t*), respectively, as well as on the corresponding target quantities, i.e., *s*Σ*<sup>σ</sup>* (0) and *p*Σ*<sup>σ</sup>* (0), respectively, which are here assumed to be characteristic of the relaxed/initial individual configuration. *κ*Σ*<sup>σ</sup>* and *ν*Σ*<sup>σ</sup>* ∈ *R*<sup>+</sup> represent, instead, mechanical moduli in units of energy: in particular, *κ*Σ*<sup>σ</sup>* refer to surface changes of the subcellular compartments, while *ν*Σ*<sup>σ</sup>* relate to their deformability/elasticity, i.e., to the ease with which they are able to remodel, changing their perimeter. Both parameters are here taken to be constant: however, they may vary in time as a consequence, for example, of intracellular chemical dynamics, as commented on in the conclusive section of the work.

The fluctuations of the cell surface are kept negligible by setting high constant values $\kappa_{\Sigma_1} = \kappa_{\rm N} = \kappa_{\Sigma_2} = \kappa_{\rm C} > 1$. This choice was based on the assumptions that the migrating cell has an adequate amount of nutrients to avoid volume loss and that it does not significantly grow during movement, as confirmed by experimental images in [18,19]. A low $\nu_{\Sigma_2} = \nu_{\rm C} < 1$ then allows the large cytosolic deformations experimentally observed in the cases of cell movement in confined environments. Empirical evidence also shows that the nucleus is able (when needed) to undergo morphological reorganization, but to a lower extent than the surrounding cytosolic compartment. Accordingly, we opted to set $1 > \nu_{\Sigma_1} = \nu_{\rm N} > \nu_{\Sigma_2} = \nu_{\rm C}$.
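For one compartment, the contribution of Equation (6) reduces to a two-term quadratic penalty. The sketch below uses invented surface, perimeter, and moduli values purely for illustration:

```python
def shape_energy(s, s0, p, p0, kappa, nu):
    """Quadratic shape energy of Equation (6) for one subcellular compartment:
    relative surface and perimeter deformations weighted by kappa and nu."""
    return kappa * ((s - s0) / s) ** 2 + nu * ((p - p0) / p) ** 2

# Illustrative values only: stiff surfaces (kappa > 1) for both compartments,
# nucleus less deformable than cytosol (1 > nu_N > nu_C)
H_shape = (shape_energy(s=120.0, s0=100.0, p=48.0, p0=40.0, kappa=20.0, nu=0.8)
           + shape_energy(s=520.0, s0=500.0, p=140.0, p0=100.0, kappa=20.0, nu=0.2))
```

Since the penalty vanishes when the compartment matches its target measures and grows quadratically with relative deviation, a high `kappa` effectively enforces area conservation while a low `nu` leaves the perimeter free to remodel.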

*H*chemotaxis reproduces the effect of cell preferential migration towards zones with higher concentrations of a diffusing chemoattractant, which was constantly used in the experiments to obtain a sustained cell movement [18,19]. Such an energy contribution was implemented by a local linear-type relation of the form that was firstly used in [44] to reproduce Dictyostelium discoideum aggregation and then constantly adopted in most CPM-based approaches:

$$\Delta H_{\text{chemotaxis}} = \mu(\sigma(\mathbf{x}_{\rm so}), \sigma(\mathbf{x}_{\rm ta}))\left[c_t(\mathbf{x}_{\rm ta}, t) - c_t(\mathbf{x}_{\rm so}, t)\right], \tag{7}$$

where $\mathbf{x}_{\rm so}$ and $\mathbf{x}_{\rm ta}$ are, respectively, the source and the target lattice sites randomly selected during a trial update in an MCS, and $c_t(\mathbf{x}, t) = c(\mathbf{x}, t) + \sum_{\mathbf{x}' \in \Omega'_{\mathbf{x}}} c(\mathbf{x}', t)$, with $\mathbf{x} \in \{\mathbf{x}_{\rm so}, \mathbf{x}_{\rm ta}\}$, is a nonlocal measure of the molecular substance sensed by the moving cell site, $c$ denoting the chemical concentration; see Equation (9). Finally, $\mu \in \mathbb{R}^+$ represents the strength of the chemotactic response: in particular, we set

$$\mu(\sigma(\mathbf{x}_{\rm so}), \sigma(\mathbf{x}_{\rm ta})) = \begin{cases} \mu_{\tau(\sigma(\mathbf{x}_{\rm so}))} = \mu_{\rm C}, & \text{if } \sigma(\mathbf{x}_{\rm so}) = 2 \text{ and } \sigma(\mathbf{x}_{\rm ta}) = 0; \\[1ex] \mu_{\tau(\sigma(\mathbf{x}_{\rm ta}))} = \mu_{\rm C}, & \text{if } \sigma(\mathbf{x}_{\rm so}) = 0 \text{ and } \sigma(\mathbf{x}_{\rm ta}) = 2; \\[1ex] 0, & \text{else}. \end{cases} \tag{8}$$

Equation (8) implies that only cytosolic dynamics are affected by molecular signals. Finally, the full expression and activity of cell chemical receptors was set by a high $\mu_{\rm C} > 1$.
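A literal transcription of Equations (7) and (8) is sketched below; the Moore-neighborhood sum implements the nonlocal measure $c_t$, while the value of `mu_C`, the grid, and the site indices are all illustrative:

```python
import numpy as np

def delta_H_chemotaxis(c, x_so, x_ta, sigma, mu_C=2.0):
    """Chemotactic energy change of Equations (7) and (8): linear response to
    the nonlocal concentration difference, active only for trials involving
    the cytosol and the medium (pairs (2,0) and (0,2))."""
    pair = (sigma[x_so], sigma[x_ta])
    if pair not in [(2, 0), (0, 2)]:
        return 0.0
    def c_t(x):
        # local value plus the sum over the first-nearest (Moore) neighbors
        i, j = x
        nbrs = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
                and 0 <= i + di < c.shape[0] and 0 <= j + dj < c.shape[1]]
        return c[i, j] + sum(c[n] for n in nbrs)
    return mu_C * (c_t(x_ta) - c_t(x_so))
```

In the overall algorithm, this term enters $\Delta H$ together with the adhesion and shape contributions before the Metropolis acceptance test; the sign convention mirrors Equation (7) as written.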

*Evolution of the molecular variable.* According to the experimental designs in [18,19], we assume that the virtual molecular substance is released (i.e., produced) at a constant rate from the top edge of the domain (denoted as *∂*Ωprod), homogeneously, and constantly diffuses and decays, being eventually taken up by the cell. In mathematical terms, we have the following reaction–diffusion (RD) law:

$$\begin{cases} \dfrac{\partial c(\mathbf{x},t)}{\partial t} = \underbrace{D_c\, \Delta c(\mathbf{x},t)\, \delta_{\sigma(\mathbf{x}),\{0\}}}_{\text{diffusion}} - \underbrace{\lambda_c\, c(\mathbf{x},t)}_{\text{decay}} - \underbrace{\min\{c_{\max}, \chi_c\, c(\mathbf{x},t)\}\, \delta_{\sigma(\mathbf{x}),\{1,2\}}}_{\text{cell uptake}}, & \text{in } \mathbf{x} \in \Omega; \\[2ex] c(\partial\mathbf{x}) = c_{\text{prod}}, & \text{at } \partial\mathbf{x} \in \partial\Omega^{\text{prod}}; \\[1ex] c(\partial\mathbf{x}) = 0, & \text{at } \partial\mathbf{x} \in \partial\Omega \setminus \partial\Omega^{\text{prod}}, \end{cases} \tag{9}$$

where *δx*,*<sup>y</sup>* = {1, *x* = *y*; 0, *x* = *y*} is the Kronecker delta. Equation (9) indeed states that the chemical substance (i) diffuses only through the domain grid sites not occupied by the cell or by a rigid pillar, (ii) locally decays everywhere, and (iii) undergoes consumption only at the domain grid sites occupied by a cell compartment. In particular, cell chemical absorption follows a piecewise-linear approximation of a Michaelis–Menten law. This simplification is realistic since cells' capacity to internalize diffusing substances is limited. We finally set *λ*<sup>c</sup> < *χ*c, as the natural decay of a molecule is typically negligible compared to the cell uptake. We remark that Equation (9) neglects the diffusion of the chemical *within* the cell after its uptake: such dynamics would imply the definition of specific coefficients, i.e., characterizing the diffusion of the substance within each of the two intracellular compartments. However, the inclusion of this aspect would not have had an impact on the topic of our study, which was, rather, the dependence of the migratory potential of an individual on its remodeling ability.

#### **3. Results**

*Model parameters and computational details.* The characteristic size (lateral edge) of the domain grid sites is hereafter denoted by |*∂***x**| and was fixed equal to 0.5 μm. The temporal resolution of the model was, as seen, the MCS, which was constantly set to correspond to 2 s, as previously done in [28–30]. The PDE for the evolution of the chemical factors was numerically solved with a finite difference scheme on a grid with the same spatial resolution as Ω, characterized by 30 diffusion steps per MCS. This temporal scale was sufficiently small to guarantee the stability of the numerical method.

In all the simulations, the representative motile cell was initially seeded at the bottom region of the channel, displaying a nonpolarized morphology, with perimeter and surface given by $p_{\Sigma_i}(0)$ and $s_{\Sigma_i}(0)$, for $i = 1, 2$, respectively. In particular, its initially round nucleus lay in the center of the individual. All the forthcoming computational realizations started with 100 annealed MCSs to have a realistic arrangement of the cell body within the structure. Such system configuration updates obeyed the following rule:

$$P(\sigma(\mathbf{x}_{\rm so}) \rightarrow \sigma(\mathbf{x}_{\rm ta}))(t) = \begin{cases} 0, & \text{if } \Delta H > 0; \\ 0.5, & \text{if } \Delta H = 0; \\ 1, & \text{if } \Delta H < 0. \end{cases} \tag{10}$$

During such annealed MCSs, chemical kinetics did not yet occur. The entire set of model parameters, labeled as $\mathcal{P}$, can be divided into two groups:

$$\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_2 = \left\{ s_{\Sigma_1}(0), p_{\Sigma_1}(0), s_{\Sigma_2}(0), p_{\Sigma_2}(0), D_c, \lambda_c, \chi_c, c_{\max}, c_{\text{prod}}, \mu_{\rm C} \right\} \cup \left\{ J^{\text{int}}_{\rm N,C}, \kappa_{\rm N}, \kappa_{\rm C}, \nu_{\rm N}, \nu_{\rm C}, T_{\rm N}, T_{\rm C} \right\}. \tag{11}$$

$\mathcal{P}_1$ is composed of coefficients that directly relate to biological quantities and therefore depend on the specific experimental system. In this respect, the cell dimensions and channel measures were derived from images and movies presented in [18,19] (both in the text and in the Supplementary Material). The kinetic coefficients of the chemicals were instead evaluated using data from the literature. In particular, the maximal cell uptake was calculated as in [33,39]. The chemotactic response $\mu_{\rm C}$ was finally established by comparing experimental and numerical cell velocities in open spaces (i.e., in regions of the channels far from structural constrictions). The parameters belonging to $\mathcal{P}_2$, listed in Table 1, are instead more technical and do not depend on the specific empirical device. In this respect, they were taken from previously published CPMs dealing with cell migration within two- and three-dimensional matrix environments. However, preliminary simulations showed that the behavior of the model proposed in this paper was fairly robust in large regions of the parameter space around these estimates.

**Table 1.** Values of the Cellular Potts model (CPM) technical parameters, i.e., those included in the set P<sup>2</sup> (see Equation (11)), which were used for both experimental settings.


*Quantification of the numerical results*. The *position* of the cell *η* = 1 at any time *t* was established by the position of its center of mass $\mathbf{x}\_{\eta}^{\text{CM}}(t) = (x\_{\eta}^{\text{CM}}(t), y\_{\eta}^{\text{CM}}(t))$. Coherently, the *position* of its nucleus was established by the position of its center of mass $\mathbf{x}\_{\Sigma\_1}^{\text{CM}}(t) = (x\_{\Sigma\_1}^{\text{CM}}(t), y\_{\Sigma\_1}^{\text{CM}}(t))$.

A cell was denoted as *invasive* if at least one of its membrane sites touched the top border of the domain. It was denoted as invasive with respect to a constriction if the center of mass of its nucleus passed the midpoint of that constriction.

The *instantaneous directional velocity* of the cell at a given location $\tilde{y}$ along the representative microchannel was defined as $v\_{\eta}(y\_{\eta}^{\text{CM}} = \tilde{y}) = \left(\tilde{y} - y\_{\eta}^{\text{CM}}(\tilde{t} - \Delta t)\right)/\Delta t$, where $\tilde{t}$ is such that $y\_{\eta}^{\text{CM}}(\tilde{t}) = \tilde{y}$ and $\Delta t = 1$ MCS (2 s). Similarly, the *instantaneous directional velocity* of the nucleus was given by $v\_{\text{N}}(y\_{\Sigma\_1}^{\text{CM}} = \tilde{y}) = \left(\tilde{y} - y\_{\Sigma\_1}^{\text{CM}}(\tilde{t} - \Delta t)\right)/\Delta t$.
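As a sketch, the instantaneous directional velocity defined above can be computed from a recorded trajectory of center-of-mass ordinates, one entry per MCS (the container name `y_cm` is our assumption):

```python
def directional_velocity(y_cm, t_tilde, dt=1):
    """Instantaneous directional velocity at the time step t_tilde:
    v = (y_cm(t_tilde) - y_cm(t_tilde - dt)) / dt, with dt = 1 MCS.
    `y_cm` maps an MCS index to the y-coordinate of the center of mass
    (of the whole cell or of its nucleus alike)."""
    return (y_cm[t_tilde] - y_cm[t_tilde - dt]) / dt
```

For example, with `y_cm = {0: 10.0, 1: 10.5, 2: 11.5}` (in lattice units), `directional_velocity(y_cm, 2)` returns 1.0 lattice site per MCS.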

Morphological changes of the nucleus were quantified by its *deformation ratio*. This was given, for any given location *y*˜ within the representative microchannel, by the ratio between its geometrical moments of inertia evaluated with respect to the horizontal and vertical axes that passed through its center of mass, i.e.,

$$r\_{\text{N}}(y\_{\Sigma\_1}^{\text{CM}} = \tilde{y}) = i\_{x\_{\Sigma\_1}^{\text{CM}}}(\tilde{t}) / i\_{y\_{\Sigma\_1}^{\text{CM}}}(\tilde{t}) = \sum\_{\substack{\mathbf{x} = (x, y) \in \Sigma\_1 \\ \text{at time } \tilde{t}}} (y - \tilde{y})^2 \Big/ \sum\_{\substack{\mathbf{x} = (x, y) \in \Sigma\_1 \\ \text{at time } \tilde{t}}} (x - x\_{\Sigma\_1}^{\text{CM}}(\tilde{t}))^2, \tag{12}$$

where $\tilde{t}$ has the same meaning as before. In particular, *r*<sup>N</sup> ≈ 1 corresponded to an almost round shape of the nucleus, whereas *r*<sup>N</sup> ≪ 1 (or ≫ 1) corresponded to its horizontal (or vertical) elongation.
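A minimal computation of the deformation ratio of Equation (12) from the list of lattice sites occupied by the nucleus might read (the helper name and data layout are ours):

```python
def deformation_ratio(nuclear_sites):
    """Deformation ratio r_N (Equation (12)): moment of inertia of the
    nuclear cluster about the horizontal axis through its center of
    mass, divided by the moment about the vertical axis. Values close
    to 1 indicate an almost round nucleus; values far from 1 indicate
    elongation. `nuclear_sites` lists the (x, y) sites in Sigma_1."""
    n = len(nuclear_sites)
    x_cm = sum(x for x, _ in nuclear_sites) / n
    y_cm = sum(y for _, y in nuclear_sites) / n
    i_x = sum((y - y_cm) ** 2 for _, y in nuclear_sites)  # about horizontal axis
    i_y = sum((x - x_cm) ** 2 for x, _ in nuclear_sites)  # about vertical axis
    return i_x / i_y
```

A square 2 × 2 cluster yields a ratio of exactly 1, while a row of sites elongated along *x* yields a ratio below 1.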

The *transit time* of the cell within a constriction was evaluated as the period of time from its first to its last contact with one of the two pillars that formed that constriction.
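The transit-time measurement described above amounts to locating the first and last time steps at which the cell touches one of the pillars of the constriction. A sketch (the boolean-trace representation is our assumption):

```python
def transit_time(contact_trace):
    """Transit time within a constriction: number of time steps (MCSs)
    from the first to the last contact of the cell membrane with one
    of the two pillars forming the constriction. `contact_trace[t]` is
    True when at least one membrane site touches a pillar at step t."""
    steps = [t for t, touching in enumerate(contact_trace) if touching]
    if not steps:
        return 0  # the cell never reached the constriction
    return steps[-1] - steps[0] + 1
```

For instance, a cell that first touches a pillar at step 1 and last touches it at step 4 has a transit time of 4 steps.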

We finally calculated the *force* acting on any nuclear border site by adapting the algorithmic procedure described and employed in [45]. In particular, we started from the consideration that local forces can be related to the negative gradient of the *Hamiltonian H*, i.e., **F**(**x**) = (*F<sub>x</sub>*(**x**), *F<sub>y</sub>*(**x**)) = −∇*H* = −(*∂H*/*∂x*, *∂H*/*∂y*), **x** ∈ Ω being a generic lattice site. We then employed a centered approximation of the first partial derivatives, also observing that the nuclear cluster could extend or retract in either of the two principal directions by a small step |*∂***x**| (the characteristic size of the domain grid elements). As a result, we obtained that, for each **x** such that *σ*(**x**) = 1,

$$-F\_{\mathbf{x}}(\mathbf{x}) \approx \left( H\_{\mathrm{N}}(\Sigma\_{\sigma(\mathbf{x})} + \triangle\_{\mathbf{x}} \Sigma\_{\sigma(\mathbf{x})}) - H\_{\mathrm{N}}(\Sigma\_{\sigma(\mathbf{x})} - \triangle\_{\mathbf{x}} \Sigma\_{\sigma(\mathbf{x})}) \right) / (2|\partial \mathbf{x}|), \tag{13}$$

where *H*<sup>N</sup> includes only the energetic contributions relative to the organelle and $\triangle\_{\mathbf{x}} \Sigma\_{\sigma(\mathbf{x})}$ denotes the one-site-large possible variation of its extension along the horizontal axis. The *y*-component of the force analogously read as

$$-F\_{\mathbf{y}}(\mathbf{x}) \approx \left( H\_{\mathrm{N}}(\Sigma\_{\sigma(\mathbf{x})} + \triangle\_{\mathbf{y}} \Sigma\_{\sigma(\mathbf{x})}) - H\_{\mathrm{N}}(\Sigma\_{\sigma(\mathbf{x})} - \triangle\_{\mathbf{y}} \Sigma\_{\sigma(\mathbf{x})}) \right) / (2|\partial \mathbf{x}|). \tag{14}$$
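Under a simplified reading of Equations (13) and (14), in which the extension (+) adds the neighboring site beyond the border and the retraction (−) removes the border site itself, the force estimate can be sketched as follows (the set-based cluster representation and all names are our assumptions, not the implementation of [45]):

```python
def nuclear_boundary_force(H_N, cluster, x, h=1.0):
    """Centered-difference estimate of the force (F_x, F_y) acting on
    a nuclear border site x, cf. Equations (13) and (14). `H_N` is any
    callable evaluating the nuclear energy contributions on a set of
    lattice sites; `h` is the grid-element size |dx|."""
    components = []
    for ex, ey in ((1, 0), (0, 1)):  # horizontal, then vertical axis
        extended = cluster | {(x[0] + ex, x[1] + ey)}  # Sigma + Delta Sigma
        retracted = cluster - {x}                      # Sigma - Delta Sigma
        components.append(-(H_N(extended) - H_N(retracted)) / (2 * h))
    return tuple(components)
```

With a toy area-proportional energy `H_N = len` (one energy unit per occupied site), the top-right border site of a 2 × 2 cluster experiences a force of magnitude 1 pointing toward the cluster interior along each axis.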

#### *3.1. Cell Motion between Structural Elements with Different Geometries*

In this section, we specifically focus on cell migration within a channel representative of the device developed in [19]. In this respect, the simulation lattice Ω was formed by 80 × 400 (respectively, in *x* and *y*) sites, which corresponded to an experimental domain of 40 μm × 200 μm. As shown in Figure 1B, it contained only three pairs of pillars. The distance between the rectangular elements was kept fixed and equal to 15 μm, whereas the distance between the two pairs of round pillars, hereafter denoted by *d*, was varied in the different simulation settings. All the performed simulations lasted 1.8 × 10<sup>4</sup> MCSs (≈600 min). The estimate of the model parameters belonging to P<sup>1</sup> here refers to breast cancer cells and to epidermal growth factor (EGF), in accordance with the experimental materials used in [19] (cf. Table 2). In particular, the cell had an initial nuclear diameter *d*<sup>N</sup> = 14 μm and an overall size equal to 24 μm.


**Table 2.** Values of the model parameters whose estimates relate to the experimental system proposed in [19]. They belong to the set P<sup>1</sup> defined in Equation (11).

As shown in Figure 2, when the constriction between the round pillars was large enough (i.e., *d* ≥ 12 μm, so that *d*/*d*<sup>N</sup> ≥ 0.85), the cell was constantly able to crawl along the entire structure, guided by the chemical signals. Reductions of *d* then resulted in decrements in the cell invasive capacity, which was completely lost when *d* fell below 4 μm (i.e., for *d*/*d*<sup>N</sup> < 0.28).

The top panels in Figure 3 show the cell dynamics in the case of complete channel invasion. First, a long and thin cytoplasmic pseudopodium emerged at the front of the cell, towards the chemical source, and infiltrated between the first pair of round pillars. Such a membrane protrusion then dragged the rest of the cell to enter the pore. In particular, the nucleus adopted a cigar-like shape to overcome the spatial constriction, which was possible since it had a certain degree of elasticity.

**Figure 2.** Percentage of invasive cells (i.e., of cells able to touch the upper border of the domain) in the case of reproduction of the device developed in [19]. Values were calculated over 100 numerical realizations.

**Figure 3.** Simulation image sequences of the cell invading the virtual migratory channel in the representative cases *d* = 8 μm (top panels) and *d* = 4 μm (bottom panels), *d* being the distance between each pair of round pillars. We remark that, for *d* = 8 μm, the cell was able to fully invade the microfluidic structure in only ∼30% of cases, while for *d* = 4 μm, this never happened (cf. Figure 2). The initial diameter of the nucleus was *d*<sup>N</sup> = 14 μm.

When the cell had passed the first pair of round elements, it relaxed, and its nucleus stabilized in a quasi-spheroidal shape. A constant sustained migration was then maintained by the individual along the rectangular structures: the space between them did not in fact require further morphological deformation. Both cell compartments had to finally squeeze again when the individual approached the second pair of round pillars.

From a modeling perspective, the migratory behavior of the virtual cell was the result of a sequence of action/reaction mechanisms. First, the exogenous chemical stimulus caused the border sites of the cell cytosol to locally protrude in the direction of increasing EGF gradients, with a speed of protrusion that was approximately proportional to the modulus of the local chemotactic strength *μ*C. Dragged by the leading front, the overall cytosolic region then moved forward (eventually deforming) and pulled on the nucleus with the same force, transmitted by the contact energy $H\_{\text{adhesion}}^{\text{int}}$. However, as a consequence of its higher rigidity and lower motility (i.e., *ν*<sup>N</sup> > *ν*<sup>C</sup> and *T*<sup>N</sup> < *T*C, respectively), the nuclear cluster took more time to deform and displace than the surrounding compartment and, therefore, constantly lay at the trailing part of the individual body.

Such a mechanistic explanation is consistent with experimental and modeling observations presented in [49]. Therein, the authors in fact comment that a cell usually translocates almost the entire cytosol before effective nuclear transmigration, mainly in the case of small-enough pores. They also claim that the hourglass shape adopted by the nucleus in the case of passage within small constrictions is due to the pulling forces exerted by the frontal actomyosin networks. The pushing from the rear part of the cytoskeleton would instead result in an inverted bolt shape, which would not allow successful individual passage within the pore.

As captured in Figure 3 (bottom panels), in the case of small-enough interpillar distances *d*, the front end of the cell cytoplasm extended, as usual, between the first pair of round structures. However, the deformability of the nucleus was no longer sufficient for it to pass through such a confined space. The cell therefore remained stuck, being unable to invade.

These results are indicative of the fact that the presence of the voluminous nucleus represents a steric hindrance for the entire cell and that the degree of nuclear deformability determines its capacity to move within confined spaces. Our numerical outcomes are in remarkable agreement with the experimental evidence provided in the reference work [19]. Denais and coworkers in fact demonstrated that cells of different lineages can pass within subnuclear constrictions by only temporarily rupturing the integrity of the nuclear envelope (NE), so that the organelle becomes as fluid as the cytoplasm. Interestingly, the nuclear membrane can be restored during migration: this is the reason why subsequent NE ruptures are observed within the same cell.

Cell speed and nucleus deformation have complementary behavior, as shown by their time evolution plotted in Figure 4 (for three representative values of *d*). In the case of complete channel invasion (for *d* = 8 and 12 μm, i.e., for *d*/*d*<sup>N</sup> ≈ 0.57 and 0.85), the cell reached and maintained its maximal velocity when crawling between the rectangular elements, whose spacing required minimal nuclear deformation (i.e., *r*<sup>N</sup> ≈ 1). The cell speed was instead reduced in the proximity of the pairs of round pillars. In particular, the closer they were to each other, the more time the cell took to pass the constriction (and then to relax), as a consequence of the necessarily larger nuclear deformations. Finally, in the case of minimal space between the round structures (for *d* = 4 μm, i.e., for *d*/*d*<sup>N</sup> ≈ 0.28), the nuclear deformation quickly went to a maximum threshold, whereas the cell speed dropped to almost zero (the cell remained stuck).

**Figure 4.** Quantification of cell migratory behavior in the experimental design employed in [19]. Time evolution of cell instantaneous directional velocity *vη* (**panel (A)**) and of nucleus deformation ratio *r*<sup>N</sup> (**panel (B)**) for three representative values of the distance *d* between the pairs of round pillars. In both graphs, each value is the mean over 10 simulations. We have not plotted error bars, to avoid unnecessary graphical overcomplication. However, the standard deviations were very small (of the order of 10−2). We also remark that, in the simulation settings employed in this part, the initial nuclear diameter was *d*<sup>N</sup> = 14 μm.

We then turned to analyze the force field at the nuclear boundary at different stages of cell migration. As shown in Figure 5 (left panel), when the nucleus was squeezing through a constriction, its side edges were characterized by significant inward stresses. Outward forces were instead active at the trailing and leading borders, due to the fact that it had to preserve its surface without perimeter shrinking. As soon as the nucleus had overcome the midpoint of the constriction, the inward stresses momentarily pointed almost towards the top edge of the domain, thereby acting as an instantaneous push for cell movement (see the middle panel in Figure 5). Finally, when the cell crawled within the rectangular elements, its nucleus was in a rounded relaxed configuration. In particular, its leading edge was subjected to cytosolic adhesive-based dragging forces, whereas its lateral and trailing borders were subjected only to the forces necessary to keep the surface constant while maximizing the contact with the surrounding cell compartment; see Figure 5 (right panel).

**Figure 5.** Representative force field at the nuclear boundary at different cell migratory stages. (**Left panel**) cell squeezing within a pair of round pillars. (**Middle panel**) cell overcoming the midpoint of the constriction. (**Right panel**) cell moving within the rectangular elements. For graphical purposes, we have only plotted selected force vectors, which are magnified, with intensity normalized with respect to the maximal value. Force components are defined in Equations (13) and (14).

Our numerical outcomes are in remarkable agreement with the analysis of the spatial distribution of nuclear envelope ruptures provided in [49] in the case of breast adenocarcinoma cells. Cao and colleagues, in fact, showed that, when a malignant individual is passing within a small pore, damage mainly occurs at the front and at the back edge of the nuclear envelope (NE), due to the significant tension. In particular, higher chances of ruptures characterize the leading border, since it is even more stretched than the trailing part. These authors also observed that NE buckling is instead located at the side regions of the organelle, i.e., those subjected to compression. Our results are also in line with those obtained in [45], where static CPM cells were shown to have inward forces at boundary convex sites and outward forces at concave border grid elements.

#### *3.2. Cell Movement between Round Pillars with Different Spacing*

Focusing on the microfluidic device used in [18], the CPM lattice Ω had 60 × 468 (respectively, in *x* and *y*) grid elements that replicated a 30 μm × 234 μm representative channel. The polymeric pillars located in the structure were round, with a diameter of either 15 or 30 μm. The space between pairs of smaller elements was kept fixed and equal to 15 μm, whereas the distances between the three couples of larger pillars were identified by *di* (with *i* = 1, 2, 3) and varied to reproduce different channel designs; see Figure 1C. All the forthcoming simulations lasted nearly 24 h (4.3 × 10<sup>4</sup> MCSs), in accordance with the temporal scale of the corresponding experiments in [18].

The biologically related parameters, i.e., those grouped in P<sup>1</sup> in Equation (11), here refer to human fibroblasts and to platelet-derived growth factor (PDGF), in accordance with the materials mainly used in [18] (refer, also, to the Supplementary Material). They are summarized in Table 3. In particular, the cell nucleus had an initial diameter *d*<sup>N</sup> = 16 μm while the extension of the overall individual amounted to 28 μm. We finally remark that the CPM technical parameters, i.e., those included in the set P<sup>2</sup> introduced in Equation (11), were kept unaltered with respect to the values fixed in the previous section and listed in Table 1. In particular, we maintained *ν*<sup>N</sup> = 0.9.



We first assessed the effectiveness of chemotactic-driven cell migration in the case of a channel design characterized by a sequence of constrictions with decreasing widths (*d*<sup>1</sup> = 5 μm, *d*<sup>2</sup> = 3 μm, and *d*<sup>3</sup> = 2 μm, which resulted in *d*1/*d*<sup>N</sup> ≈ 0.31, *d*2/*d*<sup>N</sup> ≈ 0.18, and *d*3/*d*<sup>N</sup> ≈ 0.12). Such a domain layout was used hereafter unless explicitly stated otherwise. As can be seen in Figure 6A, the virtual cell was constantly unable to invade the entire structure: it in fact overcame the first (largest) constriction only in a few cases. Such numerical results are not surprising if compared with those summarized in Figure 2. In fact, fibroblasts and breast cancer cells (used as representative cell lines for the simulations of this and of the previous section, respectively) have almost the same dimensions (compared to the spacing between the pairs of rigid pillars), and their characteristic model parameters were kept unchanged.

However, this set of numerical outcomes disagrees with the corresponding empirical evidence. As shown in Supplementary Figure S4b in [18], a significant number of experimental cells (i.e., nearly 40%) are able to penetrate the entire channel in the case of stable chemical gradients. The underlying reason is that the cell lines used in [18] have a more deformable nucleus than those used in [19] and simulated in the previous section, as a consequence of their deficiency in lamins A and C. These molecules are, in fact, the primary components of the nuclear lamina, the dense protein meshwork underlying the nuclear membrane that has been largely shown to determine the stiffness of the organelle [19,50–52]. To achieve a closer replication of the in vitro evidence in [18], we therefore reduced the rigidity of the nucleus of our virtual cell by decreasing the corresponding parameter *ν*N, which, however, had to remain larger than *ν*<sup>C</sup> = 0.5. As shown in Figure 6B, a remarkable data fit was obtained when *ν*<sup>N</sup> ≤ 0.7.

A representative time sequence of a cell able to invade the entire channel, owing to the increased nuclear elasticity, is then proposed in Figure 7: it clearly shows the enhanced nuclear squeezing necessary to promote full invasion. A definitive confirmation in this respect is provided by Figure 8A, which quantifies the nucleus remodeling for different values of *ν*N. From the same graph, we observe a residual deformation of the organelle after its passage through a constriction, as also captured in Figure 4B.

**Figure 6.** Quantification of chemotactic-driven cell migratory behavior in the case of the experimental setup used in [18]. (**A**) Percentage of cells that were able to overcome each constriction for the default value of the nucleus stiffness, i.e., for *ν*<sup>N</sup> = 0.9. (**B**) Percentage of cells that were able to overcome each constriction upon variations in the nuclear stiffness. Values were calculated over 100 numerical realizations. We remark that, in the simulation settings proposed in this paragraph, the initial diameter of the nucleus is *d*<sup>N</sup> = 16 μm.

**Figure 7.** Simulation image sequences of cell invasion within a representative migratory channel characterized by *d*<sup>1</sup> = 5, *d*<sup>2</sup> = 3, and *d*<sup>3</sup> = 2 μm. The virtual individual had an enhanced nuclear elasticity (i.e., *ν*<sup>N</sup> = 0.7) that allowed full invasion in approximately 40% of cases (see Figure 6B). The initial diameter of the cell nucleus was *d*<sup>N</sup> = 16 μm.

**Figure 8.** Quantification of cell migratory behavior in the case of the experimental setup used in [18]. (**A**) Time evolution of the deformation ratio of the nucleus, *r*N, defined in Equation (12), for two different values of its elasticity *ν*N. Each value is given as the mean over 10 simulations. We have not plotted error bars, to avoid unnecessary graphical overcomplication. However, standard deviations were very small (of the order of 10−2). (**B**) Percentage of cells that were able to overcome each constriction upon independent variations either in the nuclear motility *T*<sup>N</sup> or in the nuclear compressibility *κ*<sup>N</sup> (in the case of *ν*<sup>N</sup> = 0.9). As usual, values were calculated over 100 numerical realizations. We again remark that the initial diameter of the cell nucleus was *d*<sup>N</sup> = 16 μm.

With a predictive perspective, we then asked if cell migratory behavior could be promoted by variations of other biophysical determinants of the nucleus. In this respect, we no longer reduced the rigidity of the organelle (we kept a high *ν*<sup>N</sup> = 0.9) but independently enhanced either its motility or its compressibility (which meant, in this context, the possibility of reducing its surface) by altering *T*<sup>N</sup> or *κ*N, respectively. Some reasonable parameter constraints (i.e., *κ*<sup>N</sup> > *ν*<sup>N</sup> > *ν*<sup>C</sup> and *T*<sup>C</sup> > *T*N) were, however, maintained. As shown in Figure 8B, significant increments in cell invasive potential were observed only for substantial variations of the two coefficients (i.e., of at least one order of magnitude from their default values employed up to that point and listed in Table 1). However, such parametric changes led to unrealistic cell dynamics: too high values of *T*N, in fact, resulted in implausibly high cell and nuclear velocities, whereas too high values of *κ*<sup>N</sup> allowed an unreasonable shrinking of the organelle (that, in a realistic scenario, would cause the pathological death of the individual).

These results are experimentally confirmed in [18], where the invasive cells were not observed to undergo significant volumetric changes when passing through small constrictions. Substantial variations in nuclear shape not accompanied by similar changes in nuclear volume were also captured in [49] in the case of breast adenocarcinoma cells. Analogously, in [13], glioma cell lines were shown to transmigrate through narrow locations in a brain model in vivo, thereby increasing their metastatic potential, by only a significant squeezing of their nucleus due to a recruitment of nonmuscle myosin II (NMMII). Moreover, very recently, Irimia and Toner, in [53], demonstrated that the directional persistence of cancer cells in microsized structures is completely dependent on the steric hindrance represented by the presence of a rigid and voluminous nucleus.

We further quantified the migration profile of a cell with an enhanced nuclear elasticity (i.e., with *ν*<sup>N</sup> = 0.7). As shown in panel (A) of Figure 9, an asymmetry emerged between the velocity of the overall cell, *vη*, and the velocity of its organelle, *v*N. On one hand, as expected, the cell had a maximal speed when crawling between the pairs of smaller pillars. Velocity reductions were instead observed when it approached and passed between the three pairs of larger pillars: in particular, the decrements depended on the width of the constrictions. On the other hand, the speed of the nucleus (i) was constant in the case of locomotion within larger spaces, (ii) completely stalled as a constriction impeded its forward movement, (iii) reached an instantaneous peak once the center of the intracellular compartment had passed the midpoint of the pore, and (iv) finally decreased back to the regime value. The underlying rationale, supported by the numerical results summarized in Figure 5 (middle panel), is the following: once the organelle had passed the midpoint of a constriction, the lateral inward compressive forces temporarily aligned in the direction of cell movement. An instantaneous push then emerged, which allowed the nucleus to rapidly slip out from the constriction.

We then assessed whether successful passages through subsequent equal constrictions facilitated cell migration. We indeed compared the transit time within each pore in the case of a channel characterized by *d*<sup>1</sup> = *d*<sup>2</sup> = *d*<sup>3</sup> = 3 μm (i.e., by *d*1/*d*<sup>N</sup> = *d*2/*d*<sup>N</sup> = *d*3/*d*<sup>N</sup> ≈ 0.18). Also in this case, we fixed *ν*<sup>N</sup> = 0.7. As can be seen from Figure 9B (left graph), the virtual cell showed a trend towards faster dynamics in the case of transmigration between the second and the third pair of large pillars. A possible explanation lies in the residual deformation that characterized the nucleus after its passage within a constriction, as previously captured in Figures 8A and 9A.

**Figure 9.** Quantification of cell migratory behavior in the case of reproduction of the device developed in [18]. (**A**) Time evolution of the instantaneous directional velocity of the cell, *vη*, and of its nucleus, *v*N, in the case of enhanced elasticity of the intracellular organelle, given by *ν*<sup>N</sup> = 0.7. In the graph, each value is the mean over 10 simulations. We have not plotted error bars, to avoid unnecessary graphical overcomplication. However, standard deviations were very small (of the order of 10−2). (**B**) Transit time, i.e., time needed by the cell to overcome a constriction, in the case of a channel characterized by three (**left**) or two (**right**) equal pores (each 3 μm-wide). In the latter case, the two constrictions were largely-spaced, as shown in the inset reproducing the employed domain. In both plots, values were calculated over 100 numerical realizations. We recall that the initial diameter of the cell nucleus was *d*<sup>N</sup> = 16 μm.

To further support this hypothesis, we ran a series of simulations based on a channel characterized by two 3 μm-wide constrictions that were separated by nearly 120 μm (i.e., by a sufficient spacing for the nucleus to relax and recover its original shape; see the inset in Figure 9B). The rest of the parameter settings were kept unaltered, with *ν*<sup>N</sup> = 0.7. As shown in Figure 9B (right graph), the transit time was the same for the passage within both pores. Such computational outcomes support our prediction but are in partial contrast with the corresponding experimental evidence. In [18], the authors in fact claim that the facilitated cell movement observed in the case of subsequent constrictions is not due to temporary residual nuclear deformations, since a reduction in the transit time was also captured in the case of spaced-enough pores. They indeed suggest that migrating cells may undergo long-lasting biochemical adaptations such as, for instance, further degradation of lamin proteins or reorganization of the cytoskeletal elements to which the nucleus is anchored. In this respect, Cao and coworkers, in [49], found that (i) the nuclei of lamin A/C-deficient cells (such as those used in [18]) behave as plastic materials undergoing large irreversible deformation in the case of passages within small pores and that (ii) the nuclei of malignant cells with expressed and active A/C lamins are instead characterized by the coexistence of elastic dynamics in their envelopes and of plastic dynamics in their interiors. These two competing effects often result in an ellipsoidal configuration of the nucleus after the exit from a pore, which is then followed by its relaxation towards a rounder shape (as captured by our model).

#### **4. Discussion**

The analysis of the mechanisms underlying cell migration within confined environments has recently become a major topic in experimental research, due to cell migration's recognized importance in physiopathological phenomena and its exploitation for tissue engineering. In this respect, an increasing number of in vitro models have been developed: they mainly rely either on matrix scaffolds, which mimic in vivo fibrous connective tissues, or on micropatterned devices characterized by predefined channel-like structures.

The resulting evidence has, first, provided insight into selected adhesive and proteolytic mechanisms that motile individuals activate to achieve efficient locomotion in narrow spaces. Furthermore, it has been largely shown that pivotal regulators of cell movement, and, therefore, potential targets for pharmacological interventions in human diseases, are represented by the elastic properties of the cell and of its internal organelles. More specifically, experimental outcomes have revealed the implications of nuclear deformation capacity in the migratory behavior of the overall individual (see, for example, [13–16,20,21,53]).

However, despite the development of such a variety of empirical approaches, little has been done, to our knowledge, from a theoretical point of view, except for a pair of chemomechanical approaches [49,54]. We recently tackled this shortcoming by a series of ad hoc versions of the Cellular Potts model (CPM), which analyzed selected aspects of single cell locomotion within matrix environments [28–30]. As a common feature of the proposed approaches, the moving individuals have been represented as physical objects compartmentalized into the nucleus and cytoplasm, whereas the extracellular domain has been, in turn, differentiated into a medium and a polymeric component. In particular, the introduction of distinct subcellular units has been a fundamental aspect for achieving a detailed description of cell motile behavior within structures of microsized dimensions.

Such models were, here, improved by (i) the use of a tailored Boltzmann transition probability, able to account for the specific type of cell configuration update (i.e., the retraction/extension of the cytosol or reorganization of the nuclear cluster), and (ii) the definition of a procedure to evaluate the force field that acted on the nuclear boundary during the different phases of cell migration.

The resulting CPM was then employed to reproduce the experimental systems used in [18,19], which consisted of microfluidic-based devices composed of dozens of arrays of polymeric pillars with different geometries and dimensions.

Taken together, our results first confirm that mobile cells are able to overcome the effects of size exclusion in the case of small-enough pores by only a substantially high deformability of their nucleus, in accordance with a wide range of empirical studies, e.g., [10,13–16,18,19] and references therein. The proposed numerical outcomes further reveal that, during the passage within a constriction, (i) inward stresses are active along the compressed side edges of the organelle and (ii) outward forces act at its leading and trailing borders. Interestingly, our simulations also showed that, as soon as the nucleus had overcome the midpoint of a pore, inward forces temporarily aligned with the direction of cell movement, representing, therefore, an instantaneous push for nuclear locomotion. This, according to us, is the rationale underlying the peak in the nuclear velocity that was experimentally captured in [18].

We also observed that passages within successive constrictions were facilitated by residual nuclear deformation. This result, as commented in the text, is in contrast with the corresponding empirical evidence in [18]: such a discrepancy may be due to the fact that CPM objects have a fully elastic behavior, whereas biological cells can instead undergo long-lasting biochemical adaptations that impact the deformation capacity of their internal organelles, an aspect not included in the CPM used here.

Summing up, our computational results are mostly characterized by a remarkable agreement with their experimental counterparts, while also providing further complementary determinations. In the cases of discrepancy between numerical and empirical outcomes, we either adapted our approach in order to tackle the issue or identified a plausible underlying rationale.

Of course, a more realistic reproduction of the proposed experimental settings would be obtained by three-dimensional simulations. This computational refinement would allow a closer *quantitative* comparison between in vitro and in silico results, mainly in terms of observables such as the cell and nuclear speed and the time taken by the cell to squeeze between channel constrictions. In this respect, it is reasonable to hypothesize that the displacement and the deformation of a 3D cell body would be slower than the corresponding 2D dynamics. However, the *qualitative* agreement of our results with respect to their empirical counterparts would not change. In fact, the cells seeded within the microfluidic devices used in [18,19] did not experience spatial limitations in the direction orthogonal to the plane of movement, as the rigid pillars (and, therefore, the entire channels) seemed to be "tall" enough to allow a comfortable arrangement of the cell body in this respect. This aspect is confirmed by the fact that, in the experimental images and videos, the cells were constantly adherent to the substrate and subjected only to lateral deformations. For the sake of completeness, we finally remark that a 3D extension of the model would be straightforward: it would only amount to a revision of the parameter estimates, whereas the main aspects of the proposed approach (i.e., the tailored Boltzmann probability, the terms in the Hamiltonian and the law regulating the chemical kinetics) would not require any modification.

Despite the limitations typical of theoretical modeling, it would be biologically relevant to apply our approach to scenarios not strictly related to experimental assays. First, our study may contribute to a more detailed understanding of how cancer cells invade surrounding confined tissues, permeating through the stroma and eventually entering the vasculature. In fact, these processes mainly involve the ability of single metastatic malignant cells to squeeze and crawl within confined environments where the available space is limited by the presence of dense matrices and cell linings.

It would also be interesting to analyze if the relation between nuclear deformability and cell invasive potential varies, in terms of relevance, in the case of collective migration, which is typically involved in most in vivo phenomena. In these cases, a differentiation may in fact occur among individuals within the same ensemble, such as the emergence of tip and stalk cells during angiogenic processes, which may imply differentiated motile phenotypes [55,56].

The proposed approach could be finally applied to the design of synthetic implant materials, i.e., acellular scaffolds with optimal values of pore size that may accelerate cell in-growth, critical for regenerative treatments [4,5,57].

However, to increase the realism of future model applications, some mechanisms and processes that are disregarded here but play a major role in establishing cell migratory ability should be included, such as (i) matrix digestion and deposition by moving individuals, which alter the surrounding space by opening paths, generating traction, increasing adhesion, and providing contact guidance, and (ii) possible pressure-driven displacements of tissue walls, which result in an adjustment of the geometry of the surrounding environment that may facilitate individual locomotion [56].

A significant model improvement would finally amount to the inclusion of intracellular chemical pathways, triggered by external stimuli of distinct natures. For instance, chemotactic substances (e.g., EGF and PDGF) typically activate PM receptors and therefore initiate downstream cascades that involve the biosynthesis, in the sub-plasma-membrane regions, of molecular mediators such as PI3K and MAPK [58]. Such molecules in turn induce the production of small GTPases [59], which are able to regulate several cell responses, including adhesion, migration, and, eventually, proliferation (which is not relevant for our study). From a modeling perspective, one could first focus on a subgroup of these endogenous chemicals and describe their interconnected kinetics by a system of PDEs, solved within the cytosolic compartment of the virtual cell. Constitutive laws should then be set to establish the dependence of CPM parameters, such as the Boltzmann temperatures *T* and the mechanical moduli *κ* and *ν* (i.e., those that describe cell properties), on the amount and the distribution of the endogenous chemicals included in the picture. We employed such a strategy to analyze the role of intracellular calcium signals, stimulated by an exogenous chemoattractant, in the process of vascular formation; see [39,60].

Intracellular pathways can be also activated by mechanical stimuli. In this respect, it would be relevant to relate lamin dynamics, which, as seen, regulate nuclear deformability, to the intensity of the stresses to which the nuclear envelope is subjected (which can be measured in terms of the moduli of the forces *F*<sub>x</sub> and *F*<sub>y</sub> or of the deformation ratio *r*<sub>N</sub>). Coherently, the parameter *ν*<sub>N</sub> should be defined as a function of the actual amount of lamins present within the cell.

Molecular inside-out signaling also occurs between moving cells and ECM elements, which are able to change the activity of intracellular molecular motors such as the already-cited GTP proteins, thereby mediating cytoskeletal contractility (Rac and Rho) [11].

It is useful to remark that the inclusion of one or more of the above-described intracellular dynamics is facilitated by the compartmentalization approach at the basis of our cell representation: it in fact allows a proper localization of the endogenous pathways of interest. However, we have to underline that such model refinements would be computationally expensive (especially in the case of the inclusion of sufficiently complex chemical cascades).

**Author Contributions:** Both authors equally contributed to the work. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by the Italian Ministry of Education, University and Research (MIUR) through the "Dipartimenti di Eccellenza" Programme (2018-2022)—Department of Mathematical Sciences "G. L. Lagrange", Politecnico di Torino (CUP: E11G18000350001). Both authors are members of GNFM (Gruppo Nazionale per la Fisica Matematica) of INdAM (Istituto Nazionale di Alta Matematica), Italy.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data are contained within the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Analysis of the Transient Behaviour in the Numerical Solution of Volterra Integral Equations**

**Eleonora Messina 1,2,\*,† and Antonia Vecchio 2,3,†**


**Abstract:** In this paper, the asymptotic behaviour of the numerical solution to Volterra integral equations is studied. In particular, a technique based on an appropriate splitting of the kernel is introduced, which allows one to obtain a vanishing asymptotic (transient) behaviour in the numerical solution, consistent with the properties of the analytical solution, without imposing restrictions on the integration steplength.

**Keywords:** Volterra integral equations; asymptotic-preserving; numerical stability

**Citation:** Messina, E.; Vecchio, A. Analysis of the Transient Behaviour in the Numerical Solution of Volterra Integral Equations. *Axioms* **2021**, *10*, 23. https://doi.org/10.3390/ axioms10010023

Academic Editor: Luigi Brugnano

Received: 20 January 2021 Accepted: 18 February 2021 Published: 23 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Volterra integral equations (VIEs) of the type

$$\mathbf{x}(t) = \mathbf{g}(t) + \int_0^t K(t,s)\,\mathbf{x}(s)\, ds, \quad t \in [0, +\infty), \tag{1}$$

and their discrete version,

$$\mathbf{x}(t) = \mathbf{g}(t) + \sum_{s=0}^{t} K(t,s)\,\mathbf{x}(s), \quad t = 1, 2, \dots, \quad \mathbf{x}(0) \text{ given}, \tag{2}$$

are significant mathematical models for representing real-life problems involving feedback and control [1,2]. The analysis of their dynamics allows one to describe the phenomena they represent. In [3], the two equations were analysed in the unifying notation of time scales, and some results were obtained under linear perturbation of the kernel. Here, we revise this approach to obtain results on classes of linear discrete equations whose kernel can be split into a well-behaving part (the unperturbed kernel) plus a term that acts as a perturbation. The implications for numerical methods are, in general, not straightforward and pass through some restrictions on the step length. Nevertheless, here, we overcome this problem and obtain some results on the stability of numerical methods for VIEs.

For Equations (1) and (2), we assume that $x(t) = (x_1(t), \dots, x_d(t))^T \in \mathbb{R}^d$, $g(t) = (g_1(t), \dots, g_d(t))^T \in \mathbb{R}^d$, and $K(t,s) = \left(K_{ij}(t,s)\right)_{i,j=1,\dots,d}$ is a $d \times d$ matrix.

The paper is organised as follows. In Section 2, we introduce the split kernel for Equation (2) and, using a new formulation of Theorem 2 in [3], we provide sufficient conditions on the above-mentioned splitting that imply that the solution vanishes. In Section 3, we propose a reformulation of the (*ρ*, *σ*) methods for (1) as discrete Volterra equations and exploit the theory developed in the previous section in order to investigate their numerical stability properties. In Section 4, some applications are described and analysed through the tools developed in Sections 2 and 3, for which we obtain new and more general results on the asymptotic behaviour of the numerical solutions of both linear and nonlinear equations. In Section 5, some numerical examples are reported.

#### **2. Asymptotics for Discrete Equations**

Consider the discrete Volterra Equation (2) with *K*(*t*,*s*) = *P*(*t*,*s*) + *Q*(*t*,*s*), where *P* and *Q* are *d* × *d* matrices. Let *rQ*(*t*,*s*) be the resolvent kernel associated with *Q*(*t*,*s*), which is defined as the solution of the equation:

$$r_Q(t,s) = Q(t,s) + \sum_{l=s+1}^{t} r_Q(t,l)\, Q(l,s). \tag{3}$$

The following theorem, which we proved in [3], represents the starting point of our investigation.

**Theorem 1.** *Assume that for Equation* (2) *with K*(*t*,*s*) = *P*(*t*,*s*) + *Q*(*t*,*s*), *s* = 0, ... , *t, it holds that:*


*Then, if* $\lim_{t \to \infty} g(t) = 0$,

$$\lim_{t \to \infty} \mathbf{x}(t) = 0.$$

This theorem is particularly interesting when the matrix *P* in the splitting of kernel *K* is such that *P*(*t*,*s*) = 0 for *s* = *M* + 1, ... , *t* − *N* − 1, with *M* and *N* positive constants and *t* ≥ *N* + *M* + 1. Then, Equation (2) can be rewritten as

$$\mathbf{x}(t) = \mathbf{g}(t) + \sum_{s=0}^{M} P(t,s)\,\mathbf{x}(s) + \sum_{s=0}^{t} Q(t,s)\,\mathbf{x}(s) + \sum_{s=t-N}^{t} P(t,s)\,\mathbf{x}(s), \tag{4}$$

and the following result holds. Here and in the following, the limit of matrices is intended element-wise.

**Theorem 2.** *Consider Equation* (4)*, and assume that:*

**(1)** $\lim_{t\to\infty} P(t,s) = 0$, *for* $s = 0, \dots, M$, *and* $\lim_{t\to\infty} P(t,t) = 0$.

**(2)** $\sum_{s=0}^{t} \|Q(t,s)\| \le \alpha < 1$, $\lim_{t\to\infty} Q(t,s) = 0$, *for any fixed* $s \ge 0$.

*If there exists a constant* $G > 0$ *such that* $\|g(t)\| \le G$, *then*

$$\|\mathbf{x}(t)\| \le X, \ \text{ with } X > 0,$$

*and if* $\lim_{t\to\infty} g(t) = 0$, *then*

$$\lim_{t \to \infty} \mathbf{x}(t) = 0.$$

**Proof.** Assumption (1) implies (iii) of Theorem 1 and, applying a well-known result (see, for example, [2] (Section 6) and [4]) to assumption (2), we have that (i) and (ii) hold.
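To make the mechanism of Theorem 2 concrete, the following minimal sketch (our own scalar illustration, not an example from the paper) takes $P \equiv 0$, a kernel $Q(t,s) = 0.4 \cdot 2^{s-t}$ satisfying assumption (2), and a vanishing forcing term $g(t) = 1/(1+t)$, and verifies numerically that the solution of (2) vanishes:

```python
# Scalar illustration of Theorem 2 (our own example, not from the paper):
# P = 0, Q(t, s) = 0.4 * 2**(s - t), g(t) = 1/(1 + t).

def Q(t, s):
    return 0.4 * 2.0 ** (s - t)   # sum_{s=0}^{t} |Q(t,s)| <= 0.8 < 1

def g(t):
    return 1.0 / (1.0 + t)        # vanishing forcing term

def solve_discrete_volterra(T):
    """Solve x(t) = g(t) + sum_{s=0}^{t} Q(t,s) x(s) for t = 0, ..., T.

    The s = t term makes the equation implicit in x(t); since
    Q(t, t) = 0.4 != 1, we solve for x(t) by dividing by 1 - Q(t, t).
    """
    x = [g(0) / (1.0 - Q(0, 0))]
    for t in range(1, T + 1):
        history = sum(Q(t, s) * x[s] for s in range(t))
        x.append((g(t) + history) / (1.0 - Q(t, t)))
    return x

x = solve_discrete_volterra(200)
```

In agreement with the theorem, the computed sequence decays to zero together with the forcing term.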

The case $\lim_{t\to\infty} P(t,s) = P_\infty(s) \neq 0$ and $\lim_{t\to\infty} g(t) = g_\infty \neq 0$ can be treated by the same technique if it is known that $\sum_{s=0}^{t} Q(t,s) \to E_\infty \neq I$, for $t \to \infty$ ($I$ is the $d \times d$ identity matrix). In this case, we have the following corollary.

**Corollary 1.** *Assume that, for Equation* (4)*, there exists M* > 0 *such that*

**(1)** $\lim_{t\to\infty} P(t,s) = P_\infty(s)$, *for* $s = 0, \dots, M$, *and* $\lim_{t\to\infty} P(t,t) = 0$; **(2)** $\sum_{s=0}^{t} \|Q(t,s)\| \le \alpha < 1$, $\lim_{t\to\infty} Q(t,s) = 0$, *for* $s = 0, \dots, t$; **(3)** $\lim_{t\to\infty} \sum_{s=0}^{t} Q(t,s) = E_\infty \neq I$.

*If* $\lim_{t\to\infty} g(t) = g_\infty$, *then*

$$\lim_{t \to \infty} \mathbf{x}(t) = (I - E_{\infty})^{-1} \left( \mathbf{g}_{\infty} + \sum_{s=0}^{M} P_{\infty}(s)\, \mathbf{x}(s) \right).$$

**Proof.** Set $\mathbf{x}_\infty = (I - E_\infty)^{-1} \left( \mathbf{g}_\infty + \sum_{s=0}^{M} P_\infty(s)\, \mathbf{x}(s) \right)$. A manipulation of (4) gives

$$\bar{\mathbf{x}}(t) = \bar{\mathbf{g}}(t) + \sum_{s=0}^{t} \bar{Q}(t,s)\,\bar{\mathbf{x}}(s) + \sum_{s=t-N}^{t} \bar{P}(t,s)\,\bar{\mathbf{x}}(s),$$

with $\bar{\mathbf{x}}(t) = \mathbf{x}(t) - \mathbf{x}_\infty$, $\bar{P}(t,s) = 0$, for $s = 0, \dots, M$, $\bar{P}(t,s) = P(t,s)$, for $s = t-N, \dots, t$, $\bar{Q}(t,s) = Q(t,s)$, and

$$\bar{\mathbf{g}}(t) = \mathbf{g}(t) - \mathbf{g}_{\infty} + \left(\sum_{s=0}^{t} Q(t,s) - E_{\infty}\right) \mathbf{x}_{\infty} + \sum_{s=0}^{M} (P(t,s) - P_{\infty}(s))\, \mathbf{x}_{\infty} + \sum_{s=t-N}^{t} P(t,s)\, \mathbf{x}_{\infty}.$$

$\bar{P}$, $\bar{Q}$, and $\bar{g}$ play the roles of $P$, $Q$, and $g$ in (4). Recalling Theorem 1, all of the assumptions are satisfied, which implies that $\lim_{t\to+\infty} \bar{\mathbf{x}}(t) = 0$, and then $\lim_{t\to+\infty} \mathbf{x}(t) = \mathbf{x}_\infty$.

#### **3. Background Material on (***ρ***,** *σ***) Methods**

The analysis carried out in the previous section can be effectively applied to (*ρ*, *σ*) methods for the systems of VIEs:

$$y(t) = f(t) + \int_0^t k(t,s)\,y(s)\, ds, \quad t \in [0, +\infty). \tag{5}$$

Here, we consider the numerical solution to (5) obtained by the (*ρ*, *σ*) methods with Gregory convolution weights (see, for example, [5–7]):

$$y_n = f(t_n) + h \sum_{j=0}^{n_0 - 1} w_{nj}\, k(t_n, t_j)\, y_j + h \sum_{j=n_0}^{n} \omega_{n-j}\, k(t_n, t_j)\, y_j, \tag{6}$$

$n = n_0, n_0 + 1, \dots$, where $y_n \approx y(t_n)$, with $t_n = nh$ for $n = 0, 1, \dots$; $h > 0$ is the step size, and $w_{nj}$, $\omega_j$ are the weights. We assume that the weights are non-negative and that $y_0 = f(0)$ and $y_1, \dots, y_{n_0-1}$, $n_0 \ge 1$, are given starting values.

The weights $w_{nj}$, $n = 0, 1, \dots$, $j = 0, 1, \dots, n_0 - 1$, are called the starting weights and satisfy (see [5]):

$$\sup_{n \ge 0} w_{nj} \le W < +\infty, \quad j = 0, \dots, n_0 - 1, \quad \text{and} \quad \lim_{n \to \infty} w_{nj} = \bar{w}_j. \tag{7}$$

Moreover, we want to underline some properties of the Gregory convolution weights $\omega_n$ (see, for example, [5,7]), which will be useful in the subsequent sections:

$$\sup_{n} \omega_{n} = \Omega < +\infty, \tag{8}$$

$$\omega_i = 1, \ \text{ for } i \ge n_0. \tag{9}$$

From now on, we assume that *h* satisfies

$$\det(I - h\omega_0\, k(t_n, t_n)) \neq 0, \tag{10}$$

where *I* is the identity matrix of size *d*.

Choose *n*<sup>∗</sup> > *n*<sup>0</sup> and let

$$P(n,j) = \begin{cases} 0, & j = 0, \dots, n_0 - 1, \\ h\,k(t_n, t_j), & j = n_0, \dots, n^* - 1, \\ h\,\omega_{n-j}\,k(t_n, t_j), & j = n - n_0 + 1, \dots, n, \end{cases}$$

and

$$Q(n,j) = \begin{cases} 0, & j = 0, \dots, n^* - 1, \ \text{ and } j = n - n_0 + 1, \dots, n, \\ h\,k(t_n, t_j), & j = n^*, \dots, n - n_0. \end{cases}$$

The (*ρ*, *σ*) method (6) can be written, for $n = n^* + n_0, n^* + n_0 + 1, \dots$, as follows:

$$y_n = g(n) + \sum_{j=n_0}^{n^*-1} P(n,j)\,y_j + \sum_{j=n^*}^{n-n_0} Q(n,j)\,y_j + \sum_{j=n-n_0+1}^{n} P(n,j)\,y_j, \tag{11}$$

with $g(n) = f(t_n) + h \sum_{j=0}^{n_0-1} w_{nj}\, k(t_n, t_j)\, y_j$. This alternative formulation of the method in terms of the matrices $P$ and $Q$ allows us to analyse its asymptotic properties using the theory developed in the previous section for Equation (4). So, (11) corresponds to the discrete Equation (4) with $M$ and $N$ equal to $n^* - 1$ and $n_0 - 1$, respectively, and $Q(n,j) = 0$, for $j = 0, \dots, M$.
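As an illustration, a minimal scalar implementation of method (6) with trapezoidal weights ($n_0 = 1$, $w_{n0} = \omega_0 = 1/2$, $\omega_i = 1$ for $i \ge 1$) might look as follows; this is our own sketch, assuming $h$ satisfies (10):

```python
def rho_sigma_trapezoidal(f, k, h, N):
    """Scalar (rho, sigma) method (6) with trapezoidal weights:
    n0 = 1, starting weight w_{n0} = 1/2, Gregory convolution weights
    omega_0 = 1/2 and omega_i = 1 for i >= 1.

    The j = n term makes each step implicit in y_n; condition (10)
    guarantees 1 - h*omega_0*k(t_n, t_n) != 0, so we can solve for
    y_n directly.
    """
    t = [i * h for i in range(N + 1)]
    y = [f(0.0)]  # y_0 = f(0)
    for n in range(1, N + 1):
        acc = f(t[n]) + 0.5 * h * k(t[n], t[0]) * y[0]
        acc += h * sum(k(t[n], t[j]) * y[j] for j in range(1, n))
        y.append(acc / (1.0 - 0.5 * h * k(t[n], t[n])))
    return t, y

# Sanity check on y(t) = 1 + int_0^t y(s) ds, whose exact solution is e^t:
t, y = rho_sigma_trapezoidal(lambda s: 1.0, lambda u, v: 1.0, 0.01, 100)
```

With $k \equiv 1$ and $f \equiv 1$, the computed values agree with the exact solution $y(t) = e^t$ up to a quadrature error of order $h^2$.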

#### **4. Dynamic Behaviour of Numerical Approximations and Applications**

In [8], we carried out an analysis of Volterra equations on time scales that allowed us to obtain results on the asymptotic behaviour of the analytical solution of (5) and of its discrete counterpart in $h\mathbb{Z}$, under the assumptions:

$$\sup_{t \ge \bar{t}} \int_{\bar{t}}^{t} \|k(t,s)\|\, ds \le \alpha < 1, \tag{12}$$

and

$$\sup_{n \ge \bar{n}} h \sum_{j=\bar{n}}^{n} \|k(t_n, t_j)\| \le \alpha < 1, \tag{13}$$

respectively, where $h > 0$, $t_n = nh$, $n = 0, 1, \dots$, and $\bar{t} = \bar{n}h$. If $k(t,s)$ is non-increasing with respect to $s$, the bound (13) is certainly implied by (12) for those values of the parameter $h$ such that

$$\sup_{n \ge \bar{n}} \left( h \|k(t_n, \bar{t})\| + \int_{\bar{t}}^{t_n} \|k(t_n, s)\|\, ds \right) \le \alpha < 1.$$

This relation, which allows one to establish a connection between the behaviour of the analytical solution of (5) and that of its discrete counterpart in $h\mathbb{Z}$, does not straightforwardly apply to numerical methods due to the presence of the weights $w_{nj}$ and $\omega_j$ of the (*ρ*, *σ*) methods. This is because the weights can cause the loss of monotonicity, and they may also be greater than 1; then, (13) is not satisfied. In [8], it was proved that if (12), $\sup_{t \in [s,+\infty)} \|k(t,s)\| < +\infty$, and $\lim_{t\to\infty} k(t,s) = 0$, $\forall s \ge 0$, are satisfied, the analytical solution $y(t)$ of Equation (5) vanishes at infinity when $\lim_{t\to\infty} f(t) = 0$. Moreover, in [9], it was shown that, if $\sup_{t>0} \int_0^t \|\partial k(t,s)/\partial s\|\, ds < +\infty$, then there exists a positive constant $A$ such that

$$h \sum_{j=\bar{n}}^{n} \omega_{n-j}\, \|k(t_n, t_j)\| \le \int_{\bar{t}}^{t_n} \|k(t_n, s)\|\, ds + hA, \quad \forall n \ge \bar{n}. \tag{14}$$

The bound (14) assures that, when (12) is satisfied, the numerical solution *yn* tends to zero for *n* → ∞ if the step size *h* is small enough, consistently with the behaviour of *y*(*t*).

Theorem 2 in Section 2 allows us to remove the restriction on *h* given by (14). In order to show this result, which states, in fact, the unconditional stability of the (*ρ*, *σ*) methods, we need the following preparatory lemma.

#### **Lemma 1.** *Assume that:*

**(i)** $\lim_{t\to+\infty} k(t,s) = 0$, *for any fixed* $0 \le s \le t$*,*

**(ii)** *there exists* $\bar{s} \ge 0$ *such that* $\partial \|k(t,s)\| / \partial s \le 0$, *for* $s > \bar{s}$*,*

**(iii)** *there exists* $\bar{t} \ge 0$ *such that* $\int_{\bar{t}}^{t} \|k(t,s)\|\, ds \le \alpha < 1$*.*

*Then, for any* $h > 0$, *there exists* $t^* > \max\{\bar{s}, \bar{t}\}$ *such that* $h \sum_{j=n^*}^{n} \|k(t_n, t_j)\| \le \beta < 1$, *where* $n^*$ *is such that* $n^* h > t^*$.

**Proof.** Let $h > 0$ be a fixed value of the step size. Assumption (ii) implies that $\|k(t,s)\| \le \|k(t,\bar{t})\|$ for any $\bar{t} \le s \le t$. Moreover, $\lim_{t\to+\infty} \|k(t,\bar{t})\| = 0$ because of (i); thus, for any $\epsilon > 0$, we can choose $t^* > \max\{\bar{t}, \bar{s}\}$ such that, for $t > t^*$,

$$\|k(t,\bar{t})\| < \epsilon. \tag{15}$$

Now, we choose $\epsilon$ such that $h\epsilon + \alpha \le \beta < 1$ and $n^*$ such that $n^* h \ge t^*$. Since, by (ii), $\|k(t_n, s)\|$ is a non-increasing function of $s$ for each $s > t^*$, we have $h \sum_{j=n^*}^{n} \|k(t_n, t_j)\| \le h\|k(t_n, t^*)\| + \int_{t^*}^{t_n} \|k(t_n, s)\|\, ds \le h\epsilon + \alpha \le \beta < 1$.

**Theorem 3.** *Assume that all the hypotheses of Lemma 1 hold for the kernel k of Equation* (5)*; then, for the numerical approximation to its solution* $y(t)$ *obtained by the* (*ρ*, *σ*) *method* (6)*, if* $\lim_{t\to+\infty} f(t) = 0$, *one has*

$$\lim_{n \to +\infty} y_n = 0.$$

**Proof.** For a fixed $h > 0$, Lemma 1 provides a value $n^* > n_0$ for which $h \sum_{j=n^*}^{n} \|k(t_n, t_j)\| \le \beta < 1$, with $\beta$ a positive constant. Referring to the reformulation (11) of the method, all the assumptions of Theorem 2 are satisfied. Thus, because of property (7) on the asymptotic behaviour of the starting weights and of assumption (i) of Lemma 1, $g(n) = f(t_n) + h \sum_{j=0}^{n_0-1} w_{nj}\, k(t_n, t_j)\, y_j$ tends to zero for $n \to +\infty$. So, in view of Theorem 2, $y_n$ also vanishes.

Other applications of Theorem 2 are concerned with the equation

$$y(t) = f(t) + \int_0^t k(t-s)\big(y(s) + G(s, y(s))\big)\, ds, \tag{16}$$

which has been the subject of great attention in the literature (see, for example, [10–12]). Here and in the following, we assume that Equation (16) is scalar ($d = 1$), the kernel $k = k(t-s)$ is of convolution type, and $G(t,y)$ is a continuous function for $t \in [0,+\infty)$ and $y \in \mathbb{R}$. A main assumption (see, for example, [13]) that is generally made on the nonlinear term $G$ is that it represents a small perturbation, that is, there exists a function $p(t) > 0$ such that

$$|G(t, y)| \le p(t)|y|. \tag{17}$$

For Equation (16), the (*ρ*, *σ*) methods with Gregory convolution weights read, for *n* = *n*0, *n*<sup>0</sup> + 1, . . . ,

$$y_n = f(t_n) + h \sum_{j=0}^{n_0 - 1} w_{nj}\, k(t_n - t_j)\big(y_j + G(t_j, y_j)\big) + h \sum_{j=n_0}^{n} \omega_{n-j}\, k(t_n - t_j)\big(y_j + G(t_j, y_j)\big). \tag{18}$$

In order to describe the asymptotic behaviour of *yn*, we prove the following theorem.

**Theorem 4.** *Assume that, for Equation* (16)*, the following assumptions hold:*

**Hypothesis 1.** $\int_0^{+\infty} |k(t)|\, dt \le \alpha < 1$*,*

**Hypothesis 2.** $\exists\, \bar{t} > 0$ *such that* $\forall t > \bar{t}$, $\frac{d|k(t)|}{dt} < 0$*,*

**Hypothesis 3.** $\lim_{t\to\infty} f(t) = 0$*,*

**Hypothesis 4.** $\lim_{t\to\infty} p(t) = 0$*.*

*Then, for the numerical solution to* (16) *obtained with the method* (18)*, one has*

$$\lim\_{n \to \infty} y\_n = 0.$$

**Proof.** From (17) and (18), with $p_j = p(t_j)$, $j = 0, 1, \dots$,

$$|y_n| \le |f(t_n)| + h \sum_{j=0}^{n_0 - 1} w_{nj}\, |k(t_n - t_j)|(1 + p_j)|y_j| + h \sum_{j=n_0}^{n} \omega_{n-j}\, |k(t_n - t_j)|(1 + p_j)|y_j|.$$

Now, consider the equation

$$\zeta_n = |f(t_n)| + h \sum_{j=0}^{n_0 - 1} w_{nj}\, |k(t_n - t_j)|(1 + p_j)\zeta_j + h \sum_{j=n_0}^{n} \omega_{n-j}\, |k(t_n - t_j)|(1 + p_j)\zeta_j.$$

Since, from (8), the $\omega_n$ are bounded, we have that $\sum_{j=n_0}^{n} \omega_{n-j}\, |k(t_n - t_j)| p_j \le \Omega \sum_{j=n_0}^{n} |k(t_n - t_j)| p_j$, which is the convolution product of an $l^1$ sequence ($k_n$) and a vanishing sequence ($p_n$) and, therefore, tends to zero as $n \to +\infty$. Therefore, $\forall \epsilon > 0$, $\exists \nu : \forall n > \nu$, $h \sum_{j=n_0}^{n} \omega_{n-j}\, |k(t_n - t_j)| p_j < \epsilon$. We choose $\epsilon > 0$ such that $\alpha + \epsilon \le \beta < 1$ and $\bar{n}$ such that $\bar{n}h \ge \bar{t}$ in Hypothesis 2. With $n^* \ge \max\{\nu, \bar{n}\}$, the equation for $\zeta_n$ can be written in the more convenient form (11), for which all the assumptions of Theorem 2 hold. Thus, because of property (7) on the asymptotic behaviour of the starting weights and because of the vanishing behaviour of the kernel $k$, $g(n) = f(t_n) + h \sum_{j=0}^{n_0-1} w_{nj}\, k(t_n - t_j)(1 + p_j)\, y_j$ tends to zero for $n \to +\infty$. Therefore, $\lim_{n\to\infty} \zeta_n = 0$. This ends the proof because, using the comparison theorem in [14], $|y_n| \le \zeta_n$.

This theorem states that the numerical solution *yn* of (16) vanishes when the forcing term *f* tends to zero for any step size *h* > 0. The result is, of course, more interesting if we know that the analytical solution to (16) tends to zero. This can be proved by means of a result that the authors proved in [8]. To be more specific, the assumptions of Theorem 4 here assure that all the hypotheses of Theorem 9 in [8] are satisfied, thus implying that lim*t*→<sup>∞</sup> *y*(*t*) = 0.

The following result, which we prove in the case of scalar equations, represents a generalisation of Theorem 3.1 in [15], where the numerical stability of the (*ρ*, *σ*) methods up to order 3 was proved under some restriction on the step length *h*. In this paper, by applying Theorem 2 to the (*ρ*, *σ*) methods (6), we remove the constraint on the step size and extend the investigation to any method in the (*ρ*, *σ*) class.

**Theorem 5.** *Assume that, for Equation* (5)*, with d* = 1, *it holds that:*

**(i)** $\exists\, \bar{t} > 0$ *such that* $\forall s > \bar{t}$, $\frac{\partial}{\partial t}|k(t,s)| \le 0$*;* **(ii)** $|k(t,t)| = \phi(t) \in L^1[0,+\infty)$*;* **(iii)** *either* (iii)$_1$: $\phi'(t) \le 0$, *or* (iii)$_2$: $\int_0^{+\infty} |\phi'(t)|\, dt \le \Phi < \infty$, *and* $\lim_{t\to\infty} \phi(t) = 0$*;* **(iv)** $\lim_{t\to\infty} f(t) = 0$*.*

*Then, for the numerical solution* $y_n$, *obtained with the* (*ρ*, *σ*) *method* (6)*, it holds that*

$$\lim_{n \to \infty} y_n = 0. \qquad \left(\lim_{n \to \infty} y_n = 0, \ \text{ for } \int_{\bar{n}h}^{+\infty} \phi(t)\, dt + h \int_{\bar{n}h}^{+\infty} |\phi'(t)|\, dt < 1.\right)$$

**Proof.** Due to hypotheses (i) and (ii), there exists $\bar{s} > \bar{t}$ such that

$$\lim_{t \to \infty} |k(t,s)| = 0, \ \text{ for all } s > \bar{s}. \tag{19}$$

Let us fix $h > 0$. From the assumptions, it is clear that $\phi(t) \le \phi_{max}$, with $\phi_{max} > 0$. So, (ii) implies that $\int_{\bar{n}h}^{+\infty} \phi(t)\, dt + h\phi(\bar{n}h) \le \int_{\bar{n}h}^{+\infty} \phi(t)\, dt + h\phi_{max} \le \alpha < 1$, for some $\bar{n} = \bar{n}(h)$, which we choose such that $\bar{n}h > \bar{s}$. Since, by (iii), $\phi$ is a non-increasing function, we have

$$h \sum_{j=\bar{n}}^{n-n_0} |k(t_n, t_j)| \le h \sum_{j=\bar{n}}^{n-n_0} |k(t_j, t_j)| = h \sum_{j=\bar{n}}^{n-n_0} \phi(t_j) \le \int_{\bar{n}h}^{+\infty} \phi(t)\, dt + h\phi(\bar{n}h) \le \alpha < 1. \tag{20}$$

Then, referring to formulation (11) of the numerical method with $n^* = \bar{n}$, we want to prove that

$$\sup_{n} \sum_{j=\bar{n}}^{n-n_0} |Q(n,j)| = \sup_{n} h \sum_{j=\bar{n}}^{n-n_0} \omega_{n-j}\, |k(t_n, t_j)| < 1. \tag{21}$$

Here, $n - j \ge n_0$; thus, $\omega_{n-j} = 1$. So, (21) is guaranteed by (20). Furthermore, in view of (19), $\lim_{n\to\infty} Q(n,j) = 0$, for any fixed $j \ge \bar{n}$, and, because of (ii) and (iii), also $\lim_{n\to\infty} P(n,n) = 0$. Hence, as all the assumptions of Theorem 2 are fulfilled, $\lim_{n\to\infty} y_n = 0$, without imposing any restriction on the step size $h$.

If, however, assumption (iii)$_2$ holds instead of (iii)$_1$, the step size $h$ has to be chosen such that $h\Phi \le \beta_1 < 1$ and $\int_{\bar{n}h}^{+\infty} \phi(t)\, dt \le \beta_2$, with $\beta_1 + \beta_2 \le \alpha < 1$. So, by Lemma 1 in [9],

$$\sum_{j=\bar{n}}^{n-n_0} |Q(n,j)| \le \int_{\bar{n}h}^{+\infty} \phi(t)\, dt + h \int_{\bar{n}h}^{+\infty} |\phi'(t)|\, dt \le \beta_2 + \beta_1 \le \alpha < 1.$$

Consider now the following convolution equation:

$$x(t) = f(t) - \int_0^t a(t-s)\,x(s)\, ds. \tag{22}$$

Its solution has the form

$$x(t) = f(t) - \int_0^t R(t-s)\,f(s)\, ds,$$

where the resolvent kernel R is the solution of the equation:

$$R(t) = a(t) - \int_0^t a(t-s)\,R(s)\, ds, \tag{23}$$

$t \ge 0$. If the kernel $a(t)$ of Equation (22) is completely monotone, that is, $(-1)^j a^{(j)}(t) > 0$, $j = 0, 1, \dots$, $t \ge 0$, then (see, e.g., [16]) the resolvent $R(t)$ is also completely monotone. Furthermore, the analytical solution $x(t)$ and its numerical approximation $x_n$ obtained by a (*ρ*, *σ*) method both tend to zero as $t \to \infty$ and $n \to \infty$, respectively, when the forcing term $f(t)$ tends to zero (see [6]). We point out that if $\lim_{t\to\infty} a(t) = 0$, then $\lim_{t\to\infty} R(t) = 0$ as well, as $R(t)$ is the solution of a Volterra Equation (23) where the kernel is completely monotone and the forcing term tends to zero. The significance of completely monotone kernels in Volterra equations is underlined in [13] (p. 27).
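As a concrete illustration (ours, not from the paper): for the completely monotone kernel $a(t) = e^{-t}$, the resolvent Equation (23) admits the closed-form solution $R(t) = e^{-2t}$, which is again completely monotone and vanishes at infinity. A trapezoidal discretisation of (23) reproduces this:

```python
import math

# Sketch (our illustration): for a(t) = exp(-t), Equation (23) has the
# exact solution R(t) = exp(-2t); we recover it with the trapezoidal rule.

def resolvent(a, h, N):
    """Trapezoidal discretisation of R(t) = a(t) - int_0^t a(t-s) R(s) ds."""
    t = [i * h for i in range(N + 1)]
    R = [a(0.0)]  # at t = 0 the integral vanishes, so R(0) = a(0)
    for n in range(1, N + 1):
        acc = 0.5 * a(t[n]) * R[0]
        acc += sum(a(t[n] - t[j]) * R[j] for j in range(1, n))
        R.append((a(t[n]) - h * acc) / (1.0 + 0.5 * h * a(0.0)))
    return t, R

t, R = resolvent(lambda s: math.exp(-s), h=0.01, N=300)
err = max(abs(Rn - math.exp(-2.0 * tn)) for tn, Rn in zip(t, R))
```

The maximum deviation from $e^{-2t}$ is of the order of the quadrature error $O(h^2)$, and the computed resolvent is positive and decreasing, as complete monotonicity requires.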

A nonlinear perturbation to (22) yields

$$y(t) = g(t) - \int_0^t a(t-s)\big(y(s) + G(s, y(s))\big)\, ds. \tag{24}$$

This equation can be written in terms of the unperturbed solution as (see [13]):

$$y(t) = x(t) - \int_0^t R(t-s)\,G(s, y(s))\, ds. \tag{25}$$

Starting from assumption (17) on the nonlinear term $G$ and from the relation (25), we want to investigate the asymptotic behaviour of the numerical solution to (24) when it is known that $\lim_{n\to\infty} x_n = 0$.

**Theorem 6.** *Consider Equation* (24)*, and assume that* (17) *holds for the function G and that:*

**1.** $a(t)$ *is completely monotone and* $\lim_{t\to\infty} a(t) = 0$*;*
**2.** $p(t) \in L^1[0,+\infty)$ *and* $p'(t) < 0$*.*

*If* $\lim_{t\to\infty} g(t) = 0$, *then the solution* $y(t)$ *and the numerical solution* $y_n$ *obtained by the* (*ρ*, *σ*) *method* (6) *satisfy*

$$\lim_{t \to \infty} y(t) = 0, \ \text{ and } \ \lim_{n \to \infty} y_n = 0.$$

**Proof.** By assumption 1, the solution $x(t)$ of Equation (22) with a completely monotone kernel satisfies $\lim_{t\to\infty} x(t) = 0$. This also holds true for its numerical approximation (see, for example, [6]).

Considering Equation (25), we have

$$|y(t)| \le |x(t)| + \int_0^t R(t-s)\,p(s)\,|y(s)|\, ds.$$

Since *R*(*t*) is completely monotone and *p*(*t*) is bounded, the solution of the equation

$$z(t) = |x(t)| + \int_0^t R(t-s)\,p(s)\,z(s)\, ds, \tag{26}$$

satisfies $\lim_{t\to\infty} z(t) = 0$. By using the comparison theorem (see, for example, [17]), $y(t)$ also tends to zero. Considering the numerical solution $z_n$ of (26), we want to show, by means of Theorem 5, that $\lim_{n\to\infty} z_n = 0$. Then, $y_n$ will also vanish.

This is true because all the assumptions of Theorem 5 are satisfied. Indeed:


#### **5. Numerical Examples**

In this section, we report some numerical experiments that confirm the theoretical results illustrated in Section 4. For our experiments, we choose illustrative test equations, and we use the (*ρ*, *σ*) method (6) with trapezoidal weights.

In our first example, we refer to Equation (5) with the kernel *k* given by

$$k(t,s) = 10s e^{-s(t+1)},\tag{27}$$

and the forcing term $f(t)$ such that the solution is $y(t) = e^{-t}$. Since $f(t)$ tends to zero as $t$ goes to infinity, all the assumptions of Theorem 3 are satisfied (for example, with $\bar{t} > 3.5$), and thus, both the numerical solution and the continuous one vanish. This is also clear in Figure 1.

**Figure 1.** Numerical solution to problem (27) compared to the analytical one.
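This first experiment is easy to reproduce. The sketch below (our own re-implementation of the trapezoidal scheme, with $h = 0.02$) evaluates the forcing term in closed form from the prescribed solution $y(t) = e^{-t}$ and confirms both the accuracy and the decay:

```python
import math

def k(t, s):
    return 10.0 * s * math.exp(-s * (t + 1.0))  # kernel (27)

def f(t):
    # forcing term such that y(t) = exp(-t):
    # f(t) = exp(-t) - int_0^t k(t,s) exp(-s) ds, evaluated in closed form
    a = t + 2.0
    return math.exp(-t) - 10.0 * (1.0 - math.exp(-a * t) * (1.0 + a * t)) / a**2

h, N = 0.02, 500  # integrate up to t = 10 with trapezoidal weights
t = [i * h for i in range(N + 1)]
y = [f(0.0)]
for n in range(1, N + 1):
    acc = f(t[n]) + 0.5 * h * k(t[n], t[0]) * y[0]
    acc += h * sum(k(t[n], t[j]) * y[j] for j in range(1, n))
    y.append(acc / (1.0 - 0.5 * h * k(t[n], t[n])))
```

The computed solution tracks $e^{-t}$ and vanishes at infinity, as in Figure 1.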

Now, consider Equation (16) with

$$k(t-s) = (t-s)e^{-2(t-s)},\ \ G(t,y) = 2y \frac{e^{-y^2 t}}{(1+t^2)(1+y^2)},\ \ \text{and}\ f(t) = e^{-t^2}.\tag{28}$$

In Figure 2, we draw the numerical solution obtained with step size $h = 0.1$, which clearly vanishes at infinity, according to Theorem 4, since all assumptions are satisfied with $p(t) = \frac{2}{1+t^2}$ and $\bar{t} > \frac{1}{2}$.

**Figure 2.** Numerical solution to problem (28) with *h* = 0.1.
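The second experiment can be reproduced in the same way. In the sketch below (again our own re-implementation with trapezoidal weights and $h = 0.1$), the implicit $j = n$ term drops out because $k(0) = 0$, so every step is explicit:

```python
import math

def k(u):
    return u * math.exp(-2.0 * u)  # convolution kernel from (28)

def G(t, y):
    return 2.0 * y * math.exp(-y * y * t) / ((1.0 + t * t) * (1.0 + y * y))

def f(t):
    return math.exp(-t * t)

h, N = 0.1, 300  # integrate up to t = 30
y = [f(0.0)]
for n in range(1, N + 1):
    tn = n * h
    # k(tn - tn) = k(0) = 0, so the j = n term vanishes and y_n is explicit
    acc = f(tn) + 0.5 * h * k(tn) * (y[0] + G(0.0, y[0]))
    acc += h * sum(k(tn - j * h) * (y[j] + G(j * h, y[j])) for j in range(1, n))
    y.append(acc)
```

As predicted by Theorem 4, the computed solution decays to zero at infinity.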

Our third example consists in Equation (5) with

$$k(t,s) = \frac{1}{(1+2t-s)^2} \tag{29}$$

and $f(t)$ such that the solution is $y(t) = \frac{1}{t+1}$. According to Theorem 5, with $\phi(t) = (1+t)^{-2}$, since $f(t)$ tends to zero, the numerical solution vanishes regardless of the step size $h$, thus replicating the asymptotic behaviour of the continuous one. This behaviour is shown in Figure 3.

**Figure 3.** Numerical solution to problem (29) with *h* = 0.2, compared to the analytical one.

In all our experiments, we used mesh sizes that ensure reasonable accuracy in the numerical solution at finite times. Integration with larger discretisation steps naturally introduces larger errors on finite time intervals, but the numerical solution maintains the expected behaviour at infinity. This confirms the asymptotic-preserving character of the numerical schemes without restrictions on $h$, as can be observed, for example, in Figure 4, again referring to example (29) with $h = 1$.

**Figure 4.** Numerical solution to problem (29) with *h* = 1, compared to the analytical one.

#### **6. Conclusions**

Starting from an idea developed in [3], here, we have introduced a technique for the analysis of the vanishing behaviour of the numerical solution to VIEs. This new approach, which is based on suitable splittings of the kernel function, allows one to preserve the

character of the analytical solution even in the weighted sums that appear in the method, thus leading to unconditional stability results in many applications of interest.

**Author Contributions:** Both authors contributed equally to this work. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the INdAM-GNCS 2020 project "Metodi numerici per problemi con operatori non locali".

**Acknowledgments:** The authors are grateful to the anonymous reviewers for their constructive comments, which helped to improve the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Three-Phase Fundamental Diagram from Three-Dimensional Traffic Data**

**Maria Laura Delle Monache 1, Karen Chi 2, Yong Chen 3, Paola Goatin 4, Ke Han 5, Jing-mei Qiu <sup>6</sup> and Benedetto Piccoli 7,\***


**Abstract:** This paper uses empirical traffic data collected from three locations in Europe and the US to reveal a three-phase fundamental diagram with two phases located in the uncongested regime. Model-based clustering, hypothesis testing and regression analyses are applied to the speed– flow–occupancy relationship represented in the three-dimensional space to rigorously validate the three phases and identify their gaps. The finding is consistent across the aforementioned different geographical locations. Accordingly, we propose a three-phase macroscopic traffic flow model and a characterization of solutions to the Riemann problems. This work identifies critical structures in the fundamental diagram that are typically ignored in first- and higher-order models and could significantly impact travel time estimation on highways.

**Keywords:** macroscopic models; traffic data; gap analysis; multi-phase models

**MSC:** 35L65; 76A30; 62H30

#### **1. Introduction**

In the last seventy years, many traffic flow models have been developed and researched. Two of the most commonly used macroscopic models are the celebrated first-order Lighthill–Whitham–Richards (LWR) model [1,2] and the second-order Aw-Rascle–Zhang model [3,4]. In both cases, the so-called Fundamental Diagram (FD) provides a closure of the evolution equations, thus allowing a well-posed theory and well-grounded simulation tools (see [5]). The FD usually refers to the empirically observed flow–occupancy curve, which in mathematical terms refers to the functional relationship between flow and density (the modeling counterpart of occupancy) or between the average speed of vehicles and density. For macroscopic fluid-dynamic models, there is a rich discussion on the FD (see, e.g., [5–10]).

In this article, we focus on the FD for single roads by proposing a new approach to study the fundamental relationship among flow, density and speed. We propose novel statistical methodologies to analyze traffic data from fixed sensors, focusing on the three-leg relationships among the flow, density and speed. In particular, rather than considering the FD as a two-quantity relationship (flow–density or speed–density), we analyze data in the three-dimensional space represented by flow, density and speed. This allows us to better exploit the statistical tools, in particular for the analysis of traffic regimes.

**Citation:** Delle Monache, M.L.; Chi, K.; Chen, Y.; Goatin, P.; Han, K.; Qiu, J.; Piccoli, B. A Three-Phase Fundamental Diagram from Three-Dimensional Traffic Data. *Axioms* **2021**, *10*, 17. https://doi.org/ 10.3390/axioms10010017

Academic Editor: Hari Mohan Srivastava

Received: 5 November 2020 Accepted: 28 January 2021 Published: 7 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

We recall that, in equilibrium regimes, the fundamental relationship *flow* = *density* × *speed* dictates that traffic measurement points should lie on a three-dimensional surface (see, e.g., [11] Figure 4.1). In reality, observed traffic largely deviates from equilibrium and usually exhibits *free* and *congested* phases, with the first corresponding to stable and regular traffic, while the second reflects delays and congestion. Moreover, in the early 2000s, Kerner [12] introduced a three-phase traffic theory, based on the distinction among *free flow*, *synchronized flow* and *wide-moving jam*. The last two phases are associated with congested traffic.

In this paper, using clustering methodologies, we are able to identify three traffic regimes, which are distinct in a statistically significant fashion. Interestingly, two regimes appear in what is commonly referred to as the free flow traffic and the third corresponds to the congested phase. This analysis does not contradict Kerner's theory but rather points out that the static/stationary free-flow condition in the FD could exhibit two distinct phases, while the distinction of phases in congested traffic (e.g., Kerner's model) is mainly dynamic.

The second main empirical result of our paper is clear evidence of the existence of a gap between the two free-flow phases and the congested one. While the appearance of such a gap is best visualized in the 3D representation of the FD relationships, we use the classical flow–density relationship to statistically prove the existence of the gap. The main purpose is to prove the ubiquity (with respect to data collected at different geographical locations and on different road types) of the gap in the classical setting and to enable a simpler analysis.

Building on the empirical evidence illustrated thus far, we propose a new three-phase macroscopic model. The LWR model is very popular in the traffic literature due to its simple mathematical representation. However, it has certain modeling limitations, especially when it comes to describing complex wave structures such as stop-and-go waves, phantom jams and capacity drop. To overcome these limitations, Aw and Rascle [3] and, independently, Zhang [4] proposed a new model with conservation of a modified momentum. This so-called Aw-Rascle–Zhang (ARZ) model can be interpreted as part of a general family called General Second-Order Models (GSOM, see [13]). Such models consist of the usual conservation of mass and the advective transport of a Lagrangian (or single driver) variable, which can represent, for instance, the desired speed of drivers. A recently proposed model of this category is the Collapsed Generalized ARZ model (CGARZ [14]), where the driver speed depends on the Lagrangian variable only in the congested phase. Another line of research focuses on models showing two distinct phases, called the phase transition models [6,8,9].

Our proposed model is a combination of the features offered by the ARZ, CGARZ and phase transition models. Our three-phase model not only has the characteristics of a CGARZ model with a gap among phases when analyzed in the flow–density space, but also exhibits the newly discovered phase when analyzed in the speed–density space. After showing how our model performs in data fitting, we provide a complete characterization of the characteristic curves and the solutions of the Riemann problems. The latter are the building block for solutions to Cauchy problems (see [5]). To sum up, the main novelties and contributions of our paper are as follows:


multiple data sources, and the main features (regimes and gaps) are consistent across different geographical areas.

• Building on the first two, we propose a new three-phase macroscopic traffic flow model, which exhibits all the characteristics shown by our data analyses and combines the features of the ARZ, CGARZ and phase transition models. A complete characterization of solutions of the Riemann problems is provided.

The article is organized as follows. In Section 2, we introduce the datasets, their statistical analysis and the results obtained. Moreover, we describe the impacts of these results on traffic modeling. Lastly, in Section 3, we propose a new three-phase macroscopic model.

#### **2. Data Analysis**

In this section, we describe the data analyzed in the paper and then present the statistical analysis performed.

#### *2.1. Experimental Data*

We consider traffic data collected by static sensors (magnetic coils or radars) located on urban and extra-urban roads and highways. Sensors capture these traffic data regularly over a period of time. The sensor data provide the following aggregated quantities which are measured independently over a short time interval (3–10 min).


Occupancy acts as a surrogate for the true density of traffic, as true density is practically difficult to capture, although there is some measurement error involved with its calculation. It is known that density and occupancy are correlated at lower densities, but this does not extend to higher densities. The data were collected from three different locations: Rome (Italy), Las Vegas (NV, USA) and Sophia Antipolis (France). The Rome data were provided by ATAC S.p.a. [15] (the municipal society for traffic monitoring and control of Rome) and refer to a road in the city of Rome, Viale del Muro Torto, which links the historical center with the northern area of the city. Data were collected over a period of a week on three sensors. Each collected quantity (occupancy, flow and speed) was aggregated on 1 min intervals. The data from Las Vegas were collected by the Regional Transportation Commission of Southern Nevada (RTC), Freeway and Arterial System of Transportation (FAST) [16]. The data were collected from 50 urban and freeway sensors over a period of five years and aggregated on 10 min intervals. The data from Sophia Antipolis were collected by the Département des Alpes-Maritimes [17] on two extra-urban sensors over a period of eight months and aggregated over 6 min intervals. For more details on the data, we refer the reader to Appendix A. Despite the fact that the data were aggregated at different intervals, the results, as shown below, are consistent. Since we primarily focused on the three traffic characteristics of flux, velocity and occupancy, we were conveniently positioned to analyze the data in three dimensions, a novel concept and approach that is described in the next section.

#### *2.2. Statistical Tools*

#### 2.2.1. Cluster Analysis

Cluster analysis is the classification of data with a previously unknown structure and the partitioning of a dataset into meaningful subsets. Clustering sheds light on hidden or non-intuitive relationships between those data and their attributes. Each cluster contains a group of objects that are more closely related to each other than they would be as objects of other clusters. *The concept of distance is thus inherently crucial in the process of cluster analysis, as clusters are grouped based on the results of this measure.* Distance serves as a way to evaluate

the closeness, as well as dissimilarity, of pairs of observations. There are at least two options for conducting cluster analysis on these traffic data: model-based clustering (e.g., a mixture of normals) and non-parametric clustering (e.g., k-means). Although k-means is popular for complex and high-dimensional data, it is generally used for data involving variables on the same scale (and is hence more suitable for data with spherical clusters, e.g., under the Euclidean distance in 3D), whereas our data consist of three variables on different scales. For this reason, we use model-based clustering, which has more flexibility in the shape of the clusters; for instance, mixture models [18] can identify the ellipsoidal clusters present in the traffic data.

Empirical evaluations of the distributions of the three traffic variables through quantile–quantile (Q–Q) plots, the Shapiro–Wilk test and Box–Cox transformations suggest that normal distributions are appropriate. Here, we propose the use of a finite mixture model with $G$ multivariate normals [18]. Specifically, denoting the data $\mathbf{y}$ with independent trivariate observations (flux, velocity and occupancy) $\{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n\}$, the likelihood for a mixture model with $G$ components is

$$\ell(\theta_1, \theta_2, \ldots, \theta_G; \pi_1, \pi_2, \ldots, \pi_G \mid \mathbf{y}) = \prod_{i=1}^n \sum_{k=1}^G \pi_k f_k(\mathbf{y}_i \mid \theta_k),$$

where $i$ indexes the observations, $f_k(\cdot)$ and $\theta_k$ are the density function and model parameters of the $k$th cluster in the mixture, and $\pi_k$ is the probability that an observation belongs to the $k$th cluster, subject to the simplex constraint $\{\pi_k \ge 0;\ \sum_{k=1}^G \pi_k = 1\}$. Such a model can be fitted by the expectation–maximization (EM) algorithm and is implemented in the R package 'mclust'.
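The same workflow can be sketched outside R. The snippet below is a minimal Python analogue of the 'mclust' procedure, assuming scikit-learn's `GaussianMixture` (EM fit) with BIC for model selection; the trivariate records are synthetic stand-ins for real sensor data, not the paper's datasets:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for (flow, speed, occupancy) records: three ellipsoidal
# clusters loosely mimicking free choice, free flow and congestion.
free_choice = rng.multivariate_normal([200, 90, 3], np.diag([50**2, 15**2, 1]), 300)
free_flow = rng.multivariate_normal([1200, 80, 12], np.diag([150**2, 5**2, 4]), 300)
congestion = rng.multivariate_normal([600, 25, 35], np.diag([120**2, 6**2, 25]), 300)
data = np.vstack([free_choice, free_flow, congestion])

# EM fit for G = 1..5 components; the lowest BIC selects the number of
# phases, mirroring the model-selection step done with 'mclust'.
bics = {}
for g in range(1, 6):
    gm = GaussianMixture(n_components=g, covariance_type="full",
                         random_state=0).fit(data)
    bics[g] = gm.bic(data)
best_g = min(bics, key=bics.get)
```

On this synthetic example BIC selects three components, mirroring the three phases found in the real datasets.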

#### 2.2.2. Three Phase Traffic

Figure 1 provides a 3D visualization and cluster analysis result on the Rome dataset, where observations in different clusters are marked by different colors. Previous knowledge assumed that traffic involves two clusters: free flow and congestion. Free flow corresponds to steady traffic flow at high speeds (and low densities), while congestion is characterized by low flux and reduced speeds. From this new 3D visualization of the data, we can identify a third phase, which we call the **"free choice phase"**: it corresponds to the situation of a relatively empty road, where drivers choose their speed independently, without influence from or interaction with other vehicles.

**Figure 1.** 3D visualization and cluster analysis results of the Rome data suggest the existence of a third phase (red) in addition to the free flow (blue) and congestion (green) phases.

In the free choice phase, the flow of cars is low while the speed is variable. Model selection procedures (e.g., the Bayesian information criterion (BIC) or adjusted BIC) were used to select the number of clusters, and the datasets from Rome, Nevada and Sophia Antipolis consistently suggest the existence of the third phase. This additional phase is incorporated into the mathematical modeling.

Notice that our three phases are different from those indicated by Kerner [12]. Indeed, we have two sub-phases in the free-flow cluster, as opposed to Kerner's model, which has two sub-phases in the congestion cluster.

#### 2.2.3. Gap Analysis

We developed and applied a rigorous hypothesis testing procedure to the datasets to formally investigate the presence of phase transitions. Specifically, investigating the presence of a phase transition can be formulated as testing the existence of a "gap region" between the upper portion of occupancy in the free phase and the lower portion of occupancy in congestion. As shown in Figure A1 (left) in Appendix A, such a gap region can be masked by isolated points in the gap, which could in fact be due to measurement errors or random variations in flux and occupancy. To reduce the impact of these isolated points, we propose to take the upper quantile of the free phase (e.g., the 95th percentile, denoted $\rho_{FP}$) and the lower quantile of the congested phase (e.g., the 5th percentile, denoted $\rho_C$) and formally test $H_0: \rho_{FP} \ge \rho_C$, i.e., there is no gap, against $H_a: \rho_{FP} < \rho_C$, i.e., there is a gap. Figure 2 illustrates these two scenarios of $H_0$ and $H_a$.

**Figure 2.** Illustration of hypothesis testing procedure for phase transition.

For given pre-specified percentiles $q_{FP}$ and $q_C$ (e.g., 95% and 5%), denote by $\hat{\rho}_{FP}$ and $\hat{\rho}_C$ the corresponding sample quantiles in the two clusters. The existence of a phase transition can be formally tested by a *one-sided test* based on the Wald statistic

$$T = \min\{\hat{\rho}_{FP} - \hat{\rho}_C, 0\} \big/ \sqrt{\operatorname{var}(\hat{\rho}_{FP} - \hat{\rho}_C)}.$$

The variance var(*ρ*ˆ*FP* − *ρ*ˆ*C*) can be approximated by

$$\operatorname{var}(\hat{\rho}_{FP} - \hat{\rho}_C) = \operatorname{var}(\hat{\rho}_{FP}) + \operatorname{var}(\hat{\rho}_C) \approx \frac{q_{FP}(1 - q_{FP})}{n_{FP}\,\{f(F^{-1}(q_{FP}))\}^2} + \frac{q_C(1 - q_C)}{n_C\,\{f(F^{-1}(q_C))\}^2},$$

where $f(\cdot)$ and $F(\cdot)$ stand for the estimated density function and cumulative distribution function of $\rho$, respectively, and $n_{FP}$ and $n_C$ stand for the number of data points in the free and congestion phases, respectively. The first equality in the calculation of the variance is due to the independence of $\hat{\rho}_{FP}$ and $\hat{\rho}_C$, as they are estimated from different sets of observations. The approximation is due to a standard result for the asymptotic variance of percentile estimates (see [19]). The complete sets of results for the statistical analysis on the three datasets can be found in Appendix B.
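The testing procedure can be sketched in a few lines. The snippet below assumes synthetic occupancy samples for the two clusters and a normal fit for the density estimate $f$ (the text does not prescribe a specific density estimator, so that choice is an assumption):

```python
import numpy as np
from scipy.stats import norm

def gap_test(occ_free, occ_cong, q_fp=0.95, q_c=0.05):
    """One-sided Wald test for a gap between the free and congested phases,
    using the asymptotic variance of sample quantiles."""
    rho_fp = np.quantile(occ_free, q_fp)
    rho_c = np.quantile(occ_cong, q_c)

    def quantile_var(sample, q):
        # q(1-q) / (n f(F^{-1}(q))^2), with f estimated by a normal fit
        mu, sd = sample.mean(), sample.std(ddof=1)
        dens = norm.pdf(norm.ppf(q, mu, sd), mu, sd)
        return q * (1.0 - q) / (len(sample) * dens**2)

    var = quantile_var(occ_free, q_fp) + quantile_var(occ_cong, q_c)
    T = min(rho_fp - rho_c, 0.0) / np.sqrt(var)
    return T, norm.cdf(T)          # statistic and one-sided p-value

rng = np.random.default_rng(1)
occ_free = rng.normal(10, 2, 2000)   # synthetic free-phase occupancies
occ_cong = rng.normal(25, 3, 2000)   # synthetic congested occupancies
T, p = gap_test(occ_free, occ_cong)
```

With these clearly separated synthetic samples the test strongly rejects $H_0$, i.e., it detects a gap.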

#### **3. A Macroscopic Second-Order Model Accounting for the 3 Phases**

Following the approach of Colombo *et al.* [7] and Fan *et al.* [14], we propose a new macroscopic model accounting for the three phases identified in the previous sections. In conservation form, the model can be expressed as

$$\begin{aligned} \partial_t \rho + \partial_x (\rho \, v(\rho, y/\rho)) &= 0, \\ \partial_t y + \partial_x (y \, v(\rho, y/\rho)) &= 0, \end{aligned} \tag{1}$$

where the velocity function is chosen such that

$$v(\rho, y/\rho) = \begin{cases} v_{FC}(\rho, y/\rho), & \text{if } 0 < \rho \le \rho_{FC}, \\ v_{FP}(\rho), & \text{if } \rho_{FC} < \rho \le \rho_{FP}, \\ v_{C}(\rho, y/\rho), & \text{if } \rho_{C} \le \rho \le \rho_{\max}, \end{cases} \tag{2}$$

for some $0 < \rho_{FC} < \rho_{FP} < \rho_C < \rho_{\max}$, and it is continuous at $\rho_{FC}$ and $\rho_{FP}$. In (1)–(2), the quantity $w = y/\rho \in [w_{\min}, w_{\max}]$ may represent various traffic characteristics, such as vehicle classes [20], aggressiveness [21], desired spacing [22] or perturbation from equilibrium [23], which are transported with the traffic stream. We refer to the variable $y = \rho w$ as a *total property* [14]. The function $v$ defined in (2) must be:


With the above assumptions, the corresponding flux function *q*(*ρ*, *w*) = *ρ v*(*ρ*, *w*) satisfies *q*(0, *w*) = *q*(*ρ*max, *w*) = 0 for all *w*.

To take into account the possible presence of a gap, as suggested by our analysis, we fix the value $v_C^{\max} \le v_{FP}^{\min} := v_{FP}(\rho_{FP})$ of the maximal speed in congestion, and let $\rho_C \in [\rho_{FP}, \rho_{\max}[$ be the density value such that

$$v_C(\rho_C, w_{\min}) = v_C^{\max}.$$

Defining the velocity function (see Figure 3) as

$$v_g(\rho, w) = \begin{cases} v_{FC}(\rho, w), & \text{if } 0 < \rho \le \rho_{FC}, \\ v_{FP}(\rho), & \text{if } \rho_{FC} < \rho \le \rho_{FP}, \\ \min\{v_C^{\max}, v_C(\rho, w)\}, & \text{if } \rho_C \le \rho \le \rho_{\max}, \end{cases} \tag{3}$$

the corresponding flux function $q_g(\rho, w) := \rho v_g(\rho, w)$ displays the desired gap between the free-flow and congested phases (see Figure 4).

**Figure 3.** An example of speed function.

**Figure 4.** General (non-concave) fundamental diagram.

#### *3.1. Riemann Solver*

To simplify the construction, it is not restrictive to assume that the fundamental diagram is *ρ*-differentiable, i.e., we assume that

$$\frac{\partial v_{FC}}{\partial \rho}(\rho_{FC}, w) = v_{FP}'(\rho_{FC}) \quad \text{for all } w \in [w_{\min}, w_{\max}]$$

and

$$\frac{\partial v_C}{\partial \rho}(\rho_{FP}, w) = v_{FP}'(\rho_{FP}) \quad \text{for all } w \in [w_{\min}, w_{\max}].$$

System (1) is defined on the invariant domain

$$\Omega = \{ (\rho, \rho w) \in [0, \rho\_{\text{max}}] \times [0, \rho\_{\text{max}} w\_{\text{max}}] \colon w \in [w\_{\text{min}}, w\_{\text{max}}] \}.$$

We note that, under the above assumptions on the velocity function *v*, (*ρ*, *y*) ∈ Ω if and only if *w* ∈ [*w*min, *w*max] and *v*(*ρ*, *y*/*ρ*) ∈ [0, *v*(0, *w*max)]. The eigenvalues are given by

$$
\lambda\_1(\rho, y/\rho) = v(\rho, y/\rho) + \rho \frac{\partial}{\partial \rho} v(\rho, y/\rho) \quad \text{and} \quad \lambda\_2(\rho, y/\rho) = v(\rho, y/\rho), \tag{4}
$$

so the system is strictly hyperbolic for $\rho > 0$ as long as $\partial v(\rho, y/\rho)/\partial \rho \neq 0$. We note that the second characteristic field is linearly degenerate, giving rise to contact discontinuity waves, while the first characteristic field is genuinely non-linear if

$$\frac{\partial^2 q}{\partial \rho^2}(\rho, w) = 2 \frac{\partial v}{\partial \rho}(\rho, w) + \rho \frac{\partial^2 v}{\partial \rho^2}(\rho, w) < 0,\qquad \text{for } \rho \in [0, \rho\_{\text{max}}],\tag{5}$$

holds. Moreover, the Riemann invariants of the system are given by $w$ and $v$. In particular, the iso-values $w = \text{const}$ correspond to waves of the first family (we recall that the system belongs to the Temple class, i.e., shock and rarefaction curves coincide), and the contact discontinuities verify $v = \text{const}$. More precisely, in the strictly concave case (5), the elementary waves are constructed as follows.

• **1-rarefaction waves.** Two points $(\rho_l, \rho_l w_l)$ and $(\rho_r, \rho_r w_r)$ are connected by a 1-rarefaction wave if and only if

$$w_l = w_r \qquad \text{and} \qquad \lambda_1(\rho_l, w_l) < \lambda_1(\rho_r, w_r).$$

• **1-shock waves.** Two points $(\rho_l, \rho_l w_l)$ and $(\rho_r, \rho_r w_r)$ are connected by a 1-shock wave if and only if

$$w_l = w_r \qquad \text{and} \qquad \lambda_1(\rho_l, w_l) > \lambda_1(\rho_r, w_r).$$

In this case, the jump discontinuity moves with speed

$$\sigma = \frac{\rho_l v(\rho_l, w_l) - \rho_r v(\rho_r, w_r)}{\rho_l - \rho_r}.$$

• **2-contact discontinuities.** Two points $(\rho_l, \rho_l w_l)$ and $(\rho_r, \rho_r w_r)$ are connected by a 2-contact wave if and only if

$$v(\rho_l, w_l) = v(\rho_r, w_r).$$

In the general (non-concave) case (see Figure 4), the 1-waves consist of a concatenation of shocks and rarefactions (see ([24] Section 1)).
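The eigenvalue formulas (4) and the concavity condition (5) can be verified symbolically for a concrete closure. The sketch below assumes an illustrative Greenshields-type speed $v(\rho, w) = w(1 - \rho)$ with $\rho_{\max} = 1$, which is an assumption for illustration, not the speed function fitted in this paper:

```python
import sympy as sp

rho, w = sp.symbols("rho w", positive=True)

v = w * (1 - rho)          # illustrative velocity closure (assumption)
q = rho * v                # flux q(rho, w) = rho v(rho, w)

# eigenvalues from Eq. (4)
lam1 = sp.simplify(v + rho * sp.diff(v, rho))
lam2 = v

# genuine non-linearity condition (5): d^2 q / d rho^2 < 0
d2q = sp.diff(q, rho, 2)
```

For this closure $\lambda_1 = w(1 - 2\rho)$, $\lambda_2 = w(1 - \rho)$ and $\partial^2 q/\partial\rho^2 = -2w < 0$, so the first field is genuinely non-linear everywhere and the strictly concave case applies.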

Based on the above elementary waves, the solution corresponding to general Riemann data $(\rho_l, \rho_l w_l)$, $(\rho_r, \rho_r w_r)$ can be constructed as follows. Let $(\rho_m, \rho_m w_m)$ be the intermediate point defined by

$$w_m = w_l, \qquad v(\rho_m, w_m) = v(\rho_r, w_r).$$

Setting $v_{w_l}(\rho) = v(\rho, w_l)$, $\rho_m$ is given by

$$\rho_m = \begin{cases} v_{w_l}^{-1}(v(\rho_r, w_r)), & \text{if } v(\rho_r, w_r) < v(0, w_l), \\ 0, & \text{otherwise.} \end{cases}$$

In the latter case, a vacuum zone appears in the sector

$$v(0, w_l)\, t < x < v(\rho_r, w_r)\, t.$$

The complete solution is then given by a 1-wave connecting $(\rho_l, \rho_l w_l)$ and $(\rho_m, \rho_m w_m)$, followed by a 2-contact discontinuity between $(\rho_m, \rho_m w_m)$ and $(\rho_r, \rho_r w_r)$ (possibly separated by a vacuum zone if $v(\rho_r, w_r) > v(\rho_m, w_l)$ and $\rho_m = 0$).
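The computation of the intermediate state can be sketched numerically. The snippet assumes the same illustrative closure $v(\rho, w) = w(1 - \rho/\rho_{\max})$ (not the paper's fitted speed function) and inverts $v_{w_l}$ by root finding:

```python
from scipy.optimize import brentq

RHO_MAX = 1.0

def v(rho, w):
    # illustrative Greenshields-type closure (assumption, for illustration)
    return w * (1.0 - rho / RHO_MAX)

def intermediate_state(rho_l, w_l, rho_r, w_r):
    """Intermediate state (rho_m, w_m) of the Riemann solver:
    w_m = w_l and v(rho_m, w_m) = v(rho_r, w_r); a vacuum (rho_m = 0)
    appears when the target speed reaches v(0, w_l)."""
    target = v(rho_r, w_r)
    if target >= v(0.0, w_l):
        return 0.0, w_l            # vacuum case
    # invert v_{w_l}: v is strictly decreasing in rho, so a bracketed
    # root exists on [0, RHO_MAX]
    rho_m = brentq(lambda r: v(r, w_l) - target, 0.0, RHO_MAX)
    return rho_m, w_l

rho_m, w_m = intermediate_state(0.2, 1.0, 0.6, 0.8)
```

For these data the target speed is $v(0.6, 0.8) = 0.32$, so $\rho_m$ solves $1 - \rho_m = 0.32$, i.e., $\rho_m = 0.68$, with $w_m = w_l = 1$.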

The presence of the gap between $v_C^{\max}$ and $v_{FP}^{\min}$ does not modify the procedure, since the definition domain

$$\Omega_g = \left\{ (\rho, \rho w) \in \Omega \colon v(\rho, w) \in [0, v_C^{\max}] \cup [v_{FP}^{\min}, v(0, w_{\max})] \right\}$$

is still invariant. We set Ω*<sup>g</sup>* = Ω*FP* ∪ Ω*<sup>C</sup>* with

$$\begin{aligned} \Omega\_{FP} &= \left\{ (\rho, \rho w) \in \Omega \colon v(\rho, w) \in \left[v\_{FP}^{\min}, v(0, w\_{\max})\right] \right\}, \\ \Omega\_{\mathbb{C}} &= \left\{ (\rho, \rho w) \in \Omega \colon v(\rho, w) \in \left[0, v\_{\mathbb{C}}^{\max}\right] \right\}. \end{aligned}$$

We can distinguish the following cases:


$$w_c = w_l, \qquad v(\rho_c, w_c) = v_C^{\max}.$$

The solution is composed of 1-waves connecting $(\rho_l, \rho_l w_l)$ and $(\rho_c, \rho_c w_l)$, a phase-transition jump between $(\rho_c, \rho_c w_l)$ and $(\rho_{FP}, \rho_{FP} w_l)$ moving with speed

$$\sigma = \frac{\rho_c v_C^{\max} - \rho_{FP} v_{FP}^{\min}}{\rho_c - \rho_{FP}},$$

followed by 1-waves connecting $(\rho_{FP}, \rho_{FP} w_l)$ and $(\rho_m, \rho_m w_m)$ and, finally, a 2-contact from $(\rho_m, \rho_m w_m)$ to $(\rho_r, \rho_r w_r)$.

• If $(\rho_l, \rho_l w_l) \in \Omega_{FP}$ and $(\rho_r, \rho_r w_r) \in \Omega_C$, the intermediate point $(\rho_m, \rho_m w_m)$ belongs to $\Omega_C$. Therefore, the solution always contains a 1-wave (a shock phase transition) from $(\rho_l, \rho_l w_l)$ to $(\rho_m, \rho_m w_m)$, followed by a 2-contact discontinuity. Notice that the solution may also contain an intermediate 1-wave in the congested phase.

#### **4. Numerical Scheme and Simulations**

#### *4.1. Numerical Scheme*

For simplicity, let us rewrite problem (1) in compact form:

$$
\partial\_t \mathbf{u} + \partial\_x \mathbf{f}(\mathbf{u}) = 0 \qquad \mathbf{u} \in \Omega\_{FC} \cup \Omega\_{FP} \cup \Omega\_C \tag{6}
$$

where **u** = (*ρ*, *y*) and

$$\mathbf{f}(\mathbf{u}) = \begin{cases} (\rho v_{FC}, y v_{FC}), & \text{if } 0 < \rho \le \rho_{FC}, \\ (\rho v_{FP}, y v_{FP}), & \text{if } \rho_{FC} < \rho \le \rho_{FP}, \\ (\rho v_{C}, y v_{C}), & \text{if } \rho_{C} < \rho \le \rho_{\max}. \end{cases}$$

Let us fix a constant space step $\Delta x$ and time step $\Delta t$, and set $\nu = \Delta t / \Delta x$. Let us define the mesh interfaces $x_{j+1/2} = j\Delta x$ for $j \in \mathbb{Z}$ and the intermediate times $t^n = n\Delta t$ for $n \in \mathbb{N}$. A piecewise constant approximate solution $\mathbf{u}_\nu(x, t^n)$ of $\mathbf{u}$ is given by

$$\mathbf{u}_\nu(x, t^n) = \mathbf{u}_j^n \quad \text{for all } x \in C_j = [x_{j-1/2}, x_{j+1/2}], \ j \in \mathbb{Z}, \ n \in \mathbb{N}.$$

In this paper, we use the numerical scheme introduced in [25]. This scheme is a modified Godunov scheme composed of two steps. The first step looks at the evolution in time of the Cauchy problem, while the second step projects it onto piecewise constant functions:

**Step 1:** Evolution in time.

This step consists in solving the Riemann problem at each cell interface $x_{j+1/2}$ with initial data $(\mathbf{u}_j^n, \mathbf{u}_{j+1}^n)$, obtaining an exact solution $\mathbf{u}_\nu(x, t^{(n+1)-})$.

**Step 2:** Projection to time $t^{n+1}$.

Once all Riemann problems at the interfaces are solved, Chalons and Goatin [25] proposed a new averaging procedure. The idea is that, since the solution can contain states in different phases, the average is not taken on the regular mesh cells but on modified non-uniform cells that contain only values belonging to the same phase. We denote these modified cells by $\overline{C}_j^n = [\overline{x}_{j-1/2}^n, \overline{x}_{j+1/2}^n[$. Afterwards, a sampling strategy allows us to recover a piecewise constant solution on the initial mesh cells $C_j$.

We define the new interfaces $\overline{x}_{j+1/2}^n$ at time $t^{n+1}$ as

$$\overline{x}_{j+1/2}^n = x_{j+1/2} + \sigma_{j+1/2}^n \Delta t, \quad j \in \mathbb{Z}, \tag{7}$$

and the new space intervals

$$\overline{\Delta x}_j^n = \overline{x}_{j+1/2}^n - \overline{x}_{j-1/2}^n, \quad j \in \mathbb{Z},$$

with $\sigma_{j+1/2}^n = \sigma(\mathbf{u}_j^n, \mathbf{u}_{j+1}^n)$, $j \in \mathbb{Z}$, the characteristic speeds of propagation of phase transitions at the interfaces. Then, we average the solution of Step 1 on the cells $\overline{C}_j^n$, obtaining a piecewise constant approximate solution $\overline{\mathbf{u}}_j^{n+1}$ on a non-uniform mesh with

$$\overline{\mathbf{u}}_j^{n+1} = \frac{1}{\overline{\Delta x}_j^n} \int_{\overline{x}_{j-1/2}^n}^{\overline{x}_{j+1/2}^n} \mathbf{u}_\nu(x, t^{(n+1)-}) \, dx, \qquad j \in \mathbb{Z}.$$

The modified Godunov scheme then reads:

$$\overline{\mathbf{u}}_j^{n+1} = \frac{\Delta x}{\overline{\Delta x}_j^n} \mathbf{u}_j^n - \frac{\Delta t}{\overline{\Delta x}_j^n} \Big( \mathbf{f}^{n,-}(\mathbf{u}_j^n, \mathbf{u}_{j+1}^n) - \mathbf{f}^{n,+}(\mathbf{u}_{j-1}^n, \mathbf{u}_j^n) \Big) \quad \text{for all } j \in \mathbb{Z}, \tag{8}$$

with

$$\mathbf{f}^{n,\pm}(\mathbf{u}_j^n, \mathbf{u}_{j+1}^n) = \mathbf{f}\big(\mathcal{RS}(\sigma_{j+1/2}^{n,\pm}; \mathbf{u}_j^n, \mathbf{u}_{j+1}^n)\big) - \sigma_{j+1/2}^n \, \mathcal{RS}(\sigma_{j+1/2}^{n,\pm}; \mathbf{u}_j^n, \mathbf{u}_{j+1}^n), \tag{9}$$

and RS the solution to the Riemann problem as given in Section 3.1.

We then project the solution onto the original mesh cells $C_j$ using a well-distributed random sequence $(a_n)$ in $]0, 1[$ as follows:

$$\mathbf{u}\_{j}^{n+1} = \begin{cases} \overline{\mathbf{u}}\_{j-1}^{n+1} & \text{if } \qquad a\_{n+1} \in \left[0, \frac{\Delta t}{\Delta x} \max\{\sigma\_{j-1/2}^{n}, 0\}\right], \\ \overline{\mathbf{u}}\_{j}^{n+1} & \text{if } \quad a\_{n+1} \in \left[\frac{\Delta t}{\Delta x} \max\{\sigma\_{j-1/2}^{n}, 0\}, 1 + \frac{\Delta t}{\Delta x} \min\{\sigma\_{j+1/2}^{n}, 0\}\right], \\ \overline{\mathbf{u}}\_{j+1}^{n+1} & \text{if } \qquad a\_{n+1} \in \left[1 + \frac{\Delta t}{\Delta x} \min\{\sigma\_{j+1/2}^{n}, 0\}, 1\right]. \end{cases} \tag{10}$$

Following Chalons and Goatin [25], we consider the van der Corput random sequence defined by

$$a\_n = \sum\_{k=0}^{m} i\_k \, 2^{-(k+1)},$$

where *n* = ∑<sup>*m*</sup><sub>*k*=0</sub> *i<sub>k</sub>*2<sup>*k*</sup>, *i<sub>k</sub>* = 0, 1, is the binary expansion of *n* ∈ N.
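The van der Corput sequence is simply the binary radical inverse of *n*; a short sketch (function name ours):

```python
def van_der_corput(n: int) -> float:
    """Binary radical inverse: a_n = sum_k i_k 2^-(k+1), where
    n = sum_k i_k 2^k is the binary expansion of n."""
    a, denom = 0.0, 2.0
    while n > 0:
        a += (n % 2) / denom   # digit i_k contributes i_k * 2^-(k+1)
        n //= 2
        denom *= 2.0
    return a

first_terms = [van_der_corput(n) for n in range(1, 6)]
# a_1, ..., a_5 = 0.5, 0.25, 0.75, 0.125, 0.625
```

The sequence is equidistributed in ]0, 1[, which is what the convergence theory for this sampling method requires.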

#### *4.2. Numerical Simulations*

We compared our model (1) to the model in [25] in a simulation on a single road of length 1, with *x* ∈ [−0.5, 0.5] and initial data

$$\rho(x, 0) = \begin{cases} 0.1 & \text{if } x \le 0, \\ 0.6 & \text{if } x > 0. \end{cases} \tag{11}$$

We point out that our model is profoundly different from phase transition models, even those with a gap between phases. The main reason is that the second-order model (1) admits phase transitions, i.e., shock waves connecting phases, as classical first-family waves, and also allows different speeds in free choice for different values of the variable *w*. This is well captured by the simulation in Figure 5. An initial condition with one backward-moving shock is perturbed by a boundary datum presenting oscillations in the *w* variable. As a result, for large times, small oscillations are visible on the left (free flow) and large oscillations are propagated through the shock (congested flow). By contrast, the phase transition model of Chalons and Goatin [25] is insensitive to such oscillations in the *w* variable, as shown in Figure 6.

**Figure 5.** Evolution of the density for our model on a single road.

**Figure 6.** Evolution of the density for a classical phase transition model on a single road.

#### **5. Conclusions**

We analyzed three different datasets collected from fixed sensors at different locations in Europe and the US. Representing the data via a three-dimensional fundamental diagram, we showed the presence of three traffic phases, two in the free flow regime and one in the congested regime, and of a statistically significant gap between free and congested flow. Based on these results, we designed a new second-order macroscopic model that is capable of describing analytically the gap and the three different phases. Moreover, a characterization of Riemann problem solutions and a numerical example are provided, illustrating the difference with phase transition models.

**Author Contributions:** Software, M.L.D.M. and K.C.; Validation, M.L.D.M.; Visualization, M.L.D.M. and K.H.; Data curation, Y.C. and K.H.; Supervision, Y.C., P.G. and B.P.; Formal analysis, P.G. and B.P.; Methodology, P.G., J.-m.Q. and B.P.; Investigation, J.-m.Q. All authors have read and agreed to the published version of the manuscript.

**Funding:** Supported by the French National Research Agency under the "Investissements d'avenir" program (ANR-15-IDEX-02). Supported in part by the National Institute of Health grants 1R01LM012607 and 1R01AI130460. Supported by the NSF project Grant CNS No. 1446715, the NSF project KI-Net Grant DMS No. 1107444 and the Lopez chair endowment.

**Acknowledgments:** M.L.D.M. acknowledges that this article was developed in the framework of the Grenoble Alpes Data Institute, supported by the French National Research Agency under the "Investissements d'avenir" program (ANR-15-IDEX-02). Y.C.'s research is supported in part by the National Institute of Health grants 1R01LM012607 and 1R01AI130460. B.P. acknowledges the support of the NSF project Grant CNS No. 1446715, the NSF project KI-Net Grant DMS No. 1107444 and the Lopez chair endowment. The authors thank ATAC S.p.a. for providing the traffic data from the city of Rome and the Départment des Alpes Maritimes for providing data from Sophia Antipolis.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Data Description**

Our dataset sources are:


We use the information collected in Rome as the primary example to illustrate the data structure. The Rome dataset contains data for each minute of the entire day over one week; thus, 10,080 observations (60 minutes × 24 hours × 7 days) were collected. Since our primary dataset from Rome consists of one week of observations, we analyzed one week of data in the other locations as well. The datasets from Las Vegas and Sophia Antipolis were used to validate the results from the Rome data.

Figure A1 illustrates the pairwise plots among the three measured variables based on the dynamic data collected from a sensor located on the road Viale del Muro Torto in the city of Rome on a Monday. These plots can provide useful insight into the functional relationship between these variables in two-dimensional space. For instance, the plot of flux against occupancy suggests a linear relationship with small variation when occupancy is less than a threshold (known as the **free phase**) and much larger variation when occupancy is larger than the threshold (known as the **congestion phase**). Furthermore, *both flux vs. speed and flux vs. occupancy plots suggest a possible "gap" between free and congestion phases, which corresponds to phase transition*. *These are important features that need to be taken into consideration in the mathematical modeling.* Such pairwise plots are useful to generate data-driven hypotheses that need to be formally tested statistically and validated across different datasets.

**Figure A1.** Pairwise scatterplots of Rome Data in the road Viale del Muro Torto: flux vs. occupancy (**left**); flux vs. speed (**middle**); and occupancy vs. speed (**right**).

#### **Appendix B. Results of Cluster Analysis**

We conducted model-based cluster analyses on three datasets. We found statistically significant gaps between free and congested phases in all three datasets. We also used regression analysis to demonstrate the existence of free choice phase in explaining the variability in observed flux.

Table A1 presents the results from the described gap analysis of the Rome, Nevada and Sophia datasets. After "trimming" a very small percent of data (e.g., 3%) and considering (97%, 3%) quantiles of free and congestion phases, the test statistics suggested strong evidence of a gap (indicating phase transition) in the three datasets.


**Table A1.** Results of model-based cluster analysis using Rome, Las Vegas and Sophia Antipolis data. Phase: estimated phase using cluster analysis. FP, free phase; C, congestion phase. Density, estimated density value at the percentile.

Graphical representations of the results are illustrated below. To clarify the color code for the graphs, the region colored in red corresponds to the Free Choice phase, blue to the Free Flow phase and green to Congestion. The Free Flow phase in this three cluster model corresponds to the remainder of the original conception of Free Flow without Free Choice. The Free Choice and Free Flow phases from the three cluster model are collectively referred to as the Free Phase from here on.

Figure A2 illustrates the clustering performed by 'mclust' of the Rome data on a two-dimensional level. Pairs plots present the data according to each pair of variables: velocity and flux, occupancy and flux, and velocity and occupancy. This type of plot provides insight into the shape and characteristics of the data in 2D. For instance, we observed a gap between Free Phase and Congestion, which is most easily viewed in the pairs plot of the variables occupancy and flux. Figure 1 is the three-dimensional representation of the Rome data, a novel approach to visualizing traffic data. In this 3D plot of the data, the proposed gap between Free Phase and Congestion is even more noticeable, reinforcing our observations from the 2D case. Specifying 'mclust' to filter through the data for two or three clusters indicated some margin of difference between the original Free Flow phase in the two-cluster model and the Free Phase in the three-cluster model; the latter model is generally neater than the former. The disparity is minimal and perhaps insignificant, although it is worth noting.

**Figure A2.** Pairs plot of clustered Rome data.

Cluster analyses using the Las Vegas and Sophia Antipolis data corroborate the results obtained with the Rome data. The Las Vegas data collected from highway sensors 25 and 99, which we refer to as Nevada 25 and Nevada 99, respectively, cluster into nine phases through 'mclust'. However, we forced R to choose only two and three clusters, to match the original two-phase model of traffic flow proposed in the literature and the three-phase model discovered in this study. The results are reported in Figures A3 and A4. The data and analyses suggest that the three-cluster model was preferred using the Bayesian information criterion (BIC) for model selection [18].
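For reference, the BIC underlying this model selection is *k* ln *n* − 2 ln *L̂*, where lower is better (some packages, including 'mclust', report the sign-flipped version 2 ln *L̂* − *k* ln *n*, where larger is better). The numbers below are purely illustrative, not taken from our datasets:

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion, k*ln(n) - 2*ln(L-hat);
    the model with the lower value is preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical fits: a three-cluster model pays a penalty for its extra
# parameters but can still win if its log-likelihood is sufficiently higher.
two_clusters = bic(log_likelihood=-5200.0, n_params=11, n_obs=10080)
three_clusters = bic(log_likelihood=-5050.0, n_params=17, n_obs=10080)
```

With these illustrative values the three-cluster model attains the lower (better) BIC despite the six additional parameters.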

**Figure A3.** Pairs plot of clustered Las Vegas data.

**Figure A4.** 3D plot of clustered Las Vegas data.

The data from Sophia Antipolis also demonstrate the existence of the Free Choice phase. These two datasets both have a wide range of observed speeds at low levels of occupancy, as shown in the pairs plots and 3D plots in Figures A5 and A6.

**Figure A5.** Pairs plot of clustered Sophia Antipolis data.

**Figure A6.** 3D plot of clustered Sophia Antipolis data.

#### *Quantifying the Improved Goodness of Fit Through RSS Comparisons*

We conducted further analyses by using residual sum of squares (RSS) values to compare the various models considered in this paper. The RSS value was calculated as the sum of squared differences between the observed and fitted values of flux, where the fitted values of flux were obtained from two- or three-cluster models, with or without speed as an additional predictor in addition to the occupancy. The RSS is an objective measure of the remaining variability of the flux that has not been explained by a particular model.
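A hedged sketch of this computation (function and variable names are ours):

```python
def rss(observed, fitted):
    """Residual sum of squares: sum of squared differences between
    observed and fitted flux values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

def pct_reduction(rss_baseline, rss_model):
    """Percent of the baseline model's remaining variability
    removed by a richer model."""
    return 100.0 * (rss_baseline - rss_model) / rss_baseline
```

The percentage figures reported in this section are of the `pct_reduction` form, with the 2D two-phase model as the baseline.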

With the Rome data, we considered a baseline model with two phases and occupancy as the only predictor; in other words, a 2D, two-phase model as in the existing literature. We calculated the RSS value for this model and compared it with the RSS values from more complex models. Specifically, we found that adding speed into the model (i.e., a 3D, two-phase model) explained an additional 1.2% of the variability in observed flux on top of the baseline 2D, two-phase model. Furthermore, adding the Free Choice phase alone (i.e., a 2D, three-phase model) removed 13.6% of the remaining variability. Adding both speed and the Free Choice phase (i.e., a 3D, three-phase model) removed 16.0% of the variability in observed flux, compared to the baseline model.

The results of these comparisons suggest that the three-cluster model is indeed superior to the two-cluster model and that 3D rendering of the data is appropriate. The improvement in RSS from increasing either the number of clusters or the number of dimensions indicates a clear gain: the three-cluster model is better than the two-cluster model, 3D analysis of the data is more informative than 2D analysis, and the 3D three-cluster model provides the most favorable RSS value overall. These results are consistent across our datasets (see Figures A7–A10 and Tables A2–A6). They have important implications for understanding traffic flow: they confirm the utility of analyzing this type of data in three dimensions and reveal the presence of a third phase.

**Table A2.** RSS analysis with two clusters with flux and occupancy. *β*<sup>0</sup> is the value of the intercept and *β*<sup>1</sup> the occupancy.


**Figure A7.** Flux vs. occupancy: RSS analysis.

**Table A3.** RSS analysis with two clusters with flux, occupancy and speed. *β*<sup>0</sup> is the value of the intercept, *β*<sup>1</sup> the occupancy, and *β*<sup>2</sup> the speed.


**Figure A8.** Flux vs. occupancy + speed: RSS analysis.

**Table A4.** RSS analysis with three clusters with flux and occupancy. *β*<sup>0</sup> is the value of the intercept and *β*<sup>1</sup> the occupancy.


**Figure A9.** Flux vs. occupancy: RSS analysis.

**Table A5.** Residual sum squared analysis with three clusters with flux, occupancy and speed. *β*<sup>0</sup> is the value of the intercept and *β*<sup>1</sup> the occupancy, *β*<sup>2</sup> the speed.


**Figure A10.** Flux vs. occupancy + speed: RSS analysis.



#### **References**


## *Article* **Input-to-State Stability of a Scalar Conservation Law with Nonlocal Velocity**

**Simone Göttlich 1,\*, Michael Herty <sup>2</sup> and Gediyon Weldegiyorgis <sup>3</sup>**


**Abstract:** In this paper, we study input-to-state stability (ISS) of an equilibrium for a scalar conservation law with nonlocal velocity and measurement error arising in a highly re-entrant manufacturing system. By using a suitable Lyapunov function, we prove sufficient and necessary conditions for ISS. We propose a numerical discretization of the scalar conservation law with nonlocal velocity and measurement error. A suitable discrete Lyapunov function is analyzed to provide ISS of a discrete equilibrium for the proposed numerical approximation. Finally, we show computational results to validate the theoretical findings.

**Keywords:** conservation laws; feedback stabilization; input-to-state stability; numerical approximations; nonlocal velocity

**MSC:** 35L65; 93D15; 65N08

**Citation:** Göttlich, S.; Herty, M.; Weldegiyorgis, G. Input-to-State Stability of a Scalar Conservation Law with Nonlocal Velocity. *Axioms* **2021**, *10*, 12. https://doi.org/10.3390/ axioms10010012

Received: 6 November 2020 Accepted: 14 January 2021 Published: 21 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

The nature of modern high-volume production is characterized by a large number of items passing through many production steps. This type of production system has fluid-like properties and has been modeled successfully by continuum models [1–5]. In these models, the product at different production stages and the speed of production are the quantities of interest.

Specifically, in the manufacturing system of a factory that involves a highly re-entrant system where products visit machines multiple times, such as the production of semiconductor devices, a continuum model was introduced in [3] that is inspired by the Lighthill–Whitham traffic model [6]. The dynamics of this model is mathematically given by a hyperbolic partial differential equation of the form

$$
\partial\_t \rho(t, x) + \lambda(\mathcal{W}(t)) \partial\_x \rho(t, x) = 0, \quad t \in [0, +\infty), \; x \in [0, 1], \tag{1}
$$

where *ρ*(*t*, *x*) is the product density at time *t* and production stage *x*, and the total mass *W*(*t*) is given by

$$\mathcal{W}(t) = \int\_0^1 \rho(t, x) \, dx, \quad t \in (0, +\infty). \tag{2}$$

Contrary to classical traffic flow models, the differential equation depends on the **nonlocal** quantity (2). The function *λ*(*W*(*t*)) is a velocity. In production systems, it is natural to assume that the velocity function is positive and decreasing as the total mass is increasing. In the manufacturing system, the initial density of products at production stage *x* is taken as the initial data

$$
\rho(0, x) = \rho\_0(x), \quad x \in [0, 1], \tag{3}
$$

and the influx is used to control the system or stabilize the system at an equilibrium. Since the velocity is positive, we only require boundary conditions at *x* = 0, i.e., the influx

$$
\rho(t,0)\lambda(\mathcal{W}(t)) = \mathcal{U}(t), \quad t \in [0, +\infty). \tag{4}
$$

Under suitable assumptions on *λ*, *ρ*<sup>0</sup> and *U*, the existence and uniqueness of a classical solution of the Cauchy problem for the scalar conservation law Equation (1) with Equations (3) and (4) is proven in [7–10].

General stabilization problems with boundary controls have been studied in the past years in [11–22] for hyperbolic systems and recently in [7,10] for scalar conservation laws with nonlocal velocity. The focus is to derive an asymptotic stability around a given equilibrium such that solutions to the conservation laws reach the equilibrium state as time tends to infinity. Such a property is attained by an exponential stability result and presented for example in ([21], Theorem 2.3) for quasi-linear hyperbolic systems. Further references, also on hyperbolic balance laws and other hyperbolic systems, may be found in the recent book [15].

However, when boundary controls are subjected to unknown disturbances, solutions reaching the given equilibrium point are influenced by the disturbances and a notion of asymptotic stability is required. The concept of input-to-state stability (ISS) [11,20,23] has been used to describe asymptotic stability. Concerning an asymptotic behavior of classical solutions, the Lyapunov method is used to investigate sufficient conditions to achieve an exponential stability in [16,17] for hyperbolic systems and in [7,10] for scalar conservation laws with nonlocal velocity. The Lyapunov method is also used for ISS of (local) hyperbolic systems in [11,20]. For the numerical analysis of asymptotic behavior of numerical solutions discretized by a first-order finite volume scheme, a discrete Lyapunov function is used to prove exponential stability results for hyperbolic systems in [24–28] and for scalar conservation laws with nonlocal velocity in [10], and ISS results for (local) hyperbolic systems could be established recently in [29,30]. Please note that the previously given references refer to ISS for hyperbolic systems. However, the theory of ISS has also been developed for other systems, for example, linear systems, time-delay equations or parabolic differential equations. A detailed review of those results is beyond the scope of this presentation and we refer the interested reader to the recent review article [31] for additional references and a review of the state-of-the-art in this field.

The references given above all refer to ISS theory for hyperbolic problems. However, it is worth mentioning that there exists a huge amount of literature on ISS for problems related to other differential equations. We cannot review those at this point but would like to point to some references on ISS theory for infinite-dimensional problems [32,33] and for linear [34], semi-linear [35] and nonlinear [36] parabolic systems with boundary inputs. A systematic treatment of ISS using (linear) operator theory has been presented for example in [37] and non-coercive Lyapunov theory for ISS in [38,39].

Our focus in this work is hyperbolic problems. In connection with (hyperbolic) scalar conservation law with nonlocal velocity, in [10], the authors have studied global feedback stabilization of the closed-loop system in Equation (1) under the feedback law

$$\mathcal{U}(t) - \rho^\* \lambda(\rho^\*) = k \left( \rho(t, 1) \lambda(\mathcal{W}(t)) - \rho^\* \lambda(\rho^\*) \right), \ t \in (0, +\infty), \tag{5}$$

where *k* ∈ [0, 1) is the feedback parameter and *ρ*<sup>∗</sup> ∈ R is a given equilibrium. They generalize the stabilization results of [7] by using a Lyapunov function. In particular, for a given equilibrium *ρ*<sup>∗</sup> = 0 and a general velocity function *λ* ∈ *C*<sup>1</sup>([0, +∞); [0, +∞)), the global stabilization result in *L*<sup>2</sup> for the closed-loop system of Equations (1), (3) and (5) is generalized to *L<sup>p</sup>* (*p* ≥ 1). Then, the global stabilization result in *L*<sup>2</sup> for the closed-loop system of Equations (1), (3) and (5) with a family of velocity functions

$$
\lambda(s) = \frac{A}{B + s}, \quad s \in [0, +\infty) \quad \text{with} \quad A > 0, \ B > 0,\tag{6}
$$

is obtained for a given equilibrium *ρ*∗ > 0. By using a discrete Lyapunov function, they also established stabilization results for a discrete scalar conservation law with nonlocal velocity and using a first-order finite volume scheme.

In this paper, we study ISS for the closed-loop system of Equations (1) and (3) under the feedback law defined by

$$\mathcal{U}(t) - \rho^\* \lambda(\rho^\*) = k \left( (\rho(t, 1) + d(t)) \lambda(\mathcal{W}(t)) - \rho^\* \lambda(\rho^\*) \right), \ t \in (0, +\infty), \tag{7}$$

where *d*(*t*) ∈ R is a bounded perturbation in the measurement. In particular, we use an ISS-Lyapunov function to investigate sufficient and necessary conditions for ISS in *L*<sup>2</sup> for an equilibrium *ρ*<sup>∗</sup> ≥ 0 and the velocity function defined by Equation (6). The numerical analysis of the sufficient and necessary conditions for ISS is performed by using a discrete ISS-Lyapunov function for the numerical solution obtained by a first-order finite volume scheme. Moreover, we provide numerical simulations to illustrate the theoretical results for some velocity functions of type Equation (6).

The paper is organized as follows: In Section 2, we present stabilization results of ISS for a scalar conservation law with nonlocal velocity and measurement error. The numerical discretization of stabilization results of ISS for the scalar conservation law with nonlocal velocity and measurement error is presented in Section 3. Finally, in Section 4, we show numerical simulations for the scalar conservation law with nonlocal velocity and measurement error to illustrate the theoretical results.
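To fix ideas before the formal analysis, the closed-loop dynamics can be simulated with a simple first-order upwind scheme. The sketch below is an assumption-laden illustration with arbitrary parameters (*A* = *B* = 1, *ρ*<sup>∗</sup> = 0.5, *k* = 0.4, *d* ≡ 0), not the discretization analyzed in Section 3:

```python
# Upwind finite volume sketch for the closed-loop system (8) with
# lambda(s) = A/(B + s) and feedback (7) without disturbance (d = 0).
A, B = 1.0, 1.0
rho_star, k = 0.5, 0.4
N = 50
dx = 1.0 / N
dt = 0.5 * dx * B / A          # CFL: the maximal speed is lambda(0) = A/B

def lam(s):
    return A / (B + s)

rho = [0.2] * N                # initial density rho_0

for _ in range(4000):
    W = sum(rho) * dx          # nonlocal total mass, Equation (2)
    speed = lam(W)
    # feedback law (7) with d = 0, then the influx condition (4)
    U = rho_star * lam(rho_star) + k * (rho[-1] * speed - rho_star * lam(rho_star))
    rho_in = U / speed
    prev = [rho_in] + rho[:-1]
    rho = [r - speed * dt / dx * (r - p) for r, p in zip(rho, prev)]
```

In this run the total mass settles near *ρ*<sup>∗</sup> = 0.5 for large times, consistent with the stability statement of Theorem 1 below.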

#### **2. Asymptotic Stability of a Scalar Conservation Law with Nonlocal Velocity and Measurement Error**

We study ISS of a closed-loop system of scalar conservation laws with nonlocal velocity and measurement error of the form:

$$\begin{cases} \partial\_t \rho(t, \mathbf{x}) + \lambda(\mathcal{W}(t)) \partial\_\mathbf{x} \rho(t, \mathbf{x}) = 0, \quad t \in (0, +\infty), \; \mathbf{x} \in (0, 1), \\ \rho(0, \mathbf{x}) = \rho\_0(\mathbf{x}), \quad \mathbf{x} \in (0, 1), \\ \mathcal{U}(t) - \rho^\* \lambda(\rho^\*) = k((\rho(t, 1) + d(t))\lambda(\mathcal{W}(t)) - \rho^\* \lambda(\rho^\*)), \quad t \in (0, +\infty), \\ \rho(t, 0)\lambda(\mathcal{W}(t)) = \mathcal{U}(t), \quad t \in [0, +\infty), \\ \mathcal{W}(t) = \int\_0^1 \rho(t, \mathbf{x}) d\mathbf{x}, \quad t \in (0, +\infty), \end{cases} \tag{8}$$

where *ρ*(*t*, *x*) is the product density, *λ*(·) ∈ *C*<sup>1</sup>([0, +∞), (0, +∞)) is the velocity function, *W*(*t*) is the total mass, *U*(*t*) is the controller, *k* ∈ [0, 1) is a non-negative feedback parameter, *ρ*<sup>∗</sup> ≥ 0 is an equilibrium solution and *d*(*t*) ∈ R is a bounded (known) perturbation in the measurement. A weak solution of the closed-loop system in Equation (8) is defined below.

**Definition 1** (Weak solution)**.** *Fix T > 0. A function ρ ∈ C<sup>0</sup>([0, T]; L<sup>1</sup>(0, 1)) is called a weak solution to Equation (8) if for every s ∈ (0, T] and every ϕ ∈ C<sup>1</sup>([0, s] × [0, 1]) satisfying*

$$
\varphi(s, x) = 0, \ \forall x \in [0, 1] \quad \text{and} \quad \varphi(t, 1) = k \varphi(t, 0), \ \forall t \in [0, s],
$$

*the following equation holds:*

$$\begin{aligned} &\int\_0^s \int\_0^1 \rho(t, x) (\partial\_t \varphi(t, x) + \lambda(\mathcal{W}(t)) \partial\_x \varphi(t, x)) \, dx \, dt \\ &+ \int\_0^s ((1 - k)\rho^\* \lambda(\rho^\*) + d(t)) \varphi(t, 0) \, dt + \int\_0^1 \rho(0, x) \varphi(0, x) \, dx = 0. \end{aligned}$$

Let *d* ≡ 0, *ρ*<sup>∗</sup> ≥ 0, *p* ∈ [1, +∞) and *k* ∈ [0, 1] be given. Then, the existence and uniqueness of the non-negative weak solution *ρ* ∈ *C*<sup>0</sup>([0, +∞); *L<sup>p</sup>*(0, 1)) and the non-negative classical solution *ρ* ∈ *C*<sup>1</sup>([0, +∞) × [0, 1]) of the closed-loop system in Equation (8) are available in [7,10].

We now analyze ISS for the system in Equation (8) with *ρ*<sup>∗</sup> ≥ 0 in the sense of the following definitions. This is also known as global ISS. Note that ISS-Lyapunov functions can be defined in a very general setting and we refer to ([31], Definition 2.11) for such a definition. In Definition 3 below, we introduce ISS-Lyapunov functions tailored to the system in Equation (8).

**Definition 2** (Input-to-state stability (ISS))**.** *Let D > 0. An equilibrium ρ<sup>∗</sup> ≥ 0 of the closed-loop system in Equation (8) is exponentially ISS in the L<sup>2</sup>-norm with respect to any disturbance function d(·) ∈ L<sup>∞</sup>(0, ∞) with ‖d‖<sub>L<sup>∞</sup>(0,∞)</sub> ≤ D if there exist positive constants γ<sub>1</sub>, γ<sub>2</sub>, γ<sub>3</sub> independent of d such that, for every initial condition ρ<sub>0</sub>(x) ∈ L<sup>2</sup>(0, 1), the L<sup>2</sup>-solution to the closed-loop system in Equation (8) satisfies*

$$\|\rho(t, \cdot) - \rho^\*\|\_{L^2} \le \gamma\_2 e^{-\gamma\_1 t} \|\rho\_0 - \rho^\*\|\_{L^2} + \gamma\_3 \|d\|\_{L^\infty(0, t)}, \ t \in [0, +\infty). \tag{9}$$

Hence, the equilibrium *ρ*<sup>∗</sup> is ISS with respect to disturbances *d* ∈ D := {*d*(·) ∈ *L*<sup>∞</sup>(0, ∞) : ‖*d*‖<sub>*L*<sup>∞</sup></sub> ≤ *D*}.

**Definition 3** (ISS-Lyapunov function)**.** *The function* **L** : *L*<sup>2</sup>(0, 1) → R<sup>+</sup> *is said to be an ISS-Lyapunov function for the closed-loop system in Equation (8) if*

*(i) there exist positive constants α<sub>1</sub> > 0 and α<sub>2</sub> > 0 such that for all solutions ρ ∈ C<sup>0</sup>([0, ∞); L<sup>2</sup>(0, 1)) and t ∈ [0, +∞)*

$$\alpha\_1 \|\rho(t, \cdot) - \rho^\*\|\_{L^2}^2 \le \mathbf{L}(\rho(t, \cdot)) \le \alpha\_2 \|\rho(t, \cdot) - \rho^\*\|\_{L^2}^2 \tag{10}$$

*(ii) there exist positive constants η > 0 and ν > 0 such that for all solutions ρ ∈ C<sup>0</sup>([0, ∞); L<sup>2</sup>(0, 1)) and t ∈ [0, +∞)*

$$\frac{d}{dt}\mathbf{L}(\rho(t,\cdot)) \le -\eta \mathbf{L}(\rho(t,\cdot)) + \nu d^2(t).$$

For a notion of differentiability of **L**, we refer for example to ([31], Section 2.2). To simplify the notation, we also introduce the function

$$\mathcal{L}(t) := \mathbf{L}(\rho(t, \cdot)), \tag{11}$$

where *ρ* ∈ *C*<sup>0</sup>([0, ∞); *L*<sup>2</sup>(0, 1)) is the solution to Equation (8).

**Theorem 1** (ISS for *ρ*<sup>∗</sup> ≥ 0)**.** *Fix any ρ<sup>∗</sup> ≥ 0, k ∈ [0, 1), R > 0, D > 0 and any ρ<sub>0</sub> ∈ L<sup>2</sup>(0, 1) satisfying ρ<sub>0</sub> ≥ 0 a.e. in* (0, 1)*. Assume further*

$$\|\rho\_0(\cdot) - \rho^\*\|\_{L^2(0,1)} \le R. \tag{12}$$

*Assume there exists a non-negative almost everywhere weak solution ρ ∈ C<sup>0</sup>([0, +∞); L<sup>2</sup>(0, 1)) to the Cauchy problem in Equation (8), where λ is given by Equation (6).*

*Then, the steady-state ρ<sup>∗</sup> of the system in Equation (8) is exponentially ISS in the L<sup>2</sup>-norm with respect to any disturbance function d ∈ {d(·) ∈ L<sup>∞</sup>(0, ∞) : ‖d‖<sub>L<sup>∞</sup></sub> ≤ D}.*

Before we begin the proof of Theorem 1, we consider the following transformation at the equilibrium *ρ*∗,

$$\begin{aligned} \tilde{\rho}(t, x) &:= \rho(t, x) - \rho^\*, \quad \tilde{\mathcal{W}}(t) := \mathcal{W}(t) - \rho^\*, \quad \tilde{\rho}\_0(x) := \rho\_0(x) - \rho^\*, \\ \tilde{\lambda}\_{\tilde{W}}(t) &:= \lambda(\rho^\* + \tilde{\mathcal{W}}(t)), \quad \tilde{\mathcal{U}}(t) := \tilde{\lambda}\_{\tilde{W}}(t) \tilde{\rho}(t, 0). \end{aligned}$$

Then, the system in Equation (8) with Equation (6) can be rewritten as follows for *t* ∈ (0, +∞):

$$\begin{cases} \partial\_t \tilde{\rho}(t, x) + \tilde{\lambda}\_{\tilde{W}}(t) \partial\_x \tilde{\rho}(t, x) = 0, \; x \in (0, 1), \\ \tilde{\rho}(0, x) = \tilde{\rho}\_0(x), \; x \in (0, 1), \\ \tilde{\mathcal{U}}(t) = k \tilde{\lambda}\_{\tilde{W}}(t) (\tilde{\rho}(t, 1) + d(t)) + (1 - k) \rho^\* \left(\lambda(\rho^\*) - \tilde{\lambda}\_{\tilde{W}}(t)\right), \\ \tilde{\lambda}\_{\tilde{W}}(t) := \lambda\left(\rho^\* + \tilde{\mathcal{W}}(t)\right), \\ \tilde{\mathcal{W}}(t) = \int\_0^1 \tilde{\rho}(t, x) \, dx \ge -\rho^\*, \\ \lambda(s) = \frac{A}{B + s}, \quad \text{with} \quad A > 0, \; B > 0, \; s \in [0, +\infty). \end{cases} \tag{13}$$

By using the velocity function Equation (6) in Equation (13), we have

$$\rho^\* \left( \lambda(\rho^\*) - \tilde{\lambda}\_{\tilde{W}}(t) \right) = \theta \tilde{\lambda}\_{\tilde{W}}(t) \tilde{\mathcal{W}}(t), \quad t \in [0, +\infty), \tag{14}$$

where

$$\theta := \frac{\rho^\*}{B + \rho^\*} < 1.$$
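The identity in Equation (14) is specific to the velocity family (6); it can be confirmed symbolically, or numerically as in this throwaway check (parameter values arbitrary):

```python
# Check rho*(lambda(rho*) - lambda(rho* + W)) == theta * lambda(rho* + W) * W
# for lambda(s) = A/(B + s) and theta = rho*/(B + rho*).
A, B, rho_star = 2.0, 1.5, 0.7

def lam(s):
    return A / (B + s)

theta = rho_star / (B + rho_star)

for W in (-0.3, 0.0, 0.5, 2.0):      # admissible range: W >= -rho*
    lhs = rho_star * (lam(rho_star) - lam(rho_star + W))
    rhs = theta * lam(rho_star + W) * W
    assert abs(lhs - rhs) < 1e-12
```

Algebraically, both sides equal *ρ*<sup>∗</sup>*AW̃*/((*B* + *ρ*<sup>∗</sup>)(*B* + *ρ*<sup>∗</sup> + *W̃*)), which is why *θ* < 1 appears as a natural contraction factor.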

For convenience, until the end of this proof, we omit the symbol "~". Then, the system in Equation (13) with Equation (14) can be rewritten in the following form for *t* ∈ (0, +∞):

$$\begin{cases} \partial\_t \rho(t, x) + \lambda\_W(t) \partial\_x \rho(t, x) = 0, \ x \in (0, 1), \\ \rho(0, x) = \rho\_0(x), \ x \in (0, 1), \\ \mathcal{U}(t) = k \lambda\_W(t) (\rho(t, 1) + d(t)) + (1 - k) \theta \lambda\_W(t) \mathcal{W}(t) \text{ with } \theta = \frac{\rho^\*}{B + \rho^\*}, \\ \lambda\_W(t) := \lambda(\rho^\* + \mathcal{W}(t)), \\ \rho(t, 0) \lambda\_W(t) = \mathcal{U}(t), \\ \mathcal{W}(t) = \int\_0^1 \rho(t, x) \, dx \ge -\rho^\*, \\ \lambda(s) = \frac{A}{B + s}, \quad \text{with} \quad A > 0, \ B > 0, \ s \in [0, +\infty). \end{cases} \tag{15}$$

With the above notation, the assumption in Equation (12) of Theorem 1 reads

$$||\rho\_0||\_{L^2(0,1)} \le R. \tag{16}$$

**Proof.** The following proof of Theorem 1 is an extension of the proof of Theorem 3.2 in [10]. Since *C*<sup>1</sup>-functions are dense in *L*<sup>2</sup>(0, 1), we can analyze ISS for the system in Equation (15) with non-negative weak solution *ρ* ∈ *C*<sup>0</sup>([0, +∞); *L*<sup>2</sup>(0, 1)) as follows: for *φ* ∈ *L*<sup>2</sup>(0, 1), we first define a candidate ISS-Lyapunov function by

$$\mathbf{L}(\varphi) = \int\_0^1 \varphi^2(x) e^{-\beta x} \, dx + a \left( \int\_0^1 \varphi(x) \, dx \right)^2,$$

and then we have according to (11)

$$\mathcal{L}(t) := \mathbf{L}(\rho(t, \cdot)) = \int\_0^1 \rho^2(t, \mathbf{x}) e^{-\beta \mathbf{x}} d\mathbf{x} + a \mathcal{W}^2(t), \quad \forall t \in [0, +\infty), \tag{17}$$

where *β* > 0 and *a* ∈ R are constants. By the definition of *W* and the Hölder inequality, we have

$$\mathcal{W}(t)^2 = \left(\int\_0^1 \rho(t, \mathbf{x}) e^{-\frac{1}{2}\beta \mathbf{x}} e^{\frac{1}{2}\beta \mathbf{x}} d\mathbf{x}\right)^2 \le \int\_0^1 e^{\beta \mathbf{x}} d\mathbf{x} \int\_0^1 e^{-\beta \mathbf{x}} \rho^2(t, \mathbf{x}) d\mathbf{x}.\tag{18}$$

Hence, if

$$a > -\frac{\beta}{e^{\beta} - 1},\tag{19}$$

then $\mathcal{L}(t) > 0$ for all $t \ge 0$. We will further assume from now on that $a \le 0$. Furthermore, for $0 < \mathcal{C}\_1 := \mathcal{C}\_1(\beta) = \frac{e^{\beta}-1}{\beta}$ we obtain

$$\mathcal{W}^2(t) \le \mathcal{C}\_1 \int\_0^1 e^{-\beta \mathbf{x}} \rho^2(t, \mathbf{x}) d\mathbf{x} \tag{20}$$

and for $\mathcal{C}\_2 := \mathcal{C}\_2(a, \beta) = \mathcal{C}\_1 + \max\{a, 1\} > 0$ we have

$$\mathcal{L}(t) \le \mathcal{C}\_2 \int\_0^1 e^{-\beta \mathbf{x}} \rho^2(t, \mathbf{x}) d\mathbf{x}.\tag{21}$$

Since $a < 0$, we also obtain

$$(1 + a\mathcal{C}\_1) \int\_0^1 e^{-\beta x} \rho^2(t, x) dx \le \mathcal{L}(t). \tag{22}$$

Summarizing, there exist positive constants $\mathcal{C}\_i = \mathcal{C}\_i(a, \beta)$, $i \in \{3, 4\}$, such that for all $t \ge 0$

$$\mathcal{W}^2(t) \le \mathcal{C}\_3 \int\_0^1 \rho^2(t, x) e^{-\beta x} dx \le \mathcal{L}(t) \le \mathcal{C}\_4 \int\_0^1 \rho^2(t, x) e^{-\beta x} dx \tag{23}$$

and therefore $\mathbf{L}$ is equivalent to the $L^2$-norm of $\rho$. Note that for $\rho^\* = 0$ we may set $a = 0$ in Equation (17). The time derivative of the candidate ISS-Lyapunov function in Equation (17) is given by:

$$\begin{split} \frac{d\mathcal{L}}{dt}(t) &= \int\_{0}^{1} 2\rho(t, \mathbf{x})\rho\_{t}(t, \mathbf{x})e^{-\beta\mathbf{x}}d\mathbf{x} + 2a\mathcal{W}(t)\frac{d\mathcal{W}}{dt}(t) \\ &= -\beta\lambda\_{\mathcal{W}}(t)\int\_{0}^{1} \rho^{2}(t, \mathbf{x})e^{-\beta\mathbf{x}}d\mathbf{x} + \frac{1}{\lambda\_{\mathcal{W}}(t)}\Big(\lambda\_{\mathcal{W}}^{2}(t)\rho^{2}(t, \mathbf{0}) - \lambda\_{\mathcal{W}}^{2}(t)\rho^{2}(t, \mathbf{1})e^{-\beta}\Big) \\ &\quad + 2a\mathcal{W}(t)\big(\lambda\_{\mathcal{W}}(t)\rho(t, \mathbf{0}) - \lambda\_{\mathcal{W}}(t)\rho(t, \mathbf{1})\Big) \\ &= -\beta\lambda\_{\mathcal{W}}(t)\int\_{0}^{1} \rho^{2}(t, \mathbf{x})e^{-\beta\mathbf{x}}d\mathbf{x} + A\_{1}(t), \end{split}$$

where $A\_1(t)$ contains all contributions due to the boundary conditions. In the following, we analyze and estimate $A\_1$. Note that $\lambda\_W(t)\rho(t, 0) = \mathcal{U}(t)$, where $\mathcal{U}$ is given by Equation (15). More precisely, we will use the following estimate for any $\varepsilon > 0$:

$$\begin{split} 2a\mathcal{W}(t)\mathcal{U}(t) &= 2a\mathcal{W}(t)(k\lambda\_W(t)(\rho(t,1) + d(t)) + (1-k)\theta\lambda\_W(t)\mathcal{W}(t)) \\ &\leq k^2 d^2(t)\lambda\_W(t) \frac{1}{\varepsilon} + \varepsilon a^2 \mathcal{W}^2 \lambda\_W(t) + 2a\mathcal{W}(t) \left(k\lambda\_W(t)\rho(t,1) + (1-k)\theta\lambda\_W(t)\mathcal{W}(t)\right), \\ \mathcal{U}^2(t) &= \left(k\lambda\_W(t)\left(\rho(t,1) + d(t)\right) + (1-k)\theta\lambda\_W(t)\mathcal{W}(t)\right)^2 \\ &\leq \left(1+\varepsilon\right)\left(k\lambda\_W(t)\rho(t,1) + (1-k)\theta\lambda\_W(t)\mathcal{W}(t)\right)^2 + k^2 d^2(t)\lambda\_W^2(t)\left(1+\frac{1}{\varepsilon}\right). \end{split}$$

In order to simplify the notation in the following computations, we suppress the time dependence and define

$$y := y(t) := \lambda\_W(t)\rho(t,1),\ b\_1 := b\_1(t) := (1+\frac{2}{\epsilon})k^2\lambda\_W(t)d^2(t) + \epsilon a^2\mathcal{W}^2(t)\lambda\_W(t).$$

Then, we have

$$\begin{split} A\_1(t) \le{}& b\_1 + \frac{1}{\lambda\_W}\Big( (1 + \varepsilon)\big(ky + (1 - k)\theta\lambda\_W \mathcal{W}\big)^2 - y^2 e^{-\beta} \Big) + 2a\mathcal{W}\big((k - 1)y + (1 - k)\theta\lambda\_W \mathcal{W}\big) \\ ={}& b\_1 + \frac{(1 + \varepsilon)k^2 - e^{-\beta}}{\lambda\_W} \left( y + \lambda\_W \mathcal{W}\, \frac{(k - 1)\big(a - k(1 + \varepsilon)\theta\big)}{(1 + \varepsilon)k^2 - e^{-\beta}} \right)^2 + \lambda\_W \mathcal{W}^2 \Big( \theta^2 (k - 1)^2 (1 + \varepsilon) - 2a\theta(k - 1) \Big) - \lambda\_W \mathcal{W}^2\, \frac{(k - 1)^2}{(1 + \varepsilon)k^2 - e^{-\beta}} \big( a - k(1 + \varepsilon)\theta \big)^2 \\ ={}& b\_1 + \frac{(1 + \varepsilon)k^2 - e^{-\beta}}{\lambda\_W} (\dots)^2 - \lambda\_W \mathcal{W}^2\, \frac{(k - 1)^2}{(1 + \varepsilon)k^2 - e^{-\beta}} \left( a^2 - 2a\theta\, \frac{k(1 + \varepsilon) - e^{-\beta}}{1 - k} + \theta^2 (1 + \varepsilon) e^{-\beta} \right) \\ ={}& b\_1 + \frac{(1 + \varepsilon)k^2 - e^{-\beta}}{\lambda\_W} (\dots)^2 - \lambda\_W \mathcal{W}^2\, \frac{(k - 1)^2}{(1 + \varepsilon)k^2 - e^{-\beta}} \left( a - \theta\, \frac{k(1 + \varepsilon) - e^{-\beta}}{1 - k} \right)^2 - \lambda\_W \mathcal{W}^2\, \frac{(k - 1)^2}{(1 + \varepsilon)k^2 - e^{-\beta}} \left( \theta^2 (1 + \varepsilon) e^{-\beta} - \theta^2 \left( \frac{k(1 + \varepsilon) - e^{-\beta}}{1 - k} \right)^2 \right). \end{split}$$

Although it is not necessary, the proof simplifies if $\varepsilon$ is chosen depending on $\beta$. We therefore set

$$
\epsilon := \epsilon(\beta) = \beta^2. \tag{24}
$$

For any fixed $0 \le k < 1$ and all $0 < \beta^2 < \varepsilon^\*$ with $\varepsilon^\* := \min\{1, \frac{1}{2}\frac{1-k}{k}\}$, we have

$$(1+\beta^2)k^2 < (1+\beta^2)k<1\tag{25}$$

and hence for all $\beta < \min\{\sqrt{\varepsilon^\*}, \beta^\*\}$ with $\beta^\* := -\ln((1 + \varepsilon^\*)k)$, we have

$$e^{-\beta} > (1 + \beta^2)k > (1 + \beta^2)k^2. \tag{26}$$

Furthermore, consider

$$a(\beta) = \theta \frac{k(1+\beta^2) - e^{-\beta}}{1-k} < 0. \tag{27}$$

For $\beta \to 0$, we have $\lim\_{\beta \to 0} a(\beta) = -\theta > -1$ and $\lim\_{\beta \to 0} \frac{\beta}{e^{\beta}-1} = 1$. Hence, there exists a $\beta^{\*\*} > 0$ such that for all $\beta \le \min\{\beta^\*, \beta^{\*\*}\}$ and for $a(\beta)$ as given by Equation (27), the inequalities (26), (27) and (19) hold true. Using the inequality (26) and the particular choices of $a(\beta)$ and $\varepsilon(\beta)$, we obtain for all sufficiently small $\beta$

$$A\_1(t) \le b\_1(t) + \theta^2 \lambda\_W \mathcal{W}^2 \frac{(k-1)^2 (1+\beta^2) e^{-\beta} - (k(1+\beta^2) - e^{-\beta})^2}{e^{-\beta} - (1+\beta^2)k^2} \tag{28}$$

$$= \left(1 + \frac{2}{\beta^2}\right) k^2 \lambda\_W d^2 + \beta^2 a^2(\beta) \mathcal{W}^2 \lambda\_W + \theta^2 \lambda\_W \mathcal{W}^2 b\_2(\beta, k), \tag{29}$$

$$b\_2(\beta, k) := \frac{(k-1)^2 (1+\beta^2) e^{-\beta} - (k(1+\beta^2) - e^{-\beta})^2}{e^{-\beta} - (1+\beta^2)k^2}. \tag{30}$$

Using the estimate (20) to bound $\mathcal{W}^2$ and using $k < 1$, we obtain

$$\begin{split} A\_{1}(t) &\leq \left(1+\frac{2}{\beta^{2}}\right)k^{2}\lambda\_{W}d^{2} + \lambda\_{W}\theta^{2}\frac{e^{\beta}-1}{\beta}\left(\beta^{2}\frac{a^{2}(\beta)}{\theta^{2}} + b\_{2}(\beta,k)\right)\int\_{0}^{1}e^{-\beta x}\rho^{2}(t,\mathbf{x})d\mathbf{x} \\ &\leq \lambda\_{W}\left(1+\frac{2}{\beta^{2}}\right)d^{2} + \lambda\_{W}\theta^{2}b\_{3}(\beta,k)\int\_{0}^{1}e^{-\beta x}\rho^{2}(t,\mathbf{x})d\mathbf{x}, \\ b\_{3}(\theta,k) &:= \frac{e^{\theta}-1}{\beta}\left(\beta^{2}\frac{a^{2}(\beta)}{\theta^{2}} + b\_{2}(\beta,k)\right). \end{split}$$

An elementary computation shows that $b\_3(\beta, k)$ has the following properties

$$b\_3(0,k) = 0 \text{ and } \partial\_{\beta}b\_3(0,k) = 1.$$
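As a quick numerical sanity check (a sketch, not part of the proof), one can evaluate $b\_2$ from Equation (30) and $b\_3$ from the display above for illustrative values of $\theta$ and $k$, which are assumptions of this example, and observe the linear behavior $b\_3(\beta, k) \approx \beta$ for small $\beta$:

```python
import numpy as np

def b2(beta, k):
    # b_2(beta, k) as in Equation (30)
    E = np.exp(-beta)
    num = (k - 1) ** 2 * (1 + beta**2) * E - (k * (1 + beta**2) - E) ** 2
    den = E - (1 + beta**2) * k**2
    return num / den

def b3(beta, k, theta):
    # b_3(beta, k), with a(beta) taken from Equation (27)
    a = theta * (k * (1 + beta**2) - np.exp(-beta)) / (1 - k)
    return (np.exp(beta) - 1) / beta * (beta**2 * a**2 / theta**2 + b2(beta, k))

# b_3(0, k) = 0 and the derivative in beta at 0 equals 1, so the ratio
# b_3(beta, k) / beta should approach 1 as beta -> 0.
ratios = [b3(b, k=0.4, theta=0.6) / b for b in (1e-2, 1e-3, 1e-4)]
```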

Replacing $b\_3$ by a second-order Taylor expansion in $\beta$ at $\beta = 0$ therefore yields the estimate

$$A\_1(t) \le \left(1 + \frac{2}{\beta^2}\right) \lambda\_W d^2 + \theta^2 \lambda\_W \left(\beta + O(\beta^2)\right) \int\_0^1 e^{-\beta x} \rho^2(t, x) dx. \tag{31}$$

Now, we proceed with the estimate of $\frac{d}{dt}\mathcal{L}(t)$ as

$$\frac{d}{dt}\mathcal{L}(t) \le -\beta\lambda\_W(t)\int\_0^1 e^{-\beta x}\rho^2(t,x)dx\left(1-\theta^2+O(\beta)\right) + \left(1+\frac{2}{\beta^2}\right)\lambda\_W(t)d^2(t). \tag{32}$$

Since $\theta < 1$, there exists $0 < \bar{\beta} < \min\{\beta^\*, \beta^{\*\*}\}$ sufficiently small such that

$$0 < 1 - \theta^2 + O(\beta). \tag{33}$$

Using the estimate (22), there exists a constant $\eta := \eta(k, \rho^\*) > 0$ such that

$$\frac{d}{dt}\mathcal{L}(t) \le -\eta\lambda\_W(t)\mathcal{L}(t) + \left(1 + \frac{2}{\bar{\beta}^2}\right)\lambda\_W(t)d^2(t). \tag{34}$$

By definition, we have $0 \le \lambda\_W(t)$. Next, we show that $\lambda\_W(t)$ is bounded from below by a positive constant. This requires an upper bound on $\mathcal{W}(t)$. The previous inequality (34) yields the following bound on $\mathcal{W}^2(t)$ for $\mathcal{C}\_5 := \mathcal{C}\_5(\bar{\beta}) = \frac{\mathcal{C}\_1}{1 + a\mathcal{C}\_1}$ and for $\mathcal{C}\_6 := \mathcal{C}\_6(\bar{\beta}) = 1 + \frac{2}{\bar{\beta}^2}$:

$$\frac{1}{\mathcal{C}\_5} \mathcal{W}^2(t) \le \mathcal{L}(t) \le e^{-\eta \int\_0^t \lambda\_W(s) ds} \mathcal{L}(0) + \int\_0^t \mathcal{C}\_6 \lambda\_W(s) d^2(s) e^{-\eta \int\_s^t \lambda\_W(r) dr} ds \tag{35}$$

$$\leq \mathcal{L}(0) + \frac{\mathcal{C}\_6}{\eta} \|d(\cdot)\|^2\_{L^{\infty}(0,t)} \Big(1 - e^{-\eta \int\_{0}^{t} \lambda\_{W}(s)ds} \Big). \tag{36}$$

By assumption, $\|\rho\_0\|^2\_{L^2(0,1)} \le R^2$. By definition, we have $-\rho^\* \le \mathcal{W}(t)$ and therefore

$$-\rho^\* \le \mathcal{W}(t) \le \sqrt{\mathcal{C}\_5 R^2 + \frac{\mathcal{C}\_5 \mathcal{C}\_6}{\eta} \|d(\cdot)\|^2\_{L^\infty(0,t)}}.\tag{37}$$

Due to its definition, $\lambda\_W$ is uniformly bounded from above by $\sigma\_1 := \frac{A}{B}$. Furthermore, $\mathcal{W}(t)$ is bounded from above due to Equation (37) and since $d$ is bounded. Hence, $\lambda\_W$ is bounded from below by some $\sigma\_2 = \sigma\_2(\|d\|\_{L^\infty(0,\infty)}, R, \bar{\beta})$. Note that the $L^\infty$-norm of the disturbances is uniformly bounded by the constant $D$. This yields that for all $t \ge 0$

$$
\sigma\_1 \ge \lambda\_W(t) \ge \sigma\_2. \tag{38}
$$

Using the previous estimate for $\lambda\_W$ in Equation (34) yields the assertion. The decay rate $\eta^\*$ of the Lyapunov function is $\eta^\* = \eta\sigma\_2$ and $\nu = \sigma\_1\mathcal{C}\_6$.

Some remarks are in order.

**Remark 1.** *Note that the rate $\eta = \eta(\rho^\*, k)$ as a function of $k$ tends to zero as $k$ tends to one; this can be seen, for example, in Equation (26) defining the upper bound for $\bar{\beta}$. Similarly, if $\theta \to 1$, i.e., $\rho^\* \to \infty$, we observe that $\eta \to 0$ due to Equation (33).*

*The bound on $\mathcal{W}(t)$ is required to obtain the exponential decay. Therefore, the final rate depends on the constant $R$, and we refer to Equation (37) and the subsequent estimates for its detailed dependence. Note that in the case $\rho^\* = 0$ we may set $a = 0$, and therefore no bound on $\mathcal{W}$ is necessary.*

*Further, the result holds true for any solution $\rho \in C^0([0, \infty); L^2(0, 1))$, and hence uniqueness of solutions is not required. Regarding existence of solutions, it might be possible to extend recent results [40–42]. However, so far, existence results are available only in the case $d \equiv 0$ [10].*

*Note that the decay rate η will be dependent on the bound of the disturbance as well as on R, but will be uniform with respect to ρ*<sup>0</sup> *provided that ρ*<sup>0</sup> *fulfills* (12)*.*

*In ([7], Lemma 3.5) it has been shown that in the case d* ≡ 0 *and ρ*<sup>∗</sup> = 0 *exponential stability does not hold if k* > 1.

*For $\rho^\* = 0$, Theorem 1 holds true for any velocity function $\lambda(\cdot) \in C^1([0, +\infty), (0, +\infty))$. This case is similar to a problem studied in [10]. Therein, a detailed discussion of the case $d \equiv 0$ has been presented, and we refer in particular to ([10], Theorem 3.1).*

#### **3. Numerical Study of Asymptotic Stability of a Scalar Conservation Law with Nonlocal Velocity and Measurement Error**

In the following section, we extend the result to a proper discretization of the continuous dynamics. The following results are based on estimates similar to those of the previous section and constitute a minor extension of the proof presented in ([10], Section 4.2). In order not to repeat the estimates obtained in [10], we use a similar notation and mostly report on the changes in the estimates due to the additional disturbance $d$. As seen in Equation (24) in the previous proof, it is possible to choose $\varepsilon = \beta^2$, and we will do so directly in the following proof. This simplifies the notation and reduces the technicality of the computations.

As in ([10], Section 4.2), we introduce a first-order Upwind discretization of the closed-loop system in Equation (8). To this end, we divide the spatial domain $[0, 1]$ using an equidistant grid with cell width $\Delta x$ and $J \in \mathbb{N}$ cells such that $\Delta x \, J = 1$. The cell centers are denoted by $x\_j = (j - \frac{1}{2})\Delta x$, $j \in \{1, \dots, J\}$, and the boundary points of the domain are $x\_0$ and $x\_J$, respectively. Moreover, we discretize $\mathcal{W}(t)$ by

$$\mathcal{W}^n = \Delta x \sum\_{j=1}^{J} \rho\_j^n, \quad n \in \{1, 2, \dots\}, \tag{39}$$

with the pointwise values of the solution $\rho\_j^n = \rho(t^n, x\_j)$. Further, we define the discrete values $\lambda^n$ by

$$
\lambda^n := \lambda(\mathcal{W}^n) = \frac{A}{B + \mathcal{W}^n}, \quad A > 0, \ B > 0,\tag{40}
$$

where $t^n = n\Delta t$, $n \in \{0, 1, \dots\}$, denotes the discrete time, and the time step size $\Delta t$ satisfies a stability condition, the Courant–Friedrichs–Lewy (CFL) condition. This condition states that $\Delta t$ is chosen such that

$$0 < r^n := \frac{\lambda^n \Delta t}{\Delta x} \le 1, \forall n \in \{0, 1, \ldots\}. \tag{41}$$

Since $\lambda^n \le \frac{A}{B}$ for all $n \ge 0$, we can choose a possibly small but fixed $\Delta t$ such that the previous condition (41) holds true for all $n$ with fixed $\Delta t$ and $\Delta x$. This choice allows us to take a uniform grid in time. As in the continuous case, we consider $\rho^\* > 0$. For the given initial values $\rho^0 = (\rho^0\_0, \rho^0\_1, \dots, \rho^0\_J)$ with $\rho^0\_j \ge 0$, $j \in \{0, \dots, J\}$, we employ a first-order finite volume scheme, given by the explicit Upwind method, to discretize the system in Equation (8).

$$\begin{cases} \rho\_{j}^{n+1} = (1 - r^n)\rho\_{j}^{n} + r^n \rho\_{j-1}^{n}, & j \in \{1, \dots, J\}, \ n \in \{0, 1, \dots\},\\ \rho\_{0}^{n+1} = k\rho\_{J}^{n+1} + (1 - k)\frac{\rho^{\*}\lambda(\rho^{\*})}{\lambda^{n+1}} + kd^{n+1}, & n \in \{0, 1, \dots\}. \end{cases} \tag{42}$$
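A minimal NumPy sketch of one step of this scheme is given below; the parameter values in the usage example are placeholders, and the function and variable names are ours, not from the text. The boundary value uses $\lambda^{n+1} = \lambda(\mathcal{W}^{n+1})$ computed from the updated interior cells, following Equations (39), (40) and (42).

```python
import numpy as np

def upwind_step(rho, dx, dt, A, B, k, rho_star, d_next):
    """One step of the explicit Upwind scheme (42).

    rho holds (rho_0^n, ..., rho_J^n); d_next is the disturbance d^{n+1}.
    """
    J = len(rho) - 1
    W = dx * rho[1:].sum()                    # W^n, Equation (39)
    lam = A / (B + W)                         # lambda^n, Equation (40)
    r = lam * dt / dx                         # CFL number r^n, Equation (41)
    assert 0.0 < r <= 1.0, "CFL condition (41) violated"
    new = rho.copy()
    new[1:] = (1.0 - r) * rho[1:] + r * rho[:-1]   # interior Upwind update
    lam_next = A / (B + dx * new[1:].sum())        # lambda^{n+1} from updated cells
    new[0] = k * new[J] + (1.0 - k) * rho_star * (A / (B + rho_star)) / lam_next + k * d_next
    return new

# Usage: the constant state rho_j = rho^* with d = 0 is a fixed point of the scheme.
J = 50
dx = 1.0 / J
rho = np.full(J + 1, 0.3)
out = upwind_step(rho, dx, dt=0.5 * dx, A=1.0, B=1.0, k=0.5, rho_star=0.3, d_next=0.0)
```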

We now define discrete versions of ISS and of an ISS-Lyapunov function as follows:

**Definition 4** (Discrete ISS)**.** *Let $D > 0$. An equilibrium $\rho^\* \ge 0$ of the discrete closed-loop system in Equation (42) is ISS in $L^2$-norm with respect to discrete disturbances $|d^n| \le D$, $n \in \{1, 2, \dots\}$, if there exist positive real constants $\gamma\_1 > 0$, $\gamma\_2 > 0$ and $\gamma\_3 > 0$ such that, for every initial condition $\rho^0\_j$, $j \in \{1, \dots, J\}$, the solution $\rho^n\_j$, $j \in \{1, \dots, J\}$, $n \in \{0, 1, \dots\}$, to the discrete closed-loop system in Equation (42) satisfies*

$$\|\overrightarrow{\rho}^{n} - \rho^{\*}\|\_{\ell^2} \leq \gamma\_{2} e^{-\gamma\_{1}t^{n}} \|\overrightarrow{\rho}^{0}\|\_{\ell^2} + \gamma\_{3} \max\_{0 \leq s < n} (|d^{s}|), \ n \in \{1, 2, \dots\},\tag{43}$$

*where $\overrightarrow{\rho}^n = (\rho^n\_j)\_{j=1}^{J}$ and*

$$\|\overrightarrow{\rho}^n\|\_{\ell^2}^2 := \Delta x \sum\_{j=1}^{J} \left(\rho\_j^n\right)^2, \quad n \in \{0, 1, \ldots\}.$$

**Definition 5** (Discrete ISS-Lyapunov function)**.** *A function $\mathbf{L} : \mathbb{R}^J \to \mathbb{R}^+\_0$ is said to be a discrete ISS-Lyapunov function for the discrete closed-loop system in Equation (42) if*

*(i) there exist positive constants α*<sup>1</sup> > 0 *and α*<sup>2</sup> > 0 *such that for all n* ∈ {0, 1, . . .}

$$\alpha\_1\|\overrightarrow{\rho}^n - \rho^\*\|\_{\ell^2}^2 \le \mathbf{L}(\overrightarrow{\rho}^n) \le \alpha\_2\|\overrightarrow{\rho}^n - \rho^\*\|\_{\ell^2}^2,\tag{44}$$

*(ii) there exist positive constants η* > 0 *and ν* > 0 *such that for all n* ∈ {0, 1, . . .}

$$\frac{\mathbf{L}(\overrightarrow{\rho}^{n+1}) - \mathbf{L}(\overrightarrow{\rho}^{n})}{\Delta t} \le -\eta \mathbf{L}(\overrightarrow{\rho}^{n}) + \nu (d^{n})^{2}.$$

To simplify the notation later on, we define the sequence of discrete values $\mathcal{L}^n$ by

$$\mathcal{L}^n := \mathbf{L}(\overrightarrow{\rho}^n), \ n \in \{0, 1, \dots\} \tag{45}$$

where $\overrightarrow{\rho}^n$ is the solution to the system in Equation (42).

**Theorem 2.** *(Discrete ISS for $\rho^\* \ge 0$) Assume that the CFL condition in Equation (41) holds. Let $D > 0$. For every $\rho^\* \ge 0$, every $k \in [0, 1)$, every $R > 0$ and for every initial data $\rho^0 = (\rho^0\_0, \rho^0\_1, \dots, \rho^0\_J)$ with $\rho^0\_j \ge 0$, $j \in \{1, \dots, J\}$, and*

$$\|\vec{\rho}^0 - \rho^\*\vec{e}\|\_{\ell^2} \le R,\tag{46}$$

*where $\vec{e} = (1, \dots, 1) \in \mathbb{R}^{J+1}$, the solution $\rho^n = (\rho^n\_0, \rho^n\_1, \dots, \rho^n\_J)$ to the system in Equation (42) satisfies $\rho^n\_j \ge 0$, $j \in \{0, \dots, J\}$, $n \in \{0, 1, \dots\}$, and the steady state $\rho^\*$ of the discrete system in Equation (42) is ISS in $L^2$-norm with respect to any discrete disturbance $d^n$, $n \in \{1, 2, \dots\}$, such that $|d^n| \le D$.*

In order to analyze the ISS of the discrete system in Equation (42) by the discrete Lyapunov method, we use the following transformation

$$\tilde{\rho}\_{j}^{n} = \rho\_{j}^{n} - \rho^{\*}, \quad \tilde{\mathcal{W}}^{n} = \Delta x \sum\_{j=1}^{J} \tilde{\rho}\_{j}^{n}, \quad \tilde{\lambda}\_{W}^{n} = \lambda \left(\rho^{\*} + \tilde{\mathcal{W}}^{n}\right), \quad \tilde{r}^{n} = \frac{\Delta t}{\Delta x} \tilde{\lambda}\_{W}^{n}, \quad n \in \{0, 1, \ldots\}. \tag{47}$$

For simplicity, we omit the symbol "~" in Equation (47) and discretize the system in Equation (15) as follows

$$\begin{cases} \rho\_{j}^{n+1} = (1 - r^n)\rho\_{j}^n + r^n \rho\_{j-1}^n, & j \in \{1, \dots, J\}, \ n \in \{0, 1, \dots\},\\ \rho\_{0}^{n+1} = k\rho\_{J}^{n+1} + (1 - k)\theta \mathcal{W}^{n+1} + kd^{n+1} \text{ with } \theta = \frac{\rho^{\*}}{B + \rho^{\*}}, & n \in \{0, 1, \dots\},\\ r^{n} = \frac{\Delta t}{\Delta x} \lambda\_{W}^{n}, & n \in \{0, 1, \dots\},\\ \lambda\_{W}^{n} = \lambda(\rho^{\*} + \mathcal{W}^{n}), & n \in \{0, 1, \dots\},\\ \mathcal{W}^{n} = \Delta x \sum\_{j=1}^{J} \rho\_{j}^{n} \geq -\rho^{\*}, & n \in \{0, 1, \dots\},\\ \lambda(s) = \frac{A}{B + s}, & s \geq 0. \end{cases} \tag{48}$$

Thus, the assumption in Equation (46) in Theorem 2 is now expressed as

$$\|\vec{\rho}^{0}\|\_{\ell^{2}} \leq \text{R.} \tag{49}$$

Note that the proof of Theorem 2 is an extension of the proof of Theorem 4.2 in [10]. Thus, some details of the proof can be found in [10] and we will point to the corresponding estimates in order to reduce the technicality of the proof.

**Proof.** As in the continuous case the proof simplifies if *ρ*∗ = 0. Therefore, we consider in the forthcoming proof only the more interesting case

$$
\rho^\* > 0.\tag{50}
$$

Since the initial data satisfy $\rho^0\_j \ge 0$, $j \in \{0, \dots, J\}$, by the discrete system in Equation (48) and the CFL condition in Equation (41), we have $\rho^n\_j \ge 0$, $j \in \{0, \dots, J\}$, $n \in \{0, 1, \dots\}$.

Consider the discrete analogue of the candidate Lyapunov function in Equation (17): for any $\overrightarrow{\varphi} \in \mathbb{R}^J$,

$$\mathbf{L}(\overrightarrow{\varphi}) = \Delta x \sum\_{j=1}^{J} (\varphi\_j)^2 e^{-\beta x\_j} + a \left(\Delta x \sum\_{j=1}^{J} \varphi\_j\right)^2,$$

where $\beta > 0$. In particular, we set

$$a = \theta \frac{k - e^{-\beta}}{1 - k} < 0 \tag{51}$$

and since $\theta < 1$, there exists $\beta^\*$ sufficiently small such that $0 > a > -\frac{\beta}{e^{\beta}-1}$; see ([10], (3.25), (3.26)).

According to (45), the values of $\mathbf{L}$ at the solution $\overrightarrow{\rho}^n$ at time $t^n$ for $n \ge 0$ are given by

$$\mathcal{L}^n = \|\overrightarrow{\rho}^n\|\_{\beta}^2 + a(\mathcal{W}^n)^2,\tag{52}$$

$$\|\overrightarrow{\rho}^{n}\|\_{\beta}^{2} := \Delta x \sum\_{j=1}^{J} (\rho\_{j}^{n})^{2} e^{-\beta x\_{j}}.\tag{53}$$
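For illustration, the discrete functional in Equations (52) and (53) can be evaluated directly; the following sketch uses illustrative parameter values (our own choices, satisfying the smallness conditions on $\beta$ imposed below) and exhibits the equivalence with $\|\overrightarrow{\rho}^n\|^2\_{\beta}$ stated in Equations (58) and (59).

```python
import numpy as np

def weighted_norm_sq(rho, dx, beta):
    # ||rho^n||_beta^2 from Equation (53); rho = (rho_1^n, ..., rho_J^n)
    J = len(rho)
    x = (np.arange(1, J + 1) - 0.5) * dx      # cell centers x_j = (j - 1/2) dx
    return dx * np.sum(rho**2 * np.exp(-beta * x))

def discrete_lyapunov(rho, dx, beta, theta, k):
    # L^n from Equation (52), with the weight a chosen as in Equation (51)
    a = theta * (k - np.exp(-beta)) / (1.0 - k)
    W = dx * rho.sum()
    return weighted_norm_sq(rho, dx, beta) + a * W**2

# Illustrative check of the equivalence (58)-(59):
# (1 - theta)/2 * ||rho||_beta^2 <= L^n <= ||rho||_beta^2.
J = 100
dx = 1.0 / J
rho = np.linspace(0.0, 2.0, J)
L = discrete_lyapunov(rho, dx, beta=0.05, theta=0.5, k=0.5)
nrm = weighted_norm_sq(rho, dx, beta=0.05)
```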

For fixed $k \in [0, 1)$, we assume, as in [10], that there exists a $\beta^{\*\*}$ such that for $0 < \beta < \beta^{\*\*}$

$$\exp(-\beta) > k > k^2 \text{ and } \beta < 1 - k,\tag{54}$$

holds true and that

$$0 < \Delta \mathbf{x} < 1.\tag{55}$$

As a first step, we prove that $\mathcal{L}^n$ is equivalent to $\|\overrightarrow{\rho}^n\|^2\_{\beta}$. This part does not depend on the boundary condition $\rho^{n+1}\_0$ and is therefore analogous to [10]. In particular, due to the estimates ([10], (4.32), (4.34)), we have for all $n \ge 1$

$$(\mathcal{W}^n)^2 \le \Delta x^2 \sum\_{j=1}^{J} (\rho\_j^n)^2 e^{-\beta x\_j} \sum\_{j=1}^{J} e^{\beta x\_j} \le \frac{\Delta x(e^{\beta} - 1)}{1 - e^{-\beta \Delta x}} \|\overrightarrow{\rho}^n\|\_{\beta}^2 \tag{56}$$

$$\le (1+\beta)^2 \|\overrightarrow{\rho}^n\|\_{\beta}^2 \le (1+3\beta) \|\overrightarrow{\rho}^n\|\_{\beta}^2. \tag{57}$$

Due to the bounds on *a*, we obtain the estimate ([10], (4.38)) for all *n* ≥ 0

$$\|\overrightarrow{\rho}^{n}\|\_{\beta}^{2} \geq \mathcal{L}^{n} \geq \|\overrightarrow{\rho}^{n}\|\_{\beta}^{2} \left(1 + \theta \frac{k - e^{-\beta}}{1 - k} \frac{\Delta x (e^{\beta} - 1)}{1 - e^{-\beta \Delta x}}\right) \tag{58}$$

$$\geq \left(1 - \theta(1 + 3\beta)\right) \|\overrightarrow{\rho}^n\|\_{\beta}^2 \geq \frac{1 - \theta}{2} \|\overrightarrow{\rho}^n\|\_{\beta}^2,\tag{59}$$

where the last inequality is true provided that

$$0 < \beta \le \min\{1, \beta^\*, \beta^{\*\*}, \frac{1-\theta}{6\theta}\}. \tag{60}$$

Furthermore, the discrete weighted norm is equivalent to the $\ell^2$-norm, as in ([10], (4.39)): for all $n \ge 0$

$$e^{-\beta} \|\overrightarrow{\rho}^{n}\|\_{\ell^2}^2 \le \|\overrightarrow{\rho}^{n}\|\_{\beta}^2 \le \|\overrightarrow{\rho}^{n}\|\_{\ell^2}^2. \tag{61}$$

As a second step, we estimate a finite difference approximation to the temporal derivative of L.

$$\frac{\mathcal{L}^{n+1} - \mathcal{L}^n}{\Delta t} = \frac{\Delta \mathbf{x}}{\Delta t} \sum\_{j=1}^{J} \left[ \left( \rho\_j^{n+1} \right)^2 - \left( \rho\_j^n \right)^2 \right] e^{-\beta \mathbf{x}\_j} \tag{62}$$

$$+\frac{a(\Delta x)^2}{\Delta t} \left[ \left( \sum\_{j=1}^{J} \rho\_j^{n+1} \right)^2 - \left( \sum\_{j=1}^{J} \rho\_j^n \right)^2 \right].\tag{63}$$

Precisely, as in [10], we use the discrete scheme (48), the CFL condition (41), which ensures $0 < r^n \le 1$, and the convexity of $z \mapsto z^2$ to estimate, for all $j = 1, \dots, J$ and $n \ge 0$,

$$(\rho\_j^{n+1})^2 = [(1 - r^n)\rho\_j^n + r^n\rho\_{j-1}^n]^2 \le (1 - r^n)(\rho\_j^n)^2 + r^n(\rho\_{j-1}^n)^2. \tag{64}$$
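This convexity step is the standard Jensen-type inequality for the convex combination of two values with weight $r^n \in (0, 1]$; a short numerical sketch (arbitrary sample values, our own names) confirms it elementwise:

```python
import numpy as np

# Equation (64): the square of a convex combination is bounded by the convex
# combination of the squares, for any weight r in [0, 1].
rng = np.random.default_rng(0)
u = rng.normal(size=1000)   # plays the role of rho_j^n
v = rng.normal(size=1000)   # plays the role of rho_{j-1}^n
r = 0.7
lhs = ((1.0 - r) * u + r * v) ** 2
rhs = (1.0 - r) * u**2 + r * v**2
```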

Then, we obtain the discrete counterpart to the integration by parts formula

$$\frac{\mathcal{L}^{n+1} - \mathcal{L}^n}{\Delta t} \le \lambda\_W^n \left( \sum\_{j=1}^J (\rho\_{j-1}^n)^2 e^{-\beta x\_{j-1}} e^{-\beta \Delta x} - \sum\_{j=1}^J (\rho\_j^n)^2 e^{-\beta x\_j} \right) \tag{65}$$

$$+\frac{a(\Delta x)^2}{\Delta t} \left( \left(\sum\_{j=1}^{J} \rho\_j^n - r^n \rho\_J^n + r^n \rho\_0^n \right)^2 - \left(\sum\_{j=1}^{J} \rho\_j^n \right)^2 \right) \tag{66}$$

$$= \lambda\_W^n e^{-\beta \Delta x} \left( \frac{1}{\Delta x} \|\overrightarrow{\rho}^n\|\_{\beta}^2 - e^{-\beta} (\rho\_J^n)^2 + (\rho\_0^n)^2 \right) - \frac{\lambda\_W^n}{\Delta x} \|\overrightarrow{\rho}^n\|\_{\beta}^2 \tag{67}$$

$$+ \frac{a}{\Delta t} \left( \left( \mathcal{W}^n - r^n \Delta x \rho\_J^n + r^n \Delta x \rho\_0^n \right)^2 - \left( \mathcal{W}^n \right)^2 \right) \tag{68}$$

$$=\frac{e^{-\beta\Delta x} - 1}{\Delta x} \lambda\_W^n \|\overrightarrow{\rho}^n\|\_\beta^2 + A\_1^n. \tag{69}$$

Here, the last line is as in ([10], (4.29)), except that the boundary term $\rho^n\_0$, which is part of $A^n\_1$, now includes the disturbance $d^n$. We split the boundary condition at $x = 0$ as

$$
\rho\_0^n = \overline{\rho}\_0^n + kd^n,\ \overline{\rho}\_0^n := k\rho\_J^n + (1 - k)\theta W^n \tag{70}
$$

and obtain

$$A\_1^n = \lambda\_W^n e^{-\beta \Delta x} \left( (\overline{\rho}\_0^n + kd^n)^2 - e^{-\beta} (\rho\_J^n)^2 \right) \tag{71}$$

$$+a\lambda\_W^n \left(r^n \Delta x \left(\overline{\rho}\_0^n + kd^n - \rho\_J^n\right)^2 + 2\left(\overline{\rho}\_0^n + kd^n - \rho\_J^n\right)\mathcal{W}^n\right).\tag{72}$$

As in the continuous case, we estimate

$$(\overline{\rho}\_0^{\mathfrak{n}} + kd^{\mathfrak{n}})^2 \le (1 + \beta^2)(\overline{\rho}\_0^{\mathfrak{n}})^2 + (1 + \frac{1}{\beta^2})(kd^{\mathfrak{n}})^2 \tag{73}$$

and similarly for the terms $2kd^n\mathcal{W}^n$ and $\left(\overline{\rho}^n\_0 + kd^n - \rho^n\_J\right)^2$, respectively. Hence, we obtain

$$\begin{split} A\_{1}^{n} &\leq A\_{2}^{n} + A\_{3}^{n} + A\_{4}^{n}, \\ A\_{2}^{n} &:= \lambda\_{W}^{n}e^{-\beta \Delta x} \Big( (\overline{\rho}\_{0}^{n})^{2} - e^{-\beta}(\rho\_{J}^{n})^{2} \Big) + a\lambda\_{W}^{n} \Big( r^{n}\Delta x \left( \overline{\rho}\_{0}^{n} - \rho\_{J}^{n} \right)^{2} + 2\left( \overline{\rho}\_{0}^{n} - \rho\_{J}^{n} \right) \mathcal{W}^{n} \Big), \\ A\_{3}^{n} &:= \beta^{2}\lambda\_{W}^{n}e^{-\beta \Delta x} (\overline{\rho}\_{0}^{n})^{2} + \beta^{2}|a|\lambda\_{W}^{n}r^{n}\Delta x \left( \overline{\rho}\_{0}^{n} - \rho\_{J}^{n} \right)^{2} + \beta^{2}\lambda\_{W}^{n} (\mathcal{W}^{n})^{2}, \\ A\_{4}^{n} &:= \lambda\_{W}^{n}e^{-\beta \Delta x} \Big(1 + \frac{1}{\beta^{2}}\Big)(kd^{n})^{2} + |a|\lambda\_{W}^{n}r^{n}\Delta x \Big(1 + \frac{1}{\beta^{2}}\Big)(kd^{n})^{2} + \frac{1}{\beta^{2}}|a|\lambda\_{W}^{n}(kd^{n})^{2}. \end{split}$$

Next, we estimate $A^n\_3$ and $A^n\_4$. Here, we use that $a$, defined by (51), and $\lambda^n\_W$ are bounded by

$$|a| \le \frac{\beta}{e^{\beta} - 1} \le 1, \quad \lambda\_W^n \le \frac{A}{B}, \quad \text{and} \quad r^n \le 1,$$

respectively, and that $r^n$, $\Delta x$ and $\theta$ are all bounded by one. Additionally, we have a bound on $(\mathcal{W}^n)^2$ due to (56) and $\beta \le 1$ by (60), such that

$$(\overline{\rho}\_0^n)^2 \le 2(\rho\_J^n)^2 + 2(\mathcal{W}^n)^2 \le \big(2 + 2(1+3\beta)\big) \|\overrightarrow{\rho}^n\|\_{\beta}^2 \quad \text{and} \quad \left(\overline{\rho}\_0^n - \rho\_J^n\right)^2 \le 22 \|\overrightarrow{\rho}^n\|\_{\beta}^2.$$

Hence, there exists a constant $C > 0$ such that $A^n\_3$ and $A^n\_4$ are estimated by

$$A\_3^n \le C \beta^2 \lambda\_W^n \|\overrightarrow{\rho}^n\|\_{\beta}^2 \quad \text{and} \quad A\_4^n \le \left(1 + \frac{3}{\beta^2}\right) \lambda\_W^n (d^n)^2. \tag{74}$$

A crucial estimate is now performed on $A^n\_2$. Due to the previous estimates, as well as Equation (70), $A^n\_2$ coincides with ([10], $A\_2$), and hence we may use the same estimates ([10], (4.31), (4.34)) to obtain

$$\begin{split} A\_2^n &\leq \lambda\_W^n \theta^2 (\mathcal{W}^n)^2 \left( (k - e^{-\beta})(2 - e^{\beta \Delta x}) + e^{-\beta \Delta x} (1 - k) \right) \\ &\leq \lambda\_W^n \theta^2 (1 + 3\beta) \|\overrightarrow{\rho}^n\|\_{\beta}^2 \left( (k - e^{-\beta})(2 - e^{\beta \Delta x}) + e^{-\beta \Delta x} (1 - k) \right) \\ &\leq \lambda\_W^n \theta^2 (1 + 3\beta) \|\overrightarrow{\rho}^n\|\_{\beta}^2\, \beta. \end{split}$$

The previous estimates allow us to estimate the discrete temporal derivative of $\mathbf{L}$ in Equation (65) for $n \ge 0$:

$$\begin{split} \frac{\mathcal{L}^{n+1} - \mathcal{L}^{n}}{\Delta t} &\leq \frac{e^{-\beta \Delta x} - 1}{\Delta x} \lambda\_{\mathcal{W}}^{n} \|\vec{\rho}\|\_{\beta}^{2} + A\_{2}^{n} + A\_{3}^{n} + A\_{4}^{n} \\ &\leq \left( (-\beta + \frac{\Delta x}{2}\beta^{2}) + \theta^{2}(\beta + 3\beta^{2}) + \mathsf{C}\beta^{2} \right) \lambda\_{\mathcal{W}}^{n} \|\vec{\rho}\|\_{\beta}^{2} + (1 + \frac{3}{\beta^{2}})\lambda\_{\mathcal{W}}^{n}(d^{n})^{2}, \\ &\leq -\beta \left( 1 - \frac{\beta}{2} - \theta^{2}(1 + 3\beta) - \mathsf{C}\beta \right) \lambda\_{\mathcal{W}}^{n} \|\vec{\rho}\|\_{\beta}^{2} + (1 + \frac{3}{\beta^{2}})\lambda\_{\mathcal{W}}^{n}(d^{n})^{2}, \\ &\leq -\beta \frac{1 - \theta^{2}}{2} \lambda\_{\mathcal{W}}^{n} \|\vec{\rho}^{n}\|\_{\beta}^{2} + (1 + \frac{3}{\beta^{2}})\lambda\_{\mathcal{W}}^{n}(d^{n})^{2}. \end{split}$$

The last inequality holds provided that *β* > 0 is sufficiently small such that (60) and

$$
\beta \le \frac{1-\theta^2}{7+2C}.\tag{75}
$$

hold true.

Finally, it remains to show that *λ<sup>n</sup><sub>W</sub>* is bounded from below by a strictly positive number. This is equivalent to showing that *W<sup>n</sup>* is bounded from above, similar to the continuous analysis. Note that ‖*ρ*<sup>n</sup>‖<sup>2</sup><sub>*β*</sub> ≥ L<sup>n</sup>, and therefore

$$\frac{\mathcal{L}^{n+1} - \mathcal{L}^n}{\Delta t} \le -b\_1 \lambda\_W^n \mathcal{L}^n + b\_2 (d^n)^2,\tag{76}$$

$$b\_1 := b\_1(\beta) = \beta \frac{1 - \theta^2}{2}, \ b\_2 := b\_2(\beta) := 1 + \frac{3}{\beta^2}. \tag{77}$$

Solving (76) recursively, we obtain, with the convention ∏<sup>n</sup><sub>r=n+1</sub>(·) = 1,

$$\mathcal{L}^{n+1} \le \prod\_{m=0}^{n} \left( 1 - \Delta t b\_1 \lambda\_W^m \right) \mathcal{L}^0 + b\_2 \Delta t \sum\_{m=0}^n \lambda\_W^m (d^m)^2 \prod\_{r=m+1}^n \left( 1 - b\_1 \Delta t \lambda\_W^r \right) \tag{78}$$

$$\le \exp\left(-b\_1 \Delta t \sum\_{m=0}^n \lambda\_W^m\right) \mathcal{L}^0 + \max\_{0 \le s \le n} (d^s)^2\, b\_2 \Delta t \sum\_{m=0}^n \lambda\_W^m \prod\_{r=m+1}^n (1 - b\_1 \Delta t \lambda\_W^r). \tag{79}$$

The following equalities show that the last term of the previous sum can be bounded independently of *λ<sup>n</sup><sub>W</sub>*:

$$-\frac{1}{b\_1 \Delta t} \sum\_{m=0}^{n} -b\_1 \Delta t \lambda\_W^m \prod\_{r=m+1}^{n} \left(1 - b\_1 \Delta t \lambda\_W^r \right) \tag{80}$$

$$=-\frac{1}{b\_1 \Delta t} \sum\_{m=0}^{n} \left(1 - b\_1 \Delta t \lambda\_W^m - 1\right) \prod\_{r=m+1}^{n} \left(1 - b\_1 \Delta t \lambda\_W^r\right) \tag{81}$$

$$=-\frac{1}{b\_1\Delta t}\sum\_{m=0}^{n}\left(\prod\_{r=m}^{n}(1-b\_1\Delta t\lambda\_W^r)-\prod\_{r=m+1}^{n}(1-b\_1\Delta t\lambda\_W^r)\right)\tag{82}$$

$$=-\frac{1}{b\_1 \Delta t} \left( \prod\_{r=0}^{n} (1 - b\_1 \Delta t \lambda\_W^r) - 1 \right) \tag{83}$$

$$=\frac{1}{b\_1\Delta t}\left(1-\prod\_{r=0}^{n}(1-b\_1\Delta t\lambda\_W^r)\right).\tag{84}$$
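The telescoping identity in (80)–(84) is exact and can be sanity-checked numerically; a minimal sketch (the values of *b*<sub>1</sub>, Δ*t* and the λ<sup>r</sup><sub>W</sub> stand-ins below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, b1, dt = 12, 0.4, 0.1
lam = rng.uniform(0.2, 0.9, n + 1)  # arbitrary stand-ins for lambda_W^0, ..., lambda_W^n

# Left-hand side: sum_{m=0}^{n} b1*dt*lam[m] * prod_{r=m+1}^{n} (1 - b1*dt*lam[r]),
# where the empty product (m = n) equals 1, matching the convention in the text.
lhs = sum(
    b1 * dt * lam[m] * np.prod(1.0 - b1 * dt * lam[m + 1:])
    for m in range(n + 1)
)
# Right-hand side of (84), multiplied by b1*dt: 1 - prod_{r=0}^{n} (1 - b1*dt*lam[r])
rhs = 1.0 - np.prod(1.0 - b1 * dt * lam)
assert abs(lhs - rhs) < 1e-12  # the sum telescopes exactly
```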

Note that since *b*<sub>1</sub> < 1 and Δ*t* fulfills the CFL condition (41), we have that for all *n* ≥ 0,

$$b\_1 \Delta t \lambda\_W^n \le b\_1 \Delta x \le 1$$

and therefore 1 − *b*<sub>1</sub>Δ*tλ<sup>r</sup><sub>W</sub>* is non-negative. In addition, by definition −*ρ*<sub>∗</sub> ≤ *W<sup>n</sup>* and, due to (59) and (60), we have

$$(\mathcal{W}^n)^2 \le 4\|\vec{\rho}^n\|\_{\beta}^2 \le \frac{8}{1-\theta} \mathcal{L}^n. \tag{85}$$

Combining the previous estimate, (79) and (84), we obtain

$$\begin{split} \frac{1-\theta}{8} (\mathcal{W}^n)^2 \leq \mathcal{L}^n \leq \exp\left(-b\_1 \Delta t \sum\_{m=0}^n \lambda\_W^m\right) \mathcal{L}^0 + \max\_{0 \leq s \leq n} (d^s)^2 \frac{b\_2}{b\_1} \left(1 - \prod\_{r=0}^n (1 - b\_1 \Delta t \lambda\_W^r)\right), \\ \leq \mathcal{L}^0 + \max\_{0 \leq s \leq n} (d^s)^2 \frac{b\_2}{b\_1} \leq \|\vec{\rho}^0\|\_{\beta}^2 + \max\_{0 \leq s \leq n} (d^s)^2 \frac{2(\beta^2 + 3)}{\beta^3 (1 - \theta^2)}. \end{split}$$

Since the norm ‖*ρ*<sup>0</sup>‖<sub>*ℓ*<sup>2</sup></sub> is bounded according to assumption (49), this shows that *W<sup>n</sup>* is bounded from above by a constant *c* = *c*(*R*, *θ*, *β*, ‖*d*‖<sub>∞</sub>). This implies that there exists a constant *σ*<sub>2</sub> = *σ*<sub>2</sub>(*R*, *ρ*<sub>∗</sub>, *θ*, *β*, ‖*d*‖<sub>∞</sub>) > 0 such that

$$
\sigma\_2 \le \lambda\_W^n \le \frac{A}{B} \quad \forall n \ge 0. \tag{86}
$$

Note that the norm ‖*d*‖<sub>∞</sub> can be bounded by *D* by assumption.

In the last step, we use the bound on *λ<sup>n</sup><sub>W</sub>* to obtain the exponential decay of L<sup>n</sup>. Using (86) in estimate (76), we obtain for all *n* ≥ 0

$$\frac{\mathcal{L}^{n+1} - \mathcal{L}^n}{\Delta t} \le -\beta \frac{1 - \theta^2}{2} \sigma\_2 \mathcal{L}^n + (1 + \frac{3}{\beta^2}) \frac{A}{B} (d^n)^2 = -\eta \mathcal{L}^n + \nu (d^n)^2,\tag{87}$$

where *η* := *β* (1 − *θ*<sup>2</sup>) *σ*<sub>2</sub>/2 > 0 and *ν* := (1 + 3/*β*<sup>2</sup>) *A*/*B*. This concludes the proof in the discrete case.
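The decay estimate (87) can be illustrated by iterating the corresponding difference equation; a minimal sketch, with hypothetical values for *η*, *ν* and Δ*t* and the disturbance from (89):

```python
import numpy as np

# Iterate L^{n+1} = (1 - eta*dt) L^n + nu*dt*(d^n)^2, the worst case of (87).
eta, nu, dt = 0.5, 2.0, 0.01   # hypothetical constants with eta*dt < 1
L, N = 1.0, 5000               # initial Lyapunov value L^0 and number of steps
for n in range(N):
    d = 2.4e-3 * np.sin(n * dt)          # measurement error (89)
    L = (1 - eta * dt) * L + nu * dt * d**2

# ISS-type bound: exponential decay of L^0 plus a disturbance-driven offset
bound = np.exp(-eta * N * dt) * 1.0 + (nu / eta) * (2.4e-3) ** 2
assert L <= bound
```

The final assertion mirrors the continuous-time ISS estimate: the first term vanishes exponentially, and the iterate settles in a neighborhood of size (*ν*/*η*)·sup|*d*|<sup>2</sup>.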

#### **4. Numerical Simulations**

In this section, we illustrate the theoretical results of Sections 2 and 3 by providing numerical computations of ISS of a scalar conservation law with nonlocal velocity and boundary measurement error. We apply the discretization introduced in the previous section and choose *A* = *B* = 1, which leads to the velocity function

$$
\lambda\left(\mathcal{W}(t)\right) = \frac{1}{1 + \mathcal{W}(t)}, \quad \text{with} \quad \mathcal{W}(t) = \int\_0^1 \rho(t, x)\,dx.\tag{88}
$$

As measurement error, we consider

$$d(t) = 2.4 \times 10^{-3} \sin(t), \quad t \in (0, \infty). \tag{89}$$

#### *4.1. Example 1*

In this example, we consider the equilibrium solution *ρ*<sub>∗</sub> = 0 and the initial condition *ρ*<sub>0</sub>(*x*) = 1 + sin(2*πx*) for *x* ∈ [0, 1]. In the following, we show the decay of the discrete *L*<sup>2</sup>-error ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> of the system in Equation (8) for the two CFL numbers 0.5 and 0.9 in Table 1. Here, CFL = *a* ≤ 1 is a stronger condition than (41) and implies that Δ*t* is such that

$$
\lambda\_W^n \frac{\Delta t}{\Delta x} \le a < 1, \ n \ge 0. \tag{90}
$$

A value CFL ≤ 1 improves the stability of the scheme at the expense of additional artificial diffusion. Due to the artificial diffusion and the disturbance, we observe only approximately the expected first-order convergence with respect to Δ*x* of the upwind scheme. In Figure 1, the convergence of the solution of the system in Equation (8) to the equilibrium is shown for different values of *k*. As expected, we observe that as *k* increases, the rate of decay of the Lyapunov function decreases. Furthermore, we observe that below the mesh accuracy of Δ*x* = 10<sup>−3</sup>, no further decay occurs.
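The experiment above can be sketched in a few lines of code. This is a simplified illustration, not the authors' implementation: in particular, the boundary feedback `rho_in = k*rho[-1] + d` is a hypothetical stand-in for the actual boundary coupling of Equation (8):

```python
import numpy as np

def simulate(J=100, T=10.0, k=0.3, cfl=0.5, rho_star=0.0):
    """Upwind scheme for rho_t + lambda(W(t)) rho_x = 0 on [0, 1]."""
    dx = 1.0 / J
    x = (np.arange(J) + 0.5) * dx           # cell centers
    rho = 1.0 + np.sin(2 * np.pi * x)       # initial condition of Example 1
    t = 0.0
    while t < T:
        W = dx * rho.sum()                  # midpoint rule for W(t) in (88)
        lam = 1.0 / (1.0 + W)               # nonlocal velocity (88), A = B = 1
        dt = cfl * dx / lam                 # step size from the CFL condition (90)
        d = 2.4e-3 * np.sin(t)              # measurement error (89)
        rho_in = k * rho[-1] + d            # hypothetical boundary feedback
        shifted = np.concatenate(([rho_in], rho[:-1]))
        rho = rho - lam * dt / dx * (rho - shifted)   # upwind update
        t += dt
    return np.sqrt(dx * np.sum((rho - rho_star) ** 2))  # discrete L2 error

err = simulate()
```

Since *k* < 1 and the disturbance is small, the computed error decays toward a small disturbance-driven level, in qualitative agreement with Table 1.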

(a) CFL = 0.5.

| *J* | ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> | order |
|---|---|---|
| 100 | 1.9171e−05 | – |
| 200 | 1.1899e−05 | 0.6881 |
| 400 | 6.9631e−06 | 0.7730 |
| 800 | 3.7638e−06 | 0.8875 |
| 1600 | 1.5902e−06 | 1.2430 |

(b) CFL = 0.9.

| *J* | ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> | order |
|---|---|---|
| 100 | 1.3831e−05 | – |
| 200 | 8.1304e−06 | 0.7665 |
| 400 | 4.8604e−06 | 0.7423 |
| 800 | 2.8262e−06 | 0.7822 |
| 1600 | 1.1624e−06 | 1.2818 |

**Table 1.** Comparison of ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> for different numbers of grid points *J* with *ρ*<sub>∗</sub> = 0, *k* = 0.3 and *T* = 10.
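The "order" column in Table 1 is the experimental order of convergence log<sub>2</sub>(*e<sub>J</sub>*/*e*<sub>2*J*</sub>) between successive grid refinements; e.g., for the CFL = 0.5 errors of Table 1a:

```python
import math

# Discrete L2 errors from Table 1a (CFL = 0.5) for J = 100, 200, 400, 800, 1600
errs = [1.9171e-05, 1.1899e-05, 6.9631e-06, 3.7638e-06, 1.5902e-06]

# Experimental order of convergence: log2(e_J / e_{2J})
orders = [math.log2(errs[i] / errs[i + 1]) for i in range(len(errs) - 1)]
# -> approximately [0.6881, 0.7730, 0.8875, 1.2430]
```

As halving Δ*x* should halve a first-order error, an order near 1 indicates the expected first-order convergence of the upwind scheme.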

**Figure 1.** Comparison, in log-scale, of ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> with Courant–Friedrichs–Lewy condition (CFL) = 0.75 and *ρ*<sub>∗</sub> = 0.

#### *4.2. Example 2*

We repeat the previous experiment for a non-zero steady state, i.e., we choose *ρ*<sub>∗</sub> = 1 and the initial condition *ρ*<sub>0</sub>(*x*) = 2 + 2 sin(2*πx*), *x* ∈ [0, 1]. Results analogous to those above for the system in Equation (8) are presented in Table 2 and Figure 2.

**Table 2.** Comparison of ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> for different numbers of grid points *J* with *ρ*<sub>∗</sub> = 1, *k* = 0.3 and *T* = 20.


**Figure 2.** Comparison, in log-scale, of ‖*ρ<sup>n</sup>* − *ρ*<sub>∗</sub>‖<sub>2</sub> with CFL = 0.75 and *ρ*<sub>∗</sub> = 1.

#### **5. Conclusions and Outlook**

This paper considered input-to-state stability (ISS) for a scalar conservation law with nonlocal velocity and boundary measurement error. An ISS-Lyapunov function was employed to investigate conditions for ISS of an equilibrium of this conservation law. The decay of the discrete ISS-Lyapunov function was analyzed, and numerical simulations illustrate the theoretical results.

A possible extension is to consider ISS with respect to the *L*<sup>2</sup>-norm in time, in both the continuous and discrete cases.

A drawback of Theorem 1 is the fact that the system might not have a solution a priori. As stated in Remark 1, it might be possible to extend the results of [40–42] to obtain a solution continuous in time and in *L*<sup>2</sup> in space for the presented problem. This is the subject of future work.

**Author Contributions:** S.G., M.H. and G.W. contributed equally to the derivation, formal analysis, writing of draft and revision as well editing. M.H. and S.G. acquired funding for this project through the German Research Foundation (DFG). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by DFG under grant number HE5386/18-1, HE5386/19-1 and GO1920/10-1.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

