**An Explicit Meshless Point Collocation Solver for Incompressible Navier-Stokes Equations**

#### **George C. Bourantas 1,\*, Benjamin F. Zwick 1, Grand R. Joldes 1, Vassilios C. Loukopoulos 2, Angus C. R. Tavner 1, Adam Wittek <sup>1</sup> and Karol Miller <sup>1</sup>**


Received: 8 July 2019; Accepted: 28 August 2019; Published: 3 September 2019

**Abstract:** We present a strong form, meshless point collocation explicit solver for the numerical solution of the transient, incompressible, viscous Navier-Stokes (N-S) equations in two dimensions. We numerically solve the governing flow equations in their stream function-vorticity formulation. We use a uniform Cartesian embedded grid to represent the flow domain. We discretize the governing equations using the Meshless Point Collocation (MPC) method. We compute the spatial derivatives that appear in the governing flow equations, using a novel interpolation meshless scheme, the Discretization Corrected Particle Strength Exchange (DC PSE). We verify the accuracy of the numerical scheme for commonly used benchmark problems including lid-driven cavity flow, flow over a backward-facing step and unbounded flow past a cylinder. We have examined the applicability of the proposed scheme by considering flow cases with complex geometries, such as flow in a duct with cylindrical obstacles, flow in a bifurcated geometry, and flow past complex-shaped obstacles. Our method offers high accuracy and excellent computational efficiency as demonstrated by the verification examples, while maintaining a stable time step comparable to that used in unconditionally stable implicit methods. We estimate the stable time step using the Gershgorin circle theorem. The stable time step can be increased through the increase of the support domain of the weight function used in the DC PSE method.

**Keywords:** transient incompressible Navier-Stokes; meshless point collocation method; stream function-vorticity formulation; strong form; explicit time integration

#### **1. Introduction**

In this paper, we describe the development of a strong form meshless point collocation method for numerically solving the two-dimensional, non-stationary, incompressible Navier-Stokes (N-S) equations in their stream function-vorticity formulation. Our method relies on the ability of meshless methods to accurately compute spatial derivatives, and on their flexibility to solve partial differential equations (PDEs) in complex geometries. The proposed numerical method can efficiently handle uniform Cartesian point clouds embedded in a complex geometry, and/or irregular sets of nodes that directly discretize the flow domain. To compute the spatial derivatives that appear in the governing flow equations, we apply an interpolation meshless scheme, which computes spatial derivatives on both Cartesian embedded grids and irregular nodal distributions.

The mathematical formulation of the N-S equations, depending on the choice of dependent variables, can be classified as (i) primitive variables, (ii) stream function-vorticity, and (iii) velocity-vorticity. The majority of numerical methods are expressed in the primitive variables (velocity-pressure) formulation [1–3]. Although the primitive variables formulation is widely employed, difficulties occur when dealing with pressure boundary conditions, and coupling of the velocity and pressure fields is challenging. The vorticity-vector-potential [4–6] and velocity-vorticity [7] formulations have a clear advantage over the primitive variables formulation, because pressure does not appear in the field equations, and the difficulty of determining the pressure boundary conditions in incompressible flows is avoided [8]. The extension to 3D flow is also straightforward, although the number of field variables increases from three to six. For 2D cases, stream function-vorticity is the preferable method for the numerical solution of the incompressible N-S equations, since the continuity equation (conservation of mass) is inherently fulfilled [4].

The non-stationary N-S equations can be numerically solved using explicit, explicit-implicit or implicit time integration schemes in the framework of finite element (FE), finite volume (FV), finite difference (FD) or spectral methods. Fletcher [9] provides an overview of finite element and finite difference techniques for incompressible fluid flow, while Gresho and Sani [10] review the field, focusing on finite elements. Standard texts on computational fluid dynamics (CFD) using finite difference and finite volume methods can be found in [11–13], and finite elements in [14–16].

Substantial effort has been invested in reducing or eliminating problems related to mesh generation. Attempts to eliminate the human and computational effort related to mesh generation have resulted in the development of meshless methods (MMs). Various MMs, developed for computational fluid dynamics (CFD), have been used to solve the incompressible N-S equations. Of particular interest is the Meshless Local Petrov-Galerkin (MLPG) method, which has been used to solve the N-S equations in their primitive variables [17], velocity-vorticity [18,19] and stream function-vorticity [20] formulations. The moving least squares (MLS) approximation method has been used to compute shape functions and derivatives. Additionally, radial basis functions (RBFs) are able to interpolate shape function and its derivatives [21]. The Local Boundary Integral Method (LBIE) has been successfully applied to solve incompressible, laminar flow cases [22], while the multi-region Local Boundary Integral Equation (LBIE) has been combined with the radial basis functions (RBF) interpolation method [23]. The mesh-free Local Radial Basis Function Collocation Method (LRBFCM) for transient heat transfer and fluid dynamics problems was presented in [24], while others [25,26] reported the use of an indirect/integrated radial-basis-function network (IRBFN) method to solve transient partial differential equations (PDEs) governing fluid flow problems. Velocity-vorticity flow equations were solved in [27–29] using a point collocation method, along with a velocity-correction method, while a meshless point collocation with both velocity and pressure correction methods [30] was applied in several fluid mechanics problems. Eulerian based meshless methods, in contrast to Lagrangian based meshless methods, such as Smoothed Particle Hydrodynamics (SPH), have been mainly applied to regular geometries (square, circles, tubes), where generating a computational grid is relatively straightforward.

In this work, we focus on the Eulerian formulation of the N-S equations. A meshless point collocation method is used for the numerical solution of the non-stationary, incompressible, laminar 2D N-S equations in their stream function-vorticity formulation in complex geometries. The stream function-vorticity formulation is an effective way to describe 2D incompressible flow, since the incompressibility constraint for the velocity is automatically satisfied and the pressure variable is eliminated. The method is generalizable to three dimensions via vector potential-vorticity formulation [4–6]. The vorticity boundary conditions are imposed in a local form. We treat the transient term using a Euler explicit time integration scheme. Since the explicit integration scheme is conditionally stable, we compute the critical time step that ensures stable numerical results using the Gershgorin circle theorem [31,32]. This circumvents the need to calculate eigenvalues, which would be time consuming. The requirement for a stable time-step is counterbalanced by the low computational cost of each step, since only one Poisson-type equation is solved for the stream function equation and the vorticity field is updated using the explicit solver (in three dimensions three Poisson-type equations need to be solved [4–6]). The proposed scheme applies to both uniform embedded and irregular nodal distributions and can be used with ease in complex geometries.

This paper is structured as follows: in Section 2, we (i) present the governing equations (ii) present a numerical scheme for the stream function-vorticity formulation based on the meshless point collocation method (iii) briefly describe the Discretization Correction Particle Strength Exchange (DC PSE) differentiation method and (iv) discuss issues related to the accuracy and the computational cost of the proposed scheme. In Section 3, we highlight the accuracy of the scheme through benchmark problems, while in Section 4, we demonstrate the accuracy, efficiency and the ease of use of the proposed scheme using numerical examples. Finally, in Section 5 we present our conclusions and preliminary results for 3D lid-driven cavity flow using the generalization of the proposed scheme via vorticity-vector potential formulation.

#### **2. Methods Governing Equations and Solution Procedure**

#### *2.1. Stream Function-Vorticity Formulation of Navier-Stokes Equations*

The governing equations for the non-stationary, viscous, laminar flow of an incompressible fluid are based on conservation of mass and momentum. The non-dimensional form of the stream function-vorticity (ψ-ω) formulation is:

$$
\frac{
\partial \boldsymbol{\omega}
}{
\partial \boldsymbol{\theta}
} + \frac{
\partial \boldsymbol{\psi}
}{
\partial \boldsymbol{y}
} \frac{
\partial \boldsymbol{\omega}
}{
\partial \boldsymbol{\chi}
} - \frac{
\partial \boldsymbol{\psi}
}{
\partial \boldsymbol{\chi}
} \frac{
\partial \boldsymbol{\omega}
}{
\partial \boldsymbol{y}
} = \frac{1}{R\boldsymbol{\omega}} \boldsymbol{\nabla}^2 \boldsymbol{\omega} \tag{1}
$$

$$
\nabla^2 \psi = -\omega \tag{2}
$$

where *Re* = *UL <sup>v</sup>* is the Reynolds number, with *U* and *L* being a characteristic velocity and length, respectively, and *v* the kinematic viscosity of the fluid. Taking the spatial derivative of the stream function ψ yields the velocity components *u* and *v*, as follows:

$$
\mu = \frac{\partial \psi}{\partial y} \tag{3}
$$

$$
v = -\frac{\partial \psi}{\partial x}\tag{4}$$

We require a solution to Equations (1) and (2), defined in the spatial domain Ω with boundary ∂Ω, given the initial conditions at time *t* = 0

$$
\mu = \mu\_0 \tag{5}
$$

$$
\omega = \omega\_0 = \nabla \times \mathfrak{u}\_0.\tag{6}
$$

and boundary conditions

$$
\mu = \mu\_{\partial \Omega} \tag{7}
$$

$$
\omega = (\nabla \times \mathfrak{u}\_{\partial \Omega}) \cdot \hat{\mathfrak{k}} \tag{8}
$$

with *k*ˆ being the unit vector in the direction of the *z* axis of a three-dimensional Cartesian coordinate system. We solve the governing equations using an explicit Euler time integration scheme for the transient terms and a strong form meshless collocation method to discretize the flow equations.

#### *2.2. Temporal Discretization Using Explicit Euler Method*

We treat the viscous term explicitly [33] to obtain the following difference formula in time:

$$\frac{\partial \boldsymbol{\nu}^{n+1} - \boldsymbol{\omega}^{n}}{\Delta t} + \frac{\partial \boldsymbol{\psi}^{n}}{\partial \boldsymbol{y}} \frac{\partial \boldsymbol{\omega}^{n}}{\partial \mathbf{x}} - \frac{\partial \boldsymbol{\psi}^{n}}{\partial \mathbf{x}} \frac{\partial \boldsymbol{\omega}^{n}}{\partial \boldsymbol{y}} = \frac{1}{R\boldsymbol{\varepsilon}} \boldsymbol{\nabla}^{2} \boldsymbol{\omega}^{n}. \tag{9}$$

$$
\nabla^2 \psi^{n+1} = -\omega^{n+1} \tag{10}
$$

using the notation ω*n*+<sup>1</sup> = ω *t n*+1 and ψ*n*+<sup>1</sup> = ψ *t n*+1 , where *t <sup>n</sup>*+<sup>1</sup> = *t <sup>n</sup>* + δ*t*. The proposed scheme computes the stream function (along with velocity components) and vorticity field by applying a three-step marching algorithmic procedure:

Step 1 Update vorticity at the interior grid points

$$\boldsymbol{\alpha}^{\rm n+1} = \boldsymbol{\alpha}^{\rm n} - \Delta t \frac{\partial \boldsymbol{\psi}^{\rm n}}{\partial \boldsymbol{y}} \frac{\partial \boldsymbol{\omega}^{\rm n}}{\partial \mathbf{x}} + \Delta t \frac{\partial \boldsymbol{\psi}^{\rm n}}{\partial \mathbf{x}} \frac{\partial \boldsymbol{\omega}^{\rm n}}{\partial \mathbf{y}} + \Delta t \frac{1}{\text{Re}} \left( \frac{\partial^2 \boldsymbol{\omega}^{\rm n}}{\partial \mathbf{x}^2} + \frac{\partial^2 \boldsymbol{\omega}^{\rm n}}{\partial \mathbf{y}^2} \right) \tag{11}$$

Step 2 Compute stream function by solving Poisson type problem

$$\frac{\partial^2 \psi^{n+1}}{\partial x^2} + \frac{\partial^2 \psi^{n+1}}{\partial y^2} = -\omega^{n+1} \tag{12}$$

Step 3 Update velocity *u*(*n*+1)

$$
\mu^{n+1} = \frac{\partial \psi^{n+1}}{\partial y} \tag{13}
$$

$$
\sigma^{n+1} = -\frac{\partial \psi^{n+1}}{\partial \mathbf{x}}\tag{14}
$$

#### *2.3. Discretization Corrected Particle Strength Exchange (DC PSE) Method*

The accuracy of the proposed scheme relies on the accurate computation of spatial derivatives for the unknown field functions in Equations (11)–(14). In the present study, we use the Discretization Corrected Particle Strength Exchange (DC PSE) method, which, to our knowledge, is one of the most accurate meshless methods for computing spatial derivatives. The DC PSE method originated as a Lagrangian particle-based numerical method [34] and is based on Particle Strength Exchange (PSE) operators. To solve PDEs using the DC PSE meshless method, the authors in [35] reformulated the Lagrangian DC PSE method to work in the Eulerian framework. For completion, the PSE operators and the DC PSE method are described below.

#### 2.3.1. Particle Strength Exchange (PSE) Operators

The Particle Strength Exchange (PSE) method utilizes kernels to approximate differential operators that conserve particle strength in particle-particle interactions. The PSE method was proposed by Degond and Mas-Gallic [36] for diffusion and convection-diffusion problems. Eldredge et al. [37] then developed a framework for approximating arbitrary derivatives.

In general, a PSE operator *Q*<sup>β</sup> *f*(*x*) for approximating the derivative *D*<sup>β</sup> *f*(*x*) has the form

$$Q^{\mathfrak{g}}f(\mathbf{x}) = \frac{1}{\varepsilon^{|\mathfrak{g}|}} \int (f(y) \mp f(\mathbf{x})) \eta^{\mathfrak{g}}\_{\varepsilon}(\mathbf{x} - y) dy \tag{15}$$

with η β <sup>ε</sup> = ηβ(*x*/ε)/ε*<sup>d</sup>* a scaled kernel of radius ε, and *d* the number of dimensions. The sign in Equation (15) is negative when |β| is even and positive when |β| is odd, with β a multi-index [34]. Now, let β = (β1, β2, ... , β*n*), where β*i*, *i* = 1, 2, ... , *n* is a non-negative integer. Then the partial differential operator *D*<sup>β</sup> can be expressed as *D*<sup>β</sup> = <sup>∂</sup>|β<sup>|</sup> ∂*x* β1 <sup>1</sup> ∂*x* β2 <sup>2</sup> ...∂*x* β*n n* . The challenge is to find a kernel η β <sup>ε</sup> that leads to good approximations for *D*β. To find such kernels for arbitrary derivatives we adopt the idea in Eldredge et al. [37] and start from the Taylor expansion of a function *f*(*y*) about a point *x*:

$$f(y) = f(\mathbf{x}) + \sum\_{|\mathbf{a}|=1}^{\infty} \frac{1}{\mathbf{a}!} (y-\mathbf{x})^{\mathbf{a}} D^{\mathbf{a}} f(\mathbf{x}).\tag{16}$$

Next, we subtract or add *f*(*x*), depending on whether |β| is odd or even, to both sides and convolute the equation with the unknown kernel η β <sup>ε</sup> . Considering Equation (15), this leads to:

$$\mathbb{L}Q^{\mathcal{J}}f(\mathbf{x}) = \frac{1}{\,\_{\mathcal{E}}^{|\mathcal{J}|}} \sum\_{|\mathbf{a}|=1}^{\infty} \frac{1}{\mathbf{a}!} D^{\mathbf{a}}f(\mathbf{x}) \int (\mathbf{y}-\mathbf{x})^{\mathbf{a}} \eta\_{\mathcal{E}}^{\mathcal{J}}(\mathbf{x}-\mathbf{y}) d\mathbf{y}.\tag{17}$$

Introducing the continuous α-moments

$$M\_{a} = \int (\mathbf{x} - \mathbf{y})^{\mathbf{a}} \eta^{\mathbf{g}}(\mathbf{x} - \mathbf{y}) d\mathbf{y} \ = \int \mathbf{z}^{\mathbf{a}} \eta^{\mathbf{g}}(\mathbf{z}) d\mathbf{z} \tag{18}$$

and isolating the derivatives *D*<sup>β</sup> on the right-hand side of Equation (17) we obtain:

$$Q^{\mathfrak{J}}f(\mathbf{x}) = \frac{(-1)^{|\mathfrak{J}|}}{\mathfrak{J}!}M\_{\mathfrak{I}}D^{\mathfrak{J}}f(\mathbf{x}) + \sum\_{\substack{|\mathfrak{a}| = \ 1 \ \mathfrak{a} \neq \mathbf{0} \\ \mathfrak{a} \neq \mathfrak{J}}}^{\infty} \frac{(-1)^{|\mathfrak{a}|}}{\mathfrak{a}!} \varepsilon^{|\mathfrak{a}| - |\mathfrak{J}|} M\_{\mathfrak{a}}D^{\mathfrak{a}}f(\mathbf{x}). \tag{19}$$

Finally, to approximate *Q*β*f*(*x*) with order of accuracy *r*, the following set of conditions is imposed for the moments *Ma*:

$$M\_{\mathfrak{A}} = \begin{cases} (-1)^{|\mathfrak{A}|} \beta! & a = |\mathfrak{A}| \\ 0, & a \neq \beta\_{\prime} & 1 \le |a| \le \left| \mathfrak{F} \right| + r - 1. \end{cases} \tag{20}$$

In addition, if we impose

$$\int |z|^{|\mathcal{S}|+r} \Big| \eta^{\mathcal{S}}(z) \Big| dz < \infty \tag{21}$$

the mollification error ε(*x*) = *D*<sup>β</sup> *f*(*x*) is bounded [34]. The procedure to construct a kernel that satisfies the conditions in Equation (16) has been described in [34]. Once the kernel is defined, the operator in Equation (17) can be discretized using a midpoint quadrature over the nodes as

$$\mathcal{Q}\_h^{\mathcal{J}} f(\mathbf{x}) \;= \frac{1}{\varepsilon^{|\mathcal{J}|}} \sum\_{p \in \mathcal{N}(\mathbf{x})} \left( f(\mathbf{x}\_p) \mp f(\mathbf{x}) \right) \eta\_\varepsilon^{\mathcal{J}} (\mathbf{x} - \mathbf{x}\_p) V\_p \; \; \tag{22}$$

where *N*(*x*) is the number of all nodes in a neighborhood around *x*, which can be defined by a cut-off radius *rc*, usually chosen such that *N*(*x*) coincides to a certain level of accuracy with the kernel support, and *Vp* is the volume associated with each particle. Given such a discretization, the discretization error *h*(*x*) <sup>=</sup> *<sup>Q</sup>*β*f*(*x*) <sup>−</sup> *<sup>Q</sup>*<sup>β</sup> *<sup>h</sup> <sup>f</sup>*(*x*) is also bounded [34].

#### 2.3.2. The Discretization Corrected PSE Operators

The Discretization Corrected PSE operators were introduced by Schrader et al. [34] to reduce the discretization error *h*(*x*) in the PSE operator approximation. We can derive the DC PSE approximation from Equation (22), aiming of finding a kernel function which minimizes the difference between this discrete operator and the actual derivative. We can achieve this, by replacing each term *f*(*xp*) in Equation (22) with its Taylor expansion about *x*. This leads to the following expression for the derivative approximation:

$$\mathcal{Q}\_h^{\mathfrak{g}} f(\mathbf{x}) = \frac{(-1)^{|\mathfrak{g}|}}{\mathfrak{P}!} Z\_h^{\mathfrak{g}} D^{\mathfrak{g}} f(\mathbf{x}) + \sum\_{\substack{|\mathfrak{a}| = \ 1 \ \\ a \neq \pm \mathfrak{g}}}^{\infty} \frac{(-1)^{|\mathfrak{a}|}}{a!} \varepsilon^{|\mathfrak{a}| - |\mathfrak{d}|} Z\_h^{\mathfrak{a}} D^{\mathfrak{a}} f(\mathbf{x}) + r\_0 \tag{23}$$

with

$$r\_0 = \begin{cases} 0, & \left| \beta \right| \text{ even} \\ 2e^{-|\beta|} Z\_h^0 f(\mathbf{x}) & \left| \beta \right| \text{ odd} \end{cases} \tag{24}$$

and the discrete moments defined as:

$$Z\_h^a = \frac{1}{\varepsilon^d} \sum\_{p \in N(\mathbf{x})} \left(\frac{\mathbf{x} - \mathbf{x}\_p}{\varepsilon}\right)^a \eta^g \left(\frac{\mathbf{x} - \mathbf{x}\_p}{\varepsilon}\right). \tag{25}$$

Therefore, the set of moment conditions becomes:

$$Z\_h^a = \begin{cases} (-1)^{|\mathcal{S}|} \beta! & a = \beta \\ 0 & a \neq \mathcal{S} \\ < \infty & \text{otherwise} \end{cases} \qquad a\_{\min} \le |a| \le \left| \mathcal{S} \right| + r - 1 \\ \tag{26}$$

for all - - β - - - - 0, where *amin* is one when - - β - - is odd and zero when - - β - - is even. The kernel η<sup>β</sup> is chosen as:

$$\eta^{\mathfrak{g}}(\mathbf{x}, \mathbf{z}) \;= \left( \sum\_{|\mathbf{y}| = |a\_{\min}}^{|\mathfrak{g}| + r - 1} a\_{\mathcal{V}}(\mathbf{x}) z^{\mathcal{V}} \right) e^{-|\mathbf{z}|^{2}} \;= \,^{P}(\mathbf{x}, \mathbf{z}) \mathcal{W}(\mathbf{z}), \; \mathbf{z} \;= \,^{\frac{\mathbf{x} - \mathbf{x}\_{\mathcal{P}}}{\mathcal{E}}}. \tag{27}$$

The kernel function consists of a polynomial correction function *P*(*x*, *z*) and the weight function *W*(*z*). Different types of weight functions can be applied with the possible choices described in [38].

The unknown coefficients *a*γ(*x*) are obtained by requiring the kernel given by Equation (28) to satisfy the conditions in Equation (27), resulting in the following linear system of equations:

$$\sum\_{|\mathbf{y}|=a\_{\min}}^{|\mathcal{S}|+r-1} a\_{\mathcal{Y}}(\mathbf{x}) w\_{\mathbf{a}}^{\mathcal{Y}}(\mathbf{x}) \;= \begin{cases} (-1)^{|\mathcal{S}|} \mathcal{B}! & \mathbf{a} = \mathcal{B} \\ 0 & \mathbf{a} \neq \mathcal{B} \end{cases}, \; \forall \; a\_{\min} \le |\mathcal{S}| + r - 1 \tag{28}$$

with weights

$$w\_a^{\mathcal{Y}}(\mathbf{x}) = \frac{1}{\varepsilon^{|\mathbf{a} + \mathbf{y}| + d}} \sum\_{p \in \mathcal{N}(\mathbf{x})} \left(\mathbf{x} - \mathbf{x}\_p\right)^{\mathbf{a} + \mathcal{Y}} e^{-\left(\frac{|\mathbf{x} - \mathbf{a}\_p|}{\varepsilon}\right)^2} \tag{29}$$

To compute the approximated derivative *Q*<sup>β</sup> *<sup>h</sup> <sup>f</sup>*(*x*) at node *<sup>x</sup>p*, the coefficients are found by solving the linear system of Equation (28) for *x* = *xp*. Given our choice of kernel function, the DC PSE derivative approximation becomes:

$$\mathcal{Q}\_{h}^{\mathcal{B}}f(\mathbf{x}\_{\mathcal{P}}) = \frac{1}{\varepsilon(\mathbf{x}\_{\mathcal{P}})^{\mathcal{B}}} \sum\_{\mathbf{x}\_{\mathcal{P}} \in \mathcal{N}(\mathbf{x}\_{\mathcal{P}})} \left( f(\mathbf{x}\_{\mathcal{P}}) \mp f(\mathbf{x}\_{\mathcal{P}}) \right) p\left(\frac{\mathbf{x} - \mathbf{x}\_{\mathcal{P}}}{\varepsilon(\mathbf{x}\_{\mathcal{P}})} \right) a^{T}(\mathbf{x}\_{\mathcal{P}}) e^{-\left(\frac{|\mathbf{x}\_{\mathcal{P}} - \mathbf{x}\_{\mathcal{Q}}|}{\varepsilon(\mathbf{x}\_{\mathcal{P}})}\right)^{2}} \tag{30}$$

where *p*(*x*) = [*p*1(*x*), *p*2(*x*), *pl*(*x*)] and *a*(*x*) are the vectors of terms in the monomial basis and their coefficients, respectively. By using the DC PSE method, the spatial derivatives *Q*<sup>β</sup> up to second order are given as:

$$\mathbf{Q}^{1,0} = \frac{\partial}{\partial \mathbf{x}'} \quad \mathbf{Q}^{0,1} = \frac{\partial}{\partial y} \tag{31}$$

and

$$\mathbf{Q}^{1,1} \equiv \frac{\partial^2}{\partial \mathbf{x} \partial y} = \frac{\partial^2}{\partial y \partial \mathbf{x}'}, \quad \mathbf{Q}^{2,0} \equiv \frac{\partial^2}{\partial \mathbf{x}^2}, \quad \mathbf{Q}^{0,2} \equiv \frac{\partial^2}{\partial y^2} \tag{32}$$

#### *2.4. Vorticity Boundary Conditions*

In finite difference methods, a number of formulae can be used to impose the vorticity boundary conditions [33,39]. These formulae, generally referred to as local, compute vorticity values on a given node on the boundary from the vorticity values in the interior domain, without involving other nodes on the boundary. The most widely used method is Thom's formula [39]. Unfortunately, applying these formulae in real applications had limited success. As stated in [33], vorticity boundary conditions should be global, in the sense that computing the boundary vorticity values should involve vorticity values at nodes in the interior and the boundary. Herein, to achieve global vorticity boundary conditions, we impose vorticity boundary conditions through strong form meshless differential operators, used to compute spatial derivatives. This way, imposing vorticity boundary conditions is consistent with the rest of the algorithm (described in Section 2.2) and becomes quite straightforward.

DC PSE methods are recognized as one of the most accurate and efficient numerical methods to compute derivatives on an irregularly distributed set (cloud) of nodes [34,35,38]. For the stream function-vorticity N-S Equations, given the velocity field values computed previously by the updated stream function values ψ*n*+<sup>1</sup> (Step 2, Section 2.2), we can compute the updated vorticity values ω*n*+<sup>1</sup> for the entire spatial domain (including boundaries) using the strong form meshless operators for first order spatial derivative as

$$
\omega^{n+1} = \frac{\partial v^{n+1}}{\partial \mathbf{x}} - \frac{\partial u^{n+1}}{\partial y} \tag{33}
$$

We have already computed the updated values ω*n*+<sup>1</sup> at the interior nodes (Step 1 in the Euler explicit solver). The updated vorticity values on the boundary nodes are computed using the updated velocity values and the DC PSE operators (described in Section 2.3) for spatial derivatives (Equation (27)).

#### *2.5. Poisson Solver*

To compute the stream function field values (Equation (2)), we numerically solve a Poisson-type equation. Depending on the flow problem and spatial discretization, a direct or iterative solver is chosen accordingly. Iterative and direct solvers for elliptic type PDEs are well established. Direct solvers are robust and easy to use but can be computationally costly and memory intensive when the systems of equations to be solved are very large (e.g., a few million degrees of freedom). On the other hand, a decisive advantage of iterative solvers is their low memory usage, which is significantly less than a direct solver for the same sized problems. Unfortunately, iterative solvers are less robust than direct solvers and may fail to compute the numerical solution.

It is computationally more efficient to use iterative solvers to solve the Poisson equation when a large number of nodes is used. There are many iterative solvers for Poisson type equations. For example, Gauss-Seidel or successive over relaxation (SOR) algorithms have been widely used. These solvers are accurate but relatively computationally demanding because they need *O* (*N*2) iterations to converge (*N* being the total number of grid points). Multi-grid algorithms have been developed to accelerate convergence and the computational effort has been significantly reduced [40]. For equally-spaced grids, fast Fourier transform (FFT) solvers are the fastest available algorithms for solving Poisson equations [41].

In the present study, we use a direct solver. The discrete Laplace operator defined in the Poisson type equation is accurately discretized using DC PSE method, which performs accurately on uniform and locally refined Cartesian grids. The grid is fixed in time so a LU factorization can be precomputed, which reduces the computational cost significantly. For up to 4 million nodes, the factorization is fast and requires low memory. For larger number of nodes, we utilize iterative solvers, such as conjugate gradient (CG) and biconjugate gradient stabilized (BiCGSTAB), since the linear systems are always positive definite, diagonally dominant and symmetric.

#### *2.6. Critical Time Step*

To compute the critical time step needed to obtain stable results, we rewrite Equation (1) as

$$
\omega^{\eta+1} = \omega^{\eta} + dt \left( \frac{\partial \psi^{\eta}}{\partial \mathbf{x}} \frac{\partial \omega^{\eta}}{\partial \mathbf{y}} - \frac{\partial \psi^{\eta}}{\partial \mathbf{y}} \frac{\partial \omega^{\eta}}{\partial \mathbf{x}} + \frac{1}{Re} \nabla^{2} \omega^{\eta} \right) \tag{34}
$$

which can be written in matrix form as

$$
\omega^{n+1} = [\mathbf{I} + \delta t \mathbf{A}] \omega^n \tag{35}
$$

where <sup>ω</sup>*n*+<sup>1</sup> is a column vector of dimension *<sup>N</sup>* <sup>×</sup> 1 (*<sup>N</sup>* is the number of nodes). The *<sup>N</sup>* <sup>×</sup> *<sup>N</sup>* matrix **<sup>A</sup>** is defined as the DC PSE approximation of the convection and diffusion terms, viz.

$$\mathbf{A} = \mathbf{K} + \mathbf{L} = \frac{\partial \psi^n}{\partial \mathbf{x}} \frac{\partial \omega^n}{\partial y} - \frac{\partial \psi^n}{\partial y} \frac{\partial \omega^n}{\partial \mathbf{x}} + \frac{1}{R\varepsilon} \nabla^2 \omega^n. \tag{36}$$

where

$$\mathbf{K} = \text{diag}\{\mathbf{Q}^{1,0}\boldsymbol{\Psi}^{\boldsymbol{n}}\} \mathbf{Q}^{0,1} - \text{diag}\{\mathbf{Q}^{0,1}\boldsymbol{\Psi}^{\boldsymbol{n}}\} \mathbf{Q}^{1,0} \tag{37}$$

and

$$L = \frac{1}{R\varepsilon} (\mathbf{Q}^{2,0} + \mathbf{Q}^{0,2}) \tag{38}$$

with diag *Q*1,0ψ*<sup>n</sup>* and diag *Q*1,0ψ*<sup>n</sup>* being diagonal matrices of dimension *N* × *N*. The explicit method defined by Equation (34) is stable if

$$|1 + \delta t \lambda\_A| \le 1\tag{39}$$

where λ*<sup>A</sup>* are the eigenvalues of the matrix **A**. The above stability condition requires the eigenvalues λ*<sup>A</sup>* to be either real or negative, leading to

$$
\delta t \le \frac{2}{|\lambda\_A|} \tag{40}
$$

or complex with negative real parts, leading to

$$
\delta t \le \frac{2\text{Re}(\lambda\_A)}{\left|\lambda\_A\right|^2} \tag{41}
$$

When the problem discretization includes a large number of nodes, matrix **A** becomes very large, and calculating the eigenvalues is not practical. The terms of matrix **A** are determined by the Laplacian operator *L*, which consists of second order derivatives, and by the advection operator *K***,** which includes only first order derivatives. The relative weight of the two operators on the structure of matrix **A** is dictated by discretisation and velocity field values (velocity is related to stream function and vector potential field values); a more refined discretization leads to a higher weight of operator *L* (diffusion) and lower influence of operator *K* (convection), while higher Reynolds number and higher velocity field values leads to a higher influence of operator *K***.** When the nodal discretization is not adequately refined, the higher weight of the operator *K* can lead to eigenvalues with a positive real part; in such case explicit time integration is not possible and discretization has to be refined.

For refined discretizations, the resulting matrix is diagonally dominant, having its eigenvalues distributed close to the real axis. We can use the Gershgorin circle theorem [31] to estimate the maximum time step using Equations (36) and (37) with an upper bound of the maximum eigenvalue of matrix **A**. Given the composition of matrix **A**, the upper bound of the maximum eigenvalue can be computed without assembling the matrix:

$$|\lambda\_A| \le \max\_i \left( \sum\_j \left( \left| L\_{ij} \right| + \left| K\_{ij} \right| \right) \right) \tag{42}$$

Equation (42) clearly shows that the eigenvalues of matrix **A** depend on the spatial derivatives of the field variables, computed using the DC PSE method. Therefore, how derivatives are computed reflects on the critical time step. We notice that for the same Reynolds number and the same nodal discretization, the critical time step increases when spatial derivatives are computed using a large number of nodes (more than 40) in the support domain of each node, compared with the critical time step computed using a small number of nodes (around 15). In some cases, this increase in the critical time step can be up to two orders of magnitude, which decreases the computational cost dramatically.

The reason for the increased critical time step is the upwind inherent nature of meshless methods. Meshless methods are locally based methods and use neighboring nodes to compute spatial derivatives. Furthermore, since matrix **A** involves derivatives of the stream function, by computing the critical time step using the Gershgorin circle theorem we ensure that the computation resembles the estimation of the critical time step based on the nodal spacing <sup>∝</sup> <sup>Δ</sup>*x*−<sup>1</sup> and the Courant−Friedrichs−Lewy (CFL) condition <sup>∝</sup> *<sup>u</sup>*ReΔ*x*−<sup>1</sup> . Therefore, by adjusting the number of support domain nodes, we can adjust the critical time step to be comparable to that used by implicit time integration schemes.

#### **3. Algorithm Verification**

To demonstrate the efficiency and accuracy of the proposed numerical scheme, we first apply our method to well-established benchmark problems in computational fluid dynamics, including lid-driven cavity flow, the backward-facing step (BFS), and the unbounded flow past a cylinder.

For all the examples presented here, we examine the accuracy and the efficiency of the proposed scheme for each benchmark problem. Furthermore, we examine the dependence of the critical time step on the grid resolution, Reynolds number and nodes in the support domain. Computations were conducted using an Intel i7 quad core processor with 16 GB RAM.

#### *3.1. Lid-Driven Cavity*

Our first example involves incompressible, non-stationary flow in a square cavity (Figure 1). Despite its geometrical simplicity, the lid-driven cavity flow problem exhibits a complex flow regime, mainly due to the vortices formed in the center and at the corners (bottom left and right, and upper left) of the square domain. We impose no-slip boundary conditions at the bottom and vertical walls (left and right wall), while the top wall slides with unit velocity (*U* = 1), generating the interior viscous flow.

**Figure 1.** Geometry and boundary conditions for the lid-driven cavity flow problem.

In most numerical studies on the lid-driven cavity flow problem, the flow regime reaches a steady state solution for Reynolds (*Re*) numbers lower than *Re* = 10,000. In our study, we report on the results obtained by the proposed scheme for Reynolds numbers *Re* = 7500 and 10,000. We consider both uniform Cartesian and irregular nodal distributions. For uniform nodal distributions, our previous studies [27–29,35] show that for Reynolds numbers up to *Re* = 10,000, a uniform Cartesian grid with resolution of 361 × 361 captures the forming vortices and provides a grid-independent and accurate numerical solution. In the present study, to highlight the efficiency of the proposed scheme, we use successively finer uniform Cartesian grids, from 601 × 601 to 2048 × 2408 nodes. We notice that for *Re* = 10,000 the 1024 × 1024 grid resolution can capture all the flow scales and can be used for Direct Numerical Simulation (DNS) studies.

We use the Gershgorin circle theorem to define the critical time step for different nodal distributions (Tables 1 and 2) and for various Reynolds numbers. Table 1 lists the critical time step computed using the Gershgorin circle theorem for several grid resolutions and different Reynolds numbers (up to *Re* = 10,000). For the results reported in Table 1, 20 nodes were used in the support domain. We further examine the dependence of the critical time step on the number of neighboring nodes in the support domain. Table 2 lists the critical time step computed using the Gershgorin circle theorem with 40 nodes in the support domain. We can observe that the critical time step increases with the number of nodes in the support domain.

**Table 1.** Critical time step for various grid resolution and Reynolds (*Re*) numbers, using 20 nodes in the support domain.


**Table 2.** Critical time step for various grid resolution and *Re* numbers, using 40 nodes in the support domain.


Table 3 lists the computational time (in s) for computing the spatial derivatives for various grid resolutions, and for the numerical solution of N-S equations (for each time iteration) in the case of *Re* = 10,000. Recall that we precompute the LU factorization for the discrete Laplacian operator defined for the Poisson type equation for the stream function. Hence, steps 1 and 2 must be performed only once at the beginning of the simulation.

**Table 3.** Computational time (in seconds) for computing (**1**) spatial derivatives, (**2**) factorization and (**3**) numerical solution for various grid resolutions.


Figure 2 shows the *u*- velocity profiles along the vertical line passing through the geometric center of the cavity, and the *v*- velocity profiles along the horizontal line passing through the geometric center

of the cavity, for *Re* = 7500 and 10,000, obtained using the proposed scheme and compared to those reported by Ghia et al. [42], which are considered as benchmark solution.

**Figure 2.** Velocity profiles along the vertical line passing through the geometric center of the cavity for *u*- velocity component (left) and horizontal line passing through the geometric center of the cavity for the *v*- velocity component (right) for (**a**) *Re* = 7500 and (**b**) *Re* = 10,000. The results we obtained in this study using our methods described in section Methods Governing Equations and Solution Procedure (indicated as "Meshless" in the velocity profiles) are compared with the results reported in Ghia et al. [42].

Table 3 demonstrates the efficiency of the proposes scheme and its ability to solve relatively large algebraic systems (Poisson equation for stream function) with low computational cost.

We set the Reynolds number to *Re* = 10,000 and the total time for the flow simulation to *Tfinal* = 250. At that time, the flow regime has all the characteristic features of steady state [42]. We use a 1024 × 1024 grid resolution and a time step for the simulations of *dt* = <sup>5</sup> <sup>×</sup> <sup>10</sup><sup>−</sup>4.

Figures 3–5 show stream function and vorticity contours and demonstrate the formation of secondary vortices (Figures 3 and 4) which appear as the Reynolds number increases. There are often used as qualitative results, to highlight the accuracy of the numerical method. In detail, Figure 3 shows the stream function contour plots for *Re* = 7500, while Figure 4 shows the stream function contour plots for *Re* = 10,000. Magnified views of the secondary vortices are also included. The stream function values for these contours are listed in Table 4.

**Figure 3.** Streamline contours of primary, secondary and additional corner vortices for lid-driven cavity flow with *Re* = 7500.

**Figure 4.** Streamline contours of primary, secondary and additional corner vortices for lid-driven cavity flow with *Re* = 10,000.

**Figure 5.** Vorticity isocontour patterns for (**a**) *Re* = 7500 and (**b**) *Re* = 10,000.


**Table 4.** Values for streamline and vorticity contours in Figures 3 and 4.

Figure 5 shows the vorticity contours for *Re* = 7500 and 10,000. The vorticity values for the contours are listed in Table 4.

To verify the applicability of the proposed scheme in irregular nodal configurations, we compute the flow regime for *Re* = 10,000 using randomly distributed nodes. We use 496,987 nodes to discretize the spatial domain, which provides a grid-independent numerical solution (for the uniform Cartesian grid, a grid independent solution was obtained using a 401 × 401 grid resolution with 160,801 nodes). We obtain the irregular point cloud through a 2D triangular mesh generator (MESH2D-Delaunay-based unstructured mesh-generation). To compare the results with those obtained using the uniform nodal distribution (regular Cartesian grid), we interpolate the velocity, stream function and vorticity values computed using the irregular point cloud onto the uniform nodal distribution using the moving least squares (MLS) approximation method [43] and compute the maximum normalized root mean square error (NRMSE), defined as [44],

$$NRMSE = \frac{\sqrt{\frac{1}{N} \sum\_{i=1}^{N} \left(u\_i^{Cartsain} - u\_i^{Irregular}\right)^2}}{u\_{max}^{Irregular} - u\_{min}^{Irregular}}\tag{43}$$

For all field variables considered, the NRMSE was less than 10<sup>−</sup>4, which highlights the accuracy of the proposed method for irregular nodal distributions.

We further demonstrate the accuracy of the proposed scheme, providing quantitative results. We use a uniform Cartesian grid of 1024 <sup>×</sup> 1024 resolution and we set the time step to *dt* = 5 <sup>×</sup> 10−4. In Table 5, we present the strengths and positions of the primary (located in the center of the cavity), secondary (bottom right corner) and ternary vortexes (bottom and top left corners). The numerical results are in excellent agreement with the results reported in Ghia et al. [42]. Table 6 lists the *u*velocity values along vertical line through the center of cavity, and Table 7 the *v*- velocity values along horizontal line through the center of cavity.



**Table 6.** *u*- velocity along vertical line through the center of cavity.

**Table 7.** *v*- velocity along horizontal line through the center of cavity.


#### *3.2. Backward-Facing Step*

The second benchmark problem considered is the backward-facing step [45]. This problem has been studied by several researchers using different numerical methods and is considered to be a demanding flow problem to solve, mainly due to the flow separation that occurs when the fluid passes over a sharp corner and re-attaches downstream [1,45,46].

Figure 6 shows the spatial domain for the backward facing step. The coordinate system is centered at the step corner, with the *x*-coordinate being positive in the downstream direction, and the *y*coordinate across the flow channel. The height *H* of the channel is set to *H* = 1 (ranging from (0, −0.5) to (0, 0.5)), while the step height and upstream inlet region are set to *H*/2. To ensure fully developed flow, the downstream channel length *L* is set to *L* = 30 *H*. The inlet velocity has a parabolic profile, with horizontal component *u*(*y*) = 12*y* <sup>−</sup> 24*y*<sup>2</sup> for 0 <sup>≤</sup> *y* <sup>≤</sup> 0.5, which gives a maximum inflow velocity of

*umax* = 1.5 and average velocity of *uavg* = 1. At the outlet, we assume fully developed flow (*du*/*dx* = 0, *v* = 0). The Reynolds number is defined as *Re* = *uavgH*/*vf*, with *vf* being the kinematic viscosity.

**Figure 6.** Geometry for the backward-facing step flow problem.

We first discretize the flow domain with a uniform Cartesian grid. In our previous studies [29,35], we have shown that grid with resolution 31 × 901 is sufficient to compute a grid-independent numerical solution for Reynolds numbers up to *Re* = 800. As in the previous section, to highlight the efficiency of our scheme, we use a denser grid with resolution 121 × 3630, resulting in 439,230 nodes. We set the total time *Ttotal* = 250 (to ensure that the solution will reach steady state), and the time step *dt* = 10<sup>−</sup>4. The simulation terminates when the NRMSE of the time derivative of the stream function and vorticity field values in two successive time steps is less than 10<sup>−</sup>6. The time needed to create the grid was 0.023 s, and it takes 0.3 s to update the solution for each time step. We obtained a steady state solution after 100,000 time steps.

Figure 7 shows the stream function and vorticity contours for *Re* = 800. The flow separates at the step corner and vortices are formed downstream. We can observe the two vortices formed at the lower and upper wall. After reattachment of the upper wall eddy, the flow in the duct slowly recovers towards a fully developed flow. We compute the separation and reattachment points at *Llower* ≈ 6.1 for the lower wall separation zone, *Lupper* ≈ 5.11 for the upper separation zone, where the separation begins at *x* ≈ 5.19. Our numerical findings show good agreement with other numerical methods for 2D computations [1,46]. In [1], the authors used a finite difference method and predicted separation lengths of *Llower* ≈ 6.0 and *Lupper* ≈ 5.75, while [46] using the A Fluid Dynamics Analysis Program (FIDAP) code (Fluid dynamics International Inc., Evanston, IL, USA) predicted *Llower* ≈ 5.8 and the upper *Lupper* ≈ 4.7.

**Figure 7.** Contour plots of the (**a**) stream function and (**b**) vorticity for *Re* = 800 for the backward-facing step flow problem using the stream function and vorticity values reported in Gartling [45].

Figure 8 shows the comparisons of the *u*- and *v*- velocity components and vorticity values, between the present scheme and those obtained using the finite element method (FEM) [45], along the line *x* = 7 and *x* = 15 for *Re* = 800. Additionally, we computed the shear stress, defined as <sup>∂</sup>*<sup>u</sup>* ∂*y* on the lower and upper wall, for increasingly refined nodal resolution. Figure 9a shows plots of the shear stress for the upper and lower wall, while Figure 9b displays the convergence of the upper wall shear stress when successively denser point clouds are used. We can see that shear stress converges and is accurate when compared to results of [45]. The accurate computation of spatial derivatives is a particular advantage of the strong form meshless method. In contrast, convergence of many finite element methods can be obtained only in terms of integral norms, and the point-wise convergence for the velocity gradient cannot be guaranteed even in the case of a smooth numerical solution [47].

**Figure 8.** (**a**) Horizontal and (**b**) vertical velocity profiles, and (**c**) vorticity profiles at *x* = 7 and *x* = 15 for the backward-facing step flow problem with *Re* = 800.

**Figure 9.** (**a**) Shear stress for the upper and lower wall for *Re* = 800 for the backward-facing step flow problem (**b**) convergence of the upper wall shear stress with successively denser point clouds (40, 80 and 120 correspond to the number of nodes in the *y*-direction).

Tables 8 and 9 list the velocity and vorticity field values at the at *x* = 7 and *x* = 15, respectively, for *Re* = 800.


**Table 8.** Cross-sectional profile at *x* = 7.

**Table 9.** Cross-sectional profile at *x* = 15.


#### *3.3. Unbounded Flow Past a Cylinder*

We examine the case of external flow past a circular cylinder in an unbounded domain [48]. The flow domain is large compared to the dimensions of the cylinder, as shown in Figure 10. The cylinder cross-section has a radius *Rc* = 0.5 and is located at the origin *O* (0,0) of a square domain with dimensions −10 ≤ *x* ≤ 30 and −20 ≤ *y* ≤ 20.

The uniform flow is not perturbed near the inlet, which is far from the body. Therefore, we assume that the inflow velocity *uinlet* behaves like the potential flow *upotential* given as:

$$u\_{\text{potential}} = \left(lL\_{\text{co}}\left(1 - \frac{R\_{\text{c}}^2}{\mathbf{x}^2 + \mathbf{y}^2} + \frac{2R\_{\text{c}}^2 \mathbf{y}^2}{\left(\mathbf{x}^2 + \mathbf{y}^2\right)^2}\right) lL\_{\text{co}}\frac{-2R\_{\text{c}}^2 \mathbf{x}\mathbf{y}}{\left(\mathbf{x}^2 + \mathbf{y}^2\right)^2}\right) \tag{44}$$

or in terms of the stream function:

$$
\psi\_{potential} = \mathcal{U}\_{\rm co} \mathcal{Y} \left( 1 - \frac{R\_c^2}{\mathbf{x}^2 + \mathbf{y}^2} \right) \tag{45}
$$

**Figure 10.** Geometry and boundary conditions for flow past a cylinder.

The boundary conditions for the stream function ψ and vorticity ω on the remaining boundaries (the top and bottom wall, outlet and cylinder surface) are:

• At the inlet, top and bottom walls (*B* = inlet, top and bottom)

$$\begin{array}{rcl} \mathfrak{u}\_{\mathcal{B}} &=& \mathfrak{u}\_{\text{potential}}|\_{\mathcal{B}} \\\\ \psi\_{\mathcal{B}} &=& \psi\_{\text{potential}} \end{array} \Big|\_{\mathcal{B}} $$

• At the outlet

$$\frac{\partial \psi}{\partial \mathbf{x}} = 0$$

$$\frac{\partial \omega}{\partial \mathbf{x}} = 0$$

ψ*cylinder* = 0 *ucylinder* = 0

• On the cylindrical surface

We represent the flow domain with a uniform Cartesian embedded grid, locally refined in the vicinity of the cylinder (Figure 11a). We use 402,068 nodes (0.5 s to create the nodal distribution) and we set Reynolds number to *Re* = 40. We use a time step of *dt* = 10−<sup>4</sup> for the entire flow simulation. It takes 0.3 s to update the solution for each time step, and we obtained a steady state solution after 20,000 time steps. The spatial derivatives (up to 2nd order) are computed in 4.24 s using a C++ code. Additionally, we solve the flow problem using an irregular nodal distribution with 221,211 nodes, locally refined in the vicinity of the cylinder.

**Figure 11.** (**a**) Locally refined Cartesian and (**b**) irregular nodal configurations for the flow past cylinder flow problem.

We compute the following flow parameters: the pressure coefficient (*Cp*) on the body surface, the length (*L*) of the wake behind the body, the separation angle (θ*s*), and the drag coefficient (*CD*) of the body. The drag and lift coefficient of the body are given by:

$$\mathbf{C}\_{\rm D} = \frac{2\mathbf{F}\_{b}\cdot\mathbf{\hat{e}}\_{x}}{l l\_{\rm co}^{2}D} \tag{46}$$

$$\mathbb{C}\_{L} = \frac{2\mathbb{F}\_{b} \cdot \mathbb{a}\_{y}}{l l\_{\infty}^{2} D} \tag{47}$$

where *D* is the characteristic length of the body, *e*ˆ*<sup>x</sup>* and *e*ˆ*<sup>y</sup>* are the unit normal vector in *x*- and *y*direction, respectively, and *F<sup>b</sup>* is the total drag force acting on the body. The total drag force is given as:

$$F\_b = F\_p + F\_f \tag{48}$$

where *Fp*, the pressure drag, can be determined from the flux of vorticity on the surface of the cylinder as:

$$F\_p = -R\_\mathbb{C} \int\_0^{2\pi} v\_f \frac{\partial \omega}{\partial r} \mathbf{\hat{e}}\_0 d\theta \tag{49}$$

The friction drag *F<sup>f</sup>* may be computed from the vorticity on the surface of the body as:

$$F\_f = R\_\mathbb{C} \int\_0^{2\pi} v\_f \omega\_0 d\theta \tag{50}$$

with *<sup>e</sup>*ˆ<sup>θ</sup> <sup>=</sup> <sup>−</sup>sin <sup>θ</sup>*<sup>i</sup>* <sup>+</sup> cos <sup>θ</sup>*j*.Table <sup>5</sup> lists the recirculation length (*Lrec*) of the wake behind the body, the separation angle (θ*s*), and the drag coefficient (*CD*) for *Re* = 40.

The numerical findings obtained by the proposed scheme are listed in Table 10, and are in good agreement with those in the literature [48–51]. The stream function and vorticity contours around the cylinder are illustrated in Figure 12 (as in [48]).

**Table 10.** Comparison of the wake length (*Lsep*), the separation angle (θ*sep*), and the drag coefficient (*CD*) for Reynolds number *Re* = 40.


**Figure 12.** (**a**) Stream function and (**b**) vorticity contours around the body for flow behind a cylinder using a Cartesian embedded nodal distribution.

#### **4. Numerical Results**

In this section, we examine the applicability and reproducibility of the proposed scheme for several flow cases involving complex geometries.

#### *4.1. Vortex Shedding Behind Cylinders*

In the previous example (unbounded flow past a cylinder), we considered the flow past a cylinder in an unbounded domain. In this example, we study the vortex shedding behind a cylinder in a rectangular duct [52] (internal flow), as shown in Figure 13. The duct has length *L* = 2.2 m and height *H* = 0.41 m, while the cylinder is located at point *O* (0.2, 0.2) and has diameter *D* = 0.1 m. The kinematic viscosity of the fluid is set to *vf* = 0.001 m2/s. The Reynolds number is defined as *Re* = *UmD*/*vf*, with the mean velocity *Um* given as *Um*(*x*,*y*,*t*) = 2*U*(0,*H*/2,*t*)/3. The inflow velocity (*x*- direction component) is set to *U*(0, *y*) = 4*Umy H*−*y H*<sup>2</sup> , and for *Um* = 1.5 m/s yields a Reynolds number *Re* = 100. At the outlet, we consider fully developed flow (*du*/*dx* = 0), while at the remaining boundaries we apply no-slip boundary conditions. The total time for the simulation was set to *Ttot* = 8 s. The critical time step is computed and monitored for each time step (the calculation involves stream function and vorticity values) to ensure that CFL conditions are fulfilled.

**Figure 13.** Spatial domain for flow behind a cylinder.

First, we consider a uniform Cartesian embedded grid with grid spacing *h*, defined by the average distance of the boundary nodes on the cylinder circumference. We conducted a grid independence analysis, applying successively refined Cartesian grids, starting from 34,999 (90 nodes on the circular boundary) up to 406,678 (720 nodes on the circular boundary) nodes.

The time needed to create the grid of 406,678 nodes is 0.8 s. We delete the nodes located close to the boundary (with distance less than 0.25 *h*), since they would increase the condition number of the Vandermonde matrix. A Vandermonde matrix with too large condition number would affect the accuracy of the spatial derivative computation and consequently the overall precision of the numerical simulation. A grid independent solution is obtained with 203,890 nodes (360 nodes on the circular boundary). The critical time step computed for this grid resolution is <sup>δ</sup>*tcritical* = 2 <sup>×</sup> 10−3. In the simulations, we set the time step to *dt* = 10−<sup>3</sup> s. For each time step, it takes 0.38 s to update the solution.

Figure 14a shows the stream function contours at time instances *t* = 2, 4 and 8 s, while Figure 14b plots the vorticity isocontours. Figure <sup>15</sup> plots the drag *CD* <sup>=</sup> <sup>2</sup>*Fb*·*e*ˆ*<sup>x</sup> U*2 *mD* and lift *CL* <sup>=</sup> <sup>2</sup>*Fb*·*e*ˆ*<sup>y</sup> U*2 *mD* coefficients, computed with our meshless explicit scheme (as in the previous example). To highlight the applicability of our code for locally refined nodal distributions, we use a uniform Cartesian embedded grid, locally refined in the vicinity of the cylinder. As before, the average distance of the cylinder nodes dictates the nodal spacing *hd* of the Cartesian grid in the refined part of the domain. Nodes of the coarse grid have spacing *hc* = 2*h*, and nodes located close to the cylinder (with distance less than 0.25 *hd*) are deleted since they affect the condition number of the Vandermonde matrix and decrease the accuracy of the numerical results. The flow domain is represented by 321,897 nodes, which ensure a grid independent solution. For the simulations, we set the time step to *dt* = 10−<sup>3</sup> s.

**Figure 14.** (**a**) Stream function and (**b**) vorticity contours for flow around a cylinder at time *t* = 2, 4 and 8 s.

**Figure 15.** (**a**) Drag and (**b**) lift coefficient in time computed by the proposed scheme.

Additionally, to highlight the versatility of the proposed scheme, we examined the flow in the rectangular duct with multiple (seven in total) cylindrical obstacles. The length of the duct was increased to *L* = 4.4 m in order to ensure fully developed flow at the outlet. The boundary conditions applied are the same as before. We use Cartesian embedded and irregular nodal distributions (Figure 16) to represent the flow domain.

**Figure 16.** (**a**) Cartesian embedded and (**b**) irregular nodal distributions used in duct flow problems considering multiple cylindrical obstacles.

We conducted a comprehensive analysis in order to obtain a grid independent numerical solution. For the Cartesian grid we use 279,178 nodes (0.42 s to create the nodal distribution), while for the irregular point cloud we use 353,617 nodes (1.2 s to create the irregular nodal distribution). Both nodal distributions provided a grid independent numerical solution. The total time was set to *Ttot* = 8 s, the time step was set to *dt* = 10−<sup>4</sup> s. For each time step, the time needed to update the solution was 0.32 s. Figure 17 displays the stream function and vorticity contours for *Re* = 100 at time *t* = 1 and *t* = 2 s.

We further demonstrate the accuracy of the proposed scheme, by comparing our results with those computed using the finite element method (FEM). Flow equations were solved using the Incremental Pressure Correction Scheme (IPCS), which is an operator splitting method [53]. For the meshless solver, we use 353,617 nodes, generated using a triangular mesh generator, and we set the time step to *dt* = 10−<sup>3</sup> s. For the finite element model, we solve the flow equations using the generated mesh and FEniCS (https://fenicsproject.org). We compute the Normalized Root Mean Square Error (NRMSE), as defined in Equation (43), for both velocity components (*ux*, *uy*), considering as gold standard the velocity field values calculated using FEniCS. Figure 18 shows the NRMSE for the velocity components for the time interval between *t* = 0 and *t* = 1 s.

**Figure 17.** (**a)** Stream function and (**b**) vorticity contours for flow behind multiple cylinders at time *t* = 1 and *t* = 2 s.

**Figure 18.** Normalized Root Mean Square Error (NRMSE) for the velocity components between the proposed scheme and the finite element solution.

#### *4.2. Flow in a Duct with Irregular Geometry*

Finally, we consider two flow cases in the flow domains shown in Figure 19. The first one has an irregularly shaped obstacle downstream, while in the second the flow domain is split into two branches, forming a bifurcation.

**Figure 19.** Geometry for the flow in the (**a**) bypass and (**b**) bifurcation domain.

The length *L* is set to *L* = 0.12 and 0.041 *m* for the first (Figure 19a) and second (Figure 19b) flow domain, respectively. In both test cases, the height of the duct is set to *H* = 0.0041 m.

The dynamic viscosity is defined as μ = 0.00032 kg/m s, and the fluid density is ρ = 1050 kg/m3. The inflow condition *Uinlet* = 4*Umy H*−*y H*<sup>2</sup> , with *Um* = 0.035 m/s. The Reynolds number *Re* = ρ*UmH*/μ, with *D* being a characteristic length defined separately for each case. At the outlet, the flow is fully developed (*du*/*dx* = 0). For the remaining boundaries we applied no-slip conditions. In both cases, we represent the flow domain with uniform Cartesian embedded grid, and irregular nodal distribution (Figure 20).

**Figure 20.** (**a**) Cartesian embedded and (**b**) irregular nodal distribution for the complex flow geometries.

First, we represent the flow domain with a uniform Cartesian embedded grid. For the irregularly shaped obstacle (bypass), we consider successively denser nodal distributions to obtain a grid independent numerical solution. We start with a grid spacing of *h* = 1/8200, *h* = 1/12,300 and

*h* = 1/16,400 for the coarse, moderate and dense Cartesian embedded grids, resulting in 51,468, 114,387 and 201,468 nodes, respectively (for the denser grid (201,468 nodes) it takes 0.25 s to create the nodal distribution). The total time for the simulation was set to *Ttot* = 30 s and the time step used was *dt* = 10–4 s (for each time step the solution is computed in 0.35 s). Figure 21 shows the iso-contours for the stream function and vorticity field functions, at different time instances: *t* = 0.5 and *t* = 1 s, and for steady state (steady state is reached when, for two successive time instances *t <sup>n</sup>*+<sup>1</sup> and *t <sup>n</sup>* for both vorticity and stream function field values, *NRMSE*/*dt* < 10<sup>−</sup>8).

**Figure 21.** Stream function and vorticity contours for the bypass test case at (**a**) *t* = 0.5 s and (**b**) *t* = 1 s and (**c**) steady state.

We observe that two vortices are formed; the first is located at the inlet of the bypass, the second at the upper domain of the bypass which is moving towards the outlet of the flow domain. Figure 22a plots the *u*- velocity profile, computed at *x* = 0.07 m and *x* = 0.08 m at time *t* = 3 s for the coarse, moderate and dense nodal distributions, while Figure 22b plots the *u*- velocity profile, computed at *x* = 0.07 m and *x* = 0.08 m at time *t* = 3 s for the denser grid used.

Additionally, to highlight the applicability of the method to irregular nodal distributions, we solved the flow equations by representing the flow domain with nodes generated by a 2D triangular mesh generator (we use only the coordinates and not their connectivity). We used 392,122 nodes, which is higher than the number of nodes used in the Cartesian embedded grid. The numerical results obtained were compared against those obtained by the Cartesian grid and they were in excellent agreement.

**Figure 22.** (**a**) *u*-velocity profiles at *x* = 0.07 at time *t* = 3 s for the coarse, moderate and dense nodal distributions for *Re* = 474 (*Um* = 0.035 m/s) and (**b**) *u*-velocity profiles at *x* = 0.07 and 0.08 for flow in the bypass geometry.

For the bifurcated geometry, we use both uniform Cartesian and irregular nodal distributions. In the case of Cartesian embedded grid, we obtain a grid independent numerical solution by using successively denser nodal distributions, with the denser one consisting of 678,967 nodes (for the fine grid it takes 0.68 s to create the nodal distribution). The total time for the simulation was set to *Ttot* = 30 s and the time step used is *dt* = 10−<sup>4</sup> s, which is smaller than the critical time that ensures stability for the explicit solver (the solution in each time step is computed in 0.52 s). The inflow velocity *Uinlet* = 4*Umy H*−*y H*<sup>2</sup> , with *Um* = 0.015 m/s results in Reynolds number *Re* = 212. Figure 23 shows the iso-contours for the stream function and vorticity field functions, at time *t* = 0.5 and *t* = 1 s and for steady state (steady state is reached when, for two successive time instances *t <sup>n</sup>*+<sup>1</sup> and *t <sup>n</sup>* for both vorticity and stream function field values, *NRMSE*/*dt* < 10<sup>−</sup>8).

**Figure 23.** (**a**) Stream function and (**b**) vorticity contours for the bifurcation test case.

To further demonstrate the accuracy of the proposed scheme, we compare our numerical findings for both flow cases (bypass and bifurcation geometry) with those obtained using the finite element method. The flow domain is discretized using 392,122 and 678,967 nodes, respectively, and Taylor-Hood elements (as in the vortex shedding behind cylinders example). We use the IPCS flow solver to numerically solve the flow equations. We compare both numerical solutions, computing the NRMSE for the two velocity components (*ux* and *uy*) on the vertices of the triangular elements (the same nodes are used in the meshless point collocation flow solver). Figure 24 shows the NRMSE for the velocity components (bypass and bifurcation flow examples) for the time interval between *t* = 0 and *t* = 1 s.

**Figure 24.** Normalized Root Mean Square Error (NRMSE) for the velocity components between the proposed scheme and the finite element solution for (**a**) bypass and (**b**) bifurcation flow cases.

#### **5. Conclusions**

In this contribution, we described a meshless point collocation algorithm for solving the stream function-vorticity formulation of N-S equations. We demonstrated our algorithm to work well in complex geometries like the ones shown in Section 4.1 (Vortex Shedding Behind Cylinders) and 4.2 (Flow in a Duct with Irregular Geometry). We demonstrate the accuracy of the proposed scheme by comparing our numerical results with those computed using the finite element method. An important advantage of our method is the ease and speed with which one can construct computational grids for flow domains with irregular shapes.

The meshless scheme based on DC PSE methods to compute spatial derivatives can be used in the case of Cartesian and Cartesian-embedded grids. The method works both efficiently and accurately for uniform Cartesian (embedded) grids and for irregular point clouds. We verified the accuracy of the proposed scheme through the following three benchmark problems: lid-driven cavity flow, backward-facing step and unbounded flow past a cylinder. The results were in good agreement with other published numerical studies.

Our proposed scheme offers several advantages over other commonly used methods:


**Author Contributions:** Conceptualization, G.C.B.; Investigation, G.C.B.; Software, G.C.B.; Writing, G.C.B, B.F.Z. and K.M.; Writing—review and editing, G.C.B., B.F.Z., G.R.J., V.C.L., A.C.R.T, A.W. and K.M.

**Funding:** This research was supported in part by the Australian Government through the Australian Research Council's Discovery Projects funding scheme (project DP160100714).

**Acknowledgments:** This research was supported in part by the Australian Government through the Australian Research Council's Discovery Projects funding scheme (project DP160100714). The views expressed herein are those of the authors and are not necessarily those of the Australian Government or Australian Research Council.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **A New Wall Model for Large Eddy Simulation of Separated Flows**

#### **Ahmad Fakhari**

Institute for Polymers and Composites, Department of Polymer Engineering, Campus of Azurém, University of Minho, 4800-058 Guimarães, Portugal; ahmadfakhari@gmail.com; Tel.: +351-253-510397

Received: 14 September 2019; Accepted: 19 November 2019: date; Published: 28 November 2019

**Abstract:** The aim of this work is to propose a new wall model for separated flows which is combined with large eddy simulation (LES) of the flow field in the whole domain. The model is designed to give reasonably good results for engineering applications where the grid resolution is generally coarse. Since in practical applications a geometry can share body fitted and immersed boundaries, two different methodologies are introduced, one for body fitted grids, and one designed for immersed boundaries. The starting point of the models is the well known equilibrium stress model. The model for body fitted grid uses the dynamic evaluation of the von Kármán constant *κ* of Cabot and Moin (Flow, Turbulence and Combustion, 2000, 63, pp. 269–291) in a new fashion to modify the computation of shear velocity which is needed for evaluation of the wall shear stress and the near-wall velocity gradients based on the law of the wall to obtain strain rate tensors. The wall layer model for immersed boundaries is an extension of the work of Roman et al. (Physics of Fluids, 2009, 21, p. 101701) and uses a criteria based on the sign of the pressure gradient, instead of one based on the friction velocity at the projection point, to construct the velocity under an adverse pressure gradient and where the near-wall computational node is in the log region, in order to capture flow separation. The performance of the models is tested over two well-studied geometries, the isolated two-dimensional hill and the periodic two-dimensional hill, respectively. Sensitivity analysis of the models is also performed. Overall, the models are able to predict the first and second order statistics in a reasonable way, including the position and extension of the downward separation region.

**Keywords:** wall layer model; LES; separated flow; body fitted; immersed boundary

#### **1. Introduction**

Wall bounded turbulent flows, especially at high Reynolds numbers, require a high resolution near the wall, because of the need to solve the thin viscous sub-layer [1–3]. In LES, this resolution is comparable with that required by direct numerical simulation (DNS). As far as Reynolds number increases, the cost of a wall-resolving LES augments with *Re*2.5 (for a discussion, see Piomelli and Balaras [1]). Moreover, for applications where wall roughness is the rule rather than the exception, it is not feasible to describe the wall boundary in a deterministic sense.

The most practical way of simulating a wall-bounded, high Reynolds number flow, is to consider coarse grid, and model the wall shear stress with a wall function. This function is designed to mimic the stress induced by the wall, produced by the adherence condition and ruled by the turbulent field. Several wall layer models have been developed in the past (for a discussion, see Piomelli and Balaras [1] and Piomelli [2]). They are designed to work under certain conditions, and in particular they may suffer in the presence of massive separation. This subject is still challenging for improvement.

In equilibrium stress models, the grid is generated in such a way that the first near-wall computational node is placed in the log region of the boundary layer. So, based on the law of the wall, the instantaneous or mean horizontal velocity can be fitted to determine the wall stress. These models are valid only under the equilibrium assumption and work for both smooth and rough walls.

Deardorff [4] and Schumann [5] first used the concept of wall layer modeling in conjunction with LES. Applying LES on plane channel flow and annuli at large Reynolds numbers, they assumed the existence of an equilibrium-stress layer near the wall and used the outer flow velocity to calculate the wall stress based on the logarithmic law.

Deardorff [4] considered a second order velocity derivative in vertical direction and forced plane averaged stream-wise and spanwise velocity gradients to follow logarithmic law. Schumann [5] obtained the plane averaged wall shear stress from plane averaged velocity at the first grid point, using the iterative method so that the shear velocity is computed from the law of the wall from the averaged velocity. These methods are the least expensive wall layer models. They also provide roughness corrections easily from the logarithmic law modification, which is an important feature in environmental and oceanographic flows. The models suffer in cases of marginal separation, or strong pressure gradient [2].

In zonal approaches (two-layer models, TLMs), stress derives from the solution of a separate set of equations on a finer mesh close to the wall. This model was proposed by Balaras and Benocci [6] who applied turbulent boundary layer equations inside the boundary layer (inner layer) and LES on the outer region. In this model, a finer grid is considered between the wall and the first computational node of the coarser grid. A uniform pressure field is applied from LES for the inner layer, and velocity field from LES is considered at the edge of the Reynolds Averaged Navier-Stokes (RANS) region as its boundary condition. The coupling between the inner and outer regions of the boundary layer is weak, and these models have problems when a perturbation extends from the wall to the outer layer. Balaras and Benocci [6] used an algebraic eddy viscosity to parametrize the near-wall region. In addition to plane channel flow, this method was tested on square ducts and also rotating channel, the cases in which equilibrium stress model was not valid or even failed; this method showed a good accuracy in simulating these flows. These methods, able to simulate equilibrium as well as non-equilibrium flows were not designed to deal with separated flows.

Cabot [7] applied this approach to simulate flow over a backward-facing step with only 10% grid points fewer than those used in former wall-resolved LES. He found that stream-wise pressure gradient has an important effect in boundary layers of TLMs. In this case, mean velocity and skin friction coefficient were predicted well; however, the backward-facing step is just a simple case where separation is induced by the sharp discontinuity in the streamlines.

Cabot and Moin [8] used the wall layer model in conjunction with LES to simulate the flow over a backward-facing step. They considered a zonal approach using thin boundary layer equations (TBLEs) in the inner layer and LES (dynamic SGS model) in the outer region. This way, they were able to use eight uniform cells from the bottom of the wall up to half the height of the step, greatly reducing the total number of cells with respect to a stretched-grid case previously used for the wall-resolved LES.

Finally, in hybrid methods, a single grid is used and the turbulence models are different in inner and outer layers. Detached eddy simulation (DES) is a hybrid approach proposed by Spalart et al. [9] as a method to compute massive separation. According to DES, a stretched grid is used close to the wall to resolve the boundary layer. RANS is applied in the inner and LES in the outer region.

Since there is no zonal interface between the inner and the outer layer, velocity profile is smooth everywhere. Some weaknesses of this method are logarithmic layer mismatch between the two regions and high grid resolution dependency near the wall. In addition, time and length scales of the eddies in the inner layer (modelled by unsteady RANS equations with the Spalart-Allmaras 1-equation turbulent model) are much larger than those computed in the outer region by LES; this leads to much lower resolved stress by LES than the modelled stress (by RANS) at the RANS/LES interface and even farther Piomelli [2].

A review of DES and the two modifications of it, which are delayed detached eddy simulation (DDES) and improved delayed detached eddy simulation (IDDES), can be seen in Mockett et al. [10].

The total shear stress in plane channel flow only depends on the distance from the wall, therefore a reduction of eddy viscosity farther than the RANS/LES interface affects the velocity profile and generates a high gradient at the transition into LES region. This phenomenon is called the DES buffer layer. To eliminate this, Keating and Piomelli [11] added stochastic force in the interface region to accelerate resolved eddy generation, thus obtaining better results. Temmerman et al. [12] also computed the constant required to calculate eddy viscosity in the RANS region in such a way to equate the eddy viscosity of RANS and LES at the interface of their hybrid RANS/LES application.

While the hybrid RANS/LES methods work satisfactorily in flows with instabilities such as those with adverse pressure gradient and concave curvature, they are weak in attached flows, and, in general, in flows with a low level of instability; in these cases a false merging region may appear at the RANS/LES interface. Since the grid near the wall must be resolved, they are the most expensive ones in the wide family of wall layer models. In addition to the three main wall layer frameworks herein discussed, there are other models which are not mentioned here.

The near-wall modelisation must be implemented in the algorithms that solve the governing equations using different strategies. A body fitted (also called boundary fitted) case is the case in which a grid is generated to follow the geometry. Since the grid boundaries coincide with the solid surface of a solid body, boundary conditions are imposed on certain grid nodes or cell faces. This facilitates the computation of all the vectorial quantities as well as of their gradients.

In complex geometry, where a structured grid can hardly follow the boundary complexities, alternative approaches must be employed. Among others, the immersed boundary method (IBM) has proved to be simple, effective and accurate (Mittal and Iaccarino [13]). The domain grid can be either Cartesian or curvilinear. In this approach the cells are cut by the solid surface and the cell boundaries in the general cases do not coincide with the solid boundaries. Hence the fluxes at the solid surfaces are difficult to define. Finally, the velocities in near-wall computational nodes are reconstructed and thus a local frame of reference must be introduced at each near-wall node to identify directions normal and tangential to the immersed boundary.

A number of papers have been published discussing different implementations of immersed boundary methods (see Mittal and Iaccarino [13]). Among others, Roman et al. [14] extended IBM to the general case of curvilinear coordinates, and simulated both steady and unsteady flows in complex geometric configurations showing the flexibility of this methodology when compared to the Cartesian counterpart. However, few papers have dealt with the development of wall layer models in conjunction with immersed boundaries. This is a matter of great importance when flows at a high Reynolds number over complex geometry must be simulated.

Tessicini et al. [15] applied wall layer model LES in the presence of immersed boundaries to simulate the flow past a 25◦, asymmetric trailing edge of a model hydrofoil. The Reynolds number, based on free stream velocity and the hydrofoil chord (C), was chosen equal to *ReC* <sup>=</sup> 2.15 <sup>×</sup> <sup>10</sup>6. They used turbulent boundary layer equations for modelling the wall layer. In general, their results were good, although they had a deviation at the second off-the-wall node, which was considered as the outer boundary for the wall model. This discrepancy was sensitive to the distance of the second node from the wall, and was more evident where the distance was larger.

Posa and Balaras [16] proposed a new near-wall reconstruction model to account for the lack of resolution and provided correct wall shear stress and hydrodynamic forces. They used a zonal approach (TLM), boundary layer equations with a finer grid in the near-wall region (called in this case the full boundary layer FBL) and LES in the outer region. They validated their model to simulate flow around a cylinder and a sphere.

Then, they considered a coarser grid, and set one node inside the boundary layer and neglected the advective term in boundary layer equations. This was justified by the fact that, if the first point off the wall is located inside the boundary layer, neglecting this term does not provide significant errors. They assumed a constant pressure gradient between the first and second nodes, and obtained tangential velocity with two approaches: The reduced diffusion model (RDM) and the hybrid reconstruction method (HRM). The Reynolds number in the cylinder case was *Re* = 300 based on free stream velocity and the cylinder diameter, and in the sphere was *Re* = 1000. The pressure coefficient for all cases was predicted well, while the skin friction was underestimated in all cases, but an improvement was observed using RDM and HRM in comparison with linear reconstruction. The prediction of separation was also improved a lot using these two approaches.

Chen et al. [17] also proposed a wall layer model based on turbulent boundary layer equations at high Reynolds numbers for implicit LES in the presence of immersed boundaries. First, they tested it on a turbulent channel flow in the range of Reynolds numbers (based on shear velocity) from *Re<sup>τ</sup>* = 395 to *Re<sup>τ</sup>* = 100, 000. They used a minimum of 20 cells inside the boundary layer for lower Reynolds numbers and 40 cells for higher values. Inside the boundary layer (inner region), the pressure gradient and friction are dominant, thus advection is neglected in cases where there are slow changes in wall parallel direction and sharp changes in wall normal direction. The fact that the changes in wall normal direction are sharper at higher Reynolds numbers requires more resolution for capturing the flow in this direction.

They also simulated flow over a backward-facing step at a Reynolds number based on the step height *Reh* = 5000 in a Cartesian grid. Finally they simulated the flow over periodic hills at *ReH* = 10,595 based on hill height, using different resolutions, and they obtained comparatively good results. To summarize, different models have an acceptable accuracy under some special conditions; therefore, more general models are needed to work well in different situations characterized by attached as well as separated flow conditions.

In the present study, attempts are made to overcome some of the difficulties mentioned above, proposing wall layer model to be used in conjunction with LES, suited for both attached and detached flows. Since in engineering applications a combination of body fitted solid walls and immersed boundaries is often employed, a comprehensive model must include the two aspects. Here the models are designed to work in body fitted cases and in the presence of immersed boundaries, respectively. The solver for the simulations is the LES model (LES–COAST) developed over the years at the Laboratory of Environmental and Industrial Fluid Mechanics of the University of Trieste, Italy, and which has successfully been used in a number of environmental real-scale applications (see among others Roman et al. [18] and Galea et al. [19]). The paper is organized as follows: In Section 2 the governing equations are described and the numerical method employed; in Section 3, the wall layer model for body fitted grids, its improvement and application to a case characterized by the presence of a boundary layer together with massive separation, namely the case of the flow over an isolated hill, are shown; in Section 4 the model in the presence of the immersed boundary is presented, together with its application to the case of single hill and to the case of periodic hill. Concluding remarks are given in Section 5.

#### **2. Governing Equations**

The incompressible form of the filtered Navier-Stokes equations read as:

$$\begin{cases} \frac{\partial \overline{u}\_{\overline{j}}}{\partial x\_{\overline{j}}} = 0, \\\\ \end{cases} \tag{1}$$

$$\frac{\partial \overline{u}\_{i}}{\partial t} + \frac{\partial (\overline{u}\_{i} \overline{u}\_{j})}{\partial \mathbf{x}\_{j}} = -\frac{1}{\rho\_{0}} \frac{\partial \overline{p}}{\partial \mathbf{x}\_{i}} - \frac{\partial \mathbf{r}\_{lj}}{\partial \mathbf{x}\_{j}} + \nu \frac{\partial^{2} \overline{u}\_{l}}{\partial \mathbf{x}\_{j} \partial \mathbf{x}\_{j}}.\tag{2}$$

The filter operation (denoted by overbar) allows reproduction of the space–time evolution of the large scales of motion, which are anisotropic and energetic, while the effect of the small sub-grid scales (SGS) is contained in the SGS stress terms (*τij*) in the momentum equation. It is here modeled through an SGS eddy viscosity model:

$$
\pi\_{ij} - \frac{\delta\_{ij}}{3} \pi\_{kk} = -2\nu\_T \overline{\mathfrak{S}}\_{ij\prime} \tag{3}
$$

where *Sij* is the resolved-scale tensor, defined as:

$$\overline{S}\_{i\dot{j}} = \frac{1}{2} (\frac{\partial \overline{u}\_i}{\partial x\_{\dot{j}}} + \frac{\partial \overline{u}\_{\dot{j}}}{\partial x\_{\dot{i}}})\_{\prime} \tag{4}$$

*ν<sup>T</sup>* is the SGS turbulent viscosity, also known as eddy viscosity. The standard Smagorinsky model [20] is based on the equilibrium assumption. Considering the length scale *l* ∼ Δ, the eddy viscosity can be written as:

$$\nu\_T = (\mathbb{C}\_s \overline{\Delta})^2 |\overline{\mathbb{S}}| \,. \tag{5}$$

where *Cs* is the constant of the model (the Smagorinsky constant), and |*S*| = 2*SijSij* is the contraction of the strain rate tensor of the large scales, *Sij*. Finally, the SGS stresses are calculated as:

$$
\pi\_{i\dot{j}} = -2\nu\_T \overline{\bf S}\_{i\dot{j}}.\tag{6}
$$

The Smagorinsky constant is commonly considered to range between 0.065 and 0.2; Lilly [21] found a theoretical value of 0.18, but the Smagorinsky constant depends on the type of the flow, e.g., in shear flows it must be declined to 0.1 [22]. A comprehensive study of the Smagorinsky constant in plane channel flows can be found in Stocca [23]. The filter width is proportional to the grid size in all directions, and is equal to Δ = (Δ*x*Δ*y*Δ*z*)1/3. Near the walls, where the eddy viscosity is expected to vanish, the van Driest damping function is adopted.

In the dynamic Smagorinsky model proposed by Germano et al. [24], the coefficient is calculated dynamically by defining an additional test filter (denoted by a caret) with a width Δ larger than the grid filter width Δ; here Δ = 2Δ. The dynamic Smagorinsky coefficient is computed in a least-square sense over tensor components:

$$\mathbb{C} = -\frac{1}{2} \frac{\langle L\_{ij} \mathcal{M}\_{ij} \rangle}{\langle \mathcal{M}\_{ij} \mathcal{M}\_{ij} \rangle},\tag{7}$$

where <> denotes spatial average and *Lij* resolved turbulent stresses:

$$\mathcal{L}\_{i\bar{j}} \equiv \overleftarrow{\widetilde{u}\_{i}} \overleftarrow{\widetilde{u}\_{\bar{j}}} - \overleftarrow{\widetilde{u}\_{i}} \overleftarrow{\widetilde{u}\_{\bar{j}}} \quad \mathcal{M}\_{i\bar{j}} \equiv \mathfrak{L} \left( \Lambda^{2} | \overline{\widetilde{\mathcal{S}}} | \overline{\widetilde{\mathcal{S}}}\_{i\bar{j}} - \widehat{\Lambda}^{2} | \overline{\widetilde{\mathcal{S}}} | \widehat{\widetilde{\mathcal{S}}}\_{i\bar{j}} \right) \tag{8}$$

and then eddy viscosity can be calculated as below.

$$
\nu\_T = \mathbb{C}\dot{\Delta}^2|\overline{\mathbb{S}}|.\tag{9}
$$

The advantage of the dynamic model over the standard one is that the eddy viscosity vanishes near the walls and in laminar flows [24]; so, the model allows to treat transitional and wall bounded flow without the need to use special treatments of the constant. In order to follow the solid boundaries, the numerical model uses the curvilinear-coordinate form of the Navier-Stokes equations frame. Spatial discretization in the computational space is carried out using second order central finite differences. Temporal integration is carried out by using the second order accurate Adams-Bashforth scheme for the convective term and

the implicit Crank-Nicolson scheme for the diagonal viscous terms. A collocated grid is considered where pressure and Cartesian velocity components are defined at the cell centers, and the volume fluxes are defined at the midpoints of the cell faces. For the numerical model see Zang et al. [25], whereas for the implementation of the SGS models see Armenio and Piomelli [26]. A new parallel version of the model has been developed over the years and a version suited for environmental applications (LES-COAST) is available [27].

#### **3. Wall Layer Model for Body Fitted Geometry**

A basic equilibrium wall stress model is present in the solver. The wall stress is obtained from instantaneous horizontal velocity at the first off-wall centroid based on the law of the wall:

$$\mu\_p^+ = \begin{cases} \frac{1}{\kappa} \ln(y\_c^+(1)) + B \text{ if } y\_c^+(1) > 11\\ \quad y\_c^+(1) & \text{if } y\_c^+(1) \le 11 \end{cases} \tag{10}$$

where

$$
\mu\_p^+ = \frac{u\_p}{u\_\tau} = \frac{\sqrt{u^2 + w^2}}{u\_\tau}
$$

is the modulus of instantaneous non-dimensional velocity at the first off-wall computational node *P*, which has a distance *yc*(1) from the wall; *κ* = 0.41 is the von Kármán constant, and B varies between 5 and 5.5 [2]. Here *B* = 5.1 was used for all the simulations. The friction velocity *u<sup>τ</sup>* is calculated from the velocity *up* at each near-wall computational node, and depends on the distance from the wall *y*<sup>+</sup> *<sup>c</sup>* (1), either from the linear or logarithmic law. Then, wall shear stress *τ<sup>w</sup>* is calculated from friction velocity; *τ<sup>w</sup>* = *ρu*<sup>2</sup> *<sup>τ</sup>*. More details of the procedure can be found in Fakhari [28].

In addition to this, the knowledge of the contraction of the resolved strain rate tensor is required in order to use an SGS eddy viscosity at the wall in the Smagorinsky model. Since the tangential velocity at the wall is not determined, the evaluation of the leading terms of *Sij* becomes increasingly wrong with decreasing grid resolution.

To overcome this problem, the leading terms of the strain rate tensor *Sij* are set analytically based on the location of *yc*(1). If the first point *P* is in the logarithmic region, the leading terms of the strain rate tensor are:

$$
\overline{S}\_{12} = \frac{\mathsf{u}\_{\mathsf{T}}}{\mathsf{x}y\_{\mathsf{c}}(1)} \frac{\overline{\mathsf{u}}}{\mathsf{u}\_{\mathsf{p}}} + \frac{\partial \overline{\mathsf{v}}}{\partial \mathsf{x}}, \quad \overline{S}\_{32} = \frac{\mathsf{u}\_{\mathsf{r}}}{\mathsf{x}y\_{\mathsf{c}}(1)} \frac{\overline{\mathsf{w}}}{\mathsf{u}\_{\mathsf{p}}} + \frac{\partial \overline{\mathsf{w}}}{\partial \mathsf{z}}.\tag{11}
$$

Consequently the value of the eddy viscosity near the wall adjusts consistently with the imposed stress. The wall layer model mentioned above is for smooth surfaces. If we deal with a rough surface with the roughness height *y*0, the velocity profile is:

$$u\_p^+ = \frac{1}{\kappa} \ln(\frac{y\_\mathfrak{c}(1)}{y\_0}) \quad y\_\mathfrak{c}(1) > y\_0. \tag{12}$$

The same procedure based on Equation (11) can be used to modify the leading terms of *Sij*.

This wall layer model is very economic and reasonably accurate in attached flows. Its main drawback is the simulation of separated flows [1,2] as will be discussed with details in Section 3.2. The logarithmic law does not capture separation; therefore, in detached flows a stretched grid near-wall is required to have more resolution there to set the near-wall computational nodes inside the viscous layer.

Cabot and Moin [8] used wall layer models with LES to simulate flow over a backward-facing step. They considered a zonal approach using the thin boundary layer equation (TBLE) in the inner layer and LES with dynamic SGS model in the outer region. They used 8 uniform cells from the bottom of the

wall up to half the height of the domain, much less than in a former stretched-grid which was used for a wall-resolving LES. The authors introduced a dynamic treatment of the von Kármán constant to equate the stress predicted by TBLE in the inner layer (*κyu <sup>τ</sup> Sij*) in a least-square sense to the total one (resolved + SGS) computed by LES in the outer region (−*u iuj* <sup>−</sup> *<sup>τ</sup> ij*). The dynamic *<sup>κ</sup>* was used to calculate the eddy viscosity in the inner region, *<sup>ν</sup><sup>t</sup>* <sup>=</sup> *<sup>κ</sup>yuτD*, *<sup>D</sup>* = [<sup>1</sup> <sup>−</sup> *exp*(−*y*+/*A*+)]<sup>2</sup> in which *<sup>A</sup>*<sup>+</sup> <sup>=</sup> 17, but in that case since *<sup>y</sup>*<sup>+</sup> *<sup>A</sup>*<sup>+</sup> they considered *<sup>D</sup>* as unity. In Cabot and Moin [8] the use of dynamic *<sup>κ</sup>* was justified by the fact that their wall model (TBLE) would carry shear stress both in the RANS eddy viscosity and the RANS convective terms, thus they needed a reduced RANS eddy viscosity, and by using the dynamic *κ* which had the values less than the von Kármán constant, and reduced the shear stress.

#### *3.1. Model Optimization for Body Fitted Geometry*

In this work the context is different. It is known that the equilibrium stress wall function works well in equilibrium flows, and has a drawback in the presence of separation [1,2]. In addition, in separated regions the velocity profile does not follow the standard log law; therefore, a modification of the standard log law is required. Here, the aim of using dynamic *κ* is not to reduce stress, but to have it deviate from the standard log law. The dynamic *κ* is calculated here to match analytical stress from the boundary layer and the stress computed with dynamic Smagorinsky using a coefficient:

$$\kappa = -coeff \cdot \frac{}{} \,\tag{13}$$

in which <> denotes averaging over the directions of homogeneity. First, a simulation of a periodic open channel flow was carried out in order to obtain a cross-sectional plane of data instantaneously to be used as the inflow over the hill, as will be discussed in Section 3.2. For the simulation of periodic open channel flow, a plane averaged approach was performed to compute *coef f* in such a way that the dynamic *κ* becomes equal to the von Kármán constant. In LES–COAST, *coef f* = 0.4 was obtained, and this will be explained in Section 3.2.

#### *3.2. Application of the Wall Layer Model for Body Fitted Geometry*

The wall layer model is here tested on a detached flow case. The accuracy of the wall layer model is checked and then the model is improved to increase the accuracy of the wall function. Most of the wall layer models have weakness in capturing the points of flow separation and re-attachment, separately, especially when the slope of a solid wall changes gradually. Thus, this work focuses on flow simulation over a single two-dimensional hill, which, from one side, is a very challenging problem for wall layer models and, from the other side, literature data are available.

The geometry and boundary conditions are collected from ERCOFTAC classic database, environmental flows area, case 69 (http://ERCOFTAC/case69) [29]. The hill height is *H* = 0.117 m and the amplitude of the hill is *a* = 3*H* (Figure 1a). The wind tunnel experiments were carried out by Khurshudyan et al. [30]. A wall-resolved LES of turbulent boundary layer over a 2D hill was also performed by Chaudhari et al. [31] and Chaudhari [32] for two different aspect ratios *a* = 3*H* and 5*H*. Since they faced difficulty in validating the reattachment point for the aspect ratio of *a* = 3*H* with the experiment, this case was specifically chosen for this study as a challenging problem.

(**a**) Hill geometry in which *H* is the hill height, *a* is the aspect ratio, with courtesy of ERCOFTAC, Environmental Flows [29].


(**b**) Computational domain.

**Figure 1.** The hill geometry and the domain grid with resolution of 96 × 20 × 32 cells in streamwise, wall normal and spanwise directions, respectively, for simulation of flow over hill using dynamic *κ* (body fitted approach).

A domain dimension with streamwise length *Lx* = 7.12 *δ*, height *δ* and width (in spanwise direction) *Lz* = *δ* is considered where *δ* = 1 m.

The boundary conditions are given as follows: At the inlet, inflow is obtained from simulation of a periodic open channel flow with the same cross-sectional grid resolution; a cross-sectional plane was considered to collect the instantaneous data at each time step after the channel flow reached a steady state and the collected data was used as inflow for the flow simulation over the hill. The roughness characteristic height of the wall for this case is *y*<sup>0</sup> = 1.57 mm, shear velocity *u<sup>τ</sup>* = 0.178 m/s and the Reynolds number based on shear velocity is *Re<sup>τ</sup>* = *uτδ*/*ν* = 1187; at the outlet, radiative boundary conditions were given; at the upper boundary the free slip condition is imposed; periodic conditions are applied along the spanwise direction. The dynamic SGS model wis employed for simulations with body fitted grid, with the constant obtained from spanwise averaging; simulations were run with a fixed Courant number equal to 0.2.

The results of this simulation, which was initially performed using 96 × 32 × 32 cells in streamwise, wall normal and spanwise directions, respectively, show that the wall layer model described in Section 3, with a fixed value of the von Kármán constant, is not able to capture flow separation when a uniform grid is employed (where *y*<sup>+</sup> *<sup>c</sup>* (1) = 18.5). Conversely, stretching the grid to set the first computational node off the wall inside the viscous layer allows obtaining flow separation. Figure 2 shows a comparison between the experiment by Khurshudyan et al. [30] and three simulations: The case with uniform grid, and two cases with stretched grids in which the first centroids are at *y*<sup>+</sup> *<sup>c</sup>* (1) = 4 and *y*<sup>+</sup> *<sup>c</sup>* (1) = 7, respectively. All the grids share the same number of grid cells and the resolution in streamwise and spanwise directions, while the two grids are stretched vertically using the Vinokur approach [33] to have the largest cells at the top of the domain with <sup>Δ</sup>*y*<sup>+</sup> <sup>≈</sup> 64. The free surface velocity is used as *Uref* to normalize the velocity plots.

downstream from the hill.

**Figure 2.** Mean streamwise velocity and root mean square (RMS) of its fluctuations obtained from simulation using 32 cells in the wall normal direction for the cases with *y*<sup>+</sup> *<sup>c</sup>* (1) = 4 and *<sup>y</sup>*<sup>+</sup> *<sup>c</sup>* (1) = 7 compared to the experiment by Khurshudyan et al. [30]; the hill starts from *x* = −*a* and ends at *x* = *a*.

For the cases with fixed value of *κ*, the starting point of flow separation is at *x* = 0.5*a* using the stretched grids; the reattachment point is predicted well in these cases. This test reveals that the resolutions in streamwise and spanwise directions are sufficient to capture boundary layer separation. Overall, separation is predicted with a grid that for the vertical resolution near the wall resembles that of a wall-resolving LES. This is opposite the expectations from a wall layer model, which is supposed to work well with a poor resolution in the wall region.

The improved model which makes use of the dynamic evaluation of the von Kármán constant *κ* is tested on the uniform mesh. In this case separation is captured; at *x* = 0.5*a* the first centroid is too far from the wall to capture the starting point of separation, while at the other locations, separation is predicted correctly as well as boundary layer reattachment.

Applying the dynamic evaluation of *κ* in the case of the stretched grids reveals that in the case with *y*<sup>+</sup> *<sup>c</sup>* (1) = 4 the mean velocity profile at *x* = 0 demonstrates a better acceleration at the first centroid compared to the case with a fixed value of *κ*; moreover, at *x* = 0.5*a*, which is the starting point of separation, the negative velocity displays a larger magnitude at the few cells above the wall, which is closer to the experimental values, while a bit farther from the wall, the case with constant *κ* is in a better agreement with the experiment. In the separation region and even after reattachment, the mean velocity at the near-wall centroid is smaller than using the fixed value of *κ*, and the reattachment point is predicted correctly for *y*+ *<sup>c</sup>* (1) = 4. The case with *y*<sup>+</sup> *<sup>c</sup>* (1) = 7 demonstrates higher velocity at the near-wall computational node using dynamic *κ* in the beginning of separation (*x* = 0.5*a*), while at *x* = *a* and beyond, the velocity gets smaller than the fixed *κ*, and the reattachment point is delayed.

The root mean square (RMS) of streamwise velocity fluctuations exhibits an overestimation close to the wall for stretched grids using both fixed and dynamic *κ*, while the uniform grid with dynamic *κ* slightly underestimates this quantity in most of the regions. This overestimation is due to the fact that the equilibrium stress model for the rough surface computes the shear velocity and wall shear stress from the logarithmic law (Equation (12)), while the near-wall computational nodes in the stretched grids are located inside or near the viscous sub-layer.

Figure 3 contains mean velocity profiles related to the present simulations, the wall-resolved LES by Chaudhari et al. [31] and the experiment by [30] at three critical locations: The hill crest, the starting point of separation and just beyond the reattachment point. For the wall-resolved LES case, the data were obtained by extraction of significant points from their plots. The mean velocity profile shows that at the top of the hill, the flow acceleration for wall-resolved LES agrees well with the experiment. Moreover, the point of separation is captured well, but the flow experiences an early reattachment at *x* = 2*a*.

The wall-resolved LES underestimates root mean square of streamwise velocity fluctuations except at the start of separation *x* = 0.5*a* close to the wall, as depicted in Figure 3b.

Figure 3a shows that, overall, the present results exhibit a reasonable accuracy, also in view of the grid herein employed. It is to be noted that the reference wall-resolved LES by Chaudhari et al. [31] with 495 × 70 × 136 numerical grids (48 times more expensive than the current wall layer model simulation) had an early prediction of reattachment point, while in this simulation there is a better agreement of the reattachment point with the experiment.

Successively, a different grid resolution was applied in the vertical direction (96 × 20 × 32 cells in streamwise, wall normal and spanwise directions, respectively) on the same geometry with the same boundary conditions. The domain grid is shown in Figure 1b, with the first computational node situated in the logarithmic region at *y*<sup>+</sup> *<sup>c</sup>* (1) ≈ 30.

The simulations were carried out as follows. First, an open channel flow at *Re<sup>τ</sup>* = 1187 based on the friction velocity *u<sup>τ</sup>* and the channel height was simulated, with a grid resolution of 40 × 20 × 32 cells in streamwise, wall normal and spanwise directions, respectively. Before the simulation, a test was carried out to compute the dynamic (Smagorinsky) model constant and also *coef f* of Equation (13) applying a plane average approach, and using *coef f* = 0.4 as a target value of the von Kármán constant *κ*.

Thereafter, in another attempt, spanwise averaging of the numerator and denominator of Equation (13) with *coef f* = 0.4 and also the dynamic model constant was performed. Note that *κ* must be positive since it is used in the denominator to calculate friction velocity and strain rate tensor (see Section 3). Setting the minimum value of 0.06 avoided unrealistic values of velocity due to the near-zero or negative value of *κ*. Moreover, larger threshold values were tested to check the sensitivity to this lower bound; the analysis showed that it does not affect the results, hence the method is robust from this aspect.

(**b**) Root mean square of streamwise velocity.

**Figure 3.** Mean streamwise velocity and RMS of its fluctuations at three critical positions from simulation using 32 cells in the wall normal direction for the cases with *y*<sup>+</sup> *<sup>c</sup>* (1) = 4 and *<sup>y</sup>*<sup>+</sup> *<sup>c</sup>* (1) = 7 using constant and dynamic *k*, and uniform grid with dynamic *κ* compared to resolved LES by Chaudhari et al. [31] and the experiment by Khurshudyan et al. [30] at the hill crest, the beginning of separation and just before the reattachment points.

Figure 4a shows time- and spanwise-averaged velocity profiles at different positions along the streamwise direction. As expected (Figure 4a), the current resolution is too coarse, since the first computational node is out of the separation region; this explains the absence of separation at *x* = 0.5*a*. However, separation occurs at *x* = 2*a* which has a delay with respect to both the wall-resolved LES and the experiment, and reattachment is obtained at *x* = 3*a* which is closer to the experiment, while the wall-resolved LES exhibits an earlier reattachment.

Figure 4b shows the root mean square of streamwise velocity fluctuations. Generally there is an underestimation of the velocity fluctuations especially right before and at the starting point of the separation region. Chaudhari et al. [31] also underestimated the RMS of the streamwise velocity fluctuations uphill, and overestimated it in the separation region.

As was discussed in the previous section, the use of dynamic *κ* in this method is different than in Cabot and Moin [8]. To understand this better, instantaneous values of the dynamic *κ* along *x* for the flow over the single hill of this simulation in Figure 5. Since *κ* is in the numerator to calculate the shear velocity and wall shear stress (*u<sup>τ</sup>* = *u* ∗ *κ*/*ln*(*y*/*y*0)), on top of the hill, there is an increase of *κ*, which gives a better acceleration rather than the standard log law (*κ* = 0.41). Then, at the starting point of separation, *κ* reduces and therefore the wall shear stress goes close to zero. Thereafter, inside the separation region, at most of the spots *κ* is large, leading to a negative value of the wall shear stress with a larger magnitude, and finally it goes close to 0.41 after reattachment. In this way, the equilibrium stress model is applicable on separated flows.

(**b**) Root mean square of streamwise velocity fluctuations.

**Figure 4.** Mean streamwise velocity and RMS of its fluctuations obtained from simulation using 20 cells in the wall normal direction, applying dynamic *κ* compared to the experiment by Khurshudyan et al. [30]; the hill starts from *x* = −*a* and ends at *x* = *a*.

**Figure 5.** Instantaneous values of dynamic *κ* on a stream line in the current simulation of flow over the single hill.

#### **4. Wall Layer Model for Immersed Boundary Methodology (IBM)**

This part of the work takes advantage of the model developed in Roman et al. [34]. The velocity reconstruction method is briefly recalled here. With reference to Figure 6, the velocity at the projection point (PP) is interpolated from velocities at its surrounding points (empty square points in Figure 6). Then, shear velocity at the PP point is computed. Finally, based on the distance of the first centroid off-the-wall, i.e., the IB node, from wall in wall units, the velocity at the IB node is calculated using the law of the wall:

$$\overline{u}\_{IB} = \begin{cases} \overline{u}\_{PP} - \frac{1}{\kappa} \sqrt{\frac{\tau\_w}{\rho}} \ln(\frac{d\_{PP}}{d\_{IB}}) & \text{if } \ d\_{IB}^+ > 11\\ \frac{d\_{IB} \underline{u} \tau^2}{\nu} & \text{if } \ d\_{IB}^+ \le 11 \end{cases} \tag{14}$$

Considering that the velocity is known at the PP point, a parabolic interpolation is carried out to compute the wall normal velocity at the IB node:

$$
\overline{u}\_{n,IB} = \overline{u}\_{n,PP} \frac{d\_{IB}^2}{d\_{PP}^2}.\tag{15}
$$

A reconstruction of shear stress at the cell face is also done using a RANS-like eddy viscosity. The eddy viscosity at the IB nodes is calculated analytically from mixing the length theory from the equation

$$
\omega\_{\ell} = \mathbb{C}\_{w} \kappa \mu\_{\tau} d\_{IB},
\tag{16}
$$

where *κ* is the von Kármán constant, and *Cw* is an intensification coefficient to be determined such that the Reynolds shear stress is a fraction of the wall shear stress near the wall (where *ν<sup>T</sup>* is set). Consequently, this coefficient can be written as:

$$\mathbb{C}\_{w} = \frac{\tau\_{F} d\_{F}}{\tau\_{w} d\_{IB}},$$

where the index *F* denotes a quantity calculated at the cell face, *dF* is the distance between the cell face and the immersed surface. A more detailed description can be found in Roman et al. [34] and Roman [35]. This wall layer model is very economic since the velocity is just interpolated from the projection point PP, and works well in attached flows using a very coarse grid [34]. Since the interpolation is based on

the logarithmic law, and the velocity direction at the IB node always follows the velocity at the PP, this method has a drawback in separated flows even setting the IB node inside the viscous layer.

**Figure 6.** Discretization of a fluid–solid interface with the immersed boundary method by Roman et al. [34]. Solid squares, empty squares and empty circles represent solid, fluid and IB nodes respectively. Reproduced with permission from AIP Publishing [4714120594491].

Simulation of the flow over the hill, applying Cartesian and curvilinear grids, using the Roman et al. [34] model in its original formulation, does not allow to obtain acceptable results, since the flow does not separate even with increasing resolution close to the immersed boundary and with setting the of the IB node inside the viscous layer [28]. Here, first the Roman et al. [34] model on a standard open channel flow is re-analysed and successively the flow over the hill is investigated.

#### *4.1. Calibration of the Wall Layer Model with IBM*

In order to check the accuracy of the current IBM, an open channel is considered with a resolution of 48 × 20 × 32 cells in the streamwise, wall normal and spanwise directions, respectively, and a dimension of 7.12*δ* × 1*δ* × 1*δ* in which *δ* is the height of the open channels. The lower wall is reproduced with immersed boundaries and the simulation is carried out imposing a non-dimensional driving pressure gradient (*dp*/*dx* = 1). The friction Reynolds number is *Re<sup>τ</sup>* = 2000, giving the IB node as located at *y*<sup>+</sup> = 50.

The velocity profile scaled with the friction velocity collapses over the theoretical line as has already been shown in Roman et al. [34]. However, considering the non-scaled averaged velocity, the method underestimates the velocity compared to the logarithmic law at the same rate as the shear velocity, thus demonstrating a loss of momentum. In particular, the wall shear stress is underestimated by approximately 16% in the simulation. This has to be attributed to the current implementation of IBs, which does not conserve momentum perfectly. This is a common issue in a number of IB methodologies, although more recent implementations appear more conservative (Kim et al. [36], Meyer et al. [37] and Rapaka and Sarkar [38]).

After this observation, several cases of open channel geometries are created sharing the same dimension and resolution; the only difference stands in the fact that the immersed boundary surface was moved slightly to check whether the position of the IB node affects the results. Five cases are considered:

Cases 1 to 4 in which the IB node was located in the logarithmic region, and case 5 in which it was in the viscous layer. The positions of the immersed boundary for the five cases are displayed in Figure 7 and details are given in Table 1. Simulations are hence carried out for all cases at *Re<sup>τ</sup>* = 2000, with the same boundary conditions as the original case. The results show an underestimation of friction velocity for cases 1 to 4 in which the IB node is set in the logarithmic region, and an overestimation of shear velocity for case 1 which has the IB node in the viscous layer (see Table 1), indicating that the IBM of Roman et al. [34] is sensitive to the position of the immersed surface with respect to the grid line, and it is geometry dependent. Figure 8a displays a non-scaled mean velocity profile for all cases compared to the theoretical velocity based on the law of the wall.

Apart from the shear velocity deviation, the non-dimensional velocity profile compared well with the wall law for the first four cases (Figure 8b). A calibration of the IBM is carried out to avoid momentum loss. For this purpose, the coefficient which is used to calculate eddy viscosity at the IB nodes (*Cw*) in Equation (16) is varied to reach the target value of the wall stress.

**Figure 7.** Five different channels using the immersed boundary as the lower wall in different positions, cases 1 to 5 in order from left to right. Filled squares display IB node.

**Table 1.** Different geometries description of IB cases and shear velocity obtained from simulation.


**Figure 8.** Mean velocity for IBM cases before and after calibration versus law of the wall.

**Figure 9.** Cubic line fitted to the calibrated values computed for the 5 cases.

Table 2 shows the optimal coefficients and friction velocities obtained after running the simulations for all five cases. Further, it has been verified that changes in the coefficients by 10% produced very small variations in the shear velocity, thus showing the robustness of the method.


**Table 2.** Optimal coefficient for different positions of the IB.

The simulations are then repeated at *Re<sup>τ</sup>* = 4000 to check the accuracy of the calibration; it appeared insensitive to variations in the Reynolds number [28]. The coefficient can be written as a function of the

non-dimensional distance from the wall (*dIB*/*dy*), which is within the interval of 0–1. As it can be seen in Figure 9, a cubic law can be fitted to the numerical points is written below.

$$\mathcal{C}\_{\varepsilon} = -6.9576 \left( \frac{d\_{IB}}{dy} \right)^{\frac{\gamma}{3}} + 15.051 \left( \frac{d\_{IB}}{dy} \right)^2$$

$$-10.902 \left( \frac{d\_{IB}}{dy} \right) + 3.8493 \tag{18}$$

The dimensional and non-dimensional mean velocity profiles after calibration for all the cases are shown in Figure 8c,d, respectively. All cases are non-dimensionalized with shear velocity obtained from the mean velocity at the IB nodes based on the law of the wall. An improvement can be observed after calibration, especially for case 5 in which the IB node is located close to the immersed boundary. Finally the RANS-like eddy viscosity at the IB nodes can be calculated from the calibrated coefficient *Cc*, which is obtained from the cubic fitting:

$$
\nu\_T = \mathbb{C}\_c k \mu\_\tau d\_{IB}.\tag{19}
$$

Calibration of the coefficient allows to improve the prediction of the wall shear stress, hence, the velocity profile. The results of the simulations for all cases show that the mean velocity and root mean square of velocity fluctuations are in good agreement with the law of the wall and also the DNS of plane channel flow of Hoyas and Jimenez [39] for a channel at *Re<sup>τ</sup>* = 2000 [28]. Only for cases 3 and 4 in which IB nodes are at the beginning of the log-region (*d*<sup>+</sup> *IB* ≤ 50) is there a deviation of the mean velocity profile. However, this is a case that can be hardly encountered in real-scale flows, characterized by very high values of *Re*. It is important to point out that calibration may not be necessary in the presence of IB methodologies strictly conservative for momentum.

While the wall layer model predicts the velocity profile well, it underestimates the wall normal velocity fluctuations. Thus, random fluctuations near the wall are added to improve the model. In hybrid LES/RANS and DES, stochastic forcing is needed to reduce the effect of excessive damping coming from the RANS eddy viscosity [11,12]. In the present model, stochastic forcing is more likely required because of the grid coarseness, which is, however, a standard in simulations for engineering purposes.

Taylor and Sarkar [40] used random stochastic forcing in the wall normal direction after finding that the near-wall model (NWM) LES with the dynamic eddy viscosity model (DEVM) underrated vertical velocity fluctuations in their body fitted plane-channel-flow simulation.

This forcing term was applied to the right-hand side of the momentum equation in the wall normal direction to increase velocity fluctuations in this direction and control the Reynolds stress in order to allow it to fit the logarithmic velocity profile. This term is written as:

$$f\_{\mathcal{Y}}(x, y, z) = \pm \mathbb{R} \* A(y),\tag{20}$$

where *R* is a random number between 0 and 1, and *A*(*y*) is an amplitude function. At each time step, the latter can be obtained from summation of the amplitude at the previous iteration, and another term including the error function:

$$A(y)^{n+1} = A(y)^n + \frac{\mu\_{\mathsf{T}}\epsilon(y)}{\mathsf{T}},\tag{21}$$

where the error function is the difference between the resolved shear stress and the theoretical value based on the logarithmic law:

$$\epsilon(y) = \frac{ky}{u\_{\tau}} \left( \frac{d \langle u \rangle^2}{dy} + \frac{d \langle w \rangle^2}{dy} \right)^{\frac{1}{2}} - 1,\tag{22}$$

where *τ* is a relaxation time. This approach is correct if the first computational node is in the logarithmic region since in Equation (22) the error is established upon *ky*/*uτ*, which is the theoretical resolved shear stress based on the logarithmic law. However, this is not a serious issue, since the model is designed to work for high *Re* number flows. Finally, the sign of the function is computed to reduce the correlation between streamwise and vertical velocity fluctuations (*u* and *v* ) if the error is positive, and to enhance this correlation if the error is negative:

$$\begin{aligned} \text{sgn}\left(f\_{\mathcal{Y}}(\mathbf{x}, \mathbf{y}, \mathbf{z})\right) &= -\text{sgn}\left(\frac{d\langle u\_{\mathcal{s}}\rangle}{dy}\right) \* \text{sgn}\left(\mathfrak{e}(\mathbf{y})\right) \\ &\ast \text{sgn}\left(u'\_{\mathcal{s}}(\mathbf{x}, \mathbf{y}, \mathbf{z})\right), \end{aligned} \tag{23}$$

where *us* is the velocity in the direction of the mean wall shear stress.

The stochastic random force in case 4 is imposed instantaneously at every iteration of the simulation. Since the IB node is located in the logarithmic region, the horizontal velocity is interpolated based on the logarithmic law from the projection point PP; adding this force to the first two nodes does not improve the velocity profile. Considering one more point to impose the forcing term on, it makes the deviation from the log law disappear (Figure 10).

This happens because the stochastic forcing allows to obtain a more accurate prediction of the resolved Reynolds shear stress (associated to the resolved velocity fluctuations) as can be seen in Figure 11.

**Figure 10.** Mean velocity profile before and after adding stochastic forces for case 4 compared to the log law.

**Figure 11.** Reynolds shear stresses in global coordinates compared to the calibrated IBM for case 4.

The relaxation time plays an important role in the simulation. It has to be large enough to let the flow adapt itself to the force. On the other hand, if the relaxation time is too large the effect of force is diluted. In the present simulations, 0.0006 *uτ*/*δ* ≤ 1/*τ* ≤ 0.002 *uτ*/*δ* gave the best results. Values larger than 0.002 *uτ*/*δ* increased the velocity continuously since the flow did not have time to adjust itself to the force; values smaller than 0.0006 *uτ*/*δ* displayed a very small improvement in velocity profile.

This method works very well when the IB node is located in the logarithmic region and in attached flows since a logarithmic law for the velocity profile is expected. However, in the present form, it is not able to predict massive separation.

Steady two-dimensional separation is known to be related to the sign of the streamwise pressure gradient. After neglecting advective terms, the boundary layer equation in the streamwise direction simplifies as:

$$\left(\nu\left(\frac{\partial^2 u}{\partial y^2}\right)\_{\text{wall}} = \frac{1}{\rho}\frac{\partial p}{\partial x}.\tag{24}$$

Considering the flow over a curved surface, a negative streamwise pressure gradient develops upward, associated to an acceleration of the flow. Thereafter, the change of curvature makes the pressure gradient become unfavourable, and from the continuity equation it contributes to an enhancement of the boundary layer thickness. Hence the second derivative of velocity is positive and therefore there is a change of sign of the velocity curvature inside the boundary layer. At a point called inflection, the second derivative of streamwise velocity becomes zero; *∂*2*u*/*∂y*<sup>2</sup> = 0. If the unfavourable pressure gradient is strong enough, the flow next to the wall changes direction giving rise to a recirculation region. As already noted, in this method the velocity at the IB node is calculated from the friction velocity at the projection point. This means that *uIB* is always in the same direction of *UPP*, and the flow is not allowed to separate, since the above-described physical mechanism is not accounted for. For this reason, the method of calculation of the velocity at the IB point is modified as follows. The boundary layer equation close to the wall, after neglecting advective terms is:

$$
\nu\_T \left( \frac{\partial^2 \mathfrak{u}}{\partial y^2} \right)\_{IB} \sim \left( \frac{1}{\rho} \frac{\partial p}{\partial \mathbf{x}} \right)\_{IB} . \tag{25}
$$

Using the central difference scheme (CDS) in order to discretize the second derivative of velocity for a non-uniform grid (Ferziger and Peric [41]) considering PP, IB and IP nodes, it can be written as:

$$
\left(\frac{\partial^2 u}{\partial y^2}\right)\_{IB} \approx \frac{u\_{PP}(d\_{IB}) + u\_{IP}(d\_{PP-IB}) - u\_{IB}(d\_{PP})}{\frac{1}{2}(d\_{PP})(d\_{PP-IB})(d\_{IB})},\tag{26}
$$

in which *dPP*−*IB* is the distance between IB and PP points. Finally the streamwise tangential velocity at the IB node can be written as a function of the tangential pressure gradient instead of the friction velocity;

$$
\mu\_{IB} = \mu\_{PP} \left( \frac{d\_{IB}}{d\_{PP}} \right) - \mathbb{C} \left( \frac{\partial p}{\partial \mathbf{x}} \right)\_{IB} \frac{(d\_{PP-IB})(d\_{IB})}{\nu\_T}, \tag{27}
$$

where the eddy viscosity is also the calibrated value obtained from Equation (19).

Using a coefficient (C) of the order of 10−<sup>3</sup> inside the second term containing the pressure gradient at the IB avoids problems in regions characterized by the presence of large values of the pressure gradient with small values of turbulent eddy viscosity. Many simulations have been carried out to find out the best criterion for using this new scheme. Finally, applying Equation (27) under the following conditions:

$$
\left(\frac{\partial p}{\partial x}\right)\_{IB} > 0,\tag{28}
$$

$$d\_{IB}^{+} < 40,\tag{29}$$

gave the best results for simulating separated flows. The two criteria physically mean that the new scheme to calculate tangential velocity at the IB is used if the tangential pressure gradient at the IB is unfavourable, and the distance of the IB node from the immersed boundary is such that the velocity point does not belong to the log region of the velocity profile.

#### *4.2. Flow Simulation over a Single Hill Using IBM*

This new model is applied for simulation of the flow over a two-dimensional hill. First an open channel is constructed in which the immersed boundary is set as the lower wall. After the periodic open channel flow reached a steady state, instantaneous data are collected from a cross sectional plane at different time steps. Then the bottom of the domain including the hill shape is created as the immersed boundary, and the data that are obtained from the plane channel flow are used as inflow for the flow simulation over the hill.

The simulation is performed using two different grids (Cartesian and curvilinear) at *Re<sup>τ</sup>* = 1187. Applying the new scheme, it is feasible to capture separation in the simulation with a grid resolution of 128 × 40 × 32 cells in the streamwise, wall normal and spanwise directions, respectively. With this resolution, the IB node upward and downward from the hill is located at *d*<sup>+</sup> *IB* ≈ 15. The mean velocity profiles for both grids are compared with the body fitted approach and the experiment by Khurshudyan et al. [30] as shown in Figure 12a. In the simulation for both grids, the starting point of separation is predicted well at *x* = 0.5*a*. The reattachment points for both cases are anticipated before *x* = 2*a*; therefore, at *x* = 2*a* the mean velocities are positive, although very small close to the immersed boundary.

In addition, for the Cartesian grid a simulation without calibration is performed to check the effect of eddy viscosity calibration on flow separation; as it can be observed in Figure 12a, the case without calibration is not able to capture the starting point of separation properly, and in general it underestimates separation.

The root mean square of streamwise velocity fluctuations of the present simulations (Figure 12b) are in a good agreement with the experimental data. While there is an overestimation of fluctuations beyond the starting point of separation, the behavior of this quantity is similar to that of the experiment. Comparing the results of the present model with the one developed for the body fitted grid, it appears

that the present IBM anticipates the starting point of separation in better agreement with the wall-resolved LES and the experiment, while the prediction of the reattachment point is similar to the wall-resolved LES, earlier than in the body fitted approach and the experiment. The present IBM gives more energetic velocity fluctuations than the body fitted approach. Moreover, comparing this result with the wall-resolved LES by Chaudhari et al. [31], it can be stated that the mean velocity profile has similar behavior: Predicting the starting point of separation well, and undergoing an early reattachment. The RMS of streamwise velocity fluctuations is also similar to the wall-resolved LES; depicting marginal underestimation before and after the separation region especially far from the wall, and an overestimation inside the separation region near the wall.

**Figure 12.** The results for the simulation of flow over a single hill using new the IBM scheme in Cartesian and curvilinear grids compared to the body fitted approach and experiment by Khurshudyan et al. [30]; the hill starts at *x* = −*a* and ends at *x* = *a*.

#### *4.3. Flow Simulation over 2D Periodic Hills*

Finally, the new scheme is tested on flow over two-dimensional periodic hills with polynomial shape (Almeida et al. [42]). It is Case 81 of the ERCOFTAC classic database (http://ERCOFTAC/cases/case81)[43]. The shape of the domain is shown in Figure 13.

The edges of the domain are at the top of the hill. The hill height is *h* = 28 mm, and the crests of the two successive hills are separated by *Lx* = 9*h*. The height of the channel is equal to *Ly* = 3.035*h*, and the channel width is *Lz* = 4.5*h*. The reference for comparing the results are the mean velocity and RMS of the velocity fluctuations obtained from resolved LES by Temmerman and Leschziner [44] available in ERCOFTAC, and for separation and reattachment points, obtained from resolved LES by Fröhlich et al. [45] from the NASA Langley Research Center database (http://nasa/LES/2dhillperiodic) [46].

The computational domain is created in two different ways, using Cartesian and curvilinear grids, respectively (see Figure 14). The latter is shaped in such a way as to follow the hill geometry. In both cases, the bottom surface is created as an immersed boundary. The Reynolds number based on the hill height and bulk velocity on top of the first hill is equal to *Re* = *Ubh*/*ν* = 10595, where *ν* is the molecular viscosity. The flow is periodic in the *x* and *z* directions, using immersed boundary at the bottom and free-slip surface at top.

**Figure 13.** Domain characteristics for the simulation of flow over 2D periodic hills, with courtesy of ERCOFTAC [43].

**Figure 14.** Two different grids for simulation of periodic hills using immersed boundaries, 96 × 64 × 32 cells in *x*, *y* and *z* directions, respectively.

A domain with grid resolution of 96 × 64 × 32 cells out of immersed boundary in streamwise, wall normal and spanwise directions, respectively, is constructed. This is equivalent to the coarsest grid of Chen et al. [17], who simulated flow over periodic hills using immersed boundary and applied a zonal approach (TLM); in particular, the authors used the turbulent boundary layer equations close to the immersed boundary and LES in the interior region.

The robustness of the method to reproduce separation is tested under a large variation of the threshold values employed in Equation (29). The results of this test for the threshold in the range of 30–50 are shown in Figure 15. The mean velocity profile in Figure 15a shows that in all cases separation is captured at *x* = 1*a*, but using *d*<sup>+</sup> *IB* < 50 makes the reattachment point shift back slightly with respect to the other cases and the wall-resolved LES. The mean velocity profile for *d*<sup>+</sup> *IB* < 40 is marginally closer to the resolved LES compared to *d*<sup>+</sup> *IB* < 30 (which overestimates it narrowly). In addition to the mean velocity profile, Reynolds shear stresses for *d*<sup>+</sup> *IB* < 40 demonstrates better agreement to the wall-resolved LES in the separation region and after reattachment (Figure 15b). Overall, it can be stated that the method is robust for large variations of the threshold value chosen to switch from one to the other law for the velocity at the first grid point. Large variations of this value produce marginal variations in the results.

**Figure 15.** Mean streamwise velocity and Reynolds shear stresses for a Cartesian grid compared to resolved LES by Temmerman and Leschziner [44], applying three different criteria to use the new scheme; (*∂p*/*∂x*)*IB* <sup>&</sup>gt; 0 and: *<sup>d</sup>*<sup>+</sup> *IB* <sup>&</sup>lt; 40 (square), *<sup>d</sup>*<sup>+</sup> *IB* <sup>&</sup>lt; 30 (circle), *<sup>d</sup>*<sup>+</sup> *IB* < 50 (delta).

Figure 16 depicts statistics of the simulations of flow over periodic hills compared to the wall-resolved LES by Temmerman and Leschziner [44] at ten different streamwise locations. Velocities are made non-dimensional with *Ub* which is the bulk velocity on the first hill crest; moreover, Reynolds stress and turbulent kinetic energy are non-dimensionalized with *U*<sup>2</sup> *b* .

(**d**) Kinetic energy.

**Figure 16.** Results for simulation of flow over periodic hills at different positions, using Cartesian and curvilinear grids compared to the resolved LES by Temmerman and Leschziner [44].

Figure 16a shows mean streamwise velocity profiles (averaged in time and the spanwise direction) at different streamwise locations. They are overall in good agreement with the wall-resolved LES by Temmerman and Leschziner [44]. While the mean velocity for the Cartesian grid at *x* = 0.5*h* matches the profile obtained from the wall-resolved LES, the velocity is not negative at the IB node, while the curvilinear grid displays a negative velocity at the IB node in this location. Separation is captured in both Cartesian and curvilinear grid cases, and the reattachment point for the Cartesian grid is a bit earlier than the resolved LES, while the flow in the curvilinear grid reattaches earliest before *x* = 4*h*.

The mean vertical velocity profiles in Figure 16b show a slight underprediction for both grids downhill, but in the other locations they matched the wall-resolved LES data.

Figure 16c displays the Reynolds shear stress for both cases (based on the resolved fluctuations). It behaves similarly to that of the resolved LES, while a marginal underestimation is observed before the end of downhill at *x* = 1*h* near the immersed boundary, especially in the case with curvilinear grid.

As illustrated in Figure 16d, turbulent kinetic energy for the Cartesian grid is slightly overpredicted with respect to the wall-resolved LES, while the curvilinear grid demonstrates a closer value to the reference data. Overall, the behavior of this quantity is similar to that of the wall-resolved LES for both cases.

Table 3 displays the separation and reattachment points in the current simulation, the wall-resolved LES of Fröhlich et al. [45] and WMLES (using two-layer model) of Chen et al. [17]. In the present simulation, the boundary layer separation occurs at *x* ≈ 0.54*h* with the Cartesian grid and at *x* ≈ 0.47*h* with the curvilinear grid; in the wall-resolved LES it happens at *x* ≈ 0.22*h*. Therefore a delay is observed to predict the separation point. Moreover, the intensity of the recirculation bubble predicted by the optimized IBM simulations is smaller than that of the wall-resolved LES (comparing the present simulation values with the large negative values of the reference simulation). The wall-resolved LES shows the reattachment point at *x* ≈ 4.69*h*, while in the current simulation the boundary layer reattaches a bit earlier in both cases; the Cartesian grid predicts the reattachment point at *x* ≈ 4.20*h*, while the curvilinear grid anticipates it at *x* ≈ 3.81*h*.

Overall, the agreement of the simulation result is acceptable, considering the simplicity of the model, which does not require the solution of an additional equation in the wall layer and the number of grid points employed in the tests. Moreover, in comparison to the simulations of [17], the new wall layer model on the Cartesian grid predicts the separation point better than their case with the same resolution, and on the curvilinear grid it is even closer to the reference data than the WMLES of [17], but the reattachment point in their case is predicted closer to the reference. The other statistics obtained from the current

simulation, especially the mean vertical velocity profile, are in better agreement with the wall-resolved LES than the simulations of [17].

**Table 3.** Resolution, separation point (*xs*) and reattachment point (*xr*) in resolved LES Fröhlich et al. [45], the present optimized IBM simulations and the wall model LES by Chen et al. [17].


#### **5. Summary and Conclusions**

In the present paper, a wall layer modeling for massive separation has been developed within the LES context. Two main frameworks have been considered; namely, the case where the solid wall is reproduced using a body fitted curvilinear grid and the case where the solid surface is simulated with immersed boundaries. The research aimed at developing a simple wall layer model to be used in conjunction with LES for engineering applications where the near-wall resolution is generally coarse. The starting point was the equilibrium wall layer model, which gives good results in attached flows [47]. In the case of body fitted curvilinear grids, the equilibrium wall layer model was improved using the dynamic *κ* procedure of Cabot and Moin [8] in a new fashion. The dynamic *κ* was applied on the log law without using TBLE in order to deviate the wall shear stress and the two components of strain rate tensor from the standard log law.

When the solid surface was simulated with immersed boundaries, a different strategy was accomplished. The starting point was the equilibrium wall layer model of Roman et al. [34], in which the velocity at the near-wall nodes was reconstructed from a farther point (the projection point) based on the law of the wall, and a RANS-like eddy viscosity was given at the IB node. To reconstruct the fluctuations in the wall normal direction well, especially near wall, random stochastic forcing was applied to the first three near-wall cells. Moreover, the model was complemented with a procedure designed at reproducing massive separation without the need to use zonal approaches. Specifically, after removing the advective terms from the boundary layer equation, the tangential streamwise velocity at the IB node (*uIB*) was obtained from the pressure gradient instead of calculating it from shear velocity at the projection point based on the law of the wall: When the pressure gradient was unfavourable and the IB node was not in the log region, the tangential velocity was obtained through a balance between the wall normal diffusion term and the streamwise pressure gradient. In some sense, this represents a simplified procedure with respect to the zonal approach where the boundary layer equation is solved within the near-wall layer. In order to check what grid topology is best suited for reproducing this kind of flow, both Cartesian and curvilinear grids were considered in conjunction with immersed boundaries.

The models were tested and validated over two two-dimensional geometries, both being well studied in the literature through physical and high-resolution LESs: First the isolated hill (case 69 of the ERCOFTAC classic database) was considered; then, the periodic hill which is case 81 of ERCOFTAC classic datasbe was considered.

The model for body fitted geometry was generally in good agreement with the reference data; the starting point of separation was delayed with respect to the wall-resolved LES and the experiment, but the reattachment point was closer to the experiment, while the wall-resolved LES displayed an early reattachment. Applying 32 and 20 cells in the vertical direction correspondent to <sup>Δ</sup>*y*<sup>+</sup> <sup>≈</sup> 37 and 60, respectively, the results were comparatively acceptable. However, decreasing the vertical resolution further would delay the starting point of separation more. The model developed for immersed boundaries

gave a mean velocity profile and other statistics in acceptable agreement with the reference data on both isolated and periodic hills, also in consideration of the coarseness of the grids employed; for the isolated hill, 40 cells in the vertical direction were required to cover the hill with four cells applying the Cartesian grid, which was correspondent to <sup>Δ</sup>*y*<sup>+</sup> <sup>≈</sup> 30; higher resolution was not used to avoid the near-wall computational node being located in the viscous sub-layer, and lower resolution did not capture the flow separation.

For the periodic hill, 64 cells in the vertical direction gave <sup>Δ</sup>*y*<sup>+</sup> <sup>≈</sup> 105 applying the Cartesian grid. Lower resolution did not display flow separation because of an excessive under-resolution of the hill geometry.

**Funding:** The research has been supported by project COSMO "CFD open source per opera morta", the grant number CUP n. J94C14000090006, law number 3865, PAR FSC 2007-2013, Friuli Venezia Giulia.

**Acknowledgments:** The author wholeheartedly thanks Vincenzo Armenio from Università degli Studi di Trieste, Italia, and Federico Roman from IEFLUIDS S.r.l., Università degli Studi di Trieste, Italia, for their effective collaboration to accomplish this work.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**




c 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Breaking the Kolmogorov Barrier in Model Reduction of Fluid Flows**

#### **Shady E. Ahmed and Omer San \***

School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK 74078, USA; shady.ahmed@okstate.edu

**\*** Correspondence: osan@okstate.edu; Tel.: +1-405-744-2457; Fax: +1-405-744-7873

Received: 15 January 2020; Accepted: 14 February 2020; Published: 18 February 2020

**Abstract:** Turbulence modeling has been always a challenge, given the degree of underlying spatial and temporal complexity. In this paper, we propose the use of a partitioned reduced order modeling (ROM) approach for efficient and effective approximation of turbulent flows. A piecewise linear subspace is tailored to capture the fine flow details in addition to the larger scales. We test the partitioned ROM for a decaying two-dimensional (2D) turbulent flow, known as 2D Kraichnan turbulence. The flow is initiated using an array of random vortices, corresponding to an arbitrary energy spectrum. We show that partitioning produces more accurate and stable results than standard ROM based on a global application of modal decomposition techniques. We also demonstrate the predictive capability of partitioned ROM through an energy spectrum analysis, where the recovered energy spectrum significantly converges to the full order model's statistics with increased partitioning. Although the proposed approach incurs increased memory requirements to store the local basis functions for each partition, we emphasize that it permits the construction of more compact ROMs (i.e., of smaller dimension) with comparable accuracy, which in turn significantly reduces the online computational burden. Therefore, we consider that partitioning acts as a converter which reduces the cost of online deployment at the expense of offline and memory costs. Finally, we investigate the application of closure modeling to account for the effects of modal truncation on ROM dynamics. We illustrate that closure techniques can help to stabilize the results in the inertial range, but over-stabilization might take place in the dissipative range.

**Keywords:** reduced order modeling; Kolmogorov *n*-width; Galerkin projection; proper orthogonal decomposition; turbulent flows

#### **1. Introduction**

Reduced order modeling (ROM) has been a hot topic in response to the increasing complexity of engineering systems and the requirement of near real-time and multi-query responses along with the limited advancement in computational resources [1–5]. This is vital for informed decision-making in the context of digital twins [6–14], model-based control [15–22], data assimilation [23–33], parameter estimation [34–37], and uncertainty quantification [38–44]. In fluid systems, this has been feasible thanks to the existence of a few underlying structures (e.g., vortices) that dominate the flow dynamics as well as the tremendous availability of data, which enables us to identify those structures (also called modes or basis functions) in a data-driven fashion [45–57]. However, most of the ROM developments have been applied mainly on prototypical flow problems. In particular, there is a limited number of investigations regarding ROMs in turbulent flows (e.g., see [58–64]). This is basically due to the excessive complexity of turbulent flows and existence of so many scales that interact with each other and control the mass, momentum, and energy transfer.

Challenges in the development of ROM for turbulent flows are also related to the Kolmogorov *n*-width for these systems. The Kolmogorov *n*-width is a concept from approximation theory that determines the linear reducibility of a system [65,66]. Mathematically, it is defined as [66–68]

$$d\_n(\mathcal{M}) := \inf\_{\mathcal{S}\_n} \sup\_{q \in \mathcal{M}} ||q - \Pi\_{\mathcal{S}\_n} q||\_\prime \tag{1}$$

where S*<sup>n</sup>* is a linear *n*-dimensional subspace, M is the solution manifold, and ΠS*<sup>n</sup>* is the orthogonal projector onto S*n*. In other words, *dn*(M) quantifies the maximum possible error that might arise from the projection of solution manifold onto the best-possible *n*-dimensional linear subspace. When the decay of *dn*(M) with increasing *n* is fast, this means that the energy cascade and modal hierarchy allow the reduction of the system and the solution trajectory can be adequately approximated to evolve on a reduced linear subspace. However, for systems with strong nonlinearity, like turbulent flows, the intense interactions between different modes reduce the decay rate of the Kolmogorov *n*-width. Consequently, the linear reducibility is hindered. In ROM context, this is denoted as the "Kolmogorov barrier" because it necessitates involving more modes in ROM to guarantee sufficient approximation of the underlying dynamics. In short, turbulent flows are generally characterized by a slow decay of the Kolmogorov *n*-width, which raises the Kolmogorov barrier and the present work aims at addressing this issue.

In particular, we propose the use of a time domain partitioning approach, called principal interval decomposition (PID) [69–73] to break the Kolmogorov barrier of turbulent flows. PID has been investigated in previous studies and has been proven to be more accurate than standard ROMs based on the global basis construction (see [71,73] for example). Here, we aim at addressing a more complicated and involved flow problem characterizing two-dimensional turbulence. This allows us to investigate the effect of PID to capture the scales responsible for energy transfer and dissipation through an energy spectrum analysis. In addition, we relate the notion of the Kolmogorov barrier to the well-defined concept of relative information content in ROM context. Although turbulence is, by nature, three-dimensional, we investigate it in a simplified two-dimensional (2D) setting. It was shown that many aspects of idealized 2D turbulence are relevant for physical systems like geostrophic turbulence [74] and large-scale motions in the atmosphere and oceans [75,76]. The study of 2D turbulence idealizes geophysical phenomena and provides a starting point for their modeling. Specifically, we study ROM application on 2D Kraichnan turbulence. Moreover, we present a spectral study of the energy spectrum to emphasize the efficiency of PID approach for ROM in turbulence. Finally, we investigate the possibility of applying closure models for ROM stabilization. As a standard ROM technique, we employ proper orthogonal decomposition (POD) [46] for basis construction and Galerkin projection for dynamics approximation [58,77].

Fluid flow systems often include some sort of nonlinear interactions (e.g., corresponding to the convective effects). This implies that different modes (including the truncated ones) experience strong dynamical interactions between each others. Therefore, many modes need to be included for certain flows (especially for complex geometries and/or high Reynolds numbers). For example, Zhang and Stevens [78] stated that 2000 modes are necessary to capture 80% of the total energy in a neutral atmospheric boundary layer (ABL) process, while Shah and Bou-Zeid [79] found that about 500 POD modes are required to capture the same amount of energy by using a two-dimensional snapshot POD. Nonetheless, the computational cost of a Galerkin ROM usually scales with *R*<sup>3</sup> in case of quadratic nonlinearity, where *R* is the number of retained/resolved modes. This necessitates the use of only a few modes for feasible ROM computations, especially for real-time control applications or multiple-query estimations. Other than the accuracy loss that would be incurred due to modal truncation, instabilities can easily occur. For instance, Östh et al. [80] showed a case for a flow around Ahmed body where the solution of a POD Galerkin model with 100 (or less) modes converges to infinity. Instabilities in standard ROMs have been attributed mainly to the truncation of higher modes, but Grimberg et al. [81] recently demonstrated that these instabilities are largely due to the standard Galerkin projection

where the *left* and *right* set of basis functions match. To mitigate the instabilities encountered in Galerkin projection-based ROMs, Carlberg et al. [82] showed that least-squares Petrov–Galerkin can indeed build more accurate ROMs while Galerkin ROM becomes unstable for long time intervals. Alternatively, closure and stabilization models represent a feasible choice in the accuracy/stability-cost trade-off for Galerkin ROMs. Closure models have been popular in turbulence community, and found their way into ROM for fluid flows as well. Therefore, we also investigate the effect of a linear mode dependent eddy viscosity type closure model to stabilize the solution of ROM in case of 2D Kraichnan turbulence.

#### **2. Two-Dimensional Kraichnan Turbulence**

The two-dimensional (2D) turbulence cannot be realized in practice or experiments, but only in numerical simulations. Furthermore, 2D turbulence behaves differently from realistic 3D turbulence. In 3D turbulence, energy is transferred forward, from large scales to smaller scales, via vortex stretching. Conversely, in 2D turbulence, most of the forcing and dissipation energy will be transferred from smaller scales to larger scales and follows the Kraichnan–Batchelor–Leith (KBL) theory [83–85]. Despite the apparent simplicity in dealing with two spatial dimensions, 2D turbulence is very rich in its dynamics due to the inverse energy and forward enstrophy cascading mechanisms, which makes it a feasible testbed for a study of ROM in turbulence. Nonetheless, readers should be cautious while trying to extend the investigated framework into more realistic 3D turbulence cases.

The decaying homogeneous isotropic 2D Kraichnan turbulence follows the 2D Navier–Stokes equations, which can be written in vorticity-streamfunction formulation as below,

$$\frac{\partial \omega}{\partial t} + f(\omega, \psi) = \frac{1}{\text{Re}} \nabla^2 \omega,\tag{2}$$

where *ω* is the vorticity and *ψ* is the streamfunction. Re is Reynolds number relating the inertial and viscous effects. *<sup>J</sup>*(*ω*, *<sup>ψ</sup>*) and <sup>∇</sup>2*<sup>ω</sup>* are the Jacobian and Laplacian operators, respectively, which can be defined as

$$J(\omega, \psi) = \frac{\partial \omega}{\partial x} \frac{\partial \psi}{\partial y} - \frac{\partial \omega}{\partial y} \frac{\partial \psi}{\partial x} \tag{3}$$

$$
\nabla^2 \omega = \frac{\partial^2 \omega}{\partial x^2} + \frac{\partial^2 \omega}{\partial y^2}. \tag{4}
$$

The vorticity-streamfunction formulation enforces the incompressibility condition, where the vorticity and streamfunction fields are related by the following Poisson equation,

$$
\nabla^2 \psi = -\omega.\tag{5}
$$

According to the KBL theory for 2D turbulence, the inertial range in the energy spectrum is proportional to *k*−<sup>3</sup> in the inviscid limit. In our numerical experiments, the initial energy spectrum in Fourier space is given by

$$E(k) = \frac{4k^4}{3\sqrt{\pi}k\_p^5} \exp\left[-\left(\frac{k}{k\_p}\right)^2\right],\tag{6}$$

where *k* = *k*2 *<sup>x</sup>* + *k*<sup>2</sup> *<sup>y</sup>* and *kp* is the wavenumber at which the maximum value of initial energy spectrum occurs. During the time evolution process, due to the nonlinear interactions, this spectrum quickly approaches toward *k*−<sup>3</sup> spectrum as we will show later when we present our results. The magnitude of the vorticity Fourier coefficients is related to the energy spectrum as

$$|\tilde{\omega}(k)| = \sqrt{\frac{k}{\pi}E(k)}.\tag{7}$$

Then, the initial vorticity distribution (in Fourier space) is obtained by introducing a random phase. More details regarding derivation of the initial vorticity distribution from an assumed energy spectrum can be found in [86]. In the present study, we use a spatial computational domain of (*x*, *y*) ∈ [0, 2*π*] × [0, 2*π*] and a time domain of *t* ∈ [0, 4]. Periodic boundary conditions are applied in both *x*- and *y*-directions. A spatial grid of 10242 and a timestep of Δ*t* = 0.001 are used. A fourth-order accurate Arakawa scheme [87] is used for spatial discretization and a third-order total variation diminishing Runge–Kutta scheme (TVD-RK3) [88] is adopted for temporal discretization.

The energy spectrum for different values of Reynolds number is given in Figure 1 at *t* = 2 and *t* = 4. We can observe that the energy spectrum in the inertial range flattens toward the classical *k*−<sup>3</sup> scaling limit as Re increases, in agreement with the KBL theory for two-dimensional turbulence. It should be noted that the same initial vorticity distribution is enforced for all numerical experiments, by fixing the seed of the random numbers generator. In order to verify the validity of the used spatial resolution, Figure 2 shows the energy spectrum for Re = 16,000 (left) and Re = 32,000 (right) using two spatial resolutions (i.e., 1024<sup>2</sup> and 20482). A good match between the energy spectra corresponding to the two spatial resolutions can be seen, even for Re = 32,000. Nonetheless, we will consider Re = 16,000 with a spatial grid of 10242 for the rest of the present study. In addition, results obtained from solving Equation (2) using the aforementioned numerical schemes and resolution will be denoted as the full order model (FOM) results.

**Figure 1.** Evolution of energy spectra in 2D decaying Kraichnan turbulence for Re ∈ {2000, 4000, 8000, 16,000, 32,000 } at *t* = 2 (**a**) and *t* = 4 (**b**). Notice the convergence of energy spectrum in the inertial range to the *k*−<sup>3</sup> scaling with increasing Reynolds number.

**Figure 2.** A mesh independence test for Re = 16,000 (**a**) and Re = 32,000 (**b**) using two spatial resolutions. An acceptable match is obtained from both resolutions.

#### **3. Reduced Order Modeling**

In order to build a Galerkin projection-based reduced order model (ROM) of a physical system, the system's state, **u**(**x**, *t*), is assumed to live in a reduced trial subspace. Then, the high-dimensional operators are projected onto the same subspace, constraining the residual to be orthogonal to that subspace. Therefore, generating effective reduced subspaces that sufficiently contain the solution manifold is vital for the successful construction of Galerkin ROMs. One of the very early-developed and well-known approaches to build reduced subspaces is Fourier analysis. However, it assumes universal basis functions (or modes) which have no specific relation to the system in hand, except for boundary conditions. More efficiently, snapshot-based model reduction techniques can tailor a reduced subspace that best fits the problem by extracting the underlying coherent structures that control the major dynamical evolution we are interested in. Among those snapshot-based techniques, proper orthogonal decomposition (POD) is the most popular and well-established approach extracting the modes which contribute the most to the total system's energy [4,89]. The basic implementation of POD is presented in the following subsection.

#### *3.1. Proper Orthogonal Decomposition*

As mentioned before, proper orthogonal decomposition (POD) is a snapshot-based approach. Therefore, a collection of system's realizations (called snapshots) are required in advance. Assuming that a number of snapshots (*Ns*) of the 2D vorticity field, denoted as *ω*(*x*, *y*, *tn*), are stored at consecutive times *tn* for *n* = 1, 2, . . . , *Ns*, the time-averaged field, called "base flow", can be computed as

$$
\omega(\mathbf{x}, \mathbf{y}) = \frac{1}{N} \sum\_{n=1}^{N} \omega(\mathbf{x}, \mathbf{y}, t\_n). \tag{8}
$$

The mean-subtracted snapshots, also called anomaly or fluctuation fields, are then computed as the difference between the instantaneous fields and the mean field,

$$
\omega'(\mathbf{x}, y, t\_n) = \omega(\mathbf{x}, y, t\_n) - \bar{\omega}(\mathbf{x}, y). \tag{9}
$$

This mean-subtraction is common in ROM community, and it guarantees that ROM solution would satisfy the same boundary conditions as a full order model [90]. This anomaly field procedure is also interpreted as a mapping of snapshot data to its origin (i.e., centering), and will be used for constructing the POD basis functions. The POD approach basically relies on an eigenvalue decomposition of a snapshot correlation (or covariance) matrix. Since the number of snapshots (*O*(102)) is usually much less than the number of grid points (*O*(106)), we follow the method of snapshots [54] for efficient implementation of the POD algorithm.

An *Ns* × *Ns* temporal correlation data matrix **A** = [*aij*] is computed from the inner product of temporal anomaly fields as

$$a\_{ij} = \langle \omega^{\prime}(\mathbf{x}, y, t\_i); \omega^{\prime}(\mathbf{x}, y, t\_j) \rangle,\tag{10}$$

where the angle-parenthesis denotes the inner product in Hilbert space defined as

$$
\langle \omega'(\mathbf{x}, y, t\_{\bar{i}}); \omega'(\mathbf{x}, y, t\_{\bar{j}}) \rangle = \int\_{\Omega} \omega'(\mathbf{x}, y, t\_{\bar{i}}) \omega'(\mathbf{x}, y, t\_{\bar{j}}) d\mathbf{x} dy,\tag{11}
$$

where Ω represent the physical (spatial) domain. Then, an eigenvalue decomposition of **A** is performed as

$$\mathbf{AV} = \mathbf{V}\mathbf{A},\tag{12}$$

where **Λ** is a diagonal matrix whose entries are the eigenvalues *λ<sup>k</sup>* of **A**, and **V** is a matrix whose columns **v***<sup>k</sup>* are the corresponding eigenvectors. **A** is a Gramian matrix, a property which can be utilized for efficient solution of the eigenvalue problem (e.g., using Jacobi transformations [91]). The eigenvalues *λ<sup>k</sup>* represent the amount of information contained in each mode; therefore, they should be arranged in a descending order (i.e., *λ*<sup>1</sup> ≥ *λ*<sup>2</sup> ≥···≥ *λN*), for proper selection of the POD modes. The POD modes *φ<sup>k</sup>* are finally computed as

$$\phi\_k(\mathbf{x}, \mathbf{y}) = \frac{1}{\sqrt{\lambda\_k}} \sum\_{n=1}^N v\_k^n \omega'(\mathbf{x}, \mathbf{y}, t\_\mathbf{n}), \tag{13}$$

where *v<sup>n</sup> <sup>k</sup>* is the *n*-th component of the eigenvector **v***k*. It can be easily verified that the POD modes are orthonormal (i.e., *φi*; *φj* = *δij*), which simplifies the derivation of ROM.

#### *3.2. Galerkin Projection*

With the POD algorithm implemented, a set of POD basis functions are obtained {*φk*(*x*, *<sup>y</sup>*)}*<sup>R</sup> <sup>k</sup>*=1, where *R* is the number of POD modes retained in ROM (i.e., the size of the reduced subspace). The solution is therefore approximated as a superposition of the contributions of these selected modes as follows:

$$
\omega(\mathbf{x}, \mathbf{y}, t) = \bar{\omega}(\mathbf{x}, \mathbf{y}) + \sum\_{k=1}^{R} a\_k(t) \phi\_k(\mathbf{x}, \mathbf{y}), \tag{14}
$$

where *αk*(*t*) are the so-called temporal modal coefficients. Since the vorticity and streamfunction are related by the kinematic relationship given by Equation (5), the basis functions (*θk*(*x*, *y*)) and mean field (*ψ*¯(*x*, *y*)) corresponding to the streamfunction can be obtained from those of the vorticity as follows:

$$
\nabla^2 \bar{\psi}(\mathbf{x}, \mathbf{y}) = -\bar{\omega}(\mathbf{x}, \mathbf{y})\_\prime \tag{15}
$$

$$
\nabla^2 \theta\_k(\mathbf{x}, \mathbf{y}) = -\phi\_k(\mathbf{x}, \mathbf{y}), \quad k = 1, 2, \dots, R,\tag{16}
$$

which might result in a set of basis functions for the streamfunction that are not necessarily orthonormal. Moreover, the reduced order approximation of streamfunction can share the same temporal coefficients *αk*(*t*),

$$
\psi(x, y, t) = \psi(x, y) + \sum\_{k=1}^{R} a\_k(t)\theta\_k(x, y). \tag{17}
$$

Those approximations are then substituted in Equation (2), as follows:

$$\begin{split} \frac{\partial}{\partial t} \left[ \boldsymbol{\omega}(\mathbf{x}, \boldsymbol{y}) + \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x}, \boldsymbol{y}) \right] &= -\boldsymbol{J} \left( \left[ \boldsymbol{\omega}(\mathbf{x}, \boldsymbol{y}) + \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x}, \boldsymbol{y}) \right], \left[ \bar{\boldsymbol{\psi}}(\mathbf{x}, \boldsymbol{y}) + \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\theta}\_{k}(\mathbf{x}, \boldsymbol{y}) \right] \right) \\ &+ \frac{1}{\mathrm{Re}} \nabla^{2} \left[ \boldsymbol{\omega}(\mathbf{x}, \boldsymbol{y}) + \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x}, \boldsymbol{y}) \right]. \end{split}$$

Then, we use the definition of Jacobian operator and expand the above expression as

$$\begin{split} \frac{\partial}{\partial t}\widehat{\boldsymbol{\omega}}(\mathbf{x},\boldsymbol{y}) + \frac{\partial}{\partial t} \left[ \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x},\boldsymbol{y}) \right] &= -f \left( \widehat{\boldsymbol{\omega}}(\mathbf{x},\boldsymbol{y}), \overline{\boldsymbol{\upmu}}(\mathbf{x},\boldsymbol{y}) \right) - f \left( \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x},\boldsymbol{y}), \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\theta}\_{k}(\mathbf{x},\boldsymbol{y}) \right) \\ &- f \left( \widehat{\boldsymbol{\omega}}(\mathbf{x},\boldsymbol{y}), \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\theta}\_{k}(\mathbf{x},\boldsymbol{y}) \right) - f \left( \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x},\boldsymbol{y}), \widehat{\boldsymbol{\upmu}}(\mathbf{x},\boldsymbol{y}) \right) \\ &+ \frac{1}{\mathrm{Re}} \nabla^{2} \widehat{\boldsymbol{\omega}}(\mathbf{x},\boldsymbol{y}) + \frac{1}{\mathrm{Re}} \nabla^{2} \left[ \sum\_{k=1}^{R} a\_{k}(t) \boldsymbol{\phi}\_{k}(\mathbf{x},\boldsymbol{y}) \right]. \end{split}$$

Note that the first term *<sup>∂</sup> ∂t ω*¯(*x*, *y*) is zero. In addition, the Jacobian and Laplacian operators are spatial functions; therefore, the following can be written (after change of indices)

$$\begin{split} \frac{\partial}{\partial t} \left[ \sum\_{i=1}^{R} a\_i(t) \phi\_i(\mathbf{x}, \mathbf{y}) \right] &= -\int \left( \bar{\boldsymbol{\omega}}(\mathbf{x}, \mathbf{y}), \bar{\boldsymbol{\uptheta}}(\mathbf{x}, \mathbf{y}) \right) - \sum\_{i=1}^{R} \sum\_{j=1}^{R} a\_i(t) a\_j(t) \left( \phi\_i(\mathbf{x}, \mathbf{y}), \theta\_j(\mathbf{x}, \mathbf{y}) \right) \\ &- \sum\_{i=1}^{R} a\_i(t) \left( \bar{\boldsymbol{\omega}}(\mathbf{x}, \mathbf{y}), \theta\_i(\mathbf{x}, \mathbf{y}) \right) - \sum\_{i=1}^{R} a\_i(t) \left( \phi\_i(\mathbf{x}, \mathbf{y}), \bar{\boldsymbol{\uptheta}}(\mathbf{x}, \mathbf{y}) \right) \\ &+ \frac{1}{\text{Re}} \nabla^2 \bar{\boldsymbol{\omega}}(\mathbf{x}, \mathbf{y}) + \sum\_{i=1}^{R} a\_i(t) \frac{1}{\text{Re}} \nabla^2 \phi\_i(\mathbf{x}, \mathbf{y}). \end{split}$$

Then, we can collect the constant, linear, and nonlinear terms (in time) as follows:

$$\begin{split} \frac{\partial}{\partial t} \left[ \sum\_{i=1}^{R} a\_{i}(t) \phi\_{i}(\mathbf{x}, \boldsymbol{y}) \right] &= \left[ -f \left( \widehat{\boldsymbol{\omega}} (\mathbf{x}, \boldsymbol{y}), \overline{\phi}(\mathbf{x}, \boldsymbol{y}) \right) + \frac{1}{\text{Re}} \nabla^{2} \boldsymbol{\omega} (\mathbf{x}, \boldsymbol{y}) \right] \\ &+ \sum\_{i=1}^{R} a\_{i}(t) \left[ -f \left( \widehat{\boldsymbol{\omega}} (\mathbf{x}, \boldsymbol{y}), \theta\_{i}(\mathbf{x}, \boldsymbol{y}) \right) - f \left( \phi\_{i}(\mathbf{x}, \boldsymbol{y}), \overline{\phi}(\mathbf{x}, \boldsymbol{y}) \right) + \frac{1}{\text{Re}} \nabla^{2} \phi\_{i}(\mathbf{x}, \boldsymbol{y}) \right] \\ &- \sum\_{i=1}^{R} \sum\_{j=1}^{R} a\_{i}(t) a\_{j}(t) f \left( \phi\_{i}(\mathbf{x}, \boldsymbol{y}), \theta\_{j}(\mathbf{x}, \boldsymbol{y}) \right) . \end{split}$$

Finally, an inner product with an arbitrary basis function *φk*(*x*, *y*) is performed, where we make use of the orthonormality of the vorticity basis functions. Therefore, a reduced order model is obtained, which can be simply represented by the following set of ODEs for *αk*(*t*),

$$\frac{da\_k}{dt} = \mathfrak{B}\_k + \sum\_{i=1}^{R} \mathfrak{L}\_{i,k} a\_i + \sum\_{i=1}^{R} \sum\_{j=1}^{R} \mathfrak{N}\_{i,j,k} a\_i a\_j, \quad k = 1, 2, \dots, R,\tag{18}$$

where the predetermined model coefficients can be computed by the following numerical integration (offline computing)

$$\begin{split} \mathfrak{B}\_{k} &= \langle \left( -J(\vec{\omega}, \vec{\upmu}) + \frac{1}{\text{Re}} \nabla^{2} \vec{\omega} ; \boldsymbol{\upmu}\_{k} \rangle \rangle, \\ \mathfrak{C}\_{i,k} &= \langle \left( -J(\vec{\omega}, \theta\_{i}) - J(\phi\_{i}, \vec{\upmu}) + \frac{1}{\text{Re}} \nabla^{2} \phi\_{i} ; \phi\_{k} \rangle \rangle \right. \\ \mathfrak{M}\_{i,j,k} &= \langle \left( -J(\phi\_{i}, \theta\_{j}) ; \phi\_{k} \right) . \end{split} \tag{19}$$

To complete the dynamical system given by Equation (18), the initial conditions for *αk*(*t*) may be obtained by the following projection of the initial vorticity field onto the respective basis functions,

$$
\omega\_k(t\_0) = \left< \omega(\mathbf{x}, y, t\_0) - \bar{\omega}(\mathbf{x}, y); \phi\_k \right>,\tag{20}
$$

where *ω*(*x*, *y*, *t*0) is the vorticity field specified at initial time *t*0.

We note that the approach described here is called tensorial ROM, since the model predetermined terms (especially those corresponding to nonlinear terms) are computed in advance as tensors during an offline stage, while the online solution of Equation (18) scales with *R*3. Alternatively, the model terms can be calculated online while minimizing the cost of computing the nonlinear terms through approximating approaches like discrete empirical interpolation method (DEIM) [92,93], gappy POD [60,94], and missing point estimation (MPE) [95,96]. An overview of these techniques can be found in [97,98].

#### **4. Partitioned ROM**

A representative solution subspace is a key factor in the construction of ROMs of physical systems and POD has been successfully offering a systematic approach for building such a subspace. However, POD in principle depends on a global approximation of the snapshot data. This has several consequences, a few of which are mentioned here. First, local temporary excursions in state space which contain small amounts of energy can be overlooked by POD because their contribution to the total energy may be negligible. These excursions can, however, be of interest and have significant impact on the dynamical evolution (see [99], for example). Second, the global nature of POD approach can result in overall deformation of the obtained modes for systems with fast variations in state. As a result, the constructed POD modes are smoothed out and cannot capture any dominant structure at all. This in turn causes an energy distribution across a large number of modes; therefore, the truncated modes contain a significant amount of system's energy, which causes inaccuracies and instabilities in the solution unless the size of the ROM is increased.

An alternative approach which preserves the optimality of POD, while giving care to the system's local characteristics, is called principal interval decomposition (PID) [69–73]. In PID, the time domain is divided into a few partitions, each characterizing a specific stage in the system's dynamics and evolution. Then, the same POD algorithm is applied within each partition to generate a set of basis functions that best fit the respective partition *locally*. In a sense, the PID approach can be viewed as decomposing the global POD subspace into a few of locally optimal subspaces. As a result, accurate partitioned ROMs with smaller sizes can be dedicated to each individual sub-interval. More details about the properties of PID can be found in [73]. This partitioning idea can also be performed in the physical domain, state space, or parameter space [64,100–102].

In brief, the time domain T = [*t*0, *tf* ] is divided into a number *P* of non-overlapping partitions (every two consecutive partitions share only the interface). Adaptive partitioning and clustering techniques can be used to tailor partitions with arbitrary widths if no a priori information is available about the dynamics of the systems. However, for simplicity and clarity of presentation, we assume equidistant decomposition of the time domain. In other words, the total number of snapshots *Ns* is divided into *Ns*/*P* sets. After that, local mean fields *ω*¯ (*l*)(*x*, *y*) and *ψ*¯(*l*)(*x*, *y*) are computed and the POD algorithm described in Section 3.1 is applied to each group of snapshots to generate local basis functions {*φ*(*l*) *<sup>k</sup>* (*x*, *<sup>y</sup>*)}*<sup>R</sup> <sup>k</sup>*=<sup>1</sup> and {*θ* (*l*) *<sup>k</sup>* (*x*, *<sup>y</sup>*)}*<sup>R</sup> <sup>k</sup>*=<sup>1</sup> for *l* = 1, 2, ... , *P*. Then, the Galerkin ROM equations are obtained for each interval as

$$\frac{da\_k^{(l)}}{dt} = \mathfrak{B}\_k^{(l)} + \sum\_{i=1}^{R} \mathfrak{L}\_{i,k}^{(l)} a\_i^{(l)} + \sum\_{i=1}^{R} \sum\_{j=1}^{R} \mathfrak{N}\_{i,j,k}^{(l)} a\_i^{(l)} a\_j^{(l)}, \quad k = 1, 2, \dots, R, \quad l = 1, 2, \dots, P,\tag{21}$$

where the predetermined model coefficients can be computed partition-wise by the following numerical integrations:

$$\begin{split} \mathfrak{B}\_{k}^{(l)} &= \langle \!/ -J(\bar{\omega}^{(l)}, \bar{\Psi}^{(l)}) + \frac{1}{\text{Re}} \nabla^{2} \bar{\omega}^{(l)}; \! \! \! \! \! \! / \rangle, \\ \mathfrak{L}\_{i,k}^{(l)} &= \langle \! -J(\bar{\omega}^{(l)}, \theta\_{i}^{(l)}) - J(\phi\_{i}^{(l)}, \bar{\Psi}^{(l)}) + \frac{1}{\text{Re}} \nabla^{2} \phi\_{i}^{(l)}; \! \! \! \! / \rangle, \\ \mathfrak{N}\_{i,jk}^{(l)} &= \langle \! -J(\phi\_{i}^{(l)}, \theta\_{j}^{(l)}); \! \! \! \! / \rangle, \end{split} \tag{22}$$

and the initial conditions for *α*(1) *<sup>k</sup>* (*t*) are obtained by the following projection:

$$a\_k^{(1)}(t\_0) = \langle \omega(\mathbf{x}, y, t\_0) - \bar{\omega}^{(1)}(\mathbf{x}, y); \boldsymbol{\phi}\_k^{(1)} \rangle. \tag{23}$$

It remains to enforce the interface constraints to enable solution transition from one partition to the next, if no access to the true field at the interface instant is available. We do so by imposing the following condition at the interface [103],

$$
\langle \omega^{(l)}(\mathbf{x}, \mathbf{y}, t\_l) - \omega^{(l+1)}(\mathbf{x}, \mathbf{y}, t\_l); \boldsymbol{\phi}\_k^{(l+1)} \rangle = 0,\tag{24}
$$

where *tl* represents the time instant at the interface between the interval *l* and *l* + 1. Here, *ω*(*l*)(*x*, *y*, *tl*) is the reconstructed vorticity field at the interface using the left interval (*l*), while *ω*(*l*+1)(*x*, *y*, *tl*) is reconstructed at the same instant, but using the right interval (*l* + 1). Those can be rewritten as

$$
\omega^{(l)}(\mathbf{x}, \mathbf{y}, t\_l) = \bar{\omega}^{(l)}(\mathbf{x}, \mathbf{y}) + \sum\_{k=1}^{R} a\_k^{(l)}(t\_l) \phi\_k^{(l)}(\mathbf{x}, \mathbf{y}), \tag{25}
$$

$$
\omega^{(l+1)}(\mathbf{x}, \mathbf{y}, t\_l) = \widehat{\omega}^{(l+1)}(\mathbf{x}, \mathbf{y}) + \sum\_{k=1}^{R} a\_k^{(l+1)}(t\_l) \boldsymbol{\phi}\_k^{(l+1)}(\mathbf{x}, \mathbf{y}).\tag{26}
$$

This reduces to performing the following computation at the interface to re-initiate our solver at the first timestep of the new time interval,

$$\alpha\_k^{(l+1)}(t\_l) = \left\langle \sum\_{i=1}^{R} a\_i^{(l)}(t\_l) \phi\_i^{(l)}(\mathbf{x}, \mathbf{y}) + \bar{\omega}^{(l)}(\mathbf{x}, \mathbf{y}) - \bar{\omega}^{(l+1)}(\mathbf{x}, \mathbf{y}); \phi\_k^{(l+1)}(\mathbf{x}, \mathbf{y}) \right\rangle. \tag{27}$$

#### **5. Results**

As indicated in Section 2, the full order model is solved over a square domain with a side length of 2*<sup>π</sup>* using a spatial grid of 10242 for *<sup>t</sup>* <sup>∈</sup> [0, 4] with a timestep of 0.001. These were shown to be sufficient resolutions to solve the problem at a Reynolds number of 16, 000. To build the reduced order subspace using the POD and PID algorithms, described in Sections 3.1 and 4, 800 snapshots of vorticity field were collected (i.e., every five timesteps). It was also shown that the sampling rate has minor effect on POD-based modal decomposition, as long as the Nyquist–Shannon sampling criterion is met [104]. Another factor we consider in our choice of sampling rate is to have adequate number of snapshots in each sub-interval (after partitioning the time domain).

The partitioned ROM approach is used to solve Equation (21) using different numbers of partitions. In particular, we present the results for *P* ∈ {1, 4, 8}. Note that, for *P* = 1, the approach reduces to the standard Galerkin-POD ROM with a single interval covering the whole time domain. In Figure 3, the energy spectrum at *t* = 4 is used to illustrate the effect of increasing the number of partitions on ROM accuracy and convergence to the FOM solution. This is shown for *P* = 1 (top), *P* = 4 (middle), and *P* = 8 (bottom), along with a compensated (for zooming-in) view of the energy spectrum at the inertial range on the right. We can easily observe that increasing the number of partitions improves the convergence of ROM energy spectrum to the true one for all values of *R*.

Although increasing the number of retained modes generally improves the ROM accuracy, this does not help a lot in the case with *P* = 1. This is shown in Figure 3, where we still see a large deviation of ROM results in the inertial range even with 32 modes. This might be attributed to the smoothing-out of constructed basis functions in such a way that information is distributed across a large number of modes. Consequently, the effect of modal truncation on system's dynamics is no more negligible. On the other hand, a partitioned ROM with *P* = 8 enables us to build accurate ROMs with relatively compact sizes (e.g., *R* = 16).

**Figure 3.** Energy spectra obtained from partitioned ROM with *P* = 1 (**a,b**), *P* = 4 (**c,d**), and *P* = 8 (**e,f**). Note that *P* = 1 corresponds to the standard Galerkin-based reduced order model (ROM) with global proper orthogonal decomposition (POD) modes.

To provide more insights about the scales resolved with partitioned ROM, we plot the evolution of the temporal coefficients of the first and eighth modes in Figure 4. We can notice that a ROM with global POD modes (i.e., *P* = 1) can adequately capture the evolution of the first mode's contribution (with increasing deviations toward the end of the time domain). On the other hand, a phase shift and amplification of the temporal evolution of the 8th modal coefficient are spotted after a small initial period. From a physical point of view, the first modes correspond to the large dominating scales, while the last ones correspond to the small (or fine) scales. This indicates that a global POD application provides ROMs that moderately capture the large scales, but can fail to capture the finer details. Moreover, we can deduce from Figure 4a that the dynamics at different time instants is different (where all the snapshots are projected onto the same set of basis functions). This justifies the use of paritioned ROM to tailor unique bases for each sub-interval. On the other hand, with partitioned ROMs (i.e., *P* = 4 and *P* = 8), the evolution of both *α*<sup>1</sup> and *α*<sup>8</sup> is tracked accurately, implying that the presented partitioning approach improves the ROM's capabilities to represent the fine scales. Moreover, the accuracy of *α*<sup>1</sup> prediction is improved compared to *P* = 1. Although the evolution of temporal coefficients in different sub-intervals looks similar, this is because the fields in each region are projected onto their *respective* bases. Therefore, once reconstructed, the dynamics in the full order space will be different. In addition, we can note the "jump" at the interfaces (e.g., at *t* ∈ {1, 2, 3, 4} for *P* = 4) corresponding to updating the solution manifold. The true modal coefficients (denoted as FOM) is obtained from projecting the vorticity field at different times on the corresponding basis function as

$$
\alpha\_k^{(l)}(t) = \langle \omega(\mathbf{x}, y, t) - \bar{\omega}^{(l)}(\mathbf{x}, y); \phi\_k^{(l)}(\mathbf{x}, y) \rangle. \tag{28}
$$

More visualizations of ROM's predictability are provided in Figures 5 and 6, showing the vorticity (in color) and streamfunction (in height) at the final time (i.e., *t* = 4), compared to the FOM results. In Figure 5, we illustrate that ROM with *P* = 1 fails to correctly predict the final fields at different values of *R*. At *R* = 16, even the large scales are not captured accurately (recall the larger deviation of *α*<sup>1</sup> at final time from Figure 4). Conversely, both the large fine scales are predicted reliably using the partitioned ROM approach as shown in Figure 6 with *R* = 16.

In order to obtain more insights about the scales resolved by partitioned ROM, compared to standard POD-based ROM, we compare PID ROM results for *P* = 8 and *R* = 16, with standard ROM (i.e., *P* = 1) and increased number of modes. The root-mean-squares errors between Galerkin ROMs and FOM are shown in Table 1. We find that at least around *R* = 128 modes are needed for standard Galerkin ROM to achieve similar accuracy as partitioned ROM, which suggests that the relevant scales represented in those global 128 modes are almost encapsulated in 16 local modes if eight partitions are used. The computational gain of the PID approach can be easily seen from this analysis since online computational cost scales with *O*(*R*3).


**Table 1.** Root-mean-squares error at time *t* = 4 between Galerkin ROM (without any additional stabilization *κ* = 0) and FOM at Re = 16,000.

**Figure 4.** Temporal evolution of the first and eighth modal coefficients using different numbers of partitions, compared to the true values obtained from the projection of FOM vorticity field on the corresponding basis function. (**a**) shows the first modal coefficient with *P* = 1, while the eighth is given in (**b**). Results for *P* = 4 are shown in (**c,d**), while those for *P* = 8 are given in (**e,f**).

(**a**) (**b**)

**Figure 5.** Predicted results at time *t* = 4 showing vorticity contours in color with the level of streamfunction as height. (**a**) Full order model (FOM), (**b**) ROM (*R* = 16) with *P* = 1, (**c**) ROM (*R* = 24) with *P* = 1, and (**d**) ROM (*R* = 32) with *P* = 1. Note that *P* = 1 refers to the standard Galerkin ROM approach using a single partitioning zone.

**Figure 6.** Predicted results at time *t* = 4 showing vorticity contours in color with the level of streamfunction as height. (**a**) Full order model (FOM), (**b**) ROM (*R* = 16) with *P* = 1, (**c**) ROM (*R* = 16) with *P* = 4, and (**d**) ROM (*R* = 16) with *P* = 8.

#### *5.1. Kolmogorov n-Width and Relative Information Content*

The Kologorov *n*-width provides a measure of the system's reducibility, and in POD/PID context, it can be considered as a measure of how well a linear superposition of POD modes might represent the underlying dynamics. Similarly, a relative information content (RIC) of the snapshot matrix can be used as a simple quantitative metric to understand the Kolmogorov *n*-width of the system. The following RIC formula [105] is used to compute the percentage modal energy,

$$\text{RIC}^{(l)}(k) = \left(\frac{\sum\_{j=1}^{k} \lambda\_j^{(l)}}{\sum\_{j=1}^{N\_s/P} \lambda\_j^{(l)}}\right) \times 100. \tag{29}$$

In other words, RIC(*k*) represents the fraction of information (variance) of the snapshot matrix that can be recovered using a specific number (*k*) of basis functions. Figure 7 shows the growth of RIC values with increasing *k* for different values of *P*. We can observe that, in case of *P* = 1, the increase of the fraction of preserved information is relatively slow and more modes are required to exceed a 90% RIC. On the other hand, with the use of a partitioned ROM approach, the slope of RIC curve increases significantly. For example, less than 10 modes are enough to hit the 90% RIC value. In Kolmogorov *n*-width context, a faster increase in RIC corresponds to a faster decrease of the Kolmogorov *n*-width. Therefore, we consider that partitioned ROM can, in turn, help to break the Kolmogorov barrier and increase the system's reducibility.

**Figure 7.** Relative information content (RIC) of the snapshot matrix. (**a**) varying the number of partitions *P*, and (**b**) its sensitivity with respect to different zones for *P* = 4 case.

We note that, in Figure 7a, we only show the RIC in the first sub-interval (i.e., *l* = 1) for the sake of clarity. In Figure 7b, we plot the RIC curves for *P* = 4 in the four sub-intervals, where they almost match each other. An interesting observation from Figure 7b is that the slopes of RIC in the second and third sub-intervals are the lowest, compared to those in the first and last sub-intervals. This might be justified by the fact that flow is just initiated in the first sub-interval with relatively smooth dynamics. After that, most of the system's vortical evolution and energy exchange take place within the following sub-intervals, decaying to quasi steady-state in the last sub-interval. For fast evolution and strong dynamics, more modes are required to capture sufficient amount of information. Alternatively, constructing local modes (through further partitioning) can help to accurately capture those dynamics. Since we use fixed number of modes in each sub-interval, this might correspond to different RIC values. That is the ROM approximation in some regions might be more or less accurate than the approximation in other regions. This might be crucial for flow problems with stronger and more complicated dynamics, where that difference in RIC values can cause inconsistencies of

approximations. Alternatively, varying values of *R* can be used to satisfy pre-determined level of RIC in each partition.

#### *5.2. Closure Modeling*

It turns out that the partitioning approach helps with building more accurate and stable ROM due to several effects. First, it helps to construct more compact and localized subspaces which capture fine scales efficiently. Second, the energy (information) is concentrated in fewer modes. As a result of these two effects, modal truncation works effectively as the discarded modes have negligible effects on the system's dynamics. However, as shown in previous discussions, a large number of partitions might be needed to provide acceptable accuracy. Other than the increased storage requirement to store basis function for each sub-interval, a projection error is accumulated due to interface treatment. If the true field data are available at the interface, it can be projected onto the respective basis functions to obtain the corresponding modal coefficients at the beginning of the sub-interval. Since this is not generally available, the update of ROM solution manifold is performed using Equation (27), where the obtained reduced order approximation of the field from the previous partition is projected onto the basis functions of the following partition. Since that approximation depends on the accuracy of the ROM in the current partition, and lives in a reduced subspace different than that of the following subspace, discontinuities can easily occur at the interface. In Figure 8, we show the modal coefficients at the interfaces for *P* = 4 obtained from ROMs with different *R* dimensions. Moreover, the mismatch between initial conditions at each interface increases in subsequent partitions, due to the successive interface treatment, resulting in amplification of the error. However, we highlight that the projection error generated at the interface is bounded and overall accuracy gain due to the localization in the PID approach surpasses that error.

**Figure 8.** Close-up views of temporal coefficients at the interfaces for *P* = 4 case.

Conversely, for ROM with global POD application without any partitioning, the truncated modes might include substantial amount of energy and/or have strong interactions with the retained modes. Hence, inaccuracies and instabilities can easily take place in the produced ROM. Therefore, a compromise between accuracy gain from localization and projection error from interfaces should be considered while selecting the number of intervals. To enhance the stability of ROMs, closure models have been effectively incorporated (inspired from large eddy simulation (LES) studies). Here, we investigate the effect of closure modeling to stabilize and correct ROMs based on global application of POD (i.e., *P* = 1). We utilize a linear mode length eddy viscosity closure based on Rempfer's model [72,106,107], where the 2D Navier–Stokes equation (given in Equation (2)) is re-written as

$$\frac{\partial \omega}{\partial t} + f(\omega, \psi) = \frac{1}{\text{Re}} \left( 1 + \kappa \frac{k}{R} \right) \nabla^2 \omega,\tag{30}$$

where *κ* is a free-parameter controlling the amount of stabilization added to the model, *R* is the number of modes, and *k* is the mode index (i.e., *k* = 1, 2, ... , *R*). More recently, several ideas from a plethora of predictive turbulence models have been used in ROMs in the spirit of closure modeling to account for the effects of the truncated scales [108–115]. In the present study, we test the effect of the linear mode dependent eddy viscosity closure model for *κ* ∈ {0, 4, 8}, where *κ* = 0 corresponds to the standard Galerkin ROM without stabilization.

In Figure 9, we present the temporal evolution of *α*<sup>1</sup> and *α*<sup>8</sup> for ROM with 16 modes and a single global interval (i.e., *P* = 1). We find that the closure significantly improves the predictive capabilities of the ROM, and with increasing *κ* (i.e., adding more stabilization), results converge to the true values of the modal coefficients.

**Figure 9.** Temporal evolution of the first (**a**) and eighth (**b**) modal coefficients in the presence of closure modeling.

In addition, the final vorticity and streamfunction fields at *t* = 4 are shown in Figure 10. We can notice tremendous improvement of predicted results with closure modeling. With close investigation of surface topology, we can observe that Figure 10d captures the correct large scales as Figure 10a. However, we can see that smaller and finer details are overlooked and smoothed-out. This is to be expected given the fact that closure models try to compensate the effect of the truncated modes on the dynamics of the retained modes. In other words, it helps to improve our predictions of the dynamics of these retained modes, but does not add the contribution of the truncated modes into our reduced order approximation. Therefore, our ROM prediction will always be restricted to the scales represented by the retained modes. In case of global POD modes, only the large scales are captured and the fine scales are smoothed-out, which justifies Figure 10d.

**Figure 10.** Predicted results in the presence of closure modeling at time *t* = 4 showing vorticity contours in color with the level of streamfunction as height. (**a**) full order model (FOM), (**b**) ROM with *κ* = 0, (**c**) ROM with *κ* = 4, and (**d**) ROM with *κ* = 8. For (**b**–**d**): *R* = 16 and *P* = 1.

Finally, the energy spectrum at final time is given in Figure 11, where we can see the influence of closure modeling in improving ROM results. A very nice observation from this figure is that stabilization helps a lot in the inertial regime of energy spectrum, but it causes over-stabilization in the dissipative regime.

**Figure 11.** Energy spectra at final time predicted by ROMs in the presence of stabilization, compared to the FOM spectrum and *k*−<sup>3</sup> scaling (**a**), with a compensated view in (**b**).

The root-mean-squares errors (RMSE) between Galerkin ROM approximation (with *P* = 1 and *R* = 16), in the presence of stabilization, and FOM solution are shown in Table 2, compared to the base case without closure. It is clear that the addition of stabilization to Galerkin ROM helps to improve the predictions and reduces the approximation error. However, compared to Table 1, we can deduce that the accuracy gain due to partitioning is much more superior than that of closure. In addition, the RMSE is sensitive to the value of the stabilization parameter. Therefore, a dynamic selection of this parameter might be needed to guarantee sufficient correction.

Moreover, it should be noted that there might exist situations where the gain from only closure is minimum, even in case of "optimal" closure [73]. Hence, a combination of both partitioning and stabilization seems to be a feasible choice for building accurate, stable, and compact ROMs for turbulent flows. Indeed, Ahmed and San [72] demonstrated that an eddy viscosity based closure in conjunction with PID yields significant improvements in accuracy for capturing strong moving discontinuities with a negligibly small computational overhead. In addition, Table 3 shows the effect of stabilization through closure modeling in the partitioned ROM (with *P* = and *R* = 16). It is clear that the closure slightly improves the RMSE of vorticity field approximation. In addition, we can see that further increase in *κ* can actually increase the RMSE due to over stabilization. It can be also observed that RMSE in streamfunction predictions in case of closure modeling with PID is generally larger than that in case of PID without closure. This can be attributed to the nature of streamfunction being inverse of the Laplacian operator of vorticity, making it a smoother field by construction. Therefore, closure with PID might cause over-stabilization and excessive smoothing of the streamfunction fields predictions. This again suggests the development of a dynamical strategy to estimate optimal values of eddy viscosity coefficient.

**Table 2.** Root-mean-squares error at time *t* = 4 between the standard Galerkin ROM (*P* = 1, *R* = 16) and FOM in order to demonstrate the sensitivity with respect to the additional stabilization through *κ* at Re = 16,000.


**Table 3.** Root-mean-squares error at time *t* = 4 between the partitioned ROM (*P* = 8, *R* = 16) and FOM in order to demonstrate the sensitivity with respect to the additional stabilization through *κ* at Re = 16,000.


#### *5.3. Computational Cost*

In addition to the accuracy gains through the partitioned ROM approach, presented in previous subsections, a comparison in terms of computational cost is required for fair assessment. Here, we consider two phases of ROM computations: offline and online. By offline, we refer to the evaluation of the vectors, matrices, and tensors corresponding to the constant, linear, and nonlinear predetermined model coefficients (i.e., B, L, and R). Meanwhile, we denote the actual prediction and solution of Equation (21) as the online stage. Table 4 provides the CPU times in the offline and online stages for different number of partitions and modes. From this table, we can see that increasing the number of modes increases the CPU times for both the offline (due to computation of larger vectors, matrices, and tensors) and online (due to solving an increased number of equations) stages. Interestingly, we can observe that the computational cost of the online stage scales cubically with increasing *R*. This conforms to the quadratic nonlinearity in the investigated system, making the online cost an order of *O*(*R*3).

On the other hand, increasing the number of partitions has a negligible effect on the online CPU time. Therefore, the partitioned ROM approach provides more accurate results with similar online costs as standard ROM with the same number of modes. However, increasing *P* (with fixed *R*) results in an increase in the offline CPU time. This corresponds to the computation of B, L, and R for each individual partition. That is why the offline CPU time scales linearly with increasing *P*. Nonetheless, it was shown in [73] that the CPU times for basis construction are largely reduced in the PID framework due to the solution of smaller eigenvalue problems rather than solving a large one. Finally, it is noteworthy that the partitioned ROM approach requires more memory to store the local basis functions and model predetermined coefficients in each partition. Therefore, we can consider that partitioned ROM converts the offline and storage costs into a speed-up of the online deployment.


**Table 4.** CPU time (in seconds) for ROM prediction and predetermined model coefficients computations, denoted as online and offline stages, respectively. We note that the CPU time assessments documented in this table are based on Fortran executions.

#### **6. Conclusions and Future Works**

We utilize a partitioning approach in the current study in order to build a compact reduced order models of 2D turbulent flows. We demonstrate the framework using the two-dimensional decaying Kraichnan turbulence. It is shown that this partitioned ROM helps to break the Kolmogorov *n*-width barrier, in the context of a relative information content criterion. An energy spectrum analysis is performed to emphasize the capabilities of the presented framework to sufficiently capture both the large and fine scales and recover the energy spectra within the inertial and dissipative ranges. The partitioned ROM produces significantly more accurate predictions than standard ROM without

partitioning. A computational CPU time analysis reveals that the partitioned ROM minimizes the online cost at the expense of offline cost and increased memory to build reduced order approximation with acceptable accuracy. In addition, we investigate the use of a linear mode dependent eddy viscosity to stabilize ROM prediction. We find that closure improves the results in the inertial kinetic energy regime, corresponding to the large scales. However, it presents over-stabilization in the dissipative regime, which smoothens out the field finer details. This might advocate a strategy to develop optimal (or near optimal) estimates for the eddy viscosity coefficients to be posed. Therefore, we suggest the use of partitioning along with closure as an enabler for more efficient ROMs turbulent flow systems. Incorporating PID with closure would provide the flexibility of constructing ROMs with less modes (and same number of partitions), or with less partitions (and same number of modes). The former would provide higher speed-ups during online deployment, while the latter would reduce the offline cost.

To this end, the partitioned ROM has been shown to improve the predictions for the investigated 2D Kraichnan turbulence case. However, scaling this framework into a 3D setting is still required to fully assess the validity of the framework to address realistic turbulent flows with strong anisotropy. In addition, the PID has been shown to capture finer details than standard global POD. Application of this approach onto flow situations where global basis fails drastically and results in completely unstable and inaccurate ROMs can be an interesting extension to the current study. Also, the adopted PID approach assumes evenly-spaced partitions with an equal number of modes. Despite the resulting simplicity of implementation, this fixed partitioning might be less meaningful for more challenging flow problems where dynamics change rapidly. Therefore, an adaptive partitioning with varying number of modes can be another extension to generalize the present work. The assumption of non-overlapping partitions might also be relaxed to allow overlapping sub-intervals with an indicator for smooth transition between partitions based on the actual flow conditions. Finally, the development of a strategy to dynamically update the basis functions using data assimilation techniques would enable the extrapolation of ROM in time while reflecting local phenomena.

**Author Contributions:** Data curation, S.E.A. and O.S.; Supervision, O.S.; Writing—original draft, S.E.A.; and Writing—review and editing, S.E.A. and O.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Award Number: DE-SC0019290.

**Acknowledgments:** This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0019290. Omer San gratefully acknowledges their support. Disclaimer: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Closure Learning for Nonlinear Model Reduction Using Deep Residual Neural Network**

#### **Xuping Xie 1, Clayton Webster <sup>2</sup> and Traian Iliescu 3,\***


Received: 18 March 2020; Accepted: 19 March 2020; Published: 23 March 2020

**Abstract:** Developing accurate, efficient, and robust closure models is essential in the construction of reduced order models (ROMs) for realistic nonlinear systems, which generally require drastic ROM mode truncations. We propose a deep residual neural network (ResNet) closure learning framework for ROMs of nonlinear systems. The novel ResNet-ROM framework consists of two steps: (i) In the first step, we use ROM projection to filter the given nonlinear system and construct a spatially filtered ROM. This filtered ROM is low-dimensional, but is not closed. (ii) In the second step, we use ResNet to close the filtered ROM, i.e., to model the interaction between the resolved and unresolved ROM modes. We emphasize that in the new ResNet-ROM framework, data is used only to complement classical physical modeling (i.e., only in the closure modeling component), not to completely replace it. We also note that the new ResNet-ROM is built on general ideas of spatial filtering and deep learning and is independent of (restrictive) phenomenological arguments, e.g., of eddy viscosity type. The numerical experiments for the 1D Burgers equation show that the ResNet-ROM is significantly more accurate than the standard projection ROM. The new ResNet-ROM is also more accurate and significantly more efficient than other modern ROM closure models.

**Keywords:** reduced order model; closure model; variational multiscale method; deep residual neural network

#### **1. Introduction**

Many scientific and engineering applications, such as weather forecasting, ocean modeling, and cardiovascuar flow simulation, can often be represented by multiscale systems of ordinary differential equations (ODE) in high-dimensional modal spaces. The analysis and high-fidelity simulation of such systems can be very expensive even on high-performance computing systems. Consequently, using the full order model (FOM) for such simulations can be impractical for time critical applications, such as flow control and parameter estimation. To alleviate the computation burden of the FOM simulation, reduced order models (ROMs) have been successfully used.

ROMs seek a low-dimensional approximation of the FOM with orders of magnitude reduction in computational cost. The classical projection based ROM approach is first to construct a low dimensional space using data-driven reduction methods, such as proper orthogonal decomposition (POD) or dynamical modal decomposition (DMD). The ROM dynamics can be obtained via Galerkin projection of the FOM onto the reduced space [1–5].

The Galerkin projection reduced order model (GP-ROM) can be very efficient and relatively accurate for many systems [3,4]. There are, however, systems (e.g., convection-dominated fluid flows) for which the GP-ROM can generate inaccurate approximations. There are several approaches to address this inaccuracy (see, e.g., [6–9]). In this paper, we focus on one of the main reasons for this inaccuracy: the ROM closure problem, i.e., modeling the interaction between the GP-ROM modes

and the discarded modes. Indeed, due to the inherently drastic mode truncation required in realistic settings, the dimension of the GP-ROM space is too low to resolve the complex nonlinear interactions of the fluid system [1,2,10,11]. GP-ROMs that do not include a ROM closure model can yield inaccurate results, often in the form of spurious numerical oscillations [1,2,12,13]. Endowing GP-ROM with closure models could extend the applicability of GP-ROM in many fluid mechanics applications, such as flow control, climate modeling, and weather forecasting [1,2].

ROM closure models for nonlinear systems have been proposed in, e.g., [14–24]. The vast majority of the current ROM closure models aim at mitigating the numerical instability observed in GP-ROMs that do not include a closure model. Some of these ROM closure models use stabilization techniques that have been developed in standard discretization methods (e.g., in the finite element community) [25–27]. Other ROM closure models have imported ideas developed in standard CFD methodologies, e.g., large eddy simulation (LES) [1,24,28,29]. The overwhelming majority of the current ROM closure models can be categorized as stabilization techniques (for a notable exception, see the approximate deconvolution ROM closure model [28] that uses a mathematical framework inspired from image processing).

This is in stark contrast with classical LES, where a plethora of closure models have been proposed over the years. The main difference between ROM closure and LES closure is that the latter has been entirely built around physical insight from the statistical theory of turbulence (e.g., energy cascade and Kolmogorov's theory), which is generally posed in the Fourier setting [30,31]. This physical insight is generally not available in the ROM setting (see, however, [32] for initial steps). Thus, current ROM closure models have generally been deprived of this powerful methodology that represents the core of most LES closure models.

We believe that machine learning represents a natural solution for ROM closure modeling. Indeed, since physical insight cannot be easily extended to the ROM setting, available data and machine learning can be utilized instead to develop ROM closure models.

We propose a novel ROM closure modeling framework that is constructed by using available data and deep residual neural network (ResNet) (for details, see, e.g., [33–37]). The resulting ROM, which we call the *residual neural network ROM (ResNet-ROM)*, is schematically illustrated in (1) and Figure 1 (see Section 3 for details). We emphasize that, in the new ResNet-ROM framework, data is used only to *complement* classical physical modeling (i.e., only in the closure modeling component) [29,38], not to completely replace it [39]. Thus, the resulting ResNet-ROM combines the strengths of both physical and data-driven modeling.

$$\text{FOM} \xrightarrow{\text{filtering}} \text{GP-ROM} \xrightarrow{\text{learning closure}} \text{ROM} \tag{1}$$

The main contributions of this paper can be summarized as follows:


Machine learning has recently been utilized to construct ROM closure models (see, e.g., [22,38,40–42]). The ROM closure model proposed in [22] is similar to the new ResNet-ROM. There are, however, two major differences between the two ROM closure models: The first difference is that the ROM closure model in [22] uses FOM data to model both the linear and the nonlinear terms in the underlying equations (see Section 5.1 in [22]). In contrast, in the LES spirit [31], the ResNet-ROM models only the nonlinear terms (see Algorithm 1). One motivation for modeling only the nonlinear terms is given in the theoretical and numerical investigation in [43], where it was shown that the contribution of the linear terms to the ROM

closure term (i.e., the commutation error) is not significant in convection-dominated problems, such as those we consider in this study. The second major difference between the ROM closure model in [22] and the ResNet-ROM is the machine learning approach used to construct the ROM closure model: Bayesian regularization and extreme learning machine approaches are used in [22], whereas a deep residual neural network (ResNet) is used to construct the ResNet-ROM.

**Figure 1.** Flow chart of the new ResNet-ROM.

#### **2. Reduced Order Model**

To construct the novel ResNet-ROM framework, we use the Navier-Stokes equations (NSE) for incompressible fluid flows:

$$\begin{cases} \frac{\partial \boldsymbol{u}}{\partial t} - \boldsymbol{R} \boldsymbol{x}^{-1} \Delta \boldsymbol{u} + (\boldsymbol{u} \cdot \boldsymbol{\nabla}) \boldsymbol{u} + \boldsymbol{\nabla} \boldsymbol{p} = \boldsymbol{0}, \\\\ \boldsymbol{\nabla} \cdot \boldsymbol{u} = \boldsymbol{0}, \end{cases} \tag{2}$$

which are defined on a spatial domain Ω and on the time interval [0, *T*]. In the NSE (2), *u* is the velocity, *p* the pressure, and *Re* the Reynolds number. The ROM basis {*ϕ*1, ... ,*ϕr*}, where *r* is small, represents the large, energy containing structures in the flow, and is obtained by using available numerical or experimental data and, e.g., the POD or DMD methods [1,3,4]. The ROM velocity approximation is defined as

$$u\_I(\mathbf{x}, t) \equiv \sum\_{j=1}^r a\_j(t) \mathfrak{p}\_j(\mathbf{x})\,,\tag{3}$$

where {*aj*(*t*)}*<sup>r</sup> <sup>j</sup>*=<sup>1</sup> are the sought time-varying coefficients. To determine these coefficients, we use a Galerkin procedure. We replace *u* in (2) with the ROM approximation (3), we take the *L*<sup>2</sup> inner product between the resulting system and each ROM basis function {*ϕ*1, ... ,*ϕr*}, and then we integrate by parts: ∀ *i* = 1, . . . ,*r*,

$$\left( \left( \frac{\partial \boldsymbol{u}\_{r}}{\partial t}, \boldsymbol{\varrho} \boldsymbol{\varrho}\_{i} \right) + \frac{1}{Re} \left( \nabla \boldsymbol{u}\_{r}, \nabla \boldsymbol{\varrho}\_{i} \right) + \left( \left( \boldsymbol{u}\_{r} \cdot \nabla \right) \boldsymbol{u}\_{r}, \boldsymbol{\varrho}\_{i} \right) = 0.$$

To derive the above equation, we assumed that the ROM velocity modes are perpendicular to the discrete pressure space, which is the case if standard mixed FEs (e.g., Taylor-Hood) are used for the snapshot creation [1,29]. Using (3), we obtain the standard *Galerkin projection ROM (GP-ROM)*:

$$
\dot{\mathfrak{a}} = A \, \mathfrak{a} + \mathfrak{a}^{\top} \, B \, \mathfrak{a} \, , \tag{4}
$$

which can be written componentwise as follows: ∀ *i* = 1...*r*,

$$d\_i = \sum\_{m=1}^{r} A\_{im} a\_m(t) + \sum\_{m=1}^{r} \sum\_{n=1}^{r} B\_{imn} a\_n(t) \; a\_m(t) \, . \tag{5}$$

where *Aim* <sup>=</sup> <sup>−</sup>*Re*−<sup>1</sup> (∇*ϕm*, <sup>∇</sup>*ϕ<sup>i</sup>* ) and *Bimn* <sup>=</sup> <sup>−</sup>- *ϕ<sup>m</sup>* · ∇*ϕn*,*ϕ<sup>i</sup>* .

#### **3. Closure Learning**

#### *3.1. Residual Neural Network (ResNet)*

Deep residual neural network (ResNet) has been first introduced for image recognition in [33]. ResNet has been widely studied and applied in many supervised learning tasks. Recent mathematical understanding of deep ResNet has been achieved in the ODE representation of ResNet; for a comprehensive introduction, see, e.g., [35–37,44].

To construct the novel ResNet-ROM framework, we consider the ResNet model, which is illustrated in Figure 2.

**Figure 2.** ResNet block used in our training. Each ResNet block consists of a flatten fully connected layer with a tanh activation function, followed by a drop out layer to prevent overfitting. 128 neurons and 5% drop out rate are used.

The input values of forward propagation in the ResNet are given by

$$X\_{t+1} = X\_t + \tanh(\mathcal{W}\_t X\_t + b\_t), \ t = 1, \ldots, N - 1,\tag{6}$$

where *<sup>N</sup>* is the number of layers in the network architecture, and *Xt* <sup>∈</sup> *<sup>s</sup>* is the output from each ResNet block at time *t*. *Wt* and *bt* are the weight matrix and bias at each layer, respectively. The ResNet propagation starts from step *t* = 1 with the nonlinear activation function tanh. The initial input layer for the network is *X*<sup>0</sup> = [*Re*, *t*, *a*1, ... , *ar*] *<sup>T</sup>*. For a standard ResNet in image classification, a convolution layer (CNN) is often included in a residual block. In our work, we use a simplified version of the ResNet that does not include CNN.

#### *3.2. ROM Closure Modeling*

In realistic nonlinear systems (e.g., convection-dominated flows), current GP-ROMs of the form (4) generally yield inaccurate results, often in the form on numerical instabilities. This inaccurate behavior is due to the fact that, in order to maintain a low computational cost, GP-ROMs are constructed with a drastically truncated ROM basis {*ϕ*1, ... ,*ϕr*}, where *r* is generally small (e.g., *r* = O(10)). In realistic nonlinear systems, this extreme truncation cannot capture the complex nonlinear interactions among the various degrees of freedom of the system. Thus, current GP-ROMs are computationally efficient, but numerically inaccurate. To alleviate this inaccurate behavior, GP-ROMs are often supplemented with a ROM closure model [1,24], i.e., a model for the important interactions between the ROM basis

{*ϕ*1, ... ,*ϕr*} and the discarded ROM modes {*ϕr*+1, ... ,*ϕR*}, where *R* is the dimension of the input data. For example, for the NSE, the standard GP-ROM (4) is generally modified as follows:

$$
\dot{a} = A \, a + \boldsymbol{a}^{\top} \, \boldsymbol{B} \, \boldsymbol{a} + \boldsymbol{\pi},\tag{7}
$$

where *τ* is a *ROM closure model* that represents the interactions between the ROM modes and the discarded modes.

We emphasize that the same closure problem needs to be addressed when classical numerical discretization schemes (e.g., finite element or spectral methods) are used in the numerical simulation of turbulent flows. In those settings, classical discretizations schemes are used in inherently under-resolved regimes (i.e., they use coarse meshes or too few spectral modes). To ensure the relative accuracy of these under-resolved simulations, various types of closure models for the unresolved (e.g., subgrid-scale) information are generally utilized. These closure models are central to, e.g., large eddy simulation (LES) [31], one of the main approaches to the numerical simulation of turbulent flows. The vast majority of LES closure models have been constructed by using physical insight from Kolmogorov's statistical theory of turbulence. The concept of energy cascade is central in the development of LES closure models. The energy cascade states that energy enters the system at the large scales, is transferred to smaller and smaller scales through nonlinear interactions, and is dissipated at the smallest scale (i.e., the Kolmogorov's scale). Thus, most LES closure models (e.g., of eddy viscosity type) aim at recovering the energy cascade displayed by the original system (i.e., the NSE).

This physical insight cannot be easily extended to the ROM setting (see, however, [32] for a preliminary numerical investigation). Thus, current ROM closure models have generally been deprived of this powerful methodology that represents the main tool in the development of most LES closure models.

#### *3.3. ROM Closure Learning*

Our vision is that machine learning represents a natural approach for constructing ROM closure modeling. Indeed, since physical insight cannot generally be used in a ROM setting, data and machine learning can be utilized instead to develop ROM closure models. Furthermore, data is used only to construct the ROM closure model; the other ROM operators are built by using the classical Galerkin projection. Thus, data and machine learning complement (instead of replace) physical based modeling, yielding a hybrid framework that synthesizes the strengths of both approaches [17,18,22,27,29,42,45–48].

In this section, we propose a novel ROM closure modeling framework that is constructed by using available data and the ResNet approach described in Section 3.1. The resulting ROM, which we call the *residual neural network ROM (ResNet-ROM)*, is schematically illustrated in (1) and Figure 1, and is summarized in Algorithm 1.

To obtain the explicit formula (11) in Algorithm 1, we develop a *large eddy simulation ROM (LES-ROM)* framework [24,28]: First, we filter the high-resolution (*R*-dimensional) ROM approximation of the NSE with a low-pass ROM spatial filter, denoted by overbar in (11). In this paper, as a ROM spatial filter, we use the ROM projection onto the space spanned by the first *r* ROM modes [29,48] (see [28] for an alternative ROM spatial filter): For a given *u* ∈ span{*ϕ*1, ... ,*ϕR*}, the ROM projection seeks *u* ∈ span{*ϕ*1,...,*ϕr*} such that

$$(\overline{u}, \varphi\_{\overline{i}}) = (u, \varphi\_{\overline{i}}) \qquad \forall \, i = 1, \dots, r,\tag{8}$$

where (·, ·) denotes the *<sup>L</sup>*<sup>2</sup> inner product. The resulting filtered equations can be cast within a *variational multiscale (VMS)* framework [49], yielding a two-scale VMS-ROM [50]. Equation (11) represents the explicit VMS-ROM formula for the ROM closure term *τ* in (10). In the two-scale VMS-ROM [50], the ROM closure model is developed in two steps. First, an ansatz is used to approximate the ROM closure term: *<sup>τ</sup>* <sup>≈</sup> *<sup>A</sup>*˜ *<sup>a</sup>* <sup>+</sup> *<sup>a</sup> <sup>B</sup>*˜ *<sup>a</sup>*. Then, FOM data for *<sup>τ</sup>* in the training time interval is used to solve

a least squares problem to determine the optimal entries in *A*˜ and *B*˜. In the new ResNet-ROM, we construct the ROM closure model in a fundamentally different way. Instead of making an ansatz on the structural form of the ROM closure term *τ* (as in the two-scale VMS-ROM [50]), in Algorithm 1 we consider a general structural formulation and use the ResNet approach to construct the ROM closure model. To this end, in (6), we make the following choices: The initial layer contains the Reynolds number (*Re*), the current time (*t*), and the current ROM coefficients in (3) (*a*1, ... , *ar*), i.e., *<sup>X</sup>*<sup>0</sup> = [*Re*, *<sup>t</sup>*, *<sup>a</sup>*1, ... , *ar*] <sup>∈</sup> (*s*+2). The final output layer provides an approximation for the closure term, i.e., *XN* <sup>≈</sup> *<sup>τ</sup>FOM*. The optimization problem associated with this ResNet is given by

$$\min \|\mathbf{r}^{\text{ansatz}} - \mathbf{r}^{\text{FOM}}\|\_{F}^{2} + \lambda R(\mathcal{W}, b), \tag{9}$$

where the regularizer *R* penalizes undesirable parameters and can prevent overfitting, *<sup>τ</sup>ansatz* <sup>=</sup> *<sup>f</sup>*(*a*, *Re*, *tj*) is the output from the neural network, *<sup>W</sup>* and *<sup>b</sup>* are weights in the network, ·*<sup>F</sup>* is the Frobenius norm, and *λ* is a hyperparameter for the *L*<sup>2</sup> regularization to prevent overfitting [36,37]. The minimization problem (9) is over the parameters defining the structural form of the function *f* used to define *τansatz*.

**Algorithm 1** ResNet-ROM

1: Consider the ROM closure model

$$
\dot{\mathfrak{a}} = A \, \mathfrak{a} + \mathfrak{a}^{\top} \, \stackrel{\dash}{B} \mathfrak{a} + \mathfrak{r} \,. \tag{10}
$$

2: Use snapshot data to compute the true vector *τ* in (10), *τFOM*:

$$\pi\_i^{\text{FOM}}(t\_j) \quad = \quad - \left( \overline{\left( \mathfrak{u}\_{\mathbb{R}}^{\text{FOM}}(t\_j) \cdot \nabla \right) \mathfrak{u}\_{\mathbb{R}}^{\text{FOM}}(t\_j)}' \right)$$

$$- \left( \mathfrak{u}\_r^{\text{FOM}}(t\_j) \cdot \nabla \right) \mathfrak{u}\_r^{\text{FOM}}(t\_j) \,, \mathfrak{q}\_i \right) \, \tag{11}$$

where an overbar indicates spatial filtering with ROM projection.

3: Use snapshot data and (11) to define the approximation function *τ* in (10), *τansatz*:

$$
\pi^{ansatz}(t\_{\bar{j}}) = f(\mathfrak{a}, \mathfrak{R}e, t\_{\bar{j}}),
\tag{12}
$$

where *f* is a generic function that needs to be determined.

4: Use ResNet to train the closure term, i.e., to find the form of *f* in (12) that is optimal with respect

to the minimization problem (9).

5: The novel ResNet-ROM has the following form:

$$
\dot{a} = A \, a + \boldsymbol{a}^{\top} \, \boldsymbol{B} \, \boldsymbol{a} + f\left(\boldsymbol{a}, \operatorname{Re}, t\right) \,. \tag{13}
$$

#### **4. Numerical Experiments**

#### *4.1. Implementation*

As a test problem for our new approach, we use the 1D Burgers equation, which has been used to test new ROM ideas in simplified settings [29,51,52]:

$$\begin{cases} u\_t - \mathcal{R}e^{-1}u\_{xx} + uu\_x = 0 & x \in \Omega, \\ u(x,0) = u\_0(x) & x \in \Omega, \\ u(x,t) = 0 & x \in \partial\Omega, \end{cases} \tag{14}$$

where, for consistency with the notation used for the NSE, the diffusion parameter is denoted as *Re*−1. In our numerical tests, Ω = [0, 1] is the computational domain and the time domain is [0, 1]. We use the same initial conditions as those utilized in [29,51,52]: *u*0(*x*) = 1, *x* ∈ (0, 1/2], *u*0(*x*) = 0, *x* ∈ (1/2, 1]. These initial conditions yield a steep internal layer that is difficult to capture in the convection-dominated regime that we consider [29,51,52]. We first use a piecewise linear finite element discretization to generate the FOM solution. To this end, we utilize a uniform mesh with *N* = 1024 grid points (which yields a meshsize *h* = 1/1024) and the forward Euler method with Δ*t* = 10−<sup>4</sup> for the time discretization. To construct the ROM basis, we collect 101 snapshots sampled from [0, 1]. We build the neural network in PyTorch and we train the ROM closure model with a 6 block ResNet with the common Adam optimizer [53]. We perform all the computational experiments on a Linux system laptop with Nvidia Geforce GTX GPU hardware.

#### *4.2. Reconstruction*

In this section, we consider the reconstructive regime, i.e., we test the ROMs at the same *Re* as the *Re* at which the ROMs are constructed. We choose *Re* = 1000 in (14) and we use *r* = 6 basis functions in all ROMs. In Figure 3, we plot the solutions of the FOM (top left), GP-ROM (top right), and ResNet-ROM (bottom left). When compared with the FOM data, the ResNet-ROM solution is significantly more accurate than the standard GP-ROM solution. In Figure 3, we also plot the FOM, GP-ROM, and ResNet-ROM solutions at the final time step (i.e., at *t* = 1). This plot shows that the closure term in the ResNet-ROM plays an important role in stabilizing the ROM approximation. Indeed, the GP-ROM solution displays large, spurious numerical oscillations. These oscillations are dramatically decreased in the ResNet-ROM solution.

In Figure 4, we plot the the time evolution of the ROM coefficients *a* for the FOM, GP-ROM, and ResNet-ROM. The plots show that the ResNet-ROM is significantly more accurate than the standard GP-ROM.

**Figure 3.** Reconstructive regime, *Re* = 1000: FOM (**top left**), GP-ROM (**top right**), ResNet-ROM (**bottom left**), and final time solution for all three simulations (**bottom right**). ResNet-ROM yields the most accurate solution.

**Figure 4.** Reconstructive regime, *Re* = 1000: Time evolution of ROM coefficients *a*<sup>1</sup> and *a*<sup>3</sup> for FOM, GP-ROM, and ResNet-ROM. ResNet-ROM yields the most accurate solution.

#### *4.3. Prediction*

To study the robustness of the new ResNet-ROM, we test its predictability, i.e., we train the ResNet-ROM closure term on data from multiple *Re* and we then test the ResNet-ROM to predict the ROM dynamics at different *Re*. The training data space is sampled at *Re* = [20, 50, 100, 200, 500, 800, 1000], and the test data contains *Re* = [30, 80, 300, 1200]. Note that the solution of the Burgers equation is affected by *Re*. Small *Re* values yield a slow movement of the sharp internal layer, while large *Re* values can speed up this movement.

In Figure 5, for *Re* = 30, 80, and 1200 (which are different from the training *Re* values), we plot the solutions for the DNS (first column), GP-ROM (second column), ResNet-ROM (third column), and final time solution for all three simulations (fourth column). These plots show that the ResNet-ROM is consistently the most accurate ROM, especially for the largest *Re* value. In Figure 5, we also plot the FOM, GP-ROM, and ResNet-ROM solutions at the final time step (i.e., at *t* = 1). These plots show that the closure term in the ResNet-ROM plays an important role in stabilizing the ROM approximation.

In Figure 6, we plot the the time evolution of the ROM coefficients *a* for the FOM, GP-ROM, and ResNet-ROM. These plots show that the ResNet-ROM is significantly more accurate than the standard GP-ROM.

Overall, we draw the same conclusion as in the reconstructive regime (Section 4.2): In all cases, the ResNet-ROM is significantly more accurate than the standard GP-ROM in the predictive regime.

**Figure 5.** Predictive regime: ROMs are trained on data from *Re* = [20, 50, 100, 200, 500, 800, 1000] and are tested at *Re* = 30 (**first row**), *Re* = 80 (**second row**), *Re* = 1200 (**third row**). Results presented for FOM (**first column**), GP-ROM (**second column**), ResNet-ROM (**third column**), and final time solution for all three simulations (**fourth column**).

**Figure 6.** Predictive regime: Time evolution of ROM coefficients *a*<sup>1</sup> and *a*<sup>3</sup> for FOM, GP-ROM, and ResNet-ROM. Results for *Re* = 30 (**top**) and *Re* = 1200 (**bottom**). ResNet-ROM yields the most accurate solution.

#### *4.4. Comparison*

In this section, in the numerical simulation of the Burgers equation, we also make a numerical comparison between the new ResNet-ROM and several modern ROM closure models: the POD artificial viscosity model (POD-AV) [51], the POD Smagorinsky model (POD-L) [54], the evolve-then-filter ROM (EF-ROM) [55], the approximate deconvolution ROM (AD-ROM) [28], and the data-driven filtered ROM (DDF-ROM) [29].

In Table 1, we list the *L*<sup>2</sup> errors of the ResNet-ROM and the other closure models. These results show that the ResNet-ROM error is at least *an order of magnitude lower* than the errors of the other ROM closure models.


**Table 1.** *L*<sup>2</sup> errors of the new ResNet-ROM and other closure models.

For the online ResNet-ROM integration, we use the *scipy* package with the built-in integration function *odeint*. In the online stage, the new ResNet-ROM is more efficient than the other ROM closure models: The online CPU time of the ResNet-ROM is 3.89 s, whereas the online CPU times of the EF-ROM, AD-ROM, and DDF-ROM are 6.91, 7.26, and 4.42 s, respectively [28,29,55].

The CPU time of the offline training of the new ResNnet-ROM is 122.74 s for the current dataset with 10,000 epochs (iterations). Thus, even though the online cost for ResNet-ROM is lower than the online cost of the other closure models that are compared in this paper, the neural network training cost of ResNet-ROM is much higher than the offline training cost of the other ROM closure models [28,29,55].

#### *4.5. Sensitivity*

In this section, we perform a sensitivity study of the new ResNet-ROM with respect to two parameters: (i) the hyperparameter *λ* used in the regularization of the neural network training; and (ii) the meshsize *h* used in the snapshot generation. We also investigate the potential improvement in GP-ROM accuracy when the number of snapshots and the dimension *r* of the ResNet-ROM are increased.

The parameter *λ* is an *L*<sup>2</sup> regularization parameter used in the neural network training to prevent overfitting. In our numerical simulations, we pick the parameter *λ* based on the validation error performance in the training phase. In our training dataset for *Re* = [20, 50, 100, 200, 500, 800, 1000], we test the following parameter values: *λ* = 0, 1, 10<sup>−</sup>1, 10−2, 10−3. In Figure 7, we plot the ResNet-ROM error for different parameter values. The plot in Figure 7 shows that the ResNet-ROM is not very sensitive to the hyperparameter *λ*. Thus, in our numerical tests, we fix *λ* = 0.01 with 10,000 epochs (iterations) in the Resnet-ROM training.

We also perform a sensitivity study for the new ResNet-ROM with respect to the meshsize *h*. In Figure 8, for the reconstructive regime and *Re* = 1000, we plot the FOM, GP-ROM, and ResNet-ROM results for three coarse meshsize values. The plots in Figure 8 show that the ResNet-ROM is still significantly more accurate than the GP-ROM. In the predictive regime, however, both the ResNet-ROM and the GP-ROM yield inaccurate results for these three coarse meshes. The relationship between the FOM mesh utilized to generate the snapshots and the ROM accuracy is subject of current research (see, e.g., [56,57]) and should be further investigated for the new ResNet-ROM.

**Figure 7.** Validation error performance for different values of the regularization parameter *λ*.

**Figure 8.** Reconstructive regime, *Re* = 1000: FOM (**first column**), GP-ROM (**second column**), and ResNet-ROM (**third column**), for meshsizes *h* = 1/64 (**first row**), *h* = 1/128 (**second row**), and *h* = 1/256 (**third row**). ResNet-ROM yields the most accurate solution.

We also investigate the ResNet-ROM rate of convergence with respect to the meshsize *h*. To this end, we fixed the ResNet-ROM dimension at *r* = 30 and the time-step size at Δ*t* = 10−3. The plot in Figure 9 shows that the rate of convergence (obtained with a least squares fit) is about *h*1.70, which is an acceptable approximation to the theoretical rate of convergence of *h*2.

Finally, we investigate the potential improvement in GP-ROM accuracy when the number of snapshots and the dimension *r* of the ResNet-ROM are increased.

First, we collect the maximum number of snapshots that are available from the FOM simulations in the training interval. That is, we collect 1001 snapshots, which yield a snapshot matrix whose rank is 251. Comparing the left plot in Figure 10 (for the highest number of snapshots) with the top right plot in Figure 3 (for the lower number of snapshots), we conclude that increasing the number of snapshots does not seem to improve the ResNet-ROM accuracy.

**Figure 9.** The ResNet-ROM rate of convergence with respect to the meshsize *h*.

Next, we increase the GP-ROM dimension. In Figure 10, we plot the GP-ROM results for three *r* values: *r* = 10, *r* = 20, and *r* = 30. As expected, the GP-ROM accuracy improves as we increase *r*. We emphasize, however, that the ROM closure models (such as that used in the new ResNet-ROM) are designed to improve the GP-ROM accuracy in under-resolved numerical simulations, i.e., when only a few ROM basis functions can be used, which is often the case in realistic settings [1,2,14,16–18,20–23,25,27–29,32,38,40–42,46–48,50,51,54,58]. As shown earlier, for *r* = 10, the ResNet-ROM is significantly more accurate than the GP-ROM both in the reconstructive and the predictive regimes.

**Figure 10.** Reconstructive regime, *Re* = 1000, GP-ROM results: *r* = 10 (**left**), *r* = 20 (**middle**), and *r* = 30 (**right**).

#### **5. Conclusions**

In this paper, we used available data and deep residual neural networks (ResNet) to construct a novel reduced order model (ROM) closure for complex nonlinear settings. We emphasize that the ResNet-ROM closure terms are much more general than the phenomenological ansatzes generally used in ROM closure modeling [29]. We tested the novel ResNet-ROM in the numerical simulation of the Burgers equation. For comparison purposes, we investigated the standard Galerkin projection ROM (GP-ROM) and the full order model (FOM). We considered two settings: (i) a reconstructive regime, in which the Reynolds number *Re* is the same in the training and testing stages; and (ii) a predictive regime, in which *Re* used in the testing regime is different from the *Re* used in the training regime. In both regimes, the new ResNet-ROM was consistently more accurate than the standard GP-ROM. Furthermore, the ResNet-ROM was also dramatically more accurate than several other ROM closure models from the literature.

There are several research directions that we plan to pursue: We will test the novel ResNet-ROM on realistic test problems (e.g., 3D turbulent flows) and we will compare it with state of the art closure models. We will also investigate alternative approaches to develop the ROM closure term *τ*. Indeed, in this paper we used the ROM projection as a spatial filter in the construction of the ROM closure term *τ*. We plan to investigate different ROM spatial filters, such as the ROM differential filter [55]. This alternative ROM filter will yield a different ROM closure term *τ* and, therefore, a different ResNet-ROM. Finally, a numerical investigation of alternative machine learning approaches (see, e.g., [22]) could yield improved ROM closure models. Indeed, for the one-dimensional Burgers equation test case considered in this paper, the computational cost of training the ROM closure model with ResNet was acceptable. For more complex settings, however, this computational cost could be much higher. In those cases, neural networks with a lower computational cost could be more effective.

**Author Contributions:** Conceptualization, X.X., C.W. and T.I.; methodology, X.X., C.W. and T.I.; writing—original draft preparation, X.X.; writing—review and editing, C.W., T.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work of the second author was supported by U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research. The work of the third author was supported by National Science Foundation grant DMS-1821145.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Liquid-Cooling System of an Aircraft Compression Ignition Engine: A CFD Analysis**

#### **Alessandro Coclite 1,\*, Maria Faruoli 1, Annarita Viggiano 1, Paolo Caso <sup>2</sup> and Vinicio Magi 1,3**


Received: 7 April 2020; Accepted: 9 May 2020; Published: 13 May 2020

**Abstract:** The present work deals with an analysis of the cooling system for a two-stroke aircraft engine with compression ignition. This analysis is carried out by means of a 3D finite-volume RANS equations solver with *k*- closure. Three different cooling system geometries are critically compared with a discussion on the capabilities and limitations of each technical solution. A first configuration of such a system is considered and analyzed by evaluating the pressure loss across the system as a function of the inlet mass-flow rate. Moreover, the velocity and vorticity patterns are analyzed to highlight the features of the flow structure. Thermal effects on the engine structure are also taken into account and the cooling system performance is assessed as a function of both the inlet mass-flow rate and the cylinder jackets temperatures. Then, by considering the main thermo-fluid dynamics features obtained in the case of the first configuration, two geometrical modifications are proposed to improve the efficiency of the system. As regards the first modification, the fluid intake is split in two manifolds by keeping the same total mass-flow rate. As regards the second configuration, a new single-inlet geometry is designed by inserting restrictions and enlargements within the cooling system to constrain the coolant flow through the cylinder jackets and by moving downstream the outflow section. It is shown that the second geometry modification achieves the best performances by improving the overall transferred heat of about 20% with respect to the first one, while keeping the three cylinders only slightly unevenly cooled. However, an increase of the flow characteristic loads occurs due to the geometrical restrictions and enlargements of the cooling system.

**Keywords:** internal combustion engines; liquid-cooling system; heat transfer; computational fluid dynamics

#### **1. Introduction**

It is well known that internal combustion engines for aeronautical applications operate within a specific temperature range to avoid structural damages, detonations and loss of efficiency of the combustion process. The heat released by combustion is only partially converted to work: a certain amount is within exhaust gases, while the remaining part warms-up the engine structure. This heat needs to be some how dissipated by keeping the engine within its optimal temperature range as to ensure reliability and long service life [1]. This temperature range (roughly from 350 to 380 K) ensures that the engine is efficiently working from the thermodynamic point of view. Indeed, a high cooling would reduce the engine thermal efficiency whereas if the engine temperature exceeds this range, the engine may over-heat and this over-heating is likely to damage the engine. The engine over-heating can be prevented by using efficient cooling systems to assist

the engine running at its optimal performance. Hence, there is a need to look at the total heat balance and control system for the aircraft in order to search for performance optimization [2–4].

Generally, for piston aircraft engines, either air or liquid cooling systems are employed; only a few modern engines implement a combination of both, all with their own advantages and disadvantages [5]. Inline four, six or eight cylinders engines are almost exclusively air-cooled, except for the Rotax, Viking and Subaru engines and some aero diesels [6]. Such a cooling technique represents a good compromise between the structure design and the low weight achieved when compared to liquid cooling circuits. Nevertheless, air-cooled cylinders present a large number of fins cast around the heads increasing the total exposed surface. Indeed, the engine may be shock-cooled at high altitudes [7]. On the other hand, liquid-cooling systems have a weight penalty with respect to air-based systems while they are able to keep the temperature of all of the cylinders quite even and the engine cannot be shock-cooled during high speed or low power descends. Moreover, the coolant liquid is usually thermostatically controlled so that the engine is quickly warmed up at the start-up and works all the time around the optimal operating temperature [8].

In this context, a detailed analysis of such cooling systems is required by the designer to optimize the heat distribution of the engine structure. Computational Fluid Dynamics (CFD) simulations provide a means to optimize cooling circuit configurations by employing sensitivity analysis to the components of such circuits. Indeed, simulation models are very useful for engineers not only to support but also to reduce the amount of testing required during the design of the engine. Numerical models and simulations can greatly enhance development efforts by predicting performance trends and trade-offs and will, therefore, result in more efficient and better-optimized heating and cooling systems for high-performing engines [9–14]. Specifically, due to the wide spatial and temporal scales involved in internal combustion engine simulations, steady and unsteady Reynolds-averaged Navier–Stokes (RANS) solvers are largely employed supporting the rational design of such engines. Moreover, recently, hybrid approaches involving both unsteady RANS and Large Eddy Simulations (LES) techniques have also been proposed thus achieving the required resolutions needed for a comprehensive analysis of such engines [15–17].

In this work, a liquid cooling system for a specific aircraft engine is analyzed by means of a 3D finite-volume RANS equations solver with *k*- closure. Specifically, the RANS equations for an incompressible thermal fluid is used to analyze the cooling system performance as a function of the inlet mass-flow rate G*In*. Such analysis is carried out by varying the cylinder jackets temperature and by comparing the outflow temperature and the transferred thermal power as a function of coolant mass-flow rate. The CFD results enable us to notice the drawbacks of the cooling system geometry and suggest some improvements that can be used to enhance the heat transfer. Two geometrical modifications to the cooling system are proposed. The first modification consists of two coolant inlets while keeping the same total mass-flow rate; the second configuration consists of a new single-inlet geometry with new restrictions and enlargements within the cooling system to constrain the coolant flow through the cylinder jackets and also by positioning downstream the outflow manifold. Capabilities and limitations of these three cooling system geometries are critically discussed and compared providing useful insights for the internal combustion engine community.

#### **2. The Model**

#### *2.1. Mathematical Model*

The steady three-dimensional Reynolds-Averaged Navier-Stokes (RANS) system of equations with the *k*–closure reads:

$$\frac{\partial(E - E\_v)}{\partial x} + \frac{\partial(F - F\_v)}{\partial y} + \frac{\partial(G - G\_v)}{\partial z} = S \, , \tag{1}$$

where *E*, *F* and *G* are the inviscid flux vectors:

$$\begin{aligned} ^0E = \begin{pmatrix} \overline{u} \\ \rho \overline{u}^2 + p\_t \\ \rho \overline{u} \overline{v} \\ \rho \overline{u} \overline{u} \\ \rho c\_p \overline{v} \overline{u} \\ \rho \overline{u}k \\ \rho \overline{u} \varepsilon \end{pmatrix}, F = \begin{pmatrix} \overline{v} \\ \rho \overline{u} \overline{v} \\ \rho \overline{v}^2 + p\_t \\ \rho c\_p \overline{v} \overline{v} \\ \rho c\_p \overline{v} \overline{v} \\ \rho \overline{v} k \\ \rho \overline{v} \varepsilon \end{pmatrix}, G = \begin{pmatrix} \frac{\overline{w}}{\overline{w}} \\ \rho \overline{v} \overline{w} \\ \rho \overline{v} \overline{w} \\ \rho c\_p \overline{v} \overline{v} \\ \rho c\_p \overline{v} \overline{v} \\ \rho \overline{v} \overline{v} \end{pmatrix}; \tag{2} \end{aligned} \tag{2}$$

where overbar indicates Reynolds-averaging of the physical variable; *ρ* is the density; (*u*, *v*, *w*) and *T* are the Reynolds-averaged velocity components and temperature, respectively; *pt* = *<sup>p</sup>* + <sup>2</sup> <sup>3</sup> *k* being *p* the Reynolds-averaged pressure; *k* and  are the turbulent kinetic energy and its dissipation rate. *Ev*, *Fv* and *Gv* represent the viscous flux vectors, reading:

$$\begin{aligned} \;^L\_V &= \left( \begin{pmatrix} 0\\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \end{pmatrix} , \;^F\_U = \begin{pmatrix} 0\\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \end{pmatrix} , \;^L\_U = \begin{pmatrix} 0\\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \\ \mu \frac{d\overline{w}}{dt} \end{pmatrix} , \end{aligned}$$

being *μ*, *μ<sup>t</sup>* and *λ* the molecular viscosity, turbulent viscosity and thermal diffusivity, respectively. Finally, S corresponds to the source term vector,

$$S = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ S\_k \\ S\_c \end{pmatrix},\tag{4}$$

where *Sk* and *S* are the source terms related to *k* and *-*, respectively.

The steady-state RANS equations with *k*– turbulence closure are solved by means of a cell-centered finite volume numerical scheme. The computational domain is discretized by a multi-block unstructured mesh. A pressure-based solver is employed where the equations are solved segregated one to each other,

i.e., each equation is decoupled from all the others. Indeed, this segregated algorithm is memory-efficient, since the discretized equations need only to be located in the memory one at a time [18]. Specifically, the SIMPLE solver by Patankar and Spalding [19] is employed. Viscous terms are discretized by a second-order-accurate least-squares cell-centered scheme [20], while convective terms are treated with a second-order upwind scheme [21]. The pressure and velocity fields are solved by using a SIMPLE-type pressure-velocity coupling in which the momentum equation is solved using an estimated initial pressure field. Then, the SIMPLE correction for the pressure field is employed to satisfy the continuity equation. The pressure-correction equation is solved by using the algebraic multigrid (AMG) method by Hutchinson and Raithby [22]. In the present work, only steady flows are considered; therefore, the algorithm is iterated in a pseudo-time until a residual drop of at least four orders of magnitude for all of the conservation-law equations is achieved. No-slip conditions are imposed along the engine walls while the near-wall region of the flow field is treated with the standard wall-functions approach proposed by Launder and Spalding [23].

#### *2.2. Computational Domain and Boundary Conditions*

In this work, a six-cylinder aircraft compression ignition engine with two opposite banks is considered. Specifically, due to the specular geometries of the banks, only one of the two cooling system banks is considered (Figure 1a). The bank is characterized by an inlet section and an outlet section and presents three slots corresponding to the three cylinders jackets. Moreover, two modifications of this geometry are proposed and discussed to provide improvements in terms of mass-averaged outflow temperature and cylinder specific and total transferred thermal power (see Figure 1b,c). These geometries are discretized by means of about 8,000,000 tetrahedral elements with an average cell size of about 1.5 mm and a global minimum size of about 0.05 mm. The spatial discretization used in the computations for the single-inlet geometry (depicted in Figure 1a) is shown in Figure 2. Three sections are considered along with close-ups to show the quality of the discretization in the near-wall regions. Specifically, two transversal sections are located at y = −0.250 m and y = −0.325 m and the longitudinal section corresponds to the valve plane at z = <sup>−</sup>0.01 m. The minimum and maximum *<sup>y</sup>*<sup>+</sup> for this numerical discretization are 20 and 97, respectively. These values are well confined in the range of 10 <sup>≤</sup> *<sup>y</sup>*<sup>+</sup> <sup>≤</sup> 300, which represents a constraint for the employed *k*–closure joined with the use of standard wall functions.

The comparison between the three geometries is provided in terms of both the thermodynamic and kinematic flow fields, i.e., velocity, vorticity and temperature distributions, and of the evaluation of the pressure loss through the bank, the outflow temperature, as well as the cylinder specific and total transferred thermal power. To this end, two sets of simulations with five inlet mass-flow rates, namely *Gin* = 0.5, 1.0, 1.5, 2.0, and 2.5 kg/s are considered. At first instance, thermal effects are neglected, i.e., fluid temperature is constant through the bank. This analysis is mainly carried out to provide a picture of the overall flow structure and to identify flow features under such an assumption. The working fluid is assumed to be water at 353 K with density and viscosity equal to 972 kg/s and 3.54 <sup>×</sup> <sup>10</sup>−<sup>4</sup> Pa·s, respectively. The second set of computations includes heat transfer between fluid and walls with the inlet coolant water temperature equal to 343 K and the crankcase external temperature equal to 380 K, whereas the cylinder jacket temperature ranges from 400 K to 460 K. For such thermal analysis, the water density and viscosity are computed as functions of the water temperature by employing a second-order polynomial fitting [24]. For both sets of computations, the mass-flow rate is set at the inlet section with an outlet pressure p*out* = 1 bar. As regards turbulence, the inlet turbulence intensity is *u in* = 0.25 *uin* and the turbulence characteristic length is equal to 0.3 *din*, being *din* and *uin* the inlet diameter and the inlet mass-averaged fluid velocity, respectively.

**Figure 1.** Geometry of the cooling system for the three configurations. (**a**) Sketch of the right bank of the cooling system. (**b**) Sketch of the right bank of the cooling system with a second intake duct (In2) on the opposite side with respect to the old one (In1). (**c**) Sketch of a new single inlet geometry (left bank) obtained by inserting enlargements and restrictions to the old one and by moving downstream the outlet duct. All of the graphics are represented from an internal (left column) and external (right column) point of view.

**Figure 2.** Spatial discretization of the cooling system for the single-inlet configuration. (**a**) Computational mesh of the right bank of the cooling system. (**b**,**c**) Transversal sections of the computational mesh at (**b**) y = −0.250 m and (**c**)y= −0.325 m. (**d**) Longitudinal section of the computational mesh at the valve plane z = −0.01 m.

#### **3. Results and Discussion**

#### *3.1. Analysis of the Single-Inlet Geometry*

At first instance, the flow field of the single-inlet geometry (see Figure 1a) is analyzed by neglecting thermal effects. The pressure drop across the cooling system is computed as a function of the inlet mass-flow rate and shown in Figure 3. The head losses are computed as the difference between the inlet (< *pin* >) and the outlet (< *pout* >) surface-averaged pressure, Δ*p* =< *pin* >*Ain* − < *pout* >*Aout* , where < *pin*/*out* >*Ain*/*out* correspond to the surface-averaged pressure related to the inlet (outlet) sections. The diagram shows that Δ*p* vs. G*in* follows a quadratic trend due to the fully developed turbulent regime of the flow field. The figure shows that Δ*p* is of the order of 10−<sup>2</sup> bar for all the investigated inlet mass-flow rates.

The velocity and vorticity magnitude contour plots, given in Figure 4, show that a large fraction of the flow pattern sweeps over the cylinder that is closer to the inlet section, i.e., cylinder 5. Then, the flow is almost totally short-circuited from the inlet directly to the outlet for all values of G*in*. Indeed, the cylinder in between, i.e., cylinder 3, is directly affected by the fluid flow, whereas cylinder 1 results poorly cooled down even for large values of G*in*, as shown in Figure 4c. On the other hand, the small values of Δ*p* (O(10−2) bar) reflect that the flow is somehow short-circuited so that the path actually performed by the fluid is relatively short and without much cross-sectional area variations. An interesting view of the flow structure is given in Figure 5 for G*In* = 0.5, 1.5, and 2.5 kg/s, where the velocity magnitude colored streamlines are drawn to provide a qualitative picture of the coolant internal flow structure.

**Figure 3.** Single-inlet cooling system characteristic loads. Pressure loss as a function of the inlet mass-flow rate.

**Figure 4.** Velocity and vorticity contour plots for the single-inlet geometry. Distribution of velocity (left) and vorticity (right) magnitude for G*In* = 0.5 (**a**), 1.5 (**b**), and 2.5 (**c**) kg/s.

**Figure 5.** Velocity magnitude colored streamlines for the single-inlet geometry. Velocity magnitude colored streamlines, starting from the inlet section, for G*In* = 0.5 (**a**), 1.5 (**b**), and 2.5 (**c**) kg/s. Internal (left) and external (right) point of view.

In order to consider thermal effects, the temperatures of the inlet water and of the crankcase are kept constant and equal to 343 K and 380 K, respectively. Several simulations have been performed by changing the cylinder jackets temperatures (T*Cyls*) from 400 to 460 K with a 20 K step. For the sake of conciseness, only the temperature contour plots for the case T*Cyls* = 440 K are reported in Figure 6. Like the velocity and the vorticity patterns, the fluid temperature field clearly shows that the highest temperature is located around cylinder 1, regardless of G*in*. The water, flowing through cylinder 5 jacket, results colder than that through cylinder 3 jacket which is colder than that through cylinder 1 jacket. As regards the mass-averaged outflow temperature, Table 1 shows that T*Out* decreases, by increasing G*In*, with a smaller rate with respect to the mass flow rates (see Figure 7a). Besides, Table 1 shows the cylinder-specific and the total heat transferred. The cylinder-specific heat is computed by considering only the numerical cells that surround each cylinder, whereas the total heat is the sum over all the cells of the computational domain. The values of Table 1 clearly show that the cooling of the cylinder reduces as the distance between the cylinders and the inlet section increases. Specifically, the cylinder closest to the inlet transfers to the fluid a thermal power that is 2 to 3 times that of the furthest cylinder. *Q*˙ *Tot* increases linearly with G*In* as depicted in Figure 7b. The distributions of T*Out* for T*Cyls* = 400, 420, 440, and 460 K

are summarized in Figure 7c. Indeed, the functional form of these distributions is roughly constant with respect to the parameter T*Cyls*, while the distributions themselves are displaced with an offset which varies between 1.3 and 1.7 K. Two convenient thresholds for both the mass-averaged outflow temperature and the total transferred heat flux have been selected, which are superimposed to Figure 7c,d. T*Out* has been supposed to be lower than 363 K since too high values would represent an inefficient cooling for the engine itself. Besides, since the output mechanical power of the six-cylinder engine ranges from 200 to 250 kW, the total transferred heat flux for each bank should range from 100 to 120 kW. The second condition is more stringent than the first and returns only some of the couples (T*Cyl*, G*In*) which are acceptable in the parameter space spanned with the simulations. The physically acceptable parameters are enlighten in Table 2.

**Figure 6.** Temperature patterns for T*Cyls* = 440 K for the single-inlet geometry. Temperature contour plots for T*Cyls* = 440 K and G*In* = 0.5 (**a**), 1.5 (**b**), and 2.5 (**c**) kg/s.

**Figure 7.** Outflow temperature and transferred heat flux for the single-inlet geometry. (**a**) Mass-averaged outflow temperature as a function of G*In* for *TCyls* = 440 K. (**b**) Cylinder specific and total transferred heat flux as a function of G*In* for *TCyls* = 440 K. (**c**) Mass-averaged outflow temperature as a function of G*In* for *TCyls* = 400, 420, 440, and 460 K. (**d**) Total transferred heat flux as a function of G*In* for *TCyls* = 400, 420, 440, and 460 K.

**Table 1.** Mass-averaged temperature, cylinder-specific and total transferred heat flux for T*Cyls* = 440 K and G*In* = 0.5, 1.0, 1.5, 2.0, and 2.5 kg/s obtained considering the single-inlet geometry.


**Table 2.** Mass-averaged outflow temperature and total transferred heat flux tabulated as a function of G*In* and T*Cyls* for the single-inlet geometry.


#### *3.2. A First Cooling Improvement: The Double-Inlet Geometry*

To overcome the uneven distribution of the cylinder-specific heat transfer with the single inlet geometry, a first simple geometrical modification is to double the intake manifold but with the same total inlet mass-flow rate G*In*, as reported in Figure 1b. Specifically, the second inlet section is placed close to cylinder 1 to promote heat transfer for this cylinder. The non-thermal analysis is firstly considered and shown in Figure 8. The internal flow streamlines depict an even distribution of the fluid between the three cylinders. Nonetheless, it must be stressed that the up-left region close to cylinder 1 results in the less-cooled region regardless of G*In*. This is qualitatively given in Figure 8a for G*In* = 0.5, 1.5, and 2.5 kg/s. The head losses computed separately for both inlets as a function of G*In* are comparable to the single inlet geometry case as shown in Figure 8c, where G*In* is the total mass flow rate through the two inlets.

**Figure 8.** Velocity streamlines and characteristic loads for the double-inlet geometry. (**a**) Velocity magnitude colored streamlines obtained for the double-inlet geometry for G*In* = 0.5, 1.5 and 2.5 kg/s. (**b**) Distribution of the pressure loss as a function of the inlet mass-flow rate.

In order to account for the thermal effects, with the same fashion as per the single inlet geometry, the mass-averaged outflow temperature, as well as the cylinder-specific and the total transferred heat, are computed as a function of both G*In* and T*Cyls*. These data are given in Table 3 for T*Cyls* = 440 K. Regardless of G*In*, the double inlet geometry returns outflow temperatures somewhat smaller than those of the single inlet geometry. Moreover, although an almost perfectly even distribution of transferred heat among the three cylinders, the total transferred thermal power is about 12-20% less than the previous geometry (see Table 1). This is because, even with a second inlet section, this geometry provides less cooling than the previous geometry. Indeed, it must be pointed out that the total intake mass-flow rate is the same as before but supplied by the two inlets. This reduces the amount of coolant fluid flowing into the system from each inlet, so that cylinder 5 jacket exchanges less heat with the less supplied water. As regards cylinder 3 and cylinder 1, the amount of water is about the same of cylinder 5 but not enough to efficiently cool down the overall system. For all of the investigated values of T*Cyls*, the total transferred heat and the outflow temperatures are given in Table 4 which reflects the same picture drawn for the single inlet geometry.

**Table 3.** Mass-averaged temperature, cylinder-specific and total transferred heat flux for T*Cyls* = 440 K and G*In* = 0.5, 1.0, 1.5, 2.0, and 2.5 kg/s when considering the double-inlet geometry.


**Table 4.** Mass-averaged outflow temperature and total transferred heat flux tabulated as a function of G*In* and T*Cyls* for the double inlet geometry.


#### *3.3. A New Single Inlet Geometry*

An interesting geometrical improvement for the cooling system is proposed in Figure 1c. This new geometry is shown for the left bank of the cooling system. Indeed, right and left banks slightly differ due to the quasi-symmetric position of the cylinder seats. However, these differences are negligible for the purposes of the present work. Specifically, the geometrical improvement is twofold. On one hand, the geometry considers some enlargements and restrictions within the bank (see the intra-cylinder jackets regions), and, on the other hand, the outlet manifold is moved from the inlet section. In this context, enlargements and restrictions are designed to guide the flow from the inlet through all of the three-cylinder jackets and then to the outflow section, while the outlet is placed between cylinder 4 and cylinder 2

to avoid any possible short-circuiting of the flow itself. The velocity magnitude colored streamlines of the simulations without thermal effects for the new geometry are depicted in Figure 9a for G*In* = 0.5, 1.5 and 2.5 kg/s. The figure shows that the fluid flow supplies, almost uniformly, the three-cylinder jackets and reaches the outlet section without being short-circuited. Indeed, the water flowing over the three cylinders results enhanced for large values of the intake mass-flow rate as shown by the increased number of streamlines sweeping over the jackets and the cylinder heads. Moreover, the increasing complexity of the flow path with respect to the previous single-inlet geometry, due to the provided enlargements and restrictions, reflects into an increase of Δ*p*, as shown in Figure 9b.

**Figure 9.** Velocity streamlines and characteristic loads for the new single-inlet geometry. (**a**) Velocity magnitude colored streamlines obtained for the new single-inlet geometry for G*In* = 0.5, 1.5 and 2.5 kg/s (**b**) Pressure loss as a function of the inlet mass-flow rate.

Thermal effects are also considered in order to analyze the cooling efficiency of this new geometry in terms of computed temperature field, mass-averaged outflow temperature and total transferred heat. As shown in Figure 10, the temperature field results quite homogeneous with respect to the old single-inlet

geometry. Specifically, for G*In* = 0.5 kg/s, around the farthest cylinder jacket from the inlet, i.e., cylinder 1, a temperature between 370 and 380 K is computed for the old geometry (see Figure 6a) whereas, for the new geometry the farthest cylinder jacket from the inlet, i.e., cylinder 2, provides a temperature between 360 and 370 K. This improvement is also quantitatively assessed by the outflow mass-average temperature in Table 5. Indeed, for the T*Cyls* = 440 K case, T*Out* is higher for the new geometry with respect to the old configuration. Moreover, an improvement of about 20% for the total transferred thermal power is observed for all the investigated G*In*. All of those data are summarized in Figure 11a,b for T*Out* and *Q*˙ , respectively. A direct comparison between the two single inlet geometries is obtained by considering T*Out* and *Q*˙ as a function of G*In* and T*Cyls* and depicted in Figure 11c,d. Regardless of the cylinder jacket wall temperatures, the outflow water temperature was higher with the new configurations due to the larger amount of thermal power (≈20%) transferred to the water. As per the old geometry, two thresholds are considered. The outflow temperature threshold is set to 363 K while the bandwidth for *Q*˙ corresponds to 110 kW ± 10 kW. The practical values in the spanned parameter space are enlightened in Table 6.

**Figure 10.** Temperature patterns for the new single-inlet geometry. Temperature contour plots obtained for T*Cyls* = 440 K and G*In* = 0.5 (**a**), 1.5 (**b**), and 2.5 (**c**) kg/s.

**Figure 11.** Outflow temperature and transferred heat flux for the new geometry. (**a**) Mass-averaged outflow temperature as a function of G*In* for *TCyls* = 440 K. (**b**) Cylinder specific and total transferred heat flux as a function of inlet mass-flow rate for *TCyls* = 440 K. (**c**) Comparison of the mass-averaged outflow temperature as a function of G*In* and *TCyls* between the old and the new single-inlet geometry. (**d**) Total transferred heat flux as a function of the inlet mass-flow rate obtained with the old and the new geometry.

**Table 5.** Mass-averaged temperature, cylinder-specific and total transferred heat flux for T*Cyls* = 440 K and G*In* = 0.5, 1.0, 1.5, 2.0, and 2.5 kg/s when considering the new geometry.


**Table 6.** Mass-averaged outflow temperature and total transferred heat flux tabulated as a function of G*In* and T*Cyls* for the new geometry.


#### **4. Conclusions**

In this work multi-dimensional CFD is employed to enhance the performance of the cooling system of an aircraft engine. Capabilities and limitations of different geometries are highlighted, in order to give guidelines as regards an efficient design of the engine. The analysis is carried out by means of a finite-volume RANS equations solver with *k*-closure.

An initial configuration of the cooling system without any thermal effects is firstly considered, with constant temperature, density and viscosity for the water. The pressure loss of the cooling system as a function of the inlet mass-flow rate is computed. Velocity and vorticity patterns are also given to point out features of the flow field. Then, thermal effects are taken into account and the cooling performance of such a configuration is assessed as a function of both the inlet mass-flow rate and the cylinder jackets temperatures. Specifically, the outflow temperature and both the total and the cylinder-specific transferred thermal power are computed. This geometry is demonstrated to be not very efficient because the cylinder-specific transferred heat decreases as the distance between the cylinders and the intake section increases.

Two geometrical improvements are then proposed: first, the inlet sections are doubled while keeping the same total mass-flow rate; second, a new single-inlet geometry is designed by inserting restrictions and enlargements within the cooling system to constrain the coolant flow and by moving downstream the outflow duct. The double-inlet geometry exchanges less thermal power than the initial single inlet geometry (≈12-20%) with an almost perfectly even cylinder-specific transferred heat distribution. The new single inlet geometry improves the overall abilities of the cooling system of about 20% (referring to the total transferred thermal power) while keeping the three cylinders only slightly unevenly cooled. Moreover, the new single-inlet geometry presents, as expected, an increase of the characteristic loads due to the restrictions and enlargements within the flow field. However, since this head loss represents only about 10% of the total head loss of the whole cooling circuit, the new single-inlet geometry may be a reasonable choice to optimize the efficiency of the cooling system.

**Author Contributions:** Data curation, A.V. and V.M.; funding acquisition, V.M.; investigation, A. C. and M.F.; methodology, A.C. and M.F.; project administration, A.V., P.C. and V.M.; writing, review and editing, A.V. and V.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** Project SIMPA "Sistemi Innovativi per Motori a Pistoni Aeronautici", Asse I, Priorità di Investimento 1.b, Azione 1.1.3 LDR granted by MISE "Ministero dello Sviluppo Economico" and MUR "Ministero dell'Università e della Ricerca".

**Acknowledgments:** This work is financially supported by MISE "Ministero dello Sviluppo Economico" and MUR "Ministero dell'Università e della Ricerca" under project SIMPA "Sistemi Innovativi per Motori a Pistoni Aeronautici", Asse I, Priorità di Investimento 1.b, Azione 1.1.3 LDR.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Fluids* Editorial Office E-mail: fluids@mdpi.com www.mdpi.com/journal/fluids

MDPI St. Alban-Anlage 66 4052 Basel Switzerland

Tel: +41 61 683 77 34 Fax: +41 61 302 89 18