**The 1st International Electronic Conference on Algorithms**

Editor

**Frank Werner**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor*: Frank Werner, Otto-von-Guericke University, Magdeburg, Germany

*Editorial Office*: MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Proceedings published online in the open access journal *Computer Sciences & Mathematics Forum* (ISSN 2813-0324) (available at: https://www.mdpi.com/2813-0324/2/1).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-3825-9 (Hbk) ISBN 978-3-0365-3826-6 (PDF)**

Cover image courtesy of Frank Werner

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**



### **About the Editor**

**Frank Werner**, apl. Prof. Dr. rer. nat. habil.

For more than four decades, I have mainly dealt with the exact and approximate solutions of different types of scheduling problems, which began around 1980 with the development of genetic algorithms for the flow shop scheduling problem. I also deal with complexity issues, scheduling problems under uncertainty, train scheduling problems, and various graph-theoretic problems. I have been involved in several research projects supported by the German Research Society (DFG), the European Community (INTAS), and the Belarusian Republican Foundation for Fundamental Research. Since 2019, I have been Editor-in-Chief of the journal Algorithms. I am an associate editor of the International Journal of Production Research, the Journal of Scheduling, and Operations Research and Decisions, and a member of the Editorial/Advisory Board of 14 further journals. I have been a guest editor of Special Issues in seven international journals and a member of the program committee of more than 100 international conferences. I am also an author of two Mathematics textbooks, an author/editor of eight further books, and an author of about 300 published journal papers.

### **Preface to "The 1st International Electronic Conference on Algorithms"**

This book is dedicated to the 1st International Electronic Conference on Algorithms (IOCA 2021), which was held completely online from 27 September to 10 October 2021. We received 49 submissions, of which 32 works were finally accepted and posted for discussion at the conference. The conference subjects were split into eight sections:


After the conference, the authors of the 32 accepted presentations were invited to submit either an abstract with supplementary material or a proceedings paper of about 8 pages. This book contains 22 of these works, including 16 proceedings papers as well as 6 abstracts with supplementary material. These 22 works cover a broad range of topics in the field of developing algorithms.

The 16 more detailed proceedings papers address, e.g., the maximum multi-commodity flow problem, the fastest transshipment in an evacuation problem, metaheuristic algorithms for the permutation flow shop problem, and structural monitoring problems, to name a few. Several papers deal with machine learning or, in particular, deep learning approaches. As examples, a hybrid deep learning approach for COVID-19 diagnosis, a deep learning model for polysilicon MEMS (microelectromechanical systems) sensors, and deep learning methodologies for the diagnosis of respiratory disorders from chest X-ray images are presented.

The abstracts presented in this book deal with image-based algorithms, approximation algorithms for the Traveling Salesman Problem, and iterative schemes for solving nonlinear systems of equations. Other subjects are how to avoid temporal confounding in time series forecasting, the multi-commodity contraflow problem, and hopscotch methods for the heat conduction equation.

The authors of the 22 works contained in this book come from 17 countries: Vietnam, Mexico, Saudi Arabia, Spain, India, Germany, Nepal, Hungary, Italy, USA, Iran, Japan, France, Turkey, Costa Rica, Yemen, and Korea. So, the conference reached a wide audience from several continents.

As the Conference Chair, it is my pleasure to thank all authors for their interesting submissions and presentations in a broad spectrum of fields in the development of algorithms, and all members of the Program Committee, as well as all reviewers, for their timely and insightful reports. My special thanks go to the members of the conference secretariat for the pleasant cooperation before, during, and after the conference. I also hope to receive many interesting submissions to the second edition of this conference in the future.

> **Frank Werner** *Editor*

### *Proceeding Paper* **A Bicriteria Model for Saving a Path Minimizing the Time Horizon of a Dynamic Contraflow †**

**Hari Nandan Nath <sup>1,\*</sup>, Tanka Nath Dhamala <sup>2</sup> and Stephan Dempe <sup>3</sup>**


**Abstract:** The quickest contraflow in a single-source-single-sink network is a dynamic flow that minimizes the time horizon of a given flow value at the source to be sent to the sink allowing arc reversals. Because of the arc reversals, for a sufficiently large value of the flow, the residual capacity of all or most of the paths towards the source, from a given node, may be zero or reduced significantly. In some cases, e.g., for the movement of facilities to support an evacuation in an emergency, it is imperative to save a path from a given node towards the source. We formulate such a problem as a bicriteria optimization problem, in which one objective minimizes the length of the path to be saved from a specific node towards the source, and the other minimizes the quickest time of the flow from the source towards the sink, allowing arc reversals. We propose an algorithm based on the epsilon-constraint approach to find non-dominated solutions.

**Keywords:** quickest contraflow; saved path; network flow; bicriteria optimization; dynamic flow

### **1. Introduction**

Network flow models have been widely used for a variety of real-life applications. Because of their computational efficiency, they have been used to model problems involving significantly large networks. One of the important applications is evacuation planning, in which the population in hazardous areas is shifted to safe areas using a complex urban road network. To use network flow modeling in such problems, a large urban road network is represented as a directed graph, and algorithms from graph theory and mathematical programming are used to identify the optimal traffic flow configuration. For a recent survey on evacuation planning problems, we refer to Dhamala et al. [1].

In evacuation planning problems, one of the important strategies is the contraflow approach, in which the appropriate direction of traffic is identified to optimize the flow, reversing the usual direction of traffic in the necessary road segments [2–4]. Recent research also focuses on location decisions along with flow decisions [5,6]. In contraflow planning, because of reversal of the direction of the traffic flow, the paths towards hazardous areas may be blocked. Sometimes, it is necessary to save a path towards such areas to transport necessary facilities. A path-saving strategy, with objectives to maximize the flow and minimize the length of the saved path is introduced in [7] as a bicriteria optimization model. In this paper, we extend the modeling to minimize the evacuation time and the length of the saved path. The paper is organized as follows. In Section 2, we give the basic ideas of network flow modeling. In Section 3, our main result, a bicriteria model to minimize the quickest time and the length of the saved path, is presented along with a solution algorithm. Section 4 concludes the paper.

**Citation:** Nath, H.N.; Dhamala, T.N.; Dempe, S. A Bicriteria Model for Saving a Path Minimizing the Time Horizon of a Dynamic Contraflow. *Comput. Sci. Math. Forum* **2022**, *2*, 2. https://doi.org/10.3390/IOCA 2021-10897

Academic Editor: Frank Werner

Published: 25 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

### **2. Basic Ideas**

A single-source-single-sink network *N* = (*V*, *A*, *s*, *t*, *u*, *τ*) is a directed graph with a set of nodes, *V*, and a set of arcs, *A* ⊆ *V* × *V*; *s* is a source node, *t* is a sink node (*s* ≠ *t*), $u : A \to \mathbb{R}_{\ge 0}$ assigns a capacity $u_{ij}$, and $\tau : A \to \mathbb{R}_{\ge 0}$ assigns a transit time $\tau_{ij}$ to each arc of the network. For each *i* ∈ *V*, we define:

$$V_i^+ = \{ j \in V : (i, j) \in A \}$$

and

$$V_i^- = \{ j \in V : (j, i) \in A \}.$$

*2.1. Static Flow and Dynamic Flow*

A static flow $x : A \to \mathbb{R}$ satisfies

$$0 \le x_{ij} \le u_{ij}, \ \forall (i, j) \in A \tag{1}$$

and

$$\sum_{j \in V_i^+} x_{ij} - \sum_{j \in V_i^-} x_{ji} = 0, \ \forall i \in V \setminus \{s, t\}. \tag{2}$$

The value of *x* is

$$v(x) = \sum_{j \in V_s^+} x_{sj} - \sum_{j \in V_s^-} x_{js} = \sum_{j \in V_t^-} x_{jt} - \sum_{j \in V_t^+} x_{tj}. \tag{3}$$
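The feasibility conditions (1)–(2) and the value (3) translate directly into code. The following minimal sketch checks them for a dictionary-based static flow; the toy network is illustrative, not one from the paper:

```python
# Sketch: checking the static-flow conditions (1)-(3) on a small example
# network. Arcs map (tail, head) -> capacity; flows map the same arcs to
# the flow value carried.

def flow_value(flow, s):
    """Value of a static flow: net outflow at the source s, cf. Eq. (3)."""
    out_s = sum(f for (i, j), f in flow.items() if i == s)
    in_s = sum(f for (i, j), f in flow.items() if j == s)
    return out_s - in_s

def is_feasible(arcs, flow, s, t):
    """Check capacity bounds (1) and conservation (2) at nodes != s, t."""
    if any(not (0 <= flow[a] <= arcs[a]) for a in arcs):
        return False
    nodes = {i for a in arcs for i in a}
    for v in nodes - {s, t}:
        out_v = sum(f for (i, j), f in flow.items() if i == v)
        in_v = sum(f for (i, j), f in flow.items() if j == v)
        if out_v != in_v:
            return False
    return True

arcs = {("s", "a"): 4, ("a", "t"): 4, ("s", "t"): 2}   # capacities u_ij
flow = {("s", "a"): 3, ("a", "t"): 3, ("s", "t"): 2}   # a static flow x

assert is_feasible(arcs, flow, "s", "t")
assert flow_value(flow, "s") == 5
```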

If constraints (2) are satisfied for all *i* ∈ *V*, then *x* is a circulation. If *P* is a directed *s*–*t* path, a chain flow *x<sup>P</sup>* of value *δ* > 0 is a static flow defined by

$$x_{ij}^{P} = \begin{cases} \delta, & (i, j) \in P \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

If *C* is a directed cycle, a cycle flow *x<sup>C</sup>* of value *δ* > 0 is a circulation defined by

$$x_{ij}^{C} = \begin{cases} \delta, & (i, j) \in C \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

A static flow *x* can be decomposed into chain and cycle flows [8].
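The decomposition principle can be illustrated with a short sketch: repeatedly trace a path (or cycle) through arcs carrying positive flow, subtract the bottleneck value *δ*, and record the result as a chain or cycle flow. The routine below is a simplified illustration on a toy flow, not the algorithm of [8]:

```python
# Sketch of chain-and-cycle flow decomposition: peel off one chain or
# cycle flow per iteration until no positive flow remains.

def decompose(flow, s, t):
    flow = {a: f for a, f in flow.items() if f > 0}
    chains, cycles = [], []
    while flow:
        # start from s if it still has outgoing flow; otherwise any node
        # with positive outflow (that trace must close a cycle)
        start = s if any(i == s for i, _ in flow) else next(iter(flow))[0]
        path, seen, node = [], {start}, start
        while node != t:
            arc = next(((i, j) for (i, j) in flow if i == node), None)
            if arc is None:
                break
            path.append(arc)
            node = arc[1]
            if node in seen:          # the trace closed a cycle
                k = next(n for n, a in enumerate(path) if a[0] == node)
                path = path[k:]
                break
            seen.add(node)
        delta = min(flow[a] for a in path)   # bottleneck value
        for a in path:
            flow[a] -= delta
            if flow[a] == 0:
                del flow[a]
        is_chain = path[0][0] == s and path[-1][1] == t
        (chains if is_chain else cycles).append((path, delta))
    return chains, cycles

flow = {("s", "a"): 2, ("a", "t"): 2,        # one chain of value 2
        ("b", "c"): 1, ("c", "b"): 1}        # one cycle of value 1
chains, cycles = decompose(flow, "s", "t")
assert len(chains) == 1 and chains[0][1] == 2
assert len(cycles) == 1 and cycles[0][1] == 1
```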

For a given time horizon *T*, a dynamic flow $f = (f_{ij})_{(i,j) \in A}$ consists of Lebesgue measurable functions $f_{ij} : [0, T) \to \mathbb{R}_{\ge 0}$ such that $f_{ij}(\theta) = 0$ for $\theta \ge T - \tau_{ij}$, satisfying the following:

$$e_f(i, \theta) = \sum_{j \in V_i^-} \int_0^{\theta - \tau_{ji}} f_{ji}(\xi)\, d\xi - \sum_{j \in V_i^+} \int_0^{\theta} f_{ij}(\xi)\, d\xi \ge 0 \ \forall i \in V \setminus \{s\}, \ \theta \in [0, T), \tag{6}$$

$$e_f(i, T) = 0 \ \forall i \in V \setminus \{s, t\}, \tag{7}$$

$$0 \le f_{ij}(\theta) \le u_{ij} \ \forall (i, j) \in A, \ \theta \in [0, T). \tag{8}$$

The value of *f* is

$$\sigma_T(f) = e_f(t, T). \tag{9}$$

Given a static flow, *x*, and a time horizon, *T*, a dynamic flow *f* can be obtained by sending, along every path *P* in the chain-and-cycle decomposition of *x*, a flow of value equal to that of *x<sup>P</sup>*, repeated *T* − *τ*(*P*) times. Such an *f* is called a temporally repeated dynamic flow, with the value

$$T\, v(x) - \sum_{(i,j) \in A} x_{ij}\, \tau_{ij}. \tag{10}$$

For more details, see [9,10].
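The value (10) follows directly from the temporal repetition: a chain flow of value *δ* along a path *P* is repeated *T* − *τ*(*P*) times. A minimal sketch, with illustrative data:

```python
# Sketch: value (10) of a temporally repeated dynamic flow computed from
# a chain decomposition. Each chain (path, delta) contributes
# delta * (T - tau(P)), which sums to T*v(x) - sum x_ij * tau_ij.

def temporally_repeated_value(chains, tau, T):
    """chains: list of (path, delta); tau: transit time per arc."""
    total = 0
    for path, delta in chains:
        length = sum(tau[a] for a in path)   # tau(P)
        total += delta * (T - length)        # delta sent T - tau(P) times
    return total

tau = {("s", "a"): 1, ("a", "t"): 2}
chains = [([("s", "a"), ("a", "t")], 3)]     # one chain of value 3
# v(x) = 3 and sum x_ij tau_ij = 3*1 + 3*2 = 9, so (10) gives 10*3 - 9 = 21
assert temporally_repeated_value(chains, tau, T=10) == 21
```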

Given a time horizon, a dynamic flow with maximum value is called a *maximum dynamic flow*. Given a supply (*Q*) assigned to the source, a dynamic flow of value *Q* with a minimum time horizon is called a *quickest flow*.

According to [11], the static flow corresponding to the temporally repeated quickest flow can be found by solving a fractional programming problem with linear constraints.

**Theorem 1.** *(Lin and Jaillet [11]): The quickest flow problem can be formulated as the fractional programming problem:*

$$\min \quad \frac{Q + \sum\_{(i,j) \in A} \tau\_{ij} x\_{ij}}{v} \tag{11}$$

*subject to:*

$$\sum_{j \in V_i^+} x_{ij} - \sum_{j \in V_i^-} x_{ji} = \begin{cases} v & \text{for } i = s \\ -v & \text{for } i = t \\ 0 & \text{for } i \in V \setminus \{s, t\} \end{cases} \tag{12}$$

$$0 \le x_{ij} \le u_{ij}, \ \forall (i, j) \in A. \tag{13}$$
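The fractional program (11)–(13) can be linearized with the substitution *ω* = 1/*v*, *ξ<sub>ij</sub>* = *x<sub>ij</sub>*/*v* (the same idea used later for (25)–(33), in the spirit of a Charnes–Cooper transformation). A hedged sketch with SciPy on an illustrative three-arc network, not one from the paper:

```python
# Sketch: solving (11)-(13) as an LP after substituting omega = 1/v and
# xi_ij = x_ij / v. The toy network and Q are illustrative.
import numpy as np
from scipy.optimize import linprog

# arcs: (tail, head) -> (capacity u_ij, transit time tau_ij)
arcs = {("s", "t"): (2, 1), ("s", "a"): (1, 1), ("a", "t"): (1, 1)}
Q = 6
idx = {a: k for k, a in enumerate(arcs)}       # xi variables 0..n-1
n_xi = len(arcs)                               # last variable is omega

c = np.zeros(n_xi + 1)
for a, (u, tau) in arcs.items():
    c[idx[a]] = tau                            # sum tau_ij xi_ij ...
c[-1] = Q                                      # ... + Q * omega

# conservation (12) scaled by 1/v: net outflow 1 at s, -1 at t, 0 else
nodes = sorted({v for a in arcs for v in a})
A_eq = np.zeros((len(nodes), n_xi + 1))
b_eq = np.array([{"s": 1.0, "t": -1.0}.get(v, 0.0) for v in nodes])
for (i, j), k in idx.items():
    A_eq[nodes.index(i), k] += 1
    A_eq[nodes.index(j), k] -= 1

# capacity (13) scaled: xi_ij - u_ij * omega <= 0
A_ub = np.zeros((n_xi, n_xi + 1))
for a, (u, tau) in arcs.items():
    A_ub[idx[a], idx[a]] = 1
    A_ub[idx[a], -1] = -u

res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_xi), A_eq=A_eq, b_eq=b_eq)
T_quickest = res.fun          # optimal (Q + sum tau x)/v; here (6+4)/3
v_opt = 1.0 / res.x[-1]       # recovered optimal flow value v
```

For this instance the optimum uses both *s*–*t* paths, giving *v* = 3 and quickest time (6 + 4)/3 = 10/3.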

### *2.2. The Contraflow Problem*

The problem of identifying an optimal static (dynamic) flow with an ideal direction of the arcs, reversing the necessary arcs, is a static (dynamic) contraflow problem. In finding an analytical solution of contraflow problems, an important strategy is to construct what is known as the auxiliary network of the given network. Let *N* = (*V*, *A*, *s*, *t*, *u*, *τ*) be a network with *τ<sub>ij</sub>* = *τ<sub>ji</sub>* whenever (*i*, *j*), (*j*, *i*) ∈ *A*. The auxiliary network *N′* = (*V*, *A′*, *s*, *t*, *u′*, *τ′*) is the network where *A′* consists of the arcs (*i*, *j*) and (*j*, *i*) whenever (*i*, *j*) or (*j*, *i*) is in *A*. The capacity is *u′<sub>ij</sub>* = *u<sub>ij</sub>* + *u<sub>ji</sub>*, with *u<sub>ij</sub>* = 0 whenever (*i*, *j*) ∉ *A*, and the travel time is *τ′<sub>ij</sub>* = *τ<sub>ij</sub>* if (*i*, *j*) ∈ *A* and *τ′<sub>ij</sub>* = *τ<sub>ji</sub>* if (*j*, *i*) ∈ *A*.
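The auxiliary-network construction above is mechanical: every road segment becomes a pair of opposite arcs whose capacities are pooled. A minimal sketch (toy data, symmetric transit times assumed as in the text):

```python
# Sketch: building the auxiliary network N' from N. Capacities of opposite
# arcs are pooled (u'_ij = u_ij + u_ji, with u_ij = 0 if (i, j) not in A);
# transit times are symmetric by assumption.

def auxiliary_network(u, tau):
    """u, tau: dicts keyed by arc (i, j). Returns (u', tau')."""
    u_aux, tau_aux = {}, {}
    for (i, j) in u:
        for a in [(i, j), (j, i)]:
            u_aux[a] = u.get((i, j), 0) + u.get((j, i), 0)
            tau_aux[a] = tau[(i, j)]     # uses tau_ij = tau_ji
    return u_aux, tau_aux

u = {("s", "a"): 3, ("a", "s"): 1, ("a", "t"): 2}
tau = {("s", "a"): 5, ("a", "s"): 5, ("a", "t"): 4}
u_aux, tau_aux = auxiliary_network(u, tau)
assert u_aux[("s", "a")] == u_aux[("a", "s")] == 4   # 3 + 1 pooled
assert u_aux[("a", "t")] == u_aux[("t", "a")] == 2   # reversal allowed
assert tau_aux[("t", "a")] == 4
```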

In a network *N* = (*V*, *A*, *s*, *t*, *u*, *τ*) with a given supply *Q* at the source, the quickest contraflow is the quickest flow allowing arc reversals at time zero. The quickest contraflow problem can be solved by solving the quickest flow problem in the auxiliary network; Algorithm 1 does so.


#### **Algorithm 1.** Quickest contraflow.

**Input**: A network *N* = (*V*, *A*, *s*, *t*, *u*, *τ*) with a supply *Q* at *s*

**Output**: Quickest flow allowing arc reversals at time zero


### **3. Minimizing the Quickest Time of the Dynamic Contraflow after Saving a Path**

In a contraflow configuration, arcs are reversed so as to increase arc capacities and, hence, the flow value towards the sink. This may, however, result in the blockage of paths towards the source.

**Example 1.** *Consider a network shown in Figure 1a. The arc labels represent capacity, travel time. With the time horizon of T* = 10*, the static flow corresponding to the temporally repeated maximum dynamic contraflow (maximum dynamic flow with arc reversals) is shown in Figure 1b. Each arc label represents flow/capacity and time. The value of the static flow is 10 and that of the dynamic flow is* 10 × 10 − (6 × 1 + 4 × 4 + 1 × 1 + 5 × 1 + 5 × 3) = 57*. So the static flow in Figure 1b represents the static flow corresponding to the temporally repeated quickest flow with Q* = 57 *and the quickest time 10.*

**Figure 1.** (**a**) Given network with arc labels capacity, transit time. (**b**) Static flow corresponding to the temporally repeated dynamic flow with *T* = 10. The arc labels represent flow/capacity, transit time.

In Figure 1b, we see that all the paths towards source *s* are blocked because of the arc reversals. If one saves a path from a specific node *d* to *s*, the quickest time increases. In what follows, the length of a directed path *P* is $\tau(P) = \sum_{(i,j) \in P} \tau_{ij}$.

**Example 2.** *Consider the network given in Figure 1a. Given Q* = 57*, if we save path d* − *a* − *s, i.e., allow arc reversals in the arcs except those in the path d* − *a* − *s, then the quickest time increases to 12.57. The saved paths, their lengths and the corresponding quickest times are shown in Table 1.*

**Table 1.** Saved paths, their lengths and the corresponding quickest time of the dynamic contraflow with *Q* = 57.


In the above example, if we consider the quickest time only, the optimal path is *P*<sup>4</sup> : *d* − *t* − *b* − *s*. However, if we also consider the length of the path, the decisions may be different, e.g., if the path length cannot exceed 7, the optimal path would be *P*3. This motivates a bicriteria model with the objectives of minimizing the length of the saved path and minimizing the quickest time horizon of the dynamic contraflow. For the development of such a model, the following results are helpful.

**Theorem 2.** *If every cycle in N is of positive length, the static flow corresponding to the quickest flow in the solution of (11)–(13) does not have a positive flow in a cycle.*

**Proof.** Suppose that

$$T(x) = \frac{Q + \sum\_{(i,j)\in A} \tau\_{ij} x\_{ij}}{v(x)}$$

for static flow *x* in *N*. Let *x*∗ be a solution of (11)–(13) with *v*(*x*∗) = *v*∗, and assume that a flow decomposition of *x*∗ has a positive flow in cycles. Suppose that *C* is the set of arcs that form a cycle with a flow value *<sup>δ</sup>* <sup>&</sup>gt; 0. Define *<sup>x</sup>*1, *<sup>x</sup>*<sup>2</sup> : *<sup>A</sup>* <sup>→</sup> <sup>R</sup> by

$$x_{ij}^{1} = \begin{cases} x_{ij}^{*}, & (i,j) \in A \setminus C \\ x_{ij}^{*} - \delta, & (i,j) \in C \end{cases}$$

and

$$x_{ij}^{2} = \begin{cases} 0, & (i,j) \in A \setminus C \\ \delta, & (i,j) \in C. \end{cases}$$

Then *x*<sup>1</sup> and *x*<sup>2</sup> are feasible static flows in *N* with *x*<sup>∗</sup> = *x*<sup>1</sup> + *x*<sup>2</sup>, and *v*(*x*<sup>∗</sup>) = *v*(*x*<sup>1</sup>) because a flow in a cycle does not contribute to the value of the static flow. So,

$$T(x^{*}) = \frac{Q + \sum_{(i,j) \in A} \tau_{ij} x_{ij}^{*}}{v(x^{*})} = \frac{Q + \sum_{(i,j) \in A} \tau_{ij} x_{ij}^{1} + \sum_{(i,j) \in A} \tau_{ij} x_{ij}^{2}}{v^{*}} > \frac{Q + \sum_{(i,j) \in A} \tau_{ij} x_{ij}^{1}}{v(x^{1})} = T(x^{1}),$$

where the strict inequality holds because $\sum_{(i,j) \in A} \tau_{ij} x_{ij}^{2} = \delta\, \tau(C) > 0$, as every cycle is of positive length.

This contradicts the optimality of *x*<sup>∗</sup>. □

If the transit time in each of the arcs of a network is positive, then every cycle in the network is of positive length and we have the following theorem.

**Theorem 3.** *If τij* > 0, ∀(*i*, *j*) ∈ *A, then a flow decomposition of an optimal solution x*<sup>∗</sup> *of the problem (11)–(13) does not contain a positive flow in a cycle.*

The theorem leads to the following:

**Theorem 4.** *Given a network N* = (*V*, *A*, *s*, *t*, *u*, *τ*) *with a supply Q at s, if* (*j*, *i*) ∈ *A for each* (*i*, *j*) ∈ *A with τ<sub>ij</sub>* = *τ<sub>ji</sub>* > 0*, then a solution of the optimization problem*

$$\min \quad \frac{Q + \sum\_{(i,j) \in A} \tau\_{ij} x\_{ij}}{v} \tag{14}$$

*subject to:*

$$\sum_{j \in V_i^+} x_{ij} - \sum_{j \in V_i^-} x_{ji} = \begin{cases} v & \text{for } i = s \\ -v & \text{for } i = t \\ 0 & \text{for } i \in V \setminus \{s, t\} \end{cases} \tag{15}$$

$$0 \le x_{ij} \le u_{ij} + u_{ji}, \ \forall (i, j) \in A, \tag{16}$$

*is also a solution of the quickest contraflow problem with* (*i*, *j*) *reversed if xji* > *uji*.

**Proof.** Let *x* be a solution of the problem (14)–(16). According to Algorithm 1, the quickest contraflow problem can be solved by solving the quickest flow problem in the auxiliary network, so that *x<sub>ij</sub>* is bounded by *u<sub>ij</sub>* + *u<sub>ji</sub>* for each (*i*, *j*) ∈ *A*. Further, Theorem 3 guarantees that there are no positive cycle flows in the flow decomposition of *x*, so that Step 3 of the algorithm can be skipped, and the result follows. □

Based on the above results, we formulate a bicriteria model as follows. Given a network *N* = (*V*, *A*, *s*, *t*, *u*, *τ*) with a supply *Q* at *s* and a specific node *d* ∈ *V*, let

$$\psi\_1 = \sum\_{(i,j)\in A} \tau\_{ij} y\_{ij} \tag{17}$$

$$\psi\_2 = \frac{Q + \sum\_{(i,j)\in A} \tau\_{ij} x\_{ij}}{v} \tag{18}$$

The problem is

$$\min(\psi\_1, \psi\_2) \tag{19}$$

subject to

$$\sum_{j \in V_i^+} y_{ij} - \sum_{j \in V_i^-} y_{ji} = \begin{cases} -1 & \text{if } i = s \\ 0 & \text{if } i \in V \setminus \{s, d\} \\ 1 & \text{if } i = d \end{cases} \tag{20}$$

$$y_{ij} \le u_{ij}, \ \forall (i, j) \in A \tag{21}$$

$$\sum_{j \in V_i^+} x_{ij} - \sum_{j \in V_i^-} x_{ji} = \begin{cases} v & \text{if } i = s \\ -v & \text{if } i = t \\ 0 & \text{if } i \in V \setminus \{s, t\} \end{cases} \tag{22}$$

$$0 \le x_{ij} \le (1 - y_{ij})u_{ij} + (1 - y_{ji})u_{ji}, \ \forall (i, j) \in A \tag{23}$$

$$y\_{ij} \in \{0, 1\} \tag{24}$$

Constraints (20) and (24) construct a *d*–*s* path. Constraints (21) ensure that such a path does not contain a zero-capacity arc. Constraints (22) and (23) send a static flow of value *v* from *s* to *t*, allowing arc reversals in all arcs except those in the path constructed by (20), (21), (24). The objective (19) minimizes the length of the path and the quickest time of the dynamic flow formed by the temporal repetition of the static flow *x*.

We use the idea given by Ehrgott [13] to obtain the weakly Pareto optimal (weakly efficient) solutions. This is done by minimizing *ψ*<sub>2</sub> and adding a constraint $\psi_1 \le \epsilon$, $\epsilon \in \mathbb{R}$. Since *ψ*<sub>2</sub> is not linear, we put $1/v = \omega$ and $x_{ij}/v = \xi_{ij}$ to make it linear. As a result, constraints (23) become non-linear. We put $(1 - y_{ij})\,\omega = \zeta_{ij}$ and use the idea given by Torres [14] to obtain the following mixed-integer linear program.

$$\min \psi\_2 = Q\omega + \sum\_{(i,j)\in A} \tau\_{ij}\zeta\_{ij} \tag{25}$$

Subject to

$$\sum_{j \in V_i^+} y_{ij} - \sum_{j \in V_i^-} y_{ji} = \begin{cases} -1 & \text{if } i = s \\ 0 & \text{if } i \in V \setminus \{s, d\} \\ 1 & \text{if } i = d \end{cases} \tag{26}$$

$$y\_{ij} \le u\_{ij}, \forall (i, j) \in A \tag{27}$$

$$\sum_{j \in V_i^+} \xi_{ij} - \sum_{j \in V_i^-} \xi_{ji} = \begin{cases} 1 & \text{if } i = s \\ 0 & \text{if } i \in V \setminus \{s, t\} \\ -1 & \text{if } i = t \end{cases} \tag{28}$$

$$0 \le \xi_{ij} \le \zeta_{ij} u_{ij} + \zeta_{ji} u_{ji}, \ \forall (i, j) \in A \tag{29}$$

$$0 \le \zeta_{ij} \le \omega, \ \forall (i, j) \in A \tag{30}$$

$$\omega - y_{ij} \le \zeta_{ij} \le 1 - y_{ij}, \ \forall (i, j) \in A \tag{31}$$

$$\sum\_{(i,j)\in A} \tau\_{ij} y\_{ij} \le \epsilon \tag{32}$$

$$y_{ij} \in \{0, 1\}, \ \forall (i,j) \in A \tag{33}$$

As *ψ*<sub>1</sub> is the length of a path, when the transit times $\tau_{ij} \in \mathbb{Z}_{>0}$, ∀(*i*, *j*) ∈ *A*, we construct Algorithm 2, which finds the set of non-dominated paths corresponding to all the non-dominated points of the feasible set in the objective space.

#### **Algorithm 2.** Non-dominated paths with the quickest contraflow.

**Input**: *N* = (*V*, *A*, *u*, *τ*, *s*, *t*) with a depot node *d* and a supply *Q* at the source *s*, with $u_{ij} \in \mathbb{Z}_{\ge 0}$, $\tau_{ij} \in \mathbb{Z}_{>0}$ ∀(*i*, *j*) ∈ *A*

**Output**: A set of non-dominated saved paths with the quickest contraflow

1. $\epsilon^0$ = length of the shortest *d*–*s* path, $L = \emptyset$, $\psi_2^0 = \infty$, $\epsilon^1 = 1 + \sum_{(i,j) \in A} \tau_{ij}$, $k = 1$.


For each $\epsilon \in \mathbb{R}$, a solution of (25)–(33) is a weakly efficient solution of (19)–(24) according to [13]. If the transit times on the arcs are allowed to take only integral values, $\epsilon$ can also be taken as a positive integer ranging from the length of the shortest path to $\sum_{(i,j) \in A} \tau_{ij}$.
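The ε-constraint loop behind Algorithm 2 can be sketched as follows. Here `solve` stands in for the MILP (25)–(33): given a bound `eps` on the path length *ψ*<sub>1</sub>, it returns (*ψ*<sub>1</sub>, *ψ*<sub>2</sub>) of an optimal solution, or `None` if infeasible. It is mocked with a small table of hypothetical (length, quickest time) pairs; the iteration and the domination filter are the point of the sketch:

```python
# Sketch: epsilon-constraint iteration with integral path lengths.
# After each solve, the bound is tightened to psi_1 - 1 so that every
# non-dominated point is visited exactly once.

def nondominated_points(solve, eps_start):
    points, eps = [], eps_start
    while True:
        sol = solve(eps)
        if sol is None:               # no d-s path of length <= eps left
            break
        psi1, psi2 = sol
        # keep the point unless an already-found point dominates it
        if not any(p1 <= psi1 and p2 <= psi2 for p1, p2 in points):
            points.append((psi1, psi2))
        eps = psi1 - 1                # integral lengths: tighten the bound
    return points

# hypothetical (psi_1, psi_2) candidates standing in for the MILP
candidates = [(4, 15.0), (6, 13.2), (7, 12.6), (9, 12.0)]

def solve(eps):
    feas = [c for c in candidates if c[0] <= eps]
    return min(feas, key=lambda c: c[1]) if feas else None

pts = nondominated_points(solve, eps_start=100)
assert pts == [(9, 12.0), (7, 12.6), (6, 13.2), (4, 15.0)]
```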

Because of Step 2d in Algorithm 2, we have the following result.

**Theorem 5.** *Algorithm 2 gives a set of non-dominated paths corresponding to all the non-dominated points of the feasible set in the objective space of (19)–(24).*

### **4. Conclusions**

To optimize the flow during an emergency evacuation, it is sometimes pertinent to save a path towards the hazardous area for the transportation of some facilities. We have developed a bicriteria model that minimizes the length of the saved path and the quickest time of the evacuees, allowing reversal of the direction of the evacuee flow in appropriate road segments. We present a solution algorithm based on the ε-constraint approach of finding efficient solutions of a multicriteria optimization problem.

**Author Contributions:** Conceptualization, H.N.N., S.D. and T.N.D.; formal analysis, H.N.N.; writing, H.N.N., supervision, S.D. and T.N.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **Unscented Kalman Filter Empowered by Bayesian Model Evidence for System Identification in Structural Dynamics †**

**Luca Rosafalco <sup>1,2,\*</sup>, Saeed Eftekhar Azam <sup>2</sup>, Andrea Manzoni <sup>3</sup>, Alberto Corigliano <sup>1</sup> and Stefano Mariani <sup>1</sup>**


**Abstract:** System identification is often limited to parameter identification, while model uncertainties are disregarded or accounted for by a fictitious process noise. However, modelling assumptions may have a large impact on system identification. For this reason, we propose to use an unscented Kalman filter (UKF) empowered by online Bayesian model evidence computation for the sake of system identification and model selection. This approach employs more than one model to track the state of the system and associates with each model a plausibility measure, updated whenever new measurements are available. The filter outcomes obtained for different models are then compared and a quantitative confidence value is associated with each of them. Only the system identification outcomes related to the model with the highest plausibility are considered. While the coupling of extended Kalman filters (EKFs) and Bayesian model evidence was already addressed, we modify the approach to exploit the most striking features of the UKF, namely, the ease of implementation and higher-order accuracy in the description of the evolution of the state mean and variance. A challenging identification problem related to structural dynamics is discussed to show the effectiveness of the proposed methodology.

**Keywords:** system identification; unscented Kalman filter; model evidence calculation; model class selection; structural dynamics

### **1. Introduction**

Kalman filters (KFs) are well-known tools for system identification. They work by applying a predictor phase, in which a suitable model is needed to predict the evolution of a dynamic system, and a correction phase, in which corrections to the prediction are applied by recursively processing system measurements [1].

In civil and mechanical engineering, different model classes, consisting of different parametrizations of the structure to be identified, can be formulated. They are built upon different levels of complexity in the description of the system mechanics and uncertainty in the formulation of the modelling assumptions. Emphasis is usually placed on improving the quality of the parameter estimate, especially whenever nonlinear dynamic systems are handled. With this goal, KF extensions such as the extended Kalman filter (EKF) or the unscented Kalman filter (UKF) have been introduced. On the contrary, model uncertainties are often disregarded or accounted for by a fictitious process noise. In this work, we propose a way to tackle this aspect by calculating a quantitative estimate, referred to as model evidence, that measures how plausible the model employed by the KF is with respect to other possible parametrizations. While a similar estimate was discussed in [2]

**Citation:** Rosafalco, L.; Eftekhar Azam, S.; Manzoni, A.; Corigliano, A.; Mariani, S. Unscented Kalman Filter Empowered by Bayesian Model Evidence for System Identification in Structural Dynamics. *Comput. Sci. Math. Forum* **2022**, *2*, 3. https:// doi.org/10.3390/IOCA2021-10896

Academic Editor: Frank Werner

Published: 25 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

for the EKF, here, we develop a model evidence formula suited for the UKF to exploit its ease of implementation and higher-order accuracy in the description of the evolution of the state mean and variance.

The remainder of the contribution is organized as follows: In Section 2, first, the governing equations of a mechanical elasto-dynamic system are discussed; second, the related algorithm showing the application of the UKF for parameter estimation is reported and finally, the equations allowing for recursive model evidence calculation are presented. In Section 3, a case study featuring a shear building excited by real ground acceleration is discussed, showing how parameter identification outcomes are affected by different structural parametrizations and how model evidence can be used for the sake of model selection a posteriori. Conclusions are finally discussed in Section 4.

### **2. Methodology**

### *2.1. Elasto-Dynamic Problem*

We focus on situations where the system dynamics is described by the finite element (FE) discretised version of a general elasto-dynamic problem. At time $t_{k+1}$, it reads:

$$\mathbf{M}\ddot{\mathbf{q}}_{k+1} + \mathbf{C}\dot{\mathbf{q}}_{k+1} + \mathbf{K}\mathbf{q}_{k+1} = \mathbf{f}_{k+1}, \quad k = 0, \ldots, n_t - 1 \tag{1}$$

where **M**, **C** and **K** are the mass, damping and stiffness matrices, respectively; $\mathbf{q}, \dot{\mathbf{q}}, \ddot{\mathbf{q}} \in \mathbb{R}^{n \times 1}$ are the nodal displacements, velocities and accelerations, respectively; and $\mathbf{f}_{k+1} \in \mathbb{R}^{n \times 1}$ is the external force vector, assumed to be known.

Equation (1) is integrated in time by using the *α*-method [3], ruled by the parameters *αm*, *α<sup>f</sup>* and *β*. At each time step, the displacement field **q***k*+<sup>1</sup> is obtained by solving

$$\mathbf{K}^*_{k+1} \mathbf{q}_{k+1} = \mathbf{f}^*_{k+1}(\mathbf{q}_k, \dot{\mathbf{q}}_k, \ddot{\mathbf{q}}_k). \tag{2}$$

The modified matrix $\mathbf{K}^*_{k+1}$ and the right-hand side vector $\mathbf{f}^*_{k+1}$ are computed by

$$\mathbf{K}^*_{k+1} = \frac{1 - \alpha_m}{\beta \Delta t^2}\,\mathbf{M} + \frac{\gamma\,(1 - \alpha_f)}{\beta \Delta t}\,\mathbf{C} + (1 - \alpha_f)\,\mathbf{K}, \tag{3}$$

$$\begin{split} \mathbf{f}^*_{k+1}(\mathbf{q}_k, \dot{\mathbf{q}}_k, \ddot{\mathbf{q}}_k) = \mathbf{f}_{k+1-\alpha_f} &+ \mathbf{M}\left( \frac{1 - \alpha_m}{\beta \Delta t^2}(\mathbf{q}_k + \Delta t\, \dot{\mathbf{q}}_k) + \frac{1 + \alpha_m - 2\beta}{2\beta}\, \ddot{\mathbf{q}}_k \right) \\ &+ \mathbf{C}\left( \frac{\gamma\,(1 - \alpha_f)}{\beta \Delta t}\,\mathbf{q}_k - \frac{\beta - \gamma\,(1 - \alpha_f)}{\beta}\,\dot{\mathbf{q}}_k - \left(1 - \frac{\gamma}{2\beta}\right)(1 - \alpha_f)\,\Delta t\, \ddot{\mathbf{q}}_k \right) - \alpha_f\, \mathbf{K}\mathbf{q}_k, \end{split} \tag{4}$$

where $\Delta t = t_{k+1} - t_k$, $t_{k+1-\alpha_f} = (1 - \alpha_f)\, t_{k+1} + \alpha_f\, t_k$ and $\mathbf{f}_{k+1-\alpha_f} = \mathbf{f}(t_{k+1-\alpha_f})$.
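As an illustration, the time-stepping scheme of Equations (2)–(4) can be sketched in Python as follows; the Newmark-type velocity and acceleration updates, the function name and the argument layout are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def generalized_alpha_step(M, C, K, f_next, f_prev, q, v, a, dt,
                           alpha_m, alpha_f, beta, gamma):
    """One step of the alpha-method, Eqs. (2)-(4), plus Newmark updates."""
    # Effective stiffness matrix, Eq. (3)
    Ks = ((1 - alpha_m) / (beta * dt**2) * M
          + gamma * (1 - alpha_f) / (beta * dt) * C
          + (1 - alpha_f) * K)
    # Load evaluated at the generalized mid-point t_{k+1-alpha_f}
    f_af = (1 - alpha_f) * f_next + alpha_f * f_prev
    # Effective force vector, Eq. (4)
    fs = (f_af
          + M @ ((1 - alpha_m) / (beta * dt**2) * (q + dt * v)
                 + (1 - alpha_m - 2 * beta) / (2 * beta) * a)
          + C @ (gamma * (1 - alpha_f) / (beta * dt) * q
                 - (beta - gamma * (1 - alpha_f)) / beta * v
                 - (1 - gamma / (2 * beta)) * (1 - alpha_f) * dt * a)
          - alpha_f * K @ q)
    q_next = np.linalg.solve(Ks, fs)  # Eq. (2)
    # Newmark kinematic updates for acceleration and velocity
    a_next = (q_next - q - dt * v) / (beta * dt**2) - (0.5 - beta) / beta * a
    v_next = v + dt * ((1 - gamma) * a + gamma * a_next)
    return q_next, v_next, a_next
```

A simple consistency check is that the returned state satisfies the generalized mid-point balance of the underlying equation of motion to machine precision.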

Moreover, the mechanical system is assumed to be only partially observed. Accordingly, a Boolean matrix $\mathbf{H} \in \mathbb{R}^{n_o \times 3n}$ establishes the connection between the $n_o$ observed quantities $\hat{\mathbf{y}}_{k+1} \in \mathbb{R}^{n_o}$ and the kinematic fields, as follows:

$$\hat{\mathbf{y}}\_{k+1} = \mathbf{H}\, [\mathbf{q}\_{k+1}^T\ \dot{\mathbf{q}}\_{k+1}^T\ \ddot{\mathbf{q}}\_{k+1}^T]^T. \tag{5}$$

### *2.2. Unscented Kalman Filter for Parameter Estimation*

In this study, the ultimate goal of filtering is to estimate the unknown parameters $\boldsymbol{\theta} \in \mathbb{R}^{n_p \times 1}$ ruling the mechanical response of the structure to be identified, where typically $\mathbf{C} = \mathbf{C}(\boldsymbol{\theta})$ and $\mathbf{K} = \mathbf{K}(\boldsymbol{\theta})$. In [1,4], Kalman filtering techniques were successfully applied, even in the presence of nonlinearities due to damage evolution in the observed system, by solving a dual estimation problem, i.e., by adopting as state variables both the model displacements and the unknown parameters governing the response of the mechanical domain. However, treating FE solutions characterized by a large number $n$ of degrees of freedom (DOF) may result in an excessive computational burden when dealing with dual estimation. A possible solution consists of obtaining a reduced order model (ROM) representation of the mechanical domain and adopting as state variables the ROM DOF instead of the nodal kinematics [5,6]; this strategy has been explored in [7]. Here, we consider only $\boldsymbol{\theta}$ as a state variable, to avoid the computational burden connected to the combined use of the UKF and DOF tracking when large FE models are addressed, despite the enhanced performance usually guaranteed by state tracking [1,8]. The following state-space representation is used:

$$\boldsymbol{\theta}\_{k+1} = \boldsymbol{\theta}\_k + \mathbf{w}\_k \tag{6a}$$

$$\mathbf{y}\_{k+1} = \hat{\mathbf{y}}\_{k+1} + \mathbf{v}\_{k+1} \tag{6b}$$

where $\boldsymbol{\theta}$ is driven by a random walk ruled by $\mathbf{w}_k \in \mathbb{R}^{n_p \times 1}$, modelled as a white process noise $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$, and the FE-predicted output $\hat{\mathbf{y}}_{k+1}$ is related to the actual response of the structure by adding a measurement noise $\mathbf{v}_{k+1} \in \mathbb{R}^{n_y \times 1}$, modelled as white $\mathbf{v}_{k+1} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$. The matrices **Q** and **R** are symmetric and positive definite. The time variation of $\boldsymbol{\theta}$, introduced by the random walk formulation, is fictitious.

KFs propagate the mean and the covariance of the state variable vector through the state-space and the measurement update equations. Instead of propagating the probability density functions associated with the state variables, it is preferable to draw a set of samples, deterministically propagate them and finally compute the mean and covariance of the state vector; this is especially profitable when the state-space and/or the measurement update equations are nonlinear. The UKF is based on this idea. The propagated vector collects a set of so-called sigma points (SPs) $\boldsymbol{\vartheta}^i_k$, $i = 1, \ldots, 2n_\theta + 1$, distributed such that the mean and covariance of these points match those of the state variables. A scaled version of the UKF is used, by setting the parameters $\alpha_{SP}$, $\kappa_{SP}$ and $\beta_{SP}$ as detailed in [9], to avoid sampling nonlocal effects that would spoil the reconstruction of the state variable mean and covariance [10]. In the predictor phase, this vector is propagated from the $k$-th to the $(k+1)$-th time step through the state-space equations. In the corrector phase, the estimated output covariance $\hat{\mathbf{P}}^{yy}_{k+1|k}$ and the estimated cross covariance $\hat{\mathbf{P}}^{\theta y}_{k+1|k}$ are used to compute the Kalman gain $\mathbf{G}_{k+1}$ needed to correct the propagated mean $\hat{\boldsymbol{\theta}}_{k+1|k}$ and covariance $\hat{\mathbf{P}}^{\theta\theta}_{k+1|k}$ on the basis of the collected measurements $\mathbf{y}_{k+1}$. The full expression of these quantities and the application of the UKF are detailed in Algorithm 1, adapted from [11].
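The scaled sigma-point sampling and the unscented transform described above can be sketched as follows; the function names and argument layout are hypothetical, with the weighting following the usual scaled formulation of [9].

```python
import numpy as np

def sigma_points(mean, cov, alpha_sp=1e-3, kappa_sp=0.0, beta_sp=2.0):
    """Scaled sigma-point sampling: 2*n + 1 points matching mean and covariance."""
    n = mean.size
    lam = alpha_sp**2 * (n + kappa_sp) - n
    S = np.linalg.cholesky((n + lam) * cov)        # matrix square root
    pts = np.vstack([mean, mean + S.T, mean - S.T])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha_sp**2 + beta_sp)
    return pts, wm, wc

def unscented_transform(pts, wm, wc, func):
    """Propagate sigma points through func and recover mean and covariance."""
    ys = np.array([func(p) for p in pts])
    mean = wm @ ys
    diff = ys - mean
    cov = (wc[:, None] * diff).T @ diff
    return mean, cov
```

For a linear map the transform is exact, which provides a simple correctness check of the weights.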

### *2.3. Model Evidence Computation for Unscented Kalman Filter*

System identification is usually limited to selecting a particular parametric model $\mathcal{M}$ of the underlying structural system and estimating the corresponding unknown parameters $\boldsymbol{\theta}$. However, the use of either excessively simplified or overly complex models may have a detrimental effect on the ability to track the system state: oversimplified models may underestimate the effect of a physical process taking place; on the other hand, complex models may fit the data well but yield poor predictions. In the latter case, the model overfits the incoming data. In [2], an online model class selection strategy was proposed in the framework of an EKF's parameter estimation. Here, a similar approach was adopted for simultaneous parameter estimation and model class selection exploiting the UKF. Adopting a number $n_m$ of possible model classes, the model evidence (or plausibility), consisting of the probability $p(\mathcal{M}^m_{k+1}) \in (0, 1)$, was computed for each model class $\mathcal{M}^m$, with $m = 1, \ldots, n_m$, at each time step $t_{k+1}$. The sum of the $n_m$ model evidences equals one. To derive the expression of $p(\mathcal{M}^m_{k+1})$, first, the Bayes theorem was used, giving

$$p\left(\mathcal{M}\_{k+1}^{m}\right) = \frac{p\left(\mathbf{y}\_{k+1}|\mathcal{M}\_{k}^{m}\right)p\left(\mathcal{M}\_{k}^{m}\right)}{\sum\limits\_{l=1}^{n\_m}p\left(\mathbf{y}\_{k+1}|\mathcal{M}\_{k}^{l}\right)p\left(\mathcal{M}\_{k}^{l}\right)},\tag{7}$$

where $p(\mathbf{y}_{k+1}|\mathcal{M}^m_k)$, called the conditional evidence, represents the contribution of the measurement at $t_{k+1}$ to the plausibility of the $m$-th model class.

Second, we extended the procedure explained in [2] from the EKF to the UKF. As a result, at the end of the corrector phase (after Step 15 of Algorithm 1), the following expression for the conditional evidence applies:

$$\begin{split} p\left(\mathbf{y}\_{k+1}|\hat{\boldsymbol{\theta}}, \mathcal{M}\_{k}^{m}\right) \approx\ & (2\pi)^{-\frac{n\_y}{2}} \left[\det\left(\hat{\mathbf{P}}^{\theta\theta}\_{k+1|k+1}\left(\hat{\mathbf{P}}^{\theta\theta}\_{k+1|k}\right)^{-1}\right)\right]^{\frac{1}{2}} \left[\det\left(\hat{\mathbf{P}}^{yy}\_{k+1|k}\right)\right]^{-\frac{1}{2}} \\ & \times \exp\left[-\frac{1}{2}\left(\hat{\boldsymbol{\theta}}\_{k+1|k+1}-\hat{\boldsymbol{\theta}}\_{k+1|k}\right)^{T}\left(\hat{\mathbf{P}}^{\theta\theta}\_{k+1|k}\right)^{-1}\left(\hat{\boldsymbol{\theta}}\_{k+1|k+1}-\hat{\boldsymbol{\theta}}\_{k+1|k}\right)\right. \\ & \quad \left.-\frac{1}{2}\left(\mathbf{y}\_{k+1}-\hat{\mathbf{y}}\_{k+1|k}\right)^{T}\left(\hat{\mathbf{P}}^{yy}\_{k+1|k}\right)^{-1}\left(\mathbf{y}\_{k+1}-\hat{\mathbf{y}}\_{k+1|k}\right)\right], \end{split} \tag{8}$$

where det(·) denotes the determinant of the input matrix. The reported expression approximates $p(\mathbf{y}_{k+1}|\hat{\boldsymbol{\theta}}, \mathcal{M}^m_k)$ due to the use of Laplace's asymptotic expansion [2].
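Equations (7) and (8) can be sketched as follows; the argument layout is hypothetical, and, consistently with Equation (8), the conditional evidence reduces to a Gaussian likelihood of the innovation when the prior and posterior parameter statistics coincide.

```python
import numpy as np

def conditional_evidence(theta_post, theta_prior, P_theta_post, P_theta_prior,
                         y, y_pred, P_yy):
    """Laplace-approximated conditional evidence, Eq. (8)."""
    n_y = y.size
    d_theta = theta_post - theta_prior
    d_y = y - y_pred
    det_ratio = np.linalg.det(P_theta_post @ np.linalg.inv(P_theta_prior))
    norm = ((2 * np.pi) ** (-n_y / 2) * np.sqrt(det_ratio)
            / np.sqrt(np.linalg.det(P_yy)))
    expo = -0.5 * (d_theta @ np.linalg.solve(P_theta_prior, d_theta)
                   + d_y @ np.linalg.solve(P_yy, d_y))
    return norm * np.exp(expo)

def update_model_evidence(prior, cond_evidence):
    """Recursive Bayes update of the model plausibilities, Eq. (7)."""
    post = np.asarray(prior) * np.asarray(cond_evidence)
    return post / post.sum()  # normalization keeps the evidences summing to one
```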


**Algorithm 1** Unscented Kalman filter with model evidence computation (predictor and corrector phases), adapted from [11].


**3. Results and Discussion**


As a numerical case study, we determined the interstorey stiffness and damping of the two-DOF shear building model ($n = 2$) reported in Figure 1. The mechanical properties of the building were nondimensionalised to ease the UKF tuning, by setting the matrices in Equation (1) equal to

$$\mathbf{M} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad \mathbf{C} = \begin{bmatrix} 0.2 & -0.1 \\ -0.1 & 0.1 \end{bmatrix}, \qquad \mathbf{K} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$

The building was excited by the ground acceleration $\mathbf{a}_0$ reported in Figure 2, lasting 60 s. The response of the building was monitored by recording the floor accelerations $\mathbf{y} = [y_1, y_2]^T$ with a sampling frequency of 50 Hz, for a total of $n_t = 3000$ samples. A white noise, featuring a standard deviation of $5 \times 10^{-3}$, was added to $y_1$ and $y_2$ to mimic the signal perturbation affecting micro-electro-mechanical accelerometers [12].
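The shear building matrices reported above can be assembled from per-storey properties as in the following minimal sketch (the function name and argument layout are hypothetical); with unit interstorey stiffnesses, storey masses of 2 and damping coefficients of 0.1, it recovers the matrices of the reference model.

```python
import numpy as np

def shear_building_matrices(m, k, c):
    """Assemble M, C, K for an n-storey shear building.
    m, k, c: per-storey masses, interstorey stiffnesses, damping coefficients;
    storey 0 is attached to the ground."""
    n = len(m)
    M = np.diag(np.asarray(m, dtype=float))
    K = np.zeros((n, n))
    C = np.zeros((n, n))
    K[0, 0] += k[0]          # ground spring
    C[0, 0] += c[0]          # ground damper
    for i in range(1, n):    # spring/damper between floors i-1 and i
        for A, s in ((K, k[i]), (C, c[i])):
            A[i - 1, i - 1] += s
            A[i, i] += s
            A[i - 1, i] -= s
            A[i, i - 1] -= s
    return M, C, K
```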

**Figure 1.** Two DOF shear model. Acceleration monitoring.

**Figure 2.** Ground acceleration.

The acceleration recordings coming from this reference building were used as measurements in the corrector phase of the filtering procedure (Steps 9 and 10 of Algorithm 1). Three model classes, $\mathcal{M}^1$, $\mathcal{M}^2$ and $\mathcal{M}^3$, featuring different structural parametrizations, were considered, as follows:

$$\begin{array}{ll} \mathbf{C}^1 = \begin{bmatrix} 0.12 & -0.06\\ -0.06 & 0.06 \end{bmatrix}, & \mathbf{K}^1\left(\theta\_1^1\right) = \theta\_1^1 \begin{bmatrix} 2 & -1\\ -1 & 1 \end{bmatrix}, \\\\ \mathbf{C}^2\left(\theta\_2^2\right) = \theta\_2^2 \begin{bmatrix} 0.2 & -0.1\\ -0.1 & 0.1 \end{bmatrix}, & \mathbf{K}^2\left(\theta\_1^2\right) = \theta\_1^2 \begin{bmatrix} 2 & -1\\ -1 & 1 \end{bmatrix}, \\\\ \mathbf{C}^3\left(\theta\_3^3\right) = \theta\_3^3 \begin{bmatrix} 0.2 & -0.1\\ -0.1 & 0.1 \end{bmatrix}, & \mathbf{K}^3\left(\theta\_1^3, \theta\_2^3\right) = \begin{bmatrix} \theta\_1^3 + \theta\_2^3 & -\theta\_2^3\\ -\theta\_2^3 & \theta\_2^3 \end{bmatrix}. \end{array}$$

Model class $\mathcal{M}^1$ is governed by the parameter $\theta^1_1$, ruling the interstorey stiffness of both floors (for this reason, $\theta^1_1$ is factored out of $\mathbf{K}^1$); $\mathcal{M}^2$ is governed by $\boldsymbol{\theta}^2 = [\theta^2_1, \theta^2_2]^T$, ruling, respectively, the interstorey stiffness and damping of both floors; $\mathcal{M}^3$ is governed by $\boldsymbol{\theta}^3 = [\theta^3_1, \theta^3_2, \theta^3_3]^T$, where $\theta^3_1$ and $\theta^3_2$ rule the first- and second-floor interstorey stiffness, respectively, and $\theta^3_3$ rules the damping associated with both floors. Comparing these parametrizations with the reference model, it is clear that $\mathcal{M}^1$ underparametrizes the mechanical system, associating no parameter with the damping properties of the structure and suffering a model bias, since $\mathbf{C}^1 = 0.6\, \mathbf{C}$; $\mathcal{M}^3$ overparametrizes the stiffness matrix; and $\mathcal{M}^2$ provides a correct parametrization of the structural response, and is therefore expected to allow for the best estimate of the system mechanical properties. For all model classes, the initial guesses of the relevant parameters underestimated the values ruling the reference structure by 40%.

KF tuning is usually problem-dependent and is performed through a trial-and-error procedure. In this case, we set the SP scaling parameters to $\alpha_{SP} = 10^{-3}$, $\kappa_{SP} = 0$ and $\beta_{SP} = 2$; the measurement noise covariance to $\mathbf{R} = 4 \times 10^{-4}\, \mathbf{I}_2$, where $\mathbf{I}_2 \in \mathbb{R}^{2 \times 2}$ is the identity matrix; the process noise covariance to $\mathbf{Q} = 10^{-8}\, \mathbf{I}_{n_p}$, with $\mathbf{I}_{n_p} \in \mathbb{R}^{n_p \times n_p}$; and the initial parameter covariance to $\hat{\mathbf{P}}^{\theta\theta}_0 = 0.25\, \mathbf{I}_{n_p}$. The value of $n_p$ depends on the number of parameters employed by each model ($n_p = 1$ for $\mathcal{M}^1$, $n_p = 2$ for $\mathcal{M}^2$ and $n_p = 3$ for $\mathcal{M}^3$).
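The above tuning can be collected per model class as in the following sketch (values taken from the text; the dictionary layout is an assumption of this illustration):

```python
import numpy as np

def ukf_settings(n_p):
    """Filter tuning for a model class employing n_p parameters (1, 2 or 3)."""
    return {
        "alpha_SP": 1e-3, "kappa_SP": 0.0, "beta_SP": 2.0,  # sigma-point scaling
        "R": 4e-4 * np.eye(2),      # measurement noise covariance
        "Q": 1e-8 * np.eye(n_p),    # process noise covariance
        "P0": 0.25 * np.eye(n_p),   # initial parameter covariance
    }
```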

In Figure 3, the predicted output of $\mathcal{M}^1$, computed according to Step 8 of Algorithm 1, is reported against the floor acceleration measurements, showing the capacity of the filter to track the shear building accelerations despite the presence of noise. A small discrepancy between the reference model and the predicted output is observable only when magnifying the curves. The predicted outputs of $\mathcal{M}^2$ and $\mathcal{M}^3$, not reported for lack of space, exhibit even smaller discrepancies.

**Figure 3.** $\mathcal{M}^1$ predicted outputs (dot-dashed blue line), reported against the noise-corrupted reference model recordings (orange line). The left panel refers to the first floor and the right panel to the second floor. Black lines depict the reference model acceleration when not corrupted by noise.

The capacity of the filter to track the system output was expected to greatly help parameter identification. In Figures 4–6, the time evolutions of the parameters employed by $\mathcal{M}^1$, $\mathcal{M}^2$ and $\mathcal{M}^3$ are reported, respectively. Black is used for parameters involved in the expression of the structural stiffness; orange for those related to the structural damping. The plots report both the parameter posterior estimates and the confidence intervals of these estimates. Judging from the confidence intervals, the stiffness-related parameters would seem able to assume negative values during the first part of the analyses; this is due to the initial choice of $\hat{\mathbf{P}}^{\theta\theta}_0 = 0.25\, \mathbf{I}_{n_p}$. However, positive values were always associated with the interstorey stiffness, thanks to the use of the scaled version of the UKF. Similar reasoning applies to the damping-related parameters.

**Figure 4.** Model class $\mathcal{M}^1$, time evolution of $\theta^1_1$. The thicker dotted line reports the posterior estimate; the thinner dotted lines, the 99% confidence interval of the estimate, determined using the posterior covariance. The continuous line reports the parameter value assumed by the reference model.

**Figure 5.** Model class $\mathcal{M}^2$, time evolution of $\boldsymbol{\theta}^2$. The thicker dotted line reports the posterior estimate; the thinner dotted lines, the 99% confidence interval of the estimate, determined using the posterior covariance. The continuous line reports the parameter values assumed by the reference model.

**Figure 6.** Model class $\mathcal{M}^3$, time evolution of $\boldsymbol{\theta}^3$. The thicker continuous line reports the posterior estimates; the thinner dotted lines, the 99% confidence interval of the estimate, determined using the posterior covariance. The continuous line reports the parameter values assumed by the reference model.

Looking at Figure 4, the UKF was unable to provide a correct estimate for $\theta^1_1$, despite the uncertainty reduction linked to the narrowing of the confidence interval. The stiffness-related parameters $\theta^3_1$ and $\theta^3_2$ of $\mathcal{M}^3$, depicted in Figure 6, also do not seem to converge to the desired values. On the contrary, for $\mathcal{M}^2$, $\theta^2_1$ was correctly identified with small uncertainty, as shown in Figure 5. These results were somehow expected, due to the underparametrization of the mechanical system operated by $\mathcal{M}^1$ and the overparametrization exhibited by $\mathcal{M}^3$, while $\mathcal{M}^2$ embodied the correct description of the reference model.

Model class $\mathcal{M}^3$ was unable to provide any estimate of the damping properties, ending up pushing $\theta^3_3$ to 0. Model class $\mathcal{M}^2$ provided a better, though still quite poor, estimate, overestimating the damping-related parameter $\theta^2_2$ by 40%. These difficulties were due to the limited relevance of damping in the identification of continuously excited structures, as discussed in [4].

From the results reported above, $\mathcal{M}^2$ seems to lead to the best system identification; however, we reached this conclusion by knowing the mechanical properties of the reference system. It would have been very hard, if not impossible, to judge model plausibility simply by looking at the predicted outputs. Indeed, as shown in Figure 3, the UKF was able to reproduce the monitoring system outcome even when $\mathcal{M}^1$ was employed. For this reason, model evidence computation, whose outcome is reported in Figure 7, is extremely relevant to understanding which model can be trusted the most.

**Figure 7.** Model evidence evolution of each model.

At the beginning of the identification procedure, equal plausibility was associated with the three models. Their values were recursively updated, as soon as new measurements became available, using Equations (7) and (8). During the first part of the analysis, $\mathcal{M}^1$ appeared to be the most plausible model class. This is in agreement with intuition: $\mathcal{M}^1$ is the easiest to tune, employing just one parameter, and the bias in the modelling of damping has marginal relevance for $t < 20$ s, due to the strong ground motion undergone by the structure. In a second stage, $\mathcal{M}^3$ turned out to be the most plausible model class, due to the good estimates of both the stiffness-related and the damping-related parameters in the central part of the analysis. Finally, the overcomplexity of $\mathcal{M}^3$ led to a deterioration of its parameter identification, while the good convergence of the stiffness-related parameters and the reasonable damping estimate promoted $\mathcal{M}^2$ as the most plausible model class.

This numerical example shows that model evidence evaluation can be successfully used for model selection. The reader should note that, due to the recursive nature of Equation (7), a certain time delay occurred between the improved identification capacity of the filter equipped with a certain model and the increase in plausibility of this model.

### **4. Conclusions**

In this work, we have discussed an algorithm for simultaneous parameter estimation and model evidence computation in linear elastic dynamic problems. Starting from the work of [2], a recursive expression for model evidence evaluation was derived for the case in which the unscented Kalman filter is used. Numerical results show that model evidence can guide system identification in the presence of model uncertainties, by associating a plausibility measure with the different employed models featuring possible parametrizations of the mechanical domain. Indeed, model evidence can be successfully used to select the most plausible structural parametrization as parameter identification is carried out.

**Author Contributions:** Conceptualization, L.R., S.E.A., A.C. and S.M.; methodology, L.R., S.E.A., A.C. and S.M.; software, L.R. and S.E.A.; validation, L.R., S.E.A., A.M., A.C. and S.M.; formal analysis, L.R., S.E.A. and S.M.; investigation, L.R. and S.E.A.; resources, S.E.A.; data curation, L.R. and S.E.A.; writing—original draft preparation, L.R.; writing—review and editing, S.E.A., A.M., A.C. and S.M.; visualization, L.R.; supervision, S.E.A. and A.C.; project administration, S.E.A. and A.C.; funding acquisition, S.E.A. and A.C. All authors have read and agreed to the published version of the manuscript.

**Acknowledgments:** The authors are indebted to Rodrigo Astroza, Universidad de los Andes (Chile), for the valuable discussions on the topic of this contribution.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **A Novel Strategy for Tall Building Optimization via the Combination of AGA and Machine Learning Methods †**

**Mohammad Sadegh Es-haghi <sup>1</sup> and Mohammad Sarcheshmehpour 2,\***


**Abstract:** The optimum design of tall buildings, which have a proportionately huge quantity of structural elements and a variety of design code constraints, is a very computationally expensive process. In this paper, a novel strategy combining evolutionary algorithms and machine learning methods is developed for achieving the optimal design of tall buildings. The most time-consuming part is the analysis of tall buildings and the control of design code constraints, requiring long and frequent analyses. The main idea is to use machine learning methods for this purpose. In this study, a practical methodology for obtaining the optimal design of tall building structures, regarding the constraints imposed by typical building codes, is introduced. The optimization process is performed by a novel evolutionary algorithm, named the asymmetric genetic algorithm (AGA), and in each iteration that requires checking the constraints for a large number of different structural states, machine learning methods, including MLP, GMDH and ANFIS-PSO, act as facilitators. More specifically, MLP (R² = 0.988) performed better than GMDH (R² = 0.961) and ANFIS-PSO (R² = 0.953). By coupling ETABS and MATLAB software, various combinations of sections for structural elements are assigned and analyzed automatically, thus creating a database for training neural networks. The applicability of the suggested procedure is described through the determination of the optimal seismic design of a 40-story framed tube building. Results indicate that the present method not only supports the precision of the methodology but also remarkably diminishes the computational time and memory needed in comparison with the existing classical methods. More importantly, the optimization process time is also significantly decreased.

**Keywords:** practical structural optimization; seismic design; steel high-rise buildings; machine learning; group method of data handling; multilayer perceptron; hybrid ANFIS–PSO; artificial neural network

### **1. Introduction**

In recent years, the demand for the optimal design of tall building structures has grown significantly due to financial issues. On the other hand, the dependable design of such structures brings many difficulties for an engineer because of the significant number of structural members and also the strict design constraints imposed by codes. This makes the conventional design provided by engineers not necessarily economical, and highlights the significance of optimization tools in the design process of these structures to save on construction costs [1–4]. Many of the studies in the field of tubular structures deal with the modeling of a tall building as a huge cantilever box beam [5–7]. Recent advances in high-performance computers have made possible the precise analysis of the whole frame of a high-rise building during the optimization process. Chan et al. [8] introduced an iterative procedure based on drift, strength, and fabrication constraints. The effects of

**Citation:** Es-haghi, M.S.; Sarcheshmehpour, M. A Novel Strategy for Tall Building Optimization via the Combination of AGA and Machine Learning Methods. *Comput. Sci. Math. Forum* **2022**, *2*, 4. https://doi.org/10.3390/IOCA2021-10882

Academic Editor: Stefano Mariani

Published: 20 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

various parameters on the tube action of a reinforced concrete 55-story hotel building were investigated by Shin et al. [9]. Some researchers proposed techniques for the minimization of the weight of high-rise buildings subject to wind loads [10,11]. Aldwaik and Adeli [12] conducted a review of the optimization of high-rise buildings with either tubular or other structural systems.

The above-mentioned studies mostly consider fixed patterns of loads; however, another line of thought deals exclusively with the seismic loads which lead to more cumbersome behavior in the structure. In this respect, the design codes prescribe additional strict limitations on the design of structures subject to seismic loads. During the past decades, many researchers focused on the seismic assessment of the structures [13,14]. More specifically, many studies incorporated seismic considerations into optimization problems [15–17]. Moghaddam and Hajirasouliha [18] introduced an optimization technique to reach the uniform deformation of members in two-dimensional (2D) tall shear buildings subject to seismic excitation. Furthermore, Ganjavi et al. [19] investigated the best distribution of seismic lateral loads to achieve uniform damage distribution in 2D shear buildings considering the soil–structure interaction (SSI). Recently, many researchers employed optimization methods to reach the desired seismic performance objectives at various seismic hazard levels in 2D low-rise and mid-rise steel frames [20,21] and also in 2D reinforced concrete frames [22]. Recently, with the aid of gradient-based optimization algorithms, Sarcheshmehpour et al. proposed practical methodologies for optimal seismic design of steel-framed tube tall buildings based on conventional building codes [23], as well as life cycle costs [24].

Notwithstanding ample research on the optimization of tall buildings, the use of soft computing methods in the optimal seismic design of tall buildings is scarce in the literature. In the current research, a practical methodology with a reasonable computational demand, achieving the most beneficial possible design within constructional constraints through the combination of machine learning methods and evolutionary algorithms, is proposed. First, the optimization problem considering all constraints is described. Then, by establishing the connection between MATLAB and ETABS software, a huge database, which is used for training ANNs, is created. The MLP, GMDH, and ANFIS-PSO methods are investigated and the best one is selected for evaluating the constraints in the optimization process, which is based on the AGA algorithm. Finally, the result for a sample 40-story building is presented. The structural analysis procedure for creating the database is conducted based on the Iranian National Building Code (INBC), which is almost identical to the ANSI/AISC 360-10 LRFD design guide [25].

### **2. Formulation of the Optimization Problem**

In this section, the general formulation for seismic design optimization of high-rise buildings is presented. The structural design is performed according to the conventional load and resistance factor design (LRFD) approach:

Design for serviceability: based on the Iranian Code of Practice for Seismic Resistant Design of Buildings (Standard No. 2800), for buildings more than five stories high, the inter-story drift ratio ($\Delta_i$) of each story shall satisfy the following constraint under the design seismic forces:

$$C\_d \Delta\_i \le 0.02,\tag{1}$$

in which $C_d$ indicates the amplification factor accounting for the expected inelastic response.

1. Design for Strength

According to the building code, the demand–capacity ratio defined in Equation (2) shall be equal to or less than one for all load combinations, i.e.,

$$\frac{R\_u}{\varphi R\_n} \le 1,\tag{2}$$

where $R_u$ represents the required strength under all LRFD load combinations and $\varphi R_n$ indicates the design strength of each structural element.

2. Strong-column/weak-beam (SC/WB): For the design of Special Moment Frames (SMFs), the moment ratio shall satisfy the following constraint at each beam-to-column connection:

$$\frac{\sum \mathcal{M}\_{pb}^\*}{\sum \mathcal{M}\_{pc}^\*} < 1,\tag{3}$$

where $\sum M^*_{pb}$ represents the total flexural strength of all beams attached to the connection and $\sum M^*_{pc}$ indicates the total flexural strength of the columns, with a reduction for the axial force.

3. Practical limitations: from a practical perspective, the dimensions of columns in each story shall not be less than those in the upper stories. This constraint can be formulated as:

$$d\_{j,i}^{Col} \ge d\_{j+1,i}^{Col}, \qquad b\_{j,i}^{Col} \ge b\_{j+1,i}^{Col}, \qquad j = 1, 2, \cdots, NS-1; \ i = 1, 2, \cdots, NC, \tag{4}$$

In Equation (4), $d^{Col}_{j,i}$ and $b^{Col}_{j,i}$ represent the depth and the width of the section of the $i$-th column in the $j$-th story, respectively. Furthermore, $NC$ denotes the number of columns in each story and $NS$ is the total number of stories.
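For a candidate design, the four constraint families of Equations (1)–(4) can be checked along the following lines; the argument layout is hypothetical, and in the actual procedure the demands are obtained from structural analysis (or its machine learning surrogate), not passed in directly.

```python
import numpy as np

def constraints_satisfied(drift, Cd, Ru, phi_Rn, Mpb, Mpc, d_col, b_col):
    """Check the design constraints of Eqs. (1)-(4) for a candidate design.
    drift: inter-story drift ratios; Ru, phi_Rn: per-element demand/capacity;
    Mpb, Mpc: beam/column flexural strengths at a connection;
    d_col, b_col: (NS, NC) arrays of column depths/widths per story."""
    serviceability = np.all(Cd * np.asarray(drift) <= 0.02)          # Eq. (1)
    strength = np.all(np.asarray(Ru) / np.asarray(phi_Rn) <= 1.0)    # Eq. (2)
    sc_wb = np.sum(Mpb) / np.sum(Mpc) < 1.0                          # Eq. (3)
    d_col, b_col = np.asarray(d_col), np.asarray(b_col)
    practical = (np.all(d_col[:-1] >= d_col[1:])                     # Eq. (4)
                 and np.all(b_col[:-1] >= b_col[1:]))
    return bool(serviceability and strength and sc_wb and practical)
```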

In the current design optimization problem, the total weight of all beams and columns in the 3D steel tall building is the objective function, all the above-mentioned inequalities act as the optimization constraints, and the section properties of the structural elements are the design variables. The resulting nonlinear constrained optimization problem is tackled through two basic approaches in the current study. The first is the metaheuristic optimization method named AGA. The second is the use of machine learning techniques to evaluate the nonlinear inequality constraints instead of time-consuming analytical approaches. For the sake of convenience, the proposed procedure of the optimal seismic design is illustrated in Figure 1. A full description of the parameters appearing in Figure 1 can be found in [23].

**Figure 1.** The iterative procedure for the optimal structural seismic design.

### **3. Structural Model**

In this study, a 40-story framed tube building is considered as the case study to demonstrate the applicability of the proposed strategy. The 3D view and the typical floor plan of this building are shown in Figure 2. As seen, in both directions, the plan consists of nine bays, each with a length of 3 m. The gravitational columns (GC) and the corner columns (CC) of the perimeter tube have box sections. The sections of the rest of the columns (P1C and P2C) are I-shaped sections. As shown in Figure 2, there are two types of non-corner perimeter columns: P1C columns which are only connected to the perimeter beams from both sides, and P2C columns which are also connected to the pin-ended gravitational beams (GB). The spandrel beams (PB) are fixed-ended and have a length of 3 m. Moreover, both types of gravitational beams (GB and IGB) are pin-ended with a length of 9 m. As seen in Figure 2, IGB beams connect the gravitational columns together and GB beams connect the gravitational columns to the perimeter tube.

**Figure 2.** The 3D and plan view of the 40-story framed tube building.

The case study is a residential building in which the first four stories are considered as parking lots. The gravitational beams (GB or IGB) are divided into two groups based on their position in either residential floors or parking lots. For practical design purposes, all spandrel beams and all columns are grouped every four stories. This leads to a considerable reduction in the number of design variables. It is worth mentioning that the index *i* in Figure 2 represents the group number of each structural section.

The building is considered to be located in Tehran with a very high level of seismicity. Furthermore, the soil beneath the building is consistent with Soil Type 2 of Iranian seismic code (in a depth of 30 m, the average shear wave velocity is between 375 m/s and 750 m/s). A fixed base is assumed at the ground level and all supports are fixed in the structural model.

By coupling ETABS [26] and MATLAB software, 7800 combinations of sections for structural elements are assigned and analyzed automatically, thus creating a database for training neural networks.

### *Section Decision Variables*

In this research, the section properties that constitute the decision variables are treated as continuous. The number of design variables is reduced appreciably by relating all dimensions of a section to its depth through rational equations. In this study, linear equations relating the dimensions of the sections to their depths are determined according to Euro-standard sections. For more details, refer to [23,24].

The above-mentioned relations for I-shape sections pertinent to the non-corner columns of the perimeter tube and beams are presented in Equations (5) and (6), respectively.

$$\begin{array}{l} b_f = d \\ t_f = 0.055d + 0.35 \\ t_w = 0.015d + 0.6 \end{array} \tag{5}$$

$$\begin{array}{l} b_f = 0.35d + 3.3 \\ t_f = 0.026d + 0.33 \\ t_w = 0.016d + 0.25 \end{array} \tag{6}$$

In Equations (5) and (6), *d* is the section depth, *bf* denotes the flange width, *tf* is the flange thickness, and *tw* represents the web thickness.

As mentioned before, all the corner and gravitational columns have box-shaped sections. The equation relating the thickness (*t*) of the box section to its depth (*d*) is given as:

$$t = 0.06d \tag{7}$$
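Read as code, Equations (5)–(7) are simple linear maps from the section depth to the remaining dimensions. The sketch below is illustrative only (function and key names are ours; the unit of the depth *d* follows whatever unit the source's coefficients assume):

```python
# Hedged sketch of Equations (5)-(7): each function maps a section depth d
# to the remaining section dimensions via the paper's linear relations.
def i_column_dims(d):
    """Eq. (5): I-sections of the non-corner perimeter columns."""
    return {"bf": d, "tf": 0.055 * d + 0.35, "tw": 0.015 * d + 0.6}

def i_beam_dims(d):
    """Eq. (6): I-sections of the beams."""
    return {"bf": 0.35 * d + 3.3, "tf": 0.026 * d + 0.33, "tw": 0.016 * d + 0.25}

def box_thickness(d):
    """Eq. (7): thickness of the box sections (corner/gravitational columns)."""
    return 0.06 * d
```

With these relations, the depth *d* alone fixes every plate dimension of a group's section, which is exactly what shrinks the design-variable count to one per section group.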

### **4. Machine Learning Techniques for Constraint Evaluation**

Recently, machine learning has become very popular in engineering applications [27,28]. In this study, three non-linear machine learning models, namely the multilayer perceptron (MLP) [29], the group method of data handling (GMDH) [30], and a combination of an adaptive-network-based fuzzy inference system and particle swarm optimization (ANFIS–PSO) [31], were employed to estimate the structural constraints. These methods are examined, and the best one is selected as the constraint-evaluation function in the optimization process. For training the ANNs, 7800 models were created. For instance, the effect of the dataset size on the estimation of one of the constraints, the design ratio of CC1, by the MLP method is shown in Figure 3. As shown in this figure, a dataset size of 6000 is sufficient; therefore, creating 7800 models is adequate. After trying several different models for the neural networks, their final structures are presented in Table 1.
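The surrogate idea — train a cheap model on a database of analyzed designs, then query it instead of the structural solver — can be sketched with a tiny hand-rolled MLP. Everything below is an invented stand-in (one input, one hidden layer, a made-up linear depth-to-ratio relation), not the paper's actual network or training data:

```python
import math
import random

random.seed(0)

def train_mlp(data, hidden=8, lr=0.02, epochs=3000):
    """Train a one-hidden-layer tanh MLP by stochastic gradient descent."""
    w1 = [random.uniform(-1, 1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [random.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in data:
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = pred - y  # gradient of the squared error
            for j in range(hidden):
                dh = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * dh * x
                b1[j] -= lr * dh
            b2 -= lr * err
    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2
    return predict

# Synthetic "database": a made-up demand-capacity ratio that falls
# linearly with section depth (illustrative only).
data = [(d / 100.0, 2.0 - 1.2 * d / 100.0) for d in range(30, 101, 5)]
surrogate = train_mlp(data)
```

In the paper's setting, the inputs would be the 54 section depths and the outputs the constraint values (design ratios, drifts, SC/WB ratios), with the 7800 ETABS analyses serving as the training database.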

**Figure 3.** The effect of dataset size on the performance of machine learning (MLP) method.


**Table 1.** The parameters of Machine Learning methods.

In Figure 4, the performances of the three selected ML tools are compared. It can be seen that MLP is more accurate than GMDH and ANFIS-PSO in calculating the constraints and capturing the structural response. The output provided by MLP is much less scattered than that of the others, and the linear regression of all the pairs of results is well aligned with the (perfect-fit) bisector of the quadrant. Therefore, the MLP method is selected for the optimization process.

**Figure 4.** Performances of the ML algorithms: parity plots showing the ML output against the corresponding targets for: (**a**) MLP regarding all data; (**b**) GMDH regarding all data; (**c**) ANFIS-PSO regarding all data; (**d**) linear regression lines for all ML algorithms.

### **5. Results**

In this part, the details of the optimal designs and the seismic behavior of the 40-story framed tube are presented. For the optimization process, the AGA algorithm described in [2] is used. AGA differs from the standard GA mainly in its constraint-evaluation strategy. In AGA, the population members are sorted by the goal function and, unlike GA, only the constraints of the best member are evaluated. If the best member satisfies the constraints, AGA does not evaluate the constraints of the other members; otherwise, the penalty function is applied to the best member, the population is sorted again, and the new best member is checked against the constraints. This procedure continues until the lowest-weight member that satisfies all constraints is found. After reaching this goal, AGA goes to the next level, as shown in the related part of Figure 1. As a result, constraint evaluation is not carried out for all members of the population in AGA. See [2] for more information. The hyperparameters of AGA (the number of generations, the population size, the crossover probability, and the mutation probability) were selected by trying different values; they were set to 200, 50, 60%, and 5%, respectively.
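The constraint-handling step described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation; the weight function, the feasibility check, and the candidate designs are placeholders:

```python
def weight(design):
    """Goal function: total weight of the candidate design (to minimize)."""
    return sum(design)

def feasible(design):
    """Placeholder constraint check: every section dimension at least 0.4."""
    return min(design) >= 0.4

def select_best(population, penalty=1e6):
    """AGA-style step: check constraints only for the current best member;
    penalize it and re-sort if infeasible, until the lightest feasible
    member is found."""
    scores = {i: weight(x) for i, x in enumerate(population)}
    for _ in range(len(population)):
        best = min(scores, key=scores.get)  # lightest remaining candidate
        if feasible(population[best]):
            return population[best]
        scores[best] += penalty  # apply penalty; the next min() re-sorts
    return None  # no feasible member in this population

# Illustrative population: lighter members are infeasible, so several
# penalty/re-sort rounds happen before a feasible design is returned.
population = [[0.2 + 0.05 * i, 0.9 - 0.03 * i, 0.5] for i in range(10)]
best = select_best(population)
```

The point of the loop is that the expensive constraint evaluation (here a trivial check, in the paper an MLP query) is performed once per re-sort rather than once per population member.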

The optimization of the 40-story building consists of 54 decision variables. The sections' depths of the optimal designs associated with the 40-story building with both systems are given in Table 2. The inter-story drift ratios of different stories, SC/WB ratios, and demand–capacity ratios relevant to the optimal design of the 40-story building, as well as the corresponding code limits, are depicted in Figure 5.

**Figure 5.** Demand–capacity ratio, SC/WB ratio, and drift ratio for optimal candidate in 40-story building.


**Table 2.** The depth of columns sections in the optimal design of the building in (mm).

### **6. Conclusions**

In the current work, the optimal seismic design of high-rise buildings, a large-scale and time-consuming optimization problem with huge computational demands, was investigated. The problem was cast into the context of optimization with a combination of evolutionary algorithms and machine learning methods. AGA, a novel evolutionary algorithm, was employed for the optimization process. The algorithm converged to the optimal design, whose specifications were presented, with an initial population of 50 after 200 iterations. For constraint evaluation, three machine learning methods, including MLP, GMDH, and ANFIS-PSO, were investigated, and the best one, MLP, with a coefficient of determination of 0.988, was selected. Therefore, the strategy presented in this paper can be used to achieve the minimum weight of tall buildings while meeting all practical and design code constraints. This strategy provides a methodical procedure for the reasonable comparison of different tall building designs.

**Author Contributions:** Conceptualization, M.S.E.-h. and M.S.; methodology, M.S.E.-h. and M.S.; software, M.S.E.-h. and M.S.; validation, M.S.E.-h. and M.S.; formal analysis, M.S.E.-h. and M.S.; investigation, M.S.E.-h. and M.S.; resources, M.S.E.-h. and M.S.; data curation, M.S.E.-h. and M.S.; writing—original draft preparation, M.S.E.-h. and M.S.; writing—review and editing, M.S.E.-h. and M.S.; visualization, M.S.E.-h. and M.S.; supervision, M.S.E.-h. and M.S.; project administration, M.S.E.-h. and M.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data, models and codes that support the findings of this study are available from the corresponding author upon request.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **Maximum Multi-Commodity Flow with Proportional and Flow-Dependent Capacity Sharing †**

**Durga Prasad Khanal 1, Urmila Pyakurel 2,\*, Tanka Nath Dhamala <sup>2</sup> and Stephan Dempe <sup>3</sup>**

<sup>1</sup> Saraswati Multiple Campus, Tribhuvan University, Kathmandu 44618, Nepal; durgapsdkhanal@gmail.com

<sup>2</sup> Central Department of Mathematics, Tribhuvan University, Kathmandu 44618, Nepal; tanka.nath.dhamala@gmail.com


**Abstract:** Multi-commodity flow problems are concerned with the transshipment of more than one commodity from respective sources to the corresponding sinks without violating the capacity constraints on the arcs. If the objective is to send the maximum amount of flow within a given time horizon, then the problem becomes a maximum flow problem. In multi-commodity flow problems, the flows of different commodities departing from their sources and arriving at a common intermediate node have to share the capacity of the arc. The sharing of the capacity of a common arc (bundle arc) is one of the major issues in multi-commodity flow problems. In this paper, we introduce the maximum static and maximum dynamic multi-commodity flow problems with proportional capacity sharing and present polynomial time algorithms to solve them. Similarly, we investigate the maximum dynamic multi-commodity flow problem with flow-dependent capacity sharing and present a pseudo-polynomial time solution strategy.

**Keywords:** multi-commodity; maximum flow; proportional capacity sharing; flow-dependent capacity sharing

### **1. Introduction**

A topological structure with links and crossings, known as arcs and nodes, respectively, is a network in which entities are transshipped from one point to another. The initial and final points are termed the source and sink nodes, respectively. In a multi-terminal network, the transshipment of more than one commodity from the respective sources to the corresponding sinks satisfying the capacity constraints on the arcs is a multi-commodity flow (MCF) problem. Supply chain networks, message routing in telecommunications, and transportation networks are some examples of multi-commodity network topologies.

Ford and Fulkerson [1] introduced the concept of the static multi-commodity flow problem, and thereafter many researchers have contributed to different aspects of multi-commodity flow problems [2–5]. If the demand and supply of each commodity are to be maximized within the given time horizon, then the problem becomes a maximum dynamic multi-commodity flow problem. The static multi-commodity flow problem is solvable in polynomial time by using the ellipsoid or interior point method, whereas the dynamic multi-commodity flow problem is NP-hard [6]. Kappmeier [7] provided a solution to the maximum dynamic multi-commodity flow problem using a time-expanded network with pseudo-polynomial time complexity. Pyakurel et al. [8] presented a polynomial time algorithm for the maximum static flow problem and pseudo-polynomial algorithms for the earliest arrival transshipment and maximum dynamic flow problems with partial contraflow. A priority based multi-commodity flow problem can be found in Khanal

**Citation:** Khanal, D.P.; Pyakurel, U.; Dhamala, T.N.; Dempe, S. Maximum Multi-Commodity Flow with Proportional and Flow-Dependent Capacity Sharing. *Comput. Sci. Math. Forum* **2022**, *2*, 5. https://doi.org/ 10.3390/IOCA2021-10904

Academic Editor: Frank Werner

Published: 26 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

et al. [9]. Using the concept of intermediate storage introduced by Pyakurel and Dempe [10], Khanal et al. [11] presented a polynomial time algorithm for the maximum static—and a pseudo-polynomial time algorithm for the maximum dynamic—multi-commodity flow problems with intermediate storage.

The sharing of the bundle arc capacity is one of the major issues in multi-commodity flow problems. For each commodity, if the share of the capacity of the bundle arc is set in proportion to the bottleneck capacity of the path from its respective source to the tail node of the bundle arc, then it is known as proportional capacity sharing. In this case, the shared capacity of the bundle arc for each commodity is fixed, and the multi-commodity flow problem is reduced to independent single commodity flow problems. To avoid fractional flow, we can use ceiling and floor functions in an appropriate manner. Similarly, if the sharing of the capacity of the bundle arc is made according to the inflow rate of the flow of each commodity, then it is termed flow-dependent capacity sharing. In this method, the shared capacity of the bundle arc may not always be the same, as the flow on the arc may vary over time. We investigate these two sharing techniques in Sections 2.1 and 2.2.

In this paper, we introduce the maximum multi-commodity flow problem using proportional as well as flow-dependent capacity sharing on the bundle arcs. We present the polynomial time algorithms for the static as well as the dynamic multi-commodity flow problems, using proportional capacity sharing in Section 3. Similarly, in Section 4 a pseudo-polynomial time algorithm for the dynamic multi-commodity flow problem with flow-dependent capacity sharing is presented. The paper is concluded in Section 5.

### **2. Basic Terminologies**

Consider a network topology *G* = (*N*, *A*, *K*, *u*, *τ*, *di*, *S*, *D*, *T*) with commodities *i* ∈ *K* = {1, 2, . . . , *k*}, set of nodes *N*, and set of arcs *A*. Here, *di* represents the demand/supply of each commodity *i* ∈ *K*, which is routed through a unique source–sink pair *si*-*ti*, where *si* ∈ *S* ⊆ *N* and *ti* ∈ *D* ⊆ *N*. Each arc *e = (v, w)* ∈ *A* with *head(e) = w* and *tail(e) = v* is equipped with a capacity function $u: A \to \mathbb{R}^+$ that restricts the flow of the commodities and a non-negative transit time function $\tau: A \to \mathbb{R}^+$ that measures the time to transship the flow from node *v* to node *w*. Let $\overrightarrow{\delta}(v)$ and $\overleftarrow{\delta}(v)$ be the sets of outgoing arcs from node *v* and incoming arcs to node *v*, respectively. We denote by *Pi* the set of all paths of commodity *i*, such that *P* ∈ *Pi* is an *si*-*ti* path and *P*[*si*,*v*] ∈ *Pi* represents the path from *si* to the intermediate node *v*. The time horizon is denoted by *T* = {0, 1, ... , *T*} in the discrete time setting and *T* = [0, *T* + 1) in the continuous time setting. In the case of static flow, the time parameters *T* and *τ* are absent.

### *2.1. Proportional Capacity Sharing*

The multi-commodity flow problem differs from the single commodity flow problem due to the bundle constraints and the unique source–sink flow for each commodity. Our assumption is that the nature of flows inside the same commodity group are homogeneous and between the commodity groups are heterogeneous yet uniform in the occupancy rate of the arc capacity. To share the capacity of the bundle arc, we propose a proportional capacity sharing technique depending on the minimum of the arc capacity of paths *P*[*si,v*], (that is, bottleneck capacity of path *P*[*si,v*]) for each commodity *i* from their respective sources *si* to the tail *v* of bundle arc *e* = *(v, w)* as follows: Let *ue* be the capacity of a bundle arc *e*, then proportional sharing of capacity *ue* for each commodity *i* ∈ *K* is,

$$\mu\_{\varepsilon}^{\dot{i}} = \frac{\mu\_{a}^{\dot{i}}}{\sum\_{a \in \mathcal{P}\_{[s\dot{i},v]}:\hat{i} \in \mathcal{K}} \mu\_{a}} \mu\_{\varepsilon} \tag{1}$$

where *P*[*si*,*v*] is the path from *si* to the tail *v* of bundle arc *e*, for all *i* ∈ *K*, and *a* is the arc in *P*[*si*,*v*] with minimum capacity. Here, $u_e^i$ represents the portion of the capacity of the arc *e* allocated to commodity *i*. Clearly, the sum of the shared capacities over the commodities is equal to the original arc capacity, i.e., $\sum_{i \in K} u_e^i = u_e$.

The shared capacity may be fractional, i.e., $u_e^i = int(u_e^i) + fra(u_e^i)$, the sum of the integral part and the fractional part, respectively. The fractional capacities can be converted into integral capacities as follows:


It is to be noted that if $u_e^i < 1$ and commodity *i* has no alternative path, then the transshipment of the flow may be blocked. In such a case, the fractional capacity is to be accepted.
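As a sketch, Equation (1) plus the rounding step might look as follows (a hedged illustration: the bottleneck capacities of the paths *P*[*si*,*v*] are passed in directly, and the units left over after flooring go to the largest fractional parts so that the shares still sum to *ue*):

```python
from math import floor

def proportional_shares(u_e, bottlenecks):
    """Split bundle-arc capacity u_e in proportion to each commodity's
    path bottleneck capacity, then round to integers preserving the sum."""
    total = sum(bottlenecks)
    raw = [u_e * b / total for b in bottlenecks]  # Equation (1)
    shares = [floor(r) for r in raw]
    leftover = u_e - sum(shares)
    # hand the remaining units to the largest fractional parts
    order = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For example, a bundle arc with *ue* = 7 shared by three commodities with equal bottlenecks splits as 3 + 2 + 2.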

### *2.2. Flow-Dependent Capacity Sharing*

In the proportional capacity sharing technique, the shared capacity of each commodity remains fixed at each time step *θ*. In this subsection, we present the flow-dependent capacity sharing technique, where the share of the capacity for each commodity depends on the inflow rate of the flow *f* on the predecessor arcs. At any instant of time *θ*, if a bundle arc *e = (v, w)* with capacity *ue* holds more than one commodity *i* ∈ *K*, then the flow-dependent capacity share of *ue* for each commodity *i* ∈ *K* is

$$u_e^i(\theta) = \frac{f_a^i(\theta - \tau_a)}{\sum_{a \in \alpha(e):\, i \in K} f_a^i(\theta - \tau_a)}\, u_e \tag{2}$$

where α(*e*) is the set of predecessor arcs of bundle arc *e*, so that *a* ∈ α(*e*) ⇒ *head*(*a*) = *tail*(*e*), and $u_e^i(\theta)$ is the portion of the capacity of arc *e* for commodity *i* at time *θ*. For each time *θ*, the sum of the shared capacities $u_e^i(\theta)$ over all commodities *i* ∈ *K* is equal to the original arc capacity, i.e., $\sum_{i \in K} u_e^i(\theta) = u_e$. If the shared capacities are fractional, we can convert them into integer values as described in Section 2.1.
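A minimal sketch of Equation (2), with illustrative naming (`inflows` maps each commodity to its inflow $f_a^i(\theta - \tau_a)$ on the predecessor arc at the given time step):

```python
def flow_dependent_shares(u_e, inflows):
    """Split bundle-arc capacity u_e at one time step in proportion to the
    commodities' inflows on the predecessor arcs (Equation (2))."""
    total = sum(inflows.values())
    if total == 0:
        return {i: 0.0 for i in inflows}  # no flow arrives, nothing to share
    return {i: u_e * f / total for i, f in inflows.items()}
```

Unlike the proportional rule, this split has to be recomputed at every time step *θ*, since the inflows change over time.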

### **3. Maximum MCF with Proportional Capacity Sharing**

### *3.1. Maximum Static Multi-Commodity Flow*

In the static network *G* = (*N*, *A*, *K*, *u*, *di*, *S*, *D*), the multi-commodity flow *ϕ* with proportional capacity sharing is the sum of the non-negative flows $\phi^i : A \to \mathbb{R}^+$ for each commodity *i* with demand *di*, satisfying the proportional capacity sharing Equation (1) together with the conditions (3) and (4).

$$\sum_{e \in \overrightarrow{\delta}(v)} \phi_e^i - \sum_{e \in \overleftarrow{\delta}(v)} \phi_e^i = \begin{cases} d_i & \text{for } v = s_i \\ -d_i & \text{for } v = t_i \\ 0 & \text{otherwise} \end{cases} \qquad \forall\; v \in N,\; i \in K \tag{3}$$

$$0 \le \phi_e^i \le u_e^i \qquad \forall\; e \in A,\; i \in K \tag{4}$$

The constraints in (3) represent the supply/demand at the source/sink nodes and the flow conservation constraints at the intermediate nodes, whereas the constraints in (4) represent the boundedness of the flow on the arcs by their capacities. By taking the sum over the commodities in the latter, we get the bundle constraints $0 \le \sum_{i \in K} \phi_e^i \le \sum_{i \in K} u_e^i = u_e$ for all *e* ∈ *A*. For the maximum static multi-commodity flow problem with proportional capacity sharing, the objective is to maximize the total flow value $\sum_{i \in K} d_i = |\phi|$ subject to the constraints (1), (3) and (4).

We now introduce the maximum static multi-commodity flow problem with proportional capacity sharing as follows:

**Problem 1.** *For the given static multi-commodity network G = (N, A, K, u, di, S, D), the maximum static multi-commodity flow problem with proportional capacity sharing is to transship the maximum flow from si to ti, where the shared capacity for each commodity i* ∈ *K on the bundle arc depends on the minimum capacity of the path from the respective source to the tail node of the bundle arc.*

To solve the problem, we first reduce the multi-commodity flow problem into *k* independent single commodity flow problems by sharing the capacity of the bundle arcs using Equation (1). For each commodity *i*, the maximum static flow *ϕi* is obtained, and the sum of the flow values over the commodities is the maximum static flow value |*ϕ*|. We now present the algorithm to solve Problem 1.

**Theorem 1.** *Algorithm 1 solves the maximum static MCF problem correctly in polynomial time complexity.*



Output: Maximum static MCF on *G* with proportional capacity sharing.
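Since the body of Algorithm 1 is not reproduced here, the following sketch shows the strategy the text describes: fix the bundle-arc shares with Equation (1), then solve *k* independent single-commodity maximum flow problems and add up the flow values. The max-flow routine is a plain Edmonds–Karp, and all network data are illustrative:

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp maximum flow on nodes 0..n-1 with capacities cap[(v, w)]."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for an augmenting path
            v = queue.popleft()
            for w in range(n):
                if w not in parent and cap.get((v, w), 0) > 0:
                    parent[w] = v
                    queue.append(w)
        if t not in parent:
            return flow
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        aug = min(cap[(v, w)] for v, w in path)  # bottleneck on the path
        for v, w in path:
            cap[(v, w)] -= aug
            cap[(w, v)] = cap.get((w, v), 0) + aug
        flow += aug

def max_static_mcf(n, shared_caps, terminals):
    """Sum of k independent max flows; shared_caps[i] already carries the
    commodity's Eq. (1) share on every bundle arc."""
    return sum(max_flow(n, dict(c), s, t)
               for c, (s, t) in zip(shared_caps, terminals))

# Two commodities meeting at node 2 and sharing the arc (2, 3), whose
# capacity is assumed to be pre-split 3/2 by Eq. (1):
c1 = {(0, 2): 3, (2, 3): 3}
c2 = {(1, 2): 2, (2, 3): 2}
total = max_static_mcf(4, [c1, c2], [(0, 3), (1, 3)])
```

The decomposition is what makes the overall procedure polynomial: each sub-problem is an ordinary single-commodity max-flow computation.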

### *3.2. Maximum Dynamic Multi-Commodity Flow*

For a given dynamic network *G* with constant transit times *τ* on the arcs, the MCF over time function *f* with proportional capacity sharing is the sum of the flows $f^i : A \times T \to \mathbb{R}^+$, satisfying the proportional capacity sharing Equation (1) together with the constraints (5)–(7).

$$\sum_{e \in \overrightarrow{\delta}(v)} \sum_{\theta=0}^{T} f_e^i(\theta) - \sum_{e \in \overleftarrow{\delta}(v)} \sum_{\theta=0}^{T} f_e^i(\theta) = \begin{cases} d_i & \text{for } v = s_i \\ -d_i & \text{for } v = t_i \\ 0 & \text{otherwise} \end{cases} \qquad \forall\; v \in N,\; i \in K \tag{5}$$

$$\sum_{e \in \overrightarrow{\delta}(v)} \sum_{\theta=0}^{\beta} f_e^i(\theta) - \sum_{e \in \overleftarrow{\delta}(v)} \sum_{\theta=0}^{\beta} f_e^i(\theta) \le 0 \qquad \forall\; v \notin \{s_i, t_i\},\; i \in K,\; \beta \in T \tag{6}$$

$$0 \le f_e^i(\theta) \le u_e^i \qquad \forall\; e \in A,\; i \in K \text{ and } \theta \in T \tag{7}$$

Here, the constraints in (5) represent the supply/demand at the sources/sinks and the flow conservation at the intermediate nodes over the time horizon *T*. The non-conservation of the flow at the intermediate nodes at any time step *β* in *T* = {0, 1, ... , *T*} is represented by the constraints in (6). Similarly, (7) states that the flows on the arcs are bounded above by their capacities. With these constraints, together with Equation (1), we introduce the maximum dynamic MCF problem with proportional capacity sharing, which maximizes the total flow value $\sum_{i \in K} d_i = |f|$ within the given time horizon *T*, as follows:

**Problem 2.** *For a given dynamic multi-commodity network G = (N, A, K, u, τ, di, S, D, T), the maximum multi-commodity flow problem with proportional capacity sharing is to transship the maximum amount of flow from si to ti within the given time horizon T, where the shared capacity for each i* ∈ *K on the bundle arc depends on the minimum capacity of the path from the respective source to the tail node of the bundle arc.*

We now present an algorithm to solve Problem 2.

**Theorem 2.** *Algorithm 2 provides the feasible solution to the maximum dynamic MCF problem with proportional capacity sharing in polynomial time.*

**Algorithm 2:** The maximum dynamic MCF algorithm with proportional capacity sharing

Input: Given static multi-commodity flow network *G* = (*N, A, K, u, di, S, D*).


Output: Maximum dynamic MCF on *G* with proportional capacity sharing.

### **4. Maximum MCF with Flow-Dependent Capacity Sharing**

For a given dynamic network *G* with constant transit times *τ* on the arcs, the multi-commodity flow over time function *f* with flow-dependent capacity sharing is the sum of the flows $f^i : A \times T \to \mathbb{R}^+$, satisfying the constraints (8)–(12).

$$\sum_{e \in \overrightarrow{\delta}(v)} \sum_{\theta=0}^{T} f_e^i(\theta) - \sum_{e \in \overleftarrow{\delta}(v)} \sum_{\theta=0}^{T} f_e^i(\theta) = \begin{cases} d_i & \text{for } v = s_i \\ -d_i & \text{for } v = t_i \\ 0 & \text{otherwise} \end{cases} \qquad \forall\; v \in N,\; i \in K \tag{8}$$

$$\sum_{e \in \overrightarrow{\delta}(v)} \sum_{\theta=0}^{\beta} f_e^i(\theta) - \sum_{e \in \overleftarrow{\delta}(v)} \sum_{\theta=0}^{\beta} f_e^i(\theta) \le 0 \qquad \forall\; v \notin \{s_i, t_i\},\; i \in K,\; \beta \in T \tag{9}$$

$$\sum_{i \in K} f_e^i(\theta) \le u_e \qquad \forall\; e \in A,\; \theta \in T \tag{10}$$

$$u_e^i(\theta) = \frac{f_a^i(\theta - \tau_a)}{\sum_{a \in \alpha(e):\, i \in K} f_a^i(\theta - \tau_a)}\, u_e \qquad \forall\; e \in A \tag{11}$$

$$f_e^i(\theta) \ge 0 \qquad \forall\; e \in A,\; i \in K \text{ and } \theta \in T \tag{12}$$

Here, the constraints in (8) and (9) have the usual meanings as described in Section 3.2. The bundle constraints, bounded by the arc capacities, are given in (10). The constraints in (11) represent the flow-dependent capacity sharing, and the non-negativity of the flows is represented by the constraints in (12). We now present the maximum dynamic MCF problem with flow-dependent capacity sharing satisfying the above constraints as follows:

**Problem 3.** *For a given multi-commodity network G = (N, A, K, u, τ, di, S, D, T), the maximum multi-commodity flow problem with flow-dependent capacity sharing is to transship the maximum amount of flow from si to ti within the given time horizon T, where the shared capacity for each i* ∈ *K on the bundle arc depends on the inflow of the incoming arcs of the bundle arc.*

To solve the problem, we use a time-expanded layer graph.

### *Multi-Commodity Time-Expanded Layer Graph*

The multi-commodity time-expanded layer graph is a three-dimensional graph that contains a copy of the nodes of the underlying static network for every discrete time step and for each commodity. It makes a variety of flow-over-time problems solvable by applying the algorithms and techniques developed for static network flows. For a given network *G* with integral transit times on the arcs and time horizon *T*, the T-time-expanded layer graph *GT* is obtained by creating *T* + 1 copies of the node set *N*, labeled *N*(0), *N*(1), ... , *N*(*T*), with the *θ*-th copy of node *v* labeled *v*(*θ*), *θ* ∈ *T*, for each commodity *i* ∈ *K*. For every arc *e = (v, w)* ∈ *A* and *θ* ∈ {0, 1, . . . , *T* − *τe*}, there is an arc *ei*(*θ*) from *vi*(*θ*) to *wi*(*θ* + *τe*) with the same capacity as arc *e* for a single-commodity arc and the shared capacity for a bundle arc. If intermediate storage is allowed at node *v*, then the arc from *vi*(*θ*) to *vi*(*θ* + 1) represents a holdover arc with infinite capacity that holds the flow for the unit time interval [*θ*, *θ* + 1).
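The construction just described can be written down directly. The sketch below (our own naming) enumerates the movement arcs $v_i(\theta) \to w_i(\theta + \tau_e)$ and, optionally, the holdover arcs used for intermediate storage:

```python
def time_expanded_layers(nodes, arcs, tau, K, T, storage=True):
    """Arcs of the T-time-expanded layer graph: one layer per commodity,
    T + 1 copies of every node, movement arcs shifted by transit time."""
    ex_arcs = []
    for i in range(K):  # one layer per commodity
        for (v, w) in arcs:
            for theta in range(T - tau[(v, w)] + 1):
                ex_arcs.append(((v, i, theta), (w, i, theta + tau[(v, w)])))
        if storage:  # holdover arcs v(theta) -> v(theta + 1)
            for v in nodes:
                for theta in range(T):
                    ex_arcs.append(((v, i, theta), (v, i, theta + 1)))
    return ex_arcs
```

For a single arc (x, y) with *τ* = 2 and *T* = 3, this yields the two movement arcs x(0) → y(2) and x(1) → y(3), plus the holdover arcs if storage is allowed; the graph size grows with *T*, which is where the pseudo-polynomial complexity comes from.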

For the graphical representation, we present a three-dimensional layer graph *GT* with the set of nodes *N*, time *T*, and commodity set *K* as the coordinate axes (see Figure 1). Each commodity *i* ∈ *K* forms a layer of the graph in a vertical line. In Figure 1, network (a) represents a two-commodity network in which commodity-1 is transshipped from s1 to t1 and commodity-2 from s2 to t2. Arc (x, y) is the bundle arc, which carries both commodities. Figure 1b represents the time-expanded layer graph of Figure 1a with the time horizon T = 6, where the parallel arcs on (x, y) share the capacity for each commodity with the flow-dependent capacity sharing technique. At time steps *θ* = 0 and *θ* = 1, no flow of commodity-1 reaches arc (x, y), so only commodity-2 is transshipped on it; thereafter, the capacity is shared among the commodities. Similarly, the bundle arc transships only commodity-1 at time *θ* = 4 due to the absence of commodity-2.

**Figure 1.** (**b**) represents the time-expanded layer graph *GT* of the given network (**a**).

Depending on the time-expanded layer graph, we now present the algorithm to solve Problem 3.

**Theorem 3.** *A feasible solution to the maximum dynamic MCF problem with flow-dependent capacity sharing can be obtained by using Algorithm 3 in pseudo-polynomial time.*


### **5. Conclusions**

The maximum dynamic multi-commodity flow problem deals with the transshipment of flow of more than one commodity from respective sources to the corresponding sinks within the given time horizon. The allocation of the capacity of the bundle arc to each commodity is one of the major issues in the multi-commodity flow problem. To deal with this problem, we have proposed proportional capacity sharing and flow-dependent capacity sharing. We have presented polynomial time solutions for the static, as well as the dynamic, maximum MCF problems with proportional capacity sharing and a pseudo-polynomial time algorithm with flow-dependent capacity sharing. To the best of our knowledge, these solution strategies for the maximum MCF problems are introduced here for the first time.

**Author Contributions:** D.P.K.—conceptualization, investigation and documentation, U.P., T.N.D. and S.D.—formal analysis, editing and supervision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no specific grant from any funding agency in the public, commercial, or non-profit sectors.

**Data Availability Statement:** The authors have not used any additional data in this article.

**Acknowledgments:** The first author (Durga Prasad Khanal) thanks the German Academic Exchange Service (DAAD) for the research grant Bi-nationally Supervised Doctoral Degree/Cotutelle, 2021/2022, and the second author (Urmila Pyakurel) thanks the Alexander von Humboldt Foundation for a Digital Cooperation Fellowship (1 August 2021–31 January 2022).

**Conflicts of Interest:** The authors declare no conflict of interest regarding the publication of this paper.

### **References**


### *Proceeding Paper* **Two Optimized IoT Device Architectures Based on Fast Fourier Transform to Monitor Patient's Photoplethysmography and Body Temperature †**

**Janith Kodithuwakku \*, Dilki Dandeniya Arachchi, Saw Thiha and Jay Rajasekera**

Digital Business and Innovations, Tokyo International University, Saitama 350-1197, Japan; dadmahindika@gmail.com (D.D.A.); michael.sawthiha@gmail.com (S.T.); jrr@tiu.ac.jp (J.R.)

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Abstract:** The measurement of blood-oxygen saturation (SpO2), heart rate (HR), and body temperature is critical in monitoring patients. Photoplethysmography (PPG) is an optical method that can be used to measure heart rate, blood-oxygen saturation, and many analytics about the cardiovascular health of a patient by analyzing the waveform. With the COVID-19 pandemic, there is a high demand for a product that can remotely monitor such parameters of a COVID-19 patient. This paper proposes two major design architectures for such a product, with optimized system implementations utilizing the ESP32 development environment and cloud computing. One method uses edge computing with the fast Fourier transform (FFT) and valley detection algorithms to extract features from the waveform before transferring data to the cloud, while the other method transfers raw sensor values to the cloud without any loss of information. This paper compares the performance of both system architectures with respect to bandwidth, sampling frequency, and loss of information.

**Citation:** Kodithuwakku, J.; Arachchi, D.D.; Thiha, S.; Rajasekera, J. Two Optimized IoT Device Architectures Based on Fast Fourier Transform to Monitor Patient's Photoplethysmography and Body Temperature. *Comput. Sci. Math. Forum* **2022**, *2*, 7. https://doi.org/ 10.3390/IOCA2021-10905

Academic Editor: Frank Werner

Published: 26 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Keywords:** photoplethysmography (PPG); fast Fourier transform (FFT); internet of things (IoT); blood-oxygen saturation (SpO2); heart rate (HR); body temperature; bandwidth optimization; cloud computing

### **1. Introduction**

Monitoring patients with highly contagious illnesses is dangerous for medical staff, who are themselves exposed to an unhealthy and harmful environment. IoT remote monitoring devices help solve this problem by providing a remote dashboard that visualizes the required medical parameters in real time without physical contact with the patients. These systems also reduce the time medical officers spend checking each patient. Furthermore, they have the potential to enhance patient monitoring with smart functions such as the generation of notifications and reports [1].

Telemedicine systems with remote monitoring facilities have become popular over the past few decades. Some IoT pulse oximetry systems are designed to be used in a limited area with wireless communication; example implementations include personal ad hoc wireless networks using Wi-Fi in station (STA) and access point (AP) modes [2] and wireless sensor networks (WSN) using ZigBee [3]. Other existing efforts have focused on real-time personal pulse oximetry monitoring with reduced data transfer time, using the ISO/IEEE 11073 message format and server-side data processing to reduce the average response time to 251 milliseconds [4].

This paper discusses two efficient design architectures for implementing IoT systems and compares their capabilities. Both architectures include a cloud data server, a dashboard, and a capturing device that interfaces with a clinical PPG sensor and a body temperature sensor, as shown in Figure 1. This paper explains the architecture of each subsection of the systems, together with the algorithms and methodologies used to improve the proposed systems by eliminating limitations of existing systems such as high bandwidth usage and restricted accessibility. The first part of the paper discusses the main idea of the two proposed architectures in detail, as shown in Figure 2. The second part compares the results obtained from the two methods. Finally, conclusions and suggestions for future work are given based on the obtained results.

**<sup>\*</sup>** Correspondence: mjpkodithuwakku@gmail.com

**Figure 1.** Structure of the proposed IoT system.

**Figure 2.** System architectures of two proposed methods (hardware layers are represented in the dark colors while cloud layers are represented by the light colors in the figure): (**a**) Architecture I—describes an IoT system with data processing layer on the hardware itself. (**b**) Architecture II—describes an IoT system with data processing layer on the cloud.

### **2. Methodology**

### *2.1. System Overview*

The proposed IoT system consists of a loosely coupled layered architecture with five main layers. This service-oriented, loosely coupled architecture ensures the scalability of the system and makes it possible to change, remove, or upgrade each layer without affecting any other layer.

In the sensing layer, a wireless sensor module produces the readings. This hardware module is the only part that comes into physical contact with the patient. With built-in Wi-Fi, a 240 MHz clock speed, a 32-bit microcontroller, and other useful features, the ESP32 is a good candidate for an IoT system and is capable of the signal processing and data encoding described later in this paper. Moreover, the proposed system features wireless data transmission, optimized data transfer with minimum bandwidth usage, flow control, and data encoding and decoding with a sliding window method to minimize errors. It also provides secure sensor data communication using TCP and data layer isolation using an API platform.

The optimized data processing and transmission algorithms of the proposed system are the most important and most heavily emphasized aspect of this paper. Data processing is performed in a separate layer that can be integrated either in the cloud or in the hardware module itself, according to the preferred architecture; these two architectures are discussed in detail in the following sections. In the presentation layer, a scalable and user-friendly web interface (dashboard) visualizes the obtained vital signs in real time and is accessible from anywhere through the internet.

### *2.2. System Architecture*

### 2.2.1. System Architecture I

In this architecture, as shown in Figure 2a, both the sensing layer and the processing layer reside within the hardware module. All the required measurements (SpO2 and heart rate) are calculated within the microcontroller (ESP32) itself using the obtained sensor values. Data from the PPG sensor (IR and red values) are fed to the SpO2 and heart rate calculation algorithms. The SpO2 level is calculated using valley detection algorithms [5], and the heart rate is calculated with the FFT; to produce a more reliable value, the valley detection algorithm also computes the heart rate a second time. Body temperature sensor readings, as 10-bit analog-to-digital converted (ADC) values, are sent without processing. The calculated heart rate, SpO2, and temperature values are sent to the cloud server, where the data receiver and data decoder are running. Decoded data are then sent to the data layer to be stored in the database via the API server running on the network layer. Stored data can be retrieved whenever they are needed for visualization. In the presentation layer, all measurements are visualized graphically in the dashboard in real time. In summary, this architecture sends and stores only the calculated measurements; therefore, other information embedded in the original waveform can be lost.
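The FFT-based heart-rate step can be illustrated with a minimal sketch. The naive DFT below stands in for the FFT (a real ESP32 build would use an optimized FFT routine), and the synthetic waveform, sampling rate, and function names are assumptions for illustration only:

```python
import math

def dominant_frequency(samples, fs):
    """Return the dominant frequency (Hz) of a signal via a naive DFT.

    Pure-Python stand-in for the FFT step described in the paper.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC component
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):              # skip DC, keep positive frequencies
        re = sum(centered[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(centered[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * fs / n

def heart_rate_bpm(ir_samples, fs):
    """Convert the dominant PPG frequency to beats per minute."""
    return dominant_frequency(ir_samples, fs) * 60.0

# Synthetic PPG-like waveform: 1.2 Hz fundamental (72 bpm) sampled at 50 Hz.
fs = 50
ppg = [1000 + 100 * math.sin(2 * math.pi * 1.2 * t / fs) for t in range(fs * 10)]
```

With 10 s of data at 50 Hz, the frequency resolution is 0.1 Hz, which is sufficient to resolve resting heart rates to within a few beats per minute.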

### 2.2.2. System Architecture II

Unlike in Architecture I, in this method data are not processed within the microcontroller. Instead, the raw data obtained from the sensors are sent as a byte stream to the data receiver in the cloud and stored in the database without loss of any information in the original waveform. Data processing is then performed within the data processing layer, which is also in the cloud, and the results are visualized in the same way as in Architecture I. Thus, only the sensing layer is located within the hardware module, while all the other layers are in the cloud, as shown in Figure 2b. The stored readings can also be used for further medical studies related to PPG signal analysis.

### *2.3. Bandwidth Requirement Analysis*

Data transmission is performed over a TCP socket server connection, which secures a high quality of service (QoS); error handling and flow control are also handled within the TCP protocol. The following analysis discusses the payload optimization that the proposed architectures achieve compared to common practice. Note that we only discuss the bandwidth requirement for the parameter data of the TCP transmission.

### 2.3.1. Bandwidth Requirement Analysis for Architecture I

In this architecture, all the parameters are calculated within the hardware. The calculated heart rate is an integer value of at most 3 digits (beats per minute). The SpO2 level is likewise a percentage value, an integer of at most 3 digits.

Generally, a data frame can be defined in a human-readable format as follows:

H (3 bytes)—Heart rate value: (3-digit number)

S (3 bytes)—SpO2 level: (3-digit number)

T (4 bytes)—Temperature reading (10-bit value from 0 to 2<sup>10</sup>): (4-digit number)

C (1 byte)—Separation character

Data frame size(common) = H + C + S + C + T + C = 3 + 1 + 3 + 1 + 4 + 1 = 13 bytes (1)

If the database update rate is given by FDB, the bandwidth required only for the parameters is given by the following equation.

Required bandwidth(common) = FDB × 13 bytes /s = FDB × 13 × 8 bits /s = 104 FDB bits/s (2)

If FDB = 10 Hz, then the expected bit rate will be: 1040 bits/s (1.015625 kbits/s).

In the proposed Architecture I, because both the heart rate and SpO2 values are integers, each can be represented using only 1 byte. Thus, the optimized data frame with the proposed data encoder is as follows:

H1 (8 bits)—Heart rate value: (1 byte)

S1 (8 bits)—SpO2 level: (1 byte)

T1 (10 bits)—Temperature reading: (10 bits)

FC1 (6 bits)—Flow control: (6 bits)

Data frame size(optimized) = H1 + S1 + T1 + FC1 = 8 + 8 + 10 + 6 = 32 bits = 4 bytes (3)
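The 32-bit frame of Equation (3) can be packed with a few bit operations. This is a sketch: the paper fixes only the field widths, so the big-endian field order and the function names here are assumptions.

```python
def encode_frame_arch1(hr, spo2, temp_adc, flow):
    """Pack HR (8 bits), SpO2 (8 bits), a 10-bit ADC temperature reading,
    and 6 flow-control bits into the 4-byte frame of Equation (3).
    Field order within the 32-bit word is an illustrative assumption."""
    assert 0 <= hr < 256 and 0 <= spo2 < 256
    assert 0 <= temp_adc < 1024 and 0 <= flow < 64
    word = (hr << 24) | (spo2 << 16) | (temp_adc << 6) | flow
    return word.to_bytes(4, "big")

def decode_frame_arch1(frame):
    """Inverse of encode_frame_arch1: recover (hr, spo2, temp_adc, flow)."""
    word = int.from_bytes(frame, "big")
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 6) & 0x3FF, word & 0x3F)

frame = encode_frame_arch1(hr=72, spo2=98, temp_adc=612, flow=0b111111)
```

A round trip through the encoder and decoder recovers the original four fields from exactly 4 bytes, the frame size of Equation (3).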

If the database update rate is given by FDB, the bandwidth required only for the parameters is given by the following equation.

Required bandwidth(optimized) = FDB × 32 bits/s = 32 FDB bits/s (4)

If FDB = 10 Hz, then the expected bit rate only for data transmission is 320 bits/s (0.3125 kbits/s).

According to Equations (1) and (3), the optimized data frame saves 9 bytes per parameter set compared to common practice, which has a considerable impact on bandwidth by reducing the payload of the TCP transmission packets.

### 2.3.2. Bandwidth Requirement Analysis for Architecture II

In this architecture, raw sensor data are sent to the data receiver. The raw data consist of two 24-bit readings, one for each of the red and IR LEDs of the PPG sensor. In addition, the 10-bit temperature reading must also be transmitted.

Generally, a data frame can be defined in a human-readable format as follows:

R (8 bytes)—Red LED value (24-bit value from 0 to 2<sup>24</sup>): (8-digit number)

I (8 bytes)—IR LED value (24-bit value from 0 to 2<sup>24</sup>): (8-digit number)

T (4 bytes)—Temperature reading (10-bit value from 0 to 2<sup>10</sup>): (4-digit number)

C (1 byte)—Separation character

Data frame size(common) = R + C + I + C + T + C = 8 + 1 + 8 + 1 + 4 + 1 = 23 bytes (5)

If the sensor reading rate is given by Fs, required bandwidth needed only for the parameters is given by the following equation.

Required bandwidth(common) = Fs × 23 bytes /s = Fs × 23 × 8 bits/s = 184 Fs bits/s (6)

If Fs = 400 Hz, then the expected bit rate is 73,600 bits/s (71.875 kbits/s).

In the proposed Architecture II, the IR and red readings are 24-bit numbers, which can be represented by 3 bytes each. Thus, the optimized data frame with the proposed data encoder is as follows:

R1 (24 bits)—Red LED value: (24 bits)

I1 (24 bits)—IR LED value: (24 bits)

T1 (10 bits)—Temperature reading: (10 bits)

FC1 (6 bits)—Flow control: (6 bits)

Data frame size(optimized) = R1 + I1 + T1 + FC1 = 24 + 24 + 10 + 6 = 64 bits = 8 bytes (7)

If the sensor reading rate is given by Fs, the bandwidth required only for the parameters is given by the following equation.

Required bandwidth(optimized) = Fs × 64 bits/s = 64 Fs bits/s (8)

If Fs = 400 Hz, then the expected bit rate is 25,600 bits/s (25 kbits/s).

According to Equations (5) and (7), the optimized data frame saves 15 bytes per parameter set compared to common practice, which has a considerable impact on bandwidth by reducing the payload of the TCP transmission packets. This is a major advantage for applications with high sampling rates, because the bandwidth usage is directly proportional to the sensor sampling rate.
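The bandwidth figures of Equations (2), (4), (6), and (8) can be checked with a few lines of arithmetic; the helper function name is ours, not the paper's.

```python
def bandwidth_bps(frame_bits, rate_hz):
    """Bits per second needed for the parameter payload alone
    (Equations (2), (4), (6), and (8))."""
    return frame_bits * rate_hz

# Architecture I: 13-byte text frame vs. 4-byte (32-bit) frame at F_DB = 10 Hz.
common_1 = bandwidth_bps(13 * 8, 10)
optimized_1 = bandwidth_bps(32, 10)

# Architecture II: 23-byte text frame vs. 8-byte (64-bit) frame at F_s = 400 Hz.
common_2 = bandwidth_bps(23 * 8, 400)
optimized_2 = bandwidth_bps(64, 400)
```

The results reproduce the figures in the text: 1040 vs. 320 bits/s for Architecture I, and 73,600 vs. 25,600 bits/s for Architecture II.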

### *2.4. Data Encoder*

Depending on the architecture used, the representation of each byte of the data frame differs. The structures of the encoded data frames for each architecture are shown in Figure 3. For flow control, the last 6 bits are used in a manner that allows the data decoder to differentiate each frame for the sliding window method.


**Figure 3.** Optimized data frames after encoding.

The proposed encoding method uses the last 6 bits of each frame to carry the alternating bit patterns 000000 and 111111, giving the maximum possible difference between the flow control values of consecutive data frames. This minimizes the error rate, because two consecutive sensor readings cannot produce such a large variation, nor can they reproduce the flipping pattern of the flow control bits. For example, in Architecture II the data frame size is 64 bits; if the Nth data frame carries the 111111 pattern in its last six bits, the (N+1)th data frame carries the 000000 pattern, as shown in Figure 4a.

**Figure 4.** Structure of data frames used for the optimized data transmission: (**a**) Encoded data frames of n number of bytes with flow controlling bit patterns of 000000 and 111111 at last 6 bits; (**b**) slidingwindow of size n with matched data frame.

### *2.5. Data Receiver*

The data receiver is a TCP server located in the cloud that listens for the data from the hardware device. For each architecture, different data receivers are used according to the size and the structure of the receiving data frames. Data receivers are responsible for four different tasks.


Tasks I and IV use Central API calls.

### *2.6. Data Decoder*

The data decoder uses the sliding window method, optimized for receiving consecutive sensor readings. Every data frame received by the receiver ends with 6 flow control bits, as shown in Figure 4a; these last 6 bits contain either the 000000 or the 111111 pattern. This flipping six-bit pattern prevents data fields from being mistaken for flow control bits when the sliding window method is used. The sliding window method uses a fixed window size equal to the data frame size in each architecture. The use of the flow control bits for decoding the received data frames is explained below.

The length of the sliding window is 4 bytes for Architecture I and 8 bytes for Architecture II, equal to the respective data frame sizes. While the data stream is being received, the algorithm executes the following steps:

Step 1: The window shifts over the received data buffer one byte at a time until a matching combination occurs (the flipped bits of the current window's flow control bits must appear as the ending bits of the following window), as shown in Figure 4b. When a match occurs, the algorithm proceeds to Step 2 with the matched data window.

Step 2: The parameters are calculated using the data within the window. The algorithm then proceeds to Step 3.

Step 3: The algorithm shifts along the data buffer by a single window size and validates the ending bits of each window against the expected bit pattern. If a mismatch occurs in the flow control bit patterns, the algorithm returns to Step 1; otherwise, it goes to Step 2 again.

This method guarantees reliable delivery of the data and ensures that the data are delivered in order.
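The three decoding steps above can be sketched as follows for Architecture II's 8-byte frames. The frame payloads and helper names are illustrative assumptions; field extraction (Step 2) is reduced to collecting the aligned frames.

```python
FRAME = 8  # bytes per Architecture II data frame (Equation (7))

def flow_bits(window):
    """Flow control field: the last 6 bits of a frame."""
    return window[-1] & 0x3F

def decode_stream(buf):
    """Sliding-window synchronizer for the alternating 000000/111111
    flow-control tails; returns the list of aligned 8-byte frames."""
    frames, i, expected = [], 0, None
    while i + FRAME <= len(buf):
        w = bytes(buf[i:i + FRAME])
        fc = flow_bits(w)
        if expected is None:
            # Step 1: shift one byte at a time until the current window's
            # flipped flow bits appear at the end of the following window.
            nxt = bytes(buf[i + FRAME:i + 2 * FRAME])
            if fc in (0x00, 0x3F) and len(nxt) == FRAME and flow_bits(nxt) == fc ^ 0x3F:
                frames.append(w)                  # Step 2: accept matched window
                expected, i = fc ^ 0x3F, i + FRAME
            else:
                i += 1
        elif fc == expected:
            # Step 3: stay frame-aligned while the expected pattern holds.
            frames.append(w)
            expected, i = fc ^ 0x3F, i + FRAME
        else:
            expected = None                       # mismatch: resynchronize
    return frames

# Three illustrative frames with alternating tails, preceded by stray bytes.
f1 = bytes([10, 20, 30, 40, 50, 60, 70, 0x3F])
f2 = bytes([11, 21, 31, 41, 51, 61, 71, 0x00])
f3 = bytes([12, 22, 32, 42, 52, 62, 72, 0x3F])
stream = b"\x05\x06\x07" + f1 + f2 + f3
```

Running `decode_stream(stream)` skips the three stray leading bytes and returns the three frames in order, mirroring how the decoder recovers alignment after a transmission glitch.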

### *2.7. Central API*

The Central API is a RESTful API consisting of globally accessible functions that interact with the database. Although globally accessible, it handles authentication with JSON Web Tokens (JWT) to limit access and ensure network layer security. The API service provides full create, read, update, and delete (CRUD) operations on data records, plus one special function that retrieves the last n records for visualization purposes. All communication between the database and the other layers is conducted only through the Central API, which secures the data layer. Moreover, changes made to the hardware layer (IoT device) or the presentation layer (dashboard) do not affect each other or any intermediate layer.
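A minimal sketch of the data-layer functions the Central API exposes (CRUD plus the last-n query), using an in-memory store in place of the real JWT-protected REST service backed by MySQL; the class and method names are our assumptions.

```python
class VitalsStore:
    """In-memory stand-in for the Central API's data-layer functions:
    CRUD operations plus a 'last n records' query for the dashboard."""

    def __init__(self):
        self._records, self._next_id = {}, 1

    def create(self, record):
        """Insert a record and return its id."""
        rid, self._next_id = self._next_id, self._next_id + 1
        self._records[rid] = dict(record)
        return rid

    def read(self, rid):
        """Fetch one record by id (None if absent)."""
        return self._records.get(rid)

    def update(self, rid, **fields):
        """Overwrite selected fields of an existing record."""
        self._records[rid].update(fields)

    def delete(self, rid):
        """Remove a record if present."""
        self._records.pop(rid, None)

    def last_n(self, n):
        """Most recent n records, oldest first, for real-time graphs."""
        ids = sorted(self._records)[-n:]
        return [self._records[i] for i in ids]

store = VitalsStore()
for hr in (70, 72, 71, 74):
    store.create({"hr": hr, "spo2": 98})
```

In the real system each method would correspond to an authenticated REST endpoint, and `last_n` is the special visualization query mentioned above.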

### **3. Results and Discussion**

We created the described system and tested both design architectures. We focus here on the results of Architecture II because, unlike Architecture I, it must use more bandwidth due to its raw data format, real-time transmission, and high sampling frequency. The data receiver handled continuous data streaming over several hours of testing without any connection failure, confirming the capabilities of the data encoder running on the hardware device and the data decoder running on the cloud server. The decoded message is shown in Figure 5a. We ran the real-time data processing in the cloud to calculate the heart rate and SpO2 level before visualizing them on the web dashboard (Figure 5b).

**Figure 5.** Decoded data and data visualization: (**a**) decoded data frames in the decoder before they are sent to the database for visualization; (**b**) web dashboard: SpO2 and heart rate values are displayed in the labels at the top left, body temperature values are displayed in a separate graph at the bottom left, and the IR and red values are displayed together in another graph.

The MySQL database and the Node.js Central API server worked as intended and enabled the responsive dashboard to visualize all the acquired vital signs graphically, with real-time updating of graphs and labels in a user-friendly way. This allows the user (a medical officer) to check on patients in minimal time while obtaining accurate data.

### **4. Conclusions**

The proposed wireless SpO2, heart rate, and body temperature monitoring system has been implemented and tested. The bandwidth optimization algorithm plays the key role in this system, enabling it to perform as expected during data transmission. Even under bandwidth limitations, this optimization of the data frame sizes results in high data throughput, in accordance with Little's law [6].

The proposed system allows real-time monitoring of the SpO2, heart rate, and body temperature of a patient from a remote location without requiring the physician to take the measurements. The proposed bandwidth optimization algorithm makes these two design architectures cost-effective for long-term use compared to existing general methods that send data in other formats. Furthermore, the proposed layered architecture makes the system scalable and adaptive to future needs.

**Author Contributions:** J.K.—System architecture, algorithm development and embedded system development; D.D.A.—Software development of backend, frontend, and API server of the proposed system; S.T.—Gathering data and testing the system; J.R.—Guiding and advising the research. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **Quickest Transshipment in an Evacuation Network Topology †**

**Iswar Mani Adhikari 1,\* and Tanka Nath Dhamala <sup>2</sup>**


**Abstract:** The quickest transshipment of evacuees in an integrated evacuation network topology depends on the evacuee arrival pattern in the collection network and on their assignment in the assignment network, with appropriate traffic route guidance, destination optimization, and optimal routing. In this work, the quickest transshipment aspect of an integrated evacuation network topology is revisited for a transit-based evacuation system. Appropriate collection approaches for the evacuees, and their assignment to transit vehicles for the quickest transshipment in an embedded evacuation network, are presented together with their solution strategies.

**Keywords:** integrated network; evacuee arrival pattern; transit-vehicle assignment; quickest transshipment

### **1. Introduction**

The evacuation planning problem concerns sending the maximum number of evacuees from sources to sinks in the minimum time and as efficiently as possible. The bus-based evacuation planning problem (BEPP) is an important tool for transit-based evacuation planning. The effectiveness of a BEPP solution depends on the evacuee arrival patterns at the pickup locations and on the appropriate assignment of evacuees to the transit vehicles available in the evacuation network [1–3].

The NP-hard multi-depot, multi-trip BEPP was introduced and analyzed in [4]; it is close to the split-delivery multi-depot vehicle routing problem with inter-depot routes. For the case of a single bus depot, assuming that each bus collects a number of people equal to its capacity, the author of [5] proposed a BEPP for the evacuation of a region. Based on this BEPP, Pyakurel et al. [6] explored its transit-dependent variant, in which the evacuees are assumed to have gathered at different pickup locations, but their arrival patterns are not addressed.

In our work, we focus on a new and better-suited form of evacuee arrival pattern, the earliest arrival flow pattern, which maximizes the arrival of evacuees at the pickup locations at every possible instance with zero transit times from a source. We present a polynomial-time earliest arrival evacuee algorithm that follows the principle of temporally repeated flows to solve the earliest arrival evacuee problem with zero transit times and partial arc reversal capability. The evacuees collected at the different pickup locations of the primary sub-network are treated as supplies in the subsequent vehicle assignment for the secondary sub-network. The partial arc reversal approach for the collection of evacuees also reduces waiting at the pickup locations and improves the solution. The assignment of transit vehicles in such a general or prioritized embedded network is carried out with a dominating solution approach for the quickest transshipment. The rest of the paper is organized as follows.

**Citation:** Adhikari, I.M.; Dhamala, T.N. Quickest Transshipment in an Evacuation Network Topology. *Comput. Sci. Math. Forum* **2022**, *2*, 8. https://doi.org/10.3390/ IOCA2021-10879

Academic Editor: Frank Werner

Published: 19 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

In Section 2, we explain the flow of evacuees, the network topology in Section 3, and the integrated evacuation system related to the general and the prioritized network in Sections 4 and 5, respectively. Section 6 concludes the paper.

### **2. Flow of Evacuees**

In an evacuation planning problem, the flow represents either the evacuees or the vehicles carrying them. An *s*–*y* flow of evacuees over time from source *s* to sink *y* is a non-negative function *f* on *A* × *R*<sup>+</sup> for the given time horizon *T* = {0, 1, . . . , *T*} satisfying the flow conservation and capacity constraints (1)–(3). The inequality flow conservation constraints allow flow to wait at intermediate nodes, whereas the equality flow conservation constraints force flow entering an intermediate node to leave it.

$$\sum\_{\sigma=\tau\_a}^{T} \sum\_{a \in A\_i^{in}} f\left(a, \,\sigma-\tau\_a\right) - \sum\_{\sigma=0}^{T} \sum\_{a \in A\_i^{out}} f\left(a, \,\sigma\right) = 0\,, \ \forall\, i \in V \backslash (S \cup Y) \tag{1}$$

$$\sum\_{\sigma=\tau\_a}^{\theta} \sum\_{a \in A\_i^{in}} f(a, \sigma - \tau\_a) - \sum\_{\sigma=0}^{\theta} \sum\_{a \in A\_i^{out}} f(a, \sigma) \ge 0, \ \forall\, i \in V \backslash (S \cup Y), \ \theta \in T \tag{2}$$

$$0 \le f\left(a, \theta\right) \le u\_a\,, \ \forall\, a \in A\,, \ \theta \in T. \tag{3}$$

The sets of outgoing and incoming arcs corresponding to a node *i* ∈ *V* are denoted by *A*<sup>out</sup><sub>*i*</sub> = {*a* = (*i*, *j*) ∈ *A*} and *A*<sup>in</sup><sub>*i*</sub> = {*a* = (*j*, *i*) ∈ *A*}, respectively. Unless stated otherwise, for all *y* ∈ *Y* and *s* ∈ *S* we assume that *A*<sup>out</sup><sub>*y*</sub> = *A*<sup>in</sup><sub>*s*</sub> = Φ in the case without arc reversals. For *s* and *y*, the flow values are *υ<sub>f</sub>*(*s*) > 0 and *υ<sub>f</sub>*(*y*) < 0, respectively, where ∑<sub>*i*∈*V*</sub> *ν<sub>f</sub>*(*i*) = 0. If the supply or demand *υ<sub>f</sub>*(*i*) on the sources and sinks is fixed for all *i* ∈ {*s*, *y*}, then the earliest arrival evacuee problem maximizes the value *υ<sub>f</sub>*(*θ*) for all *θ* ∈ *T*, as in Equation (4), satisfying constraints (1)–(3).

$$\nu\_f(\theta) = \sum\_{\sigma=0}^{\theta} \sum\_{a \in A\_s^{out}} f(a, \sigma) = \sum\_{\sigma=\tau\_a}^{\theta} \sum\_{a \in A\_y^{in}} f(a, \sigma - \tau\_a) \tag{4}$$

The total amount of flow out of the source *s* that has reached the pickup locations *Y* up to time *θ*′ ∈ *Z*<sup>+</sup>, with zero transit times *τ<sub>a</sub>* = 0, is given by

$$|\nu\_f|\_{\theta'} = \sum\_{\sigma=1}^{\theta'} |\text{value}\ (Y, \sigma)|. \tag{5}$$

For the given time bound T, the value of Equation (5) becomes,

$$|\nu\_f| = \sum\_{\sigma=1}^{T} |\text{value}\ (Y, \sigma)|. \tag{6}$$

We consider a flow-of-evacuees-over-time problem with zero transit times and flow function *f*: *A* × *Z*<sup>+</sup> → *R*<sup>+</sup>.
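As a toy illustration of constraints (1)–(3), the following checker validates the capacity constraint and the equality conservation constraint on a small discrete-time flow. The network, data layout, and function are ours for illustration, not part of the paper's algorithms.

```python
def check_flow_over_time(nodes, arcs, flow, T, sources, sinks):
    """Check constraints (1) and (3) for a discrete-time flow.

    `arcs` maps (i, j) -> (capacity u_a, transit time tau_a); `flow` maps
    ((i, j), sigma) -> flow entering arc (i, j) at time sigma.
    """
    # Capacity constraint (3): 0 <= f(a, theta) <= u_a for every arc and time.
    for (a, _), v in flow.items():
        if not (0 <= v <= arcs[a][0]):
            return False
    # Equality conservation (1) at intermediate nodes: everything arriving
    # within the horizon T must also leave the node.
    for i in nodes:
        if i in sources or i in sinks:
            continue
        inflow = sum(flow.get(((j, k), s), 0)
                     for (j, k) in arcs for s in range(T + 1)
                     if k == i and s + arcs[(j, k)][1] <= T)
        outflow = sum(flow.get(((j, k), s), 0)
                      for (j, k) in arcs for s in range(T + 1) if j == i)
        if inflow != outflow:
            return False
    return True

# Toy network s -> v -> y: 2 units enter each arc at time 0, zero transit times.
arcs = {("s", "v"): (3, 0), ("v", "y"): (3, 0)}
flow = {(("s", "v"), 0): 2, (("v", "y"), 0): 2}
ok = check_flow_over_time({"s", "v", "y"}, arcs, flow, T=2,
                          sources={"s"}, sinks={"y"})
```

The toy flow satisfies both constraints; raising the flow on an arc above its capacity, or holding flow at the auxiliary node *v*, makes the check fail.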

### **3. Network Topology**

In an integrated evacuation scenario, we consider a network *N* obtained by combining two components *N*<sup>1</sup> and *N*<sup>2</sup>, representing a primary and a secondary sub-network, respectively. The first part, *N*<sup>1</sup>, contains directed two-way road segments, to which partial arc reversals are applicable. The second part, *N*<sup>2</sup>, contains directed one-way road segments connecting the bus depot to the pickup locations, and undirected edges connecting the pickup locations to the sinks for bus routing. Evacuees collected at the pickup locations *Y* in *N*<sup>1</sup> = (*s*, *V*, *A*, *u<sub>a</sub>*, *τ<sub>a</sub>*, *Y*) are assigned to transit buses on the appropriate routes across *N*<sup>2</sup> and are finally sent to the sinks, as shown in Figure 1. Here, *V* = {*v*<sub>1</sub>, *v*<sub>2</sub>, *v*<sub>3</sub>, . . . , *v<sub>n</sub>*} and *Y* = {*y*<sub>1</sub>, *y*<sub>2</sub>, *y*<sub>3</sub>, . . . , *y<sub>n</sub>*} are the set of auxiliary nodes and the set of pickup locations, respectively. The set of arcs is denoted by *A* = {*a* = (*s*, *v*) ∪ (*v*, *y*) : *v* ∈ *V*, *y* ∈ *Y*}, where *u<sub>a</sub>* and *τ<sub>a</sub>* denote the capacity and transit time of *a* ∈ *A*.

**Figure 1.** An integrated network topology consisting of a primary and secondary sub-network in an embedding.

Additionally, in *N*<sup>2</sup> = (*d*, *Y*, *E*, *τ<sub>e</sub>*, *Z*), *d* is the bus depot at which a set *B* of transit buses of homogeneous capacity is initially located; the buses are assigned as required during the evacuation process. The bus depot plays no further significant role in the solution procedure, as the buses do not return to it even after the completion of the evacuation plan because of the risks under threat. In the embedding, *Y* serves as the set of supply nodes during the bus assignment in *N*<sup>2</sup>. The set of sinks is denoted by *Z* = {*z*<sub>1</sub>, *z*<sub>2</sub>, *z*<sub>3</sub>, . . . , *z<sub>n</sub>*}. In this mixed sub-network, the set *E* consists of the one-way arcs *e* = (*d*, *y*) with *y* ∈ *Y* and the undirected edges *e* = [*y*, *z*] with *z* ∈ *Z*. Here, *τ<sub>e</sub>* is the transit time of *e* ∈ *E* in *N*<sup>2</sup>.

Based on the BEPP introduced in [4], the authors of [5] developed a simplified version for the evacuation of a region from a set of collection points to a set of capacitated shelters with the help of buses in minimum time, assuming that each bus collects exactly the number of people equal to its capacity. In their branch-and-bound solution framework, they presented four different upper bounds and three lower bounds on the time, in addition to three branching rules to minimize the number of branches and two tree-reduction strategies to avoid equivalent branches. The four upper bounds are constructed in polynomial time by four different heuristic algorithms: three are based on precomputed tour lists, whereas the fourth works iteratively without any precomputed tour list, dominates the rest with respect to the evacuation duration, and is considered the dominating assignment approach [7].

Here, we introduce the earliest arrival evacuee problem (Problem 1), respecting the partial arc reversal capability in *N*<sup>1</sup>.

**Problem 1.** *Given an evacuation sub-network N*<sup>1</sup> = (*S*, *V*, *A*, *u<sub>a</sub>*, *τ<sub>a</sub>*, *Y*) *with supplies at S, demands at Y, auxiliary nodes V, arc capacities u<sub>a</sub>, and arc transit times τ<sub>a</sub> for a* ∈ *A, the quickest partial arc reversal transshipment problem is to find the quickest arrival of evacuees at Y with partial arc reversal capability*.

If the reversal of an arc *a* = (*i*, *j*) is denoted by *a*′ = (*j*, *i*), then the transformed network of *N*<sup>1</sup> consists of the modified arc capacities and constant transit times

$$
u\_{\overline{a}} = u\_a + u\_{a'} \text{ and } \tau\_{\overline{a}} = \tau\_a \text{ if } a \in A, \text{ and } \tau\_{a'} \text{ otherwise.} \tag{7}
$$

Here, an arc is in the transformed network *N*<sup>1</sup> if *a* ∨ *a*′ ∈ *N*<sup>1</sup>. Concerning the auxiliary reconfiguration, the arc may be redirected in either direction with the modified, increased capacity but with the same transit time in either direction. The remaining graph structure and data are unaltered.

Now, Algorithm 1 is presented to solve the earliest arrival evacuee problem with zero transit times and partial arc reversal capability, as in [7].

**Algorithm 1.** Earliest arrival evacuee algorithm.

**Input:** A flow over time sub-network *N*<sup>1</sup> = (*s*, *V*, *A*, *ua*, *τa*,*Y*) with *τ<sup>a</sup>* = 0 for each *a* ∈ *A*.


This algorithm sends the evacuees to *Y* at the earliest arrival time at each instance, and the problem can be solved with polynomial-time complexity. For this, we have Theorem 1.

**Theorem 1.** *The earliest arrival evacuee problem having zero transit times with a partial arc reversal capability follows the principle of temporally repeated flows and can be solved in polynomial-time complexity.*

**Proof.** Steps 1, 2, and 4 of Algorithm 1 run in linear time. The overall time complexity is dominated by the computation of the earliest arrival of evacuees at the pickup locations *Y* with zero transit times on each arc, as in [8] in Step 3, which is solved in polynomial time. Thus, the problem can be solved with polynomial-time complexity in *N*<sup>1</sup>. □

The flow over time problem with zero transit times determines the maximum number of evacuees reaching each pickup location at every possible time instance from the beginning in *N*<sup>1</sup>. That is, the earliest arrival of evacuees at *Y* from *s* with zero transit times in the transformed network follows the principle of temporally repeated flows, which is equivalent to the solution with arc reversal capability in the original network [10].

### **4. Integrated Evacuation Network**

For a large-scale disaster with a sufficiently large number of evacuees, not all evacuees may arrive at *Y* at the same time, and those delivered to *Y* earlier will have comparatively longer waiting times; still, for the evacuees, waiting at *Y* is preferable to remaining at *s*. However, the buses available at the bus depot *d* require a certain time *τ<sub>di</sub>* to be assigned to *Y*. Hence, the effective waiting time in *N* can be denoted by Ω = *max*{*ω<sub>i</sub>*, *τ<sub>di</sub>*}, where *ω<sub>i</sub>* is the waiting time at *y<sub>i</sub>* ∈ *Y*. To address this, the objective function of the BEPP can be modified. Therefore, with *T<sub>max</sub>* the overall duration of the evacuation vehicles under the constraint as in [5], the integrated evacuation planning problem (Problem 2) can be reformulated as

$$\text{Minimize } T\_{\text{max}} \tag{8}$$

$$\text{such that } T\_{\text{max}} \ge \Omega + \sum\_{r \in R} \tau\_{to}^{br} + \sum\_{r \in R} \tau\_{back}^{br} \quad \forall b \in B \tag{9}$$
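Constraint (9) can be made concrete: the smallest feasible clearance time is Ω plus the largest total round-trip travel time over all buses. A minimal sketch with hypothetical trip data:

```python
def min_clearance_time(omega, bus_trips):
    """Smallest T_max satisfying (9): T_max >= omega + sum of to/back times per bus.

    bus_trips maps each bus to a list of (tau_to, tau_back) pairs, one per round r.
    """
    return omega + max(sum(to + back for to, back in trips)
                       for trips in bus_trips.values())
```

For example, if bus b1 makes rounds (5, 4) and (6, 4), bus b2 makes one round (8, 7), and Ω = 3, the minimum clearance time is 3 + max(19, 15) = 22.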

**Problem 2.** *Given N* = (*s*, *d*, *V*, *A*, *E*, *ua*, *τa*, *τe*, *Z*)*, having supplies and demands at s and Z, respectively, the integrated evacuation planning problem in a prioritized embedding is to assign the vehicles for the evacuees' transshipment with a minimum clearance time.*

To address such a problem in a prioritized embedding, we have the transit-vehicle assignment algorithm (Algorithm 2) for the minimum clearance time as in [7].

**Algorithm 2.** Transit-vehicle assignment algorithm for the minimum clearance time.

**Input:** An embedded evacuation network *N* = (*s*, *d*, *V*, *A*, *E*, *ua*, *τa*, *τe*, *Z*).


**Output:** Transit-vehicle assignment with the minimum clearance time from *s* → *Z*.

### **5. An Integrated Prioritized Evacuation System**

In a prioritized evacuation system as in [11,12], evacuees are collected from the disaster zone at the prioritized pickup locations of the primary sub-network in the minimum time, as the quickest transshipment, by using the lex-max flow approach [13]. Considering such pickup locations as the sources, the available set of transit buses is then assigned in the network to evacuate the evacuees safely to the sinks on a first-come-first-served basis, which is better suited for the simultaneous flow of evacuees. Such an assignment is also carried out in a dominating solution approach by adjusting the potential demands of the pickup locations to minimize waiting in the embedding. To obtain the quickest arrival of evacuees with partial arc reversal capability, we introduce Problem 3 and design Algorithm 3 as follows:

**Problem 3.** *Given an evacuation sub-network N*<sup>1</sup> = (*S*, *V*, *A*, *ua*, *τa*, *Y*)*, with supplies at S, demands at Y, auxiliary nodes V, arc capacity ua, and arc transit time τa for a* ∈ *A, the quickest partial arc reversal transshipment problem is to find the quickest arrival of evacuees at Y with partial arc reversal capability.*

**Algorithm 3.** Quickest partial arc reversal transshipment algorithm.

**Input:** A dynamic sub-network *N*<sup>1</sup> = (*S*, *V*, *A*, *ua*, *τa*, *Y*), with the supply and demand.


**Output:** The quickest arrival of evacuees at *Y* in *N*<sup>1</sup> with partial arc reversal capability.

For its time complexity, we have Theorem 2.

**Theorem 2** ([11])**.** *For the quickest partial arc reversal transshipment in N*1*, the quickest evacuee arrival problem can be computed in polynomial time via k minimum cost flow (MCF) computations in O*(*k* · *MCF*(*m*, *n*)) *time, where MCF*(*m*, *n*) = *O*(*m* log *n* (*m* + *n* log *n*)) *in a network having n nodes and m arcs.*

**Proof.** Steps 1, 3, and 4 of Algorithm 3, related to the arc reversal capability, are solved in linear time, so that the overall time complexity is dominated by the computation of the quickest evacuee arrival in *N*<sup>1</sup>, which is solved in polynomial time in *O*(*k* · *MCF*(*m*, *n*)) time, where *MCF*(*m*, *n*) = *O*(*m* log *n* (*m* + *n* log *n*)) in a network having *n* nodes and *m* arcs, as in [14]. □

Transit buses having uniform capacity *Q* are assigned from *d*, which is sufficiently near to *Y* in *N*<sup>2</sup>, on a first-come-first-served basis. Such an assignment begins only after *α*<sup>1</sup> ≥ *Q*, where *α*<sup>1</sup> is the number of evacuees who have arrived at the pickup location with the highest demand. For the subsequent assignments, the effective waiting instance *ψ* is almost negligible.
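The first-come-first-served dispatch rule above can be sketched as follows: the first bus is assigned only once the cumulative arrivals at the highest-demand pickup location reach one busload *Q*. The arrival stream used in the example is hypothetical.

```python
def first_dispatch_time(arrivals, Q):
    """Return the earliest time at which cumulative arrivals reach a busload Q.

    arrivals: chronologically sorted (time, evacuee_count) pairs at the
    highest-demand pickup location; returns None if Q is never reached.
    """
    total = 0
    for time, count in arrivals:
        total += count
        if total >= Q:
            return time
    return None
```

With arrivals (1, 30), (2, 40), (3, 50) and Q = 60, the first bus is dispatched at time 2, when 70 evacuees have accumulated.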

Buses are assumed to be filled to their full capacity. For this, the potential demands of the pickup locations are adjusted to be integral multiples of busloads. Let the potential demand of the pickup location *yk* ∈ *Y* be *α*(*yk*). With ⌊·⌋ denoting the floor function, the demands can be adjusted to *α*′(*yk*) by using the following demand adjustment.

$$\alpha'(y\_k) = \left\lfloor \frac{\alpha(y\_k) + \sum\_{q=1}^{k-1} [\alpha(y\_q) - \alpha'(y\_q)]}{Q} \right\rfloor \cdot Q \tag{10}$$

However, if the *kth* pickup location is the last one with the least priority, then its adjusted demand is taken as

$$\alpha'(y\_k) = \alpha(y\_k) + \sum\_{q=1}^{k-1} \left[ \alpha(y\_q) - \alpha'(y\_q) \right] \tag{11}$$
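Equations (10) and (11) can be read as a single pass over the pickup locations in priority order: each location keeps a busload multiple and passes its remainder down, and the last (least-priority) location absorbs whatever is left, so that total demand is conserved. A minimal sketch with hypothetical demands:

```python
def adjust_demands(demands, Q):
    """Round demands down to busload multiples of Q, pushing remainders to
    lower-priority locations; the last location takes the leftover (Eqs. (10)-(11))."""
    adjusted, carry = [], 0
    for k, alpha in enumerate(demands):
        total = alpha + carry                 # alpha(y_k) + earlier remainders
        if k == len(demands) - 1:
            adj = total                       # Eq. (11): last location absorbs the rest
        else:
            adj = (total // Q) * Q            # Eq. (10): floor to a multiple of Q
        carry = total - adj
        adjusted.append(adj)
    return adjusted
```

For demands [25, 37, 18] and Q = 10, this yields [20, 40, 20]; the total of 80 evacuees is preserved.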

Then the integrated evacuation planning problem, under similar constraints as above, can be reformulated as:

$$\text{Minimize} \quad T\_{\text{max}} \tag{12}$$

$$\text{such that } T\_{\text{max}} \ge \psi + \sum\_{r \in R} \tau\_{to}^{br} + \sum\_{r \in R} \tau\_{back}^{br} \quad \forall b \in B \tag{13}$$

Constraint (13) requires *Tmax* to be greater than or equal to the maximum travel cost incurred by any bus, and this value is minimized in (12).

In an integrated approach, the quickest transshipment of the evacuees at *Y* in *N*<sup>1</sup>, in the form of lex-max dynamic flows with respect to the adjusted demands, is assigned to the transit buses in *N*2. For this, we introduce Problem 4 and design Algorithm 4.

**Problem 4.** *Given an evacuation network N* = (*S*, *V*, *A*, *ua*, *τa*, *Y*, *d*, *ue*, *τe*, *Z*)*, having supplies and demands at S and Z, respectively, the integrated evacuation planning problem in a prioritized embedding is to assign the vehicles for the evacuees' transshipment with minimum clearance time.*

**Algorithm 4.** An integrated evacuation planning algorithm in a prioritized embedding.

**Input:** An embedding *N* = (*S*, *V*, *A*, *ua*, *τa*, *Y*, *d*, *ue*, *τe*, *Z*), with given supply and demand.


7. Otherwise, return to Step 4.

**Output:** Transshipment of evacuees finally to *Z* in minimum clearance time.

### **6. Conclusions**

Different network structures, models, algorithms, and their solution strategies are integrated and extended to achieve the quickest transshipment of the evacuees in an integrated network. The assignment of transit vehicles in such embeddings is carried out in a dominating solution approach for the minimum evacuation time.

Corresponding to an integrated network topology, specific arrival patterns are considered in the collection network. In such a network, we use the concept of partial arc reversals, which increases the flow value of evacuees by decreasing their collection time and is also favorable for achieving the minimum clearance time of the evacuees. The unused and saved arcs can be used for logistics and emergency facilities. A prioritized primary network is considered to collect the evacuees via the lex-max flow approach as the quickest transshipment, which is then assigned in the secondary sub-network in such a prioritized embedding. This is a well-suited and novel approach for the simultaneous assignment with minimum delay in the embedding.

**Author Contributions:** Conceptualization, I.M.A. and T.N.D.; writing—original draft preparation, I.M.A.; writing—review and editing, T.N.D.; supervision, T.N.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **A Generative Adversarial Network Based Autoencoder for Structural Health Monitoring †**

**Giorgia Colombera <sup>1,</sup>\*, Luca Rosafalco <sup>1</sup>, Matteo Torzoni <sup>1</sup>, Filippo Gatti <sup>2</sup>, Stefano Mariani <sup>1</sup>, Andrea Manzoni <sup>3</sup> and Alberto Corigliano <sup>1</sup>**


**Abstract:** Civil structures, infrastructures and lifelines are constantly threatened by natural hazards and climate change. Structural Health Monitoring (SHM) has therefore become an active field of research in view of online structural damage detection and long-term maintenance planning. In this work, we propose a new SHM approach leveraging a deep Generative Adversarial Network (GAN), trained on synthetic time histories representing the structural responses of a multistory building, in both damaged and undamaged conditions, to earthquake ground motion. In the prediction phase, the GAN generates plausible signals for different damage states, based only on undamaged recorded or simulated structural responses, thus without the need to rely upon real recordings linked to damaged conditions.

**Keywords:** structural health monitoring; machine learning; generative adversarial network

### **1. Introduction**

Bridges, power generation systems, aircraft, buildings and rotating machinery are only a few instances of the structural and mechanical systems which play an essential role in modern society, even though the majority of them are approaching the end of their original design life [1]. Taking into account that their replacement would be unsustainable from an economic standpoint, alternative strategies for early damage detection have been actively developed so as to extend the service life of those infrastructures. Furthermore, the advent of novel materials whose long-term behaviour is still not fully understood drives the effort towards effective Structural Health Monitoring (SHM), resulting in a saving of human lives and resources [1].

SHM consists of three fundamental steps: (i) measurement, at regular intervals, of the dynamic response of the system; (ii) selection of damage-sensitive features from the acquired data; (iii) statistical analysis of those features to assess the current health state of the structure. To characterize the damage state of a system, the method relying on hierarchical phases, originally proposed in [2], represents the currently adopted standard. It prescribes several consecutive identification phases (to be tackled in order), namely: assessing the existence of the damage, its location, its type, its extent, and the system's prognosis. Damaged states are identified by comparison with a reference condition, assumed to be undamaged. The detection of the damage location relies upon a wider awareness of the structural behaviour and the way in which it is influenced by

**Citation:** Colombera, G.; Rosafalco, L.; Torzoni, M.; Gatti, F.; Mariani, S.; Manzoni, A.; Corigliano, A. A Generative Adversarial Network Based Autoencoder for Structural Health Monitoring. *Comput. Sci. Math. Forum* **2022**, *2*, 9. https:// doi.org/10.3390/IOCA2021-10887

Academic Editor: Frank Werner

Published: 22 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

damage. This information, along with the knowledge of how the observed features are altered by different kinds of damage, makes it possible to determine the type of damage. The last two phases require an accurate estimation of the damage mechanisms in order to classify its severity and to estimate the Remaining Useful Life (RUL).

All the steps mentioned above rely on continuous data acquisition and processing to obtain information about the current health condition of a system. In the last few years, the concept of the Digital Twin has emerged, combining data assimilation, machine learning and physics-based numerical simulations [1], the latter being essential to completely understand the physics of the structure and of the damage mechanisms. A suitable tool for extracting the dominant features from a set of data is represented by neural networks [3], especially generative models such as Generative Adversarial Networks (GANs) [4] and Variational Autoencoders (VAEs) [5].

In this paper, an application of the generative neural network RepGAN, proposed in [6], is presented in the context of SHM. Section 2 provides an overview of existing works. In Section 3, the application of RepGAN to Structural Health Monitoring is presented. In Section 4, extensive numerical results are illustrated, while Section 5 gathers some concluding remarks.

### **2. Related Work**

Generative Adversarial Networks [4] are well known for their generative capability. Given a multidimensional random variable $X \in (\mathbb{R}^{d_X}, \mathcal{E}_X, P_X)$, where $(\mathbb{R}^{d_X}, \mathcal{E}_X, P_X)$ denotes the probabilistic space with $\sigma$-algebra $\mathcal{E}_X$ and probability measure $P_X$, whose samples are collected in the data set $\mathcal{S} = \{x^{(i)}\}_{i=1}^{N}$, with probability density function $p_X(X)$, the GAN generator $G$ attempts to reproduce synthetic samples $\hat{x}$, sampled according to a probability density function $p_G(X)$, as similar as possible to the original data; i.e., a GAN trains over data samples in order to match $p_G$ with $p_X$. $G$ maps a lower-dimensional manifold $(\mathbb{R}^{d_Z}, \mathcal{E}_Z, P_Z)$ (with $d_Z < d_X$ in general) into the physics space $(\mathbb{R}^{d_X}, \mathcal{E}_X, P_X)$. In doing so, $G$ learns to pass the critic test, undergoing the judgement of a discriminator $D : \mathbb{R}^{d_X} \to [0, 1]$, simultaneously trained to recognize the counterfeits $\hat{x}^{(i)}$. The adversarial training scheme relies on the following two-player Minimax game:

$$\{\mathbf{G}; D\} = \arg\min\_{\mathbf{G}} \max\_{D} V(D, \mathbf{G}), \qquad V(D, \mathbf{G}) = \mathbb{E}\_{\mathbf{X} \sim \mathbb{P}\_{\mathbf{X}}} [\ln D(\mathbf{X})] + \mathbb{E}\_{\mathbf{Z} \sim \mathbb{P}\_{\mathbf{Z}}} [\ln(1 - D(\mathbf{G}(\mathbf{Z})))] \tag{1}$$

In practice, *G* is represented by a neural network *G<sup>θ</sup>* and *D* by a neural network *Dω*, with trainable weights and biases *θ* and *ω*, respectively. Moreover, *V*(*D*, *G*) is approximated by the Empirical Risk function *L*<sup>S</sup> (*ω*, *θ*), depending on the data set S, defined as:

$$\begin{aligned} \{\theta; \omega\} &= \arg\min\_{\theta} \max\_{\omega} L\_S(\omega, \theta) = \\ &= \arg\min\_{\theta} \max\_{\omega} \frac{1}{n} \sum\_{i=1}^{n} \left( \ln D\_{\omega}(\mathbf{x}^{(i)}) + \ln(1 - D\_{\omega}(\mathbf{G}\_{\theta}(\mathbf{z}^{(i)}))) \right) \end{aligned} \tag{2}$$

with $z^{(i)}$ sampled from a known latent space probability distribution $p_Z$ (for instance, the normal distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$). The generator $G_\theta$ induces a sampling probability $p_G(X; \theta)$ so that, when optimized, it passes the critic test, with $D$ being unable to distinguish between $x^{(i)}$ and $G_\theta(z^{(i)})$, i.e., $D(x^{(i)}) = \frac{1}{2} = D(G_\theta(z^{(i)}))$. In other words, $x^{(i)}$ and $G_\theta(z^{(i)})$ can be associated with the values of a categorical variable $C$ with two possible classes: "$d$" (data) and "$g$" (generated). $x^{(i)}$ and $G_\theta(z^{(i)})$ can therefore be sampled with the mixture probability density $p_M = \alpha\,\chi(C = \text{"}d\text{"}) + (1 - \alpha)\,\chi(C = \text{"}g\text{"})$, with $\chi$ being the indicator function and $\alpha = P(C = \text{"}d\text{"})$ [7]. The optimal solution of the Minimax game in Equation (2) induces the mixture probability distribution $\frac{1}{2}\big(p_{C=\text{"}d\text{"}} + p_{C=\text{"}g\text{"}}\big)$ [4]. The saddle point of $V(D, G)$ corresponds to the minimum (with respect to $D$) of the conditional Shannon entropy $\mathbb{S}(C|X)$ (see Appendix A). Moreover, minimizing the conditional Shannon entropy $\mathbb{S}(C|X)$ corresponds to maximizing the Mutual Information $I(X, C) = \mathbb{S}(C) - \mathbb{S}(C|X)$ (see Appendix B), i.e., to extracting samples $x^{(i)}$ or $\hat{x}^{(i)}$ of $X$ that are indistinguishable (belonging to the same class), with an uninformative mapping $X \to C$.
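The empirical value in Equation (2) and the critic-test optimum can be checked numerically: when the discriminator outputs 1/2 everywhere, V(D, G) equals −2 ln 2. A minimal sketch, with hypothetical discriminator outputs:

```python
import math

def empirical_gan_value(d_real, d_fake):
    """Empirical estimate of V(D, G): mean ln D(x) over real samples
    plus mean ln(1 - D(G(z))) over generated samples."""
    n, m = len(d_real), len(d_fake)
    return (sum(math.log(d) for d in d_real) / n
            + sum(math.log(1.0 - d) for d in d_fake) / m)
```

At the saddle point D ≡ 1/2, this evaluates to −2 ln 2 ≈ −1.386; any discriminator that separates the two sample sets scores higher.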

GANs have proved useful in various applications such as the generation of artificial data for data-set augmentation, the filling of gaps in corrupted images, and image processing. In particular, deep convolutional generative adversarial networks (DCGANs) [8] have proved useful in the field of unsupervised learning. SHM could benefit from GANs as they improve the generalisation performance of models, extracting general features from data as well as their semantics (damage state, frequency content, etc.). However, the adversarial training scheme in Appendix C does not grant a bijective mapping $G_{\theta_Z} : Z \to X$ (decoder) and $F_{\theta_X} : X \to Z$ (encoder), which is crucial in order to obtain a unique representation of the data in the latent manifold. Autoencoders have been developed for image reconstruction so as to learn the identity operator $\hat{x}^{(i)} = \mathcal{I}(x^{(i)}) = G_{\theta_Z} \circ F_{\theta_X}(x^{(i)})$. One can leverage the representation power of the encoder $F_{\theta_X}$ to sample points $\hat{z}^{(i)} = F_{\theta_X}(x^{(i)})$ belonging to the latent manifold $\Omega_Z$, and the decoder $G_{\theta_Z}$ to sample points $\hat{x}^{(i)} = G_{\theta_Z}(z^{(i)})$ belonging to the data manifold $\Omega_X$ (see Equation (1)). In order to make the learning process of GANs stable across a range of data-sets and to realize higher-resolution and deeper generative models, Convolutional Neural Networks (CNNs) are employed to define $F_{\theta_X}$, $G_{\theta_Z}$ and the discriminators.
$F_{\theta_X}$ and $G_{\theta_Z}$ induce the sampling probability density functions $q_{Z|X} = q_{XZ}/p_X$ and $p_{X|Z} = p_{XZ}/p_Z$, respectively. $p_X$ is usually unknown (depending on the data-set at stake), but $p_Z$ can be chosen ad hoc (such as, for instance, $\mathcal{N}(\mathbf{0}, \mathbf{I})$) in order to obtain a powerful generative tool for realistic data samples $\hat{x}^{(i)}$. A particular type of Autoencoder, called the Variational Autoencoder (VAE), was introduced in [5], consisting of a probabilistic and generative version of the standard Autoencoder, where the encoder $F_{\theta_X}$ infers the mean $\mu_Z$ and variance $\sigma_Z^2$ of the latent manifold. However, the main contribution provided by VAEs is the straightforward approach, labelled the reparametrization trick, that allows one to reorganize the gradient computation and reduce the variance of the gradients.

Adversarial Autoencoders (AAEs) [9] employ the adversarial learning framework in Equation (1), replacing $G_{\theta_Z}(z^{(i)})$ by $G_{\theta_Z} \circ F_{\theta_X}(x^{(i)})$ and adding to the adversarial GAN loss the Mean Square Loss $\|x^{(i)} - G_{\theta_Z} \circ F_{\theta_X}(x^{(i)})\|^2$ as an optimization penalty, in order to assure a good reconstruction of the original signal. However, AAEs do not assure a bijective mapping between $(\mathbb{R}^{d_X}, \mathcal{E}_X, P_X)$ and $(\mathbb{R}^{d_Z}, \mathcal{E}_Z, P_Z)$. In order to achieve a bijection (in a probabilistic sense) between $(x, \hat{z})$ and $(\hat{x}, z)$ samples, the distance between the joint probability distributions $q_{X\hat{Z}} = q_{\hat{Z}|X}\, p_X$ and $p_{\hat{X}Z} = p_{\hat{X}|Z}\, p_Z$ [10], with posteriors $q_{\hat{Z}|X}$ and $p_{\hat{X}|Z}$, must be minimized. A suitable distance operator for probability distributions is the so-called Jensen–Shannon distance $\mathbb{D}_{JS}(q_{X\hat{Z}} \| p_{\hat{X}Z})$, defined as [10]:

$$\mathbb{D}\_{JS}\left(q\_{X\hat{Z}} \middle\| p\_{\hat{X}Z}\right) = \frac{\mathbb{D}\_{KL}\left(q\_{X\hat{Z}} \middle\| p\_M\right) + \mathbb{D}\_{KL}\left(p\_{\hat{X}Z} \middle\| p\_M\right)}{2} = \mathbb{S}(p\_M) - \mathbb{S}(\mathbf{X}, \mathbf{Z}|M) \tag{3}$$

with $\mathbb{D}_{KL}(p \| q) = \mathbb{S}(p, q) - \mathbb{S}(p)$ being the Kullback–Leibler divergence (see Appendix B) and $p_M = \frac{q_{X\hat{Z}} + p_{\hat{X}Z}}{2}$ being the mixture probability distribution [7], i.e., the probability of extracting $(X, \hat{Z})$ or $(\hat{X}, Z)$ from a mixed data set, with $\alpha = P(C = \text{"}d\text{"}) = \frac{1}{2}$ and the entropy of the mixture probability $\mathbb{S}(M) = \ln 2$. $\mathbb{D}_{JS}(q_{X\hat{Z}} \| p_{\hat{X}Z})$ can be rewritten as:

$$\begin{split} \mathbb{D}\_{JS}\left(q\_{X\hat{Z}} \middle\| p\_{\hat{X}Z}\right) &= \mathbb{S}(p\_M) - \frac{1}{2}\left(\mathbb{S}(q\_{X\hat{Z}}) + \mathbb{S}(p\_{\hat{X}Z})\right) = \\ &= \mathbb{S}(\mathbf{X}, \mathbf{Z}) - \mathbb{S}(\mathbf{X}, \mathbf{Z}|M) = \mathbb{S}(M) - \mathbb{S}(M|\mathbf{X}, \mathbf{Z}) \end{split} \tag{4}$$
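For discrete distributions, the behaviour of Equations (3) and (4) can be verified directly: the Jensen–Shannon distance is zero for identical distributions and attains its maximum ln 2 for disjoint supports. A minimal sketch:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon distance via the mixture p_M = (p + q) / 2, as in Eq. (3)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2
```

For example, js([1, 0], [0, 1]) equals ln 2, the entropy of the mixture.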

The adversarial optimization problem expressed in Equation (1) can be seen as a minimization of the Jensen–Shannon distance for *C* ∈ {"*d*", "*g*"}:

$$\begin{split} \mathbb{D}\_{JS}\left(q\_{X\hat{Z}} \middle\| p\_{\hat{X}Z}\right) - \ln 2 &= -\mathbb{S}(M|\mathbf{X}, \mathbf{Z}) = \\ &= \frac{1}{2}\mathbb{E}\_{(\mathbf{X}, \mathbf{Z}) \sim q\_{X\hat{Z}}}\left[\ln D(\mathbf{X}, \mathbf{Z})\right] + \frac{1}{2}\mathbb{E}\_{(\mathbf{X}, \mathbf{Z}) \sim p\_{\hat{X}Z}}\left[\ln(1 - D(\mathbf{X}, \mathbf{Z}))\right] \end{split} \tag{5}$$

that can be combined with the Autoencoder model in order to obtain the following expression [10,11]:

$$\begin{split} \mathbb{D}\_{JS}\left(q\_{X\hat{Z}} \middle\| p\_{\hat{X}Z}\right) - \ln 2 &= \mathbb{D}\_{JS}\left(q\_{\hat{Z}|X}\,p\_X \middle\| p\_{\hat{X}|Z}\,p\_Z\right) - \ln 2 = \\ &= \frac{1}{2}\Big[\mathbb{E}\_{\mathbf{X} \sim p\_X}\Big[\ln D\Big(\mathbf{X}, F\_{\theta\_X}(\mathbf{X})\Big)\Big] + \mathbb{E}\_{\mathbf{Z} \sim p\_Z}\Big[\ln\Big(1 - D\Big(G\_{\theta\_Z}(\mathbf{Z}), \mathbf{Z}\Big)\Big)\Big]\Big] \end{split} \tag{6}$$

In this context, $F_{\theta_X}$ learns to map data into a disentangled latent space, generally following the normal distribution; however, a good reconstruction is not ensured unless the cross-entropy between $X$ and $Z$ is minimized too [12].

Another crucial aspect of generative models is the semantics of the latent manifold. Most standard GAN models trained according to Equation (1) employ a simple factored continuous input latent vector $Z$ and do not enforce any restriction on the way the generator treats it. The individual dimensions of $Z$ do not correspond to semantic features of the data (uninformative latent manifolds), and $Z$ cannot be effectively used in order to perform meaningful topological operations in the latent manifold (e.g., describing neighborhoods) or to associate meaningful labels with it. An information-theoretic extension of GANs, called InfoGAN [13], is able to learn meaningful and disentangled representations in a completely unsupervised manner: a Gaussian noise $Z$ is associated with a latent code $C$ to capture the characteristic features of the data distribution (for classification purposes). As a consequence, the generator becomes $G_{\theta_Z}(Z, C)$, with corresponding probability distribution $p_G$. The Mutual Information between the latent codes $C$ and the generated samples, namely $I(C, G_{\theta_Z}(Z, C))$, is forced to be high by penalizing the GAN loss in Equation (1) with the variational lower bound $L_I(G, Q)$, defined by:

$$L\_I(G, Q) = \mathbb{E}\_{C \sim p\_C,\, \mathbf{X} \sim p\_G}[\ln Q(C|\mathbf{X})] + \mathbb{S}(C) = \mathbb{E}\_{\mathbf{X} \sim p\_G}\,\mathbb{E}\_{C \sim p\_{C|\mathbf{X}}}\left[\ln q\_{C|\mathbf{X}}\right] + \mathbb{S}(C) \tag{7}$$

with $q_{C|X}$ being the probability distribution approximating the real, unknown posterior probability distribution $p_{C|X}$ (and represented by the neural network $Q$). $L_I(G, Q)$ can be easily approximated via Monte Carlo simulation, and maximized with respect to $q_{C|X}$ and $p_G$ via the reparametrization trick [13]. The resulting penalized objective reads:

$$V\_{\text{InfoGAN}}(D, G, Q) = V(D, G) - \lambda L\_l(G, Q) \tag{8}$$
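A Monte Carlo estimate of the variational lower bound in Equation (7) and the penalized objective in Equation (8) can be sketched as follows; the sample log-posteriors and the weighting λ used in the example are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy S(C) of a discrete prior over latent codes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def info_lower_bound(log_q_samples, p_c):
    """Monte Carlo estimate of L_I(G, Q) = E[ln q(C|X)] + S(C), Eq. (7)."""
    return sum(log_q_samples) / len(log_q_samples) + entropy(p_c)

def infogan_value(v_dg, l_i, lam):
    """Penalized InfoGAN objective V(D, G) - lambda * L_I(G, Q), Eq. (8)."""
    return v_dg - lam * l_i
```

With a uniform prior over two codes and perfectly recovered codes (ln q = 0), the bound attains its maximum S(C) = ln 2.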

### **3. Methods**

With the purpose of learning a semantically meaningful and disentangled representation of the SHM time histories, in this study we adopted the architecture called RepGAN, originally proposed in [6]. RepGAN is based on an encoder-decoder structure (both represented by deep CNNs made of stacked 1D convolutional blocks), with a latent space $Z = [C, S, N]$. $C \in [0, 1]^{d_C}$ is a categorical variable representing the damage class(es), with $C \sim p_C$, which is generally chosen as a categorical distribution over $d_C$ classes, i.e., $p_C = \mathrm{Cat}(d_C)$. $S \in \mathbb{R}^{d_S}$ is a continuous variable of dimension $d_S$, with $S \sim p_S$, generally $p_S = \mathcal{N}(\mathbf{0}, \mathbf{I})$ or the uniform distribution $p_S = \mathcal{U}(-1, 1)$. Finally, $N \in \mathbb{R}^{d_N}$ is a random noise of $d_N$ independent components, with $N \sim p_N$, generally $p_N = \mathcal{N}(\mathbf{0}, \mathbf{I})$. RepGAN adopts the conceptual frameworks of VAEs and InfoGAN, combining the learning of the two representations $x \to \hat{z} \to \hat{x}$ and $z \to \hat{x} \to \hat{z}$, respectively. The $x \to \hat{z} \to \hat{x}$ scheme must learn to map multiple data instances $x^{(i)}$ into their images $\hat{z}^{(i)} = F_{\theta_X}(x^{(i)})$ in a latent manifold (via the encoder $F_{\theta_X}$) and back into a distinct instance in data space, $\hat{x}^{(i)} = G_{\theta_Z} \circ F_{\theta_X}(x^{(i)})$ (via the decoder $G_{\theta_Z}$), providing satisfactory results in reconstruction. $z \to \hat{x} \to \hat{z}$ maps multiple latent instances into the same data representation, in order to guarantee good generation and clustering performance.
Combining the two surjective mappings, in RepGAN the two learning tasks $x \to \hat{z} \to \hat{x}$ and $z \to \hat{x} \to \hat{z}$ are performed together with shared parameters in order to obtain a bijective mapping $x \leftrightarrow z$. In practice, the training of $z \to \hat{x} \to \hat{z}$ is iterated five times more often than that of $x \to \hat{z} \to \hat{x}$. This ability to learn a bidirectional mapping between the input space and the latent space is achieved through a symmetric adversarial process. The Empirical Loss function can be written as:

$$\begin{split} L\_S &= \mathbb{D}\_{JS}\Big(p\_{\hat{X}|(C,S,N)} \big\| p\_X\Big) + \mathbb{D}\_{JS}\Big(q\_{C|X} \big\| p\_C\Big) + \mathbb{D}\_{JS}\Big(q\_{S|X} \big\| p\_S\Big) + \mathbb{D}\_{JS}\Big(q\_{N|X} \big\| p\_N\Big) \\ &\quad - \mathbb{E}\_{p\_C}\Big[\mathbb{E}\_{p\_{\hat{X}|C}}\Big[\ln q\_{C|X}\Big]\Big] - \mathbb{E}\_{p\_S}\Big[\mathbb{E}\_{p\_{\hat{X}|S}}\Big[\ln q\_{S|X}\Big]\Big] - \mathbb{E}\_{p\_X}\Big[\mathbb{E}\_{q\_{(C,S,N)|X}}\Big[\ln p\_{X|(C,S,N)}\Big]\Big] \end{split} \tag{9}$$

with the Jensen–Shannon divergence terms

- $\mathbb{D}_{JS}(q_{C|X} \| p_C)$, $\mathbb{D}_{JS}(q_{S|X} \| p_S)$ and $\mathbb{D}_{JS}(q_{N|X} \| p_N)$

introduced in order to constrain a deterministic and injective encoding mapping (see Appendix B). On the other hand, the term

- $-\mathbb{E}_{p_X}\big[\mathbb{E}_{q_{(C,S,N)|X}}\big[\ln p_{X|(C,S,N)}\big]\big]$

penalizes the learning scheme in order to minimize the conditional entropy $\mathbb{S}(X|(C, S, N))$, i.e., in order to grant a good reconstruction.

Following the original RepGAN formulation:

	- *<sup>X</sup>*<sup>ˆ</sup> <sup>|</sup>*<sup>S</sup>* ln *qS*<sup>ˆ</sup>|*<sup>X</sup>* the reparametrization trick (structuring the *S* branch of the encoder-decoder structure as a VAE, see [5]).

Finally, $\mathbb{E}_{p_C}\big[\mathbb{E}_{p_{\hat{X}|C}}\big[\ln q_{\hat{C}|X}\big]\big]$ is maximized in a supervised way, considering the actual class of the labeled signals $x^{(i)}$: $x_d^{(i)}$ corresponding to a damaged structure and $x_u^{(i)}$ to an undamaged one, respectively. RepGAN provides an informative and disentangled latent space associated with the damage class $C$. The most significant aspect of the approach is its efficiency in generating reasonable signals for different damage states only on the basis of undamaged recorded or simulated structural responses. Both the generators $F_{\theta_X}$, $G_{\theta_Z}$ and the discriminators $D_{\omega_X}$, $D_{\omega_C}$, $D_{\omega_S}$ and $D_{\omega_N}$ are parametrized via 1D CNNs (and strided 1D CNNs), following [8]. Our RepGAN model has been designed using the Keras API, and trained employing an Nvidia Tesla K40 GPU (on the supercomputer *Ruche*, the cluster of the Mésocentre Moulon of Paris-Saclay University).
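Sampling from the RepGAN latent space Z = [C, S, N] described above can be sketched as follows; the dimensions d_C, d_S, d_N in the example are hypothetical, and p_S and p_N are taken as standard normal.

```python
import random

def sample_latent(d_c, d_s, d_n, rng=None):
    """Draw z = [C, S, N]: one-hot categorical C ~ Cat(d_c),
    continuous S ~ N(0, I), and nuisance noise N ~ N(0, I)."""
    rng = rng or random.Random(0)
    c_idx = rng.randrange(d_c)
    C = [1.0 if i == c_idx else 0.0 for i in range(d_c)]  # one-hot damage class
    S = [rng.gauss(0.0, 1.0) for _ in range(d_s)]         # semantic continuous part
    N = [rng.gauss(0.0, 1.0) for _ in range(d_n)]         # residual noise
    return C + S + N
```

The decoder would map such a vector to a synthetic time history, with the C block selecting the damage state.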

### **4. Results and Discussion**

In the following, a case study is considered in order to prove the ability of the new architecture to achieve the three fundamental tasks of semantic generation, clustering and reconstruction. The reference example is a shear building subject to an earthquake ground motion whose signals are taken from the STEAD seismic database [14]. STEAD [14] is a high-quality, large-scale, and global data set of local earthquake and non-earthquake signals recorded by seismic instruments. In this work, local earthquake wave forms (recorded at local distances, within 350 km of the earthquakes) have been considered. The seismic data consist of three wave forms of 60 s duration, recorded in the east–west, north–south, and up-dip directions, respectively. The structure is composed of 39 storeys. The mass and the stiffness of each floor, in undamaged conditions, are, respectively, $m = 625 \times 10^3$ kg and $k = 1 \times 10^9$ kN/m. Damage is simulated through the degradation of stiffness; in the present case, the stiffness reduction has been set equal to 50% of the above-mentioned value. The structural response of the system is evaluated considering one degree of freedom (dof) per floor. To take damping effects into account, a Rayleigh damping model has been considered.
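The structural model above can be sketched as an n-dof shear building with lumped floor masses and uniform interstorey stiffnesses, plus a Rayleigh damping matrix C = a₀M + a₁K. The Rayleigh coefficients below are hypothetical placeholders; the mass and stiffness values follow those quoted in the text.

```python
def shear_building(n, m, k):
    """Mass and stiffness matrices of an n-storey shear building with
    lumped floor mass m and uniform interstorey stiffness k."""
    M = [[m if i == j else 0.0 for j in range(n)] for i in range(n)]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] = 2.0 * k if i < n - 1 else k   # top floor has a single spring
        if i > 0:
            K[i][i - 1] = K[i - 1][i] = -k
    return M, K

def rayleigh_damping(M, K, a0, a1):
    """Rayleigh damping matrix C = a0 * M + a1 * K."""
    n = len(M)
    return [[a0 * M[i][j] + a1 * K[i][j] for j in range(n)] for i in range(n)]
```

The 50% stiffness degradation used to simulate damage would correspond to reassembling K with a reduced interstorey stiffness for the damaged storey.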

The following results have been obtained considering 100 signals in both undamaged and damaged conditions, for a total of 200 samples, with separate training and validation data sets. Each signal is composed of 2048 time steps with dt = 0.04 s. The training process has been performed over 2000 epochs. The reconstruction capability of the proposed network has been evaluated through the Goodness-of-Fit (GoF) criteria [15], where both the fit in Envelope (EG) and the fit in Phase (PG) are measured. An example is shown in Figure 1. The values 9.17 and 9.69, related to EG and PG, respectively, testify to the excellent reconstruction quality.

**Figure 1.** Time–Frequency Goodness-of-Fit criterion: the black line represents the original time history $x^{(i)}$, while the red time history depicts the result of the RepGAN reconstruction $G_Z \circ F_X(x^{(i)})$. GoF is evaluated between 0 and 10: the higher the score, the better the reconstruction. Frequency Envelope Goodness (FEG), Time–Frequency Envelope Goodness (EG), Time Envelope Goodness (TEG), Frequency Phase Goodness (FPG), Time–Frequency Phase Goodness (PG) and Time Phase Goodness (TPG).

The capability of reproducing signals for different damage scenarios can be appreciated from Figure 2, which presents the original structural response (black) and the corresponding generated one (orange) in both undamaged (left panel in Figure 2) and damaged (right panel in Figure 2) conditions. Regarding the classification capability, the classification report and the confusion matrix in Figure 3 highlight that the model is able to correctly assign the damage class to the considered time histories.

**Figure 2.** Examples of reconstructed signals for undamaged (**left**) and damaged (**right**) time histories. The black lines represent the original time histories $x_u^{(i)}$ and $x_d^{(i)}$, respectively. The orange time histories represent the results of the RepGAN reconstructions $G_Z \circ F_X(x_u^{(i)})$ and $G_Z \circ F_X(x_d^{(i)})$, respectively. The proposed examples represent the normalized displacement of the first floor of the building under study.

**Figure 3.** Evaluation of the classification ability of the model. On the **left panel**, precision, recall, F1-score and accuracy values are reported. A precision score of 1.0 for a class C means that every item labelled as belonging to class C does indeed belong to class C, whereas a recall of 1.0 means that every item from class C was labelled as belonging to class C. The F1-score is the harmonic mean of precision and recall. Accuracy represents the proportion of correct predictions among the total number of cases examined. On the **right panel**, the confusion matrix allows one to visualize the performance of the model: each row of the matrix represents the instances in the actual class, while each column depicts the instances in the predicted class.

### **5. Conclusions**

In this paper, we introduce a SHM method based on a deep Generative Adversarial Network. Trained on synthetic time histories that represent the structural response of a multistory building in both damaged and undamaged conditions, the new model achieves high classification accuracy (Figure 3) and satisfactory reconstruction quality (Figures 1 and 2), resulting in a good bidirectional mapping between the input space and the latent space. However, the major innovation of the proposed method is the ability to generate reasonable signals for different damage states based only on undamaged recorded or simulated structural responses. As a consequence, real recordings linked to damaged conditions are not required. In our future work, we would like to extend our approach to real-time data. We will further consider a data set consisting of a far larger number of time histories.

**Author Contributions:** Conceptualization, G.C., L.R., F.G., S.M. and A.C.; methodology, G.C. and F.G.; software, G.C., L.R. and F.G.; validation, G.C. and F.G.; formal analysis, G.C. and F.G.; investigation, G.C. and F.G.; resources, F.G.; data curation, G.C., L.R. and F.G.; writing—original draft preparation, G.C., L.R., M.T., F.G., S.M., A.M. and A.C.; writing—review and editing, G.C. and F.G.; visualization, G.C. and F.G.; supervision, F.G., S.M. and A.C.; project administration, F.G. and A.C.; Funding acquisition, G.C., F.G. and A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data generated during the study are available from the corresponding author upon reasonable request.

**Acknowledgments:** The training and testing of the neural network has been performed exploiting the supercomputer resources of the Mésocentre Moulon (http://mesocentre.centralesupelec.fr, last accessed 14 February 2022), the cluster of CentraleSupélec and ENS Paris-Saclay, hosted within the Paris-Saclay University and funded by the Contrat Plan État Région (CPER). This work has been developed thanks to the scholarship "Tesi all'estero—a.y. 2020/2021—second call" funded by Politecnico di Milano.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A. Shannon's Entropy**

• Shannon's entropy for a probability density function *pX*:

$$\mathbb{S}(X) = \mathbb{S}(p_X) = \mathbb{E}_{X \sim p_X}\left[\ln \frac{1}{p_X}\right] = -\mathbb{E}_{X \sim p_X}[\ln p_X] \ge 0$$
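For a discrete distribution, the definition above can be checked numerically; a minimal sketch:

```python
import numpy as np

def shannon_entropy(p):
    """S(p) = -sum_x p(x) ln p(x) >= 0, with 0 * ln 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
print(shannon_entropy(uniform))   # ln 4 ≈ 1.386, the maximum for 4 outcomes
print(shannon_entropy(peaked))    # much smaller: the outcome is nearly certain
```

The uniform distribution attains the maximum entropy for a fixed number of outcomes, while a nearly deterministic distribution has entropy close to zero.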

• Conditional Shannon's entropy for *X* and *Z*:

$$\mathbb{S}(X|Z) = \mathbb{E}_{Z \sim p_Z}\left[\mathbb{S}\left(p_{X|Z}\right)\right] = \mathbb{E}_{(X,Z) \sim p_{XZ}}\left[\ln\left(\frac{1}{p_{X|Z}}\right)\right]$$

$$\mathbb{S}(X, Z) = \mathbb{S}(Z|X) + \mathbb{S}(X) = \mathbb{S}(X|Z) + \mathbb{S}(Z)$$

• Cross-entropy:

$$\mathbb{S}(p_{XZ} \| q_{XZ}) = \mathbb{E}_{(X,Z) \sim p_{XZ}}\left[\ln\left(\frac{1}{q_{XZ}}\right)\right] = \mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln\left(\frac{1}{q_{XZ}}\right)\right]\right]$$

• Given a data set of independent and identically distributed (i.i.d.) samples $\mathcal{S} = \left\{x^{(i)}\right\}_{i=1}^{N}$, the true yet unknown probability $p_X$ of extracting an instance $x^{(i)}$ can be approximated by the likelihood $p_{\theta_X}\left(\left\{x^{(i)}\right\}_{i=1}^{N}\right)$, whose entropy is

$$\mathbb{S}(p_{\theta_X}) = -\ln p_{\theta_X}\left(\left\{x^{(i)}\right\}_{i=1}^{N}\right) = -\sum_{i=1}^{N} \ln p_{\theta_X}\left(x^{(i)}\right).$$

### **Appendix B. Kullback–Leibler Divergence**

• Kullback–Leibler divergence (non-symmetric):

$$\mathbb{D}_{KL}(p_{XZ} \| q_{XZ}) = \mathbb{E}_{(X,Z) \sim p_{XZ}}\left[\ln\left(\frac{p_{XZ}}{q_{XZ}}\right)\right] = -\mathbb{S}(p_{XZ}) + \mathbb{S}(p_{XZ} \| q_{XZ}) \le \mathbb{S}(p_{XZ} \| q_{XZ})$$

$$\mathbb{D}_{KL}(p_{XZ} \| q_{XZ}) + \mathbb{S}(X) = -\mathbb{S}(Z|X) + \mathbb{S}(p_{XZ} \| q_{XZ})$$

$$\mathbb{D}_{KL}(p_{XZ} \| q_{XZ}) + \mathbb{S}(X) \le \mathbb{S}(p_{XZ} \| q_{XZ})$$
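The decomposition of the divergence into cross-entropy minus entropy, together with its non-negativity and asymmetry, can be verified numerically for discrete distributions; a minimal sketch:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def cross_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(q[mask])))

def kl_divergence(p, q):
    # D_KL(p || q) = S(p || q) - S(p): the extra nats paid for coding p with q
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q) >= 0.0)                  # non-negative for any pair
print(kl_divergence(p, q) == kl_divergence(q, p))  # False: not symmetric
```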

• Mutual Information between *X* and *Z*:

$$I(\mathbf{X}, \mathbf{Z}) = \mathbb{S}(\mathbf{X}) - \mathbb{S}(\mathbf{X}|\mathbf{Z}) \ge 0$$

If $p_{X|Z} = p_X$ (i.e., $X$ and $Z$ are independent), then $I(X, Z) = 0$. If $p_{Z|X} = \delta(Z - f(X))$ with $f$ deterministic, then $I(X, Z) = \max I(X, Z) = \mathbb{S}(X)$.

• Cross-entropy bound on the conditional entropy:

$$\mathbb{S}(Z|X) = -\mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln p_{Z|X}\right]\right] = -\mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln \frac{p_{Z|X}}{q_{Z|X}}\right]\right] - \mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln q_{Z|X}\right]\right]$$

$$= -\mathbb{E}_{X \sim p_X}\left[\mathbb{D}_{KL}\left(p_{Z|X} \| q_{Z|X}\right)\right] - \mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln q_{Z|X}\right]\right] \le -\mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln q_{Z|X}\right]\right]$$
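The two limiting cases of the mutual information stated above (independence and a deterministic mapping) can be checked with a small discrete example; a minimal sketch using the equivalent form $I(X,Z) = \mathbb{S}(X) + \mathbb{S}(Z) - \mathbb{S}(X,Z)$:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(p_xz):
    """I(X, Z) = S(X) + S(Z) - S(X, Z) for a discrete joint distribution."""
    p_xz = np.asarray(p_xz, dtype=float)
    return entropy(p_xz.sum(axis=1)) + entropy(p_xz.sum(axis=0)) - entropy(p_xz)

p_x = np.array([0.5, 0.25, 0.25])
independent = np.outer(p_x, [0.5, 0.5])   # p_{X|Z} = p_X: X and Z independent
deterministic = np.diag(p_x)              # Z = f(X) with f deterministic
print(abs(mutual_information(independent)) < 1e-12)                   # I = 0
print(abs(mutual_information(deterministic) - entropy(p_x)) < 1e-12)  # I = S(X)
```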

### **Appendix C. Generative Adversarial Networks (GAN)**

- **–** $P(C = \text{``d''}) = \alpha$; $P(C = \text{``g''}) = 1 - \alpha$

- **–** $P(C = \text{``d''}\,|\,x^{(i)}) = D(x^{(i)})$

- **–** $P(C = \text{``g''}\,|\,\hat{x}^{(i)}) = 1 - D(G(z^{(i)}))$

$$\mathbb{S}(C|X) = -\mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{C \sim p_{C|X}}\left[\ln p_{C|X}\right]\right] = -\mathbb{E}_{C \sim p_C}\left[\mathbb{E}_{X \sim p_{X|C}}\left[\ln p_{C|X}\right]\right]$$

$$\mathbb{S}(C|X) = -\alpha\, \mathbb{E}_{X \sim p_{X|C=\text{``d''}}}\left[\ln p_{C=\text{``d''}|X}\right] - (1-\alpha)\, \mathbb{E}_{X \sim p_{X|C=\text{``g''}}}\left[\ln p_{C=\text{``g''}|X}\right]$$

$$\mathbb{S}(C|X) = -\alpha\, \mathbb{E}_{X \sim p_X}[\ln D(X)] - (1-\alpha)\, \mathbb{E}_{Z \sim p_Z}[\ln(1 - D(G(Z)))]$$

For tuneable conditional probability distributions *Dω*:

$$\max_D I(X, C) \le \mathbb{S}(C) + \max_D\left(-\mathbb{S}(C|X)\right) = \mathbb{S}(C) - \min_D \mathbb{S}(C|X)$$

$$\max I(X, C) \le \mathbb{S}(C) + \min_G \max_D\, \alpha\, \mathbb{E}_{X \sim p_X}[\ln D(X)] + (1-\alpha)\, \mathbb{E}_{Z \sim p_Z}[\ln(1 - D(G(Z)))]$$

Thus, $\mathbb{S}(C) + \min_G \max_D\left(-\mathbb{S}(C|X)\right)$ represents an upper bound for the mutual information between $C$ and $X$, which is maximized by maximizing $-\mathbb{S}(C|X)$. For an optimal training, $D$ must not be able to discriminate between $x^{(i)}$ and $\hat{x}^{(i)}$; therefore $\alpha = \frac{1}{2}$.
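The $\alpha = \frac{1}{2}$ optimum can be illustrated numerically: for known densities, the optimal discriminator is $D^*(x) = p(x)/(p(x) + p_g(x))$, and when the generator distribution matches the data distribution, $D^* = \frac{1}{2}$ everywhere. The sketch below is a Monte-Carlo check with one-dimensional Gaussians (an illustrative toy, not part of the training procedure above):

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(mu, s):
    """1-D Gaussian density."""
    return lambda x: np.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

def d_star(x, p, p_g):
    """Optimal discriminator for known densities: D*(x) = p(x) / (p(x) + p_g(x))."""
    return p(x) / (p(x) + p_g(x))

p_data, p_gen = gauss(0.0, 1.0), gauss(0.0, 1.0)   # generator matches the data
x_real = rng.normal(0.0, 1.0, 100_000)             # samples of X ~ p_X
x_fake = rng.normal(0.0, 1.0, 100_000)             # samples of G(Z)

# -S(C|X) with alpha = 1/2: (1/2) E[ln D(X)] + (1/2) E[ln(1 - D(G(Z)))]
v = 0.5 * np.mean(np.log(d_star(x_real, p_data, p_gen))) \
    + 0.5 * np.mean(np.log(1.0 - d_star(x_fake, p_data, p_gen)))
print(abs(v + np.log(2.0)) < 1e-9)   # D* = 1/2 everywhere, so v = ln(1/2)
```

At the optimum the discriminator outputs $\frac{1}{2}$ for every sample, i.e., it cannot discriminate real from generated data, exactly as stated above.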

### **Appendix D. Standard Autoencoder**

In the standard Autoencoder formulations [16,17], *F* and *G* are trained by maximizing *I*(*X*, *Z*), namely:

$$\{F, G\} = \arg\max_{F,G} I(X, Z) = \arg\min_{F,G} \mathbb{S}(X|Z) = \arg\min_{F,G} \mathbb{E}_{X \sim p_X}\left[\mathbb{E}_{Z \sim p_{Z|X}}\left[\ln\left(\frac{1}{p_{X|Z}}\right)\right]\right] \tag{A1}$$

If the encoder and decoder are parametrized as neural networks $F_{\theta_X}$ and $G_{\theta_Z}$, respectively, the AE loss can be approximated by the empirical loss:

$$\{\theta_X, \theta_Z\} = \arg\max_{\theta_X, \theta_Z} \sum_{i=1}^{N}\left[\ln\left(p_{X|Z}\left(x^{(i)} \,\Big|\, Z = F_{\theta_X}\left(x^{(i)}\right)\right)\right)\right] \tag{A2}$$

Given that the Gaussian distribution has maximum entropy among all probability distributions supported on the entire real line, the empirical loss in Equation (A2) can be maximized by choosing $p_{X|Z} = \mathcal{N}\left(G_{\theta_Z}(Z), \sigma^2 I\right)$:

$$\{\theta_X, \theta_Z\} = \arg\min_{\theta_X, \theta_Z} \sum_{i=1}^{N} \frac{1}{2\sigma^2}\left\|x^{(i)} - G_{\theta_Z} \circ F_{\theta_X}\left(x^{(i)}\right)\right\|^2 + \frac{d_X}{2}\ln\left(2\pi\sigma^2\right) \tag{A3}$$
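With $\sigma$ fixed, the objective in Equation (A3) is the familiar squared reconstruction error plus a constant that does not depend on the network weights; a minimal numerical sketch:

```python
import numpy as np

def ae_nll(x, x_hat, sigma=1.0):
    """Gaussian negative log-likelihood of Eq. (A3): the squared reconstruction
    error scaled by 1/(2 sigma^2) plus a weight-independent constant."""
    n, d = x.shape
    return float(np.sum((x - x_hat) ** 2) / (2.0 * sigma ** 2)
                 + n * d / 2.0 * np.log(2.0 * np.pi * sigma ** 2))

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3))                 # a toy batch of 4 samples, dim 3
const = 4 * 3 / 2.0 * np.log(2.0 * np.pi)
print(ae_nll(x, x) == const)                # perfect reconstruction: only the constant
print(ae_nll(x, x + 0.1) > ae_nll(x, x))    # any reconstruction error raises the loss
```

This is why minimizing Equation (A3) is equivalent to minimizing the mean-squared reconstruction error for a fixed $\sigma$.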

### **References**


### *Proceeding Paper* **Parallel WSAR for Solving Permutation Flow Shop Scheduling Problem †**

**Adil Baykasoğlu and Mümin Emre Şenol \***

Faculty of Engineering, Department of Industrial Engineering, Dokuz Eylül University, 35220 Izmir, Turkey; adil.baykasoglu@deu.edu.tr

**\*** Correspondence: emre.senol@deu.edu.tr; Tel.: +90-232-301-76-21

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Abstract:** This study presents a coalition-based parallel metaheuristic algorithm for solving the Permutation Flow Shop Scheduling Problem (PFSP). This novel approach incorporates five different single-solution-based metaheuristic algorithms (SSBMAs), namely the Simulated Annealing Algorithm, Random Search Algorithm, Great Deluge Algorithm, Threshold Accepting Algorithm and Greedy Search Algorithm, and a population-based algorithm, the Weighted Superposition Attraction–Repulsion Algorithm (WSAR). While the SSBMAs are responsible for exploring the search space, WSAR serves as a controller that handles the coalition process. The SSBMAs perform their searches simultaneously through the MATLAB parallel programming tool. The proposed approach is tested on the PFSP against state-of-the-art algorithms from the literature. Moreover, the algorithm is also tested against its constituents (SSBMAs and WSAR) and its serial version. Non-parametric statistical tests are organized to statistically compare the performance of the proposed approach with the state-of-the-art algorithms, its constituents and its serial version. The statistical results prove the effectiveness of the proposed approach.

**Keywords:** parallel computing; coalition; permutation flow shop scheduling problem

### **1. Introduction**

Optimization means finding the solution that gives the best result in the solution space of a problem. In other words, it is used to achieve the best solutions under the given conditions. Today, different optimization algorithms are used to solve many optimization problems [1–4]. These algorithms can be classified into two groups: exact algorithms and approximate algorithms. Exact algorithms search the entire search space and try every possible alternative solution. Although they provide the optimal solution, they need a long runtime, especially as the size of the problem grows. On the other hand, approximate algorithms perform their solution space searches through some logical operators. Although they do not guarantee an optimal solution, they provide near-optimal solutions in reasonable time. Owing to this advantage, most researchers prefer approximate algorithms for solving optimization problems.

Approximate algorithms are classified into two groups: heuristic and metaheuristic algorithms. While a heuristic algorithm's structure is problem-specific, a metaheuristic algorithm's structure is generic, allowing it to be applied to any optimization problem. Metaheuristic algorithms are more flexible than heuristic algorithms in that they can handle any problem. They can also provide better solutions to optimization problems than heuristic algorithms. On the other hand, metaheuristic algorithms may have drawbacks such as early convergence and poor speed, and no single metaheuristic algorithm is superior to all other metaheuristic algorithms on every problem.

The No Free Lunch Theorem [5] must also be mentioned at this point in order to underline the logic for integrating diverse search techniques within the framework of

**Citation:** Baykasoğlu, A.; Şenol, M.E. Parallel WSAR for Solving Permutation Flow Shop Scheduling Problem. *Comput. Sci. Math. Forum* **2022**, *2*, 10. https://doi.org/10.3390/IOCA2021-10901

Academic Editor: Frank Werner

Published: 26 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

creating successful optimization methods. According to this theorem, no optimization method beats all remaining solution processes for all optimization problems, and there is no statistical difference between the performances of different metaheuristics when all optimization problems are solved [6]. This implies that the computing cost of finding a solution for optimization problems is the same for any solution technique. This theorem can be a base point for combining various metaheuristic algorithms to tackle optimization problems more effectively. However, it would take substantial time to combine various metaheuristic algorithms and run them sequentially [7]. Most metaheuristic algorithms are designed to run sequentially, and the parallel execution of metaheuristic algorithms can increase solution quality while shortening the run time [8,9].

This research is the outcome of an attempt to combine several metaheuristics in order to reveal a high level of synergy and, as a result, deliver a sufficient performance while solving optimization problems.

This paper provides a new framework for addressing the Permutation Flow Shop Scheduling Problem (PFSP) based on a combination of diverse metaheuristics in a parallel computing environment. To implement the multiple metaheuristic algorithms in parallel, a new optimization system is designed that combines different single-solution-based metaheuristic algorithms (SSBMAs), namely the Simulated Annealing Algorithm (SA), Random Search Algorithm (RS), Great Deluge Algorithm (GD), Threshold Accepting Algorithm (TA) and Greedy Search Algorithm (GS), with a controller, the Weighted Superposition Attraction–Repulsion (WSAR) algorithm.

The remainder of the paper is organized as follows: In Section 2, parallel computing is explained and, in Section 3, the proposed optimization approach (p-WSAR) is introduced. In Section 4, PFSP is presented and experimental results are reported. Finally, concluding remarks are presented in Section 5.

### **2. Parallel Computing**

Parallel computing is a type of computing architecture in which many processors execute or process an application or computation simultaneously. Parallel computing helps us carry out large computations by dividing the workload among multiple processors, all working at the same time; most supercomputers work according to this principle. Parallel computing is also known as parallel processing. For this to happen, the available resources must be properly coordinated so that they execute concurrently. Parallel computing can reduce the solution time, increase the energy efficiency of an application, and allow us to tackle bigger problems. It is a computational technique developed to solve complex problems faster and more efficiently [10,11].
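As a toy illustration of the workload-splitting idea (in Python rather than the MATLAB tool used later in the paper; note that Python threads share an interpreter lock, so a CPU-bound job would use a process pool for real speed-up, and the sketch only shows how a computation is divided and recombined):

```python
from concurrent.futures import ThreadPoolExecutor
import math

def partial_sum(bounds):
    """Sum 1/k^2 over one chunk of the index range."""
    lo, hi = bounds
    return sum(1.0 / (k * k) for k in range(lo, hi))

# Split k = 1 .. 1_000_000 into four chunks handled by four workers.
chunks = [(1 + i * 250_000, 1 + (i + 1) * 250_000) for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(abs(total - math.pi ** 2 / 6) < 1e-5)   # chunk results recombine to the full sum
```

The same divide-and-recombine pattern underlies the parallel execution of the search agents in the next section.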

### **3. p-WSAR Algorithm**

The p-WSAR algorithm is introduced in this section. p-WSAR comprises five SSBMAs, namely Random Search (RS) [12], Threshold Accepting (TA) [13], Great Deluge (GD) [14], Simulated Annealing (SA) [15] and Greedy Search (GS) [16], together with a controller, WSAR [17]. p-WSAR has three main stages, namely the search stage, the information-sharing stage and the reproduction stage. In the search stage, all of the SSBMAs explore the solution space in parallel. After exploring the solution space, they share their findings with the other SSBMAs through the superposition principle of the WSAR algorithm; the details of the superposition principle can be found in [17]. Then, all SSBMAs move to their next positions. In the last stage, the SSBMAs' parameters are reproduced. This iterative process lasts until the termination criteria are met. The notations of the p-WSAR algorithm are given below.
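A heavily simplified, hypothetical skeleton of the three stages is sketched below; the generic swap move, the toy objective, and the "restart from the best solution" sharing rule are illustrative stand-ins only (the actual coalition uses the five SSBMAs, the WSAR superposition principle of [17], and parallel execution):

```python
import random

def cost(perm):
    """Toy permutation objective (0 for the identity permutation)."""
    return sum(abs(v - i) for i, v in enumerate(perm))

def swap_move(perm):
    """Generic single-swap neighborhood move standing in for an SSBMA step."""
    p = perm[:]
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

random.seed(3)
n, n_agents = 8, 5
agents = [random.sample(range(n), n) for _ in range(n_agents)]
params = [1.0 + a for a in range(n_agents)]   # per-agent control parameter
best = min(agents, key=cost)
start = cost(best)
for _ in range(300):
    agents = [swap_move(p) for p in agents]               # 1) search stage
    agents = [p if cost(p) <= cost(best) else best[:]     # 2) information sharing
              for p in agents]
    best = min(agents + [best], key=cost)
    params = [t * 0.99 for t in params]                   # 3) reproduction stage
print(cost(best) <= start)   # the coalition never loses the best solution found
```

Even in this toy form, the loop shows the structure: independent exploration, a pull toward shared information, and an update of the agents' parameters before the next iteration.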

The main stages of the p-WSAR algorithm and the flow chart of the algorithm are depicted in Figures 1 and 2, respectively.


**Figure 1.** Main steps of p-WSAR.

**Figure 2.** Flow chart of the p-WSAR algorithm.

### **4. Permutation Flow Shop Scheduling Problem and Experimental Results**

In this section, PFSP is first introduced, and then the experimental results are given.

### *4.1. Permutation Flow Shop Scheduling Problem (PFSP)*

The PFSP has a set of m machines and a group of n jobs. Every job is made up of m operations that must be accomplished on the different machines. For each of the n jobs, the machine ordering of the process sequence is the same. Each machine may only conduct one operation at a time, and all jobs are processed sequentially according to a permutation schedule. It is assumed that no machine breakdowns occur during the manufacturing stage, and thus all of the machines are ready to process operations. Operation preemption is also disallowed. The goal is to design a schedule that minimizes the total job completion time (makespan) while adhering to the preceding assumptions.

A permutation-type n-dimensional real-number vector can be utilized in the PFSP to determine the job process sequence. After identifying the job order, the makespan can be calculated using the "completion time matrix approach" proposed by Onwubolu and Davendra [18].
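The completion-time recursion can be sketched in a few lines; the tiny 3-job, 2-machine instance below is an invented example for illustration:

```python
import numpy as np

def makespan(seq, proc):
    """Completion-time matrix recursion for a permutation flow shop:
    C[i][j] = max(C[i-1][j], C[i][j-1]) + p[seq[i]][j]; the makespan is C[-1][-1]."""
    n_mach = len(proc[0])
    c = np.zeros((len(seq), n_mach))
    for i, job in enumerate(seq):
        for j in range(n_mach):
            ready = max(c[i - 1, j] if i > 0 else 0.0,
                        c[i, j - 1] if j > 0 else 0.0)
            c[i, j] = ready + proc[job][j]
    return float(c[-1, -1])

proc = [[3, 2], [1, 4], [2, 2]]          # processing times: 3 jobs x 2 machines
print(makespan([0, 1, 2], proc))         # -> 11.0
print(makespan([1, 0, 2], proc))         # -> 9.0: the job order changes the makespan
```

The recursion captures both constraints of the problem: a job cannot start on machine j before it finishes on machine j-1, and a machine cannot start a job before finishing the previous one.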

### *4.2. Experimental Results*

The p-WSAR's performance on the PFSP was evaluated using the Taillard [19] benchmark instances, which are divided into 12 groups of problems. Five of these problems were selected to test p-WSAR's performance against some state-of-the-art algorithms and WSAR. These problems' sizes (PS: J × M) and well-known solutions (WKS) are given in Table 1. The best, the worst and the average performances of 30 runs of each algorithm were recorded. In all of the instances, p-WSAR was able to find better solutions than the other algorithms.


**Table 1.** Comparison of p-WSAR with some state-of-the-art algorithms and WSAR.

In addition, the performance of p-WSAR was statistically compared with the other algorithms through non-parametric statistical tests by using the average values. Table 2 indicates that (based on the Friedman test results) p-WSAR surpasses the other algorithms. Furthermore, according to the Wilcoxon signed-rank test, the difference between p-WSAR and HPSO is found to be negligible, since *p* > 0.1. In addition, p-WSAR performed slightly better than TLBO, NPSO and WSAR, with *p* < 0.1.


**Table 2.** Non-parametric test results on Taillard Instances.

Another computational study was conducted to compare the performance of p-WSAR with that of its constituents (SSBMAs) in terms of solution quality. The results are presented in Tables 3 and 4. According to the computational results, p-WSAR's performance is far beyond that of its constituents (SSBMAs). Moreover, in respect of the non-parametric statistical tests, p-WSAR is able to produce more effective results than its constituents, and there is a statistically significant difference between the performance of p-WSAR and its constituents, since the *p*-value is below 0.1.

**Table 3.** Comparison of p-WSAR with SSBMAs.



**Table 4.** Non-parametric test results on Taillard Instances p-WSAR vs. SSBMAs.

### **5. Conclusions**

In this research, multiple metaheuristic algorithms are combined to build a coalition for tackling the PFSP. The suggested methodology uses WSAR as the controller to run multiple single-solution-based metaheuristic algorithms (SSBMAs) in parallel. The suggested method is put to the test on some of the Taillard instances. According to the results, the proposed approach is capable of finding the best solutions. Furthermore, the proposed approach surpasses its constituents. The proposed approach is supported by the computational results. Applying the proposed approach to other types of problems is planned for a future study.

**Author Contributions:** Conceptualization, A.B. and M.E.Ş.; methodology, A.B. and M.E.Ş.; software, M.E.Ş.; validation, A.B. and M.E.Ş.; formal analysis, A.B.; investigation, A.B. and M.E.Ş.; resources, A.B.; data curation, A.B.; writing—original draft preparation, A.B. and M.E.Ş.; writing—review and editing, A.B. and M.E.Ş.; visualization, A.B. and M.E.Ş.; supervision, A.B.; project administration, A.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **Advances in Crest Factor Minimization for Wide-Bandwidth Multi-Sine Signals with Non-Flat Amplitude Spectra †**

**Helena Althoff, Maximilian Eberhardt, Steffen Geinitz \* and Christian Linder**

Fraunhofer Institute for Casting, Composite and Processing Technology (IGCV), 86159 Augsburg, Germany; helena.althoff@igcv.fraunhofer.de (H.A.); maximilian.eberhardt@igcv.fraunhofer.de (M.E.); christian.linder@igcv.fraunhofer.de (C.L.)

**\*** Correspondence: steffen.geinitz@igcv.fraunhofer.de; Tel.: +49-821-90678-222

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Abstract:** Multi-sine excitation signals give spectroscopic insight into fast chemical processes over bandwidths from 10<sup>1</sup> Hz to 10<sup>7</sup> Hz. The crest factor (CF) determines the information density of a multi-sine signal. Minimizing the CF yields higher information density and is the goal of the presented work. Four algorithms and a combination of two of them are presented. The first two algorithms implement different iterative optimizations of the amplitude and phase angle values of the signal. The combined algorithm alternates between the first and second optimization algorithms. Additionally, a simulated annealing approach and a genetic algorithm optimizing the CF were implemented.

**Keywords:** multi-sine signals; crest factor; crest factor optimization; iterative optimization; dielectric analysis; simulated annealing; genetic algorithms

### **1. Introduction**

Dielectric analysis (DEA) is a well-known method for the characterization of material behavior and a technology for monitoring chemical processes, e.g., the curing of thermosetting resins [1], the curing of adhesives [2], and the polymerization process of polyamide 6 [3]. A more general term is electrical impedance spectroscopy (EIS) [4]. In the context of biological processes, it is also referred to as bio-impedance spectroscopy (BIS) [5].

Independent of the application, DEA compares the phase and amplitude of a sinusoidal excitation signal applied to a sensor in contact with a specimen with those of its response signal. Changes in phase and amplitude over time give an indication of the state of the specimen. Ongoing chemical reactions creating new molecular structures result in changing dielectric behavior, which can further be correlated with other physical parameters or states, e.g., the viscosity or the state of cure.

In addition to being a characterization method, DEA has the benefit of being applicable to process monitoring and process control [6,7], thus showing great potential for inline quality monitoring solutions for adhesive part assembly or 3D printing using fast-curing resins [8].

Historically, in order to achieve full spectroscopic results, sweeping approaches using single-frequency sine waves were used. Especially for fast processes that take place in a few seconds or less, new approaches are needed to obtain spectroscopic information. Multi-sine signals provide the means to achieve the desired results. Nevertheless, using multi-sine excitation signals with only a few frequencies for process monitoring and relying on absolute values drastically limits their usage in industrial applications, as the measurement principle is prone to disturbances from external influences, e.g., contamination or parasitic induction. Furthermore, the use of only a small number of frequencies limits the information necessary to derive a complete picture of the processes or effects occurring, not only in the time domain but also in the frequency domain.

**Citation:** Althoff, H.; Eberhardt, M.; Geinitz, S.; Linder, C. Advances in Crest Factor Minimization for Wide-Bandwidth Multi-Sine Signals with Non-Flat Amplitude Spectra. *Comput. Sci. Math. Forum* **2022**, *2*, 11. https://doi.org/10.3390/IOCA2021-10908

Academic Editor: Frank Werner

Published: 28 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

In [3,8], an approach was shown using multi-sine excitation signals with up to 20 frequencies incorporated giving spectroscopic insight into fast chemical processes for high bandwidths. With recent modifications, the system is now able to monitor bandwidths from 10<sup>2</sup> Hz to 10<sup>7</sup> Hz, resulting in a need for excitation signals with more than 20 frequencies distributed over the measurement bandwidth to provide sufficient spectral resolution.

To compare the generated signals objectively, metrics are needed which give insight into the signals and the information they contain. One commonly used metric is the crest factor (CF). The CF determines the information density of a multi-sine signal. Minimizing the CF yields higher information density and is the goal of the presented work.

### *1.1. Multi-Sine*

Large-bandwidth impedance spectroscopy (IS) requires a dedicated, specific excitation signal. The main requirement for these signals is that they allow for a combined analysis of the signals in the time and frequency domains. Multiple options have been reported for these applications in the literature over the past decades. The most commonly used signals are binary sequences such as maximum length binary sequences (MLBS) [9] or discrete interval binary sequences (DIBS) [10], chirp signals [11], and multi-sine signals [12]. Multi-sine signals offer several advantages over the other signal types for large-bandwidth impedance analysis. They allow for a custom amplitude spectrum while having customizable excitation frequencies.

The signal is generated by adding up multiple sine waves, while each wave can be chosen with its particular frequency *fn*, amplitude *an*, and phase *ϕ<sup>n</sup>* according to the following equation:

$$u(t) = \sum\_{n=1}^{N} a\_n \cos(2\pi f\_n t + \varphi\_n). \tag{1}$$
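Equation (1) translates directly into code; a minimal sketch (the frequencies, amplitudes, and phases below are arbitrary example values):

```python
import numpy as np

def multisine(t, freqs, amps, phases):
    """u(t) = sum_n a_n cos(2*pi*f_n*t + phi_n), Eq. (1)."""
    t = np.asarray(t, dtype=float)
    return sum(a * np.cos(2.0 * np.pi * f * t + p)
               for f, a, p in zip(freqs, amps, phases))

t = np.linspace(0.0, 1.0, 1000, endpoint=False)   # 1 s at 1 kHz sampling
u = multisine(t, freqs=[5, 10, 20], amps=[1.0, 0.5, 0.25], phases=[0.0, 0.0, 0.0])
print(u[0])   # at t = 0 all cosines equal 1, so u(0) = 1 + 0.5 + 0.25 = 1.75
```

Each component keeps its chosen amplitude in the spectrum, which is precisely the custom-amplitude-spectrum property highlighted above.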

### *1.2. Crest Factor*

A widely used metric to evaluate and compare multi-sine signals in the time domain is the crest factor (CF). This metric shows how much amplitude is consumed by a signal to introduce a certain amount of energy into a system [13]. Higher values indicate harmonics, while low values for multi-sine signals imply no or little interference between the specific frequencies. The CF is calculated as the ratio between the peak value of a signal and its effective (root mean square) value.

$$CF = \frac{\mathcal{U}\_{peak}}{\mathcal{U}\_{RMS}}.\tag{2}$$

For a signal in the time domain *s*(*t*) measured over a time interval [0; T], the *CF*(*s*) is calculated according to the following formula:

$$CF(s) = \frac{\max\_{t \in [0, T]} |s(t)|}{\sqrt{\frac{1}{T} \int\_0^T |s(t)|^2 dt}}. \tag{3}$$
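A sampled version of Equations (2) and (3) is straightforward; as a sanity check, a single sine observed over an integer number of periods has CF = √2 ≈ 1.414:

```python
import numpy as np

def crest_factor(s):
    """CF = peak / RMS, the sampled analogue of Eq. (3)."""
    s = np.asarray(s, dtype=float)
    return float(np.max(np.abs(s)) / np.sqrt(np.mean(s ** 2)))

t = np.linspace(0.0, 1.0, 10_000, endpoint=False)   # 5 full periods of a 5 Hz sine
cf = crest_factor(np.sin(2.0 * np.pi * 5.0 * t))
print(round(cf, 3))   # -> 1.414, i.e., sqrt(2) for a single sine
```

For multi-sine signals the CF depends strongly on the phase angles, which is exactly what the optimization schemes in the next section exploit.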

### **2. State of the Art**

Current methods used for optimization of multi-sine signals are either analytical approaches for calculating the phase angles or iterative algorithms. The idea behind the analytical formulas is to control the crest factor (CF) by appropriately choosing the phases of the regarded components of the multi-sine signal. One of the first attempts to solve this problem was proposed by Schroeder [14]. His approach adapted the phase angles of the single multi-sine components according to the following formula:

$$\varphi_m = \varphi_0 - 2\pi \sum_{n=0}^{m-1} (m-n) \frac{\left| a_n \right|^2}{\sum_{k=0}^{M-1} \left| a_k \right|^2},\tag{4}$$

with *an* being the amplitude of the *n*-th component, *m* = 1, ... , *M* − 1, and *ϕ0* ∈ [−*π*, *π*]. Schroeder's approach was adapted by Newman [15], yielding a slightly different formula for the optimal phase angles:

$$
\varphi\_n = \frac{\pi n^2}{N}, \ n = 0, 1, \dots, N - 1. \tag{5}
$$

A major difference between these two formulas is that Schroeder took the amplitude values of the different frequency components of the multi-sine signal into account, whereas Newman's method only uses the number of exciting frequencies. Therefore, Schroeder's scheme often gets better results in the case of non-constant amplitudes [13].
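Both phase schemes can be sketched as follows (our reading of Equations (4) and (5); function names are ours):

```python
import numpy as np

def schroeder_phases(amps, phi0=0.0):
    """Phase angles per Equation (4); p[n] is the relative power of component n."""
    a = np.abs(np.asarray(amps, dtype=float))
    p = a ** 2 / np.sum(a ** 2)
    M = len(a)
    phases = np.full(M, phi0)
    for m in range(1, M):
        phases[m] = phi0 - 2.0 * np.pi * sum((m - n) * p[n] for n in range(m))
    return phases

def newman_phases(N):
    """Phase angles per Equation (5): phi_n = pi * n**2 / N."""
    n = np.arange(N, dtype=float)
    return np.pi * n ** 2 / N
```

For equal amplitudes, the relative power of every component is 1/*M*, and Schroeder's scheme reduces to a quadratic phase progression, which is why it behaves comparably to Newman's formula in that case.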

Since then, further analytical approaches to this problem have been presented. In recent years, several formulas were introduced by Ojarand [16]. These three equations, Φ<sup>1</sup><sub>*i*</sub>, Φ<sup>2</sup><sub>*i*</sub>, and Φ<sup>3</sup><sub>*i*</sub>, calculate the phase angle for frequency component *i*. They are easy to compute and behave well, especially for sparse frequency distributions; the first formula also shows promising results for denser frequency distributions.

$$\Phi_i^1 = B i^2, \qquad \Phi_i^2 = 180\,\frac{B}{i}, \qquad \Phi_i^3 = 180\,\frac{B}{\sqrt{i}}.\tag{6}$$

The parameter *B* can be freely chosen between 0 and 180, and *i* denotes the currently regarded frequency component. The five formulas described above all behave differently depending on the distribution of the frequencies. Nonetheless, it must be pointed out that these analytical methods do not achieve acceptable results in terms of the CF.
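The three formulas of Equation (6) can be sketched as follows; treating the result as degrees, indexing the components from *i* = 1, and the default *B* = 90 are our assumptions:

```python
import numpy as np

def ojarand_phases(n_components, B=90.0, variant=2):
    """Phase angles (assumed in degrees) per Equation (6), for components i = 1..N."""
    i = np.arange(1, n_components + 1, dtype=float)
    if variant == 1:
        return B * i ** 2
    if variant == 2:
        return 180.0 * B / i
    if variant == 3:
        return 180.0 * B / np.sqrt(i)
    raise ValueError("variant must be 1, 2, or 3")
```

In practice, the resulting angles would be wrapped into one full turn before being used to modulate the signal.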

Using iterative algorithms to optimize the phase angles promises to be a more satisfying approach. Several well-performing algorithms have been presented over the last decades, yet none of them yields provably optimal solutions; these can so far only be obtained by an exhaustive search of all possible phase combinations. Within the last few years, Ojarand proposed two different iterative algorithms [16,17].

The first one, presented in 2014, optimizes the phase angles by selectively searching through a given range of phase angles. Hence, this algorithm takes a long time to find suitable phases for each frequency. For 20 or more frequency components, it would take days to optimize the phase angles, which makes this algorithm unsuitable for industrial applications.

In 2017, Ojarand et al. presented another algorithm, this time addressing the well-known problem of iterative CF-minimization algorithms getting stuck in local minima [17]. The main idea of their new method is to start the iteration with a fixed phase set calculated by an analytical formula. After that, the multi-sine signal is modulated. Then, the main part of the algorithm starts, consisting of another iteration: first, the Fourier spectrum is built and the inverse discrete Fourier transform (IDFT) is calculated. Thereafter, the CF of the resulting signal is calculated and compared to the currently lowest CF. In the case of an improvement, the currently optimal phase set is updated. Subsequently, a logarithmic clipping function is applied to the current multi-sine signal, and the discrete Fourier transform (DFT) of the clipped signal is calculated. This last step yields a new phase set that may produce a lower CF. The inner iteration can be executed an arbitrary number of times; at the end, a new phase set is calculated by one of the analytical formulas, and the whole process described above is repeated.

As mentioned before, the first algorithm achieves quite low CFs, because it selectively searches the whole given phase spectrum for a near-to-optimal phase angle combination. This comes at the cost of taking more and more time for an increased number of frequencies in the multi-sine signal. Therefore, this algorithm is not useful for the application discussed in this paper. The second algorithm on the other hand returns promising results, especially for signals whose frequencies are distributed over a rather small bandwidth.

### **3. Optimization Approaches**

In this section, three new iterative algorithms that minimize the CF are presented and subsequently tested to compare their performance in terms of CF minimization to the algorithm Ojarand presented in 2017.

### *3.1. Iterative–Stochastic Optimization*

The first new algorithm comprises two separate components, an iteratively operating algorithm and a stochastic algorithm. The iteratively working algorithm optimizes the CF of the currently regarded multi-sine signal by optimizing the phase angles.

An overview of the workflow of the algorithm is given in Figure A1 (Appendix A). The algorithm starts by calculating a first set of phases, e.g., with Equation (6); in our implementation, we use the formula Φ<sup>2</sup><sub>*i*</sub> = 180 *B*/*i*. Then, each phase angle is regarded and optimized separately: each phase angle is increased once and decreased once, and the multi-sine signal is modulated using the new phase angle, resulting in a new multi-sine signal differing in only one component. At the end of this iteration, the CF of the new signal is calculated and compared to the previous one to update the best-found phase set. Summing up, this iteratively working algorithm searches for a minimum located close to the supplied set of phases. Therefore, it only delivers a local optimum and should be used in combination with other, globally acting algorithms.
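A minimal sketch of one pass of such a local search, under the assumption of a fixed phase step and a signal modulated on a uniform time grid (all names and parameter values are ours; the helpers are re-defined here so the block is self-contained):

```python
import numpy as np

def multisine(t, freqs, amps, phases):
    return sum(a * np.cos(2.0 * np.pi * f * t + p)
               for f, a, p in zip(freqs, amps, phases))

def crest_factor(s):
    s = np.asarray(s, dtype=float)
    return float(np.max(np.abs(s)) / np.sqrt(np.mean(s ** 2)))

def iterative_pass(t, freqs, amps, phases, step=0.1):
    """One pass over all components: nudge each phase up and down by `step`,
    keeping any change that lowers the CF (local search around the given phases)."""
    phases = np.array(phases, dtype=float)
    best_cf = crest_factor(multisine(t, freqs, amps, phases))
    for i in range(len(phases)):
        for delta in (step, -step):
            trial = phases.copy()
            trial[i] += delta
            cf = crest_factor(multisine(t, freqs, amps, trial))
            if cf < best_cf:
                best_cf, phases = cf, trial
    return phases, best_cf
```

Repeating such passes until no nudge improves the CF converges to a local optimum near the starting phase set, as described above.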

To mitigate this drawback of the first algorithm, a second, stochastically operating algorithm was developed to be combined with it. The idea behind this algorithm is to calculate random phase angles *ϕi* and amplitude values *ai* within specified ranges. For the *i*-th component, these ranges are defined as follows:

$$W_{\varphi} = [0,\, 2\pi], \qquad W_{a_i} = [0.9A_i,\, 1.1A_i],\tag{7}$$

where *Ai* is the amplitude value of the *i*-th component calculated at the beginning of the algorithm.

We restrict the amplitudes because the originally prescribed distribution of the amplitudes should preferably keep its form. As a result, a newly calculated amplitude value may deviate by at most 10% from the value calculated at the beginning. The stochastic algorithm alternately calculates a set of random phase angles and then a set of random amplitude values. These are used to modulate a new multi-sine signal, whose CF is compared to the lowest CF reached so far. In the case of an improvement, the algorithm saves the better phase and amplitude values and continues the random calculations.
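A sketch of the random sampling step, following the value ranges of Equation (7) (the helper name and the fixed seed are ours):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_candidate(base_amps):
    """Draw phases uniformly in [0, 2*pi] and amplitudes within +/-10% of the
    initial values A_i, per the value ranges of Equation (7)."""
    A = np.asarray(base_amps, dtype=float)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=A.shape)
    amps = rng.uniform(0.9 * A, 1.1 * A)
    return phases, amps
```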

To combine the benefits of both above-described algorithms, the iterative–stochastic optimization algorithm alternates between the iterative and the stochastic algorithm. In this way, the chance of finding the global minimum rises, and the chance of getting stuck in a local minimum is minimized. The algorithm terminates when a set number of CF calculations is reached.

### *3.2. Simulated Annealing*

In addition, a simulated annealing (SA) approach is adapted for the specified problem. SA is a metaheuristic approach to approximate a global optimum. The algorithm consists of iteratively executed steps. For each problem, the components of the annealing schedule, acceptance probability *p*, current state *s*, state transition, and cost function have to be defined [18].

The annealing schedule reduces the start temperature *T0* at each iteration *k* until it reaches the final temperature *Tf*, at which point the algorithm terminates. We used a schedule in which the current temperature *Tk* is reduced at each iteration *k* by the cooling factor *c* according to

$$T\_k = T\_{k-1} \times c.\tag{8}$$

Furthermore, the algorithm keeps track of the current state that corresponds to a possible solution for the problem. Each state consists of a phase angle and amplitude value for each frequency component. At the start, the state of both parameters is randomly initialized. Afterward, a neighbor state is selected at each iteration by changing a phase angle or amplitude value of a randomly chosen frequency component according to the value range of Equation (7).

The current state *sk−1* transitions into the neighbor state *sk* if the crest factor (CF) of the neighbor state is smaller than that of the current state. It also transitions into the neighbor state with the following acceptance probability:

$$p_k = e^{-\frac{CF(s_k) - CF(s_{k-1})}{T_k}}.\tag{9}$$

Otherwise, the current state is kept. At the end, the algorithm returns the state with the lowest CF. The specific configuration of the algorithm is based on several test runs and consists of the following parameters: *T0* = 100, *Tf* = 0.00005, and *c* = (*Tf*/*T0*)<sup>1/*nCF*</sup>, where *nCF* specifies the number of CF calculations. The parameter *nCF* can be chosen freely and determines the runtime duration.
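Putting the pieces together, a generic SA sketch with the schedule of Equation (8), the acceptance probability of Equation (9), and the cooling factor defined above might look as follows (the demo cost function, bounds, and seed are ours, standing in for the CF objective):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulated_annealing(cost, state, low, high, n_cf=2000, T0=100.0, Tf=5e-5):
    """SA sketch: geometric cooling with c = (Tf/T0)**(1/n_cf), so the temperature
    reaches Tf after n_cf steps, and Metropolis acceptance per Equation (9)."""
    c = (Tf / T0) ** (1.0 / n_cf)
    T = T0
    current = np.array(state, dtype=float)
    best = current.copy()
    f_cur = f_best = cost(current)
    for _ in range(n_cf):
        neighbor = current.copy()
        i = rng.integers(len(neighbor))          # change one randomly chosen entry
        neighbor[i] = rng.uniform(low, high)
        f_new = cost(neighbor)
        if f_new < f_cur or rng.random() < np.exp(-(f_new - f_cur) / T):
            current, f_cur = neighbor, f_new
            if f_cur < f_best:
                best, f_best = current.copy(), f_cur
        T *= c
    return best, f_best
```

For the CF problem, the state would hold the phase angles and amplitude values of all components, with the neighbor move restricted to the ranges of Equation (7).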

### *3.3. Genetic Algorithm*

The genetic algorithm (GA) is another metaheuristic approach that we adopted for the specified problem. GA is part of the domain of evolutionary algorithms and is defined by the following components: initialization, selection, crossover, and mutation [19].

At the beginning, multiple candidate solutions are randomly generated to build a start population. Each of these is characterized by chromosomes that model the properties of a solution. A chromosome, in turn, is modeled as the phase angle and amplitude value of a frequency component. Afterward, the iterative process begins with the selection of parents for the next generation.

We used tournament selection for the selection process. The tournament size *k* was set to the value of 3, and each candidate was chosen randomly. The best candidate of the tournament was selected on the basis of the lowest CF. Then, a crossover operation generated the offspring by combining two selected individuals. For that matter, a uniform crossover that chooses random chromosomes from either parent with equal probability was used.

A mutation operation at the end of the iteration changed for each offspring the amplitude and phase angle of each frequency component with the following probability:

$$p\_{\rm mut} = \frac{1}{2 \times n\_{freq}},\tag{10}$$

where *nfreq* specifies the number of frequency components. The mutation changes a phase angle or amplitude value by randomly choosing a new value within the ranges of these parameters specified in Equation (7).

These steps were repeated until the final number of generations *ngeneration* was reached. To compare our approaches, we used the number of CF calculations *nCF* to specify the runtime duration. Therefore, we set

$$n\_{\text{generation}} = \frac{n\_{CF}}{n\_{pop}},\tag{11}$$

where *npop* is the population size. We set the population size equal to 100 and the probability for keeping the original parent in the next population to 40%. The specified configuration was determined by trial and error.
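The GA loop can be sketched generically as follows; the tournament size, uniform crossover, and mutation probability follow the description above, while the 40% parent-retention rule is omitted for brevity and the demo cost function and seed are ours:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def genetic_minimize(cost, n_genes, low, high, n_pop=20, n_generations=50, k_tour=3):
    """GA sketch: tournament selection (k = 3), uniform crossover, and the
    per-gene mutation probability of Equation (10)."""
    pop = rng.uniform(low, high, size=(n_pop, n_genes))
    fit = np.array([cost(ind) for ind in pop])
    p_mut = 1.0 / (2.0 * n_genes)                    # Equation (10)
    for _ in range(n_generations):
        children = np.empty_like(pop)
        for j in range(n_pop):
            # Tournament selection of two parents (lowest cost wins the tournament).
            parents = []
            for _ in range(2):
                idx = rng.integers(n_pop, size=k_tour)
                parents.append(pop[idx[np.argmin(fit[idx])]])
            # Uniform crossover: each gene from either parent with equal probability.
            mask = rng.random(n_genes) < 0.5
            child = np.where(mask, parents[0], parents[1])
            # Mutation: re-sample each gene with probability p_mut.
            mut = rng.random(n_genes) < p_mut
            child[mut] = rng.uniform(low, high, size=int(mut.sum()))
            children[j] = child
        pop = children
        fit = np.array([cost(ind) for ind in pop])
    i_best = int(np.argmin(fit))
    return pop[i_best], float(fit[i_best])
```

For the CF problem, each individual would hold one phase angle and one amplitude value per frequency component, and `cost` would be the crest factor of the modulated signal.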

### *3.4. Experiments*

According to the intended application scenarios, different parameters were selected for a detailed investigation. Three amplitude distributions—uniform, linear, and exponential (the latter two decreasing with increasing frequency)—were of importance for our research. The upper bandwidth was limited to 10<sup>6</sup> Hz, as this was a reasonable tradeoff between CF reduction and calculation effort. The frequency distribution was fixed, as was the number of iterations.

All possible combinations from Table 1 were tested for each algorithm. A random start state, identical for all algorithms, was used. Furthermore, each configuration was executed five times to account for stochastic effects; the exception was the algorithm from Ojarand, as it always delivers the same result.

**Table 1.** Experiment configuration.


<sup>1</sup> Number of CF calculations.

To conduct the experiments, the calculations were executed on Microsoft Azure using the programming language Python. The used hardware configuration was a Standard-F32sv2 compute unit using 32 virtual Intel(R) Xeon(R) Platinum 8272CL CPUs with 2.60 GHz. The memory size and the storage size (SSD) were set to 64 GiB. A separate process with an individual configuration was started on each CPU core in parallel.

### **4. Results and Discussion**

### *4.1. CF Minimization*

The results for the specified configurations and algorithms are visualized in Figure 1. Each subfigure represents the results for a specific amplitude distribution. The abbreviation Mixed stands for the iterative–stochastic optimization algorithm, and Clip stands for the clipping-based algorithm presented by Ojarand [17]. Furthermore, Schroeder's analytic formula from Equation (4) was used as a baseline. Since SA, GA, and Mixed use stochastic elements, they were run several times; thus, the mean and standard deviation of the CF after *nCF* iterations are shown.

#### **Figure 1.** CF results for all runs.

All presented algorithms outperformed Ojarand's algorithm and Schroeder's formula in terms of CF. SA generally delivered the best results, followed by GA and the iterative–stochastic algorithm. Especially for multi-sine signals with high frequencies, the advantage became significant, with up to 1–1.5-fold improvements in CF.

In summary, a broad range of algorithms was compared under predefined conditions. The advantage is that comparability between different algorithms under uniform conditions was established. Nevertheless, only a limited selection of algorithms and of hyperparameter optimization was applied. Therefore, it is not guaranteed that no more suitable algorithms exist for the specified problem. Furthermore, the presented algorithms did not always deliver the same results. We tried to analyze this effect by running the algorithms multiple times, but five repetitions were not enough to make a general statement. Nevertheless, our tests indicate only small deviations between different runs.

### *4.2. Time per CF Reduction*

The runtime and progression of the algorithms were analyzed by investigating the improvement of the CF over time. Figure 2 illustrates this process. Accordingly, the improvement ΔCF was calculated as the difference between the best CF found up to a specific iteration and the start value. Only the results for 50 frequency components are shown due to limited space; nevertheless, the CF progression for the other numbers of frequency components was similar. For each iteration, the mean and standard deviation over all five repetitions of an algorithm were calculated and visualized, with one exception: due to the lack of stochastic elements, the Clip algorithm was not run multiple times. The actual runtime can be calculated by taking the time per iteration from Table 2.
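The described ΔCF bookkeeping amounts to a running minimum of the CF history relative to the start value (a one-line sketch; the helper name is ours):

```python
import numpy as np

def cf_improvement(cf_history):
    """Improvement per iteration: best CF found so far minus the start value."""
    cf = np.asarray(cf_history, dtype=float)
    return np.minimum.accumulate(cf) - cf[0]
```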

**Figure 2.** Progression for 50 frequency components.



The results show that the final CF of Clip and Schroeder was surpassed by all presented algorithms after only a few iterations. The results using Schroeder's formula were similar to the final CF of the Clip algorithm (see Figure 1 for comparison). The strong reduction lasted until iteration 10,000 before slowing down. SA was the exception because it depends on the predefined annealing schedule; therefore, its characteristic curve is always the same regardless of the number of iterations. Nevertheless, the curve of SA showed that the start phase of the algorithm was purely random and the last phase seemed very long. This indicates that a more diligent hyperparameter search could accelerate the algorithm.

Furthermore, none of the algorithms were optimized for runtime, and the values from Table 2 are only reference values. These runtime values depend strongly on various parameters, such as the hardware configuration, programming language, or parallelization. Another topic is the standard deviation across different runs compared between the algorithms. In general, the deviation was less pronounced for the SA algorithm than for the iterative–stochastic and GA methods.

### **5. Conclusions and Future Work**

Using multi-sine signals with a low CF is a necessity for high-precision measurement systems such as the DEA, which rely on the comparison of excitation and response signals. Due to nonlinearities in the electrical components, as well as disturbances in the measuring path, a signal with a low CF is favorable, as the frequency analysis becomes more robust, resulting in a less error-prone measurement device.

With an increase in bandwidth, an increase in the number of monitored frequencies is required, especially if the analysis is difficult to support with a model-based approach that fits a model to small sets of measurement points. For industrial applications, where environmental influences, contamination, material aging, and differences between material batches are more the rule than the exception, a fast implementation is required and a simplified approach is preferred. The presented methods showed significant improvements in reaching a low CF, as well as in obtaining a fair CF in a short amount of time, especially for high frequencies over a wide bandwidth. Thus, the foundation was laid to apply analytical methods as a function of time-dependent frequency behavior over a wide bandwidth, which opens up new paths for the investigation of fast-curing adhesives or similar chemical processes and of phase changes in thermoplastics.

Our plans for future research include investigating the effect of different start values, e.g., from Schroeder's or Newman's formula, on the performance and results of our presented algorithms. These formulas yielded considerably better results than random values for the phase angles and could, therefore, have a significant impact. Further improvement of the presented algorithms will also be a topic. For the iterative–stochastic method, a selective search instead of the current iterative search is a time-saving option. In addition, the runtime needs to be inspected further; for example, the algorithms could be sped up by using a more runtime-oriented programming language or parallelization. Another issue is ensuring the reliability of the algorithms by increasing the number of repetitions.

**Author Contributions:** Conceptualization, S.G. and M.E.; methodology, S.G., M.E., and C.L.; software, M.E., H.A., and C.L.; validation, H.A. and C.L.; formal analysis, H.A. and C.L.; writing—original draft preparation, M.E., C.L., H.A., and S.G.; writing—review and editing, M.E., C.L., H.A., and S.G.; visualization, C.L. and H.A.; supervision, S.G.; project administration, S.G.; funding acquisition, S.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Administration of Swabia and the Bavarian Ministry of Economic Affairs and Media, Energy, and Technology, funding number: 43-6622/485/2.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

**Appendix A**

**Figure A1.** Iterative optimization algorithm.

### **References**


### *Proceeding Paper* **Two-Scale Deep Learning Model for Polysilicon MEMS Sensors †**

**José Pablo Quesada-Molina 1,2,\* and Stefano Mariani <sup>1</sup>**


**Abstract:** Microelectromechanical systems (MEMS) are often affected in their operational environment by different physical phenomena, each one possibly occurring at different length and time scales. Data-driven formulations can then be helpful to deal with such complexity in their modeling. By referring to a single-axis Lorentz force micro-magnetometer, characterized by a current flowing inside slender mechanical parts so that the system can be driven into resonance, it has been shown that the sensitivity to the magnetic field may become largely enhanced through proper (topology) optimization strategies. In our previous work, a reduced-order physical model for the movable structure was developed; such a model-based approach did not account for all the stochastic effects leading to the measured scattering in the experimental data. A new formulation is here proposed, resting on a two-scale deep learning model designed as follows: at the material level, a deep neural network is used a priori to learn the scattering in the mechanical properties of polysilicon induced by its morphology; at the device level, a further deep neural network is used to account for the effects on the response induced by etch defects, learning on-the-fly relevant geometric features of the movable parts. Some preliminary results are here reported, and the capabilities of the learning models at the two length scales are discussed.

**Keywords:** microelectromechanical systems (MEMS); Lorentz force micro-magnetometer; polysilicon; deep learning; neural network; stochastic effects

### **1. Introduction**

In recent years, the development of affordable and highly specialized hardware, designed to optimize large data computations via parallel processing [1,2], has propelled the widespread use of data-driven algorithms such as machine learning (ML). This paradigm is revolutionizing the approach to research in numerous areas, including the field of materials science [3–5].

The most popular types of ML algorithms are artificial neural networks (ANNs). In their simplest modern form, feedforward neural networks (FFNNs) [6,7] are obtained by assembling a number of layers of interconnected perceptrons [8]; this architecture is typically referred to as the multilayer perceptron (MLP). By stacking a large enough number of layers, we enter the realm of deep learning (DL), a subfield of ML that leverages many levels of non-linear information processing and abstraction to perform complex learning tasks on unstructured input information [9]. In this context, a popular subtype of ANNs are convolutional neural networks (CNNs). CNNs are well suited for input data featuring spatial correlation [10], as they are able to learn position- and scale-invariant structures in the data. This makes CNNs particularly efficient for tasks that rely upon auto-correlated and sequential data analysis, such as image recognition in computer vision, time-series forecasting, or speech recognition in natural language processing. In the field of materials science, a significant number of CNN applications can be found in areas such as material texture recognition [11–14] and structure-to-property mapping [15–19].

**Citation:** Quesada-Molina, J.P.; Mariani, S. Two-Scale Deep Learning Model for Polysilicon MEMS Sensors. *Comput. Sci. Math. Forum* **2022**, *2*, 12. https://doi.org/10.3390/IOCA2021-10888

Academic Editor: Frank Werner

Published: 22 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this work, we propose an ML approach based on the implementation of an ensemble of ANN architectures to produce an accurate mapping between the structure of a polysilicon microelectromechanical system (MEMS) device and its effective response. We consider the Lorentz force MEMS magnetometer introduced in [20,21] and propose a new formulation resting on a two-scale deep learning model designed as follows: at the material level, a deep neural network is used a priori to learn the scattering in the mechanical properties of polysilicon induced by its morphology; at the device level, a further deep neural network is used to account for the effects on the response induced by etch defects, learning on-the-fly relevant geometric features of the movable parts. Hence, material- and geometry-related uncertainty sources, whose effects have been formerly studied and observed to intensify with the reduction in size [22–27], are accounted for in this formulation. In concrete terms, the response is characterized in terms of the maximum oscillation amplitude of the resonant structure, a significant design parameter due to its direct relation with relevant figures of merit of the entire device, such as the responsivity and resolution [28]. The ground-truth data required for the training, validation, and testing of the proposed data-driven model are obtained via the reduced-order model derived in [20,21].

The remainder of this work is organized as follows. The model of the polysilicon Lorentz force MEMS magnetometer and the intrinsic uncertainty sources at the microscale are discussed in Section 2. Section 3 provides the methodology adopted for the implementation of the neural network-based model. Results are reported and analyzed in Section 4. Finally, concluding remarks and insights for future research work are collected in Section 5.

### **2. Model of the Polysilicon MEMS and Intrinsic Uncertainties**

### *2.1. Oscillation Amplitude of the Lorentz Force MEMS Magnetometer*

Among the various approaches to magnetic sensing [29], Lorentz force MEMS sensors operate by detecting the effects of the Lorentz force acting on a current-carrying conductor immersed in the magnetic field. The device is able to sense a magnetic field aligned with the out-of-plane direction *z* (see Figure 1) through the measurement of the in-plane motion of the beam. In the sketch, *g* is the gap between the two surfaces of the parallel-plate capacitors (used for sensing purposes), *h* is the in-plane width, and *L* is the overall length of the flexible beam. The out-of-plane thickness is denoted by *b*, so that the area of the rectangular cross-section is given by *A* = *bh*, and the moment of inertia is *I* = *bh*<sup>3</sup>/12. The beam is made of a polycrystalline silicon film with a columnar structure, and the elastic properties governing its in-plane vibrations are assumed to be obtained through homogenization over a statistical volume element (SVE) of the polysilicon film.

**Figure 1.** (**a**) SEM picture of the resonant structure of the MEMS magnetometer [20]; (**b**) Scheme of the slender beam of length *L*. Parallel plates are connected to the mid-span of the beam for capacitive sensing [21].

During the operation, the mechanical structure is driven into resonance in order to obtain the maximum output signal. Larger vibration amplitudes are linked to higher responsivity and, therefore, better resolution. Neglecting a sequence of derivation steps (interested readers can find them in [20,21]), the maximum amplitude of the oscillations at the mid-span cross-section, ν*max*, is obtained as the solution of the following equation:

$$\left(\frac{F_0}{K_1}\right)^2 = \left(2\left(1 - \frac{\omega}{\omega_1}\right)\nu_{max} + \frac{3}{4}\frac{K_3}{K_1}\nu_{max}^3\right)^2 + \left(\frac{d}{m\omega_1}\nu_{max}\right)^2\tag{1}$$

Terms in Equation (1) represent the effective mass (*m*), air damping (*d*), linear and cubic stiffnesses (*K*1 and *K*3), amplitude of the oscillating external Lorentz force (*F*0), frequency of the forcing term (*ω*), and natural frequency of the beam (*ω*1). The dynamics of this system is governed by weakly coupled thermo-electro-magneto-mechanical multi-physics, which becomes apparent when writing explicit expressions for each of the previous terms. Nonetheless, the key aspect to highlight is that, since the solution for ν*max* depends on *K*1 and *K*3, which in turn depend on the flexural (*EI*) and axial (*EA*) rigidities of the beam, uncertainties in the values of the homogenized Young's modulus *E* and the in-plane width *h* (induced by defects such as the over-etch *O*) produce a scattering in the expected value of ν*max*.
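Reading the left-hand side of Equation (1) as (*F*0/*K*1)², the equation becomes a cubic in ν²max and can be solved directly; the parameter values in the example below are hypothetical, chosen only to exercise the linear limit (*K*3 = 0):

```python
import numpy as np

def max_amplitudes(F0, K1, K3, d, m, omega, omega1):
    """Solve Equation (1) for nu_max. Substituting x = nu_max**2 gives the cubic
    b^2 x^3 + 2ab x^2 + (a^2 + c^2) x - (F0/K1)^2 = 0, with a = 2(1 - omega/omega1),
    b = (3/4) K3/K1, c = d/(m omega1). Returns all real positive solutions
    (more than one in the bistable regime of the Duffing-like response)."""
    a = 2.0 * (1.0 - omega / omega1)
    b = 0.75 * K3 / K1
    c = d / (m * omega1)
    R = (F0 / K1) ** 2
    roots = np.roots([b ** 2, 2.0 * a * b, a ** 2 + c ** 2, -R])
    x = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0.0]
    return sorted(float(np.sqrt(xr)) for xr in x)
```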

### *2.2. Sources of Uncertainty in Polysilicon MEMS*

Successful incorporation of MEMS-based products into the market hinges on the ability to engineer these components with sufficient reliability for the intended applications [30]. Therefore, efforts to characterize the small-scale and scale-specific properties of materials are significantly driven by the need to predict the performance of MEMS and other microscale devices [31]. Manufacturing processes of MEMS are typically subject to limited repeatability, which prevents deterministic nominal geometries and material properties; these are instead characterized by a scattering around the nominal values [32].

For the particular case of polysilicon-based MEMS, two main sources of uncertainty have been observed to intensify with the miniaturization of the devices [22–27]. The first source is related to the limits of the production process, i.e., it matters when the size of the MEMS is of the same order of magnitude as the tolerances established by the microfabrication process. The second source is instead associated with the intrinsic heterogeneity of the material, i.e., it matters when the size of the MEMS is of the same order of magnitude as the characteristic length of the heterogeneities present in the material (for example, the grain size in polycrystalline materials). Since both sources of uncertainty are governed by variables that are stochastic in nature, statistical approaches need to be adopted to quantify their impact on the final properties. In practice, the effects of the first uncertainty type can be accounted for in terms of a defect parameter called the over-etch depth *O*, while those associated with the second type can be accounted for in terms of the scattering observed in the apparent elastic properties (e.g., the homogenized Young's modulus *E*).

By following the procedure proposed in [22], it is possible to characterize the scattering of the homogenized in-plane Young's modulus for SVEs of polysilicon films featuring different sizes, i.e., *h* = {2, 5} μm. The size-dependent statistics have been found to be well fitted by lognormal distributions, and the relevant parameters are reported in Table 1.


**Table 1.** Statistical indicators characterizing *E* for different SVE sizes, obtained with uniform strain boundary conditions.

On the other hand, geometry-related uncertainties can be handled in accordance with former studies [23,25], wherein the over-etch depth *O* was sampled from a microfabrication-tailored normal distribution featuring zero mean *μ* and a standard deviation *σ* = 0.05 μm. Assuming *O* to be homogeneously distributed, it changes the in-plane film width to *h* − 2*O* (*h* represents only a target size); accordingly, the cross-sectional area *A* of the beam is affected linearly by *O*, whereas the moment of inertia *I* is affected cubically. The gap at the capacitors is also modified to *g* + 2*O*, where *g* again represents a target size.
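A sketch of how these two geometric effects propagate into the cross-sectional properties; the target sizes and the out-of-plane thickness *b* used in the check are hypothetical, and only *σ* = 0.05 μm is taken from the text:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def sample_geometry(h, g, b, sigma_O=0.05):
    """Sample an over-etch depth O ~ N(0, 0.05 um) and apply it to the target
    geometry: width h - 2O, gap g + 2O, area b*(h - 2O), inertia b*(h - 2O)**3/12."""
    O = rng.normal(0.0, sigma_O)
    h_eff = h - 2.0 * O
    g_eff = g + 2.0 * O
    A = b * h_eff                 # affected linearly by O
    I = b * h_eff ** 3 / 12.0     # affected cubically by O
    return O, h_eff, g_eff, A, I
```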

### **3. Methodology**

### *3.1. Representation of the Resonant Structure*

The resonant structure is considered as a concatenation of square SVEs. These SEM-like subdomains are digitally generated via the regularized Voronoi tessellation procedure described in [22]. The input images are characterized by two different target in-plane widths, *h* = 2 μm and 5 μm.

Given the symmetry displayed by the elastic properties as a function of the orientation in a silicon (1 0 0) wafer (see [33]), the Young's modulus of the monocrystalline domains remains invariant under a series of similarity transformations. This particularity motivated a data augmentation (D.A.) procedure wherein, for each original (parent) SVE, seven new instances are generated, all linked to the same ground-truth value as the parent. The resonant structure is then regarded as a random concatenation (in space and frequency) of a parent SVE and its corresponding instances. Figure 2 illustrates these aspects.

**Figure 2.** Example of parent SVE with *h =* 2 μm, its instances, and the resonant structure.

The specific similarity transformations used for the D.A. procedure were three counterclockwise 90° rotations and four mirror transformations (horizontal, vertical, and about the two diagonals). Moreover, the pixel values, ranging in [0, 255], encode the in-plane lattice orientation angle of each monocrystalline domain.
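The seven instances of each parent SVE can be generated with plain NumPy array operations; in the sketch below, the helper name `augment_sve` is ours:

```python
import numpy as np

def augment_sve(img: np.ndarray) -> list:
    """Return the seven D.A. instances of a parent SVE image:
    three counterclockwise 90-degree rotations and four mirror
    transforms (horizontal, vertical, and the two diagonals)."""
    return [
        np.rot90(img, 1),    # 90 deg counterclockwise
        np.rot90(img, 2),    # 180 deg
        np.rot90(img, 3),    # 270 deg
        np.flipud(img),      # mirror about the horizontal axis
        np.fliplr(img),      # mirror about the vertical axis
        img.T,               # mirror about the main diagonal
        np.rot90(img, 2).T,  # mirror about the anti-diagonal
    ]

parent = np.arange(128 * 128, dtype=np.uint8).reshape(128, 128)
instances = augment_sve(parent)
```

Each returned array is paired with the same ground-truth Young's modulus as the parent image.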

### *3.2. The Neural Network-Based Model*

Starting with a general description of the neural network-based model, Figure 3 illustrates the overall model topology. Two stages can be distinguished in the proposed two-scale deep learning approach. At the material level (first stage), the digitally generated 128 × 128 pixel SVE images are fed to the ResNet-based model developed in our former work [34], which provides the estimate *Ê* of the homogenized in-plane Young's modulus. At the device level (second stage), the model takes the same SVE image (regarded as characteristic of the microstructure of the resonant structure), the estimate *Ê*, and an associated over-etch value *O* (sampled from the relevant probability distribution described in Section 2.2). Unlike the first stage, the second stage handles multiple, mixed-type inputs. After training, the second stage maps this input information to the maximum oscillation amplitude *νmax*, treated as the target variable of the entire model.

**Figure 3.** Topology for the proposed data-driven model.

A detailed description of each of the components is presented hereafter. First, the ResNet-based regression leverages residual learning [35] for the feature learning stage. Local and translation-invariant low-level features, such as colors, edges, and shapes of the grains, are extracted in the initial convolutional layers. These features are then combined through further convolution operations in deeper layers, to achieve complex levels of abstraction and obtain high-level features, from which the model is ultimately able to produce the estimate *Ê* of the homogenized in-plane Young's modulus. More details about the training, validation, and testing of this model can be found in [34].

Figure 4 shows, in detail, the architectures used to learn the mapping at the device level. The single-neuron output layer of the CNN branch, and the single-neuron output layer of the MLP1 branch, are fully connected to the eight-neuron input layer of the MLP2.

**Figure 4.** Architectures of the model used for the mapping at the device level.

The backbone of the CNN architecture employs a consecutive application of Convolution, ReLU activation, Batch Normalization, and Max-Pool operations. After the last Max-Pool layer, a flatten operation enables the connection of the feature extractor to a set of fully connected layers featuring a 16-node hidden layer and a 1-node output layer. For regularization purposes, dropout was applied to the 16-node hidden layer with a dropout rate of *p =* 0.5.

The MLP1 is composed of a sequence of fully connected layers, featuring an 8-node hidden layer, followed by a 4-node hidden layer and a 1-node output layer. As in the case of the CNN, only the output neuron uses a linear activation function, while the remaining units use ReLU activations. An identical configuration was chosen for MLP2.

Regarding the selection of hyperparameters, the total number of epochs was set to 1000, the patience (early stopping) to 100, the mini-batch size to 10, the learning rate *α* to 1 × 10<sup>−3</sup>, the optimizer to Adam, and the loss function to MSE. Furthermore, the implementation relies on the Keras API. Regarding the hardware, a GeForce GTX 1050 Ti GPU was used.
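Under the Keras API mentioned above, the device-level model of Figure 4 could be sketched as follows. The branch layouts (16-node FC head with 0.5 dropout for the CNN; 8-4-1 layers for MLP1 and MLP2) and the training hyperparameters follow the text, whereas the convolutional filter counts, the single-channel 128 × 128 input, and the assignment of the scalars *Ê* and *O* to the MLP1 input are assumptions of ours:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# --- CNN branch: the 128x128 SVE image (one channel encoding the
# lattice orientation). Filter counts are assumptions; the paper only
# specifies the Conv-ReLU-BatchNorm-MaxPool pattern.
img_in = layers.Input(shape=(128, 128, 1), name="sve_image")
x = img_in
for filters in (16, 32, 64):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
x = layers.Dropout(0.5)(x)
cnn_out = layers.Dense(1, activation="linear")(x)

# --- MLP1 branch: assumed to receive the two scalars E_hat and O.
scalar_in = layers.Input(shape=(2,), name="E_hat_and_O")
m = layers.Dense(8, activation="relu")(scalar_in)
m = layers.Dense(4, activation="relu")(m)
mlp1_out = layers.Dense(1, activation="linear")(m)

# --- MLP2: both 1-node branch outputs feed its 8-neuron input layer.
merged = layers.Concatenate()([cnn_out, mlp1_out])
z = layers.Dense(8, activation="relu")(merged)
z = layers.Dense(4, activation="relu")(z)
v_max = layers.Dense(1, activation="linear", name="v_max")(z)

model = Model(inputs=[img_in, scalar_in], outputs=v_max)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse")
```

Training would then call `model.fit(..., epochs=1000, batch_size=10, callbacks=[tf.keras.callbacks.EarlyStopping(patience=100)])`, matching the hyperparameters above.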

Concerning the data splitting, training, validation, and test sets were considered. Table 2 summarizes the information related to the number of samples and the statistics associated with the ground-truth values characterizing each set.

**Table 2.** Data splitting information.


### **4. Results**

We focus now on the results obtained with the second stage of the model, as related to the mapping of the maximum amplitude of oscillation performed at the device level. During the training, a minimum validation loss of 4.07 × 10<sup>−8</sup> μm<sup>2</sup> was attained after 328 epochs. After this epoch, no considerable improvement was observed on the validation set over the next 100 epochs (set as the patience parameter), inducing the early stopping.

The parity plots in Figure 5 summarize the performance of the trained model over the training, validation, and test sets. These plots show the correlation between the predicted values of *νmax* obtained after the training of the model, and the corresponding ground-truth data provided by the reduced-order analytical model of the Lorentz force MEMS magnetometer (see Section 2.1). In black and grey, we can observe the mapping of the data associated with the 2 × 2 μm<sup>2</sup> and the 5 × 5 μm<sup>2</sup> SVEs, respectively. In light and dark green, we have included the corresponding identity mapping, which represents the ideal behavior we could expect from the model.

**Figure 5.** Parity plots for the model at the device level: (**a**) Training set; (**b**) Validation set; (**c**) Test set.

As anticipated from the low loss values attained during the training, a very good agreement between predicted and ground-truth data is observed for all the sets and, within each of them, for the two different in-plane widths of the SVE samples. To quantify this agreement, the coefficients of determination R<sup>2</sup> are reported in the plots: the values are all close to 1, indicating a good performance. Moreover, as could have been foreseen from the imbalance of the datasets, a slightly better result is observed for the smaller SVE samples.
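The coefficients of determination shown in the parity plots can be computed with scikit-learn; a small sketch with purely illustrative predicted and ground-truth values of *νmax*:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical ground-truth vs. predicted nu_max values (um);
# these numbers are illustrative only, not taken from the paper.
y_true = np.array([1.00, 1.10, 0.95, 1.20, 1.05])
y_pred = np.array([1.01, 1.08, 0.96, 1.19, 1.07])

r2 = r2_score(y_true, y_pred)  # 1 - SS_res / SS_tot, close to 1 here
```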

In this work, the validation set has been employed, as is conventional, to tune the hyperparameters of the model during training. To assess the ability of the model to adapt to new, previously unseen data drawn from the same distribution as that used to train and tune the model, the performance over the test set can be analyzed. As can be observed in the parity plot, the predictions reproduce almost exactly the identity map. In terms of R<sup>2</sup> values, the results are comparable to the performance obtained on the training set, which is a clear indication of the good generalization capability of the model.

### **5. Conclusions**

A data-driven framework has been proposed and effectively implemented for modeling the response of a Lorentz force MEMS magnetometer. By taking as input a digitally generated representative image of the microstructure, together with a characteristic value of the over-etch *O*, the trained model has been able to produce accurate predictions of the maximum oscillation amplitude of the resonant structure. Moreover, the neural network-based model has been able to generalize satisfactorily over unseen samples drawn from the same distribution as the training data.

Future research activities will be oriented to incorporate the complete representation of the microstructure of the resonant structure, avoiding the introduced approximation of it as a random concatenation of an individual SVE and its instances obtained via D.A. An additional homogenization procedure is, therefore, foreseen, to upscale the Young's modulus from the scale of the SVE to the scale of the resonant structure.

The implementation of an additional neural network is also envisioned, to achieve automatic defect detection, enabling *O* to be directly extracted from defect-informed digitally generated images of the microstructure.

**Author Contributions:** Conceptualization, J.P.Q.-M. and S.M.; methodology, J.P.Q.-M. and S.M.; software, J.P.Q.-M. and S.M.; validation, J.P.Q.-M. and S.M.; formal analysis, J.P.Q.-M. and S.M.; investigation, J.P.Q.-M. and S.M.; resources, J.P.Q.-M. and S.M.; data curation, J.P.Q.-M. and S.M.; writing—original draft preparation, J.P.Q.-M. and S.M.; writing—review and editing, J.P.Q.-M. and S.M.; visualization, J.P.Q.-M. and S.M.; supervision, J.P.Q.-M. and S.M.; project administration, J.P.Q.-M. and S.M.; funding acquisition, J.P.Q.-M. and S.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** JPQM acknowledges the financial support by Universidad de Costa Rica, to pursue postgraduate studies abroad.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **A Hybrid Deep Learning Approach for COVID-19 Diagnosis via CT and X-ray Medical Images †**

**Channabasava Chola 1,2, Pramodha Mallikarjuna 1, Abdullah Y. Muaad 1,3,\*, J. V. Bibal Benifa 2, Jayappa Hanumanthappa <sup>1</sup> and Mugahed A. Al-antari 3,4,5,\***


**Abstract:** The COVID-19 pandemic has been a global health problem since December 2019. To date, the total number of confirmed cases, recoveries, and deaths has increased exponentially on a daily basis worldwide. In this paper, a hybrid deep learning approach is proposed to directly classify the COVID-19 disease from both chest X-ray (CXR) and CT images. Two AI-based deep learning models, namely ResNet50 and EfficientNetB0, are adopted and trained using both chest X-ray and CT images. Public datasets consisting of 7863 chest X-ray and 2613 CT images, respectively, are used to deploy, train, and evaluate the proposed deep learning models. The EfficientNetB0 deep learning model consistently achieved better classification results, with overall diagnosis accuracies of 99.36% and 99.23% using CXR and CT images, respectively. For the hybrid AI-based model, an overall classification accuracy of 99.58% is achieved. The proposed hybrid deep learning system seems to be trustworthy and reliable for assisting health care systems, patients, and physicians.

**Keywords:** COVID-19 pandemic; hybrid deep learning model

### **1. Introduction**

The outbreak of COVID-19 is considered an epidemic and pandemic, affecting people around the world within a short period. It is rapidly transmitted among people in different local and global communities due to travel [1]. To date, the number of confirmed cases and deaths has reached 226 million and 4 million worldwide, respectively. COVID-19 is caused by a novel coronavirus named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which targets the human respiratory system. The confirmed biological symptoms of COVID-19 are fever, shortness of breath, dizziness, cough, headache, sore throat, fatigue, and muscle pain. Accurate and rapid classification techniques have become necessary to automatically diagnose COVID-19, especially in a pandemic situation. Recently, AI techniques (deep learning and machine learning) were employed to build robust decision-making systems against COVID-19 [2–4]. Traditionally, COVID-19 screening involves RT-PCR (reverse transcription polymerase chain reaction) carried out in a pathogen laboratory. Due to its higher time consumption and lower sensitivity, medical imaging techniques such as computed tomography (CT) and chest X-ray (CXR) images are being used to fight and classify the COVID-19 respiratory disease [5–7]. The lungs are the major target of the COVID-19 virus. RT-PCR is useful for the diagnosis of the disease, while CT and CXR images are useful to assess the damage caused to the lungs due to

**Citation:** Chola, C.; Mallikarjuna, P.; Muaad, A.Y.; Bibal Benifa, J.V.; Hanumanthappa, J.; Al-antari, M.A. A Hybrid Deep Learning Approach for COVID-19 Diagnosis via CT and X-ray Medical Images. *Comput. Sci. Math. Forum* **2022**, *2*, 13. https:// doi.org/10.3390/IOCA2021-10909

Academic Editor: Frank Werner

Published: 29 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

COVID-19 at various stages of the disease. Inflammation of lung tissues can be identified based on the size and shape of the attacked tissues with the help of X-ray and CT images [3,6]. Deep convolutional networks are extensively utilized in the fields of hyperspectral imaging, microscopic imaging, and medical image analysis; current coronavirus-related diagnostic studies have also used deep learning-based architectures, namely COVID-SDNet, DL-CRC, and EDL-COVID [1,6,8]. Machine learning-based techniques such as SOM-LWL, PB-OCSVM, and one-shot cluster-based approaches for COVID-19 CXR images have also been introduced for COVID-19 detection and classification [9–11]. In addition, transfer learning methods have been implemented using MobileNet, VGG, ResNet, AlexNet, and DenseNet architectures as base modules for training COVID-19 image classifiers [12,13]. Computer-aided diagnosis systems have been proposed for several medical image analysis tasks, such as breast cancer, brain tumors, and kidney and lung disorders, using deep learning methods [5,12,14,15].

In this proposed work, a hybrid deep learning system is deployed to perform the COVID-19 classification task using two datasets of CXR and CT images. Deep convolutional networks provide a promising feature extraction mechanism that automatically identifies a large number of deep features directly from the input images, thus improving overall classification accuracy. The objective of this study is to provide a unified deep learning model using both medical CXR and CT images. The main contributions of this hybrid system are summarized as follows: first, the building of a novel hybrid deep learning model in a unified architecture to automatically and rapidly classify COVID-19 disease using both CXR and CT images; second, the use of deep learning regularizations, namely data balancing, transfer learning, and data augmentation, to improve the overall diagnostic performance. Such experiments will help to improve the understanding of COVID-19 disease and its diagnosis using different medical imaging modalities [16–36].

The objective of this work is to provide a robust and feasible AI-based system for medical institutions, health care service providers, physicians, and patients by providing practical solutions for COVID-19 diagnosis.

The rest of this paper is organized as follows: a review of the relevant literature is presented in Section 2; the technical aspects of the deep learning methods for the classification system are detailed in Section 3; the results of the experiments with COVID-19 are reported and discussed in Sections 4 and 5; finally, the most important findings of this work are summarized in the conclusions in Section 6.

### **2. Related Work**

In early 2020, while the world was under pandemic conditions due to the COVID-19 outbreak, some computer-aided diagnosis systems based on deep learning were introduced to predict COVID-19 from digital X-ray and CT images. In [18], Yang et al. presented the diagnosis of COVID-19 with the help of CT images and proposed an AI-based diagnosis system built on DenseNet and ResNet pre-trained models with transfer learning techniques, reporting an accuracy of 89%, an AUC of 98%, and an F1 score of 90%; the dataset was made open-source. Mundher et al. in [13] designed a model to detect COVID-19 from X-ray images using convolutional neural networks with transfer learning-based techniques; among the VGG16 and MobileNet modules, the highest accuracy of 98.28% was reported with VGG16 as the base model. In [19], Madallah Alruwaili et al. used an improved Inception-ResNetV2 to diagnose COVID-19 in X-ray images, reporting high accuracy on the radiography dataset. Xception, VGG16, InceptionV3, ResNet50V2, MobileNetV2, ResNet101V2, and DenseNet121 models were also tested on CXR images, with the Inception-ResNetV2 model achieving 99.8%. In [20], Fareed Ahmad et al. designed a deep learning model for detecting COVID-19 using chest X-ray images, comparing different CNN models such as MobileNet, InceptionV3, and ResNet50. The best model was InceptionV3, which reported an accuracy of 95.75% and an F-score of 91.47%. In [5], Boran Sekeroglu et al. used the CNN

model to detect COVID-19 from chest X-ray images using the available dataset. They used a CNN without preprocessing and with a decreasing number of layers, and were capable of detecting COVID-19 with a limited amount of data and imbalanced chest X-ray images, with an accuracy of 98.50%. In [21], Pramit Brata Chanda et al. implemented a new model to diagnose COVID-19 using chest X-rays. They used a CNN-based transfer learning framework for the classification task and reported an accuracy of 96.13%. In [22], Mubashir Rehman et al. designed a platform-monitoring system to detect and diagnose COVID-19 using breathing rate measurements. In [8], S. Tabik et al. contributed a new open-source dataset, called COVIDGR-1.0. In their experiment, they designed a new model to detect COVID-19 using X-ray images, which also helped to measure severity; they reported classification accuracies for moderate and severe cases of 86.90% and 97.72%, respectively, on the basis of the CXR database. In [12], Wentao et al. presented a new deep learning-based model for the diagnosis of COVID-19 using CT images; the transfer learning technique achieved a good accuracy of 98%. In [6], Sadman et al. proposed a deep learning-based chest radiograph classification (DL-CRC) framework to distinguish COVID-19 cases with high accuracy from abnormal and normal classes. Their DL-CRC framework consists of two parts, the DARI algorithm and generic data augmentation, and reached an accuracy of 93.94%. In [23], Khalid M. Hosny et al. designed a hybrid model to detect COVID-19 using both CT scans and chest X-ray images; combining the two image types reduced memory requirements and computational time, and their framework reached 99.3% and 93.2% for CXR and CT images, respectively. In [7], transfer learning was presented to detect COVID-19 using X-ray and CT-scan images, since initial screening of chest X-rays (CXR) may provide significant information in the detection of suspected COVID-19 cases. In [24], Ravi et al. presented a model to detect COVID-19 using both CT and CXR datasets. In [25], Elmehdi Benmalek et al. compared the performance of CT-scan and chest X-ray images in detecting COVID-19 using CNNs, achieving accuracies of 98.5% and 98.6%, respectively. In [16], Muhammad E. H. et al. presented a strong model to detect COVID-19 pneumonia from chest X-ray images using pre-trained deep learning techniques. They created a database by merging data created in previous work and obtained a classification accuracy of 99.77%.

### **3. Methods and Materials**

The proposed hybrid deep learning system for COVID-19 diagnosis is demonstrated in Figure 1. Two different deep learning models, namely ResNet50 and EfficientNetB0, were used for CXR and CT images, respectively. Both deep learning models were trained for 100 epochs. The final layers of both deep learning models were concatenated to merge the derived deep features and generate a single, more robust deep-feature set. This set carries promising features generated from both CXR and CT images at the same time, which is key to improving the overall accuracy of the proposed deep learning system. The concatenated deep features were then scaled to 1D form using global average pooling (GAP), making the derived feature maps suitable for the following two fully connected layers. Finally, a Softmax layer is used to make the final decision of whether the output is a positive COVID-19 case or a normal negative case. To reduce overfitting that may occur during the training phase, a 0.5 dropout strategy is used. For pre-training, the transfer learning strategy is used with the ImageNet database.

**Figure 1.** Schematic hybrid deep learning diagram of the COVID-19 classification system.
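A possible Keras sketch of this hybrid topology follows. The backbone choice, feature concatenation, GAP, two fully connected layers, 0.5 dropout, and two-way Softmax follow the text; the FC widths (256 and 64) and the 224 × 224 input size are assumptions of ours, and `weights=None` keeps the sketch self-contained, whereas the actual system is pre-trained on ImageNet:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50, EfficientNetB0

# One backbone per modality: ResNet50 for CXR, EfficientNetB0 for CT.
cxr_in = layers.Input(shape=(224, 224, 3), name="cxr")
ct_in = layers.Input(shape=(224, 224, 3), name="ct")

cxr_feat = ResNet50(include_top=False, weights=None,
                    input_shape=(224, 224, 3))(cxr_in)
ct_feat = EfficientNetB0(include_top=False, weights=None,
                         input_shape=(224, 224, 3))(ct_in)

# Merge the deep feature maps, then scale them to 1D with GAP.
merged = layers.Concatenate(axis=-1)([cxr_feat, ct_feat])
x = layers.GlobalAveragePooling2D()(merged)

# Two fully connected layers (widths are assumptions) with 0.5 dropout.
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(64, activation="relu")(x)

# Final Softmax decision: positive COVID-19 case vs. normal case.
out = layers.Dense(2, activation="softmax", name="decision")(x)

hybrid = Model(inputs=[cxr_in, ct_in], outputs=out)
```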

### *3.1. Preprocessing*

Preprocessing is a significant step of the model. Here, we took the raw data and transformed them into a specific input data format and dimension [3]. We incorporated data augmentation and a class-balancing strategy to reduce overfitting, which also acted as a catalyst for the training process [15,31]. We then divided both datasets into 70% for training, 20% for testing, and 10% for validation; for each class, the samples were selected at random. For hyperparameter initialization, the transfer learning strategy was applied using the ImageNet dataset [3,15,31].
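The 70/20/10 split with per-class randomization can be realized, for instance, with two stratified calls to scikit-learn's `train_test_split`; the arrays below are dummy stand-ins for the images and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the image arrays and binary class labels.
X = np.arange(1000).reshape(-1, 1)
y = np.array([0, 1] * 500)

# First carve out 70% for training, stratified per class ...
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=42)

# ... then split the remaining 30% into 20% test and 10% validation.
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, train_size=2 / 3, stratify=y_rest, random_state=42)
```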

### *3.2. Feature Extraction*

Deep CNNs have shown improved performance regardless of domain, particularly in medical imaging, together with good model generalization; transfer learning is being explored to provide an efficient solution [6]. In our experimental analysis, we employed the ResNet50 [27] and EfficientNetB0 [24,26] models for feature generation; the deep features were then passed to custom user-specific layers. In our work, the features were passed to a global average pooling layer, followed by fully connected layers [24]. We used two FC layers to improve efficiency and, to generalize learning, introduced dropout between the FC layers. The extracted features were finally passed to the classification layer to assign the appropriate class label to the given input data instance.

### *3.3. Classification*

The deep features extracted by the feed-forward ResNet50 and EfficientNetB0 models were passed to the Softmax layer for classification. The results were generated separately for the CT and CXR databases and are discussed in the tables below; they are promising in contrast to existing research.

### **4. Experimental Analysis**

### *4.1. Dataset*

To quantify our work, two datasets of chest X-ray and CT images were used. These datasets are publicly available in Kaggle databases [16–18] and are described in Table 1.



### *4.2. Implementation Environment*

To perform all experiments in this study, we used a PC with the following specifications: an Intel® Core™ i7-6850K processor with 32 GB RAM, a 3.60 GHz clock frequency, and an NVIDIA GeForce GTX 1050 Ti GPU. The deep learning algorithms were implemented in Python 3.8.0 with Anaconda (Jupyter notebook). Python-based ML libraries such as Torch, TensorFlow, OpenCV, pandas, and scikit-learn were utilized to investigate the performance metrics of the proposed methods; at the same time, TensorFlow and Keras in Colab were used to implement transfer learning. The results and discussions concerning the various techniques incorporated are highlighted in the subsequent sections. The source codes are available at GitHub (https://github.com/IIITK-AI-LAB/Hybridcovid-model (accessed on 25 September 2021)).

### *4.3. Evaluation Metrics*

To assess our proposed system, we used the evaluation metrics of recall/sensitivity (Re), specificity (Sp), F1-measure (F-M), and overall accuracy (Az). The mathematical formula for these evaluation metrics is defined as follows:

$$\text{Recall/Sensitivity (Re)} = \frac{\text{TP}}{\text{TP} + \text{FN}},\tag{1}$$

$$\text{Specificity (Sp)} = \frac{\text{TN}}{\text{TN} + \text{FP}},\tag{2}$$

$$\text{F1-score (F-M)} = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}},\tag{3}$$

$$\text{Overall accuracy (Az)} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FN} + \text{TN} + \text{FP}},\tag{4}$$

where TP, TN, FP, and FN are defined to represent the number of true positive, true negative, false positive, and false negative detections, respectively. The confusion matrix is used to derive all of these parameters.
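Equations (1)–(4) map directly onto a few lines of Python; the counts in the example call are illustrative only:

```python
def classification_metrics(tp, tn, fp, fn):
    """Evaluation metrics of Eqs. (1)-(4) from confusion-matrix counts."""
    recall = tp / (tp + fn)                       # Eq. (1)
    specificity = tn / (tn + fp)                  # Eq. (2)
    f1 = 2 * tp / (2 * tp + fp + fn)              # Eq. (3)
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # Eq. (4)
    return recall, specificity, f1, accuracy

# Illustrative confusion-matrix counts (not from the paper).
re, sp, f1, az = classification_metrics(tp=95, tn=90, fp=10, fn=5)
```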

### **5. Results and Discussion**

This section presents our experimental results under two different scenarios: a single straightforward scenario and a hybrid scenario. The former means both deep learning models (i.e., ResNet50 and EfficientNetB0) are separately used and tested to investigate which model provides the best overall classification accuracy for a single dataset, chest X-ray or CT images. The latter means both deep learning models are concatenated to produce the proposed hybridization model, as shown in Figure 1. This is to check which hybridization combination achieves the best performance when both medical chest image modalities are used.

### *5.1. Single Straightforward Scenario*

For each medical chest dataset (i.e., chest X-ray or CT images), two different experiments were performed: one using the deep convolutional ResNet50 model, and the other using the EfficientNetB0 deep learning model. In other words, each deep learning model (i.e., ResNet50 or EfficientNetB0) was trained twice: once with the chest X-ray images and again with the CT images. In both cases, the same deep learning architecture and the same training/testing settings were used.

### 5.1.1. COVID-19 Classification Based on Chest X-ray Images

In this case, the input dataset consists only of X-ray images for the ResNet50 or EfficientNetB0 deep learning models. The overall classification evaluation results are summarized in Table 2. Although both deep learning models achieve almost the same results, the EfficientNetB0 model attains a slightly better overall accuracy of 99.36%.


**Table 2.** Classification evaluation results (%) using chest X-ray images.

### 5.1.2. COVID-19 Classification Based on Chest CT Images

In this case, only the chest CT dataset is used, training the ResNet50 and EfficientNetB0 deep learning models separately. The overall classification evaluation results are reported in Table 3. The EfficientNetB0 deep learning model achieves a slightly better overall accuracy of 99.23%, while the other evaluation metrics show a consistent and stable performance.



### *5.2. Hybrid Scenario: COVID-19 Classification Using Chest X-ray and CT Images*

In the proposed hybrid deep learning model, both chest X-ray and CT datasets are used as input, as shown in Figure 1. The evaluation classification results for the best combination hybrid model are demonstrated in Table 4. Each row in Table 4 presents the classification assessment results by using a single deep learning model in a hybrid style for both chest X-ray and CT images.

**Table 4.** Classification evaluation results (%) for the proposed hybrid deep learning model using both chest X-ray and CT medical images.


### **6. Conclusions**

A hybrid deep learning model is proposed to automatically detect COVID-19 respiratory disease from both chest X-ray and CT images. The proposed hybrid model uses two deep convolutional networks, namely ResNet and EfficientNet, to generate promising deep hierarchical features. The proposed hybrid deep learning approach achieved a classification accuracy of 99.58% using chest X-ray and CT images together. Further improvements could be achieved by including ultrasound images as well, helping to construct a more robust and reliable diagnosis system to fight COVID-19 in its early stages. The promising results could help to provide a better real-time diagnosis system for health care service providers, physicians, and patients.

**Author Contributions:** Conceptualization, P.M., A.Y.M., C.C. and M.A.A.-a.; methodology, A.Y.M., J.V.B.B., C.C. and M.A.A.-a.; software, P.M., A.Y.M. and C.C.; validation, A.Y.M., C.C. and M.A.A.-a.; formal analysis, A.Y.M., J.V.B.B. and C.C.; investigation, A.Y.M., J.V.B.B., J.H., C.C. and M.A.A.-a.; resources, A.Y.M., J.H., J.V.B.B. and C.C.; data curation, P.M., A.Y.M. and C.C.; writing—original draft preparation, P.M., A.Y.M., C.C. and M.A.A.-a.; writing—review and editing, P.M., A.Y.M., J.V.B.B., C.C. and M.A.A.-a.; visualization, J.V.B.B. and M.A.A.-a.; supervision, J.V.B.B., J.H. and M.A.A.-a.; project administration, J.V.B.B. and M.A.A.-a.; funding acquisition, J.V.B.B., J.H., C.C. and M.A.A.-a. All authors have read and agreed to the published version of the manuscript.

**Funding:** The experimental part of the work reported herein (Medical\_Image\_DL-PR) is fully supported by National PARAM Supercomputing Facility (NPSF), Centre for Development of Advanced Computing (C-DAC), Savitribai Phule Pune University Campus, India. We acknowledge our sincere thanks for providing such excellent computing resources.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data used to support the findings of this study are available from the corresponding author upon request.

**Conflicts of Interest:** There are no conflict of interest associated with publishing this paper.

### **References**


### *Proceeding Paper* **A Novel Deep Learning ArCAR System for Arabic Text Recognition with Character-Level Representation †**

**Abdullah Y. Muaad 1,2,\*, Mugahed A. Al-antari 3,\*, Sungyoung Lee <sup>4</sup> and Hanumanthappa Jayappa Davanagere 1,\***


**Abstract:** AI-based text classification is a process to classify Arabic content into its categories. With the increasing number of Arabic texts in our social life, traditional machine learning approaches face different challenges due to the complexity of the morphology and the delicate variations of the Arabic language. This work proposes a model to represent and recognize Arabic text at the character level based on the capabilities of a deep convolutional neural network (CNN). The system was validated using five-fold cross-validation tests for Arabic text document classification, and we have used it to evaluate Arabic text. The ArCAR system shows its capability to classify Arabic text at the character level. For document classification, the ArCAR system achieves its best performance on the AlKhaleej-balance dataset, with an accuracy of 97.76%. The proposed ArCAR system seems to provide a practical solution for accurate Arabic text representation, understanding, and classification.

**Keywords:** deep learning ArCAR system; Arabic character-level representation; Arabic text document classification; Arabic sentiment analysis

### **1. Introduction**

Natural Language Processing (NLP) is one of the most important topics arising from the combination of linguistics and artificial intelligence. NLP enables humans to interact with machines; its purpose is to process textual content and extract the most useful information so that we can make better decisions in our daily lives.

There are about 447 million native Arabic speakers in the world [1,2]. Arabic, the main language of 26 Arab countries, poses many difficulties compared to English. Arabic text analytics are incredibly significant with respect to making our lives easier in many domains, such as document text categorization [3], Arabic sentiment analysis [4], and detection of email spam. In fact, Arabic text faces many challenges, as mentioned in [5], such as stemming, dialects, phonology, orthography, and morphology. Each level of the classification method necessitates a significant amount of labor and attention from the user, especially the preprocessing of the text, which requires various steps due to the difficulties of Arabic text. Until today, most representation techniques for the classification of Arabic text have depended on words rather than characters, while the difficulty of stemming Arabic words remains a big challenge. For that reason, we attempted to determine a representation for Arabic

**Citation:** Muaad, A.Y.; Al-antari, M.A.; Lee, S.; Davanagere, H.J. A Novel Deep Learning ArCAR System for Arabic Text Recognition with Character-Level Representation. *Comput. Sci. Math. Forum* **2022**, *2*, 14. https://doi.org/10.3390/ IOCA2021-10903

Academic Editor: Frank Werner

Published: 26 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

text which would decrease these difficulties. Stemming Arabic words is still a big challenge which, requires an understanding of the word's root which is not easy in many cases.

Due to these challenges, we developed a new Arabic text computer-aided representation and classification system that understands and recognizes Arabic at the character level in order to classify Arabic documents. This paper aids the representation of Arabic text while at the same time assisting its classification.

### **2. Related Works**

Far less work has been done on Arabic text representation and classification than on English text. The little research that exists on the analysis of Arabic text classification has shown that methods behave differently when applied to Arabic. Since the two key stages of any Arabic text classification pipeline are representation and classification, this section presents a brief literature review of each stage: representation, e.g., [6,7], and classification, e.g., [8], as follows:

### *2.1. Representation*

The authors in [8] introduced Term Class Weight-Inverse Class Frequency (TCW-ICF) as a new representation approach for Arabic text. Using their representation, the most promising features of Arabic texts can be retrieved.

Etaiwi et al. introduced an Arabic text categorization model based on a graph-based semantic representation model [7]. Their accuracy, sensitivity, precision, and F1-score increased by 8.60%, 30.20%, 5.30%, and 16.20%, respectively.

To improve Arabic text representation, Almuzaini et al. presented a framework that combined document embedding representation (doc2vec) with sense disambiguation. They then conducted their experiments on the OSAC corpus dataset and attained a text categorization performance of 90% in terms of F-measure [9].

Oueslati et al. applied a deep CNN to Arabic sentiment analysis (SA) in 2020, using character-level features to represent the Arabic text. However, their approach has several limitations, such as not covering all of the many Arabic characters, which leads to misunderstandings of the Arabic text [10].

As a result, we are motivated to look for a better option for representing Arabic text in order to overcome these challenges.

### *2.2. Classification*

The classification stage itself, which assigns the various contextual Arabic materials to a valid category, is the most crucial phase. Here we survey some of the recent work.

The authors in [11] implemented a fuzzy classifier to improve Arabic document classification performance. They reported a precision of 60.16%, a recall of 62.66%, and an F-measure of 61.18%.

The first character-level deep learning ConvNet for English text classification was proposed by Zhang et al. [12]. They employed eight large-scale datasets to validate their model and had the lowest testing errors across the board.

In 2020, Daif et al. presented AraDIC [6], the first deep learning framework for Arabic document classification based on image-based characters.

Ameur et al. suggested a hybrid CNN and RNN deep learning model for categorizing Arabic text documents using static, dynamic, and fine-tuned word embedding [3]. The most meaningful representations from the space of Arabic word embedding are automatically learned using a deep learning CNN model.

Based on this survey of classification algorithms for Arabic text, we concluded that we should implement our project in Python 3.7, together with standard machine learning technologies.

### **3. Proposed Model**

Figure 1 shows the proposed framework for Arabic text classification at the character level with two types of algorithms: (1) traditional machine learning and (2) deep learning using a CNN, as shown in Figure 2. Our proposed approach can be used to recognize Arabic documents.

**Figure 1.** Arabic document classification using machine learning.

**Figure 2.** Arabic document classification using deep learning.

### *3.1. Architecture*

The proposed machine learning pipeline for Arabic text classification, based on different types of representation, is presented in Figure 1.

### *3.2. Machine Learning*

This model utilizes two different types of representation: term frequency-inverse document frequency (TFIDF) and bag-of-words (BOW).

### *3.3. Deep Learning*

We propose a deep learning model for Arabic text classification based on a CNN. The text is represented at the character level, as shown in Figure 2, and achieves an Arabic document classification accuracy of about 97%. The beauty of this model is that representing text at the character level avoids the preprocessing steps while at the same time enabling better accuracy.
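The character-level input to such a CNN can be illustrated with a minimal sketch. The one-hot quantization scheme and the fixed sequence length of 1014 follow the character-quantization idea of Zhang et al. [12]; the exact alphabet and sequence length used by ArCAR are assumptions here, not taken from the paper.

```python
import numpy as np

# Hypothetical alphabet: the 28 Arabic letters plus digits and punctuation.
ALPHABET = list("ابتثجحخدذرزسشصضطظعغفقكلمنهوي0123456789.,!?;:")
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
SEQ_LEN = 1014  # fixed input length, as in Zhang et al.'s ConvNet

def quantize(text: str) -> np.ndarray:
    """One-hot encode a document at the character level.

    Returns a (SEQ_LEN, len(ALPHABET)) matrix; characters outside the
    alphabet (and positions past SEQ_LEN) are left as all-zero rows.
    """
    mat = np.zeros((SEQ_LEN, len(ALPHABET)), dtype=np.float32)
    for pos, ch in enumerate(text[:SEQ_LEN]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            mat[pos, idx] = 1.0
    return mat

x = quantize("مرحبا بالعالم")  # "Hello world" encoded character by character
```

The resulting matrix can be fed directly to 1-D convolutional layers, with no stemming or other linguistic preprocessing.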

### **4. Experimental Analysis**

We used Python to implement our work, together with machine learning and data analysis libraries, namely scikit-learn, TensorFlow, and Keras. We used a classification system based on a CNN and character-level representation to classify Arabic text.

### *4.1. Dataset*

This dataset was gathered from all articles published in the news portal from 2008 to 2018. The collected text dataset exceeds a volume of 4 GB, and most of the articles published on the websites were not categorized or had a vague label. As a result, seven categories were populated with a reasonable number of articles each to serve the text classification tasks. The dataset was balanced by restricting the number of articles in each category to around 6500, as shown in Table 1.
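Capping each category at a fixed number of articles can be sketched with pandas. The column names, the toy rows, and the cap of 3 (standing in for the roughly 6500-article cap) are illustrative assumptions, not the actual dataset schema.

```python
import pandas as pd

# Toy corpus with a hypothetical 'category' column.
df = pd.DataFrame({
    "text": [f"doc{i}" for i in range(10)],
    "category": ["sports"] * 6 + ["economy"] * 4,
})

CAP = 3  # stand-in for the ~6500-article cap per category

# Keep at most CAP articles per category to balance the class sizes.
balanced = df.groupby("category").head(CAP)
```

In practice one would also shuffle before capping so the retained articles are not biased toward the oldest publications.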


### *4.2. Implementation Environment*

We utilized a PC with the following characteristics to carry out all of the experiments in this study: one NVIDIA GeForce GTX 1080 GPU and an Intel® Core(TM) i5 K processor with 8 GB RAM and a 3.360 GHz clock. The described system is built with Python 3.7 and the TensorFlow and Keras back-end libraries on a Windows operating system.

### *4.3. Evaluation Metrics*

To evaluate our proposed ArCAR, we used the following metrics, as in [13]:

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \tag{1}$$

$$\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}}, \tag{2}$$

$$\text{F1-Measure} = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}}, \tag{3}$$

$$\text{Overall Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FN} + \text{TN} + \text{FP}}, \tag{4}$$

where TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative detections, respectively. A multidimensional confusion matrix was utilized to derive all of these quantities. Finally, we used the weighted-class technique when evaluating each dataset to account for test sets that are uneven across classes [14,15].
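How the four metrics of Equations (1)-(4) follow per class from a multiclass confusion matrix, in a one-vs-rest fashion, can be sketched as below; the 2x2 matrix is toy data, not a result from the paper.

```python
import numpy as np

def per_class_metrics(cm):
    """Recall, specificity, F1 and accuracy per class (one-vs-rest),
    derived from a confusion matrix cm[true_class, predicted_class]."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # actual positives missed
    fp = cm.sum(axis=0) - tp   # other classes predicted as this one
    tn = cm.sum() - tp - fn - fp
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / cm.sum()
    return recall, specificity, f1, accuracy

# Toy binary confusion matrix: 8/10 of class 0 and 9/10 of class 1 correct.
cm = np.array([[8, 2],
               [1, 9]])
rec, spec, f1, acc = per_class_metrics(cm)
```

Weighting each class's score by its support then gives the weighted-class evaluation mentioned above.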

### **5. Results and Discussion**

The algorithms MNB, BNB, Logistic Regression, SGD Classifier, SVC, and Linear SVC were implemented herein using Python with Anaconda [Jupyter notebook]. The proposed methods use Python-based machine learning tools such as NLTK, pandas, and scikit-learn to investigate the performance indicators. Meanwhile, for deep learning models such as the CNN, additional libraries such as Keras and TensorFlow were used. The results and discussions concerning the various techniques incorporated are highlighted in the subsequent sections.

### *5.1. Machine Learning*

For this work, the proposed system was evaluated on the Khaleej dataset with machine learning. As shown in Table 2, the best performance was achieved using Linear SVC, with an accuracy of 93% using the TFIDF representation. At the same time, the best accuracy with the BOW representation was achieved by the SGD Classifier.
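A TFIDF-plus-Linear-SVC pipeline of this kind can be sketched with scikit-learn; the toy corpus and labels below are illustrative only, not the AlKhaleej data. Swapping `TfidfVectorizer` for `CountVectorizer` yields the BOW variant used with the SGD Classifier.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy two-class corpus standing in for the news articles.
docs = ["the match was great", "goal scored in the match",
        "stock prices fell", "the market and stock news"]
labels = ["sports", "sports", "economy", "economy"]

# TFIDF representation followed by a Linear SVC classifier.
tfidf_svc = Pipeline([("rep", TfidfVectorizer()), ("clf", LinearSVC())])
tfidf_svc.fit(docs, labels)

pred = tfidf_svc.predict(["stock market report"])[0]
```

On the real dataset, the same pipeline would be scored with five-fold cross-validation rather than a single fit.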


**Table 2.** Accuracy for Alkhaleej with and without preprocessing.

### *5.2. Our Proposed Deep Learning*

For this work, the proposed system was evaluated on the Khaleej dataset with deep learning. As shown in Table 3 and Figure 3, the best performance was achieved using the CNN, with an overall accuracy, F1-score, precision, and recall of 97.47%, 93.23%, 92.75%, and 92%, respectively.

**Table 3.** Result of the proposed system in deep learning.


**Figure 3.** Averaged multiclass confusion matrix for the AlKhaleej dataset.

### **6. Conclusions**

This paper provides a new deep learning strategy for character-level Arabic text classification. Our technique encodes Arabic text at the character level to avoid preprocessing restrictions such as stemming, and we used multiclass datasets to demonstrate the system's dependability and capability regardless of the number of classes. Simultaneously, we compared our results to those of five machine learning techniques to show that our model outperformed them all. As future plans to increase the performance of the proposed system, the problems of multi-label text categorization and Arabic data augmentation need to be handled.

**Author Contributions:** Conceptualization, A.Y.M. and M.A.A.-a.; methodology, A.Y.M. and M.A.A. a.; software, A.Y.M.; validation, A.Y.M. and M.A.A.-a.; formal analysis, A.Y.M.; investigation, H.J.D. and A.Y.M.; resources, A.Y.M. and H.J.D.; data curation, A.Y.M.; writing—original draft preparation, A.Y.M. and M.A.A.-a.; writing—review and editing, A.Y.M. and M.A.A.-a.; visualization, M.A.A.-a.; supervision, S.L.; project administration, M.A.A.-a.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by University of Mysore and the Ministry of Science and ICT (MSIT), South Korea, through the Information Technology Research Center (ITRC) Support Program under Grant IITP-2021-2017-0-01629, and in part by the Institute for Information & Communications Technology Promotion (IITP), through the Korea Government (MSIT) under Grant 2017-0-00655 and IITP-2021-2020-0-01489 and Grant NRF-2019R1A2C2090504.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data used to support the findings of this study are available from the corresponding author upon request.

**Conflicts of Interest:** There are no conflicts of interest associated with publishing this paper.

### **References**


### *Proceeding Paper* **AI-Based Misogyny Detection from Arabic Levantine Twitter Tweets †**

**Abdullah Y. Muaad 1,2,\*, Hanumanthappa Jayappa Davanagere 1,\*, Mugahed A. Al-antari 3,\*, J. V. Bibal Benifa <sup>4</sup> and Channabasava Chola 4,\***


**Abstract:** Twitter is one of the social media platforms that is extensively used to share public opinions. Building an Arabic text detection system (ATDS) is a challenging computational task in the field of Natural Language Processing (NLP) using Artificial Intelligence (AI)-based techniques. The detection of misogyny in Arabic text has received a lot of attention in recent years due to the racial and verbal violence against women on social media platforms. In this paper, an Arabic text recognition approach is presented for detecting misogyny from Arabic tweets. The proposed approach is evaluated using the Arabic Levantine Twitter dataset for Misogynistic language, and it attained recognition accuracies of 90.0% and 89.0% for the binary and multi-class tasks, respectively. The proposed approach seems useful for providing practical smart solutions for detecting Arabic misogyny on social media.

**Keywords:** Arabic language processing; Arabic text representation; misogyny detection

### **1. Introduction**

People express their thoughts, emotions, and feelings by means of posts on social media platforms. Recently, online misogyny, considered a form of harassment, has increased against Arab women on a daily basis [1,2]. An automatic misogyny-detection system is necessary for limiting harmful anti-women Arabic content [2]. People are increasingly using social media platforms such as Twitter, Facebook, Google, and YouTube to communicate their various ideas and beliefs [3]. Misogyny on the internet has become a major problem that has expanded across a variety of social media platforms. Women in the Arab countries, like their peers around the world, are subjected to many forms of online misogyny. This is, unfortunately, not compatible with the values of the Islamic religion or with any other values or beliefs regarding women. Detecting such content is crucial for understanding and predicting conflicts, understanding polarization among communities, and providing means and tools to filter or block inappropriate content [3]. The main challenges and opportunities in this field stem from the lack of tools and the absence of resources for non-English languages such as Arabic [4]. This research aims to develop an accurate deep learning-based approach to limit the misogyny problem. The lack of such studies from an Arabic perspective was an inspiration to investigate and find practical smart solutions by designing and developing an automatic misogyny identification system [5].

The main contributions of this work are summarized as follows:

• The Arabic text is represented using word and word-embedding techniques.

• The state-of-the-art deep learning BERT technique is used to detect Arabic misogyny.

• A comprehensive comparison study was conducted using different machine learning and deep learning techniques to achieve prominent and superior detection results.

**Citation:** Muaad, A.Y.; Davanagere, H.J.; Al-antari, M.A.; Benifa, J.V.B.; Chola, C. AI-Based Misogyny Detection from Arabic Levantine Twitter Tweets. *Comput. Sci. Math. Forum* **2022**, *2*, 15. https://doi.org/10.3390/IOCA2021-10880

Academic Editor: Frank Werner

Published: 19 September 2021

### **2. Related Works**

In 2020, aggression and misogyny detection using BERT was proposed for three languages: English, Hindi, and Bengali [6]. The proposed model used an attention mechanism over BERT to obtain the relative importance of words, followed by fully connected layers and a final classification layer that predicted the corresponding class [6]. The misogyny identification techniques offered satisfactory results, but the recognition of aggressiveness is still in its infancy for some languages [7]. Misogyny detection in the Arabic language is still in its early stages, with only a few important contributions existing [8]. In the last five years, there has been a growth in the number of researchers interested in automatic Arabic hate speech detection in social media. In the presented research, misogyny-based Arabic text detection has been studied extensively. One study starts with a comparative analysis of the neural network and transformer-based language models that have been applied for Arabic fake news detection [9]. In terms of generalization, AraBERT v02 outperformed all other models evaluated. The authors advised using a gold-standard dataset annotated by humans in the future, rather than a machine-generated dataset, which may be less reliable [9]. In the same detection domain, the word2vec model was suggested to detect semantic similarity between words in Arabic, which could assist in the detection of plagiarism; the authors built the word2vec model using the OSAC corpus [10]. Elsewhere, the authors focused on creating an effective offensive tweet identification dataset: they quickly constructed a training set from a seed list of offensive words and, given this automatically generated dataset, used a character n-gram representation and a deep learning classifier to achieve a 90% F1 score [11]. Single-learner and ensemble machine learning approaches were also investigated for offensive language detection in the Arabic language [12].

In addition, a transfer learning method and AraBERT were used on Arabic offensive detection datasets. The results reported that Arabic monolingual BERT models outperform multilingual BERT models. The authors noted that performance was limited by the effects of transfer learning on the classifiers, particularly for highly dialectal content [13]. Regarding the augmentation of data to improve text detection, the authors of [14] experimented with seven BERT-based models on an augmented task dataset to identify the sentiment of a tweet or detect whether a tweet was sarcastic. Their experiments were based on fine-tuning the seven BERT-based models with data augmentation to solve the imbalanced-data problem; for both tasks, the MARBERT model with data augmentation outperformed the other models, increasing the F-score by 15%. Regarding the influence of preprocessing in text detection, a simple but intuitive detection system based on the investigation of a number of preprocessing steps and their combinations was addressed [15], including a comparison between LSVC and BiLSTM classifiers. The detection of misogyny in Arabic text was presented using the Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi), the first benchmark dataset for Arabic misogyny; the authors employed an MTL configuration to investigate its effect on the tasks and presented an experimental evaluation of several machine learning systems, including SOTA systems, reaching an accuracy equal to 88. An approach based on stylistic and specific topic information was also presented for the detection of misogyny, exploring several aspects of misogynistic Spanish and English user-generated texts on Twitter [16]. Finally, a character-level approach for Arabic text utilizing a convolutional neural network (CNN) has been presented to solve many problems, such as difficulties in preprocessing [17].

### **3. Proposed Model**

*3.1. ATDS Architecture*

The proposed model for the detection of Arabic text from the Arabic Levantine Twitter dataset, based on different types of representation and different machine learning and deep learning models, is presented in Figure 1:

**Figure 1.** Architecture of the Arabic text detection system (ATDS): Abstract view.

### *3.2. Pre-Processing*

The pre-processing technique is most commonly used for preparing raw data into a specific input data format that is useful for machine learning and deep learning techniques. The main purpose of preprocessing is to clean the dataset of stop words, punctuation, poor spelling, slang, and other undesired words that abound in text data. This unwanted noise may have a negative impact on the recognition performance of the Arabic misogyny detection task. In this work, we eliminated all the non-Arabic words, stop words, and punctuation through the following steps:

(a) Tokenization

This process converts the Arabic text (sentences) into tokens or words. Documents can be split into sentences, and sentences can be converted into tokens. Tokenization divides a text sequence into words, symbols, phrases, or tokens [18].

(b) Normalization

Normalization is performed to bring all words into the same form; there are many techniques for this, such as stemming. Normalization can be implemented by different methods, such as regular expressions.

(c) Stop Word Elimination

In the text preprocessing task, there are numerous terms that have no critical meaning but appear frequently in a document. These stop words do not help to increase performance, because they do not provide much information for the sentiment classification task; therefore, they should be removed before the feature selection process.


(d) Lemmatization

The goal of lemmatization is the same as that of stemming: to reduce words to their base or root forms. However, in lemmatization, the inflection of words is not simply cut off; rather, lexical information is leveraged to turn words into their base forms [19].
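Steps (a)-(c) can be sketched with regular expressions. The stop-word list and normalization rules below are simplified assumptions (full resources ship with libraries such as NLTK), and lemmatization is omitted because it requires lexical resources.

```python
import re

# Hypothetical stop-word list, stored in normalized form.
STOP_WORDS = {"في", "من", "علي", "و"}

def normalize(token: str) -> str:
    """Unify common orthographic variants (alef, ta-marbuta, alef-maqsura)."""
    token = re.sub("[إأآا]", "ا", token)
    return token.replace("ة", "ه").replace("ى", "ي")

def preprocess(text: str):
    # Keep only Arabic letters and whitespace (drops punctuation and Latin).
    text = re.sub(r"[^\u0621-\u064A\s]", " ", text)
    tokens = text.split()                      # (a) tokenization
    tokens = [normalize(t) for t in tokens]    # (b) normalization
    return [t for t in tokens if t not in STOP_WORDS]  # (c) stop words

tokens = preprocess("القراءة مفيدة في الحياة!")
```

Normalizing before stop-word removal matters: a stop word written with a variant spelling would otherwise slip through the filter.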

### *3.3. Representation*

After Arabic text preprocessing, the data were transformed into a specific structure for representation purposes. To perform this, bag-of-words (BOW) and term frequency-inverse document frequency (TFIDF) were used for data representation with the traditional machine learning techniques. For the deep learning techniques, we used word embeddings in the form of bidirectional encoder representations from Transformers (BERT). Instead of the basic language-modeling task, BERT was trained with two tasks to encourage bidirectional prediction and sentence-level understanding [20,21].

### *3.4. Text Detection*

Assigning text to its true labeled class based on its content is known as classification. Several works based on text classification using different algorithms have been reported, as we explain in Section 5. The following algorithms were implemented:

• Passive Aggressive Classifier

Passive-Aggressive algorithms are a family of machine learning algorithms that are popularly used in big data applications and are generally suited to large-scale learning. They are online-learning algorithms: the input data arrive in sequential order, and the machine learning model is updated sequentially, as opposed to conventional batch learning, where the entire training dataset is used at once [20].
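This online-learning behavior can be sketched with scikit-learn's `PassiveAggressiveClassifier`, whose `partial_fit` method consumes the stream one sample at a time; the 2-D toy data below are illustrative, not the tweet features.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

# Toy stream: 2-D points from two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = PassiveAggressiveClassifier(random_state=0)
classes = np.array([0, 1])

# Feed the samples sequentially: the model is updated after each one,
# instead of fitting on the whole training set at once.
for i in rng.permutation(len(X)):
    clf.partial_fit(X[i:i + 1], y[i:i + 1], classes=classes)

acc = clf.score(X, y)
```

The update is "passive" when a sample is classified correctly with sufficient margin and "aggressive" (the weights change) when it is not.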

• Logistic Regression

Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, although many more complex extensions exist [19].

• Random Forest Classifier

The term "Random Forest Classifier" refers to the classification algorithm made up of several decision trees. The algorithm uses randomness to build each individual tree to promote uncorrelated forests, which then uses the forest's predictive powers to make accurate decisions [19].

• Linear SVC

The support vector machine (SVM) classifier is one of the commonly used algorithms for text classification due to its good performance. SVM is a non-probabilistic binary linear classification algorithm, which is performed by plotting the training data in a multi-dimensional space. Then, SVM categorizes the classes with a hyper-plane. The algorithm will add a new dimension if the classes cannot be separated linearly in multi-dimensional space to separate the classes. This process will continue until the training data can be categorized into two different classes [19].

• Decision Tree Classifier

A Decision Tree is constructed by asking a series of questions with respect to a record of the dataset. Decision Trees are also used in tandem when building a Random Forest classifier, which is a culmination of multiple Decision Trees working together to classify a record based on a majority vote [19].

• K Neighbors Classifier

KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression) [19].

• ARABERTv2

AraBERT is an Arabic pre-trained language model based on Google's BERT architecture. AraBERT uses the same BERT-Base configuration [21].

### **4. Experimental Analysis**

*4.1. Dataset*

The dataset [1] is unbalanced, with a limited number of samples in some categories, as summarized in Tables 1 and 2.

**Table 1.** Data Distribution for each class in binary classification.


**Table 2.** Data Distribution for each class in multi-classification.


The author classified his data as mentioned below:


### *4.2. Implementation Environment*

To perform all experiments in this study, we used a PC with the following specifications: an Intel® Core(TM) i7-6850 K processor with 4 GB RAM and a 3.360 GHz frequency. The algorithms Passive-Aggressive Classifier, Logistic Regression, Random Forest Classifier, K Neighbors Classifier, and Linear SVC were implemented herein using Python 3.8.0 with Anaconda [Jupyter notebook]. The Python-based ML libraries, such as NLTK, pandas, and scikit-learn, were utilized to investigate the performance metrics of the proposed methods; at the same time, TensorFlow and Keras in Colab were used to implement ARABERTv2. The results and discussions concerning the various techniques incorporated are highlighted in the subsequent sections. The code will be available in our GitHub account (https://github.com/abdullahmuaad8, accessed on 8 February 2022).

### *4.3. Evaluation Metrics*

To assess our proposed system, we used the following indices:

The recall was calculated by dividing the number of true positive (TP) observations by the total number of actual positive observations (TP + FN).

Specificity was defined as the proportion of true negative (TN) observations to the total number of actual negative observations (TN + FP).

The F1-score is the harmonic mean of recall and precision, which means that the F1-score accounts for both FPs and FNs.

Accuracy was defined as the simple ratio of accurately predicted observations to total observations.

The definition formula of all these metrics were defined in [22] as follows:

$$\text{Recall/Sensitivity (Re)} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \tag{1}$$

$$\text{Specificity (Sp)} = \frac{\text{TN}}{\text{TN} + \text{FP}}, \tag{2}$$

$$\text{F1-score (F-M)} = \frac{2\cdot\text{TP}}{2\cdot\text{TP}+\text{FP}+\text{FN}}, \tag{3}$$

$$\text{Overall accuracy (Az)} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FN} + \text{TN} + \text{FP}}, \tag{4}$$

where TP, TN, FP, and FN were defined to represent the number of true positive, true negative, false positive, and false negative detections, respectively. To derive all of these parameters, a multidimensional confusion matrix was used.

### **5. Results and Discussion**

This section highlights the results and discussion concerning the various techniques incorporated and describes our experiments on this data. We evaluated the performance of all algorithms on this dataset and designed our experiments at two levels (tasks):


### *5.1. Binary Classification*

The results of the misogyny identification task are shown in Table 3. In terms of accuracy, precision, and F-measure, the Linear SVC model outperformed all the other models; the exception is the Random Forest Classifier, which works better in terms of recall. At the same time, we used one of the transfer learning tools, ARABERTv2, which provided excellent accuracy but required more time compared to the machine learning methods.


**Table 3.** Arabic misogyny detection Evaluation results for binary classification tasks.

### *5.2. Multi Classification*

The results of the misogyny identification task are shown in Table 4. In terms of accuracy, the Linear SVC model outperformed the others. According to the results, the performance of the typical machine learning Random Forest Classifier model was poor. At the same time, we used one of the transfer learning tools, ARABERTv2, which provided excellent accuracy but required more time than the machine learning methods.

**Table 4.** Arabic misogyny detection Evaluation results for multiclass classification tasks.


Finally, we would like to note that the dataset was unbalanced. For example, as shown in Table 1, the sexual harassment class only had 17 comments, which means that learning the pattern for this class was very limited. As a result, we recommend increasing the number of comments in this dataset as a future project.

### **6. Conclusions**

The problem of misogyny has become a major problem for Arab women. In this work, we introduced a model for the detection of misogyny in Arabic text, utilizing the Arabic Levantine Twitter Dataset for Misogynistic language. Our results provided excellent accuracy, equal to 83%, using machine learning for the detection and classification tasks. This article shows that many open issues still need to be handled, starting with the limited benchmark datasets and lexicons for Arabic text in general, and for misogyny against women in particular, together with the difficulty posed by the morphology and delicacy of the Arabic language. Augmenting the data with techniques such as oversampling to resolve the class imbalance could also yield better performance. Finally, there is a need to study the correlation between hate speech, misogyny, and the problem of mixed language in future works.

**Author Contributions:** Conceptualization, A.Y.M., C.C., J.V.B.B. and H.J.D.; methodology, A.Y.M. and M.A.A.-a.; software, A.Y.M.; C.C. validation, A.Y.M. and M.A.A.-a.; formal analysis, A.Y.M.; investigation, H.J.D. and A.Y.M.; resources, A.Y.M. and H.J.D.; data curation, A.Y.M.; writing original draft preparation, A.Y.M. and M.A.A.-a.; writing—review and editing, A.Y.M.; C.C., J.V.B.B. and M.A.A.-a.; visualization, M.A.A.-a.; supervision, M.A.A.-a.; project administration, M.A.A. a.; funding acquisition, M.A.A.-a. All authors have read and agreed to the published version of the manuscript.

**Funding:** The experimental part of the work reported herein (Medical\_Image\_DL-PR) is fully supported by National PARAM Supercomputing Facility (NPSF), Centre for Development of Advanced

Computing (C-DAC), Savitribai Phule Pune University Campus, India. We acknowledge our sincere thanks for providing such excellent computing resources.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data used to support the findings of this study are available from the corresponding author upon request.

**Acknowledgments:** This work was supported in part by University of Mysore and the Ministry of Science and ICT (MSIT), South Korea, through the Information Technology Research Center (ITRC) Support Program under Grant IITP-2021-2017-0-01629, and in part by the Institute for Information & Communications Technology Promotion (IITP), through the Korea Government (MSIT) under Grant 2017-0-00655 and IITP-2021-2020-0-01489 and Grant NRF-2019R1A2C2090504. We also acknowledge to HPC lab UOM and University of Mysore.

**Conflicts of Interest:** There are no conflicts of interest associated with publishing this paper.

### **References**


### *Proceeding Paper* **Health Monitoring of Civil Structures: A MCMC Approach Based on a Multi-Fidelity Deep Neural Network Surrogate †**

**Matteo Torzoni 1,2,\* , Andrea Manzoni <sup>2</sup> and Stefano Mariani <sup>1</sup>**


**Abstract:** To meet the need for reliable real-time monitoring of civil structures, safety control and optimization of maintenance operations, this paper presents a computational method for the stochastic estimation of the degradation of the load-bearing structural properties. Exploiting a Bayesian framework, the procedure sequentially updates the posterior probability of the damage parameters used to describe the aforementioned degradation, conditioned on noisy sensor observations, by means of Markov chain Monte Carlo (MCMC) sampling algorithms. To enable the analysis to run in real-time or quasi real-time, the numerical model of the structure is replaced with a data-driven surrogate used to evaluate the (conditional) likelihood function. The proposed surrogate model relies on a multi-fidelity (MF) deep neural network (DNN), mapping the damage and operational parameters onto sensor recordings. The MF-DNN is shown to effectively leverage information between multiple datasets, by learning the correlations across models with different fidelities without any prior assumption, ultimately alleviating the computational burden of the supervised training stage. The low-fidelity (LF) responses are approximated by relying on proper orthogonal decomposition for the sake of dimensionality reduction, and a fully connected DNN. The high-fidelity signals that feed the MCMC within the outer-loop optimization are instead generated by enriching the LF approximations through a deep long short-term memory network. Results relevant to a specific case study demonstrate the capability of the proposed procedure to estimate the distribution of damage parameters, and prove the effectiveness of the MF scheme in outperforming a single-fidelity based method.

**Keywords:** structural health monitoring; Markov chain Monte Carlo; deep learning; multi-fidelity; reduced order modeling; damage identification

### **1. Introduction**

Civil structures and infrastructures are critical for the life of the world population and play a strategic role in the global economy [1]. Aging and ever-increasing extreme loading conditions threaten both existing and new structural systems, stressing the need for real-time structural health monitoring (SHM) procedures to detect and identify any deviation from the damage-free baseline [2].

Vibration-based SHM techniques investigate the structural health by recording and analyzing the vibration response, e.g., acceleration or displacement multivariate time series, of the monitored structure. Two competing SHM approaches can be formally distinguished [3]: the model-based one, e.g., [4,5], and the data-based one, e.g., [6,7]. The former is usually implemented by updating a physics-based model on the basis of measured experimental data, attempting to estimate the location and extent of the occurred structural changes. The latter is based on a machine learning (ML)

**Citation:** Torzoni, M.; Manzoni, A.; Mariani, S. Health Monitoring of Civil Structures: A MCMC Approach Based on a Multi-Fidelity Deep Neural Network Surrogate. *Comput. Sci. Math. Forum* **2022**, *2*, 16. https://doi.org/10.3390/ IOCA2021-10889

Academic Editor: Frank Werner

Published: 22 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

paradigm that, once trained, can be used as a black-box tool. ML systems automatically learn how the features originating from the recorded data are statistically correlated with the sought damage patterns [8]. After the advent of deep learning (DL), which incorporates the selection and extraction of optimized features into the end-to-end learning process, the feature engineering stage has been progressively automated.

This work proposes an output-only approach to the damage localization problem (see for instance [9,10]), leveraging a synergic combination of multi-fidelity (MF) data-driven meta-modeling and Bayesian parameter identification. The probability distribution of the unknown damage parameters is approximated through a Markov chain Monte Carlo (MCMC) sampling algorithm.

MCMC has been applied to Bayesian model updating and model class selection in structural mechanics as well as in SHM, see, e.g., [11,12]. In this work, MCMC is used to construct a Markov chain of the sought damage parameters, whose limit distribution is the target probability distribution. The probability distribution is sequentially updated by exploring the support of the damage parameters with a density of steps proportional to the unknown posterior distribution. Acceptance of each sample is governed by how well the current parameters explain the sparse dynamic response measurements provided by a sensor network, as evaluated through a data-driven surrogate model.

Because handling finite element (FE) simulations within an MCMC analysis is computationally impractical, a FE model capable of simulating the effect of damage on the structural response is adopted only to build labelled datasets of vibration recordings for known damage positions, see for instance [13]. A data-driven surrogate model is adopted instead of the FE model to map operational and damage parameters onto the associated vibration signals. Such a surrogate is based on a multi-fidelity deep neural network (MF-DNN) trained on synthetic data of multiple fidelities, a ML paradigm adopted and extended for instance in [14,15]. Specifically, a limited amount of high fidelity (HF) data and a large amount of cheaper low fidelity (LF) data are considered. This type of meta-modeling is useful to alleviate the high demand, during training, for HF data, which are potentially expensive to collect. Indeed, the LF data supply useful information on the trends of the HF data, allowing the MF-DNN to enhance its prediction accuracy while leveraging only a few HF data in comparison to a single-fidelity method [16].

### **2. SHM Methodology**

The proposed methodology is detailed as follows. The composition of the datasets used to train the surrogate model is specified in Section 2.1, the considered numerical models are discussed in Section 2.2, the MF-DNN surrogate model is described in Section 2.3, and the setup of the MCMC analysis for damage localization is explained in Section 2.4.

### *2.1. Datasets Definition*

The LF and HF datasets, respectively $\mathbf{D}_{\mathrm{LF}}$ and $\mathbf{D}_{\mathrm{HF}}$, are built from the assembly of $I_{\mathrm{LF}}$ and $I_{\mathrm{HF}}$ instances, as follows:

$$\mathbf{D}_{\mathrm{LF}} = \left\{ (\mathbf{x}_{i}^{\mathrm{LF}}, \mathbf{U}_{i}^{\mathrm{LF}}) \right\}_{i=1}^{I_{\mathrm{LF}}}, \quad \mathbf{D}_{\mathrm{HF}} = \left\{ (\mathbf{x}_{j}^{\mathrm{HF}}, \mathbf{U}_{j}^{\mathrm{HF}}) \right\}_{j=1}^{I_{\mathrm{HF}}};\tag{1}$$

each LF instance is provided by a LF model of the structure to be monitored in undamaged conditions, and consists of the input parameters $\mathbf{x}_{i}^{\mathrm{LF}} \in \mathbb{R}^{N_{\mathrm{par}}^{\mathrm{LF}}}$, defining the operational conditions, i.e., the loadings acting on the structure during the $i$-th instance, and the related LF vibration time histories $\mathbf{U}_{i}^{\mathrm{LF}}(\mathbf{x}_{i}^{\mathrm{LF}}) = [\mathbf{u}_{1}^{\mathrm{LF}}, \ldots, \mathbf{u}_{N_u}^{\mathrm{LF}}]_{i} \in \mathbb{R}^{N_u \times L}$, shaped as $N_u$ arrays of length $L$. The HF counterpart is provided by a HF model of the same structure, which also accounts for the presence of structural damage and internal damping. Each HF instance consists of the input parameters $\mathbf{x}_{j}^{\mathrm{HF}} \in \mathbb{R}^{N_{\mathrm{par}}^{\mathrm{HF}}}$, defining the operational and damage conditions, with $N_{\mathrm{par}}^{\mathrm{HF}} > N_{\mathrm{par}}^{\mathrm{LF}}$, and the associated HF vibration recordings $\mathbf{U}_{j}^{\mathrm{HF}}(\mathbf{x}_{j}^{\mathrm{HF}}) \in \mathbb{R}^{N_u \times L}$. As often done in the SHM literature, see for instance [3,6,12], the structural damage is modeled as a selective reduction of the material stiffness, applied to a subdomain identified by the spatial coordinates of its center $\theta_{j} \subset \mathbf{x}_{j}^{\mathrm{HF}}$. For simplicity, the same sampling frequency and monitored degrees of freedom (dofs) are considered for the two fidelities, but there are no restrictions in this respect. Each instance refers to a time window $(0, T)$, short enough to assume steady operational, environmental, and damage conditions. In the remainder of the paper the indexes $i$, $j$ will be dropped.

### *2.2. Datasets Population*

The monitored structure is modeled as an elastic continuum, discretized in space by means of a FE triangulation. The HF numerical model results from the semi-discretized form of the elasto-dynamic problem defined over the FE mesh. On the other hand, in order to ease the construction of a large LF dataset, a projection-based model order reduction strategy for parametrized systems is adopted to build the LF model, see, e.g., [9]. To this aim, the reduced basis method [17], relying on the proper orthogonal decomposition (POD)-Galerkin approach, is considered. Hence, the LF approximation is obtained as a linear combination of POD-basis functions, yet not accounting for the presence of damage and structural damping. The LF and HF models read, respectively, as

$$\begin{cases} \mathbf{M}^{\mathrm{R}}\ddot{\mathbf{d}}^{\mathrm{R}}(t) + \mathbf{K}^{\mathrm{R}}\mathbf{d}^{\mathrm{R}}(t) = \mathbf{f}^{\mathrm{R}}(\mathbf{x}^{\mathrm{LF}})\,, \quad t \in (0, T) \\ \mathbf{d}^{\mathrm{R}}(0) = \mathbf{W}^{\top}\mathbf{d}_{0} \\ \dot{\mathbf{d}}^{\mathrm{R}}(0) = \mathbf{W}^{\top}\dot{\mathbf{d}}_{0}\,, \end{cases} \tag{2}$$

$$\begin{cases} \mathbf{M}\ddot{\mathbf{d}}(t) + \mathbf{C}(\mathbf{x}^{\mathrm{HF}}(\theta))\dot{\mathbf{d}}(t) + \mathbf{K}(\mathbf{x}^{\mathrm{HF}}(\theta))\mathbf{d}(t) = \mathbf{f}(\mathbf{x}^{\mathrm{HF}})\,, \quad t \in (0, T) \\ \mathbf{d}(0) = \mathbf{d}_{0} \\ \dot{\mathbf{d}}(0) = \dot{\mathbf{d}}_{0}\,, \end{cases} \tag{3}$$

where the superscripts LF and HF are omitted from all the arrays for simplicity, while the superscript $\mathrm{R}$ stands for *reduced*. Here: $t \in (0, T)$ denotes the time coordinate; $\mathbf{d}(t), \dot{\mathbf{d}}(t), \ddot{\mathbf{d}}(t) \in \mathbb{R}^{M}$ are the vectors of nodal displacements, velocities, and accelerations, respectively, with $M$ the number of dofs; $\mathbf{M} \in \mathbb{R}^{M \times M}$ is the mass matrix; $\mathbf{C}(\mathbf{x}^{\mathrm{HF}}(\theta)) \in \mathbb{R}^{M \times M}$ is the damping matrix, modeled as Rayleigh damping for mathematical convenience; $\mathbf{K}(\mathbf{x}^{\mathrm{HF}}(\theta)) \in \mathbb{R}^{M \times M}$ is the stiffness matrix; $\mathbf{f}(\mathbf{x}^{\mathrm{LF}}), \mathbf{f}(\mathbf{x}^{\mathrm{HF}}) \in \mathbb{R}^{M}$ are the vectors of nodal forces; $\mathbf{d}_{0}$ and $\dot{\mathbf{d}}_{0}$ are the initial conditions at $t = 0$; $\mathbf{W} = [\mathbf{w}_{1}, \ldots, \mathbf{w}_{M^{\mathrm{R}}}] \in \mathbb{R}^{M \times M^{\mathrm{R}}}$ is the matrix gathering the $M^{\mathrm{R}} \ll M$ retained POD-basis functions; $\mathbf{M}^{\mathrm{R}}, \mathbf{K}^{\mathrm{R}}, \mathbf{f}^{\mathrm{R}}(\mathbf{x}^{\mathrm{LF}}), \mathbf{d}^{\mathrm{R}}(t)$ are the reduced arrays, playing the same role as the FE arrays but with dimension ruled by $M^{\mathrm{R}}$ instead of $M$. It has to be noted that, even if in this case the two fidelities differ through the presence of structural damage and viscous damping in the HF model, the proposed computational framework is general and can be arbitrarily adapted to different modeling choices.

The datasets $\mathbf{D}_{\mathrm{LF}}$ and $\mathbf{D}_{\mathrm{HF}}$ are populated according to Equation (1) by sampling the parametric input spaces, respectively defined by a uniform probability distribution over $\mathbf{x}^{\mathrm{LF}}$ and $\mathbf{x}^{\mathrm{HF}}$, via Latin hypercube sampling. The relevant vibration recordings $\mathbf{U}^{\mathrm{LF}}$ and $\mathbf{U}^{\mathrm{HF}}$ are extracted from $\mathbf{d}^{\mathrm{LF}}$ and $\mathbf{d}^{\mathrm{HF}}$, respectively, through a Boolean operation.
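As an illustration of this sampling step, the following is a minimal NumPy sketch of Latin hypercube sampling over a box-shaped parametric space. The function name is illustrative and not taken from the paper's implementation; the example reuses the LF ranges of the case study in Section 3, $Q \in [1, 5]$ kPa and $f \in [10, 60]$ Hz.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Draw n_samples points via Latin hypercube sampling.

    bounds: list of (low, high) tuples, one per parameter.
    Each 1D marginal is stratified into n_samples equal bins,
    with one point drawn uniformly inside each bin.
    """
    rng = np.random.default_rng(rng)
    samples = np.empty((n_samples, len(bounds)))
    for d, (lo, hi) in enumerate(bounds):
        # one uniform draw per stratum, then shuffle the strata across samples
        strata = (np.arange(n_samples) + rng.random(n_samples)) / n_samples
        rng.shuffle(strata)
        samples[:, d] = lo + strata * (hi - lo)
    return samples

# sample the LF input space x_LF = (Q, f): Q in [1, 5] kPa, f in [10, 60] Hz
x_lf = latin_hypercube(10, [(1.0, 5.0), (10.0, 60.0)], rng=0)
```

Compared to plain Monte Carlo sampling, the stratification guarantees that each marginal range is covered evenly even with few samples, which is what makes it attractive for populating training datasets.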

### *2.3. MF-DNN Surrogate Model*

The MF-DNN $\mathcal{NN}_{\mathrm{MF}}$ is composed of a LF neural network $\mathcal{NN}_{\mathrm{LF}}$, trained on low-cost data and used as baseline model, and a HF neural network $\mathcal{NN}_{\mathrm{HF}}$, trained on a few HF data and used to adaptively learn the correlation between LF and HF data. The overall evaluation of $\mathcal{NN}_{\mathrm{MF}}$ reads as

$$\mathbf{U}^{\mathrm{HF}} = \mathcal{NN}_{\mathrm{MF}}(\mathbf{x}^{\mathrm{HF}}, \mathbf{x}^{\mathrm{LF}}) = \mathcal{NN}_{\mathrm{HF}}(\mathbf{x}^{\mathrm{HF}}, \mathbf{U}^{\mathrm{LF}})\,, \quad \mathbf{U}^{\mathrm{LF}} = \mathrm{reshape}\Big[\mathbf{Y}\Big(\frac{1}{\boldsymbol{\omega}} \odot \mathcal{NN}_{\mathrm{LF}}(\mathbf{x}^{\mathrm{LF}})\Big)\Big]\,, \tag{4}$$

here: **<sup>Y</sup>** = [**y**1, ... , **<sup>y</sup>***M*LF ] <sup>∈</sup> <sup>R</sup>*L*concat×*M*LF , with *<sup>L</sup>*concat <sup>=</sup> *<sup>L</sup>* <sup>×</sup> *Nu*, is a matrix gathering *<sup>M</sup>*LF POD-basis functions built upon **D***<sup>L</sup>* and used to compress the LF data in order to ease the complexity of N NLF; N NLF is a fully connected DNN, mapping the LF input parameters onto the POD-basis coefficients; *<sup>ω</sup>* <sup>∈</sup> <sup>R</sup>*M*LF is a vector of numbers linearly decreasing from 1 to 0.2, used to weight the regression over the POD-basis coefficients by their relative importance; denotes the Hadamard product; the reshape operation is used to recast the reconstructed LF signals from a single vector of size *L*concat into *Nu* arrays of length *L*; N NHF is a long short-term memory (LSTM) NN that, as more appropriate to solve timedependent problems, is adopted to map the HF input parameters and the approximated LF signals onto the HF signals.

### *2.4. Damage Localization via MCMC*

According to Bayes' rule, the posterior probability density function (pdf) of the damage parameters $\boldsymbol{\theta}$, conditioned on the observed signals $\mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}}$, is

$$p(\boldsymbol{\theta} \mid \mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}}, \mathcal{NN}_{\mathrm{MF}}) = \frac{p(\mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}} \mid \boldsymbol{\theta}, \mathcal{NN}_{\mathrm{MF}})\, p(\boldsymbol{\theta}, \mathcal{NN}_{\mathrm{MF}})}{\int p(\mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}} \mid \boldsymbol{\theta}, \mathcal{NN}_{\mathrm{MF}})\, p(\boldsymbol{\theta}, \mathcal{NN}_{\mathrm{MF}})\, d\boldsymbol{\theta}}\,, \tag{5}$$

where: *<sup>p</sup>*(*θ*, N N MF) is the prior of *<sup>θ</sup>*; *<sup>p</sup>*(**U**EXP|*θ*, N N MF) is the likelihood of the evidence, which measures the goodness of fit of N N MF to **<sup>U</sup>**EXP given the parameters *<sup>θ</sup>*. By assuming that the uncertainties follow a Gaussian distribution, the likelihood function can be assumed Gaussian too thanks to the central limit theorem:

$$p(\mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}} \mid \boldsymbol{\theta}, \mathcal{NN}_{\mathrm{MF}}) = \prod_{k=1}^{N_{\mathrm{obs}}} \frac{1}{(\sqrt{2\pi})^{N_u}\sqrt{|\boldsymbol{\Sigma}_{c}|}} \exp\left(-\frac{\frac{1}{L}\sum_{\tau=1}^{L} \left[(\boldsymbol{\Delta}_{k}\mathbf{e}_{\tau})^{\top}\boldsymbol{\Sigma}_{c}^{-1}(\boldsymbol{\Delta}_{k}\mathbf{e}_{\tau})\right]}{2}\right); \tag{6}$$

here: $N_{\mathrm{obs}}$ is the batch size of the processed observations; $\boldsymbol{\Delta}_{k} = \mathbf{U}^{\mathrm{EXP}}_{k} - \hat{\mathbf{U}}^{\mathrm{HF}}(\mathbf{x}^{\mathrm{HF}}(\theta), \mathbf{x}^{\mathrm{LF}})$ is the prediction error for the $k$-th observation, assumed independent between different time instants and modeled as a Gaussian random vector with zero mean and covariance matrix $\boldsymbol{\Sigma}_{c} \in \mathbb{R}^{N_u \times N_u}$, describing the spatial correlation of prediction errors due to modeling errors and measurement noise; $\mathbf{e}_{\tau}$ is a Boolean vector with a single non-zero entry in the $\tau$-th position, used to extract the relevant time step. For further details see, e.g., [18].
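In practice it is the logarithm of Equation (6) that is evaluated, for numerical stability. A direct NumPy transcription can be sketched as follows; array names are illustrative, and the quadratic form is averaged over the $L$ time steps of each observation exactly as in the formula above.

```python
import numpy as np

def log_likelihood(u_exp, u_pred, Sigma_c):
    """Log of the Gaussian likelihood in Equation (6).

    u_exp, u_pred : arrays of shape (N_obs, N_u, L), observed vs. predicted signals;
    Sigma_c       : (N_u, N_u) spatial covariance of the prediction error.
    """
    n_obs, n_u, L = u_exp.shape
    Sigma_inv = np.linalg.inv(Sigma_c)
    _, logdet = np.linalg.slogdet(Sigma_c)   # log|Sigma_c|, numerically stable
    ll = 0.0
    for k in range(n_obs):
        delta = u_exp[k] - u_pred[k]                              # Delta_k, (N_u, L)
        # quadratic form (Delta_k e_tau)^T Sigma_c^{-1} (Delta_k e_tau) per time step,
        # then average over the L steps
        quad = np.einsum('it,ij,jt->t', delta, Sigma_inv, delta).mean()
        ll += -0.5 * (n_u * np.log(2.0 * np.pi) + logdet + quad)
    return ll
```

Working with the log also turns the product over the $N_{\mathrm{obs}}$ observations into a sum, which is the quantity actually compared in the acceptance step of the MCMC sampler.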

To avoid the expensive computation of the integral in the denominator of Equation (5), an MCMC sampling algorithm is adopted to approximate the posterior pdf. Specifically, the posterior pdf is sequentially updated according to the Metropolis-Hastings (MH) algorithm [19]. The MH algorithm simulates a chain of $\boldsymbol{\theta}$ samples distributed according to the posterior, with each sample depending only on the previous one. This generates a random walk in the space of $\boldsymbol{\theta}$, where each point is sampled with a frequency proportional to its probability. Hence, the stationary distribution of the Markov chain, under the assumption of ergodicity, asymptotically approaches the target pdf.

Let *<sup>q</sup>*(*ξ*|*θ*) be the assumed *proposal* pdf and *<sup>δ</sup>*(*θ*) = *<sup>p</sup>*(**U**EXP 1,...,Nobs |*θ*, N N MF)*p*(*θ*, N N MF) for the sake of simplicity. The MH algorithm recursively simulate the next Markov chain sample *θk*+<sup>1</sup> from the current sample *θk*, with *k* = 1, ... , *L*chain, as follows [20]: sample a candidate *<sup>ξ</sup>* from *<sup>q</sup>*(*ξ*|*θk*); compute the ratio *<sup>α</sup>* <sup>=</sup> *<sup>δ</sup>*(*ξ*)*q*(*θ<sup>k</sup>* <sup>|</sup>*ξ*) *<sup>δ</sup>*(*θ<sup>k</sup>* )*q*(*ξ*|*θ<sup>k</sup>* ); accept the candidate *<sup>ξ</sup>* with probability min{1, *α*} and store it as next state of the chain, i.e., *θk*+<sup>1</sup> = *ξ*, otherwise reject it and keep the current state of the chain, i.e., *θk*+<sup>1</sup> = *θk*.

After $L_{\mathrm{chain}}$ states are evaluated, the burn-in period of the chain, i.e., the initial transient phase, is removed to eliminate the initialization effect. The resulting chain is thinned down to $\tilde{L}_{\mathrm{chain}} = L_{\mathrm{chain}} / k_T$ samples, with $k_T$ a small fixed integer, in order to remove dependencies among consecutive samples. The target distribution can be ultimately approximated via histograms, and the posterior expected values and covariance can be eventually approximated with the empirical mean and covariance of the $\boldsymbol{\theta}_{1}, \ldots, \boldsymbol{\theta}_{\tilde{L}_{\mathrm{chain}}}$ samples:

$$\boldsymbol{\mu}_{\theta} = \mathbb{E}(\boldsymbol{\theta} \mid \mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}}, \mathcal{NN}_{\mathrm{MF}}) \approx \frac{1}{\tilde{L}_{\mathrm{chain}}} \sum_{l=1}^{\tilde{L}_{\mathrm{chain}}} \boldsymbol{\theta}_{l}\,, \tag{7}$$

$$\mathrm{cov}(\boldsymbol{\theta} \mid \mathbf{U}^{\mathrm{EXP}}_{1,\ldots,N_{\mathrm{obs}}}, \mathcal{NN}_{\mathrm{MF}}) \approx \frac{1}{\tilde{L}_{\mathrm{chain}} - 1} \sum_{l=1}^{\tilde{L}_{\mathrm{chain}}} [\boldsymbol{\theta}_{l} - \boldsymbol{\mu}_{\theta}][\boldsymbol{\theta}_{l} - \boldsymbol{\mu}_{\theta}]^{\top}\,. \tag{8}$$
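The chain post-processing, burn-in removal, thinning, and the empirical estimates of Equations (7) and (8), can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def posterior_stats(chain, n_burn, k_T):
    """Discard the first n_burn states, keep one sample every k_T, and return
    the empirical mean and covariance of Equations (7)-(8)."""
    thinned = chain[n_burn:][::k_T]          # burn-in removal, then thinning
    mu = thinned.mean(axis=0)
    dev = thinned - mu
    cov = dev.T @ dev / (len(thinned) - 1)   # unbiased sample covariance
    return mu, cov, thinned
```

With the settings reported in Section 3 (5000 samples, the first 500 removed, 3 samples out of every 4 discarded, i.e. $k_T = 4$), one is left with 1125 samples for the empirical estimates.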

### **3. Virtual Experiment**

The proposed method is validated on the digital twin shown in Figure 1. The HF model in Equation (3) is obtained from a FE discretization resulting in $M = 4659$ dofs, and is integrated in time using the Newmark method. The structure is made of concrete, whose mechanical properties are: Young's modulus $E = 30$ GPa; Poisson's ratio $\nu = 0.2$; density $\rho = 2500$ kg/m³. The structure is excited at the tip by a distributed load $q(t)$, acting on an area of $(0.3 \times 0.3)$ m², as depicted in Figure 1. The load $q(t)$ varies in time according to $q(t) = Q \sin(2\pi f t)$, where $Q \in [1, 5]$ kPa and $f \in [10, 60]$ Hz respectively denote the load amplitude and frequency, collected as $\mathbf{x}^{\mathrm{LF}} = (Q, f)$. Damage is introduced by reducing the material stiffness by 25% within the subdomain $\Omega$, a box of size $(0.3 \times 0.3 \times 0.4)$ m³ as depicted in Figure 1. The target position of this reduction is given by the coordinates of its center, and can be identified with a single abscissa $\theta_\Omega \in [0.15, 7.55]$ m running along the axis of the structure. Hence, the input parameters of the HF model are collected as $\mathbf{x}^{\mathrm{HF}} = (Q, f, \theta_\Omega)$. The Rayleigh damping matrix, which accounts for a 5% damping ratio on the first 4 structural modes, is also affected by the damage through the stiffness matrix. Synthetic displacement recordings $\mathbf{u}_n(t)$, with $n = 1, \ldots, N_u$, are collected from $N_u = 8$ dofs, mimicking a monitoring system arranged as depicted in Figure 1, for a time interval $(0, T = 1\ \mathrm{s})$, providing $L = 200$ data points per channel.

**Figure 1.** Physics-based digital twin of the monitored structure.

The reduced-order model in Equation (2), i.e., the LF model used to construct $\mathbf{D}_{\mathrm{LF}}$, has been built by performing a POD upon 40,000 snapshots in time, collected while exploring the parametric input space of $\mathbf{x}^{\mathrm{LF}}$. After fixing a suitable tolerance on the energy norm of the reconstruction error ($\mathrm{tol}_{\mathrm{POD}} = 10^{-3}$), 14 POD-basis functions are selected and stored in the matrix $\mathbf{W}$, in place of the original 4659 dofs; for further details see, e.g., [9,13].
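The offline construction of such a POD basis from a snapshot matrix can be sketched with a standard SVD-based implementation. The energy-based truncation criterion below (discarded singular-value energy bounded by $\mathrm{tol}^2$) is one common convention; the exact criterion used in the paper may differ, and the function name is illustrative.

```python
import numpy as np

def pod_basis(snapshots, tol=1e-3):
    """POD basis from a snapshot matrix S of shape (n_dofs, n_snapshots).

    Retains the smallest number M_R of left singular vectors such that the
    discarded energy satisfies sum_{i > M_R} s_i^2 / sum_i s_i^2 <= tol^2,
    which bounds the relative reconstruction error in the Frobenius norm by tol.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)          # cumulative energy fraction
    M_R = int(np.searchsorted(energy, 1.0 - tol**2)) + 1
    return U[:, :M_R]                                 # columns: retained POD modes
```

The reduced matrices of Equation (2) then follow by Galerkin projection, e.g. $\mathbf{K}^{\mathrm{R}} = \mathbf{W}^{\top}\mathbf{K}\mathbf{W}$ with $\mathbf{W}$ the returned basis.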

For the training of the surrogate model in Equation (4), $I_{\mathrm{LF}} = 10{,}000$ and $I_{\mathrm{HF}} = 1000$ instances have been collected from the LF and HF model, respectively. Concerning the compression of the LF data for the sake of prior dimensionality reduction, 104 POD-basis functions have been selected ($\mathrm{tol}_{\mathrm{POD}} = 10^{-3}$) and stored in the matrix $\mathbf{Y}$, in place of the 1600 data points of each instance.

The mean squared error and the mean absolute error have been used as loss functions for the training of $\mathcal{NN}_{\mathrm{LF}}$ and $\mathcal{NN}_{\mathrm{HF}}$, respectively, together with the Adam optimization algorithm [21]. The implementation has been carried out through the Tensorflow-based Keras API [22], running on an Nvidia GeForce RTX 3080 GPU card.

An example of the reconstruction capabilities achieved by the surrogate model is shown in Figure 2 for the monitored dof $u_8(t)$: the outcome of the regression over the POD-basis coefficients, ruled by $\mathcal{NN}_{\mathrm{LF}}$, and the corresponding expanded LF signal are reported together with the signal enrichment provided by $\mathcal{NN}_{\mathrm{HF}}$. To quantify the accuracy of the predicted signals, the Pearson correlation coefficient (PCC) between predicted and ground truth HF signals is adopted as a measure of fitness. The PCC values are evaluated with respect to 40 testing instances generated with the HF model while exploring the parametric input space of $\mathbf{x}^{\mathrm{HF}}$. The minimum PCC value over the 40 testing instances for each monitored channel is, respectively, {0.983; 0.988; 0.994; 0.995; 0.998; 0.998; 0.998; 0.998}, which largely validates the performance of the surrogate model. Conversely, if $\mathcal{NN}_{\mathrm{HF}}$ is employed without being coupled with $\mathcal{NN}_{\mathrm{LF}}$, the maximum PCC value drops to {0.605; 0.603; 0.601; 0.601; 0.791; 0.735; 0.709; 0.696}, showing the utility of the MF setting, which outperforms the single-fidelity method.
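The fitness measure used above is the standard Pearson correlation coefficient, which for a single channel can be computed as follows (a textbook formula, shown only for completeness):

```python
import numpy as np

def pcc(u_pred, u_true):
    """Pearson correlation coefficient between two 1D signals."""
    a = u_pred - u_pred.mean()   # center both signals
    b = u_true - u_true.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

Being invariant to affine rescaling, the PCC rewards a correct reproduction of the signal shape rather than of its absolute amplitude; in the validation above, the minimum (or maximum) of this value over the 40 testing instances is reported per channel.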

**Figure 2.** Reconstruction capacity of $\mathcal{NN}_{\mathrm{MF}}$: (**a**) regression over the POD-basis coefficients relative to a compressed LF signal; (**b**) decompressed LF signal; (**c**) regression over the HF signal.

In the absence of experimental data, the Bayesian estimation of the damage parameter $\theta_\Omega$ is simulated by considering pseudo-experimental instances, generated with the HF model and corrupted by adding independent, identically distributed Gaussian noise, featuring a signal-to-noise ratio of 80, to each vibration recording. Batches of $N_{\mathrm{obs}} = 3$ observations, relative to the same damage condition but different operational conditions, are processed during the evaluation of the likelihood in Equation (6). The prior pdf $p(\theta_\Omega, \mathcal{NN}_{\mathrm{MF}})$ is taken as uniform, while, to account for the bounded domain in which $\theta_\Omega$ can fall, a truncated Gaussian centered on the last accepted state is considered for the proposal $q(\xi \mid \theta_\Omega)$. The adaptive Metropolis algorithm [23] is adopted to ease the calibration of the proposal distribution, enabling its covariance to be tuned on the basis of past samples as the sampling evolves. The MCMC algorithm is run for 5000 samples, the first 500 of which are removed to get rid of the burn-in period. The obtained chain is ultimately thinned by discarding 3 samples out of every 4 to remove dependencies among consecutive samples.

Two examples of MCMC analyses are reported in Figure 3, showing the generated Markov chains alongside the estimated posterior mean and credibility intervals. In both cases, the damage parameter $\theta_\Omega$, here normalized between 0 and 1, is properly identified. The larger uncertainty in the second case is to be expected: given the structural layout and the placement of the sensors, the sensitivity of the measurements to damage positions far from the clamped side is smaller.

**Figure 3.** Examples of MCMC analysis, in case of damage position (**a**) close to the clamped side and (**b**) far from the clamped side.

### **4. Conclusions**

This paper has presented a stochastic approach to SHM, here applied to the problem of damage localization in the case of slow damage progression. The presence of damage has been postulated as already detected, e.g., by an early warning tool, and only the localization task has been analyzed. The Bayesian identification of the damage parameters is achieved through an MCMC sampling algorithm, adopted to approximate their posterior distribution conditioned on a set of measurements. Few investigations in the literature involve the use of MCMC for the health monitoring of civil structures, and this is the first one considering a MF-DNN surrogate model to accelerate the computation of the conditional likelihood. The surrogate model learns from simulated data of multiple fidelities, i.e., a few HF data and several inexpensive LF data, so as to alleviate the computational burden of the supervised training stage. The method has been assessed on a numerical case study, showing remarkable accuracy under the effect of measurement noise and varying operational conditions.

The method is suitable for structural typologies whose damage patterns can be represented by a stiffness reduction fixed within the time interval of interest. Since this enables a time scale separation between damage growth and damage assessment, it is a standard assumption for most practical scenarios in SHM. Such a description of damage is consistent with the adopted vibration-based SHM approach, and allows the structure to be modeled as a linear system both in the presence and in the absence of damage. Moreover, as shown in [9], even if the stiffness reduction takes place over domains of different size from the one adopted during the dataset construction, it is still possible to identify the correct position of damage.

Considering data-driven algorithms, damage localization is often addressed by exploiting a DL feature extractor followed by a classification or regression module, e.g., as done in [9,10,13]. However, due to the need of training in a simulated environment, the risk of losing generalization capabilities on real monitoring data is high. The proposed procedure tries to overcome such generalization problems: damage is located by seeking those parameters of the surrogate model producing the output closest to the measured one, in terms of a suitable distance function measuring the similarity of the signals. For this reason, and thanks to the fully stochastic framework considered here, which is suitable for dealing with noisy data and modeling inaccuracies, it is reasonable to expect a better ability to generalize outside the training regime.

Besides the need to validate the proposed methodology within a suitable experimental setting, the next studies will extend the Bayesian identification also to the parameters controlling the operational conditions. Moreover, a usage monitoring tool powered by a suitable data-driven paradigm will be considered to provide useful prior knowledge, as opposed to an uninformative flat prior. The analysis of dynamic effects resulting from localized damage mechanisms is also left for future investigations.

**Author Contributions:** Conceptualization, M.T., A.M. and S.M.; methodology, M.T., A.M. and S.M.; software, M.T.; validation, M.T., A.M. and S.M.; formal analysis, M.T., A.M. and S.M.; investigation, M.T.; resources, A.M. and S.M.; data curation, M.T.; writing—original draft preparation, M.T.; writing—review and editing, M.T., A.M. and S.M.; visualization, M.T.; supervision, A.M. and S.M.; project administration, A.M. and S.M.; funding acquisition, A.M. and S.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data generated during the study are available from the corresponding author upon reasonable request.

**Acknowledgments:** M.T. acknowledges the financial support by Politecnico di Milano through the interdisciplinary Ph.D. Grant "Physics-Informed Deep Learning for Structural Health Monitoring".

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **Learning the Link between Architectural Form and Structural Efficiency: A Supervised Machine Learning Approach †**

**Pooyan Kazemi \*, Aldo Ghisi and Stefano Mariani**

Department of Civil and Environmental Engineering, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy; aldo.ghisi@polimi.it (A.G.); stefano.mariani@polimi.it (S.M.)

**\*** Correspondence: seyedpooyan.kazemi@polimi.it

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Abstract:** In this work, we exploit supervised machine learning (ML) to investigate the relationship between architectural form and structural efficiency under seismic excitations. We inspect a small dataset of simulated responses of tall buildings, differing in terms of base and top plans, within which a vertical transformation method is adopted (tapered forms). A diagrid structure with members having a tubular cross-section is mapped onto the architectural forms, and static loads equivalent to the seismic excitation are applied. Different ML algorithms, such as kNN, SVM, Decision Tree, Ensemble methods, discriminant analysis, and Naïve Bayes, are trained to classify the seismic response of each form on the basis of a specific label. The presented results rely upon the drift of the building at its top floor, though the same procedure can be generalized to any performance characteristic of the considered structure, e.g., the drift ratio, the total mass, or the expected design weight. The classification algorithms are all tested within a Bayesian optimization approach; it is found that the Decision Tree classifier provides the highest accuracy, linked to the lowest computing time. This research activity puts forward a promising perspective for the use of ML algorithms to help architectural and structural designers during the early stages of conception and control of tall buildings.

**Keywords:** supervised machine learning; classification; tall building; architectural form; structural efficiency

### **1. Introduction**

Architects and designers have always been curious about building novel forms, though many restrictions used to prevent the exploration of complex ones. While in some cases there are few limitations on the design, on other occasions engineering considerations are dominant, for example in the case of tall buildings, which feature one of the most complicated design processes [1]. Tall buildings are an outstanding architectural production and require vast resources and immense expenses due to their large scale. Since they have nowadays become more sophisticated, it is essential to adopt suitable and efficient structural configurations. Design teams currently look for specialists with knowledge of efficient, or optimal, structural design [2]. The entire design process also requires a close collaboration between architects and engineers, who look for software that provides clear requirements for the architectural form, identifies the alternative with the highest structural efficiency and, at the same time, provides a portfolio of different options. Such software might help contractors and clients reduce the total costs of construction [2], since about one third of the total expenses are related to the structure; accordingly, structural considerations should be allowed for in the early stage of the design process [3]. Although the early-stage design phase is a small part of the whole design process, it thus plays a relevant role in the whole procedure, see [4]. In our modern, smart-city age, the design of tall buildings has become the outcome of the close teamwork of architects, structural and

**Citation:** Kazemi, P.; Ghisi, A.; Mariani, S. Learning the Link between Architectural Form and Structural Efficiency: A Supervised Machine Learning Approach. *Comput. Sci. Math. Forum* **2021**, *2*, 18. https://doi.org/10.3390/IOCA2021-10891

Academic Editor: Frank Werner

Published: 22 September 2021


mechanical engineers, resulting in considerable efficiency. With the advance of technology and the constructability of complex forms, structural efficiency may fade and, even more regrettably, the Earth's resources may be depleted if the load-bearing capacity of materials is used inefficiently [5]. In the contemporary tall building design approach, structural considerations are not fully taken into account until the architectural form has been generated; this procedure compels the structural intervention to fix single, individual problems rather than really integrating the structural model into the initial architectural form. Since the early-stage design process is so critical, a workflow should be utilized to assure that all aspects are considered simultaneously, see e.g., [6,7]. Moreover, architects should simultaneously consider various design objectives, including structural efficiency, since 80% of the consumption of construction materials is defined at this stage.

Some researchers have investigated the relation between architecture and structural efficiency for tall buildings [8]. On some occasions, it was claimed that a hyperboloid form offers better structural efficiency than a cylindrical one. Others also considered the effect of the architectural form on the structural efficiency, with investigations resting on different, alternate geometries [9–12].

Within the frame of a parametric design paradigm, all the design features, such as form, can be modified at any time during the design process [13]. In the so-called computational design approach, a steady capability to create models of complex form is pursued; with this approach, ordinary structural modeling tasks are handled by a simulation tool to cope with the problem complexity and speed up requirement compliance, so that several alternatives can be evaluated on the current model of a building at a glance [13]. It turns out that the parameters defining the architectural and structural components of the model are flexible and can be adjusted on the fly.

There is currently a lack of research activity regarding the application of ML in the field of architecture. For example, in [14] ML was adopted to generate non-conventional structural forms that can be classified as objective or subjective outcomes of the design product. ML indeed provides the designer with insight into the structural efficiency of the solutions [15]. Alternatively, artificial intelligence has been exploited to add more creativity to the design process, e.g., by using a variational autoencoder in a design framework: the algorithm generates some samples, on which the autoencoder is then trained.

In this work, we focus on the use of ML tools to learn the link between the outer shape of tall buildings, their load-bearing frame, and the overall capacity to resist earthquake excitations. Different algorithms are trained by exploiting a rather small dataset of results regarding the response of buildings of different shapes excited by a seismic-like loading, and a comparison is provided in terms of how efficiently they are trained and of their capability to provide accurate surrogates of the real structures.

### **2. Proposed Methodology**

The present approach consists of three stages: (i) architectural form generation; (ii) structural analysis; and (iii) supervised ML. Of these three stages, only the last one provides novelty, since it addresses the question of whether accurate surrogates can be obtained within an ML-based process. It was indeed time consuming to generate all the architectural forms, then build the structural model for each of the 144 considered forms, apply the loads, and finally carry out the structural analysis. If, instead, only a part of the 144 forms were used for modelling and training the ML tool, the results for the remaining part of the dataset could be generated automatically. The main goal of this research is thus to apply ML to the aforementioned problem or, more precisely, to find an optimal case-dependent classification algorithm.

### *2.1. Architectural Form Generation*

The architectural form of a tall building has been interpreted here as a top and a bottom plan plus a vertical transformation method consisting of a morph, twist, or curvilinear transformation, see [16]. A set of 144 different architectural forms of tall buildings has been generated, with top and base plans varied among 3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13- and 24-sided polygons. This process exploited Rhinoceros™ and Grasshopper™, thanks to their powerful parametric tools. In the next step, a diagrid (tubular) structure has been designed, sharing pinned joints with intermediate concrete floor slabs carrying only dead load. A seismic load has then been applied to the center of mass of each concrete floor, according to an equivalent static method, see [17] for details. Finally, a structural analysis was carried out in Karamba™, a parametric structural analysis plug-in for Grasshopper™. In Figure 1, a part of the 144 mentioned tall buildings is shown, including both the architectural and the structural models.
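Since 12 polygon options are available for both the top and the base plan, the 144 forms can be read as the Cartesian product of the two sets. A minimal Python sketch of this enumeration (the dictionary keys are illustrative; the actual form generation happens parametrically in Grasshopper™) might look like:

```python
from itertools import product

# Polygon side counts available for top and base plans (from the paper)
SIDES = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 24]

def generate_forms():
    """Enumerate (base, top) plan pairs; each pair identifies one
    architectural form to be built parametrically."""
    return [{"base": b, "top": t} for b, t in product(SIDES, SIDES)]

forms = generate_forms()
print(len(forms))  # 144 combinations
```

Each pair would then be extruded with the chosen vertical transformation (morph, twist, or curvilinear) to obtain the final massing.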

**Figure 1.** Sketch of 36 out of 144 generated architectural forms, with the diagrid structural model visible on the building skin.

### *2.2. Structural Results*

After having analyzed all the considered building forms, a spreadsheet summarizing the structural behavior of the models has been filled in. Parameters characterizing the structural response, such as drift, total weight, maximum normal forces, and maximum utilization, have been investigated to compare all the models. A graph showing the top and base plan of each form, with a color representing the range of the structural parameter of interest, proves insightful for comparing the outcomes at a glance. Figure 2 shows such a graph in relation to the drift, for all the generated models. In it, green qualitatively represents the tall buildings characterized by a lower drift, while red marks the tall buildings featuring a higher drift. It can be seen that, by increasing the side number of the plans, the structural efficiency is improved, and forms located along the diagonal blue lines in the figure mostly have similar structural behavior [18]. It is however difficult to foresee the behavior of a single (variant) form without retracing all the mentioned stages of the analysis. Hence, ML could help in recognizing the patterns in such a representation of the results.

### *2.3. Supervised Machine Learning—A Classification Approach*

While it is possible to explore the structural outcomes for all the models manually, it proves profitable to do it automatically by means of a supervised ML approach. In this work, a small dataset has been considered: out of the 144 architectural forms, 75% (108 forms) have been used for training, and the remaining 25% (36 forms) have been used for testing. First, a randomization algorithm has been applied to split the dataset into the training and testing sets without any bias. The next step has been to define a label for data classification: as already mentioned, in tall buildings an important factor is represented by the drift, i.e., the horizontal displacement of the top floor [19]; several standards define a limit for it, e.g., 1/500 of the building height. A qualitative label has been defined for the drift, exploiting its values ranging from a minimum of 34 cm to a maximum of 158 cm within the dataset. Tall buildings whose drift was near 34 cm have been considered "very good" in their structural behavior; an increase in drift is linked to diminished structural efficiency. Five classes have then been defined for data classification (0: very bad; 1: bad; 2: not bad, not good; 3: good; 4: very good). In Figure 3, all five classes are shown for the whole dataset, with the drift plotted against the total weight of the structure; similar representations can be obtained with all the other indices, though the trend might not show up so clearly in the graph. We anticipate that good classification results are obtained if this label is chosen. It can also be understood that, by increasing the total weight, the drift decreases, as a heavier frame is stiffer and accordingly undergoes a smaller displacement, or drift, under the selected excitation.
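The labeling and splitting steps above can be sketched in a few lines of Python. The equal-width binning of the drift range is an assumption for illustration only (the paper does not state the exact class thresholds), and the seed and function name are hypothetical:

```python
import random

def drift_class(drift_cm, lo=34.0, hi=158.0, n_classes=5):
    """Map a drift value (cm) to a qualitative class index,
    4: very good (near 34 cm) ... 0: very bad (near 158 cm).
    Equal-width bins are an assumption, not taken from the paper."""
    x = min(max(drift_cm, lo), hi)            # clamp into the observed range
    width = (hi - lo) / n_classes
    k = min(int((x - lo) / width), n_classes - 1)
    return (n_classes - 1) - k                # small drift -> high class index

# unbiased 75/25 split of the 144 forms into training and testing sets
random.seed(42)
indices = list(range(144))
random.shuffle(indices)
train_ids, test_ids = indices[:108], indices[108:]

print(drift_class(34), drift_class(158))  # 4 0
print(len(train_ids), len(test_ids))      # 108 36
```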

**Figure 2.** Colored diagram of the drift response computed for all the analyzed models.

**Figure 3.** Representation of the structural response in terms of drift against the total mass, and of the five classes defined on the basis of the drift as a label.

Another possible strategy would be to categorize the forms according to the base plan geometry. In this additional case, twelve labels have been defined by considering the number of sides, or vertices, of the polygons (i.e., 3- to 13- and 24-sided polygons). With this labeling, the classification algorithms have performed rather badly, with no remarkable results.

### *2.4. Classification Algorithms and Hyperparameter Optimization*

A 5-fold cross-validation has been applied to guard against overfitting, and eight predictors have been considered. After assigning the label, the following six classification algorithms have been adopted in the MATLAB Classification Learner toolbox: *k*-nearest neighbors; support vector machine; decision tree; ensemble method; discriminant analysis; and Naïve Bayes. Instead of tuning the parameters of each classification algorithm manually, it is better to set them within an optimization process. We have inspected three types of optimization [20]: grid search, random search, and Bayesian optimization.

Each of these optimization approaches has specific properties, see e.g., [20] for further details. The Bayesian optimization approach has been used because it can lead to better results in a shorter time and through fewer iterations; moreover, it is the only approach that efficiently exploits previous iteration results, according to the Bayes rule. In Figure 4, the Bayesian optimization is shown for the kNN algorithm over 50 iterations: at iteration 35 the optimal result had already been attained, with a minimum classification error of about 12.5%, i.e., an accuracy of 87.5% on the training dataset. The four tuned hyperparameters of kNN are also reported in the graph.
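The paper performs this tuning inside MATLAB's Classification Learner. As a language-agnostic illustration of the two ingredients involved, a kNN classifier and *k*-fold cross-validation, a minimal pure-Python sketch on toy data (function names and the dataset are illustrative, not from the paper) might look like:

```python
import random
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest (squared-Euclidean) neighbors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xt, x)), yt)
        for xt, yt in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k=3, folds=5, seed=0):
    """Plain k-fold cross-validation accuracy for the kNN sketch above."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    fold_size = len(X) // folds
    accs = []
    for f in range(folds):
        test_idx = set(idx[f * fold_size:(f + 1) * fold_size])
        Xtr = [X[i] for i in idx if i not in test_idx]
        ytr = [y[i] for i in idx if i not in test_idx]
        hits = sum(knn_predict(Xtr, ytr, X[i], k) == y[i] for i in test_idx)
        accs.append(hits / len(test_idx))
    return sum(accs) / folds

# two well-separated toy clusters, five points each
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_val_accuracy(X, y, k=3))  # 1.0 on this separable toy set
```

Hyperparameter optimization would then wrap `cross_val_accuracy` in a search loop over `k` and the distance metric.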

**Figure 4.** Example of Bayesian optimization of the ML hyperparameters, and relevant results.

### **3. Results of the ML Classification**

First, it has been tested whether supervised ML classification can be used in this case study. By means of a very simple implementation of the kNN algorithm, the training accuracy turned out to be 91.7%, while the testing accuracy was 83.3%. It has thus been proved that a classification algorithm can correctly predict the structural response of tall buildings, provided the label is appropriately chosen. According to the confusion matrices for training and testing depicted in Figure 5, it can be understood that the classes do not have the same number of observations (represented by the numbers in the matrices). With the kNN classifier, 4 observations have been misclassified in the training dataset, and 11 in the testing dataset. Another important note is that all observations related to class 1 are completely misclassified; this is due to the fact that, in the training dataset, there are no data associated with this class, so the ML model cannot be trained appropriately for it. These results occurred for this specific randomization, and they may vary from one randomization of the same set to another. This is a drawback of the procedure, mainly linked to the small dataset.

**Figure 5.** Confusion matrix relevant to the kNN classification algorithm: (**a**) training dataset, (**b**) testing dataset.

In what follows, a brief account of the results achieved with the six different classification algorithms is provided.

### *3.1. k-Nearest Neighbors*

kNN [21] results depend on (i) the number of neighbors allowed for in the state space, (ii) the metric used to measure the distance between neighbors, and (iii) a weight for the measured distances. In this research, an optimization method was adopted to reach the maximum accuracy by changing the hyperparameters, by enabling or disabling a principal component analysis (PCA) of the data [22], and by using random search and grid search instead of Bayesian optimization. A range of 1–54 was defined for *k*, and a variety of distance metrics have been adopted. The accuracy ranged from 80% to 91.7% for training and from 94.4% to 97.2% for testing; the computing time ranged from 16.3 s to 64.6 s.

### *3.2. Support Vector Machine*

In comparison to kNN, a support vector machine (SVM) consumes a considerable amount of computing time, as it originally works with binary classes; multiple classes are treated as several combinations of binary ones [23]. Four kernel functions have been adopted, namely the Gaussian, linear, quadratic, and cubic ones, which define the kind of support vector classifier. There is also a kernel scale feature, and the multi-class method can be one-vs-one or one-vs-all; the one-vs-one method has turned out to provide more accurate results, though it can be very time consuming. The accuracy finally ranged from 94.4% to 97.2% for training and from 94.4% to 97.2% for testing; the computing time varied from 134 s to 250 s.
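The cost difference between the two multi-class strategies follows directly from how many binary SVMs each one trains; a short sketch (illustrative function, not part of the paper's workflow) makes the counts explicit:

```python
def n_binary_classifiers(n_classes, strategy):
    """Number of binary SVMs trained for a multi-class problem."""
    if strategy == "one-vs-one":   # one SVM per unordered pair of classes
        return n_classes * (n_classes - 1) // 2
    if strategy == "one-vs-all":   # one SVM per class
        return n_classes
    raise ValueError(strategy)

# For the five drift classes used here:
print(n_binary_classifiers(5, "one-vs-one"))  # 10
print(n_binary_classifiers(5, "one-vs-all"))  # 5
```

One-vs-one trains more (but smaller) problems, which is part of why it can be both more accurate and more time consuming.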

### *3.3. Decision Tree*

A decision tree works with a number of splits and a criterion for them [24]. The number of splits has been varied from 1 to 107; the split criterion has been selected among Gini's diversity index, the Twoing rule, and maximum deviance reduction. The computing time varied from 17.4 s to 45 s, the accuracy from 86.1% to 93.5% for training, and from 77.8% to 100% for testing. The testing accuracy of four models out of the five considered attained the 100% mark. The decision tree has thus proved the best classification algorithm.
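Of the split criteria mentioned, Gini's diversity index is the simplest to state: for a node it is one minus the sum of squared class proportions, so a pure node scores zero. A minimal sketch (illustrative function name):

```python
from collections import Counter

def gini(labels):
    """Gini's diversity index: 1 - sum(p_i^2). 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini([0, 0, 0, 0]))  # 0.0 (pure node)
print(gini([0, 1, 0, 1]))  # 0.5 (maximally mixed, two classes)
```

The tree greedily chooses, at each node, the split that most reduces this impurity.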

### *3.4. Ensemble Classifier*

The ensemble classifier algorithm exploits several learning algorithms to reach a final prediction [25]. One of the most famous ensemble classifiers is the bootstrap aggregating (bagging) one. In this work, the ensemble method has been selected among Bag, AdaBoost, and RUSBoost, and the maximum number of splits has been varied from 1 to 107. The number of learners has been varied from 10 to 500, the learning rate from 0.001 to 1, and the number of predictor samples from 1 to 8. The computing time varied from 72 s to 129.4 s, with an accuracy from 85.2% to 98.1% for training, and from 94.4% to 100% for testing. After the decision tree, the ensemble turns out to be the best classification algorithm.

### *3.5. Naïve Bayes*

The Naïve Bayes classifier works within a stochastic frame [26], by applying the Bayes theorem. This algorithm features only two hyperparameters: the distribution type and the kernel type. Specifically, the kernel type can be one of the Gaussian, Box, Epanechnikov, and Triangular ones. The computing time varied from 13.7 s to 109.7 s, with an accuracy from 82.4% to 92.6% for training and from 83.3% to 91.7% for testing.

### *3.6. Discriminant Analysis*

A discriminant classifier assumes that different classes produce data according to different Gaussian distributions [27]. The only model hyperparameter to select is the discriminant type, which can be linear, quadratic, diagonal linear, or diagonal quadratic. In this case, the computing time varied from 45.3 s to 48.7 s, and the accuracy from 85.2% to 94.4%, and from 83.3% to 91.7% for training and testing, respectively.

For the sake of brevity, the results are not directly compared here. A detailed analysis in terms of accuracy and computational cost, going beyond the brief account provided above, will be given in the conference presentation. Readers are therefore directed to it for a thorough discussion of the efficiency of the adopted ML tools.

### **4. Conclusions**

In this work, the relation between architectural form and structural efficiency of tall buildings has been studied via a data-driven approach. Several architectural and structural model generation methods could be used to get insights into which architectural detail or modification may increase the structural efficiency, moving in the direction of morphing or smart structures. A novel view has been provided by adopting machine learning tools to learn the links between shape and structural response under seismic excitations, by also reducing the computing time: a sample dataset has been used to predict the performance of new architectural forms of tall buildings.

It has been proven that supervised machine learning can be successfully applied to this case study. Moreover, among the six investigated classification algorithms, even though each of them provides advantages and disadvantages, the ensemble and the decision tree classifier algorithms have attained the best results.

**Supplementary Materials:** The conference presentation files are available at https://www.mdpi.com/article/10.3390/IOCA2021-10891/s1.

**Author Contributions:** Conceptualization and methodology, all the authors; validation, P.K.; data curation, P.K.; writing—original draft preparation, P.K.; writing—review and editing, A.G. and S.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Proceeding Paper* **Deep Learning Methodologies for Diagnosis of Respiratory Disorders from Chest X-ray Images: A Comparative Study †**

**Akhil Appu Shetty <sup>1</sup>, Navya Thirumaleshwar Hegde <sup>2,\*</sup>, Aldrin Claytus Vaz <sup>1</sup> and Chrompet Ramesh Srinivasan <sup>1</sup>**


**Abstract:** Chest radiography is an important diagnostic imaging test in medical practice, and it demands timely diagnosis and reporting of potential findings in the images. A crucial step in the radiology workflow is the fast, automated, and reliable detection of diseases from chest radiographs. To this end, artificial intelligence-based algorithms such as deep learning (DL) are promising methods for automatic and fast diagnosis, owing to their excellent performance in the analysis of a wide range of medical images and visual information. This paper surveys DL methods for lung disease detection from chest X-ray images. The five attributes commonly surveyed in the articles are data augmentation, transfer learning, types of DL algorithms, features used for the detection of abnormalities, and types of lung diseases. The presented methods may prove useful for researchers ideating their own contributions in this area.

**Keywords:** chest X-ray; computer-aided diagnosis; deep learning methodologies; radiography; respiratory disorders

### **1. Introduction**

On planet Earth, respiratory diseases are a main health threat to humankind. Approximately 450 million people, i.e., almost 7% of the world's population, are affected by pneumonia alone, resulting in nearly 4 million deaths every year [1]. The diagnosis of these respiratory diseases is performed through the most common radiology methods, such as chest X-ray (CXR) and chest radiography, because they are easily accessible and low cost. Visual inspection of a large number of chest radiographs is carried out on a slice-by-slice basis globally. This method demands concentration, a high degree of precision, and skill; it is also expensive, prone to operator bias, and time-consuming, and it is difficult to extract the valuable information present in such large-scale data [2]. In many countries, the complexity of chest radiographs has resulted in a shortage of expert radiologists, because discriminating among respiratory diseases is a demanding task for them. Hence, there is a need to develop an automated method for the computer-aided diagnosis of respiratory diseases based on chest radiography.

Deep learning (DL) methods have achieved tremendous growth in the last decade, in the area of various computer vision applications such as the classification of medical and natural images [3]. This led to the development of deep convolutional neural networks (CNNs) for the diagnosis of respiratory diseases based on chest radiography.

**Citation:** Shetty, A.A.; Hegde, N.T.; Vaz, A.C.; Srinivasan, C.R. Deep Learning Methodologies for Diagnosis of Respiratory Disorders from Chest X-ray Images: A Comparative Study. *Comput. Sci. Math. Forum* **2022**, *2*, 20. https://doi.org/10.3390/IOCA2021-10900

Academic Editor: Frank Werner

Published: 26 September 2021

The computer-based diagnosis of respiratory diseases consists of the detection of pathological abnormalities, followed by their classification. Automated abnormality detection on chest radiographs is challenging due to the diversity and complexity of respiratory diseases and the limited quality of the images. On chest radiographs, manually marking abnormal regions requires even more labor and time than labeling them. Therefore, in many chest radiography datasets the abnormalities are masked [4], reducing computer-aided diagnosis to a weakly supervised problem in which only the names of the abnormalities present in each radiograph are given, without their locations.

To predict diagnostic information from X-ray images, machine-learning-based methods have been proposed by many researchers [5]. The use of computer science-based methods to handle huge volumes of medical records can decrease medical costs for health and medical science applications. In recent years, the use of DL algorithms on medical images for respiratory disease detection has grown to great heights. DL is derived from machine learning, in that its algorithms are inspired by the structure and function of the human brain. The identification, classification, and quantification of patterns in medical images [6] are supported by DL methods. DL is gaining much importance by improving performance in many medical applications. Figure 1 describes, in a generalized manner, how deep neural networks process data and classify images. In turn, these improvements support clinicians in the classification and detection of certain medical conditions in a more efficient way [7].

**Figure 1.** Deep CNN learning paradigm of a multilabel classification task using the whole image.

The objective of this paper is to provide a comparison of the state-of-the-art DL-based respiratory disease detection methods and also identify the issues in this direction of research. Section 2 describes the taxonomy of various methods and respiratory issues that are considered for this study. Section 3 focuses on the issues that the authors observed during their research. The conclusion of the article is presented in Section 4.

### **2. Taxonomy of the State-of-the-Art DL Techniques for Respiratory Disorder from X-ray Images**

This section discusses the state-of-the-art DL techniques used for the detection of lung diseases from CXR images. The main aim of this taxonomy is to provide a summarized and articulated view of the main focus points and major concepts related to the existing research carried out in this area. A total of five attributes that the authors found to be commonly present in the majority of the articles are identified and discussed in detail. These attributes are the types of DL algorithms, the features used for the detection of abnormalities, data augmentation, transfer learning, and the types of lung diseases.

### *2.1. Features Extracted from Images*

In the field of computer vision, a feature can be thought of as some form of numerical information that can be extracted from an image, which could prove beneficial in solving a certain set of problems [8]. Features could be certain structures in the image such as edges, shapes, and sizes of objects, or even specific points in the image.

The process of feature transformation relates to the generation of new features of an image based on the information extracted from existing ones. The newly generated features might represent the regions of importance in the image more powerfully when viewed from a different dimension or space, compared with the original. This improved representation of the image has proven extremely beneficial when subjected to various machine learning algorithms. Some of the most prevalent extracted features in images include Gabor, local binary patterns (LBPs), edge histogram descriptor (EHD), color-and-edge direction descriptor (CEDD), color layout descriptor (CLD) [9], autocorrelation, scale-invariant feature transform (SIFT), edge frequency, and speeded up robust features (SURFs). The concept of histograms has also been used to generate features in the form of histograms of oriented gradients (HOGs), pyramid HOGs, intensity histograms (IHs), gradient magnitude histograms (GMs), and fuzzy color and texture histograms (FCTHs). From the recent literature, it has been observed that CNNs have the capability to automatically extract the relevant features without the need for the explicit manual implementation of handpicked features from the images [10].
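Of the handcrafted features listed above, the local binary pattern is among the simplest to illustrate: each pixel receives an 8-bit code obtained by comparing it with its eight neighbors. A minimal sketch (illustrative function name; real implementations also handle borders and rotation invariance):

```python
def lbp_code(img, r, c):
    """8-bit local binary pattern of pixel (r, c): each neighbor
    contributes a 1 bit if it is >= the center value."""
    center = img[r][c]
    # neighbors clockwise starting from the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offs):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

img = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
print(lbp_code(img, 1, 1))  # 7: only the three top neighbors are >= 5
```

A histogram of such codes over the whole image then serves as the texture feature vector.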

### *2.2. Data Augmentation*

Having a large training dataset in DL helps improve the training accuracy. Compared with a strong algorithm on modest data, a weak algorithm on large-scale data can be more accurate. The presence of imbalanced classes is another obstacle that is encountered: the resulting model would be biased when, during binary classification training, the number of samples belonging to one class is larger than that of the other. For the optimal performance of DL algorithms, the number of samples should be equal or balanced in each class. Image augmentation is a technique to enlarge the training dataset by creating variations of the original images, without acquiring new images. The variations are achieved by various processing methods such as flips, rotations, zooms, added noise, and translations [11]. A few examples of augmented images are shown in Figure 2.

**Figure 2.** Examples of image augmentation: (**a**) original; (**b**) 45° rotation; (**c**) 90° rotation; (**d**) horizontal flip; (**e**) vertical flip; (**f**) positive x and y translation; (**g**) negative x and y translation; (**h**) salt-and-pepper noise; (**i**) speckle noise.

Data augmentation also helps prevent overfitting, in which the network tries to learn a function of very high variance. Augmentation addresses this by presenting the model with more diverse data, which decreases variance and improves the generalization of the model. Some of the disadvantages of data augmentation are its inability to overcome all biases in a small dataset [12], the computing cost of the transformations, additional training time, and memory costs.
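The geometric transformations in Figure 2 are simple array operations. A minimal pure-Python sketch on a nested-list "image" (function names are illustrative; real pipelines use library routines on tensors):

```python
def hflip(img):
    """Horizontal flip: mirror each row left-to-right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: mirror the rows top-to-bottom."""
    return img[::-1]

def rot90(img):
    """Rotate 90 degrees clockwise: reverse rows, then transpose."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))  # [[2, 1], [4, 3]]
print(vflip(img))  # [[3, 4], [1, 2]]
print(rot90(img))  # [[3, 1], [4, 2]]
```

Each transformed copy keeps the original label, multiplying the effective dataset size.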

### *2.3. Types of DL Algorithm*

CNNs are the most common DL algorithm used to find patterns in images. A CNN consists of neurons with trainable weights and biases, analogous to the neurons of the human brain. Each neuron receives several inputs and computes a weighted sum of them, which is then passed through an activation to produce an output. Unlike other neural networks, CNNs include convolution layers. Figure 3 shows a typical CNN architecture [13].

**Figure 3.** CNN structure.

The two general stages during the learning phase of a CNN are feature extraction and classification. In feature extraction, a kernel, or filter, is convolved with the input data, which generates a feature map. In the classification stage, the CNN computes the probability of the image belonging to a specific label/class. The main advantage of using a CNN is that it automatically learns features for image classification and recognition, without needing manual feature extraction. Transfer learning can be used to retrain a CNN for a different domain [14], which has been shown to produce better classification results.
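The feature-map computation above can be sketched directly. This minimal "valid" 2-D convolution (strictly a cross-correlation, as in most CNN libraries; the function name and toy data are illustrative) slides a kernel over the image and sums elementwise products:

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image,
    summing elementwise products to build the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw)
            )
            for j in range(ow)
        ]
        for i in range(oh)
    ]

# a vertical-edge-detecting kernel applied to a step image
image = [[0, 0, 1, 1]] * 3
kernel = [[-1, 1]] * 3  # responds where intensity jumps left-to-right

print(conv2d(image, kernel))  # [[0, 3, 0]]
```

In a CNN the kernel weights are learned, so the network discovers which such filters are useful for the task.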

Another DL algorithm, built as a stack of restricted Boltzmann machines (RBMs), is the deep belief network (DBN) [15]. Except for the first and final layers of a DBN, every layer has two functions: it serves as the input layer for the succeeding layer's nodes and as the hidden layer for the preceding layer's nodes. To train a DBN, the first RBM is made as accurate as possible. The output of the first RBM is then used to train the second RBM, by treating the first RBM's hidden layer as the input layer. This process is iterated until all the network layers are trained. The model created by this initial training of the DBN can detect data patterns. DBN models find applications in recognizing motion-capture data, objects in video sequences, and images [16].

### *2.4. Transfer Learning*

Transfer learning is an emerging and popular method in computer vision, as it allows the building of accurate models. A model trained in a particular domain can be retrained using transfer learning to be used on a different domain. Transfer learning could be performed with or without a pretrained model.

A model that has been developed to solve an analogous task is called a pretrained model, which can be used as a starting point to solve the current task. The weights and architecture obtained by pretrained models on large datasets can be applied to the current task. The main advantage of using such a pretrained model is a reduction in training costs for the new model [17], as it is sufficient to train and modify only the weights of the last few layers.

When the transfer learning approach is followed, two main criteria have to be considered. The first is the selection of a pretrained model, ensuring that the model has worked on a dataset similar to the one under consideration. The second is that the weights of the CNN have to be trained and fine-tuned with lower learning rates, so that they are not distorted and remain relatively good [18].
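The "freeze the base, train the head" idea can be illustrated without any DL framework. In this deliberately tiny sketch (all names, weights, and data are hypothetical), a fixed linear-plus-ReLU layer stands in for the frozen pretrained feature extractor, and only the final logistic layer is trained on the new task:

```python
import math

# A frozen "pretrained" feature extractor: fixed weights, never updated.
# (Stand-in for the convolutional base of a pretrained CNN.)
W_FROZEN = [[1.0, -1.0], [-1.0, 1.0]]

def features(x):
    """Frozen layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_FROZEN]

def predict(x, w, b):
    """Trainable logistic head on top of the frozen features."""
    z = sum(wi * fi for wi, fi in zip(w, features(x))) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, lr=0.5, epochs=200):
    """Fine-tune only the final layer; the frozen base is left untouched."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            g = predict(x, w, b) - y                 # gradient of the log-loss
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

data = [([2.0, 0.0], 1), ([0.0, 2.0], 0)]
w, b = train_head(data)
# after training, the head confidently separates the two toy classes
print(predict([2.0, 0.0], w, b) > 0.5, predict([0.0, 2.0], w, b) < 0.5)
```

Fine-tuning with a lower learning rate, as recommended above, corresponds to also updating `W_FROZEN`, but with a much smaller `lr` than the head.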

### *2.5. Type of Disease*

Applications of DL techniques for the detection of the most common causes of critical lung-related illness [19], such as pneumonia and tuberculosis, as well as COVID-19, an ongoing pandemic, are discussed in the next few sections.

### 2.5.1. Tuberculosis

Tuberculosis is caused by the bacterium *Mycobacterium tuberculosis*. A report by the WHO states that tuberculosis ranks among the ten most common causes of death in the world. In 2017, tuberculosis caused around 1.6 million deaths out of 10 million people infected across the world. Therefore, in order to increase the chances of recovery, early detection of tuberculosis is needed [20].

Computer-aided detection for tuberculosis (CAD4TB), a tool for tuberculosis detection jointly developed by Delft Imaging Systems, the University of Cape Town Lung Institute, and Radboud University, Nijmegen, has been used in two studies. The patient's CXR images are obtained and analyzed via the CAD4TB cloud server/computer, to display an abnormality score from 0 to 100 based on a heat map generated of the patient's lungs. Murphy et al. [21] verified that CAD4TB v6 performs accurately in comparison with data read by radiology experts. Melendez et al. [22] combined clinical information and the computer-aided detection X-ray score for automated tuberculosis screening, which improved accuracies and specificities compared with using either type of information alone.

Several studies in the literature use CNNs to classify tuberculosis. Heo et al. proposed a technique to improve the performance of CNNs by incorporating demographic information, namely gender, age, and weight. The results indicate that the proposed method has superior sensitivity and a higher area under the receiver-operating characteristic curve (AUC) score than the CNN based on CXR images only. Pasa et al. proposed a simple CNN for tuberculosis detection with reduced computational and memory requirements that, without losing classification performance, proved more efficient and accurate than previous models [23].

The use of transfer learning has also been researched by several authors. Hwang et al. claimed an accuracy level of more than 90%, along with an AUC value of 0.96, through transfer learning from ImageNet, after training their network on more than 10,000 chest X-rays. Lakhani and Sundaram [24] used pretrained GoogLeNet and AlexNet for the classification of pulmonary tuberculosis; their methodology yielded AUC values of 0.97 and 0.98, respectively. A combination of SVMs, for classification, and pretrained VGGNet, ResNet, and GoogLeNet, for feature extraction, was used by Lopes and Valiati for the detection of X-ray images with tuberculosis; they obtained AUCs in the vicinity of 0.9–0.912 [25].

Some authors have also used the NIH-14 dataset instead of ImageNet for pretrained models. This dataset contains a wide variety of diseases and falls under the same modality as that of the dataset considered for tuberculosis. Models pretrained on this dataset have been shown to learn better features for classifying tuberculosis.

A variety of other methodologies, such as k-nearest neighbors (kNN), sequential minimal optimization, and simple linear regression, have also been adopted for the classification of X-ray images related to tuberculosis [26]. Another technique that has been attempted along with the previously mentioned methodologies is the multiple-instance learning-based approach. This method presents the advantage of requiring less detailed labels during optimization. Additionally, previously optimized systems can easily be retrained due to the minimal supervision required by this method. Moreover, COGNEX developed an industry-level DL-based image analysis software known as ViDi, which also displayed comparatively accurate detection of tuberculosis in chest X-ray images [27].
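One way to see why multiple-instance learning needs only coarse labels is its pooling step: an image (the "bag") is scored by its most suspicious patch (instance), so only an image-level label is required for training. Max-pooling is one common choice, sketched here with arbitrary illustrative patch scores:

```python
import numpy as np

def bag_score(patch_scores: np.ndarray) -> float:
    """Multiple-instance pooling: the whole X-ray is as abnormal
    as its most abnormal patch (max-pooling over instances)."""
    return float(np.max(patch_scores))

# A scan with one highly suspicious patch is flagged even though
# most patches look normal; no patch-level labels are needed.
healthy = np.array([0.02, 0.05, 0.01, 0.03])
diseased = np.array([0.02, 0.91, 0.01, 0.03])
```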

### 2.5.2. Pneumonia

Pneumonia is a condition in which the alveoli of one or both lungs fill with pus or fluid, which leads to difficulty in breathing. Symptoms of this lung infection include chest pain, severe shortness of breath, fever or fatigue, and cough. It is still a recurrent cause of mortality and morbidity. A majority of the computer-aided techniques used for detecting pneumonia are aligned toward the use of data augmentation and DL. Tobias et al. [28] used a direct and straightforward CNN technique to detect this condition. Stephen et al. used various data augmentation techniques, such as shear, flip, rescaling, zooming, and rotation, to train their CNN from scratch [29].

Authors have also used pretrained CNN architectures trained on augmented data for the detection of pneumonia. Rajpurkar et al. [30] used randomized horizontal flipping as data augmentation, while Ayan and Ünver [31] used flipping, rotation, and zooming to augment their data. Chouhan et al. [32], on the other hand, also incorporated adding noise to the images, along with the other methods, to obtain an augmented version of the data on which to train their architecture.
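The augmentations named above (flipping, zooming, noise injection) can be sketched in a few lines of NumPy; a real pipeline would typically also rotate, rescale the crop back to the input size, and run inside the training data loader:

```python
import numpy as np

rng = np.random.default_rng(7)

def augment(img: np.ndarray) -> np.ndarray:
    """Randomly flip, zoom (by cropping borders), and add noise to
    one grayscale chest X-ray; a minimal illustrative sketch."""
    out = img
    if rng.random() < 0.5:               # horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:               # zoom in: crop a 10% border
        h, w = out.shape
        m = int(0.1 * min(h, w))
        out = out[m:h - m, m:w - m]
    return out + rng.normal(scale=0.01, size=out.shape)  # noise

xray = rng.random((224, 224))
augmented = augment(xray)
```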

A deep Siamese CNN architecture was also adopted for the classification of X-ray images with pneumonia. It utilizes a symmetric architecture that takes two inputs, which consist of the X-ray split into its left and right halves. The architecture compares the extent of infection that has spread across both regions of the image, which is claimed to make the classification system more robust, according to a study conducted in Ref. [30]. Ref. [33] proposed that CNNs present higher levels of accuracy than methods such as random forest, AdaBoost, decision tree, kNN, and XGBoost.
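The left/right Siamese idea can be sketched with a shared (weight-tied) embedding applied to both image halves; the one-layer "branch" below is a placeholder for the twin CNNs of the cited work:

```python
import numpy as np

def embed(half: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Shared branch: both halves pass through the SAME weights W
    (a one-layer stand-in for the twin CNN)."""
    return np.tanh(W @ half.mean(axis=1))

def siamese_distance(xray: np.ndarray, W: np.ndarray) -> float:
    """Embed the left half and the mirrored right half, then compare;
    a large distance suggests infection concentrated on one side."""
    h, w = xray.shape
    left, right = xray[:, : w // 2], xray[:, w // 2:]
    return float(np.linalg.norm(embed(left, W) - embed(right[:, ::-1], W)))

rng = np.random.default_rng(1)
W = rng.normal(size=(32, 224))
d = siamese_distance(rng.random((224, 224)), W)
```

By construction, a perfectly left-right symmetric image yields distance zero, so the distance isolates asymmetric abnormality.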

### 2.5.3. COVID-19

Coronavirus disease 2019 (COVID-19) is a highly infectious disease caused by the most recently discovered coronavirus [34]. The most susceptible people are senior citizens and those with a history of medical conditions such as chronic respiratory problems, cardiovascular disease, diabetes, and cancer.

Approaches to detect this disease through X-ray images have also been attempted with CNNs trained on augmented datasets. Ref. [35] used an InceptionV3 architecture for this purpose and employed it as a feature extractor. Ref. [36] followed a similar approach. Methodologies to classify X-ray images into normal, COVID-19, and pneumonia classes with the help of transfer learning have been investigated. Other authors have also used transfer learning with CNN architectures trained on datasets augmented with techniques such as rotation, scaling, flipping, translation, and shifting of image intensity for this three-class classification of X-ray images [37].
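After the pretrained backbone, the three-class setup (normal / COVID-19 / pneumonia) reduces to a linear layer plus a softmax; the weights below are random placeholders that only illustrate the head's shape:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())                # numerically stabilised
    return e / e.sum()

rng = np.random.default_rng(3)
embedding = rng.normal(size=256)           # frozen-backbone features
W, b = rng.normal(size=(3, 256)), np.zeros(3)
probs = softmax(W @ embedding + b)         # normal / COVID-19 / pneumonia
```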

Along with the classification of chest X-ray images for COVID-19, some authors have also modified CNN architectures to detect the disease. Sedik et al. [38] used an amalgam of CNN and LSTM, while Ahsan et al. [39] used an MLP–CNN model.

### **3. Issues Observed in the Present Area of Research**

The main issues observed during this study were (a) data imbalance, (b) the handling of large images, and (c) the limited availability of usable datasets.


### **4. Conclusions**

The presented study was an attempt to summarize and organize the key focus areas and concepts used for the detection of lung diseases through DL methodologies. A taxonomy of the state-of-the-art methodologies for this purpose was presented. Three main issues that might hinder progress in this area were also put forward, namely data imbalance, the handling of large images, and the limited availability of datasets. The authors believe that such an investigative study will help steer research on this topic in the right direction and will be useful for other researchers keen to contribute to this field.

**Author Contributions:** A.A.S.: Methodology, Formal Analysis; N.T.H.: Conceptualization, Investigation, Data Curation; A.C.V.: Resources, Validation; C.R.S.: Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


### *Abstract* **An Image-Based Algorithm for the Automatic Detection of Loosened Bolts †**

**Thanh-Canh Huynh 1,2,\*, Nhat-Duc Hoang 1,2, Duc-Duy Ho 3,4 and Xuan-Linh Tran 1,2**


**Abstract:** The bolted joint has been widely used to connect load-bearing elements in aerospace, civil, and mechanical engineering systems. During its service life, particularly under external dynamic loads, a bolted joint may undergo self-loosening. Bolt looseness causes a reduction in the load-bearing capacity and eventually leads to the failure of a bolted joint. This paper presents an automated image-based algorithm combining the Faster R-CNN model with image processing for the quick detection of loosened bolts in a structural connection. The algorithm is validated using a lab-scale bolted joint model for which various bolt-loosening events are simulated. The imagery data of the joint are captured and passed through the algorithm for bolt looseness detection. The obtained results show that the loosened bolts in the joint are well detected and their loosening degrees precisely quantified; the image-based algorithm is therefore promising for real-time structural health monitoring of realistic bolted joints.

**Keywords:** image-based algorithm; Faster R-CNN; image processing; structural health monitoring; bolted connection; bolt looseness detection

**Citation:** Huynh, T.-C.; Hoang, N.-D.; Ho, D.-D.; Tran, X.-L. An Image-Based Algorithm for the Automatic Detection of Loosened Bolts. *Comput. Sci. Math. Forum* **2022**, *2*, 1. https://doi.org/10.3390/ IOCA2021-10893

Academic Editor: Frank Werner

Published: 23 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Supplementary Materials:** The conference presentation file is available at https://sciforum.net/ manuscripts/10893/slides.pdf.

**Author Contributions:** Conceptualization, T.-C.H.; methodology, T.-C.H.; validation, T.-C.H.; writing—original draft preparation, T.-C.H.; writing—review and editing, T.-C.H., N.-D.H., D.-D.H. and X.-L.T.; supervision, T.-C.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 107.01-2019.332.

**Conflicts of Interest:** The authors declare no conflict of interest.

### *Abstract* **Simple Methods for Traveling Salesman Problems †**

**Nodari Vakhania**

Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, Cuernavaca 62209, Mexico; nodari@uaem.mx

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Keywords:** travelling salesman problem; algorithm; time complexity

Here we focus on an ongoing project on approximation algorithms for the Euclidean Traveling Salesman Problem (TSP). Given *n* points and the distances between them, the aim is to construct a tour that visits each point exactly once and has the minimum total cost/distance (the total cost of a tour is the sum of the distances defined by each pair of consecutive points in that tour). The Euclidean version of the TSP, in which the distances between the objects (points or cities) are distances in a two-dimensional Euclidean space, is strongly NP-hard. However, compared with the general setting, the Euclidean version is more tractable, since it naturally allows us to employ simple geometric tools in the solution process, so that a "blind" enumeration of all feasible tours can be replaced by a more rational and transparent selection of reasonable tours. Our method for solving the Euclidean TSP initially, at Phase 1, constructs a girding polygon, a boundary that includes all the given points. The convex boundary (hull) of the polygon, consisting of its edges, already defines a partial tour, which is optimal for the set of nodes that this boundary contains. The insertion heuristic of Phase 2 iteratively augments the partial tour of Phase 1 to a complete feasible tour using the cheapest insertion strategy: iteratively, the current partial tour is augmented with a new point that yields the minimal increase in the cost. The tour improvement heuristic of Phase 3 improves the tour of Phase 2 using some local optimality conditions. Thanks to the simple geometry in the decision-making process at Phases 2 and 3, our algorithm is extremely fast and requires little computer memory, whereas the quality of the delivered solutions is comparable with that of the state-of-the-art algorithms; see [1]. We are currently working on another approximation algorithm for the Euclidean TSP, which also starts with the girding polygon.
A special kind of inner and outer convex boundaries (convex hulls) that include subsets of points is iteratively constructed. Each convex hull defines an optimal tour for the points of that hull. Two successively generated convex hulls are unified into a partial feasible tour that covers the nodes of both boundaries. The algorithm halts when it constructs a complete feasible tour. In the Multiprocessor TSP, there is one distinguished point called the depot, and *k* salesmen. Each salesman has to build their own tour, which starts from the depot, ends at the depot, and visits each of one or more additional points exactly once. All points are to be visited by some salesman. The aim is to minimize the total cost of all *k* tours. In the bounded version of the Multiprocessor TSP, lower and upper bounds determine the minimum and maximum number of points in a tour, i.e., a feasible tour respects these restrictions. Our algorithm for the Bounded Multiprocessor TSP initially partitions the set of vertices into *k* disjoint subsets at Phase 1. Then, at Phase 2, it constructs the initial *k* tours using the abovementioned three-phase algorithm for the TSP. The feasible solution of Phase 2 is further improved at Phase 3: iteratively, a vertex from a tour is moved from its current position to another specially determined position within the same or another tour so that the resultant solution remains feasible. The vertex and its new position are selected so that the accomplished rearrangement provides the maximum decrease in the current cost. We obtained preliminary experimental results for the 22

**Citation:** Vakhania, N. Simple Methods for Traveling Salesman Problems. *Comput. Sci. Math. Forum* **2022**, *2*, 6. https://doi.org/10.3390/ IOCA2021-10914

Academic Editor: Frank Werner

Published: 13 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

known benchmark instances. The approximation rate provided by the proposed heuristic is comparable to the state-of-the-art results, but it requires considerably less memory and CPU time.
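Phases 1 and 2 of the method described above can be sketched as follows (convex hull as the initial partial tour, then cheapest insertion); the Phase 3 local-improvement step and the girding-polygon refinements of [1] are omitted:

```python
import math

def hull(points):
    """Andrew's monotone chain convex hull: the Phase 1 boundary,
    returned in counter-clockwise order."""
    pts = sorted(points)
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1])
                                   - (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    return half(pts) + half(pts[::-1])

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def cheapest_insertion(points):
    """Phase 2: start from the hull tour and repeatedly insert the
    remaining point whose best position increases the cost least."""
    tour = hull(points)
    rest = [p for p in points if p not in tour]
    while rest:
        best = None
        for p in rest:
            for i in range(len(tour)):
                a, b = tour[i], tour[(i + 1) % len(tour)]
                inc = dist(a, p) + dist(p, b) - dist(a, b)
                if best is None or inc < best[0]:
                    best = (inc, p, i + 1)
        _, p, i = best
        tour.insert(i, p)
        rest.remove(p)
    return tour

def tour_cost(tour):
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
```

For a unit test of the idea: four corners of a square plus its center give a hull tour of cost 16, and inserting the center raises the cost by only 4√2 − 4.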

**Supplementary Materials:** The conference presentation video is available at https://www.mdpi. com/article/10.3390/IOCA2021-10914/s1.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Ethical review and approval were waived for this study due to inapplicability.

**Informed Consent Statement:** Patient consent was waived due to inapplicability.

**Conflicts of Interest:** The author declares no conflict of interest.

### **Reference**

1. Pacheco-Valencia, V.; Hernández, J.A.; Sigarreta, J.M.; Vakhania, N. Simple Constructive, Insertion, and Improvement Heuristics Based on the Girding Polygon for the Euclidean Traveling Salesman Problem. *Algorithms* **2020**, *13*, 5. [CrossRef]

### *Abstract* **Vectorial Iterative Schemes with Memory for Solving Nonlinear Systems of Equations †**

**Ramandeep Behl 1, Alicia Cordero 2,\*, Juan R. Torregrosa 2 and Sonia Bhalla 3**


**Abstract:** There exist in the literature many iterative methods for solving nonlinear problems. Some of these methods can be transferred directly to the context of nonlinear systems, keeping the order of convergence, but others cannot be directly extended to the multidimensional case. Sometimes, the procedures are designed specifically for multidimensional problems by using different techniques, such as composition and reduction or weight-function procedures, among others. Our main aim is not only to design an iterative scheme for solving nonlinear systems but also to assure its high order of convergence by means of the introduction of matrix accelerating parameters. This is a challenging area of numerical analysis wherein there are still few procedures defined. Once the iterative method has been designed, it is necessary to carry out a dynamical study in order to assess the width of the basins of attraction of the roots and compare its stability with that of other known methods.
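For context, the baseline that such vectorial schemes accelerate is the classical Newton iteration for a system F(x) = 0; the memory terms and matrix accelerating parameters of the proposed methods are not shown in this sketch:

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, maxit=50):
    """Plain Newton iteration for F(x) = 0 on a system: solve
    J(x) dx = -F(x), update x, stop when the step is tiny."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        dx = np.linalg.solve(J(x), -F(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example system: the unit circle intersected with the line y = x.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2 * v[0], 2 * v[1]], [1.0, -1.0]])
root = newton_system(F, J, [1.0, 0.5])
```

From the start point (1, 0.5), the iteration converges to (1/√2, 1/√2); which basin of attraction a start point falls in is exactly what the dynamical study mentioned above examines.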

**Keywords:** nonlinear system of equations; iterative methods; stability; basin of attraction

**Citation:** Behl, R.; Cordero, A.; Torregrosa, J.R.; Bhalla, S. Vectorial Iterative Schemes with Memory for Solving Nonlinear Systems of Equations. *Comput. Sci. Math. Forum* **2021**, *2*, 17. https://doi.org/10.3390/ IOCA2021-10892

Academic Editor: Frank Werner

Published: 22 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Supplementary Materials:** The conference presentation video is available at https://www.mdpi. com/article/10.3390/IOCA2021-10892/s1.

**Author Contributions:** Conceptualization, R.B. and S.B.; methodology, R.B.; software, A.C.; validation, J.R.T. and A.C.; formal analysis, R.B.; investigation, S.B.; writing—original draft preparation, R.B. and S.B.; writing—review and editing, J.R.T. and A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### *Abstract* **Avoiding Temporal Confounding in Timeseries Forecasting Using Machine Learning †**

**Felix Wick <sup>1</sup> and Ulrich Kerzel 2,\***

**\*** Correspondence: ulrich.kerzel@iu.org

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Abstract:** Timeseries forecasting plays an important role in many applications where knowledge of the future behaviour of a given quantity of interest is required. Traditionally, this task is approached using methods such as exponential smoothing, ARIMA, and, more recently, recurrent neural networks such as LSTM architectures or transformers. These approaches intrinsically rely on the autocorrelation or partial autocorrelation between subsequent events to forecast future values. Essentially, the past values of the timeseries are used to model its future behaviour. Implicitly, this assumes that the autocorrelation and partial autocorrelation are genuine and not spurious. In the latter case, the methods exploit the (partial) autocorrelation in the prediction even though it is not grounded in the causal data generation process of the timeseries. This can happen if some external event or intervention affects the value of the timeseries at multiple times. In terms of causal analysis, this is equivalent to introducing a confounder into the timeseries, where the variable of interest at different times takes over the role of multiple variables in standard causal analysis. This effectively opens a backdoor path between different times that, in turn, leads to a spurious autocorrelation. If a forecasting model is built including such spurious correlations, the generalizability and forecasting power of the model are reduced, and future predictions may consequently be wrong. Using a supervised learning approach, we show how machine learning can be used to avoid temporal confounding in timeseries forecasting, thereby limiting or avoiding the influence of spurious autocorrelations or partial autocorrelations.
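The effect can be demonstrated numerically: a white-noise process has no genuine autocorrelation, but an external block-wise intervention (the confounder) induces a strong spurious one, which disappears once the confounder is conditioned on. The construction below is our own toy example, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
noise = rng.normal(size=n)          # the genuine process: white noise

# A confounder (e.g. a promotion or weather front) that switches on
# for whole blocks, lifting the series at many timestamps at once.
confounder = np.repeat(rng.normal(size=n // 50) * 3.0, 50)
observed = noise + confounder

def lag1_autocorr(x):
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

spurious = lag1_autocorr(observed)          # large, but not causal
genuine = lag1_autocorr(noise)              # near zero
# Conditioning on (here: subtracting) the confounder closes the
# backdoor path, and the spurious autocorrelation vanishes.
deconfounded = lag1_autocorr(observed - confounder)
```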

**Citation:** Wick, F.; Kerzel, U. Avoiding Temporal Confounding in Timeseries Forecasting Using Machine Learning. *Comput. Sci. Math. Forum* **2022**, *2*, 19. https://doi.org/ 10.3390/IOCA2021-10881

Academic Editor: Stefano Mariani

Published: 19 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/IOCA2021-10881/s1.

**Author Contributions:** Conceptualization: F.W. and U.K., methodology: F.W. and U.K., writing, review and editing: U.K., visualisation: F.W. and U.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Keywords:** timeseries; machine learning; causal analysis; confounder

### *Abstract* **Multi-Commodity Contraflow Problem on Lossy Network with Asymmetric Transit times †**

**Shiva Prakash Gupta \*, Urmila Pyakurel and Tanka Nath Dhamala**

Central Department of Mathematics, Tribhuvan University, Kirtipur 44618, Nepal; urmilapyakurel@gmail.com (U.P.); tanka.nath.dhamala@gmail.com (T.N.D.)

**\*** Correspondence: shivaprasadgupta99@gmail.com

† Presented at the 1st International Electronic Conference on Algorithms, 27 September–10 October 2021; Available online: https://ioca2021.sciforum.net/.

**Abstract:** During the transmission of commodities from one place to another, there may be losses due to death, leakage, damage, or evaporation. To address this problem, each arc of the network carries a gain factor. The network is a lossy network with a gain factor of at most one on each arc. The generalized multi-commodity flow problem deals with routing several distinct goods from specific supply points to the corresponding demand points on an underlying network with minimum loss. The sum of all commodities on each arc does not exceed its capacity. Motivated by the uneven road conditions of transportation network topology, we incorporate a contraflow approach with orientation-dependent transit times on arcs and introduce the generalized multi-commodity contraflow problem on a lossy network with orientation-dependent transit times. In general, the generalized dynamic multi-commodity contraflow problem is NP-hard. For a lossy network with symmetric transit times on anti-parallel arcs, the problem can be solved in pseudo-polynomial time. We extend the analytical solution from symmetric transit times on anti-parallel arcs to asymmetric transit times and present algorithms that solve the problem within the same time complexity.
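Two of the ingredients above can be made concrete in a few lines: the gain factor multiplies the flow on every arc it traverses, and contraflow reverses an anti-parallel arc so that its capacity also serves the chosen direction. The numbers are illustrative only:

```python
def delivered(flow_in, gains):
    """Flow arriving at the end of a path on a lossy network: each
    arc multiplies the flow by its gain factor (at most 1)."""
    out = flow_in
    for g in gains:
        out *= g
    return out

def contraflow_capacity(cap_forward, cap_backward):
    """Contraflow reverses the anti-parallel arc, so both arc
    capacities serve the evacuation direction."""
    return cap_forward + cap_backward
```

Sending 100 units over arcs with gains 0.9 and 0.95 delivers 85.5 units, while reversing a 20-unit backward arc next to a 30-unit forward arc yields a 50-unit forward capacity.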

**Keywords:** multi-commodity; contraflow; asymmetric transit times; lossy network

**Citation:** Gupta, S.P.; Pyakurel, U.; Dhamala, T.N. Multi-Commodity Contraflow Problem on Lossy Network with Asymmetric Transit times. *Comput. Sci. Math. Forum* **2021**, *2*, 21. https://doi.org/10.3390/ IOCA2021-10878

Academic Editor: Frank Werner

Published: 19 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Supplementary Materials:** The conference presentation file is available at https://www.mdpi.com/ article/10.3390/IOCA2021-10878/s1.

**Author Contributions:** S.P.G.: conceptualization, investigation and documentation; U.P. and T.N.D.: formal analysis, editing and supervision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no specific grant from any funding agency in the public, commercial, or non-profit sectors.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The authors have not used any additional data in this article.

**Conflicts of Interest:** The authors declare no conflict of interest regarding the publication of the paper.

### *Abstract* **New Explicit Asymmetric Hopscotch Methods for the Heat Conduction Equation †**

**Mahmoud Saleh \* and Endre Kovács**

Institute of Physics and Electrical Engineering, University of Miskolc, 3515 Miskolc, Hungary; kendre01@gmail.com


**Abstract:** This study aims at constructing new and effective fully explicit numerical schemes for solving the heat conduction equation. We use fractional time steps for the odd cells in the well-known odd–even hopscotch structure and fill it with several different formulas to obtain a large number of algorithm combinations. We generate random parameters in a highly inhomogeneous spatial distribution to set up discretized systems with various stiffness ratios, and systematically test these new methods by solving these systems. The best combinations are verified by comparing them to analytical solutions. We also show analytically that their rate of convergence is two and that they are unconditionally stable.
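For orientation, the classical odd-even hopscotch structure that the paper builds on can be sketched for the 1D heat equation: one parity of cells gets an explicit update, the other a pointwise-implicit one using the just-updated neighbours, with the roles swapped every step. This is the standard scheme, not the new asymmetric formulas of the study:

```python
import numpy as np

def hopscotch_step(u, r, odd_first=True):
    """One step of the classical odd-even hopscotch scheme for
    u_t = u_xx on a uniform grid with fixed (Dirichlet) ends;
    r = dt/dx^2. One parity gets the explicit update, the other
    the pointwise-implicit one using fresh neighbour values."""
    u = u.copy()
    idx = np.arange(1, len(u) - 1)
    first = idx[idx % 2 == (1 if odd_first else 0)]
    second = idx[idx % 2 == (0 if odd_first else 1)]
    u[first] = u[first] + r * (u[first - 1] - 2 * u[first] + u[first + 1])
    u[second] = (u[second] + r * (u[second - 1] + u[second + 1])) / (1 + 2 * r)
    return u

x = np.linspace(0.0, 1.0, 41)
u = np.sin(np.pi * x)            # decays like exp(-pi^2 t)
r = 2.0                          # beyond the explicit limit r <= 1/2
for k in range(100):
    u = hopscotch_step(u, r, odd_first=(k % 2 == 0))
```

Note that r = 2 exceeds the stability limit r ≤ 1/2 of the fully explicit scheme, yet the hopscotch iteration remains bounded, illustrating the unconditional stability discussed above.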

**Keywords:** odd–even hopscotch methods; diffusion equation; heat equation; parabolic PDEs; explicit time-integration; stiff equations; unconditional stability

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/IOCA2021-10902/s1.

**Funding:** The research was funded by the ÚNKP-21-3 New National Excellence Program of the Ministry for Innovation and Technology from the source of the National Research, Development and Innovation Fund.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** No data is available.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Citation:** Saleh, M.; Kovács, E. New Explicit Asymmetric Hopscotch Methods for the Heat Conduction Equation. *Comput. Sci. Math. Forum* **2022**, *2*, 22. https://doi.org/10.3390/ IOCA2021-10902

Academic Editor: Frank Werner

Published: 26 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

MDPI Books Editorial Office E-mail: books@mdpi.com www.mdpi.com/books
