**Mapping of Hydrothermal Alteration Zones in the Kelâat M'Gouna Region Using Airborne Gamma-Ray Spectrometry and Remote Sensing Data: Mining Implications (Eastern Anti-Atlas, Morocco)**

**Younes Mamouch 1,\*, Ahmed Attou 1, Abdelhalim Miftah 1, Mohammed Ouchchen 2, Bouchra Dadi 3,4, Lahsen Achkouch 1, Yassine Et-tayea 1, Abdelhamid Allaoui 5, Mustapha Boualoul 5, Giovanni Randazzo 6,\*, Stefania Lanza 7,\* and Anselme Muzirafuti 6,\***


**Abstract:** The mapping of hydrothermal alteration zones associated with mineralization is of paramount importance in the search for metal deposits. For this purpose, targeting alteration zones by analyzing airborne geophysical data and satellite imagery provides accurate and reliable results. In the Kelâat M'Gouna inlier, located in the Saghro Massif of the Moroccan Anti-Atlas, natural gamma-ray spectrometry and ASTER satellite data were used to map hydrothermal alteration zones. Natural gamma-ray spectrometry data were processed to produce maps of Potassium (K in %), Uranium (eU in ppm), Thorium (eTh in ppm) and the K/eTh and K/eU ratios. In addition, four band ratios were computed from the ASTER data to map the distribution of clay minerals, phyllic minerals, propylitic minerals, and iron oxides. The combined results obtained from the geophysical and satellite data were then integrated by fuzzy logic modelling in a Geographic Information System (GIS) to generate a mineral prospectivity map. Seven hydrothermal alteration zones likely to be favorable for mineralization were identified. They show a spatial correlation with (i) known surface prospects and mineral occurrences, (ii) the contact zone between the granites and their host rocks, and (iii) the fault zones (Sidi Flah and Tagmout faults). This research therefore provides important information for prospecting the mineral potential of the study area.

**Keywords:** mineral exploration; natural gamma-ray spectrometry; ASTER; fuzzy logic modelling; Kelâat M'Gouna inlier; Eastern Anti-Atlas; Morocco

#### **1. Introduction**

**Citation:** Mamouch, Y.; Attou, A.; Miftah, A.; Ouchchen, M.; Dadi, B.; Achkouch, L.; Et-tayea, Y.; Allaoui, A.; Boualoul, M.; Randazzo, G.; et al. Mapping of Hydrothermal Alteration Zones in the Kelâat M'Gouna Region Using Airborne Gamma-Ray Spectrometry and Remote Sensing Data: Mining Implications (Eastern Anti-Atlas, Morocco). *Appl. Sci.* **2022**, *12*, 957. https://doi.org/10.3390/app12030957

Academic Editor: Saro Lee

Received: 3 December 2021; Accepted: 13 January 2022; Published: 18 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The mapping of hydrothermal alteration zones associated with mineralized systems is of great importance in mineral exploration, especially in the early stages of metal deposit exploration [1–11]. However, the hydrothermal alteration minerals formed can vary significantly, depending on the chemical composition of the primary rock and hydrothermal fluids, the nature of the host, temperature and pressure conditions, and the tectonic setting [12].

Many methods have been proposed to map the spatial distribution of hydrothermal alteration zones. The processing of multispectral satellite images, especially those of the ASTER sensor, is among the most widely used approaches [4,6–8,10,11,13–18]. Apart from remote sensing data, the processing of geophysical data contributes to mineral exploration through its different methods (gravimetric, magnetometric, electrical, electromagnetic and natural gamma-ray spectrometric) [19–24]. The natural gamma-ray spectrometric method has been widely and successfully used in mineral exploration and in the identification of alteration zones [20,21,25–30]. However, few studies have tested the application of this method in conjunction with freely accessible satellite images.

In this regard, we used satellite data from the ASTER sensor, airborne natural gamma-ray spectrometry and surface geological data to identify potential mineral exploration sites in the Kelâat M'Gouna area (Morocco). This area is located in the Jbel Saghro massif, one of the most important massifs of the Moroccan Anti-Atlas belt, which is known for its mineral resources (Figure 1). The area has recently been the subject of mining exploration that led to the discovery of three gold deposits: Ismlal, Talat-n-Tabarought and Tawrirt-n-Çwalh. These deposits are controlled by complex hydrothermal processes such as silicification, chloritization, hematization and sericitization [31,32]. These processes occurred under specific geodynamic conditions that characterized the geological history of the Anti-Atlas.

The aim of this study is to map the mineral composition of the Kelâat M'Gouna area by combining geophysical and remote sensing data. The main objectives were (i) to accurately map hydrothermal alteration zones; (ii) to develop mineral prospectivity maps by combining thematic layers using fuzzy logic modelling; (iii) to relate mineral formation processes to hydrothermal events; and (iv) to clarify the characteristics of the mineral formations in the different tectonic units of the study area.

**Figure 1.** (**a**) Location of the Moroccan Anti-Atlas Range relative to the West African Craton [33]. (**b**) General geological map of the Anti-Atlas showing its main Precambrian inliers [34,35], modified. (**c**) Geological map of the Saghro Massif with its main metalliferous deposits (IMD: Imeter Deposit; TGHD: Taghassa Deposit; KMD: Kelâat M'Gouna Deposits; ISFD: Issarfane; BSD: Bouskour Deposit; AMD: Amzwaro Deposit; TMD: Tizi Moudou Deposit; AFD: Asfalou Deposit; TWD: Tiwit Deposit; TGMD: Tagmout Deposit; SFD: Sidi Flah Deposit), together with rose diagrams showing the trends of faults (left) and dykes (right). The study area is marked by the red polygon [36,37], modified.

#### **2. Materials and Methods**

#### *2.1. Geological Context of the Study Area*

The Saghro massif is located NE of the Anti-Atlas Major Fault (Figure 1b), which is interpreted as a Pan-African suture following the identification of an ophiolite complex along it [38]. The massif is subdivided into three domains: (i) the Western Saghro, corresponding to the Sidi Flah and Bouskour inliers; (ii) the Central Saghro, corresponding to the Kelâat M'Gouna inlier; and (iii) the Eastern Saghro, formed by the Boumalne and Imiter inliers (Figure 1).

The study area corresponds to the geological sheet of Kelâat M'Gouna, at a scale of 1:50,000. It is geographically attached to the Kelâat M'Gouna inlier, between meridians 6°00′ W and 6°15′ W and parallels 31°00′ N and 31°15′ N. This area contains a wide range of geological formations ranging in age from Cryogenian to the present day (Figure 2). The Lower Cryogenian formations, which are volcano-sedimentary and metamorphic, are the oldest in the Jbel Saghro massif and correspond to turbidites interbedded with basic volcanic flows and intruded by gabbros, diorites and granites [32,39–42]. They are overlain, in anomalous contact, by Upper Cryogenian conglomerates, sandstones, limestones, cinerites, rhyolites and andesites [36,43], which are in turn overlain by the formations of the Ouarzazate Group. This group comprises two unconformable sets, one on top of the other [31]. The first set corresponds to the Lower Ediacaran or Lower Ouarzazate Group, composed of potassium-rich volcanic and granitic formations that intrude all the previous geological formations. The second set corresponds to the Upper Ediacaran or Upper Ouarzazate Group, consisting of detrital and volcanic formations intruded by rhyolite and microgranite dikes. The formations of the Upper Ouarzazate Group are tectonically continuous with the Adoudounian sedimentary formations. The Paleozoic formations are poorly represented in the study area. They are limited to the Tagmout graben (Figure 2), where they are exposed as conglomerates, pink sandstones with basalt intercalations, Paradoxides shales and Tabanites sandstones [31]. The Cenozoic-Quaternary sedimentary deposits correspond to the filling of the Ouarzazate basin.

#### *2.2. Structural and Tectonic Context of the Study Area*

The Saghro Massif is affected by the major phase of the Pan-African Orogeny (B1), dated at 685 ± 15 Ma [44]. This phase is responsible for the emplacement of diorite, quartz diorite and granodiorite massifs along N130° troughs at Bouskour [45,46] and at Boumalne-Dadès [47]. It is followed by the late phase (B2), which is of weak intensity and without metamorphic transformation. This late phase is responsible for the development of granitic massifs in the Bouskour and Ougnat inliers.

The formations that crop out on the Kelâat M'Gouna sheet have undergone the various tectonic events that affected the Anti-Atlas. These events are reflected in the dominance of brittle structures, which are generally grouped by orientation into three families:


**Figure 2.** Geological map of the study area extracted from the geological map 1:50,000 of Kelâat M'Gouna, showing its main ore deposits (ISD: Ismlal gold Deposit; TNÇD: Taourirt-n-Çwalh gold Deposit; TNTD: Talat-n-Tbarought gold Deposit; TGD: Tagmout copper Deposit).

#### *2.3. Ore Deposits of the Study Area*

Many distinct categories of mineralization have been identified on the Kelâat M'Gouna sheet, comprising about 50 showings and deposits of gold, copper, iron, lead, fluorine, manganese, cobalt and silver (Figure 2). The most important mineralization consists of three gold occurrences that are so far in the development phase: Ismlal, Talat-n-Tabarought and Tawrirt-n-Çwalh [31,32,41]. Based on the mineralogical, textural, structural and chemical characteristics of these occurrences, two main types of mineralization have been distinguished: an older porphyry-type system, followed by a younger epithermal-type system [41,51]. At Ismlal, gold mineralization is hosted in Lower Cryogenian volcano-sedimentary turbidites (NP2i), intruded by granodiorites of Lower Ediacaran age (NP3i). It occurs as quartz veins oriented from N0° to N120°, in breccias with a general N120° orientation, and as dissemination in the NP2i volcano-sedimentary turbidites. Gold grades are estimated at 0.5 g/t. The mineralized zone is about 800 m long and about 200 m wide [31,32]. At Talat-n-Tabarought, the gold-bearing structure takes the form of a T with NW-SE and NE-SW trending branches. Gold grades (between 0.1 and 0.3 g/t) are low compared to those at Ismlal [31]. The mineralization is hosted in NP2i sandstone and occurs as quartz microveins at the edge of pyrite granodiorite and tourmaline granodiorite intrusions. At Tawrirt-n-Çwalh, the gold mineralization is discontinuous in an NNE-SW direction and extends over 800 m. It is hosted in a metamorphosed alternation of NP2i sandstones and pelites injected by veins of quartz and potassic feldspar, with gold visible to the naked eye [31,41]. Gold grades are significant, ranging from 1 to 9 g/t [31,32,41].

In addition, copper mineralization is well recognized in the region; the most important occurrence is Tagmout, located in the south of the study area (Figure 2). It is a hydrothermal vein deposit hosted by gabbros, monzogabbros and granites of the Upper Neoproterozoic. The metallogenic study of the deposit identified a copper paragenesis dominated by chalcopyrite, chalcocite, bornite, covellite, cuprite, grey copper and malachite [31].

#### *2.4. Geophysical and Remote Sensing Data*

#### 2.4.1. Radiometric Data

Airborne gamma-ray spectrometry estimates the concentrations of Potassium (K), Uranium (eU) and Thorium (eTh) in the uppermost 30–45 cm of the earth's surface [52–55]. These radioelements occur in widely varying concentrations in the rocks that form the earth's crust. Table 1 shows the concentrations of the three radioelements in the main rock categories.

The airborne gamma-ray spectrometry data used in this study were acquired in the Moroccan Anti-Atlas in 1998 by the company Géoterrx-Dighem for the Ministry of Energy and Mines. The measurements were made at an average ground clearance of 60 m with an Exploranium GR-830/3 spectrometer. The flight lines are oriented N315° and spaced 500 m apart; the crossing lines are oriented N45° and spaced 4000 m apart. Gamma-ray emissions from the ground and from the air were recorded over an energy range of 0 to 3 MeV with 8 downward-looking and 2 upward-looking crystals, respectively. Count rates were determined within three windows corresponding to the natural radioelements Potassium (K, 1.46 MeV), Uranium (U, 1.76 MeV) and Thorium (Th, 2.62 MeV). The radiometric data were recorded at a frequency of 1 Hz, with an average spacing of 63 m. The recorded data then underwent a series of corrections: (i) activity time correction; (ii) calculation of effective ground clearance at standard temperature and pressure conditions; (iii) subtraction of cosmic and helicopter noise; (iv) subtraction of radon background (assessed from the upward-facing detector measurements); (v) Compton effect correction; and (vi) attenuation corrections.

These radiometric data were provided to us in the form of digital maps. Using ArcGIS 10.3 software, these maps were first georeferenced and digitized to create a digital database (by digitizing the intersection of the iso-value curves and the flight lines). They were then interpolated using the inverse distance weighting (IDW) interpolation method to obtain maps representing the horizontal variation of radioactive concentrations of three elements: (i) Potassium (K in %), (ii) equivalent Thorium (eTh in ppm), and (iii) equivalent Uranium (eU in ppm). Furthermore, ratios of K/eTh and K/eU were calculated to delineate Potassium enrichment zones as indicators of potential mineral resource-related alteration zones [25,28,56–58] (Figure 3).
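The interpolation step described above can be sketched in a few lines. This is a minimal illustration of inverse distance weighting, not the ArcGIS implementation used in the study: the function name `idw`, the exponent `power=2`, and the sample coordinates and values are all assumptions made for the example.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighting estimate at the point (x, y).

    samples: list of (xs, ys, value) tuples, e.g. digitized intersections of
    iso-value curves and flight lines. power=2 is an assumed exponent.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xs, ys, value in samples:
        distance = math.hypot(x - xs, y - ys)
        if distance == 0.0:
            return value  # the point coincides with a sample
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical K (%) samples: (easting in m, northing in m, value)
k_samples = [(0.0, 0.0, 2.0), (500.0, 0.0, 4.0), (0.0, 500.0, 3.0)]
k_mid = idw(k_samples, 250.0, 0.0)  # estimate midway between the first two samples
```

Evaluating `idw` on every node of a regular grid would give a raster comparable in kind to the K, eTh and eU maps; the grid spacing would be a further assumption.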


**Table 1.** Radioelement concentrations in main categories of rocks [53,56].

**Figure 3.** The methodological flowchart used in this study.

#### 2.4.2. ASTER Satellite Imagery

The spaceborne remote sensing data used in this study correspond to an ASTER image acquired on 9 September 2003, with a cloud cover of about 0%. Its spectral resolution makes it possible to map alteration minerals related to mineralization processes, since it contains six spectral bands in the shortwave infrared region ranging from 1.600 to 2.430 μm [13]. Other characteristics of the ASTER sensor are presented in Table 2.

The image was pre-processed using the Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) method [59–61]. This correction takes the radiance image as input and generates an atmospherically corrected reflectance image. Subsequently, we calculated band ratios based on the relative absorption band depth (RBD), a technique that has been used for many years in remote sensing for mapping hydrothermal alteration minerals [7,15,62–66]. In the present study, four alteration mineral assemblages were mapped: (i) clay alteration (kaolinite and montmorillonite), (ii) phyllic alteration (sericite and illite), (iii) propylitic alteration (epidote, chlorite and carbonates), and (iv) ferric iron oxide alteration (hematite, goethite and jarosite) (Figure 3).


**Table 2.** Characteristics and performance of the ASTER sensor [67].

#### 2.4.3. Fuzzy Logic Modelling of Radiometric and ASTER Satellite Imagery

Fuzzy logic modelling is a widely and successfully used technique in mineral mapping, mainly for the development of mineral prospectivity maps [8,66,68–70]. Mathematically, it is a form of multi-valued logic based on fuzzy set theory, in which the values of the variables lie in the interval [0, 1]; 0 corresponds to non-membership, and 1 corresponds to full membership [71]. This method was first proposed in 1965 by Zadeh and is defined as follows:

$$A_{ij} = \left\{ \left( x_{ij},\ \mu_{A} \right) \mid x_{ij} \in X_{i} \right\}, \qquad 0 \leq \mu_{A} \leq 1$$

where μ<sub>A</sub> is the membership function giving the degree of membership of x in A, X is a set of layers X<sub>*i*</sub> (*i* = 1, 2, 3, . . . , n), and each layer has r classes x<sub>*ij*</sub> (*j* = 1, 2, 3, . . . , r).

The degree of membership μ<sub>A</sub> lies in the interval [0, 1]: if 0 ≤ μ<sub>A</sub> < 0.5, x<sub>*ij*</sub> is not conducive to mineralization; if μ<sub>A</sub> = 0.5, we cannot determine whether x<sub>*ij*</sub> is conducive to mineralization or not; if 0.5 < μ<sub>A</sub> ≤ 1, x<sub>*ij*</sub> is conducive to mineralization.
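The interpretation rule above maps directly onto a three-way classification. The following sketch only illustrates that rule; the function name `classify_membership`, the class labels, and the small grid of membership values are hypothetical.

```python
def classify_membership(mu):
    """Interpret a fuzzy membership value using the 0.5 threshold."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership must lie in [0, 1]")
    if mu < 0.5:
        return "not favorable"
    if mu == 0.5:
        return "undetermined"
    return "favorable"


# Hypothetical 2 x 3 grid of membership values
grid = [[0.10, 0.50, 0.72],
        [0.95, 0.30, 0.61]]
labels = [[classify_membership(mu) for mu in row] for row in grid]
n_favorable = sum(label == "favorable" for row in labels for label in row)
```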

In this study, we applied this analysis to produce mineral prospectivity maps following the methodological approach presented in Figure 3. One map combines the four layers corresponding to the band ratios extracted from the ASTER image (Table 3), and another combines the geophysical layers corresponding to K and the K/eTh and K/eU ratios. These layers were fuzzified individually using the linear membership function. The fuzzy gamma operator was then applied to combine the thematic layers. This operator was chosen because it is a compromise between the fuzzy algebraic sum and the fuzzy algebraic product [72]. In addition, its output can be adjusted through the parameter γ to improve the resulting maps [73]. After several trials, we took 0.72 as the value of the parameter γ. The fuzzification parameters used for the input data are given in Table 4.
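As an illustration of the two steps just described, the sketch below fuzzifies three layer values with a linear membership function and combines them with the fuzzy gamma operator in its commonly used form, μ = (1 − ∏(1 − μ<sub>i</sub>))<sup>γ</sup> · (∏ μ<sub>i</sub>)<sup>1−γ</sup>. Only γ = 0.72 comes from the text; the function names, the lower and upper bounds, and the cell values are hypothetical stand-ins for the parameters of Table 4, which are not reproduced here.

```python
def fuzzy_linear(value, low, high):
    """Linear membership: 0 at or below `low`, 1 at or above `high`."""
    if high <= low:
        raise ValueError("high must be greater than low")
    membership = (value - low) / (high - low)
    return min(1.0, max(0.0, membership))


def fuzzy_gamma(memberships, gamma=0.72):
    """Fuzzy gamma operator: (algebraic sum)^gamma * (algebraic product)^(1 - gamma)."""
    product = 1.0
    complement = 1.0
    for m in memberships:
        product *= m
        complement *= 1.0 - m
    algebraic_sum = 1.0 - complement
    return (algebraic_sum ** gamma) * (product ** (1.0 - gamma))


# Hypothetical values of K (%), K/eTh and K/eU in one grid cell,
# each with assumed (low, high) fuzzification bounds
layers = [
    fuzzy_linear(4.0, 0.2, 6.0),
    fuzzy_linear(0.35, 0.1, 0.6),
    fuzzy_linear(1.2, 0.2, 2.0),
]
favorability = fuzzy_gamma(layers)
```

With γ = 1 the operator reduces to the fuzzy algebraic sum and with γ = 0 to the fuzzy algebraic product, which is why an intermediate value acts as a compromise between the two.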



**Table 4.** Fuzzy membership parameters used for input layers.


#### **3. Results**

#### *3.1. Mapping of Hydrothermal Alteration Zones*

#### 3.1.1. Contribution of the Radiometry

The resulting radiometric maps provide a synthetic view of the heterogeneity of the geological formations, based on the radiometric signature of the different rock units and the structural trends that affect them. The highest concentrations of radioactive elements (more than 5.70% K, 15.80 ppm eTh and 8.30 ppm eU) occur in the SSE part of the map (Figure 4), which could be related to the presence of a large granitic massif called "pink granites of Isk n'Alla" (Figure 2). In contrast, low to moderate concentrations are observed, especially towards the north of the study area (0.20 to 0.45% K, 5.25 to 7.24 ppm eTh and 0.91 to 1.29 ppm eU) (Figure 4). The geological map in Figure 2 and the radioelement contents of the main rock categories (Table 1) were used to interpret the spatial distributions of Potassium, Thorium and Uranium geologically.

**Figure 4.** Maps of natural gamma-ray spectrometry showing the distribution of Potassium (**a**), Thorium (**b**), Uranium (**c**), and the ratio K/eTh (**d**).

At a more detailed scale, the spatial distribution of Potassium concentrations shows that the Lower Cryogenian turbiditic metavolcanics and the Quaternary and Miocene formations have low levels (0.70–2.20%), while intermediate values (2.70–3.70%) are associated with the oldest granites (Wawitcht Granite). High levels (up to 6%) were obtained in NE-trending fault and fracture zones and are related to granites (Figure 4a). The Thorium map reveals that the youngest granites and the Upper Ediacaran rhyolites, whose boundaries are clearly visible, have the highest values (16 and 13 ppm, respectively) (Figure 4b). Lower levels are observed in wadi sediments, metasediments and Cryogenian metavolcanics, where concentrations are below 6.5 ppm (Figure 4b). The Uranium map can also be subdivided into three levels. The highest level reaches 8 ppm and is associated with the youngest granites, the volcano-sedimentary complex and the Upper Ediacaran rhyolites. The intermediate level ranges from 4 to 6 ppm and is associated mainly with the Ediacaran rhyodacites and rhyolites. The lowest level decreases to a minimum of 3 ppm over the Miocene sedimentary formations and the wadi sediments (Figure 4c).

#### 3.1.2. Contribution of the K/eTh Ratio

In addition to the three radiometric maps, the K/eTh ratio was calculated to highlight locations with higher Potassium content (Figure 4d). High concentrations of this element may reveal possible areas of hydrothermal alteration and mineralization [25,29,56,74]. Because Potassium is more mobile than Thorium during chemical alteration processes [75,76], the K/eTh ratio is often considered the best indicator of Potassium enrichment zones related to hydrothermal alteration. The authors of [77] show that the K/eTh ratio is nearly constant in most rocks and generally ranges from 0.17 to 0.2. Values of the K/eTh ratio that exceed this range could be due to hydrothermal alteration processes associated with the emplacement of magmatic-hydrothermal mineralization. In the study area, the resulting K/eTh values allow us to distinguish seven anomalous Potassium domains (Figure 4d). Domain 1 shows extremely high K/eTh values, reaching 0.65, associated with alteration zones probably linked to mineralized zones. Indeed, this domain is characterized by several outcropping showings, including copper, manganese and silver (Figure 4d). Domains 2, 3 and 5, whose ratio values exceed 0.4, are respectively associated with the young granites ("pink granite of Isk n'Alla"), the Lower Ediacaran rhyolites and rhyodacites, and the Wawitcht granites (Figure 4d). Furthermore, domain 4 is related to the Sidi Flah fault zone, which hosts several mineralized showings such as gold, copper, lead and manganese (Figure 4d). Domain 6 is related to gold occurrences closely associated with the Wawitcht granite and is characterized by high K/eTh values (>0.35). To the north of the study area is domain 7, where K/eTh values vary considerably and sometimes reach 3.35. This area is associated with the Azlag granite, which hosts a copper occurrence (Figures 2 and 4d).
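The screening logic behind the K/eTh map can be illustrated as follows. This is a simplified sketch, assuming that any cell whose ratio exceeds the upper end of the background range quoted from [77] (0.2) is flagged as Potassium-enriched; the function names are hypothetical, and pairing the K and eTh values below within a single cell is an assumption made only for the example.

```python
BACKGROUND_MAX = 0.2  # upper end of the K/eTh background range quoted from [77]


def k_eth_ratio(k_percent, eth_ppm):
    """Return K (%) divided by eTh (ppm), or None where eTh is not positive."""
    if eth_ppm <= 0.0:
        return None
    return k_percent / eth_ppm


def is_k_enriched(k_percent, eth_ppm, threshold=BACKGROUND_MAX):
    """Flag a cell whose K/eTh ratio exceeds the assumed background threshold."""
    ratio = k_eth_ratio(k_percent, eth_ppm)
    return ratio is not None and ratio > threshold


# Illustrative cells: high K with high eTh, and low K with moderate eTh
granite_cell = is_k_enriched(5.70, 15.80)    # ratio of about 0.36
northern_cell = is_k_enriched(0.45, 7.24)    # ratio of about 0.06
```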

#### 3.1.3. Contribution of ASTER Data

The band ratios extracted for mapping clay alteration (kaolinite, alunite and montmorillonite), phyllic alteration (sericite, muscovite and illite), propylitic alteration (epidote, chlorite and carbonates) and iron oxide alteration (hematite, goethite and jarosite) are shown in Figure 5.

The band ratio (b4 + b6)/b5 shows the spatial distribution of clay alteration minerals (blue pixels), which are mostly mapped in the Lower Cryogenian turbidites, in the Upper Cryogenian units, in the Lower Ouarzazate Group units and very locally in the Quaternary unit along the wadis (Figure 5a).

The band ratio (b5 + b7)/b6 illustrates the surface distribution of phyllic alteration minerals in yellow pixels (Figure 5b), which are mapped in the Upper Cryogenian conglomerate and pelite units with intercalations of cinerites, rhyolites and andesites and in the Lower Ouarzazate Group granite unit. These units are highlighted because of their high content of sericite, muscovite and illite. The band ratio (b7 + b9)/b8 corresponds to the propylitic alteration mineral index, shown in green pixels; its distribution is broadly similar to that of the phyllic alteration mineral index and is generally related to the Upper Cryogenian volcano-sedimentary unit and the Lower Ouarzazate Group granite unit (Figure 5c).

The band ratio (b5/b3) + (b1/b2) shows the surface distribution of iron oxides in red pixels (Figure 5d). Compared with the geological map, the high abundance of all these mapped minerals is typically associated with the Cryogenian and Lower Ouarzazate Group units. It shows a strong correlation with the mineral occurrences and prospects in the study area.
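The four indices above are simple per-pixel arithmetic on ASTER band reflectances. The sketch below applies them to a single pixel; the function names and the reflectance values are hypothetical, and a real workflow would apply the same expressions to whole raster bands after atmospheric correction.

```python
def clay_index(b4, b5, b6):
    """(b4 + b6) / b5: rises when band 5 is absorbed relative to its neighbors."""
    return (b4 + b6) / b5


def phyllic_index(b5, b6, b7):
    """(b5 + b7) / b6: rises with absorption in band 6."""
    return (b5 + b7) / b6


def propylitic_index(b7, b8, b9):
    """(b7 + b9) / b8: rises with absorption in band 8."""
    return (b7 + b9) / b8


def iron_oxide_index(b1, b2, b3, b5):
    """(b5 / b3) + (b1 / b2), the iron oxide ratio used in the text."""
    return b5 / b3 + b1 / b2


# Hypothetical surface reflectances for one pixel, keyed by ASTER band number
pixel = {1: 0.12, 2: 0.18, 3: 0.22, 4: 0.34, 5: 0.30,
         6: 0.20, 7: 0.30, 8: 0.28, 9: 0.27}

indices = {
    "clay": clay_index(pixel[4], pixel[5], pixel[6]),
    "phyllic": phyllic_index(pixel[5], pixel[6], pixel[7]),
    "propylitic": propylitic_index(pixel[7], pixel[8], pixel[9]),
    "iron_oxide": iron_oxide_index(pixel[1], pixel[2], pixel[3], pixel[5]),
}
```

For the three absorption-depth ratios, a spectrally flat pixel gives exactly 2, so values well above 2 indicate an absorption feature in the denominator band. In this hypothetical pixel the phyllic index is close to 3, consistent with a pronounced band 6 absorption.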

**Figure 5.** Band ratio maps derived from the ASTER image showing the distribution of anomalous clay minerals (**a**), phyllic minerals (**b**), propylitic minerals (**c**), and iron oxides (**d**).

#### 3.1.4. Generating Mineral Prospectivity Maps

The application of fuzzy logic modelling allowed us to map the spatial distribution of true hydrothermal alteration zones (i.e., zones induced during the emplacement of metal deposits). In this regard, three mineral prospectivity maps were created. The first map combines the natural gamma-ray spectrometry data, namely the thematic layers of K, the K/eTh ratio and the K/eU ratio. The second map uses the thematic layers of band ratios derived from the ASTER image. In each case, the thematic layers were combined using the fuzzy gamma operator (γ = 0.72). The third map is the result of combining the first two mineral prospectivity maps.

Figure 6a shows the mineral prospectivity map derived from the combination of natural gamma-ray spectrometry data. The fuzzy gamma operator (γ = 0.72) merged the K, K/eTh and K/eU thematic layers. Evaluation of the fuzzy membership results shows that the favorability index varies spatially and that high values are associated with certain rocks. High favorability index values are mapped in granites and volcanic rocks, including miarolitic rhyolites, rhyodacites and Lower Ediacaran rhyolites. Most of the prospects and showings already mapped are located in the high-value areas of this map. The map shows that the Tagmout graben fault zone has the highest favorability index values (0.8 to 1.0). The mineral prospectivity map derived from the geophysical data thus reveals several anomalous zones that could be targeted by future mineral prospecting.

**Figure 6.** Mineral prospectivity maps: (**a**) mineral prospectivity map derived from natural gamma spectrometry data. (**b**) Mineral prospectivity map derived from ASTER image.

Figure 6b shows the mineral prospectivity map of the study area derived from the combined ASTER data. Examination of this map shows that high favorability index values are associated with certain lithological units. Lower Cryogenian volcano-sedimentary turbidites, Lower Ediacaran intrusive rocks and Cambrian units show high favorability index values (0.6 to 1). The altered zones associated with these rocks are the most favorable locations for mineralization. The resulting map shows that the Wawitcht and Isk n'Alla granites and the faulted zones have the highest favorability index values (0.9 to 1.0).

#### **4. Discussion**

The processing of the geophysical and ASTER data used in the present work is a preliminary step of great importance for recognizing new potential targets for mineral exploration at the scale of the Kelâat M'Gouna inlier. The combination of two different data types allowed us to map in detail the hydrothermal alteration zones associated with the metal deposits, especially the gold deposits. Benziane et al. (2008) [31] showed that the three gold prospects in the study area are associated with hydrothermal alteration. According to these authors, silicification is expressed either by the development of quartz stockworks or by dispersion in the host volcano-sedimentary formations, following a general NE-SW direction. This silicification is locally accompanied by open zones filled with potassium feldspar and tourmaline. For this reason, the K/eTh and K/eU ratios were calculated to highlight zones enriched in Potassium. Chloritization is manifested by the development of large zones in the soft greenish levels of the volcano-sedimentary series. Accordingly, the ASTER band ratio (b7 + b9)/b8 was calculated to map the spatial distribution of zones with high chlorite content. Hematization is expressed by the development of iron oxides and hydroxides along fracture planes, which are generally oriented N20°. This type of alteration was mapped using the band ratio (b5/b3) + (b1/b2). Finally, the band ratio (b5 + b7)/b6 was used to map sericitization, which is subordinate in the study area [31]. These thematic layers were fuzzified, and the two resulting mineral prospectivity maps (ASTER and radiometric) were combined by fuzzy logic modelling to generate a mineral prospectivity map for the study area. A total of seven prospective areas were identified (Figure 7), and they are mainly associated with alteration zones:


**Figure 7.** Combination of mineral prospectivity maps.

#### **5. Conclusions**

The study demonstrates the importance of combining natural gamma-ray spectrometry and ASTER data in the early stages of mineral exploration. This combination was applied to target areas with high mining potential in the Kelâat M'Gouna inlier. The results obtained allowed detailed mapping of hydrothermal alteration zones related to mineralization. Maps of Potassium (K in %), Uranium (eU in ppm), Thorium (eTh in ppm) and ratios of K/eTh and K/eU were generated to delineate the high concentrations of radioactive elements related to the altered zones, particularly in Potassium.

Band ratios extracted from the ASTER image were calculated to visualize the spatial distribution of specific minerals in the alteration zones. Clay, phyllic and propylitic minerals and iron oxides were mapped in some lithologies that host several mineral occurrences.

The mineral prospectivity maps generated by the fuzzy logic modelling allowed us to locate the alteration zones. Seven anomalous zones were distinguished. The geological data showed that these zones are located in the contact zones between the granitic massifs, especially the Wawitcht, Isk n'Alla and Azlag granites, and their host rocks, formed by the Ediacaran volcano-sedimentary rocks of the Ouarzazate Group. In addition, most of these zones have been mapped in rocks that host the prospects and mining showings already identified, notably those of gold. Other anomalous zones have been mapped in fault zones, mainly with NE-SW and N-S trends, such as the Sidi Flah fault zone and the Tagmout graben zone. It is therefore recommended to carry out a detailed structural study in conjunction with geophysics to locate, delineate, and follow at depth the metallic bodies and the tectonic structures that may trap the mineralization in the study area.

**Author Contributions:** Conceptualization, Y.M., A.A. (Ahmed Attou), A.M. (Abdelhalim Miftah) and M.O.; Data curation, Y.M. and M.O.; Formal analysis, Y.M., A.A.(Ahmed Attou) and M.O.; Funding acquisition, Y.M. and A.M. (Anselme Muzirafuti); Investigation, Y.M.; Methodology, Y.M., M.O., B.D. and Y.E.-t.; Project administration, Y.M. and A.A.(Ahmed Attou); Resources, Y.M., G.R., S.L. and A.M. (Anselme Muzirafuti); Software, Y.M., M.O. and L.A.; Supervision, A.A. (Ahmed Attou) and A.M. (Abdelhalim Miftah); Validation, Y.M., A.A. (Ahmed Attou), A.M. (Abdelhalim Miftah), A.A. (Abdelhamid Allaoui), M.B., G.R., S.L., A.M. (Anselme Muzirafuti), M.O. and B.D.; Visualization, Y.M., A.A. (Ahmed Attou), A.M. (Abdelhalim Miftah), A.M. (Anselme Muzirafuti) and M.O.; Writing—original draft, Y.M. and M.O.; Writing—review and editing, Y.M., A.A. (Ahmed Attou), A.M. (Abdelhalim Miftah), M.O., B.D., L.A., Y.E.-t., A.A. (Abdelhamid Allaoui), M.B., G.R., S.L. and A.M. (Anselme Muzirafuti). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Location Planning of Charging Stations for Electric Buses in Public Transport Considering Vehicle Scheduling: A Variable Neighborhood Search Based Approach**

**Nils Olsen † and Natalia Kliewer \*,†**

Department of Information Systems, Freie Universität Berlin, Garystr. 21, 14195 Berlin, Germany; nils.olsen@fu-berlin.de

**\*** Correspondence: natalia.kliewer@fu-berlin.de

† These authors contributed equally to this work.

**Abstract:** Many public transport companies have recently launched projects testing the operation of electric buses. Progressively, traditional combustion engine buses are being replaced by electric buses. In such cases, some stops on bus lines are equipped with charging technology. Combustion engine buses can operate for an entire day without having to refuel. By contrast, electric buses have considerably shorter ranges and need to recharge their batteries throughout a day. For cost-efficient use of electric buses, charging stations must be located within the road network so that required deadhead trips are as short as possible, but attention must also be paid to construction costs. In contrast to vehicle scheduling, which is a more short-term planning task of public transport companies, location planning of charging stations is a long-term planning problem and requires a simultaneous solving of both optimization problems. Specifically, location planning and vehicle scheduling have to be considered simultaneously in order to open up optimization potentials by comparison to sequential planning, since locations of charging stations directly influence the resulting vehicle rotations. To this purpose, we present a novel solution method for the simultaneous optimization of location planning of charging stations and vehicle scheduling for electric buses in public transport, using variable neighborhood search. By a computational study using real-world public transport data, we show that a simultaneous consideration of both problems is necessary because sequential planning generally leads to either infeasible vehicle rotations or to significant increases in costs. This is especially relevant for public transport companies that start operating electric bus fleets.

**Keywords:** location planning; vehicle scheduling; electric buses; charging stations; partial charging

#### **1. Introduction**

In recent years, awareness of climate change and sustainable operations has increased significantly throughout the economy and public life. Electromobility is currently considered a highly relevant technology for making public transport systems more sustainable and environmentally friendly. Therefore, traditional buses with combustion engines are being progressively replaced by electric buses. Electrically powered buses operate locally emission-free, which leads to minimal emission levels of greenhouse gases, dust particles, and nitrogen oxides. Electric buses also operate much more quietly, which improves the quality of life, especially in congested urban areas [1].

At present, the electric energy required for powering electric buses is either provided by batteries or is generated by fuel cells from hydrogen, methanol, or similar fuel [2]. Due to the lower energy density of modern electric batteries compared to common tank capacities for hydrogen or methanol, battery-powered buses involve the greatest challenges for bus operations. For this reason, we focus on battery electric buses (BEBs) within this work. However, the methodology and results of this work can be transferred to any other type of electric engine. We will consider electric bus and battery electric bus as synonyms.

**Citation:** Olsen, N.; Kliewer, N. Location Planning of Charging Stations for Electric Buses in Public Transport Considering Vehicle Scheduling: A Variable Neighborhood Search Based Approach. *Appl. Sci.* **2022**, *12*, 3855. https://doi.org/10.3390/app12083855

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti, Dimitrios S. Paraforos and Stefania Lanza

Received: 2 March 2022 Accepted: 8 April 2022 Published: 11 April 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Traditional combustion engine buses can often operate for an entire day without having to refuel. By contrast, modern BEBs have only a fraction of the ranges of combustion engine buses and need to recharge their batteries several times a day [3]. Nowadays, BEBs are charged overnight at vehicle depots after the completion of their daily operations. In addition, the vehicles are charged at charging stations during shorter waiting periods while operating (opportunity charging). Energy transmission occurs either conductively by a wire or inductively. In some cases, the vehicle batteries are also replaced with a fully charged battery (battery swapping).

For example, in a current real-world bus project at Schiphol Airport in Amsterdam, the Netherlands, the bus company Connexxion operates up to 100 BEBs [4]. Electric VDL Citea buses are operated within this project, with batteries capable of storing 215 kWh, which results in a range between 80 and 120 km. The batteries are charged inductively with fast charging systems. Most modern electric buses, such as the *Irizar ie Bus*, are able to store about 350 kWh and may operate up to 17 h in urban bus systems without charging [5].

In recent years, many other public transport companies have launched similar pilot projects testing the operation of BEBs. An overview of current projects is provided by [6]. Most of these projects aim at substituting diesel buses with BEBs in daily service while retaining cost-minimal vehicle rotations. In such cases, charging systems are established at some stops on the bus lines to facilitate the recharging of the vehicle batteries during operation. For a cost-efficient deployment of BEBs, the charging stations must be built within the road network so that deadhead trips are as short as possible or are not necessary at all. Longer deadhead trips increase the operational costs and may lead to higher demands for buses.

Therefore, construction costs for charging stations as well as the buses' purchase and operational costs have to be considered at the planning stage. The planning process of public transport companies consists principally of strategic, tactical, and operational planning tasks, which differ with regard to the time periods considered. Figure 1 provides an overview of the planning process. Strategic planning comprises the network design and line planning. The network design determines stop points and necessary infrastructure, particularly including the distribution of charging stations within the road network. In this scope, specific technical aspects such as energy grids' transmission capacities or restrictions imposed by local conditions may be considered [7,8]. Within the tactical planning, timetables are constructed according to the previously planned lines. Operational planning determines the deployment of vehicles and personnel.

**Figure 1.** Overview of the planning process arising for companies in public transport when deploying BEBs.

The first operational planning task is vehicle scheduling, which specifies the vehicle deployment for operating the service trips offered daily. Service trips are trips that transport passengers from a departure stop via intermediate stops to an arrival stop at fixed times determined by a timetable. The objective is to assign the set of service trips to vehicles at minimum cost. As part of this task, each service trip must be covered exactly once, each vehicle must execute a feasible sequence of trips (vehicle rotation) without time overlaps, and each vehicle must start and end its rotation at the same depot. This optimization problem is commonly referred to as the *Vehicle Scheduling Problem* (VSP). Between successive service trips, a vehicle can perform deadhead trips without transporting passengers if necessary. If BEBs are considered within vehicle scheduling, restricted operating ranges due to limited battery capacities and battery charging must be taken into account. This extended optimization problem is commonly denoted as the *Electric Vehicle Scheduling Problem* (E-VSP). While charging, a vehicle stops at a charging station for a specific time period depending on the battery's remaining energy (State of Charge, SoC). Batteries can be either fully or partially charged. The task of determining when, where, and to what extent a battery is charged is denoted as battery management, which is closely related to vehicle scheduling.
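The two feasibility conditions described above (no time overlaps and a sufficient SoC) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Trip` fields, the function name, and the simplification that deadhead trips and charging durations are ignored are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    departure: float   # minutes after midnight (assumed unit)
    arrival: float     # minutes after midnight
    energy_kwh: float  # assumed energy consumption of the trip


def rotation_is_feasible(trips, capacity_kwh, recharge_after=None):
    """Check a rotation given as trips sorted by departure time.

    recharge_after maps a trip index to the energy (kWh) recharged after
    that trip; partial charging is modeled by any value below capacity.
    Deadhead trips and charging durations are ignored in this sketch.
    """
    recharge_after = recharge_after or {}
    soc = capacity_kwh                  # start with a full battery
    previous_arrival = float("-inf")
    for index, trip in enumerate(trips):
        if trip.departure < previous_arrival:
            return False                # time overlap between trips
        soc -= trip.energy_kwh
        if soc < 0:
            return False                # battery depleted during the trip
        soc = min(capacity_kwh, soc + recharge_after.get(index, 0.0))
        previous_arrival = trip.arrival
    return True


trips = [Trip(480, 540, 60.0), Trip(550, 620, 70.0), Trip(640, 700, 50.0)]
print(rotation_is_feasible(trips, 150.0))             # False: 180 kWh needed
print(rotation_is_feasible(trips, 150.0, {1: 40.0}))  # True: partial recharge
```

In this toy example, a partial recharge of 40 kWh after the second trip is enough to make the rotation feasible, which is the kind of decision that battery management has to make.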

Unlike vehicle scheduling, which is a rather short-term task in operational planning, location planning of charging stations is a long-term task belonging to strategic network planning. Nevertheless, because the locations of charging stations directly influence the resulting vehicle rotations, both optimization problems have to be considered simultaneously in order to open up optimization potential compared with sequential planning. At present, existing solution approaches to the E-VSP assume fixed locations of charging stations determined in advance. Conversely, location planning problems for charging stations are solved so that cost-minimal vehicle rotations, computed for buses without range limitations, can be operated by BEBs. Both approaches amount to sequential planning.

Simultaneous problem solving is always applicable when a public transport company fully or partially substitutes its fleet of diesel buses with BEBs for the first time. This is particularly the case because charging stations are not usually available within public transport systems yet and need to be built. Furthermore, it is expected that in the future private energy companies will operate networks of charging stations, especially within urban areas, that can be used by vehicles and buses. Some of these networks already exist, such as *E.on Drive* in Germany, but it is expected that such offers will be expanded in the future [9]. In this scenario, each transport company has to pay a usage fee in order to charge a vehicle at specific stations. While location planning of charging stations is a long-term planning problem, vehicle scheduling is carried out every time the timetable changes. However, the simultaneous approach is still applicable because then it is based on the modified timetable and the set of charging stations provided by the energy companies. The construction costs for building a charging station then correspond to the usage fees.

In this paper, we present a novel solution method for the simultaneous optimization of location planning of charging stations and vehicle scheduling for BEBs in public transport to open up potentials for cost savings in comparison with sequential planning. To do so, we develop a solution approach based on *Variable Neighborhood Search* (VNS), which has been successfully applied to real-world combinatorial optimization problems in a variety of application areas [10]. We propose a heuristic solution approach because the E-VSP and the location planning problem are both difficult to solve, especially with regard to larger instances. Following Lenstra and Rinnooy [11] and Yang and Sun [12], both problems are NP-hard. Simultaneous problem solving is expected to be no less difficult [13]. Within our solution approach we incorporate complete as well as partial charging procedures of the vehicle batteries. By a computational study, we demonstrate the need for simultaneous optimization as opposed to sequential planning. We show that simultaneous problem solving is necessary because sequential planning generally leads to either infeasible vehicle rotations or to significant increases in costs. Furthermore, we find that the incorporation of partial charging procedures leads in principle to major cost savings.

This paper is organized as follows: In Section 2 we provide an overview of existing work on scheduling of electric vehicles and location planning of charging stations for BEBs. In Section 3 we formally define the problem to be solved. Following this, we introduce the metaheuristic solution method in Section 4. In Section 5 we perform comprehensive computational experiments and analyze the results in order to derive key statements. We provide conclusions and present potentials for further research in Section 6.

#### **2. Literature Overview**

In this section, we give an overview of related work. As mentioned above, existing work can generally be divided into scheduling of BEBs assuming fixed locations of charging stations and location planning of charging stations for given vehicle rotations. Consequently, we begin by discussing existing solution approaches for scheduling BEBs in public transport. We then present literature on location planning of charging stations.

#### *2.1. Scheduling Electric Buses*

As one of the first contributions dealing with alternative engine types within vehicle scheduling, Stasko and Gao [14] present a solution method for the VSP taking into account different engine options. The solution approach is based on integer programming. Engines powered by compressed natural gas (CNG) are considered besides combustion engines. The approach aims at reducing emission levels within vehicle scheduling.

Reuer et al. [15] consider a mixed fleet of vehicles consisting of electrically powered buses and buses without range limitations within the basic VSP. The authors apply an exact solution method for the VSP based on a time-space network, introduced by Kliewer et al. [16], to solve the enhanced optimization problem. Solutions obtained to this problem contain optimal flow values through the network. Therefore, strategies for flow decomposition are necessary to obtain vehicle rotations. The authors analyze six strategies for flow decomposition that aim at maximizing the proportion of feasible vehicle rotations for BEBs. Battery charging is assumed to be performed within constant time periods. The authors show that a simple substitution of traditional buses with BEBs leads to largely infeasible vehicle schedules.

Haghani and Banihashemi [17] consider a fleet consisting entirely of range-restricted vehicles. They consider vehicle scheduling with route and time constraints in order to limit the lengths and durations of vehicle rotations. However, battery charging is not considered. The authors propose one exact and two heuristic solution models together with techniques for reducing the problem sizes in order to solve even larger-scale problem instances. Chao and Xiaohong [18] consider battery swapping in addition to limited operating ranges of BEBs within the VSP. To solve the problem, a solution method based on a Non-dominated Sorting Genetic Algorithm (NSGA-II) is introduced. A case study based on real-world data taken from a project in Shanghai is performed to analyze the solution approach. Li [19] addresses vehicle scheduling of BEBs with either battery swapping or charging and presents a model for restricting the maximum route distance. Both fast charging and battery swapping are presumed to be performed within constant time windows, but the time for fast charging depends on the location. Adler and Mirchandani [20] deal with scheduling of BEBs incorporating charging procedures at given charging stations located within the road network. To solve the problem, they present a column-generation approach. A heuristic method is presented to obtain the necessary initial solution. It is based on a greedy algorithm and computes vehicle rotations under consideration of range limitations and charging. In this work, full charging of the vehicle batteries is again assumed.

Wen et al. [21] are among the first authors to address the E-VSP with partial charging. They present an exact solution method based on mixed integer programming and an adaptive large neighborhood search heuristic. The results demonstrate that the exact solution method is only applicable to small problem instances, whereas the heuristic solution approach also solves larger instances in a reasonable amount of time. van Kooten Niekerk et al. [22] also consider partial charging procedures of BEBs. The authors introduce a solution approach based on column generation. Charging times depend linearly on a battery's SoC. Furthermore, battery aging and time-dependent energy prices are considered. The authors show that in some cases, the consideration of partial charging procedures leads to cost savings.

Recently, Wang et al. [23] proposed an exact solution method for the E-VSP based on dynamic programming. Within this contribution, battery aging is particularly considered. The objective of the solution method is to minimize the total costs, especially incorporating costs for battery replacements during the life spans of the vehicles deployed. By a computational study, the authors analyze the influence of different working loads, battery management, and working temperatures of batteries on resulting vehicle schedules.

#### *2.2. Location Planning of Charging Stations for Electric Buses*

At present, only a few publications deal with location planning of charging stations for BEBs in public transport. Kunith et al. [2] present a mixed integer linear optimization model for determining locations for charging stations for a bus route. The model is based on a set covering problem. The objective is to minimize the number of charging stations needed. The authors consider constraints imposed by the buses' operation and the battery charging process. In addition, different energy consumption scenarios are considered to reflect external influencing factors on the buses' energy consumption, such as traffic volume and weather conditions. Standard optimization libraries are used for solving the problem.

Berthold et al. [24] propose a mixed integer linear program in order to determine optimal locations of charging stations for the electrification of a single bus line in Mannheim. The problem is solved by using standard optimization libraries. Furthermore, partial charging procedures and battery aging effects over several time periods are considered. Since the problem is very complex, the solution approach is not suitable for larger instances. Xylia et al. [25] develop a dynamic optimization model to establish a charging infrastructure for BEBs in Stockholm, Sweden, considering restricted waiting times at intermediate stops on service trips given by the schedule and different currents of the charging systems imposed by local conditions. They provide statements about the application possibilities of BEBs in urban areas and effects on vehicle rotations. Neither of these two works considers line changes of the buses used.

Liu et al. [26] consider energy consumption uncertainties within location planning of charging stations for BEBs in public transport. Therefore, the authors propose a robust optimization model represented by a mixed integer linear program. Using real-world data, the authors show that the proposed solution model can provide optimal locations for charging stations that are robust against uncertain energy consumption of BEBs. Lin et al. [27] introduce a spatial-temporal model for a large-scale planning of charging-stations for BEBs in public transport. The authors consider characteristics of BEBs operation and plug-in fast charging technologies. The model is represented by a mixed-integer second-order cone programming formulation with high computational efficiency. A case study using data from Shenzhen, China is used to analyse the robustness of the solution model to timetable changes.

Based on the solution method presented in this paper, Stumpe et al. [13] present an exact mathematical model for integrated optimization of vehicle scheduling with BEBs and location planning for charging stations. The authors particularly perform a robustness analysis and study the impact of technological aspects such as battery capacity, charging power, and energy consumption as well as economic issues containing investment costs for charging stations and electric buses. A computational study points out that the exact solution model introduced is not capable of solving realistic problem instances to optimality.

Regarding related optimization problems in the scope of transportation, there are some contributions dealing with the charging infrastructure for electric vehicles. For *Vehicle Routing Problems* (VRP) with electric vehicles, Worley et al. [28] propose a solution approach for the simultaneous determination of optimal locations for charging stations and vehicle routes. They show that this approach leads to lower total costs of the vehicle deployment compared with locations of charging stations known a priori. Schiffer and Walther [29] also deal with the simultaneous determination of locations for charging stations and routes for electric vehicles. The authors extend this optimization problem by considering uncertain characteristics of the customers to be served. Uncertain spatial customer distributions, demand, and service time windows are particularly addressed. The authors introduce a robust optimization approach based on adaptive large neighborhood search. Vehicle routing involves different challenges and conditions than vehicle scheduling and therefore needs other solution approaches. Consequently, it is not possible to derive concrete conclusions for the E-VSP from these contributions.

#### *2.3. Summary and Need for Further Research*

Table 1 presents the main characteristics of the presented literature. As described there, there is no existing work that deals with scheduling of BEBs and location planning of charging stations simultaneously. However, as underlined by Worley et al. [28] with regard to vehicle routing, a simultaneous optimization opens up potentials for cost savings. It is to be expected that a simultaneous problem solving will also be beneficial for scheduling of BEBs in public transport. In addition, partial charging procedures have not yet been considered sufficiently within the scope of scheduling BEBs. As shown by van Kooten Niekerk et al. [22] for fixed locations of charging stations, the incorporation of partial charging procedures facilitates further optimization potentials. Simultaneous problem solving under consideration of partial charging procedures forms the basic idea of our contribution.



#### **3. Problem Description and Cost Model**

In this section, we present the *Electric Vehicle Scheduling Problem with Location Planning of Charging Stations* (E-VSP-LP) as the key problem being solved in this paper. In the following, we first introduce the parameters of the problem. Afterwards, we introduce decision variables and the objective function.

We assume a public transportation network given by a set $S = \{s_1, \ldots, s_n\}$ of $n \in \mathbb{N}$ stop points, which also contains the set of vehicle depots $D \subseteq S$. Service trips are defined by a given timetable as a set $T = \{t_1, \ldots, t_m\}$ with $m \in \mathbb{N}$. A service trip $t \in T$ is characterized by its departure and arrival time as well as its departure and arrival stop. For any pair $(s_i, s_j) \in S \times S$ of stop points there is a specific distance and travel time that can be different depending on whether the trip is a service or deadhead trip. In our study, we do not consider opportunity charging of BEBs during the execution of service trips. Consequently, the set $S$ contains the departure and arrival stop of each service trip $t \in T$ as well as the set of depots. The aim is to assign the service trips contained in $T$ to a set of BEBs that are substantially determined by their battery capacities. There may be other specifications such as vehicle dimensions or passenger capacities. Each combination of these features is denoted as a *vehicle type*. To recharge the vehicle batteries, charging stations can be built at each stop point of $S$. The installed charging system at a charging station considerably influences the time needed for charging. A vehicle can be either fully or partially charged, which also affects the charging time.

For the deployment of a BEB, fixed costs $c_{fixed}^{bus} > 0$ are incurred independently of the executed trips. Each charging procedure or trip operated during a vehicle rotation results in operational costs. Therefore, we consider time costs per hour $c_{time}^{bus} > 0$ and costs for the distances covered $c_{distance}^{bus} > 0$. Equipping stop points with charging technology causes fixed costs $c_{fixed}^{charging} > 0$. These costs may differ depending on the type of the charging system to be installed or the location. For instance, it is more expensive to build a charging station at a busy crossing than in a quiet side street.

We define decision variables $y_s \in \{0, 1\}, \forall s \in S$ and $x_v \in \{0, 1\}, \forall v \in V$ denoting the decision whether a charging station is built at stop point $s$ or, respectively, whether a vehicle $v$ is used or not. The objective of the simultaneous optimization problem is to minimize the total costs for a given timetable and potential locations of charging stations. Accordingly, fixed costs for BEBs as well as charging stations and operational costs for the buses' operation must be minimized. The objective function can be formulated as

A trip's duration is specified by *dur*(*t*) ≥ 0 and a trip's length by *len*(*t*) ≥ 0. The objective function's value may be interpreted as the total costs caused by a first investment into an electrification of a public transport company's fleet and infrastructure for a specific timetable period. Variable costs for the maintenance of the charging infrastructure or battery replacements are not considered within this work.
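How the cost components defined above add up can be illustrated with a minimal sketch. It assumes that operational costs are charged per activity of a rotation (trip or charging procedure) by duration and length, and that each built charging station has its own fixed cost. The function name, the data layout, and all numbers are hypothetical and not taken from the paper.

```python
def total_cost(rotations, station_costs, c_bus_fixed, c_bus_time, c_bus_distance):
    """Sum fixed and operational bus costs plus charging station costs.

    rotations: one list per vehicle used, each holding (duration_h, length_km)
        tuples for the trips and charging procedures of that vehicle.
    station_costs: dict mapping each built charging station (stop id) to its
        fixed cost, which may differ by location or charging system.
    """
    bus_cost = 0.0
    for rotation in rotations:
        bus_cost += c_bus_fixed  # incurred once per vehicle used
        for duration_h, length_km in rotation:
            bus_cost += c_bus_time * duration_h + c_bus_distance * length_km
    return bus_cost + sum(station_costs.values())


# Hypothetical example: two vehicles, two charging stations.
rotations = [
    [(1.0, 20.0), (0.5, 0.0)],  # a trip followed by a charging procedure
    [(2.0, 35.0)],
]
station_costs = {"s1": 500.0, "s4": 700.0}
print(total_cost(rotations, station_costs,
                 c_bus_fixed=100.0, c_bus_time=10.0, c_bus_distance=1.0))  # 1490.0
```

A charging procedure is modeled here as an activity with a duration but zero length, so it contributes time costs only.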

In this paper, we solve the E-VSP-LP heuristically as large real-world instances cannot be solved to optimality in an acceptable time [13]. For that reason, we do not present a formal model at this point. However, we refer to Stumpe et al. [13] for a comprehensive mathematical problem formulation and further insights.

#### **4. A Variable Neighborhood Search Based Solution Method for the E-VSP-LP**

In this section, we discuss our solution approach for the E-VSP-LP. The objective is to find vehicle rotations for BEBs and locations for charging stations simultaneously and at a minimum cost. We begin by presenting the basic procedure of our heuristic solution method. The solution method consists primarily of generating initial solutions first and then finding new solutions with lower total costs. To do so, we introduce a savings algorithm for generating initial solutions in Section 4.2. Afterwards, we present an algorithm for improvement based on VNS in Section 4.3.

#### *4.1. General Approach*

Algorithm 1 provides the main procedure of our solution method. The set of scheduled service trips to be assigned and an initial set of charging stations, together with their locations, serve as the input data. Already existing charging infrastructure, for example due to the implementation of previous pilot projects, may be included in the set of charging stations. Usually, at the beginning of the algorithm the set of charging stations is empty. The algorithm basically consists of two consecutive steps: First, we use a savings algorithm to generate initial sets of vehicle rotations for BEBs and charging stations (l. 1). Subsequently, we use this initial solution as the input for an improvement method based on VNS, which we denote as BVNS (l. 2). The algorithm terminates by returning the best solution found. The two key Algorithms 2 and 3 are explained in the following sections.

#### **Algorithm 1** Main Variable Neighborhood Search.

**Input:** scheduled service trips *T*, charging stations *S*

**Output:** vehicle rotations *V*, charging stations *S*

1: (*V*′, *S*′) ← SA(*T*, *S*);

2: (*V*, *S*) ← BVNS(*V*′, *S*′);

3: **return** *V*, *S*;

## *4.2. Savings Algorithm for Generating Initial Solutions*

The savings algorithm was first introduced by Clarke and Wright [30] to solve VRPs heuristically. The objective of vehicle routing is to determine an optimal set of routes seeking to service a number of customers with a fleet of vehicles. Following Cordeau et al. [31], the savings algorithm is one of the most commonly used methods for vehicle routing in practice. Starting from routes each containing one customer the basic procedure is to compute cost savings iteratively for merging two routes into the same one. Within each iteration the merging that results in the highest saving is performed. A saving consists of fixed and operative costs saved. This procedure terminates when no further mergings can be performed. While this algorithm has been applied generally to VRPs, we adapt this algorithm hereinafter in order to apply the same procedure to the E-VSP-LP.
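For the classical VRP, the saving obtained by serving two customers $i$ and $j$ on one route instead of two separate depot round trips is commonly written as $s(i, j) = d(0, i) + d(0, j) - d(i, j)$, where $0$ denotes the depot. The following minimal sketch ranks these savings for a toy instance. The distance values and the function name are illustrative assumptions, and only the ranking step is shown, not the merging of routes.

```python
from itertools import combinations


def ranked_savings(dist, depot=0):
    """Return (saving, i, j) tuples sorted by decreasing saving."""
    customers = [node for node in dist if node != depot]
    savings = []
    for i, j in combinations(customers, 2):
        # Two round trips depot-i-depot and depot-j-depot are replaced
        # by a single route depot-i-j-depot.
        saving = dist[depot][i] + dist[depot][j] - dist[i][j]
        savings.append((saving, i, j))
    return sorted(savings, reverse=True)


# Hypothetical symmetric distances between depot 0 and customers 1, 2, 3.
dist = {
    0: {1: 10, 2: 12, 3: 8},
    1: {0: 10, 2: 5, 3: 15},
    2: {0: 12, 1: 5, 3: 9},
    3: {0: 8, 1: 15, 2: 9},
}
print(ranked_savings(dist))  # [(17, 1, 2), (11, 2, 3), (3, 1, 3)]
```

The pair with the highest saving, here customers 1 and 2, would be merged first, provided the merged route remains feasible.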

Algorithm 2 shows the procedure for generating initial solutions to the E-VSP-LP formally using the idea of cost savings. The set of scheduled service trips to be assigned and an initial set of charging stations, together with their locations, serve as the input data. The algorithm begins by adding a vehicle rotation for each scheduled service trip, now containing only the associated trip together with a deadhead trip from and to the depot (l. 4). If these vehicle rotations are not feasible for BEBs the entire optimization problem is infeasible. Within each iteration of the algorithm those two vehicle rotations (l. 7 and 8) are merged that lead to a feasible rotation and entail the highest saving. Therefore, the set of service trips of both rotations to be merged are processed consecutively, in order of departure times (l. 9). Since the SoC mostly influences the feasibility of a vehicle rotation besides temporal restrictions the algorithm aims at adding charging procedures as often as possible. For this purpose, starting with a new and empty vehicle rotation (l. 10), four different cases are considered for each service trip of the rotations to be merged. First, we check whether a charging procedure can be performed at an existing charging station of *S* before executing the current service trip, taking into account necessary deadhead trips (l. 12). If this can be done, necessary deadhead trips, the charging procedure, and the service trip are added (l. 13). If this is not possible, we examine whether the current service trip can be executed without detours to charging stations (l. 14). If the SoC is insufficient, we check whether the current service trip can be executed by building a new charging station at the trip's departure stop and performing a charging procedure (l. 16). Lastly, the same is checked but for the latest position of the vehicle, which is less strict because the deadhead trip is executed after the charging procedure (l. 18). 
If none of these options can be carried out, the current merging is aborted (l. 20). When a merging is feasible, the saving for merging two vehicle rotations $v, w \in V$ into a new rotation $\overline{v, w}$ is given by

$$s(v, w) = c_{fixed}^{bus} - \delta \cdot c_{fixed}^{charging} - (o(\overline{v, w}) - o(v) - o(w)) \tag{2}$$

where $o(v) \geq 0, \forall v \in V$ denotes the operational costs of each vehicle rotation and $\delta \in \mathbb{N}$ the number of additionally needed or, respectively, saved charging stations. After each iteration, the merging that yields the highest positive saving is performed (l. 25). Then, the set $S$ of charging stations is modified, the new vehicle rotation is added, and the merged rotations are removed (l. 26 and 27). If no positive savings exist, the algorithm terminates and returns the sets of vehicle rotations and charging stations (l. 29). Hence, solutions generated by this procedure are always feasible.
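As a minimal sketch of how Equation (2) drives the choice of the next merging, the following code evaluates the saving for a list of feasible candidate mergings and picks the one with the highest positive saving. The candidate tuples, the function names, and all cost values are hypothetical; the feasibility checks of Algorithm 2 are assumed to have been carried out beforehand.

```python
def merge_saving(c_bus_fixed, c_charging_fixed, delta, o_merged, o_v, o_w):
    """Saving of Equation (2): one bus saved, delta stations added,
    minus the increase in operational costs caused by the merging."""
    return c_bus_fixed - delta * c_charging_fixed - (o_merged - o_v - o_w)


def best_positive_merge(candidates, c_bus_fixed, c_charging_fixed):
    """Return (pair, saving) with the highest positive saving, or None.

    candidates: iterable of (pair, delta, o_merged, o_v, o_w) tuples for
    mergings that were already found to be feasible.
    """
    best = None
    for pair, delta, o_merged, o_v, o_w in candidates:
        saving = merge_saving(c_bus_fixed, c_charging_fixed,
                              delta, o_merged, o_v, o_w)
        if saving > 0 and (best is None or saving > best[1]):
            best = (pair, saving)
    return best


candidates = [
    (("v1", "v2"), 0, 90.0, 50.0, 45.0),  # no new station needed
    (("v1", "v3"), 1, 80.0, 50.0, 60.0),  # one new station needed
]
print(best_positive_merge(candidates, c_bus_fixed=400.0, c_charging_fixed=250.0))
# (('v1', 'v2'), 405.0)
```

Returning `None` when no candidate has a positive saving corresponds to the termination condition of the savings procedure.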

The procedure of Algorithm 2 is based on the heuristic solution method proposed for the E-VSP by Adler and Mirchandani [20]. In their algorithm, the charging stations are assumed to be known a priori and cannot be changed. In Algorithm 2, we extend their procedure significantly by incorporating location planning for charging stations.

**Algorithm 2** Savings Algorithm (SA).


#### *4.3. Variable Neighborhood Search for Improvement*

To find new solutions with lower total costs, we use a VNS-based solution method. VNS was first introduced by Hansen et al. [10]. Solution approaches based on VNS have been considered extensively in the literature and have proven suitable for numerous practical problems with realistic data sizes [32]. The underlying concept of VNS is a systematic change of neighborhoods, both in an improvement phase to find a local optimum and in a perturbation phase to escape from local optima. In the perturbation phase, a so-called shaking method is applied, which exerts a stochastic influence on an incumbent solution by performing random changes. Even though this procedure can cause a deterioration in the objective function value, it is used to escape from local optima. In the improvement phase, a local search method is used to find new solutions with lower total costs.

Adapting the basic VNS concept to solve the E-VSP-LP thus requires the definition of a problem-specific neighborhood structure and of methods for shaking, local search, and changing the neighborhood. Algorithm 3 provides the procedure for our solution method. The algorithm follows the *basic* VNS adapted from Hansen et al. [33]. Note that the following procedure is applicable not only to solutions generated by Algorithm 2 but also to any other feasible solution.

#### **Algorithm 3** Basic Variable Neighborhood Search (BVNS).

10: **end while** 11: **return** (*V*, *S*);


We first define a neighborhood *N*<sub>*k*</sub> of size *k* ∈ ℕ by selecting *k* vehicle rotations. The vehicle rotations are chosen randomly from the entire set in order to incorporate stochastic influences. The maximum neighborhood size *k*<sub>max</sub> ∈ ℕ therefore equals the number of vehicles used within the incumbent solution. After each iteration of shaking and local search, a neighborhood change is performed. In this step, the objective function values of the incumbent and the improved solution are compared. If the improved solution is better than the incumbent, it is accepted and the size of the neighborhood is reset to the smallest possible value. Otherwise, the size of the neighborhood is increased and the procedure is repeated. The procedure terminates when the maximum computational time is exceeded. Algorithm 4 shows the procedure formally.

#### **Algorithm 4** NEIGHBORHOODCHANGE.

**Input:** solutions (*V*, *S*), (*V*′, *S*′), neighborhood size *k*, objective function *f*

**Output:** solution (*V*, *S*), neighborhood size *k*

1: **if** *f*(*V*′, *S*′) < *f*(*V*, *S*) **then**
2: (*V*, *S*) ← (*V*′, *S*′);
3: *k* ← 1;
4: **else** *k* ← *k* + 1;
5: **end if**
6: **return** (*V*, *S*), *k*;
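The interplay of shaking, local search, and neighborhood change can be sketched in a few lines of Python. This is a generic basic VNS skeleton under stated assumptions, not the authors' implementation: the function names, the fixed `k_max` (the text derives it from the incumbent solution), and the toy integer problem are all hypothetical and serve only to make the control flow executable.

```python
import random
import time


def neighborhood_change(incumbent, candidate, k, f):
    """Accept the candidate and reset k if it is better; otherwise enlarge k."""
    if f(candidate) < f(incumbent):
        return candidate, 1
    return incumbent, k + 1


def basic_vns(initial, f, shake, local_search, k_max, time_limit_s, rng):
    """Shake, improve, and change the neighborhood until the time limit is hit."""
    incumbent = initial
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        k = 1
        while k <= k_max and time.monotonic() < deadline:
            shaken = shake(incumbent, k, rng)      # perturbation phase
            improved = local_search(shaken)        # improvement phase
            incumbent, k = neighborhood_change(incumbent, improved, k, f)
    return incumbent


# Toy problem (assumption for illustration): minimize (x - 7)^2 over the integers.
def toy_cost(x):
    return (x - 7) ** 2


def toy_shake(x, k, rng):
    # Random perturbation whose magnitude grows with the neighborhood size k.
    return x + rng.randint(-k, k)


def toy_local_search(x):
    # Descend one step at a time until neither neighbor is better.
    while toy_cost(x - 1) < toy_cost(x):
        x -= 1
    while toy_cost(x + 1) < toy_cost(x):
        x += 1
    return x


best = basic_vns(40, toy_cost, toy_shake, toy_local_search,
                 k_max=5, time_limit_s=0.05, rng=random.Random(0))
```

In the E-VSP-LP setting, a solution would be a pair of vehicle rotations and charging stations, `f` the total cost, and `shake` and `local_search` the problem-specific procedures described in this section.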

Second, we use Algorithm 5 as the local search method within Algorithm 3 for improving a solution. As the total costs of a solution consist of operational costs for deadheading as well as fixed costs for vehicles and charging stations, Algorithm 5 combines the three following Algorithms 6–8, each aiming towards reducing one cost component. In Algorithm 5, the move is performed that involves the highest cost saving.



can be exchanged, and the move with the highest saving is returned.

Algorithm 7 aims at inserting service trips of vehicle rotations with fewer service trips into vehicle rotations with more service trips, again based on a neighborhood. If an insertion is possible, a saving is computed that comprises the proportionate fixed costs for the remaining service trips, the fixed costs for additional charging stations, and the operational costs for possible detours. Again, the best move found is returned. The algorithm thereby attempts to remove vehicle rotations in which no service trips are executed any more.

#### **Algorithm 7** Shift Service Trips (SST).

```
    Input: neighborhood N_k, fixed costs for a BEB c_fix^bus
    Output: neighborhood N_k
 1: for all v ∈ V do
 2:   for all w ∈ V : |ST_w| < |ST_v| do
 3:     for all t_w ∈ w do
 4:       if t_w can be inserted in v then
 5:         Compute saving (c_fix^bus / |ST_w|) less the costs for newly built charging
 6:         stations and additional operational costs;
 7:       end if
 8:     end for
 9:   end for
10: end for
11: Perform move with the highest saving, omit a vehicle if no trips are being performed;
12: return N_k;
```
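
The saving evaluated inside the loops of Algorithm 7 can be illustrated with a short Python sketch. The function name `shift_saving` and its parameters are hypothetical; the sketch only reproduces the arithmetic stated in the listing, namely the proportionate fixed bus cost less the costs for newly built charging stations and additional operational costs.

```python
def shift_saving(c_fix_bus, n_trips_w, new_stations, c_fix_charging, extra_operational):
    """Saving for moving one service trip out of rotation w (sketch of Algorithm 7).

    The fixed bus cost is credited proportionately: each of the |ST_w| trips of w
    carries c_fix_bus / |ST_w|. Costs for newly built charging stations and
    additional operational costs are subtracted.
    """
    if n_trips_w <= 0:
        raise ValueError("rotation w must contain at least one service trip")
    return c_fix_bus / n_trips_w - new_stations * c_fix_charging - extra_operational
```

With a bus cost of 350,000 and four trips in *w*, shifting one trip that needs no new station and causes 120 units of extra operational cost gives a saving of 87,380, whereas a shift that requires one new station at 200,000 yields a negative saving and would not be chosen.
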
Algorithm 8 aims at decreasing the number of charging stations used by moving charging procedures from less frequented to more highly frequented charging stations, considering the vehicle rotations of a neighborhood. The feasible move with the highest saving is returned, where the saving includes the proportionate fixed costs for the remaining charging procedures at a specific charging station and the operational costs for additional detours. Similar to Algorithm 7, this procedure aims at removing charging stations at which charging procedures are no longer performed.


While stochastic influences on incumbent solutions are already incorporated by the random selection of a neighborhood's set of vehicles, Algorithm 9 is additionally applied within Algorithm 3. This approach is intended to enable more stochastic changes in order to escape from local optima. Shaking is based on the procedures given by Algorithms 6–8. Within each call of Algorithm 9, one of the three algorithms is chosen at random and applied if the corresponding move is feasible. The move is performed even if it worsens the objective function value.

#### **Algorithm 9** SHAKE.


5: **end if**
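The shaking step described above can be sketched as follows. This is a minimal illustration, assuming that each of the three move procedures is available as a callable that returns a modified solution or `None` when the move is infeasible; the name `random_shake` and this calling convention are assumptions, not part of the original method description.

```python
import random


def random_shake(solution, moves, rng):
    """Apply one randomly chosen move, accepting it even if costs get worse.

    Each move is a callable that returns a modified solution, or None when the
    move is infeasible for the given solution.
    """
    move = rng.choice(moves)
    result = move(solution)
    # An infeasible move leaves the solution unchanged.
    return solution if result is None else result
```

Unlike the local search, no objective function value is compared here, which is what allows the search to leave a local optimum.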

#### *4.4. Inserting Partial Chargings*

In our computational study, which follows this section, we incorporate complete and partial charging procedures. So far, the algorithms presented operate with any kind of charging procedure. However, more algorithmic effort is needed to incorporate partial chargings within Algorithms 2 and 3. For that purpose, we consider the following Algorithm 10 by Olsen and Kliewer [34]. It is applied to each vehicle rotation that is generated or modified within the solution procedure. As a result, Algorithm 10 returns either the set of partial charging procedures that have to be inserted into the corresponding vehicle rotation or the rotation's infeasibility. A resulting vehicle rotation is taken into further consideration only if it is feasible.

Algorithm 10 checks iteratively, after each trip of a rotation, whether the SoC has been violated (l. 2). If this is the case, the previous trips are considered (l. 3). Each trip that begins or ends at a charging station represents a charging opportunity (l. 5). If no such opportunities are found, the vehicle rotation is infeasible (l. 9). Among all charging opportunities determined, the one at the most highly frequented charging station is processed (l. 11). This aims at reducing the number of charging stations by shifting charging procedures from less to more highly frequented charging stations. In the next step, the vehicle rotation is divided at the specific charging station into two sub-rotations containing the previous and subsequent trips. Then, both sub-rotations are processed by the algorithm. If all sub-rotations are feasible, the algorithm terminates (l. 13). If a charging station is no longer needed, it is omitted. If at least one sub-rotation is infeasible, the next charging opportunity is processed (l. 15 and 16).

```
Algorithm 10 Inserting Partial Chargings (PCP).
    Input: vehicle rotation v, set S of charging stations
    Output: vehicle rotation v, feasibility or infeasibility of v
 1: for all t1 ∈ v do
 2:   if SoC after executing t1 is not sufficient then
 3:     for all t2 ∈ v previous to t1 do
 4:       if departure stop is a charging station then
 5:         Save charging opportunity;
 6:       end if
 7:     end for
 8:     if set of charging opportunities is empty then
 9:       return v, infeasible;
10:     end if
11:     Add charging opportunity at the most highly frequented charging station;
12:     if vehicle rotation can be performed then
13:       return v, feasible;
14:     else
15:       Exclude charging opportunity from the set of all opportunities;
16:       Go to 8;
17:     end if
18:   end if
19: end for
```
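
To make the idea of inserting partial chargings concrete, the following Python sketch implements a simplified greedy variant. It is not Algorithm 10 itself: it does not split the rotation into sub-rotations, and it also treats the departure stop of the violating trip as a charging opportunity. The `Trip` class, the function names, and the representation of station frequency as a dictionary are hypothetical. The default parameters (SoC window of 20% to 80%, 5 kWh per minute, fully charged start) follow the assumptions stated in the experimental design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    departure_stop: str
    energy_kwh: float   # energy consumed by the trip
    idle_min: float     # idle time at the departure stop before the trip


def first_violation(trips, charge_at, capacity, soc_min, soc_max, start_soc, rate):
    """Index of the first trip after which the SoC drops below soc_min, else None."""
    soc = start_soc * capacity
    for i, trip in enumerate(trips):
        if i in charge_at:
            # Partial charging: linear in the idle time, capped at soc_max.
            # A battery that is already above the cap is left unchanged.
            soc = max(soc, min(soc_max * capacity, soc + rate * trip.idle_min))
        soc -= trip.energy_kwh
        if soc < soc_min * capacity:
            return i
    return None


def insert_partial_chargings(trips, station_frequency, capacity,
                             soc_min=0.2, soc_max=0.8, start_soc=1.0, rate=5.0):
    """Greedy sketch: return the set of trip indices before which to charge, or None."""
    charge_at = set()
    while True:
        violated = first_violation(trips, charge_at, capacity,
                                   soc_min, soc_max, start_soc, rate)
        if violated is None:
            return charge_at
        # Charging opportunities: departure stops up to the violating trip that
        # are charging stations, most highly frequented station first.
        opportunities = sorted(
            (i for i in range(violated + 1)
             if i not in charge_at
             and trips[i].idle_min > 0
             and trips[i].departure_stop in station_frequency),
            key=lambda i: station_frequency[trips[i].departure_stop],
            reverse=True,
        )
        for i in opportunities:
            trial = charge_at | {i}
            nxt = first_violation(trips, trial, capacity,
                                  soc_min, soc_max, start_soc, rate)
            if nxt is None or nxt > violated:
                charge_at = trial   # the violation is repaired or pushed back
                break
        else:
            return None  # no opportunity repairs the violation: infeasible


rotation = [Trip("A", 20.0, 0.0), Trip("B", 20.0, 10.0), Trip("C", 20.0, 10.0)]
```

For the three-trip `rotation` above and a 60 kWh battery, charging at the more frequented stop B does not repair the violation after the third trip, so the sketch falls back to stop C. Without any charging station the rotation is reported as infeasible, and with a 300 kWh battery no charging is needed.
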
#### **5. Computational Analysis**

In the following, we perform our computational experiments. We first present the instances to be solved and the problem parameters. Then, we look at the results of a sequential planning approach. In this case, location planning of charging stations and vehicle scheduling of BEBs are solved one by one. Therefore, our analysis is twofold: First, we discuss the results of solving a location planning problem for charging stations to enable the operation of given cost-optimal vehicle rotations computed for traditional buses without the range limitations of BEBs. Second, we present the results of solving an E-VSP given the locations of charging stations computed in the previous step. Last, we analyze the results of simultaneous problem solving using our heuristic solution method provided by Algorithm 3 for the E-VSP-LP and compare the results to the sequential planning approaches. We specifically investigate the impact of considering complete and partial charging procedures on solutions.

#### *5.1. Experimental Design*

Our computational experiments are performed on 10 instances that are inspired by real-world public transport data. The instances are characterized by different numbers of stop points and service trips as well as different distributions of service trips over a day. To simplify the analysis, the instances' labels reflect the numbers of service trips and stop points. The distributions of cumulative service trips over the day are presented in Figure 2. The figure shows that the instances differ substantially with regard to these distributions. It is worth mentioning that the last five instances consist of subsets of the service trips taken from instance t3067\_s209 for runtime reasons. For instances t1580\_s209 and t1487\_s209, the original set of service trips was halved randomly, and for instances t1060\_s209, t1074\_s209, and t933\_s209, the set was randomly divided into three parts.

**Figure 2.** Profiles of cumulative service trips.

Within our experiments, we presume a single vehicle depot, a single vehicle type, and a single charging system. Accordingly, each timetabled service trip can be executed by every available BEB. Additionally, each BEB can charge its battery at every charging station. With regard to the practical implementations of BEBs, we assume that three buses at most can be charged at a charging station at the same time. This is because building sites for charging systems are usually restricted, especially in urban areas. In our study, we distinguish between complete and partial charging procedures. In order to incorporate battery aging, we presume that a battery's SoC ranges between 20% and 80% of a battery's capacity as indicated by Fernandez et al. [35]. In our experiments, we first presume that a vehicle is always charged up to a SoC of 80%. After that, we consider partial chargings. In that regard, the threshold until a battery is charged may vary depending on the idle times at charging stations. Irrespective of the threshold until a battery is charged during its rotation, we assume that a vehicle always begins its rotation with a fully charged battery.

Following Stamati and Bauer [36], charging modern batteries is a nonlinear and therefore complex procedure. The current during a charging process is of particular importance. As demonstrated by Olsen and Kliewer [34], the current decreases quickly when a battery is charged to over 80% of its capacity. Below this threshold, the current is almost constant. For that reason, we assume a constant current and thus linear charging times for vehicle batteries within this paper. We assume that 5 kWh can be transferred into a vehicle battery per minute. In our study, we consider chargings before the start or after the end of service trips. To reflect the lower consumption of BEBs on deadhead trips, we assume a consumption of 1.5 kWh per kilometer on deadhead trips and of 1.8 kWh per kilometer on service trips. These parameters are inspired by the data of the previously introduced project at Schiphol Airport in Amsterdam. At present, a wide range of battery capacities is offered on the market, between approximately 60 and 300 kWh. Based on this, we consider battery capacities of 60, 120, 300, and 500 kWh within our experiments. A battery capacity of 500 kWh may be considered a future development of battery research. Since we consider only one vehicle type at a time, we conduct our study separately for each capacity. Based on Stamati and Bauer [36], a BEB in use and equipped with a 60-kWh battery causes fixed costs of about 350,000 monetary units. Scaled by battery size, this results in fixed costs of 365,000, 405,000, and 450,000 monetary units for the other vehicles. With regard to the operational costs, we presume 0.5 units per driven kilometer and 50 units per hour of operation. Again based on the bus project in Amsterdam, equipping a stop point with charging technology incurs fixed costs of 200,000 monetary units. 
We use the term "monetary units" here since we assume that these units are roughly comparable—at least in terms of scale—and, based on this, that monetary units form a system of imputed cost components.
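The energy and charging assumptions above translate into simple arithmetic. The following Python sketch collects them in one place; the constants are taken from the text, whereas the function names and the example values in the usage note are illustrative assumptions.

```python
KWH_PER_KM_SERVICE = 1.8    # consumption on service trips (from the text)
KWH_PER_KM_DEADHEAD = 1.5   # consumption on deadhead trips (from the text)
CHARGE_KWH_PER_MIN = 5.0    # linear charging rate (from the text)
SOC_MIN, SOC_MAX = 0.2, 0.8  # admissible SoC window as a share of capacity


def trip_energy_kwh(distance_km, is_service_trip):
    """Energy consumed on a trip of the given length."""
    rate = KWH_PER_KM_SERVICE if is_service_trip else KWH_PER_KM_DEADHEAD
    return rate * distance_km


def usable_energy_kwh(capacity_kwh):
    """Energy available between the lower and the upper SoC threshold."""
    return (SOC_MAX - SOC_MIN) * capacity_kwh


def charging_minutes(soc_kwh, capacity_kwh, target_soc=SOC_MAX):
    """Minutes needed to charge linearly from soc_kwh up to target_soc."""
    missing = max(0.0, target_soc * capacity_kwh - soc_kwh)
    return missing / CHARGE_KWH_PER_MIN
```

For instance, a 120 kWh battery at its lower threshold of 24 kWh needs 72 kWh to reach 80%, which under the linear assumption takes about 14.4 minutes; a partial charging simply uses a lower `target_soc` or a shorter idle time.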

#### *5.2. Location Planning of Charging Stations for the Electrification of Cost-Minimal Vehicle Rotations, Computed without Range Limitations*

We begin our computational analysis by discussing the results of solving a location planning problem for charging stations for the electrification of given cost-minimal vehicle rotations computed without range limitations. The vehicle rotations were generated using the exact optimization method for the traditional VSP by Kliewer et al. [16], which is based on a time-space network. In order to enable the operation of these rotations by BEBs, charging stations are added to the network and charging procedures are inserted into the vehicle rotations. Partial charging procedures are performed, since the idle times at potential charging stations are given by the vehicle rotations. The objective is to maximize the proportion of vehicle rotations that are feasible for BEBs. Ideally, this procedure should ensure the holistic operation of the timetabled service trips by BEBs. For this purpose, we adapt the location planning problem for charging stations introduced by Berthold et al. [24] and solve it using standard optimization libraries.

Table 2 provides the results of solving a location planning problem for charging stations, containing the proportions and absolute numbers of feasible vehicle rotations for BEBs together with the numbers and proportions of charging stations needed for each instance and each battery capacity. Additionally, the optimal number of vehicles used is indicated when no range limitations are considered. If all vehicle rotations are feasible for BEBs, the operational and total costs are specified for subsequent analyses. First, we observe that in the vast majority of cases the complete electrification of vehicle rotations by means of inserting charging procedures is not possible. This observation holds regardless of the instance to be solved. However, the proportion of feasible vehicle rotations grows with increasing battery capacities. Every instance can be entirely served by BEBs in the case of a battery capacity of 500 kWh. In some cases, this situation already occurs with a battery capacity of 300 kWh and in a single case with 120 kWh. However, none of the instances can be entirely served by BEBs with a battery capacity of 60 kWh. For a battery capacity of 60 kWh, the proportion of feasible vehicle rotations fluctuates widely and ranges between 7.25% and 79.63%. With regard to charging stations, the number of stop points equipped with charging technology decreases significantly when the battery capacities grow. Instance t1296\_s88 shows the biggest reduction in the proportion of charging stations needed, from 48.86% to 6.81%. The operational costs of feasible vehicle rotations decrease slightly when the battery capacities grow, which can be attributed to fewer charging procedures required.

In summary, a sequential planning approach that first solves a standard VSP without incorporating the special features of BEBs and subsequently solves a location planning problem for charging stations is generally insufficient, as it leads to largely infeasible solutions. This approach is only suitable if the ranges of BEBs rise sharply in the future. The results obtained serve as lower bounds for the numbers of BEBs used and as upper bounds for the numbers of charging stations needed in the evaluation of the simultaneous solution approach.

**Table 2.** Results of solving a location planning problem for charging stations for the electrification of cost-minimal vehicle rotations computed without range restrictions, incorporating partial charging procedures.


#### *5.3. Scheduling of Electric Buses Given Fixed Locations of Charging Stations*

We now discuss the results of solving an E-VSP with given locations of charging stations. The set of charging stations determined by the previous experiment within Section 5.2 serves as the input, since this set is already optimal if corresponding solutions are feasible for BEBs. Following Section 1, the objective of the E-VSP is to minimize the number of buses in use and the operational costs for deadheading while covering each service trip. In order to ensure comparability, partial chargings are performed. Because the E-VSP is NP-hard and exact solution methods are not suitable for solving large real-world instances in general, as in our experiments, we solve the E-VSP heuristically here.

To do so, we use our main solution method from Algorithm 1 in a reduced version. Within both Algorithms 2 and 3, which represent the main components of Algorithm 1, we disable modifications of the charging stations. Within Algorithm 2, we only allow the assignment of service trips to vehicles without charging or with detours to existing charging stations. The other two cases are omitted. Within Algorithm 3, we modify Algorithms 5 and 9 by disabling Algorithm 8 within each procedure. As a result, the set of charging stations cannot change in this experiment. While the following results are not necessarily optimal due to the heuristic solving, we show that they provide reasonable bounds for our analysis within the next section.

An overview of the results of this experiment is given in Table 3, which provides the numbers of vehicles used as well as operational and total costs. The number of charging stations is taken from the previous experiment. In contrast to that experiment, each solution is now feasible, which was to be expected because of the constraints imposed by the E-VSP. Consequently, the total costs are specified for each instance and each battery capacity, containing fixed costs for buses used and charging stations as well as operational costs. First of all, the results show that in most cases where feasible vehicle rotations were computed in the first experiment described in Section 5.2, solving an E-VSP provides similar results regarding the numbers of vehicles used and total costs. In some cases, the number of vehicles required is slightly higher than in the first experiment, which can be traced back to the heuristic solution approach. Furthermore, the operational costs are marginally increased. However, the solutions of this experiment are close to the optimal solutions and thus provide a reasonable benchmark for subsequent analyses. Regarding the numbers of vehicles used, one can observe that the lower the proportion of feasible vehicle rotations determined within the first experiment, the more vehicles are required when solving the E-VSP. This is understandable because the closely timed service trips of the vehicle rotations computed without range limitations do not provide enough time for recharging. This leads to an increasing demand for vehicles. For example, considering instance t1580\_s209, the optimal number of vehicles used is obtained for battery capacities of 500 kWh and 300 kWh. As the proportion of feasible vehicle rotations drops rapidly for 120 kWh and 60 kWh within the first experiment (81.33% and 52%, respectively), the need for additional vehicles rises significantly (6 and 12 additional vehicles, respectively). 
Regarding the operational costs, we note that higher demands for vehicles generally lead to decreasing operational costs. This is because fewer deadhead trips and chargings have to be performed due to the shorter rotations.
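The composition of the total costs reported in the tables can be sketched as follows. The cost parameters are those stated in Section 5.1; the assignment of the fixed bus costs to the battery capacities in ascending order, the function names, and the example numbers are assumptions made for illustration.

```python
# Fixed cost per bus by battery capacity in kWh (ascending assignment assumed).
FIXED_COST_BUS = {60: 350_000, 120: 365_000, 300: 405_000, 500: 450_000}
FIXED_COST_STATION = 200_000   # per stop point equipped with charging technology
COST_PER_KM = 0.5              # operational cost per driven kilometer
COST_PER_HOUR = 50.0           # operational cost per hour of operation


def operational_cost(km_driven, hours_in_operation):
    """Operational cost of a set of rotations in monetary units."""
    return COST_PER_KM * km_driven + COST_PER_HOUR * hours_in_operation


def total_cost(n_buses, battery_kwh, n_stations, operational):
    """Fixed costs for buses and charging stations plus operational costs."""
    return (n_buses * FIXED_COST_BUS[battery_kwh]
            + n_stations * FIXED_COST_STATION
            + operational)
```

As a hypothetical example, ten buses with 120 kWh batteries, three charging stations, and 100 km driven over 8 hours of operation give `total_cost(10, 120, 3, operational_cost(100, 8))`, that is 4,250,450 monetary units. The fixed costs dominate, which is why the number of vehicles and stations drives the comparisons in this section.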

In conclusion, solving an E-VSP with given locations of charging stations always leads to feasible vehicle rotations, which is in contrast to the first experiment. However, this solution approach generally entails increases in costs due to additional deadhead trips, likely leading to increasing demands for vehicles. The results obtained serve as upper bounds for the analysis of the simultaneous problem solving to be conducted in the following section.


**Table 3.** Results of solving an E-VSP given locations of charging stations by reduced Algorithm 1 incorporating partial charging procedures.

#### *5.4. Simultaneous Optimization of Vehicle Scheduling and Charging Infrastructure*

We now discuss the results of the simultaneous optimization of scheduling of BEBs and location planning for charging stations, i.e., solving the E-VSP-LP, using our solution method given by Algorithm 1. We begin by presenting the results obtained by Algorithm 2 for generating initial solutions. Then, we discuss the results of Algorithm 3 for finding new solutions with lower total costs. In this experiment, we consider complete as well as partial charging procedures in order to enable a comparison with the previous experiments. We conclude this section with a runtime analysis.

#### 5.4.1. Summary of Results for Generating Initial Solutions

Table 4 provides the results of using Algorithm 2 for generating initial solutions containing feasible sets of vehicle rotations and charging stations. The results contain the total and operational costs as well as the numbers of buses and charging stations used for each instance and each battery capacity. Additionally, the differences in the total costs are specified when enabling partial charging procedures, and the best solutions found are in bold.


**Table 4.** Results of Algorithm 2 for generating initial vehicle rotations for electric buses and locations for charging stations considering complete and partial charging procedures.

We first compare the results to the first experiment described in Section 5.2. We observe that in two of the 17 cases in which the first experiment led to feasible vehicle rotations, the total costs obtained by the savings algorithm were already lower than those obtained by solving a location planning problem for charging stations. In the other cases, higher total costs are obtained. In general, the higher total costs arise from the higher number of vehicles needed within the savings algorithm. For each instance, the number of vehicles used has increased, which is reasonable due to the heuristic solution procedure of the savings algorithm. Instance t1296\_s88 shows the highest increase of 23.4%. By contrast, the number of charging stations used decreases in every case. In some cases, such as instance t1060\_s209, the number of charging stations needed is reduced enormously (from 30 to 2). However, since the costs for additional vehicles outweigh the cost savings arising from the lower number of charging stations used, the total costs increase. This holds true for both complete and partial charging procedures. Comparing these two charging procedures, the total costs obtained are lower in seven of the ten instances for all battery capacities when partial charging procedures are enabled. On average, total cost savings of about 1.2% are achieved. Only in three cases are the total costs higher when considering partial chargings.

We now compare the initial solutions with the results described in Section 5.3. With regard to the total costs, our observations are twofold. In those cases in which solving a location planning problem led to infeasible vehicle rotations, the application of Algorithm 2 leads to lower total costs than solving an E-VSP. In the other cases, where feasible solutions were obtained, the total costs are higher, arising from the higher number of vehicles needed as indicated previously. Basically, the results computed by Algorithm 2 merely serve as the input for improvement methods and thus do not serve as the final results. For this reason, the statements above are not particularly significant. In the next section, we present the results of improvement using our solution approach based on VNS.

#### 5.4.2. Summary of Results for Variable Neighborhood Search for Improvement

In order to carry out a final comparison between sequential planning and simultaneous problem solving, we now present the results of our solution method given by Algorithm 3 for finding new solutions with lower total costs. We use the initial solutions presented in the previous section as the input data. Table 5 shows the results, containing numbers of vehicles and charging stations used, as well as operational and total costs for each instance and each battery capacity. Additionally, the differences in the total costs are specified when enabling partial charging procedures, and the best solutions found are in bold.

Again, we first compare the results to solving a location planning problem for charging stations at given vehicle rotations. In those cases where feasible solutions were computed in Section 5.2, the total costs obtained by applying Algorithm 3 are of almost the same quality. In some cases, the total costs are slightly higher, which is most likely due to the heuristic solving. However, in certain scenarios, even better solutions are achieved, which can be explained by the additional degrees of freedom. Simultaneous problem solving enables shorter and fewer deadhead trips to charging stations, leading to lower operational and fixed costs for vehicles. This effect would be intensified if exact solution methods were used. As the sequential planning approach mostly leads to infeasible solutions, the simultaneous problem solving is generally preferable.

We now discuss the results with regard to solving an E-VSP with given locations of charging stations as described in Section 5.3. The most significant observation is that the total costs obtained by the simultaneous problem solving are always below the results of solving an E-VSP with fixed charging stations. This holds true for each combination of instance and battery capacity. Although the VNS-based approach leads to the same or slightly higher numbers of vehicles, considerably lower numbers of charging stations are achieved due to the simultaneous solution procedure, leading to significant cost savings. Additionally, the operational costs are reduced for the most part, which can be explained by the shorter deadhead trips to charging stations. As the cost savings exceed the increased costs for additional vehicles, the solutions generated entail significantly lower total costs. It is interesting to observe that the greatest cost savings are achieved for instances that contain peak times of cumulative service trips over the day. This can be explained by the fact that pronounced peaks leave off-peak periods with a reduced service offer, during which the vehicles can recharge their batteries. In conclusion, simultaneous problem solving enables significant cost savings and is always preferable to solving an E-VSP with given locations of charging stations.


**Table 5.** Results of Algorithm 3 containing vehicle schedules for electric buses and charging infrastructure after 100,000 iterations considering complete and partial charging procedures.

Lastly, we investigate the impact of enabling partial charging procedures within vehicle rotations. The results clearly indicate that incorporating partial chargings, which is also more realistic, opens up optimization potential. The numbers of vehicles and charging stations used are lower in almost all cases. This leads to significant cost savings of up to 4.68% compared to the best solution found for one of the two sequential approaches. On average, savings of 1.17% over all instances and battery capacities can be observed. The same total costs are achieved in only one case. Furthermore, the more vehicles are used, the higher the cost savings are. For this reason, the cost savings generally decrease when the battery capacities increase.

It is worth noting that the statements above would also hold true for exact solution methods for the E-VSP-LP. Exact solving would even strengthen the results because of the expected lower total costs. Figure 3 illustrates the key statements made within this section. The figure provides an overview of the total costs obtained by the different solution approaches presented for the instances t1060\_s209, t1135\_s101, and t3067\_s209 and for all battery capacities. These instances were chosen because they cover characteristic problem sizes and distributions of cumulative service trips over the day. Comparable behavior is to be expected for instances with similar characteristics not shown here. It is important to note that the total costs are only specified for feasible solutions.

**Figure 3.** Overview of total costs obtained by the different solution approaches presented for all battery capacities.

#### 5.4.3. Convergence Analysis

The experiments were performed on a common desktop computer (Intel(R) Core(TM) i7-6700 HQ @CPU 2.60GHz 2.59GHz, 16GB RAM). The solution method is implemented in Java. The computational analysis was carried out using Python 3.10.

Figure 4 provides an overview of the convergence behaviour for all problem instances solved by Algorithm 3. In order to facilitate comparison between the different instances, the total costs obtained are normalized. Each plot contains data for the first 20,000 runs. For none of the instances was a total run time of 10 h exceeded. The results indicate reasonable convergence behaviour towards the minimum total costs for all instances. However, particular differences between the instances can be observed. The lower the number of service trips, the faster the total costs obtained by Algorithm 3 decrease. It is noteworthy that the number of stop points has no visible influence on the speed of convergence.

**Figure 4.** Overview of convergence behaviour for all problem instances solved by Algorithm 3.

#### **6. Conclusions**

We have introduced a novel solution method for the simultaneous optimization of location planning of charging stations and vehicle scheduling for BEBs in public transport. To do so, we introduced the E-VSP-LP, which extends the standard E-VSP to incorporate location planning of charging stations. To solve the problem, we developed a metaheuristic solution method based on VNS, as both problems are difficult to solve. To generate the necessary initial solutions, we adapted the traditional savings algorithm. To evaluate the solution approach, we performed a computational study based on real-world public transport data, with up to approximately 3000 service trips and different battery capacities of the buses deployed. We also considered complete and partial charging procedures within vehicle rotations. In our study, we compared the simultaneous solution approach to sequential planning to tackle the underlying problems.

Our experiments showed that simultaneous solving of location planning of charging stations and vehicle scheduling of BEBs is necessary as opposed to sequential planning. First, we demonstrated that sequential planning, first solving a standard VSP and afterwards a location planning problem for charging stations, generally leads to infeasible vehicle rotations for BEBs with regard to current battery technologies. Second, solving an E-VSP with given locations of charging stations entails significant increases in costs. Solving the E-VSP-LP, on the one hand, ensures the feasibility of the vehicle rotations. On the other hand, significantly lower total costs are achieved by comparison to solving an E-VSP, due to the higher degrees of freedom. This is particularly relevant for public transport companies that start operating electric bus fleets. With regard to complete and partial battery chargings, we found large cost savings in most cases when enabling partial chargings within the vehicle rotations.

Our paper can be extended by the following aspects. First, the proposed models do not deal with multiple depots. Incorporating this extension would most likely open up further potentials for cost savings, as already shown for the traditional VSP. Second, our solution method solves the E-VSP-LP heuristically. Exact solution approaches would be interesting for a better verification of the quality of heuristic solution methods. In addition, an interesting path for future research would be to develop additional algorithms for the generation of initial solutions as well as for improvement. Finally, more accurate models regarding the technical aspects of BEBs may be considered. It is conceivable to presume uncertain energy consumptions that may depend on weather conditions or the volume of traffic.

**Author Contributions:** Conceptualization, N.O. and N.K.; methodology, N.O. and N.K.; software, N.O. and N.K.; validation, N.O. and N.K.; formal analysis, N.O. and N.K.; investigation, N.O. and N.K.; resources, N.O. and N.K.; data curation, N.O. and N.K.; writing—original draft preparation, N.O. and N.K.; writing—review and editing, N.O. and N.K.; visualization, N.O. and N.K.; supervision, N.K.; project administration, N.K.; funding acquisition, N.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been financially supported by the German Research Foundation [grant KL 2152/5-1]. The publication of this article was funded by Freie Universität Berlin.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Digitalizing Maritime Containers Shipping Companies: Impacts on Their Processes**

**Pedro-Luis Sanchez-Gonzalez \*, David Díaz-Gutiérrez and Luis R. Núñez-Rivas**

Escuela Técnica Superior de Ingenieros Navales, Universidad Politécnica de Madrid (UPM), Av. Arco de la Victoria, 4, 28040 Madrid, Spain; david.diaz@upm.es (D.D.-G.); luisramon.nunez@upm.es (L.R.N.-R.) **\*** Correspondence: pl.sanchez@alumnos.upm.es

**Abstract:** Key analysts are emphasizing the importance of digitalization, especially of the supply chain. This work aims to improve maritime shipping companies by introducing digitalization in their operations. This objective is achieved by analyzing the impact of digitalization on maritime container shipping companies. This analysis requires as input the Business Process Model (BPMo) and an inventory of digital applications in order to verify how the BPMo changes when deploying the applications, define the prerequisites necessary for this deployment, and identify the key performance indicators (KPIs) to track it. The impact of deploying the applications has been quantified using four performance dimensions: Costs, Time, Quality, and Flexibility. The results show that the impacts differ per application, with changes in the processes, the addition of new ones, and the decommissioning of others. The impact of digitalization is high when trying to deploy all the applications at the same time. Companies can leverage this work, which requires reviewing the documented impacts in their processes and the applications' prerequisites as well as updating their existing balanced scorecard to incorporate the applications' KPIs. A list of 10 applications has been identified as "quick wins"; these applications can be the starting point for digitalizing a company.

**Keywords:** digitalization; BPM; business process model; artificial intelligence; big data; virtual reality; internet of things; cloud computing; digital security; additive engineering

#### **1. Introduction**

Maritime transportation defied the COVID-19 disruption, laying the foundations for a transformation in global supply chains [1]. Maritime shipping companies form the backbone of maritime transportation; therefore, they have been forced to change in order to keep pace with changes in the global supply chain.

In this context, key analysts are emphasizing the importance of digitalization, especially of the supply chain, with high investments in artificial intelligence, real-time transportation visibility, etc. [2]. Given its relevance and its intermodal global operations, the digitalization of the maritime transportation industry is key for the digitalization of the supply chain.

Digitalization in the maritime transportation industry is being studied these days following different streams: Munim et al. focused on big data and artificial intelligence [3]; Plaza-Hernández studied the integration of IoT technologies in the industry [4]; Kapidani et al. looked at the industry digitalization from a sustainability point of view [5]; Kapnissis et al. investigated blockchain adoption in the industry [6]; Tijan et al. reviewed the drivers, success factors, and barriers to digital transformation in the maritime transport sector [7]. These articles are just a few examples that illustrate the relevance of digitalization research in the maritime transportation industry.

The research from the team of the present paper published in 2019 in Sensors (ISSN 1424-8220) showed that when looking at the three different industrial sectors that compose the maritime transportation industry (ship design and shipbuilding; shipping; and ports), its digitalization is moving at different speeds in the different domains and industrial sectors defined in the aforementioned *Sensors* paper:

**Citation:** Sanchez-Gonzalez, P.-L.; Díaz-Gutiérrez, D.; Núñez-Rivas, L.R. Digitalizing Maritime Containers Shipping Companies: Impacts on Their Processes. *Appl. Sci.* **2022**, *12*, 2532. https://doi.org/10.3390/app12052532

Academic Editor: Vicente Julian

Received: 4 February 2022 Accepted: 26 February 2022 Published: 28 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


The size of the maritime transportation industry makes it necessary to focus on one of their industrial sectors; therefore, this work is limited to shipping.

Any change to the operations of maritime shipping companies requires understanding of how they operate. Business Process Management (BPM) is the science that monitors how work is performed in an organization in order to ensure consistent outcomes and to take advantage of opportunities for improvement [8]; this makes BPM an optimal technique for understanding maritime shipping companies' operations.

Few published works make use of BPM for analyzing the maritime transportation sector. Lyridis et al. [9] made use of BPM to optimize operations of a shipping company for one specific route. Elbert et al. [10] resorted to BPM for port optimization, analyzing the chains taking place at ports when ships arrive or depart and the interactions with ground organizations. Cimino et al. [11] also relied on BPM for analyzing the impact of Information and Communication Technology (ICT) on port optimization. Finally, Nikitakos et al. [12] partially used BPM to evaluate ICTs in the Greek-owned shipping sector.

The research being presented in this article aims to improve maritime shipping companies by introducing digitalization in their operations while being aware that the implementation of a successful business process model does not automatically bring about the same benefits for all companies [13] but rather is a starting point for understanding the problems. Given the importance of maritime container shipping companies for the maritime transportation industry, this research focuses on these companies. Since there are different types of maritime container shipping companies, those used in this study are companies that have their own fleet of vessels used both nautically and commercially by the company.

The contributions of this work are as follows:


This work is divided into the following sections: Section 2 describes the methodology used in the study; Section 3 includes the results of the impact of maritime container shipping companies processes' digitalization as well as its analysis and discussion; and finally, Section 4, summarizes the conclusions.

#### **2. Approach and Methodology**

Since the hypothesis that needs to be proved is that the impacts of maritime containers shipping companies' digitalization are different per application and that these applications can be grouped or clustered according to their impact on the company's operations, the first step was performing an impact analysis. The impact analysis performed in this work required two inputs: the maritime containers shipping companies' Business Process Model (BPMo) and the digital applications used for digitalizing the BPMo.

The lack of published process models for the companies that are the object of this research has required the development of a BPMo. The developed BPMo started from the Cross-Industry version of the Process Classification Framework© [14,15] from the American Productivity & Quality Center (APQC), since there is no version for maritime shipping companies. Figure 1 has the "look and feel" from the APQC, which was used as a starting process model.


**Figure 1.** APQC cross-industry process classification framework.

A first version of the business processes for maritime container shipping companies has been generated by tailoring these cross-industry business processes, taking advantage of the following assets:


The content validation of this model was performed by using an inter-judge validation process. The experts that participated in this validation were:


The inter-judge validation method has been used extensively, especially for validating survey questions. This work makes use of it, extending the concept of content validation beyond survey questions. The agreement was quantified using the content validity ratio (*CVR*) developed by Lawshe [20]:

$$CVR = \frac{n_e - N/2}{N/2}$$

where $n_e$ = number of judges indicating the question as "essential" (in this research, $n_e$ = number of judges indicating the modification of the BPMo as "essential"); and *N* = total number of judges (in this work, *N* = 5).
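As an illustration of how the ratio behaves for a panel of five judges, the following minimal sketch evaluates Lawshe's formula for every possible number of "essential" votes. The function name `content_validity_ratio` and its argument names are hypothetical and chosen only for this example; they are not part of the original study.

```python
def content_validity_ratio(n_essential: int, n_judges: int) -> float:
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2).

    n_essential: judges rating the item as "essential" (n_e).
    n_judges:    total number of judges (N).
    """
    if n_judges <= 0:
        raise ValueError("n_judges must be positive")
    if not 0 <= n_essential <= n_judges:
        raise ValueError("n_essential must lie between 0 and n_judges")
    half = n_judges / 2
    return (n_essential - half) / half


if __name__ == "__main__":
    # Panel size used in this work (N = 5); the vote counts are illustrative.
    for n_e in range(6):
        print(f"n_e = {n_e}: CVR = {content_validity_ratio(n_e, 5):+.2f}")
```

The ratio ranges from -1 (no judge considers the item essential) to +1 (all judges do), and is positive only when more than half of the judges agree.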

Lawshe considered the values of *CVR* included in Table 1 as the ones necessary for item validation.

**Table 1.** Minimum values of *CVR*.


Therefore, the method required the agreement amongst judges on the validity and clarity of the model.

The next step was building the list of digital applications; three sources have been used for building such a list:


The list of applications was compared against the aforementioned BPMo for maritime container shipping companies in order to qualify the impact on each process, the requirements for the implementation of the application, and the Key Performance Indicators (KPIs) that will measure the impact of the implementation.

The digitalization of the processes implies their redesign. The tool for quantifying the impact of this redesign is the devil's quadrangle [21]. This framework evaluates the impact using the four performance dimensions for processes: costs, time, quality, and flexibility. In this research, the impact has been quantified using the following criteria:

	- Implementation costs, which account for the costs of deploying the application in the company. It has these values:
		- Low (equal to 2) for applications that require a low investment for their deployment.
		- Medium (equal to 1) for applications that require a medium investment for their deployment.
		- High (equal to 0) for applications that require a high investment for their deployment.

The aforementioned values are comparatively weighted (i.e., the values low, medium, and high are relative to the rest of the applications). The comparative analysis situated the applications in one of the three aforementioned tertiles (i.e., low, medium, and high).

	- Low (equal to 2) for applications with an ROI in less than 2 months.
	- Medium (equal to 1) for applications with an ROI in 2–12 months.
	- High (equal to 0) for applications that need more than 12 months for their ROI.

The final value of the performance indicator is obtained as the arithmetic mean of the two sub-dimensions.

	- Implementation time, which accounts for the time needed for deploying the application in the company. It has these values:
		- Low (equal to 2) for applications that can be deployed in less than 6 months.
		- Medium (equal to 1) for applications that can be deployed in 6–18 months.
		- High (equal to 0) for applications that need more than 18 months for their deployment.
	- Execution time, which evaluates the savings in time for the processes' execution. It has these values:
		- High (equal to 2) for applications with a high decrease in processes' execution time.
		- Medium (equal to 1) for applications with a medium decrease in processes' execution time.
		- Low (equal to 0) for applications with a small decrease in processes' execution time.

The aforementioned values are comparatively weighted (i.e., the values' categorization as low, medium, or high is relative to the rest of the applications). The final value of the performance indicator is obtained as the arithmetic mean of the two sub-dimensions.

	- High (equal to 2) for applications with a high increase in the processes' reliability.
	- Medium (equal to 1) for applications with a medium increase in the processes' reliability.
	- Low (equal to 0) for applications with a small increase in the processes' reliability.

The aforementioned values are comparatively weighted (i.e., the values' categorization as low, medium, or high is relative to the rest of the applications).

	- High (equal to 2) for applications with a high increase in the processes' flexibility.
	- Medium (equal to 1) for applications with a medium increase in the processes' flexibility.
	- Low (equal to 0) for applications with a small increase in the processes' flexibility.

The aforementioned values are comparatively weighted (i.e., the values' categorization as low, medium, or high is relative to the rest of the applications).

The "ideal" application is the one that maximizes the four performance indicators, and therefore, the impact on the company is considered positive. That application will achieve a total score of 8 (i.e., a score of 2 in each of the four performance dimensions for processes). A data sheet was developed for each of these applications, which has the information from Figure 2.
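The scoring scheme described above can be sketched as follows. This is a minimal illustration, assuming that the consolidated score is the plain sum of the four dimension scores (consistent with the stated maximum of 8) and that Costs and Time are each the arithmetic mean of their two sub-dimensions. The class and function names, as well as the sample values, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ApplicationScores:
    """Sub-dimension scores on the 0-2 scale described in the text."""
    implementation_cost: int   # 2 = low investment, 0 = high investment
    roi_time: int              # 2 = fast return on investment, 0 = slow
    implementation_time: int   # 2 = fast deployment, 0 = slow
    execution_time: int        # 2 = high execution-time savings, 0 = small
    quality: int               # 2 = high reliability increase, 0 = small
    flexibility: int           # 2 = high flexibility increase, 0 = small

    def __post_init__(self):
        for name, value in vars(self).items():
            if value not in (0, 1, 2):
                raise ValueError(f"{name} must be 0, 1 or 2, got {value!r}")


def dimension_scores(s: ApplicationScores) -> dict:
    """Costs and Time are means of two sub-dimensions; the others are direct."""
    return {
        "costs": mean([s.implementation_cost, s.roi_time]),
        "time": mean([s.implementation_time, s.execution_time]),
        "quality": s.quality,
        "flexibility": s.flexibility,
    }


def consolidated_score(s: ApplicationScores) -> float:
    """Assumed aggregation: sum of the four dimensions (maximum 8)."""
    return float(sum(dimension_scores(s).values()))


if __name__ == "__main__":
    ideal = ApplicationScores(2, 2, 2, 2, 2, 2)
    sample = ApplicationScores(2, 1, 2, 0, 1, 1)  # illustrative values only
    print(consolidated_score(ideal), consolidated_score(sample))
```

Because two dimensions are averages of integer sub-scores, dimension values can fall on half points (for example 1.5), which is one reason several applications may share the same score.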

**Figure 2.** Application data sheet sample.

The impacts on the processes have been analyzed assuming that the application considered is the only one that has been deployed, i.e., that it has been deployed stand alone. The combined deployment of several applications will require a review on the impacts. This same consideration applies to the KPIs: in case of deployment of several applications, the KPIs should be reviewed and confirmed.

#### **3. Results and Discussion**

The validated BPMo for maritime container shipping companies can be found as additional material of this paper. The application of the methodology from Section 2 resulted in a total of 46 application data sheets that contain the results of the research. These data sheets document the impacts on the BPMo and can be found in Appendix A of the present work.

Regarding the impacts on the processes, a total of 147 impacts have been found. The processes highly impacted by different applications are shown in Table 2.


**Table 2.** Processes with more than two impacts.

The processes with higher impacts are within the operations process categories domain. The reason is that these processes are the ones that produce the wealth of the company, so these are the ones subject to higher investments.

There are six new processes that need to be added to the business process model for different applications:


On the opposite side, there are three processes that will need to be decommissioned due to the introduction of different applications:

• 6.2.4 Technical support at shore; the introduction of applications "1.01 UV-controlling system" or "1.02 Autonomous vessels" makes it unnecessary.


The total number of KPIs used is 51. These KPIs measure the performance of the introduction of the 46 digital applications; this means that on average, there is more than one KPI needed for measuring the performance of the introduction of one application, which is reasonable. Actually, there are only two applications that require only one KPI for tracking their performance: "6.01 Cloud/edge platform" and "7.01 Enhanced cybersecurity" can be measured using KPIs "Percentage of reduction of operational cost" and "Percentage of improvement on cyberattacks prevented", respectively. These two applications tracked only with one KPI are the only ones that are service platforms for the entire company.

It has just been said that the majority of the applications are measured using more than one KPI; actually, the 46 applications required a total of 105 KPIs to measure their performance. Since the number of unique KPIs is 51, not 105, this means that many of them are used in more than one application. A total of 11 KPIs are used more than twice; they are used 53 times out of the mentioned 105 (Table 3), so 11 KPIs can measure more than 50% of what needs to be measured to quantify applications' performance, which is a good number since with this relatively small number of KPIs, a company can track most of their improvements coming from all the digital applications.
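The arithmetic behind these statements can be checked with a short sketch. The figures are the ones reported in the text; the variable and function names are hypothetical.

```python
# Figures reported in the text.
APPLICATIONS = 46          # digital applications analyzed
UNIQUE_KPIS = 51           # distinct KPIs defined
TOTAL_KPI_USES = 105       # KPI assignments across all applications
FREQUENT_KPIS = 11         # KPIs used more than twice
FREQUENT_KPI_USES = 53     # assignments covered by those 11 KPIs


def share(part: float, whole: float) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


average_kpis_per_application = TOTAL_KPI_USES / APPLICATIONS
coverage_of_frequent_kpis = share(FREQUENT_KPI_USES, TOTAL_KPI_USES)
fraction_of_kpis_needed = share(FREQUENT_KPIS, UNIQUE_KPIS)

if __name__ == "__main__":
    print(f"Average KPIs per application: {average_kpis_per_application:.2f}")
    print(f"Assignments covered by frequent KPIs: {coverage_of_frequent_kpis:.1%}")
    print(f"Share of the KPI catalogue involved: {fraction_of_kpis_needed:.1%}")
```

In other words, roughly a fifth of the KPI catalogue accounts for slightly more than half of all KPI assignments.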


**Table 3.** KPIs used more than twice for measuring application performance.

Section 2 explained how the impact of the introduction of any of these applications in the BPMo has been quantified using the four performance dimensions for processes: Costs, Time, Quality, and Flexibility. Applications have been grouped into three tertiles in order to analyze the results of this quantified impact. The three tertiles are not always equal in size, since being strict on the balance between the three tertiles would have forced the separation in different tertiles of applications that have the same value on a performance dimension. This happens since there are performance dimensions such as costs or time that are made of two sub-dimensions (see Section 2); and the consolidated impact score is calculated from the four performance dimensions.
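One possible way to form tertiles without separating applications that share the same score is sketched below. The exact rule used in the study is not stated, so this is only an assumed procedure: applications are ranked by score, and a group of tied applications is always placed as a whole in the tertile that is open when the group starts. Function and application names are hypothetical.

```python
from itertools import groupby


def tie_preserving_tertiles(scores: dict) -> list:
    """Split items into three ranked groups, never separating equal scores.

    Returns [top, middle, bottom]; group sizes may differ when ties straddle
    a cut point.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    cut_points = [n / 3, 2 * n / 3]
    tertiles = [[], [], []]
    current = 0   # index of the tertile currently being filled
    placed = 0    # number of items already assigned
    for _, group in groupby(ranked, key=lambda kv: kv[1]):
        names = [name for name, _ in group]
        # Advance once the current tertile has reached its nominal size.
        while current < 2 and placed >= cut_points[current]:
            current += 1
        tertiles[current].extend(names)
        placed += len(names)
    return tertiles


if __name__ == "__main__":
    # Illustrative scores only; not taken from the study.
    demo = {"a": 8, "b": 7, "c": 6, "d": 6, "e": 6, "f": 4, "g": 3, "h": 2, "i": 1}
    for label, members in zip(("top", "middle", "bottom"), tie_preserving_tertiles(demo)):
        print(label, members)
```

With the illustrative scores, the three applications tied at 6 all land in the top group, which therefore holds five items while the middle group holds one.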

Table 4 contains the applications that are in the top tertile applying that evaluation.


**Table 4.** List of applications with higher consolidated impact score.

Not surprisingly, application "5.01 Container tracking" is leading the score given the following:


At the other end of the list, Table 5 contains the applications that are in the bottom tertile.

**Table 5.** List of applications with lower consolidated impact score.


The applications from Table 5 do not necessarily fall into applications that should not be implemented or that should be discarded. What these 15 applications from Table 5 have in common is that they are the lowest when compared with the 46 applications; this should not prevent companies from the implementation of any of them, it is just that they need to know these have more costs or require more time for their deployment and for benefits realization. Actually, there are two of them that are service platforms for the rest: "6.01 Cloud/Edge platform" and "7.01 Enhanced cybersecurity".

The analysis can be taken one level deeper by looking at the results obtained in each performance dimension. Among the top applications on the Time performance dimension (Table 6), one application, "3.04 ISPS security levels", is not only absent from the top list for consolidated impact but appears in the bottom list (Table 5). The reason is that this application does not substantially increase the flexibility or the quality of the affected processes compared with the rest of the 46 applications.


**Table 6.** Top applications on Time performance dimension.

One application is in the opposite situation: it is in the list of top performers for the consolidated score (Table 4) but in the bottom list for the Time performance dimension (Table 7). This is the case of "2.14 Optimizing ship's operations via AI analysis of operational information". This application scores 6/8 in the consolidated impact score because it is outstanding compared to others in the Flexibility and Quality provided to the affected processes, and its availability in the current market makes it almost optimal when looking at Costs.

**Table 7.** Bottom applications on Time performance dimension.


Moving to Costs, two applications are at the top for this performance dimension (Table 8) and at the bottom when looking at the consolidated score (Table 5). These are "3.04 ISPS security levels" and "5.02 Optimization of equipment usage". The first one was found in the same situation when looking at Time, and the reason is the same: it does not increase substantially the flexibility or the quality of the affected processes compared with the rest of the 46 applications. Regarding "5.02 Optimization of equipment usage", it does not increase flexibility or quality, and it is also low when looking at Time, since it is not available on the market yet.

As it happened when analyzing Time, one application is found in the bottom list from Costs (Table 9) and at the top when looking at the consolidated score (Table 4); this is "2.11 Optimizing maintenance process using digital twin and AI" given it is outstanding when compared to others in the Flexibility and Quality provided to the affected processes, but the cost of a digital twin makes it go down in the list when looking only at this performance dimension.

The next performance dimension to be looked into is Quality. In it, there are two applications that are in the top for this performance dimension (Table 10), whereas they are part of the list of bottom applications in the consolidated score (Table 5). These are "1.01 UV controlling system" and "1.02 Autonomous vessels", which have a very high impact on quality improvement for the affected processes but perform very low in the rest of the variables (high costs, high time of ROI and implementation, and without a substantial impact in flexibility compared to the others).

**Table 8.** Top applications on Cost performance dimension.


**Table 9.** Bottom applications on Cost performance dimension.


**Table 10.** Top applications on Quality performance dimension.


Comparing Table 11 (bottom applications for Quality performance dimension) and Table 4 (top consolidated score), applications "6.02 Use of SaaS via cloud", "6.03 Use of eLearning via cloud" and "8.01 Spare parts using 3DP" are in both lists due to the same reason: they do not increase substantially the quality of the affected processes when compared to others, whereas they perform well on the rest of the performance dimensions.


**Table 11.** Bottom applications on Quality performance dimension.

Moving to the last performance dimension, Flexibility, comparing Table 12 (top performers in Flexibility) and Table 5 (bottom in consolidated score), there is no application in both lists.

**Table 12.** Top applications on Flexibility performance dimension.


However, doing the same exercise with bottom applications in Flexibility (Table 13) and top performers in consolidated score (Table 4), there are three applications in both lists: "1.04 Use of robots in complex/hazardous tasks", "3.02 Big data analysis for energy efficiency", and "3.03 Analysis of data on consumption and emissions for bunkering selection", all of them outperforming in the rest of the performance dimensions.

**Table 13.** Bottom applications on Flexibility performance dimension.


To finalize the analysis of results, we identified the 10 applications that can be named as "quick wins". These are applications that, given their optimal results on the Time performance dimension and good results on the Costs performance dimension, could be considered as the starting point for digitalizing a company. A company starting its digitalization with these could obtain a sense of what digitalization is and learn lessons from the implementation project, which will be valuable for taking the next step.

The list has been obtained by sorting the devil's quadrangle scores first by performance on Time, then on Costs, and finally on the consolidated global score. The list is in Table 14.
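The sorting rule just described can be sketched as a multi-key sort. This is a minimal illustration, assuming that higher scores are better in every dimension (as in the 0 to 2 scale of Section 2) and that the quick wins are simply the first ten entries of the sorted list. The `AppScore` type, the `quick_wins` function, and the sample entries are hypothetical.

```python
from typing import NamedTuple


class AppScore(NamedTuple):
    name: str
    time: float          # Time dimension score (0-2)
    costs: float         # Costs dimension score (0-2)
    consolidated: float  # consolidated score (0-8)


def quick_wins(apps: list, limit: int = 10) -> list:
    """Rank by Time, then Costs, then consolidated score (all descending)."""
    ranked = sorted(
        apps,
        key=lambda a: (a.time, a.costs, a.consolidated),
        reverse=True,
    )
    return ranked[:limit]


if __name__ == "__main__":
    # Illustrative entries only; not the scores reported in the study.
    sample = [
        AppScore("app A", 2.0, 1.5, 6.0),
        AppScore("app B", 2.0, 2.0, 5.0),
        AppScore("app C", 1.5, 2.0, 7.0),
        AppScore("app D", 2.0, 2.0, 6.5),
    ]
    for app in quick_wins(sample, limit=3):
        print(app.name, app.time, app.costs, app.consolidated)
```

Note that under this ordering an application with a high consolidated score but a weaker Time score (such as "app C" in the sample) ranks below applications that are faster to deploy, which matches the intent of a quick-win list.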

**Table 14.** Quick-win applications.


The majority of these are in Table 4 (List of applications with higher consolidated impact score); they are applications that are top performers in the consolidated impact score. The exceptions are "7.04 Electronic logbook" and "4.01 VR for training". These two do not score as high as others when looking at the consolidated score but can be good candidates for testing the benefits of digitalization in one company, given their ease of implementation.

Summing up the analysis of the results, the main outcomes are as follows:

	- Applications "5.01 Container tracking", "2.06 Analysis of engine parameters to anticipate issues", and "7.02 Cargo documents management" are at the top of the list of the consolidated impact score. These applications are market available which, together with the nature of the application, makes the Time and Costs performance dimensions better when compared to others. They are also in the top of the list in Quality.
	- Applications "2.16 AI applied to data management and clean" and "6.01 Cloud/Edge platform" are at the bottom of the list, though especially the last one is necessary for others to work (i.e., it is a prerequisite for implementing a number of other applications).

#### **4. Conclusions**

This work analyzes the impact of digitalization in a part of the maritime transport industry, the maritime containers shipping companies. This research has been conducted in order to help the digitalization of this industry, in particular in the aforementioned companies: digitalization in today's world is required for remaining competitive.

The analysis of the introduction of digital applications in the Business Process Model of maritime containers shipping companies shows that digitalization is feasible for these companies and can be completed at different paces. Each company should make a specific and detailed plan for digitalization, according to their needs and environment. They can leverage the work presented here on the applications and the KPIs that should measure the implementation of any of these applications.

Companies can also benefit from the identification of the applications named in this work as "quick wins"; these applications can be a sandbox that can be used to test the benefits of digitalization and learn how to best execute the deployment customized to the needs of the company. Application "5.01 Container tracking" is in the top of the list of these "quick wins" given its optimal behavior when looking at the four performance dimensions for processes (Time, Costs, Quality, and Flexibility).

The impact of digitalization is high when trying to deploy all the applications at the same time in a big bang approach. Such an approach is not advisable, both because of the high investment it requires and because of the risks that such a huge effort poses for a company. Companies should consider the impacts in their processes and the applications' prerequisites documented for each application in Section 3 of this work. They should also review their existing balanced scorecard, incorporating the applications' KPIs documented in the aforementioned section. A total of 51 KPIs are defined, but with 11 of them, a company can track the majority of the impacts of an application deployment.

A relevant outcome of the analysis of the impacts on processes is that the Operational process categories domain is the one with the highest impacts. This is a consequence of the applications targeting the processes that generate the company's income. Looking at the rest of the process categories domains, one process stands out from the rest: "Analyze Competitors Routes". This process from the Strategy, Infrastructure, and Products process categories domain is impacted by four different applications, given the importance of the market and of research for a company's strategy.

Digitalizing a company imposes changes in their processes and the definition of new processes as well as the decommissioning of others. In other words, digitalization will change the way a company operates. This is something that must be taken into account when defining the deployment plan of the applications, educating their personnel in the new way of doing things and the benefits that this will bring.

Digitalization has many impacts on the company's operations, but a well-defined plan, in which the impacts and prerequisites are detailed and a number of KPIs are included to track the deployment's performance, is the key to success. This work covers these aspects in order to allow a successful digitalization.

**Author Contributions:** P.-L.S.-G. designed the methodology and applied it to obtain the results; D.D.-G. and L.R.N.-R. analyzed the results and provided feedback on the reporting; P.-L.S.-G. wrote the work. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is under consideration for funding by Fundación Marqués de Suanzes and by Soermar Chair from Universidad Politécnica de Madrid.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A Application Data Sheets**


**Figure A1.** UV controlling system data sheet.



**Figure A3.** Digital twin for AV controlling and maintenance data sheet.


**Figure A4.** Use of robots in complex/hazardous tasks data sheet.



**Figure A6.** Assessment of ship risks using fuzzy logic data sheet.


**Figure A7.** Pricing market prediction data sheet.


**Figure A8.** Route optimization via AI analysis of client information data sheet.


**Figure A9.** *Cont*.


#### **Figure A9.** Client offering optimization via AI analysis of client information data sheet.

**Figure A10.** Analysis of engine parameters to anticipate issues data sheet.


**Figure A11.** *Cont*.


#### **Figure A11.** Route optimization via AI analysis of operational information data sheet.


#### **Figure A12.** Process optimization and reengineering using AI data sheet.


**Figure A13.** Freight rate optimization data sheet.


#### **Figure A14.** Fleet dimensioning optimization data sheet.

**Figure A15.** Optimizing maintenance process using digital twin and AI data sheet.


**Figure A16.** Conversational virtual assistance for helping seafarers in day-to-day activities data sheet.


**Figure A17.** *Cont*.



**Figure A18.** Optimizing ship's operations via AI analysis of operational information data sheet.


**Figure A19.** *Cont*.


**Figure A19.** AI applied to cybersecurity data sheet.



#### **Figure A21.** AI applied to competitors tracking and monitoring data sheet.


**Figure A22.** AI applied to business partners tracking and monitoring data sheet.


#### **Figure A23.** AI applied to providers tracking and monitoring data sheet.


#### **Figure A24.** AI applied to 3 parties route prediction data sheet.


**Figure A25.** *Cont*.


#### **Figure A25.** Using AI to enhance navigation safety data sheet.


#### **Figure A26.** Using AI to reduce emissions data sheet.


**Figure A27.** *Cont*.


#### **Figure A27.** Big data algorithm for collision avoidance data sheet.

#### **Figure A28.** Big data analysis for energy-efficiency data sheet.


**Figure A29.** *Cont*.


#### **Figure A29.** Analysis of data on consumption and emissions for bunkering selection data sheet.


#### **Figure A30.** ISPS security levels data sheet.


**Figure A31.** *Cont*.



#### **Figure A32.** Big data for ship speed controlling data sheet.


**Figure A33.** *Cont*.


**Figure A33.** VR for training data sheet.


**Figure A34.** VR as navigation aid data sheet.


**Figure A35.** *Cont*.


#### **Figure A35.** VR for maintenance data sheet.


#### **Figure A36.** Container tracking data sheet.


**Figure A37.** *Cont*.


#### **Figure A37.** Optimization of equipment data sheet.


#### **Figure A38.** Digital twin for training purposes data sheet.


**Figure A39.** *Cont*.


#### **Figure A39.** Cloud/Edge platform data sheet.


**Figure A40.** Use of SaaS via cloud data sheet.


**Figure A41.** *Cont*.



#### **Figure A42.** Enhanced cybersecurity data sheet.


**Figure A43.** *Cont*.



**Figure A44.** Blockchain-based Incoterms data sheet.


**Figure A45.** *Cont*.



**Figure A46.** Spare parts using 3DP data sheet.



## *Article* **A Helping Human Hand: Relevant Scenarios for the Remote Operation of Highly Automated Vehicles in Public Transport**

**Carmen Kettwich \*, Andreas Schrank, Hüseyin Avsar and Michael Oehl**

German Aerospace Center (DLR), Institute of Transportation Systems, Lilienthalplatz 7, 38108 Braunschweig, Germany; andreas.schrank@dlr.de (A.S.); hueseyin.avsar@dlr.de (H.A.); michael.oehl@dlr.de (M.O.)

**\*** Correspondence: carmen.kettwich@dlr.de

**Abstract:** Remote operation bears the potential to roll out highly automated vehicles (AVs, SAE Level 4) more safely and quickly. Moreover, legal regulations on highly automated driving, e.g., the current law on highly automated driving (SAE Level 4) in Germany, permit a remote supervisor to monitor and intervene in driving operations remotely in lieu of a safety operator on board AVs. In order to derive requirements for safe and effective remote driving and remote assistance of AVs and to create suitable human-centered design solutions for human-machine interfaces (HMIs) that serve this purpose, a set of 74 core scenarios that are likely to occur in public transport AVs under remote operation was compiled. The scenarios were collected in several projects on the remote operation of AVs across a variety of contexts including interviews with and observations of control center staff, video analyses from naturalistic road events, and interviews with safety operators of AVs. A hierarchical system that is based on interactions of central actors was used to structure the scenarios. The set explicates relevant cases in remote operation, which may help improve workplaces for remote operation both by combatting human factors issues such as distraction and fatigue, and by boosting usability, user experience, trust, and acceptance. As the catalogue of scenarios is not exhaustive, scenarios may be added as knowledge of the remote operation of AVs progresses. Further research is needed to validate and adapt the scenarios to specific conceptualizations of remote operations.

**Keywords:** human-machine interaction; scenarios; use cases; remote operation; highly automated vehicles; user-centered design; remote assistance; remote driving

#### **1. Introduction**

As mobility demands grow while calls for more sustainable and environment-friendly travel options are becoming more vocal, the transportation sector is facing dramatic changes. Not only has there been a wave of technological innovations in driving automation, electromobility, and mobile network bandwidth [1–5]. There is also a previously unseen boom in innovative means of public transport such as ride-sharing and other flexible on-demand mobility solutions that may be serious alternatives to individual mass mobility [6,7].

However, even though automation technologies have demonstrated sharp improvements, there are still plenty of cases where a vehicle's automation might be overwhelmed. Complex, multilayered sets of quickly evolving traffic situations, particularly in urban mixed traffic environments, present extreme challenges to highly automated vehicles (AVs). Some of these cannot be resolved by the automation alone as they exceed the vehicle's Operational Design Domain, or ODD, i.e., the defined conditions in which the AV can operate autonomously in a safe way [8]. Also, a human on-board operator is not mandatory to serve as a fallback solution for highly automated vehicles (Level of Automation Four according to SAE's terminology [8]). Instead, remote operation by a human operator may be a viable solution to enable automated driving without specifying every possible ODD, which is the requirement of fully automated driving according to SAE Level 5. This endeavor is likely not to be fulfilled for a very long time, if ever. From the perspective of the substitution model, remote operation can be conceptualized as a substitute for the primary controller, in this case, the driving automation [9]. The remote operator could observe automated driving operations and intervene when the AV's driving automation capabilities are exceeded. This approach is becoming increasingly feasible as computers' processing speed and capacity have increased sharply and high-bandwidth, low-latency communication technologies such as 5G, which offer the option to prioritize certain kinds of data, have been widely rolled out [10].

**Citation:** Kettwich, C.; Schrank, A.; Avsar, H.; Oehl, M. A Helping Human Hand: Relevant Scenarios for the Remote Operation of Highly Automated Vehicles in Public Transport. *Appl. Sci.* **2022**, *12*, 4350. https://doi.org/10.3390/app12094350

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti, Dimitrios S. Paraforos and Stefania Lanza

Received: 18 March 2022 Accepted: 11 April 2022 Published: 25 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In order to identify critical situations where remote operation could be used as an effective and efficient approach to support or resume highly automated driving, an extensive collection of scenarios with relevance to remote operation has been compiled using a multimethod approach (see Section 2). This collection will help identify and address challenges in the practical use of remote operation solutions and support the human-centered design process of interfaces for remotely operating vehicles.

#### *1.1. Regulation, Standardization, and Conceptualization of Remote Operation of AVs*

In addition to technological leaps forward, legal environments have become more favorable toward using remote operations on public roads as well. For instance, the German Road Traffic Act ("Straßenverkehrsgesetz") was amended in 2021 so that it now explicitly permits AVs of Level 4 on German roads, as long as they are monitored and controlled, if necessary, by a human operator termed "Technical Supervisor" ("Technische Aufsicht") [11]. This supervisor can be either on board the vehicle or at another location. Thus, remote vehicle operations are now legally feasible on German roads. Also, in the UK, Sweden, Japan and a few US states, laws and regulations that require remote supervision of highly automated cars without an on-board driver have been passed [12]. Moreover, driverless operation of vehicles on public roads is now possible in major European countries like France and the UK [13] as well as at least 41 states of the US [14].

The standardization of remote operations has also seen tremendous steps forward. The latest update of the SAE's Taxonomy for Driving Automation Systems [8] includes two conceptualizations of remote vehicle operations: Remote Driving and Remote Assistance. In *Remote Driving* (RD), a human operator executes "real-time performance of [...] the DDT (i.e., dynamic driving task such as braking, steering, or accelerating)" ([8] p. 19). Thus, remote driving resembles the conventional way of driving a vehicle: by initiating direct low-level driving maneuvers, including lateral and longitudinal motion control, right when the situation requires them. The Remote Driver, who is the actor that executes Remote Driving, may overrule the vehicle automation's driving tasks.

In contrast, *Remote Assistance* (RA) is defined as an "event-driven provision, by a remotely located human, of information or advice to an ADS (i.e., automated driving system) equipped vehicle in driverless operation in order to facilitate trip continuation when the ADS encounters a situation it cannot manage" ([8] p. 18). Thus, the Remote Assistant supports the vehicle by providing high-level guidance on how to deal with certain situations that are not part of the automation's ODD. This advice is provided well before the challenging situation and must not be time-critical. In addition, the automation must be capable of processing the high-level information provided and translating it into direct driving maneuvers. Examples of guidance range from simple "giving clearance" cases in which the vehicle requests an assessment of the situation from the Remote Assistant on how to proceed in a certain situation, e.g., when it is uncertain whether an identified object is an actual obstacle, to more demanding "setting trajectories or waypoints" cases that require the Remote Assistant to determine a pathway or waypoints of a pathway that the AV follows to circumvent an obstacle.

#### *1.2. Real-World Tests of Remote Operation*

The remote operation of AVs is currently being tested in a variety of real-world laboratories. These labs are usually collaborations between research-focused and industrial partners that examine the feasibility of novel technologies in a real-world setting, aiming at maximizing ecological validity. Thanks to their incorporation into naturalistic settings within the intended context of use, real-world labs offer invaluable insights into the in situ application of devices or systems that have previously only been investigated in more highly controlled, oftentimes experimental, yet less realistic environments. Thus, situations and phenomena are more likely to occur that may not have been observed in a more controlled setting. Since they demonstrate the interplay of the technology with users and other actors in a less standardized environment, they yield scenarios with greater external validity, i.e., transferring them to other (real-world) contexts may be facilitated.

The German Aerospace Center (DLR) is involved in real-world labs that use both RD and RA for remote operation. Figure 1 displays examples of vehicles that are remotely operated within these projects. Regarding RD, the modular electric AV concept "U-Shift", which caters to different urban mobility demands, including transportation of people and goods, encompasses Remote Drivers in its system architecture [15]. Pertaining to RA, DLR is engaged in urban mobility projects that provide last-mile shuttle services from major hubs of transportation, e.g., train stations, to the final destination. In the "Hamburg Electric Autonomous Transportation", or "HEAT", project, a self-driving minibus ran along a fixed route through the Harbor District of the city as part of the network of the public transport provider "Hochbahn" [16]. The Real-World Lab Hamburg ("Reallabor Hamburg") provided on-demand service from a suburban railway hub to nearby neighborhoods that could be booked via a cellphone application [17]. The Berlin-based "KIS'M" project aims at demonstrating an AI-based system for connected mobility and at examining the interaction between human operators in the control center with passengers of remote-controlled AVs [18], utilizing an RA approach.

**Figure 1.** Remotely operated vehicles in investigated projects. (**a**) Modular vehicle concept "U-Shift". Reprinted with permission from Ref. [19]. 2022, DLR; (**b**) Shuttle "EasyMile" used in real-world transportation laboratory "Reallabor Hamburg".

#### *1.3. Rationale and Objectives*

Since the remote operation of AVs has not been widely rolled out so far, there is limited knowledge about concrete use cases and scenarios that are most relevant to it. However, being aware of events that may occur during remote operation is pivotal for several reasons: (1) It enables an ecologically valid determination of ODD thresholds for a vehicle's automation, (2) helps bridge those thresholds, and (3) feeds into the derivation of requirements for the task and workplace design regarding the remote operation (both RD and RA) by a Technical Supervisor.

Therefore, a method of approximation via adjacent roles and workplaces needs to be taken. This includes the study of today's existing control centers for public transport, as they execute tasks of monitoring and resolving disturbances that are comparable to those of remote operators. Further, gaining insights into the workplaces of operators on board highly automated shuttles that already operate in real-world laboratories may be helpful as well. In this vein, control center staff have been interviewed about their expectations of remote operators' tasks [20] and have been confronted with a first prototype for remote operation [21]; a study on on-board operators' tasks and human-machine interfaces (HMIs) is currently being carried out [22].

In spite of the lack of research opportunities regarding actual remote operators, there is an urgent need for an initial compilation of use cases and scenarios in remotely operating AVs. This is an important element of the user-centered design process as user requirements are derived from them. As presented in Figure 2, the process followed by the authors of this paper begins with observations and expert interviews conducted in a context that is similar to the future context of use. From their results, both potential tasks and potential scenarios are derived. This paper focuses on deriving potential scenarios. Next, both tasks that the users will have to execute and scenarios they will be exposed to are used to compile user requirements. These, in turn, need to be addressed while designing the prototype of the remote-control workplace. Whether they were met or not is subsequently evaluated in user studies.

**Figure 2.** Empirical application of the user-centered design process by the authors. Observations and expert interviews serve as a basis both for deriving potential tasks and, which is the focus of this paper, for deriving potential scenarios. Both help derive user requirements that in turn inform the design of a workplace prototype, which will be validated by conducting user tests and evaluations. These results feed back to investigations of the context of use that has been initially investigated.

This procedure adapts the user-centered design process depicted in Figure 3 as specified by ISO (Section 7) [23]. First, the context of use needs to be understood and specified (box "Understanding and specifying the context of use"). There are different approaches to do so: One way is describing the context of use, of which tasks of the users are an essential characteristic. This approach is being pursued by the authors across several real-world laboratories for future mobility in Germany where they investigate tasks and human-machine interfaces of highly automated shuttle buses that are supervised by a human operator on board the vehicle [22]. Since on-board operators execute tasks similar to remote operators, this approach serves as an approximation to the tasks of remote operators that barely exist in urban road traffic to this date.

Another way is the specification of "as-is scenarios". This is the core objective of this paper, which contains an extensive list of so-called "Is Scenarios", as defined below. Thus, compiling scenarios and defining tasks are *parallel* steps that are both based on empirical data from sources such as interviews and observations. In a subsequent step, user requirements can be derived from both scenarios and tasks ("Specifying user requirements"). Eventually, these will be used to generate design solutions ("Producing design solutions"). This approach of basing the design on requirements that were factually articulated by potential users helps design interactions in a user-friendly way that may increase safety, efficacy, and ease of use, and prevent task overload, fatigue, and a lack of situational awareness that may increase the risk of accidents. The interactions, in turn, will facilitate the specification of HMIs and, eventually, help design workplaces for remote operation both in research and industrial application.

**Figure 3.** The user-centered design process is adapted from ISO [23]. Scenarios are a central element of understanding and specifying the context of use and determine user requirements that in turn help produce design solutions. Adapted from Ref. [23], 2019, ISO.

In addition to considerations of HMI design, compiling use cases and scenarios is also essential for creating a framework that can be used as a basis for interdisciplinary dialogue between engineers, computer scientists, mobility researchers, human factors specialists, and decision-makers on how to conceptualize and further develop remote operation. Furthermore, it may also be used in driving automation and transportation research (see Section 4).

This paper proposes an initial catalogue of use cases and scenarios in which remote operation, operationalized either as Remote Driving or Remote Assistance at SAE Level 4, supports vehicle automation. It is highlighted that this catalogue is a living conceptual document. As it contains statements from a limited number of sources, it does not claim to be exhaustive.

#### **2. Materials and Methods**

The following section will outline the process of user-centered design in which scenarios for remote control are a central element. Second, the process of collecting scenarios will be described before a system for systematically structuring the scenarios will be proposed.

#### *2.1. Process of Collecting Scenarios*

The scenarios were collected both from control centers and from the operation of highly automated vehicles (see Figure 4).

**Figure 4.** Contexts from which scenarios were extracted and methods used in the process.

#### 2.1.1. Control Centers

To date, many real-world labs have tested automated shuttles with on-board operators in order to expand public transport services. AVs in real-world labs are usually operated with the assistance of a steward on board the shuttle who, among other things, monitors the traffic situation and the technical vehicle status and, if necessary, intervenes and initiates appropriate measures depending on the situation.

Remotely operated shuttles *without* onboard operators, however, will differ fundamentally regarding their interactions. Instead of interacting with the AV's on-board operators, the control center will interact directly with the AV while the tasks of the onboard operator will be shifted to the control center. However, only crude concepts and isolated prototypes for control centers to monitor, supervise, and, if necessary, remote-control AVs without on-board operators exist at the moment. Thus, in a first step, the roles and activities of today's control centers in public transport were analyzed by means of observation and interviewing. Participatory observation and expert interviews with control center staff in Hamburg and Braunschweig, Germany, helped examine the working equipment, tasks, roles, and collaborations in a control center for teleoperation in public transport in general. More importantly, these methods yielded scenarios with potential relevance for remote operation.

First, *observation* was used as a tool to collect essential data. It includes the description of behavioral and temporal patterns, the consequences for control center staff and their environment, as well as the spatial relationship of the control center employee with other people. Observations were characterized by the following attributes:


Second, based on these observations, several expert workshops yielded a set of categories which were subsequently used for two card-sorting studies. In a first study, interdisciplinary traffic researchers clustered the categories, assigned concrete task sets to them, and finalized them. In a second study, expert interviews were conducted using *the card-sorting approach* [20] to identify tasks and roles in future control centers.

#### 2.1.2. Highly Automated Vehicles

Structured in-depth *interviews* were carried out with three onboard operators of automated shuttles (SAE Level Four [13]) integrated into Hamburg's public transport system as part of the HEAT project [11]. These interviews focused on control center tasks, disruptions, work experience, current and future workplace design [25]. From the disruptions mentioned, scenarios were derived.

*Videoclips* from the EU CityMobil2 project [1,3] were *analyzed* focusing on the interaction of AVs with other road users to generate scenarios. The main objective of CityMobil2 was to implement different demonstrations of AVs in five European cities as a part of local public transport [12]. Before the analysis, literature research and workshops with four traffic experts led to a set of categories that were applied to analyze the videos of AVs on the road. The categories were chosen to represent the events of interest as exhaustively as possible. They were mutually exclusive, precisely defined, and simply worded. They included interactions with vehicles, pedestrians, cyclists, and infrastructure. In accordance with this categorization scheme, naturalistic video clips from the AV demonstrations were evaluated. They showed highly automated shuttles in the cities of La Rochelle, France, and Trikala, Greece. The videoclips were shot independently of the raters who categorized them and were therefore not influenced by them. To categorize them, the videos were analyzed for incidents; the main events were noted down and grouped according to the interactions of the AV with other road users, including vulnerable road users (VRUs), and with the infrastructure, e.g., traffic lights. These interaction categorizations served as a basis for generating scenarios.

#### *2.2. System of Structuring Scenarios*

In order to highlight similarities among scenarios, a hierarchical structure with three levels is proposed. It consists, from top to bottom, of use case clusters (UCCs), use cases (UCs), and scenarios.

The terminology is based on Ulbrich et al. [26] and Wilbrink et al. [27]. It is illustrated in Figure 5 and will be defined in the following paragraphs.

**Figure 5.** Relations between use case cluster (UCC), use case (UC), scenario, and scene. Adapted with permission from Ref. [27]. 2018, interACT.

A *scene* describes a snapshot of the environment. It includes a scenery (e.g., lane networks, stationary elements, and environmental conditions), dynamic elements (e.g., dynamic objects' states and attributes), self-representations of actors, and observers (e.g., actors' and observers' states and attributes, skills, and abilities) as well as the relationships between those entities. A *scenario* is defined as a temporal development of different scenes within a sequence of scenes. In order to characterize this temporal development, events and actions, as well as objectives, might be specified. Unlike a scene, a scenario describes a period of time. Scenarios start with an initial scene and can be visualized using interaction diagrams (cf. Figure 6). A *use case* is a functional description of a technical system and its behavior for a specific use. Use cases can comprise numerous different scenarios, but a scenario can only contain a certain number of scenes arranged in a certain order. A *use case cluster* comprises similar use cases.
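The hierarchy defined above can be sketched as a small data model. The following Python sketch is illustrative only: the class names, fields, and example entries are assumptions made here and are not part of the authors' catalogue or tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    """Snapshot of the environment: scenery, dynamic elements, actors."""
    description: str


@dataclass
class Scenario:
    """Temporal development across an ordered sequence of scenes."""
    name: str
    scenes: List[Scene] = field(default_factory=list)

    def initial_scene(self) -> Scene:
        # A scenario starts with an initial scene; fail loudly if none exists.
        if not self.scenes:
            raise ValueError("a scenario needs at least one scene")
        return self.scenes[0]


@dataclass
class UseCase:
    """Functional description of the system for a specific use."""
    name: str
    scenarios: List[Scenario] = field(default_factory=list)


@dataclass
class UseCaseCluster:
    """Group of similar use cases."""
    name: str
    use_cases: List[UseCase] = field(default_factory=list)

    def scenario_count(self) -> int:
        # Total number of scenarios across all use cases in this cluster.
        return sum(len(uc.scenarios) for uc in self.use_cases)


# Hypothetical example entries (names invented for illustration).
blocked = Scenario(
    name="Vehicles parked in second row",
    scenes=[Scene("shuttle approaches blocked lane"), Scene("shuttle waits")],
)
cluster = UseCaseCluster(
    name="example cluster",
    use_cases=[UseCase(name="example use case", scenarios=[blocked])],
)
```

Keeping the scenes in an ordered list mirrors the statement that a scenario contains a certain number of scenes arranged in a certain order.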

**Figure 6.** Interaction diagram for the scenario with the cause "Vehicles parked in second row". The lane is blocked, e.g., due to vehicles parked in second row. This causes delays or disturbs the onward journey of the shuttle. The shuttle waits due to the blocked lane (1a). The shuttle informs the remote operator that no further travel is possible due to an obstacle (1b). The remote operator is in bidirectional contact with the passengers and switches to the shuttle's cameras to get an overall view (2). The remote operator finds out whether it is possible to bypass the obstacle. The remote operator sets a new trajectory, e.g., by setting waypoints, selecting a trajectory, or steers the AV manually around the obstacle, gives clearance, and permits the AV to continue (3). If this is the case, the shuttle can continue its journey. If a bypass cannot take place due to e.g., constructional conditions, then the remote operator contacts the police or another authority (4a). The passengers are proactively informed about the further procedure and possible delays (4b). The police or another authority drive to the shuttle and solve the blockade (5).
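The branching in this interaction diagram can be written down as an ordered list of steps. The sketch below is a simplified reading of the caption, not the authors' model: the function name and the tuple layout are assumptions, and the step labels (1a to 5) and actor abbreviations (AV, RO, OA) are taken from the figure captions.

```python
from typing import List, Tuple

# Each step is (label, acting party, action); the layout is an assumption.
Step = Tuple[str, str, str]


def blocked_lane_steps(bypass_possible: bool) -> List[Step]:
    """Return the step sequence for the 'vehicles parked in second row' scenario."""
    steps: List[Step] = [
        ("1a", "AV", "waits because the lane is blocked"),
        ("1b", "AV", "informs the remote operator that no further travel is possible"),
        ("2", "RO", "contacts the passengers and switches to the shuttle's cameras"),
    ]
    if bypass_possible:
        # The remote operator resolves the situation and the shuttle continues.
        steps.append(("3", "RO", "sets a trajectory or steers manually, then gives clearance"))
    else:
        # No bypass: escalate to an external authority and keep passengers informed.
        steps.append(("4a", "RO", "contacts the police or another authority"))
        steps.append(("4b", "RO", "informs the passengers about the procedure and delays"))
        steps.append(("5", "OA", "drives to the shuttle and resolves the blockade"))
    return steps


for label, actor, action in blocked_lane_steps(bypass_possible=False):
    print(f"({label}) {actor}: {action}")
```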

For research on human-machine interaction, the focus on singular static scenes does not suffice to describe processes of interaction. On the other hand, the system-based level applied in use cases is too abstract to pay enough attention to these processes. Therefore, the focus of this paper will be on scenarios. Particularly, it will list and categorize various scenarios in a uniform structure to lay the groundwork for a scaffolding of scenarios pertaining to remotely operating AVs.

The scenarios presented in this paper were compiled based on interviews and observations in control centers of public transport (see previous section). They are inspired by Geis and Tesch's notion of *Is Scenarios* [28]. These are events or chains of events occurring in a naturalistic setting. They are characterized by a "narrative, textual description of actions that a certain user applies in order to attend to one or several tasks (*translated by the authors*)" [28] (p. 71). They describe components of the context of use in interplay with the perspective of the interviewed or observed person. According to Dzida and Freitag [29], Is Scenarios are the central source for identifying demands and deriving user requirements. Even though tasks may be included in an Is Scenario, they are not in the focus—unlike in *Use Scenarios*, which describe the implementation of tasks by the user. Rather, as Is Scenarios investigate the interrelations between tasks, the relevance of specific resources is elucidated. This, in turn, may facilitate the identification of previously concealed demands. Furthermore, Is Scenarios may help surface the intertwining of various actors.

As processes of interaction are vital to understand what is happening in urban mixed traffic settings including remote-controlled AVs, they are used here to provide a scaffolding on the uppermost level, i.e., the level of use case clusters. On this top level, *actors* play a significant role. This paper defines "actor" in accordance with the Unified Modeling Language. Thus, an actor "specifies a role played by a user or any other system that interacts with the subject" [27] (p. 586). The definition emphasizes that actors are not limited to human users such as remote operators: "Actors may represent roles played by human users, external hardware, or other subjects" [27] (p. 586).

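Following this definition, five actors are used in the compilation of scenarios: the remote operator (RO), the highly automated vehicle (AV), the passengers (P), the infrastructure (I), and other actors (OA).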


All of these five actors may be interacting with any other actor in a given scenario within a specific context that influences them. Figure 7 includes the most relevant actors and their interrelations.

**Figure 7.** Most relevant actors interacting in the compiled scenarios on remote operation of highly automated vehicles. The remote operator, the highly automated vehicle, the passengers, and the infrastructure collaborate with each other to complete the driving task at SAE Level 4. Adapted with permission from Ref. [30]. 2022, DLR (CC BY-NC-ND 3.0).

On the second-to-top level, scenarios are grouped into use cases. Here, a use case is defined as a functional description of a technical system and its behavior for a specific use. Use cases can comprise numerous different scenarios, but a scenario can only contain a certain number of scenes arranged in a certain order [14].

On the third-to-top level, scenarios are listed. In order to facilitate comparability, every scenario is structured in a chain of cause, event, and consequence, as Figure 8 represents. The sequence consists of the following elements: a cause that the event is attributed to, the central event, and the consequences that arise from it.

A generic template is proposed that is used for every scenario:

Due to <Cause>, <Event> takes place. This results in <Consequence>.
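As a minimal sketch, the template can be filled programmatically. The function name and the example wording below are assumptions for illustration; the example values paraphrase the blocked-lane case from Figure 6 and are not quoted from the catalogue.

```python
# The template text follows the generic template stated above.
TEMPLATE = "Due to {cause}, {event} takes place. This results in {consequence}."


def render_scenario(cause: str, event: str, consequence: str) -> str:
    """Fill the cause-event-consequence template for one scenario."""
    return TEMPLATE.format(cause=cause, event=event, consequence=consequence)


# Illustrative wording only (hypothetical, not a catalogue entry).
print(render_scenario(
    cause="vehicles parked in second row",
    event="a lane blockage",
    consequence="delays for the shuttle",
))
# prints: Due to vehicles parked in second row, a lane blockage takes place. This results in delays for the shuttle.
```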

The event and its consequence, in turn, lead to certain measures that are required to resolve the event. These required measures, however, are not part of the presented catalogue of scenarios. Adding them would be beyond the scope of this paper since its focus is on events in remote operation, their causes, and consequences. The required measures will be addressed in future publications.

It is important to note that the scenario does not take place in ignorance of the concrete context in which it happens. Rather, all its stages are embedded in this context (see Figure 8). Additionally, the actors that interact throughout the scenario represent another level of analysis that accompanies the chain of cause, event, and consequence, and might also be involved in the measures required for resolving the event.

**Figure 8.** The chain of cause, event, and consequence provides the scaffolding for each scenario. In future iterations of the scenario catalogue, required measures may be added.

#### **3. Results**

The following section outlines the structure of the scenario catalogue, provides the entire compilation of scenarios, and concludes with an exemplary scenario and its related interaction diagram.

#### *3.1. Structure and Catalogue of Scenarios*

Figure 9 presents an organigram of the structure of the compiled scenarios, organized in use case clusters and use cases. The *central interactions* can be considered the main body of the scenario collection. They are structured in a way that enables an interaction of one actor with any of the remaining ones. In addition, use case clusters (UCCs) regarding the remote operator's state, contextual factors, and technical malfunctions relate to *peripheral factors* that are not directly based on interactions.

**Figure 9.** Structure of use case clusters (UCCs, top row) and use cases (UCs, rows below top row). The core of the scenario collection is made up of interactions between different actors (central interactions). In addition, UCCs regarding the remote operator's state, contextual factors, and technical malfunctions relate to peripheral factors that are not directly based on interactions. RO = Remote Operator, AV = Highly Automated Vehicle, P = Passengers, I = Infrastructure, OA = Other Actors.

Table 1 is a comprehensive list of the compiled scenarios in remote operation. It is structured as follows: the overarching classification categories use case cluster and use case; the defining elements of the scenario, consisting of cause, event, and consequence; a column for each of the five actor categories; and the mode of remote operation.











According to standardized template. RA = Remote Assistance, RD = Remote Driving, RO = Remote Operator, ROn = Remote Operation, VRU = Vulnerable Road User (e.g., pedestrian, cyclist).

Overall, 74 scenarios were compiled. These were subsumed under 15 use cases, which in turn were grouped into eight use case clusters, five of which are central interactions, while the remaining three are regarded as peripheral factors. Both use cases and use case clusters are stated in Figure 9. A sequential number *(N)* was given to each use case cluster (UCC), use case (UC), and Is Scenario (Sc), in the following fashion:

$$N_{\text{UCC}}.N_{\text{UC}}.N_{\text{Sc}}$$

From left to right, Table 1 includes the following columns: Use case cluster, Use case, Cause, Event, Consequence, and the descriptive Is Scenario. In the section "Actors", each actor involved in the scenario is marked with an "X". If an actor is not involved, the respective column remains blank. The same system is used to indicate the Mode of Remote Operation. Here, an "X" indicates that the scenario is valid for Remote Driving, Remote Assistance, or both.

#### *3.2. Example for Scenario "Vehicles Parked in Second Row"*

An example scenario comes from the Use Case Cluster "Interaction AV with Other Actors", Use Case "Other Road Users". The Cause in this scenario is "Vehicles parked in second row". This leads to the Event "Driveway blocked", which in turn triggers the following Consequence: "Continuation of ride delayed; RO needs to assess situation and intervene". The following Actors are involved in this scenario: Remote Operator, Highly Automated Vehicle, and Other Actors. The scenario may occur both in Remote Driving and Remote Assistance. Finally, the full description of the Is Scenario reads as follows:

"Due to vehicles parked in the second row, the AV's lane is blocked. This leads to a delay in the further travel of the AV. The RO allows the AV to continue, sets a new trajectory, e.g., by setting waypoints or selecting a pathway, or steers the AV manually around the obstacle."

Figure 6 above shows an interaction diagram for this scenario. It displays the main actors in this scenario. Every scenario in Table 1 can be translated into an interaction diagram like this.
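To illustrate how a row of Table 1 could be handled in software, the example above can be written as a small record. This is only a sketch: the class name `Scenario`, its field names, and the `marks` helper are assumptions made for illustration, and the sequential numbers `1.1.1` are placeholders rather than the identifier actually assigned to this scenario in the catalogue.

```python
from dataclasses import dataclass

ACTORS = ("RO", "AV", "P", "I", "OA")  # actor abbreviations as used in Figure 9
MODES = ("RD", "RA")                   # Remote Driving, Remote Assistance


@dataclass(frozen=True)
class Scenario:
    ucc: int  # sequential number of the use case cluster
    uc: int   # sequential number of the use case
    sc: int   # sequential number of the Is Scenario
    use_case_cluster: str
    use_case: str
    cause: str
    event: str
    consequence: str
    actors: frozenset
    modes: frozenset

    @property
    def identifier(self) -> str:
        """Identifier in the form N(UCC).N(UC).N(Sc)."""
        return f"{self.ucc}.{self.uc}.{self.sc}"

    def marks(self) -> dict:
        """Return "X" for involved actors and valid modes, "" otherwise."""
        cols = {a: ("X" if a in self.actors else "") for a in ACTORS}
        cols.update({m: ("X" if m in self.modes else "") for m in MODES})
        return cols


example = Scenario(
    ucc=1, uc=1, sc=1,  # placeholder numbers, not the catalogue's real ID
    use_case_cluster="Interaction AV with Other Actors",
    use_case="Other Road Users",
    cause="Vehicles parked in second row",
    event="Driveway blocked",
    consequence="Continuation of ride delayed; RO needs to assess situation and intervene",
    actors=frozenset({"RO", "AV", "OA"}),
    modes=frozenset({"RD", "RA"}),
)
print(example.identifier, example.marks())
```

Storing actors and modes as sets keeps the "X"/blank notation a pure presentation detail that is derived only when a table row is rendered.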

#### **4. Discussion**

This paper proposes an initial draft for a catalogue of scenarios that might help when creating design solutions for the remote operation of AVs. It is published preliminarily with a remaining need for evaluation, validation, and modification in future iterations. Even though the scenarios presented here originate from diverse sources, particularly real-world labs, idiosyncrasies of other contexts of use will need to be considered by revalidating and extending the catalogue.

In spite of these constraints, the catalogue fulfills several purposes. First, it serves as a starting point for *designing novel interfaces* for remote-controlling highly automated vehicles—and has, in fact, already done so. Based on these scenarios, the authors of this paper designed an HMI for a workplace for remote operation in an online study with experts employed by control centers in public transport [21]. Incorporating the results from the evaluation study, a workplace for remotely assisting AVs has been set up at DLR's Braunschweig premises.

Second, the catalogue is suitable to *test and validate HMIs* in teleoperations of means of public transport. For instance, the workplace described above will be tested and validated using the catalogue of scenarios presented here. Thanks to the workplace's integration in realistic road simulations and DLR's fleet of highly automated vehicles [31], a validation study with high ecological validity will be carried out. Using both quantitative, performance-based indicators and qualitative interview and questionnaire methodology, a group of experts from public transportation facilities and associations and other potential users of a remote operation workplace was exposed to a selection of the scenarios presented here [32]. Hence, a bidirectional relationship is established between the scenarios and the workplace: Not only will the study guide the process of improving the workplace for the needs of the remote operator to execute their tasks safely, effectively, and efficiently while ensuring an optimal task load, keeping the operator in the loop, and preventing fatigue and monotony. It will also help reassess and refine the compiled scenarios.

Third, *Operational Design Domains (ODDs) may be derived* from the scenarios. Thus, the catalogue will help specify the context and boundaries of safe teleoperation and create if-then contingencies between a certain contextual factor and its adequate driving mode. Hence, the remote operation can be incorporated in a wider framework of highly automated driving that encompasses different modes of operation, e.g., relying on input from the driving automation [33] and the infrastructure [34,35] in addition to remote operation. This multimodal, holistic approach is conceptualized within the framework of Managed Automated Driving [36].

Fourth, the catalogue will help to *identify the most safety-relevant scenarios* for teleoperation in public transport. This will be done, for example, by conducting a study in which experts and operators in public transportation review the scenarios and rate (1) the probability of a scenario's occurrence and (2) its criticality for safe remote operation. A resulting priority classification will help understand the most pressing safety-relevant scenarios, among others, and provide a pathway to address them effectively. Subsequently, they will be used to derive user requirements to further *improve the existing HMI* of DLR's prototypical remote operation workstation. In addition, they will also help *create adaptive HMI* solutions that consider the remote operator's current state and adapt the interface to accommodate it. This approach is pursued within the European Union's Horizon 2020 project "Hi-Drive", among others, and may be suitable to defragment the transition between various operational contexts [37].

Fifth, the identified user requirements may also facilitate creating a checklist for assessing the quality of a remote operator's workstation from a Human Factors perspective. Critically, *guidelines for selecting and training remote operators* could be interpolated from these requirements. This is highly relevant to rolling out highly automated vehicles since the role of having a Technical Supervisor to remotely monitor AVs and intervene, if necessary, i.e., a remote operator, was put forward as a requirement for SAE Level Four driving operations by several legislators on the German and European level [11,38]. Thus, remote operation is likely to be an inevitable prerequisite for highly automated driving.

Finally, the collection of scenarios may also be of importance for *mobility research in adjacent disciplines*. For instance, it could contribute to IT mobility services that require a holistic user-centered view on future mobility systems, embedded in a network of various means of transport, and feed data into a comprehensive mobility data space that is used to exchange mobility data. This is currently examined by the project "GAIA-X 4 ROMS" that aims at supporting and remote-operating automated and networked mobility services [39].

In spite of its numerous benefits, the catalogue of scenarios comes with certain limitations. Particularly, it must be noted that the catalogue does not claim to be exhaustive in several regards. First, not all the interactions between the actors proposed are addressed in the catalogue. Second, not every use case cluster or use case is considered at the same level of depth and detail, which affects the balance of scenarios across use case clusters and use cases. The focus of this catalogue is on the projects and contexts that have been presented initially in this paper. Filling the gaps of the presented framework and coming up with more use case clusters, use cases, and scenarios is a quest for further research and inevitably depends on the context of use, the vehicles investigated, and their level of automation, as well as on the mode of remote operation and its concrete conceptualization. Also, categorizing the scenarios' consequences, e.g., by severity or impact, is left to future empirical investigations. Appropriate categorization systematics may be part of prospective iterations of this scenario catalogue, which is considered a living document that is permanently updated as research progresses. The same is true for deriving required measures for each scenario. Third, the categories used for analyzing the scenarios might not be mutually exclusive in certain cases. For instance, the listed consequences of some of the scenarios may already contain required measures even though deriving measures will be the subsequent step in the user-centered design process. This is because, for comprehending these scenarios, indicating the required measures is inherently necessary. Moreover, only if the scenario is understood correctly can adequate design solutions be derived from it. Further research is needed to evaluate and modify the categories used in this framework, if necessary, as well as to disentangle remaining unclarities in the categories assigned.

At any rate, the remote operation of vehicles is likely to become a vital element of highly automated driving systems. Analyzing typical scenarios that may occur when remotely operating highly automated vehicles is a first but essential step towards enabling remote operation while designing with human needs at the center of attention. While the presented scenarios focus on the notion of controlling one vehicle at a time, feasible remote-operation solutions may need to be able to supervise several vehicles simultaneously. Also, communication of the remote operator with other road users is a research area that, to the knowledge of the authors, has not been investigated yet. Relying on external HMI solutions for highly automated vehicles (e.g., [40,41]) may be a feasible approach.

All in all, the presented catalogue of scenarios is an important milestone for bringing highly automated vehicles onto the roads as it is a precursor for testing and validating them under realistic conditions. By being able to tackle the scenarios presented, the remote operation will significantly improve the operation of AVs and therefore bring us a small but vital step closer to fully automated mobility. And even in the case that fully automated driving (SAE Level 5) may be achieved one day, certain scenarios in fully automated public transport, such as passenger emergencies or vehicle malfunctions, may always require human support. Hence, the remote operation may be more than a preliminary technology to bridge the gap to a certain level of automated driving. It may be a long-lasting alternative to human operators on-site without compromising on the unique and sometimes irreplaceable abilities and skills of humans.

**Author Contributions:** Conceptualization, C.K., A.S. and M.O.; methodology, C.K., A.S., H.A. and M.O.; formal analysis; C.K., A.S. and H.A.; investigation, C.K. and A.S.; data curation, A.S., C.K. and H.A.; writing—original draft preparation, A.S. and C.K.; writing—review and editing, C.K., A.S. and M.O.; visualization, A.S. and C.K.; supervision, M.O.; project administration, C.K.; funding acquisition, M.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the German Federal Ministry for Digital and Transport via the project "RealLab Hamburg Digital Mobility" which goes back to the German National Platform for the Future of Mobility (NPM).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data can be made available on request to the authors.

**Acknowledgments:** We thank the public transport organizations "Braunschweiger Verkehrs-GmbH" in Braunschweig, Germany, "Hamburger Hochbahn AG" and "Verkehrsbetriebe Hamburg-Holstein GmbH" as well as "Hamburg University of Technology" in Hamburg, Germany for the kind collaboration.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **The Influence of Sentiments of Economic Agents on Pedestrians and Vehicle Crossings along the US–Mexico Border**

**René Cabral, Francisco García-Flores \* and Eduardo Saucedo**

EGADE Business School, Tecnológico de Monterrey, Rufino Tamayo and Eugenio Garza Lagüera, San Pedro Garza García 66269, Mexico; rcabral@tec.mx (R.C.); eduardo.saucedo@tec.mx (E.S.) **\*** Correspondence: f.garciaf@tec.mx

**Abstract:** This study aimed to investigate the impact of people's sentiments toward border crossings on personal vehicle and pedestrian crossings along the US–Mexico border. This study focused on regional factors and employed data derived from Google Trends as a proxy for people's sentiments. Monthly data from the first quarter of 2004 to February 2020 were used. Different regression models were used to address stationarity. After controlling for economic conditions and external events, the primary findings are as follows: first, pedestrian and personal vehicle crossings are sensitive to exchange rate fluctuations. Second, the economic cycle has a slightly higher impact on pedestrians than personal vehicle crossings. Third, an increase in the hostile environment toward immigration in the U.S. may negatively impact pedestrian crossings, especially in Texas. Moreover, a rolling regression was used to examine the impact of people's sentiments on crossings over time.

**Keywords:** border crossings; sentiments; personal vehicles; pedestrians; US–Mexico; Google Trends

**Citation:** Cabral, R.; García-Flores, F.; Saucedo, E. The Influence of Sentiments of Economic Agents on Pedestrians and Vehicle Crossings along the US–Mexico Border. *Appl. Sci.* **2022**, *12*, 2512. https://doi.org/10.3390/app12052512

Academic Editors: Anselme Muzirafuti, Dimitrios S. Paraforos, Giovanni Randazzo and Stefania Lanza

Received: 26 October 2021 Accepted: 10 February 2022 Published: 28 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

The NAFTA agreement (now known as USMCA) has boosted economic integration between the United States (the U.S.) and Mexico over the last two and a half decades. From 1994 to 2019, trade in goods between the 2 countries increased 5-fold, from \$100.34 billion to \$614.54 billion (data from www.census.gov, accessed on 26 September 2020). Trade growth in Mexico was even more dramatic, with exports of goods increasing from \$49.49 billion in 1994 to \$357.97 billion in 2019. Labastida-Tovar [1] documented a significant growth in export levels in both countries' border cities. Indeed, more impoverished border cities, such as Port Arthur, California, and Reynosa, Mexico, have gained more from trade, with higher growth rates than San Diego or Monterrey. From 1996 to 2019, pedestrian crossings increased by 44.2%, from 34.10 million to 49.18 million, but the growth rate of personal vehicle crossings was 17.1% (data from the Bureau of Transportation Statistics).

Labastida-Tovar [1] argued that because of NAFTA, economic integration and trade liberalization have intensified, benefiting both countries. According to Nicita [2], trade liberalization favored northern Mexican states more than southern states. In addition, Hanson [3] demonstrated that a high level of trade liberalization increases the demand for local products in foreign markets, boosting salaries due to the relocation of manufacturing facilities in the border regions.

Figure 1 depicts the total annual northbound border crossings of pedestrians and personal vehicles from 2004 to 2019. Data from 2020 are not considered due to border restrictions imposed as part of the public health measures employed to tackle the COVID-19 pandemic. Pedestrian crossings totaled 48 million in 2004. From 2005 to 2007, an increasing trend was noted, followed by a considerable downturn, which was most likely influenced by the global financial crisis, with a low point in 2010 (39.9 million crossings). In 2019, there were 47.5 million pedestrian crossings, which is roughly the same level as that of 15 years earlier.

**Figure 1.** Annual pedestrian and personal vehicle crossings from 2004 to 2019 (millions). Note: Annual data for the 2004–2019 period. Data in millions. Pedestrian crossings are on the right vertical axis. Source: Own estimations using data from the Bureau of Transportation Statistics.

A substantial decline is registered for personal vehicle crossings after 2005, reaching a minimum in 2010 (from 91.5 million crossings in 2005 to 39.9 million in 2010). However, this was followed by a recovery to close to 77 million crossings, which is similar to the 2008 level. Figure 1 depicts periods where pedestrian crossings increased while personal vehicle crossings decreased, suggesting that various factors may influence each type of crossing differently.

Fullerton and Walke [4] claimed that it is not cost effective for pedestrians to shop for specific retail goods categories or travel beyond the immediate border zone. Cross-border shopping requires the price differential of the local and foreign countries to exceed the transaction costs of purchasing across the border. Therefore, as Chandra et al. [5] pointed out, border shoppers live near the border. Baruca and Zolfagharian [6] demonstrated that hedonic shopping motivation may influence economic agents (e.g., consumers) to cross the border to seek fun and pleasure experienced in cross-border shopping trips.

The primary goal of this study was to investigate the impact that people's sentiments toward crossing the border have on personal vehicle and pedestrian crossings throughout the entire US–Mexico border. This study stems from behavioral economics studies that have examined the relationship between economic agents' sentiments and economic variables. In those studies, various proxies for assessing investor sentiment were developed, and they can be categorized based on the source of data employed [7]. In this analysis, the use of proxies with data extracted from the volume of internet searches is proposed. The hypothesis to be tested is that policies towards illegal immigration influence the sentiment of economic agents at the aggregate level and thus affect the northbound border crossing of pedestrians and personal vehicles along the US–Mexico border.

This study also builds on prior research examining the determinants of border crossings between Mexico and the U.S. This study makes three significant contributions to the literature. First, data on border crossings were retrieved from all the U.S. Ports of Entry (PoEs) along the US–Mexico border and grouped into three regions. Second, it used sentiment variables built from the volume of internet search queries as a proxy to assess decision-makers' sentiments. Lastly, a rolling regression analysis was performed to examine the relationship between the sentiment variables and border crossings over time.

The rest of this paper proceeds as follows. The literature review is examined in the next section. Then, in the third section, the variables and data are described. Section 4 presents the method used in the study, and the results are presented in Section 5. The discussion and trends for future research are proposed in Section 6, and the conclusion is presented in the last section.

#### **2. Literature Review**

The US–Mexico border region is home to approximately 14 million people concentrated in 14 border-city pairs. Quintana et al. [8] documented that these binational urban areas range from larger metropolitan areas, such as San Diego-Tijuana (2.9 million people), to minor city pairs, such as Nogales in Arizona and Nogales in Sonora (0.2 million). The border is shared by four U.S. states and six Mexican states. The border is 3200 km long and includes 39 Mexican municipalities, 25 U.S. counties, and 25 land ports. In 2019, close to 47.5 million pedestrians crossed northbound, and 73 million personal vehicle crossings were registered.

#### *2.1. Border-Crossing Literature: General Overview*

Heyman [9] defined ports of entry (PoEs) as nodes in the world trade system where people and goods enter a nation. According to Quintana et al. [8], personal vehicle and pedestrian crossings have increased substantially since NAFTA was signed. Population growth, Mexican border industrialization, and commerce expansion are important factors in explaining the increase in crossings during this period. In addition, Lee and Wilson [10] pointed out that more than 70% of the bilateral trade between the U.S. and Mexico flows through the border's land ports. Tourism is another important driver. According to Lee and Wilson [10], approximately 85% of Mexican tourists arrive in the U.S. through land ports, impacting the economy of border cities in the U.S.

The border-crossing literature has analyzed the relationship between border crossings and the real exchange rate [5,11–14], GDP [13,15], gasoline prices [16], unemployment [4,17], personal income [18,19], trade agreements [17,20], and geopolitical and security issues, such as the 9/11 event [20–23], and the economic impact of foreign visitors in border cities [24,25].

#### *2.2. The US–Mexico Border*

Patrick and Renforth [26] found that the devaluation of the Mexican peso affects Texas retailers, showing their reliance on Mexican customers (economic agents). Moreover, the influence of currency rate fluctuations differs by city, distance from the border, retail sector, and domestic market size. Mexican consumption is determined by purchasing power after considering currency conversion, inflation, interest rates, and available disposable income.

Gerber [27] investigated the influence of Mexican tourists on retail sales in eight U.S. border counties by employing the simple compound growth model of retail sales. According to the research, currency fluctuations impact retail sales, with nondurable goods, fashion merchants, and general merchandise outlets being more vulnerable. Fullerton [14] examined the relationship between the exchange rate and the cross-border flows on three international bridges in El Paso. He used monthly data for same-day personal vehicle, passenger, and pedestrian trips; the exchange rate; and the consumer price index. The data revealed each bridge's border-crossing behavior. The findings demonstrate that crossing flows are not random, and peso devaluation has a varying effect on traffic on each bridge. The Mexican peso's depreciation negatively impacts the bridge on which more personal vehicles cross, although it has a positive effect on the bridge where pedestrians prevail.

Fullerton et al. [28] studied how toll tariffs and currency exchange rates influence border crossings using 1990–2006 monthly pedestrian, personal vehicle, and cargo crossing data. The authors found a negative relationship between the toll and flow of border crossings using ARIMA transfer functions. They also found a negative relationship in land ports where pedestrians and personal vehicles predominate and a positive relationship in ports where cargo crossings dominate.

Cabral et al. [13] analyzed the relationship between exchange rate and personal vehicle passenger crossings in the 9 most active PoEs using monthly data from 1997 to 2018. Using panel data fixed effects and augmented mean group models and controlling for the difference in economic growth rates between Mexico and the U.S., they found a negative relationship between the exchange rate and border crossings and an increase in crossings when the Mexican economy improves faster than the U.S. economy.

#### *2.3. The U.S.–Canada Border*

Di Matteo and Di Matteo [19] used quarterly data about same-day vehicle crossings to examine the determinants of cross-border shopping for 7 Canadian provinces that share a border with the U.S. from 1979 to 1992. The authors employed the Gerber [27] demand model and estimated a log–log model using ordinary least squares regression techniques. They found that the main factor for border crossings in several Canadian provinces is the exchange rate. Moreover, the per capita income variable is the most important determinant; however, British Columbia's gas price is the most relevant variable that explains border crossings.

#### *2.4. Decision-Maker Sentiment*

Traditional economic theories leave no room for irrational behavior; they assume that decisions are made through rational decision-making processes [29], and that economic agents consider all publicly available information when making decisions [30]. However, empirical evidence reveals that agents are constrained in the amount of information they can process [31], questioning the rationality assumptions in classical theory. Furthermore, behavioral economics has highlighted the relevance of economic agents' sentiments in financial markets [7], and their behavioral biases, which lead to overly optimistic/pessimistic beliefs and drive irrational behavior [32].

The literature provides several definitions for the term "sentiment". In the financial literature, investor sentiment is commonly characterized as either the proclivity to take risks and speculate or the overall sense of optimism or pessimism toward risky assets [33]. For example, De Long et al. [34] related sentiment to noise trading. In contrast, Baker and Wurgler [35] attribute it to feelings of optimism or pessimism about risky assets.

Scholars have devised various proxies for measuring the sentiments of economic agents, especially investors. Based on the data source from which the proxy is extracted, Zhang et al. [7] grouped these proxies into three categories. Indirect proxies are based on stock market data and survey results. Baker and Wurgler [35] used six proxies for sentiment retrieved from the stock market, including the closed-end fund discount and NYSE share turnover. Based on noise-trader sentiment models, Lemmon and Portniaguina [36] employed the University of Michigan survey of consumer sentiment and the Conference Board survey of consumer confidence as proxies for investor sentiment.

Da et al. [37] suggested that earlier indirect sentiment measures do not capture all decision-makers' sentiments. They claimed that market-based indicators can capture more than investors' sentiments. Moreover, survey-based proxies are infrequent, and respondents cannot be incentivized to submit truthful answers. Because of this, they stated that the volume of internet search inquiries should be used to create direct sentiment indicators.

Emerging literature has applied internet search volume data as a source to construct proxies of economic and noneconomic variables [38]. For example, Ettredge et al. [39] analyzed the relationship between the search volume of employment-related terms with monthly U.S. unemployment data. Ginsberg et al. [40] used Google search volume to predict the incidence of influenza diseases. Choi and Varian [41] used search volume from Google to estimate macroeconomic variables, including automobile sales, initial claims for unemployment benefits, travel destination planning, and consumer confidence. Guzman [42] employed search query data to examine inflation expectations. Vosen and Schmidt [43] constructed a private consumption measure based on search queries. Finally, Da et al. [37] hold that sentiment at the market level can be measured using search volume data. Thus, using online searches relevant to households in the U.S., the authors developed a Financial and Economic Attitudes Revealed by Search (FEARS) index.

In this study, it is important to emphasize that the primary goal was to explore the relationship between people's sentiment to cross the border at the aggregate level and its effect on border crossings. This study proposes that online search query volume data should be used to develop a direct measure, based on the claim of Ettredge et al. [39] that people's search behavior reveals information about their needs, desires, preferences, and concerns. As Guzman [42] proposed, search behavior can be interpreted as a measure of disclosed expectations. According to Da et al. [37], aggregating search volume data reveals market-level sentiment connected to specific topics. Accordingly, Google search volume data are used to gauge people's intentions and concerns over crossing the border.

#### **3. Data Analysis**

The Bureau of Transportation Statistics provides monthly statistics for inbound crossings between the U.S. and Mexico at the PoE level. The data are classified by port, state, and means of transportation. In this study, the dependent variables were the monthly number of pedestrians and the personal vehicle crossings from Mexico to the U.S.

All PoEs were grouped into three regions: California, Texas, and other (Arizona and New Mexico). Table 1 presents the descriptive statistics of pedestrians and personal vehicles aggregated by region. In 2019, California accounted for 42.6% of the pedestrian crossings (42.9% for personal vehicle); Texas, 41.8% (44.1%); and Arizona and New Mexico, 15.6% (13%).


**Table 1.** Descriptive statistics of pedestrians and personal vehicles crossing the border from Mexico to the U.S. (2004-2M2020).

Note: The data are up to February 2020. Source: Own estimations using data from the Bureau of Transportation Statistics.

Google is the largest and most popular search engine on the internet, and since 2004, it has provided the Google Trends service, from which the historical Search Volume Index (SVI) of search terms can be downloaded [37]. This tool provides the search volume of each term in hourly, daily, weekly, or monthly frequencies. In addition, the SVI data for each keyword within a particular geographical region are scaled from 0 to 100 by the period's maximum.
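The scaling by the period's maximum can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name `scale_svi` and the input numbers are hypothetical, and the actual Google Trends computation involves further steps that are not reproduced here.

```python
def scale_svi(raw_volumes):
    """Scale a series to the 0-100 range by dividing by the period's maximum.

    Simplified illustration only; not Google's actual procedure.
    """
    peak = max(raw_volumes)
    if peak == 0:
        # A series without any search interest stays at zero.
        return [0 for _ in raw_volumes]
    return [round(100 * v / peak) for v in raw_volumes]


# Made-up monthly search volumes for a single term.
print(scale_svi([20, 50, 40]))
```

Because every series is scaled by its own maximum, SVI values show the relative popularity of a term over time and are not absolute search counts.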

To test the effect that an "anti-immigrant environment" has on economic agents' sentiments (e.g., shopping-trip travelers) and, hence, on their attitudes and decisions toward northward pedestrian and personal vehicle crossings along the US–Mexico border, it is proposed that two variables should be constructed as proxies to capture the phenomenon from both sides of the border. According to Ettredge et al. [39] and Guzman [42], people's search behavior can be expected to reveal matters relevant to them. These variables are derived from the Google Trends database.

If hedonic shopping drives economic agents to cross the border to pursue fun and enjoyment [6], it is reasonable to expect that an "anti-immigrant environment" in the U.S. border cities will affect economic agents' attitudes and sentiments toward crossing the border for shopping trips, thereby influencing border crossings. Therefore, it is proposed that a variable that captures the sentiments of these economic agents at the aggregate level should be developed using the online search volume data as a proxy for their concerns about and interests in the Mexican side of the border. Based on the online activity of economic agents in the U.S., a second variable was introduced to reflect the "anti-immigrant environment" in the U.S.

For the first variable, different monthly SVI data of border-crossing-related terms, including terms associated with border crossings and migration (e.g., "tiempo de cruce," "puente internacional," and "cruce fronterizo"), were individually retrieved using the Google Trends tool; the search was restricted to Mexico and the period analyzed. It was followed by a "snowball technique", which was implemented by including the related terms that Google Trends suggests in the analysis. After analyzing the stationarity and correlation of the time series of the SVI of each term with the border-crossings data, the more significant search terms were identified, which are "migrantes", "deportaciones", "muro fronterizo", "border patrol", and "patrulla fronteriza". Lastly, the SVI of these terms is grouped by adding each month's values to construct a single index named Border Economic Migrant Sentiment (BEMS). Because the BEMS variable comprises search terms related to security and "anti-immigration" issues, it is expected that it will capture people's concerns about crossing the border [39,42].
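The aggregation step that produces the BEMS index can be sketched as follows. This is only an illustration of the month-by-month summation described above; the SVI values are hypothetical, only three of the five terms are shown, and the function name `build_bems` is an assumption rather than the authors' code.

```python
def build_bems(svi_by_term):
    """Sum the monthly SVI of each term into one index value per month."""
    series = list(svi_by_term.values())
    n_months = len(series[0])
    if any(len(s) != n_months for s in series):
        raise ValueError("all terms must cover the same months")
    return [sum(s[m] for s in series) for m in range(n_months)]


# Hypothetical SVI values for three months.
svi_by_term = {
    "migrantes": [10, 25, 40],
    "deportaciones": [5, 30, 20],
    "muro fronterizo": [0, 50, 10],
}
bems = build_bems(svi_by_term)
print(bems)  # [15, 105, 70]
```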

The SVI of terms related to border crossings is also explored, with the geographic scope limited to the U.S. A similar research strategy is employed for the other variables, with the "immigration" search term being the most relevant. It is assumed that when an anti-immigration sentiment occurs in the U.S., the SVI of the "immigration" search term will increase. Therefore, adding the "immigration" variable (Imm) may help determine whether a rise in the hostile environment in the U.S. toward immigration may influence northbound border crossings.

Table 2 displays statistics for the main variables. The monthly average crossings for pedestrians and personal vehicles are 3.6 and 6.2 million, respectively. Figure 2 depicts the northbound border crossings for pedestrians and personal vehicles grouped by region. The annual pedestrian crossings are depicted in Figure 2a. The other region experienced a rising trend from 2004 to 2007, followed by a significant decline, which was most likely caused by the global financial crisis, reaching a minimum in 2014. Subsequently, the border crossings experienced a sluggish recovery, flattening at nearly 7 million crossings in recent years. An annual decrease in pedestrian crossings is recorded from 2004 to 2009 in the Californian PoEs, with a temporary recovery in 2007. Texan ports reported similar behavior, with border crossings declining from 2004 to 2011 and with a temporary increase in 2007.


**Table 2.** Descriptive statistics of the main variables (2004–2020M02).

Note: The monthly data are up to February 2020. Source: Own estimations from sources described in the data section.

From 2009 to 2011, although pedestrian crossings in Californian ports increased, those in Texas decreased. However, from 2012 to 2015, there was a decrease in pedestrian crossings in California ports. The reverse is observed in Texas, where crossings in 2019 were lower than those registered 15 years earlier. This could indicate that regional factors may influence pedestrian border crossings.

Figure 2b exhibits the personal vehicle border crossings for each region. From 2004 to 2011, all three regions experienced a similar declining trend. However, a recovery was registered from 2011 to 2017 in all regions, followed by a downward pattern in the last years of the sample.

**Figure 2.** Northbound PoE's annual pedestrian and personal vehicle crossings by regions, from 2004 to 2019 (million). Note: The annual data are from 2004 to 2019 and are in millions. In Figures (**a**) and (**b**), the crossings for other are on the right vertical axis. Source: Own estimations using data from the Bureau of Transportation Statistics.

Following the literature [13,14,26–28], the real exchange rate movements are expected to explain the evolution of border crossings. For example, a depreciation of the Mexican peso makes shopping and leisure activities in the U.S. more costly, reducing the number of pedestrian and personal vehicle crossings. Likewise, two variables are introduced to capture economic factors. First, the Global Indicator of Economic Activity (IGAE) variable is used to track the Mexican economic cycle. Second, the real U.S. gas price is used to measure its influence on border-crossing fluctuations. The gas price difference between the U.S. and Mexico is a relevant variable for personal vehicle crossings, especially across the Texas border. Moreover, the difference tends to be positive, which encourages Mexican citizens to cross to the U.S. to look for cheaper gas when such price differences are significant.

The data on the exchange rate between the Mexican peso and the U.S. dollar are retrieved from Banco de Mexico (Mexican Central Bank), which publishes a monthly real exchange rate index based on a weighted basket of numerous currencies. The gas price is retrieved from the U.S. Energy Information Administration. In this study, the monthly retail gasoline prices of all grades of formulations in the U.S. are used. The data are adjusted in real terms.

The monthly IGAE indicator and the real exchange rate from 2004 to February 2020 are displayed in Figure 3. The Mexican economy experienced an upward trend during this period, which was caused by trade liberalization and other economic reforms. However, the figure reveals that during the 2008 global financial crisis, the Mexican economy slowed down, with the Mexican peso depreciating significantly. From 2016 onwards, the economy and the real exchange rate experienced a sideways trend, with a lower economic growth rate than in previous years.

A dummy variable is included in the econometric model to capture the exogenous events that impacted the U.S. and may influence border crossings. The information about the U.S. economic recessions is retrieved from the National Bureau of Economic Research (NBER). The D\_US\_CRISIS variable is assigned a value of 1 for the months from December 2007 to June 2009 (a period that includes the 2008 global financial crisis) and 0 otherwise.
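The construction of the crisis dummy can be sketched in a few lines of Python. The function name `us_crisis_dummy` is hypothetical, and the value of 0 outside the recession window is the usual convention for a dummy variable, assumed here rather than stated in the text.

```python
def us_crisis_dummy(year, month):
    """Return 1 for months from December 2007 to June 2009 inclusive, else 0."""
    # Tuples compare element by element, so (year, month) pairs order chronologically.
    return 1 if (2007, 12) <= (year, month) <= (2009, 6) else 0


# Monthly dummy for 2007 through 2009.
dummy = [us_crisis_dummy(y, m) for y in range(2007, 2010) for m in range(1, 13)]
print(sum(dummy))  # 19 months flagged
```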

**Figure 3.** IGAE and Real Exchange Rate from 2004 to 2020. Note: The monthly data are from 2004 to February 2020. IGAE stands for the Global Indicator of Economic Activity and is displayed on the right vertical axis. Source: Own estimations using data from INEGI and Banco de Mexico.

Table 3 presents the correlation matrix of the dependent and the main independent variables. The real exchange rate's negative sign regarding pedestrians and personal vehicles is as expected. Hence, there is some initial evidence that northbound border crossings decline when the Mexican peso depreciates. In the case of pedestrians and personal vehicles, the IGAE coefficient is negative, although the correlations are small. Furthermore, as projected, the real gas price is negative for personal vehicles. Lastly, the correlation between the two Google Trends variables is 0.408.
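For readers who want to reproduce a correlation matrix of this kind, a minimal sketch using Pearson's coefficient is given below. The variable names and numbers are hypothetical and are not the data behind Table 3; the text does not state which correlation coefficient was used, so Pearson's is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical series: crossings fall as the exchange rate index rises.
data = {
    "exch_rate": [1.0, 2.0, 3.0, 4.0],
    "pedestrians": [8.0, 6.0, 4.0, 2.0],
}
matrix = correlation_matrix(data)
```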


**Table 3.** Correlation matrix.

Source: Own estimations.

#### **4. The Empirical Model**

The empirical model proposed in this study, which uses a log–log specification, closely follows the study of Di Matteo and Di Matteo [19]:

$$\begin{array}{l} \Delta \ln(C\_{t}) = \alpha\_{i} + \beta\_{1} \Delta \ln(\textit{Exch rate})\_{t} + \beta\_{2} \Delta \ln(GR)\_{t} + \beta\_{3} \Delta \ln(GAS)\_{t} \\ \quad + \sum\_{i=1}^{n} \theta\_{i} \Delta \ln(GT\_{i})\_{t} + \gamma\_{1} (\textit{D\\_US\\_CRISIS})\_{t} + e\_{i} \end{array} \tag{1}$$

where the dependent variable *C<sub>t</sub>* denotes the total number of pedestrian or personal vehicle crossings in month *t*. In Equation (1), *Exch rate* represents the real exchange rate variable, *GR* denotes the Mexican economy growth rate, *GAS* is the real gas price in the U.S., *D\_US\_CRISIS* is the dummy variable that captures economic turmoil, and *GT<sub>i</sub>* represents the variables constructed from Google Trends that capture decision-makers' sentiments (the BEMS and Imm variables). All series, both dependent and independent, were expressed in logarithms, except the dummy variable.
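The transformation applied to each series in Equation (1), the first difference of the natural logarithm, can be sketched as follows. The crossing counts are hypothetical and the function name `log_diff` is illustrative.

```python
import math


def log_diff(series):
    """First difference of the natural logarithm: ln(x_t) - ln(x_{t-1})."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


# Hypothetical monthly crossings in millions.
crossings = [3.6, 3.78, 3.40]
growth = log_diff(crossings)
# growth[0] equals ln(3.78 / 3.6) = ln(1.05), roughly a 4.9% increase.
```

The differenced series is one observation shorter than the original, and for small changes each value approximates the monthly growth rate.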

Two sets of models were estimated: the first was for the entire sample of PoEs, and the second set was divided into three regions (California, Texas, and other). Endogeneity is not expected to be a concern because exchange rates are defined in international financial markets. Therefore, border crossings from Mexico are not expected to directly affect the value of the Mexican peso. Furthermore, these border crossings can have little effect on national gas prices in the U.S. or the evolution of Mexico's national economy.

The log–log specification allowed us to identify the elasticities between the pedestrian and personal vehicle crossings and the independent variables. Due to the wealth effect resulting from the currency movements, it was expected that a depreciation of the local currency would negatively affect Mexicans' propensity to cross northbound. Thus, it was anticipated that the sign of the coefficient estimated would be *β*<sub>1</sub> < 0.

When the Mexican economy expands, its residents have more resources to cross northbound to take advantage of price differences in goods and services on both sides of the border [26]. Indeed, border crossings are explained by Mexican residents' shopping trips [26,44]. Mexicans have greater resources for cross-border shopping trips near the top of the economic cycle. Thus, the coefficient of this variable was predicted to be *β*<sub>2</sub> > 0.

For decades, the government set gas prices in Mexico, with price differentials expected along the border [4,16,19]. Thus, it can be assumed that the U.S. gas price may play a role in the decision to cross the border northbound. If the U.S. gas price increases, Mexicans are less motivated to cross the border. The sign for this coefficient was expected to be *β*<sub>3</sub> < 0.

Based on previous studies that used people's online search activity as a proxy for their sentiment [33,37,45,46], it can be argued that people's sentiment may influence border-crossing fluctuations. An increase in the negative sentiment about Mexicans and immigration in the U.S. will result in fewer border crossings. If the variables based on Google Trends data captured the negative sentiments about border crossings, a negative relationship was anticipated (*θ<sub>i</sub>* < 0).

#### **5. Results**

We begin by revisiting the series' stationarity before estimating the empirical model presented in Section 4 of this study. The primary estimations are then reported by border-crossing type and region, followed by robustness checks.

#### *5.1. Testing for Stationarity*

Table 4 reports the stationarity tests of the main dependent and independent variables in logarithms for both levels and first differences. The Augmented Dickey–Fuller (ADF) unit root tests were conducted, and the results are recorded in the first column of Table 4. The ADF tests the null hypothesis that a time series has a unit root against the alternative hypothesis that it is I(0). The level variables are presented in the first panel, revealing that the data are not stationary. The Phillips–Perron (PP) unit root test was also performed, and the results are reported in the second column of Table 4. For this unit root test, the null and alternative hypotheses were the same as those of the ADF test. The results indicate that the data in levels are not stationary. Therefore, the data were transformed into first differences to address the nonstationarity. The data in first differences are stationary, and the results are reported in the second panel of Table 4. Stationarity tests were also performed for the individual time-series components of the border-crossing sentiment index, which are stationary in first differences (the results are not presented in Table 4 but are available on request).
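The logic of a unit root test can be illustrated with a deliberately simplified sketch. It computes the basic Dickey–Fuller t-statistic (no constant, no lagged differences), which is a stripped-down relative of the ADF test rather than the test reported in Table 4, and applies it to a simulated random walk instead of the study's data. The 5% critical value of about −1.95 is the standard asymptotic value for this no-constant case.

```python
import math
import random


def df_t_statistic(y):
    """t-statistic of rho in the regression dy_t = rho * y_{t-1} + e_t."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, dy)) / sxx
    residuals = [d - rho * x for x, d in zip(lagged, dy)]
    sigma2 = sum(r * r for r in residuals) / (len(dy) - 1)
    return rho / math.sqrt(sigma2 / sxx)


DF_5PCT_CRITICAL = -1.95  # asymptotic value, no constant and no trend

# Simulated random walk: the level has a unit root by construction.
random.seed(0)
shocks = [random.gauss(0, 1) for _ in range(200)]
level, total = [], 0.0
for s in shocks:
    total += s
    level.append(total)
first_diff = [b - a for a, b in zip(level, level[1:])]

t_level = df_t_statistic(level)
t_diff = df_t_statistic(first_diff)
# A statistic below the critical value rejects the unit root null.
print(t_diff < DF_5PCT_CRITICAL)
```

The differenced series yields a strongly negative statistic because it is white noise, which mirrors the pattern described above: nonstationary in levels, stationary in first differences.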

#### *5.2. Main Estimates*

The results of the regressions of Equation (1) for personal vehicle and pedestrian crossings are presented in Table 5. The Newey–West method was used to compute the robust standard errors to account for autocorrelation problems. The results support the negative relationship between border crossings and real exchange rates. The coefficients range from −0.19 to −0.48 (significant at the 10% level or better) for pedestrian and personal vehicle crossings. Therefore, the depreciation of the Mexican currency negatively influences the propensity to cross northbound. An increase of 1% in the exchange rate leads to a decrease of approximately 0.20% in personal vehicle crossings and of close to 0.40% in pedestrian crossings.
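The reading of a log–log coefficient as an elasticity can be checked with a short calculation. The sketch below uses the approximate elasticities quoted above (−0.20 and −0.40) and compares the linear approximation with the exact change implied by a constant-elasticity relationship; the function name is illustrative.

```python
def pct_change_from_elasticity(beta, pct_change_x):
    """Exact percentage change in y implied by y = A * x**beta."""
    return ((1 + pct_change_x / 100) ** beta - 1) * 100


# A 1% rise in the real exchange rate index.
vehicles = pct_change_from_elasticity(-0.20, 1.0)     # about -0.199%
pedestrians = pct_change_from_elasticity(-0.40, 1.0)  # about -0.397%
```

For small changes the exact figure is nearly identical to the coefficient itself, which is why the coefficients can be read directly as percentage responses.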

**Table 4.** Unit root tests.


Note: *p*-values are reported in parentheses. ADF refers to the Augmented Dickey–Fuller unit root test, and PP refers to the Phillips–Perron unit root test. Source: Own estimations.

**Table 5.** Personal vehicles and pedestrians OLS estimations (period: 2004–2020M02).


Note: The standard errors reported are robust to heteroscedasticity. \*, \*\*, and \*\*\* denote significance levels of 10%, 5%, and 1%, respectively.

The coefficient of the IGAE variable is positive (1% significance level), as expected. Thus, Mexicans tend to cross northbound more frequently when they have more resources. With an increase of 1% in the IGAE indicator, increases close to 1% in personal vehicle crossings and 1.4% in pedestrian crossings are expected. Thus, the elasticities for IGAE are somewhat higher for pedestrian crossings than for personal vehicles. These differences are statistically significant (1% significance level). To conduct this analysis, Stata's suest-based test of equality of coefficients was used, comparing the d(lnIGAE) coefficients of model 1 with model 4, model 2 with model 5, and model 3 with model 6 (Table 5). Moreover, there is no evidence that the price of gasoline in the U.S. is a motive for Mexican residents to cross the border.
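The idea behind a test of equality of coefficients across two models can be sketched as a Wald-type z-test. This is not Stata's `suest` procedure, which also estimates the covariance between the two models' coefficients; here that covariance is a parameter defaulting to zero. The point estimates (1.39 and 1.09) are the IGAE elasticities reported in this study, whereas the standard errors of 0.10 are hypothetical values chosen only for illustration.

```python
import math


def wald_equal_coefficients(b1, se1, b2, se2, cov12=0.0):
    """Two-sided z-test of H0: b1 == b2, given standard errors and covariance."""
    variance_of_difference = se1 ** 2 + se2 ** 2 - 2 * cov12
    z = (b1 - b2) / math.sqrt(variance_of_difference)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Hypothetical standard errors; the result says nothing about the paper's tests.
z, p = wald_equal_coefficients(1.39, 0.10, 1.09, 0.10)
```

A positive covariance between the two estimates shrinks the variance of the difference, so ignoring it, as this sketch does by default, makes the test more conservative.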

In Table 5, the BEMS index variable is added to models 1 and 4, the immigration variable is added to models 2 and 5, and both variables are included in models 3 and 6. The coefficients of both variables are negative, as predicted. An increase of 1% in the BEMS index leads to a 0.02% decrease in border crossings. Moreover, a 1% increase in Imm leads to an approximately 0.07% decrease in northbound border crossings. Comparing the d(ln Imm) coefficients of models 2 and 5 in Table 5, which are significant at the 10% level, shows that the Imm elasticity is slightly larger for pedestrian crossings than for personal vehicles. These differences are statistically significant (5% significance level) according to Stata's suest-based test of equality of coefficients. After examining models 3 and 6, we find that the coefficients of Imm are larger than the coefficients of the BEMS index for both pedestrian and personal vehicle crossings. These differences are also statistically significant (5% significance level) according to the same test.

Table 6 (personal vehicle crossings) and Table 7 (pedestrian crossings) report the results for the three regions into which the PoEs are grouped. The BEMS index variable is included in models 1, 4, and 7 in Tables 6 and 7; the Imm variable in models 2, 5, and 8; and both variables in models 3, 6, and 9. Analyzing both tables yields some interesting insights. First, in Texas, when Imm is added, the relationship between personal vehicle crossings and the real exchange rate is not statistically significant, and for the pedestrian models, the significance decreases. Moreover, for the personal vehicle models in the other two regions, the significance decreases. However, the statistical significance level of the real exchange rate does not change when Imm is included in the pedestrian crossing models for California and the other region.

If Imm captures part of a hostile immigration-related environment, these results suggest that an anti-immigrant environment weakens the economic motivation for crossing, especially at Texas PoEs. The preceding findings point to a possible relationship between the real exchange rate, the anti-immigrant environment, and border-crossing flows. However, such a relationship should be viewed with caution, as we provide no strong evidence to support it, and additional research is needed to examine this argument.

If tourism and shopping are two of the main reasons for Mexicans to cross the border [4,26], an increase in an anti-immigrant environment may diminish the economy of border cities.


**Table 6.** OLS regressions results of personal vehicle crossings clustered by regions (2004–2020M02).

Note: The standard errors reported are robust to heteroscedasticity. \*, \*\*, and \*\*\* denote 10%, 5%, and 1% significance levels, respectively.


**Table 7.** Pedestrian crossings OLS regression results clustered by regions (period: 2004–2020M02).

Note: Standard errors reported are robust to heteroscedasticity. The symbols \*, \*\*, and \*\*\* refer to 10%, 5%, and 1% significance levels, respectively.

However, in terms of elasticities, a 1% increase in Imm in Texas results in a 0.127% decline in pedestrian crossings, which is slightly greater than the 0.090% decline in personal vehicle crossings. These differences are statistically significant (5% significance level) according to Stata's suest-based test of equality of coefficients. This indicates that pedestrian crossings are more sensitive to changes in Imm.

Second, in all regions, the Mexican economic cycle represented by IGAE is statistically significant at the 1% level. As expected, when Mexicans have more resources, they cross the border more frequently. However, gas price is not statistically significant, suggesting that most people who cross the border may be motivated by shopping, tourism, or leisure activities rather than purchasing gasoline.

Lastly, except for the pedestrian models in California (Table 7), the BEMS index is statistically significant in all models. This might imply that economic motivators may be more important in influencing pedestrians' border-crossing decisions in California than in the other regions. However, future research may further analyze this line of reasoning. Comparing model 4 in Tables 6 and 7 shows that, in Texas, the BEMS index elasticities are slightly larger in magnitude for pedestrian crossings (−0.03) than for personal vehicles (−0.02). When both sentiment variables are added (models 3 and 6 in Tables 6 and 7), the Imm coefficient has a higher value than the BEMS index coefficient in California and Texas for both types of crossings. These differences are statistically significant (10% significance level in California, 5% significance level in Texas) according to Stata's suest-based test of equality of coefficients.

These results might suggest that people's sentiments play a role in their decision to cross the border. Imm is significant in all models and regions, especially in California and Texas. If this proxy captures some of the variations in the anti-immigrant sentiment, the findings might imply that the hostile anti-immigration atmosphere may have a negative impact on border crossings and the economies of U.S. border cities.

#### *5.3. Robustness Checks*

Rolling regression analysis was conducted as a robustness check. Like the main estimations, this method analyzes the relationship between the dependent and independent variables; unlike them, it re-estimates the model over a window of fixed size that moves progressively along the sample period. A window size of 24 observations was selected in this study, generating 174 subsamples, which were used to perform the rolling regression with a step of 1 (windows of 12 and 48 observations were also tested, yielding similar results that can be provided upon request). Hence, the window was two years long and moved forward one month at a time. This analysis was intended to evaluate the behavior of the coefficients of the BEMS and Imm variables using the following specifications:

$$\Delta \ln(PV\_t) = \alpha\_i + \beta\_1 \Delta \ln(\textit{Exch rate})\_t + \beta\_2 \Delta \ln(GR)\_t + \beta\_3 \Delta \ln(GAS)\_t + \sum\_{i=1}^{n} \theta\_i \Delta \ln(GT\_i)\_t + e\_i \tag{2}$$

$$
\Delta \ln(PD\_t) = \alpha\_i + \beta\_1 \Delta \ln(\textit{Exch rate})\_t + \beta\_2 \Delta \ln(GR)\_t + \sum\_{i=1}^{n} \theta\_i \Delta \ln(GT\_i)\_t + e\_i \tag{3}
$$

where the dependent variables *PV<sub>t</sub>* and *PD<sub>t</sub>* denote the total number of personal vehicle and pedestrian crossings in month *t*, respectively. In Equations (2) and (3), *Exch rate* represents the real exchange rate and *GR* denotes the Mexican economy growth rate; in Equation (2), *GAS* is the U.S. real gas price; and *GT<sub>i</sub>* denotes the BEMS and Imm variables.
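The mechanics of a rolling regression can be sketched as follows. For brevity, the sketch fits a single-regressor OLS slope in each window, whereas Equations (2) and (3) include several regressors; the data are synthetic and the function names are illustrative.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def rolling_slopes(x, y, window, step=1):
    """Re-estimate the slope on each window of consecutive observations."""
    starts = range(0, len(x) - window + 1, step)
    return [ols_slope(x[s:s + window], y[s:s + window]) for s in starts]


# Synthetic example: 10 observations and a window of 4 give 10 - 4 + 1 = 7 estimates.
x = [float(v) for v in range(10)]
y = [2.0 * v + 1.0 for v in x]
slopes = rolling_slopes(x, y, window=4)
print(len(slopes))  # 7
```

With a step of 1, a sample of *n* observations and a window of *w* yields *n* − *w* + 1 coefficient estimates, which can then be plotted against the window's end date to show how the relationship evolves.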

Figure 4a depicts the coefficients for the first difference of the BEMS logarithm for personal vehicle crossings. Throughout the period, the BEMS had a negative relationship with personal vehicles. However, short-run episodes with positive coefficients arose, such as the summer of 2012 and the 2013 and 2014 winter seasons. Pedestrian crossings followed a similar pattern (Figure 4c). Figure 4a depicts two main downward trends, one from 2006 to 2008 and the other from 2014 to 2019, with the latter coinciding with the Donald Trump Administration; a similar pattern was found for pedestrian crossings. The mean of the d(ln BEMS) coefficients for personal vehicle (pedestrian) crossings was −0.031 (−0.041), with a standard deviation of 0.020 (0.027), suggesting that pedestrian crossings are slightly more sensitive to fluctuations in the BEMS index than personal vehicles.


Figure 4b reveals that the coefficients of the first difference of the Imm logarithm for personal vehicle crossings are negative, except from January 2006 to May 2007 and two subperiods in 2010. Two rising trends were followed by major decreases in 2006 and 2010. In addition, a low point is observed in 2014, coinciding with the 2014 immigration crisis, when an increasing number of Central American families sought asylum in the U.S. due to violence and persecution [47]. Similar behavior is observed in Figure 4d for pedestrian crossings. The coefficients are more stable in recent years, as seen in Figure 4b for personal vehicle crossings, whereas pedestrian crossings (Figure 4d) experienced a downward trend. In the case of personal vehicle (pedestrian) crossings, the mean of the coefficients of d(ln Imm) was −0.085 (−0.125), with a standard deviation of 0.059 (0.094). These findings might imply that pedestrian crossings are more responsive to Imm.

The four panels in Figure 4 depict a rise in the elasticity of border crossings with respect to the BEMS and Imm variables during the 2008 and 2012 economic slumps, which are visible as downtrends of the IGAE indicator in Figure 3. This suggests that sentiments may have a larger influence on the choice to cross the border during economic turmoil.

#### **6. Discussion**

The topics examined in this study may be relevant to those overseeing the economic development of the US–Mexico border regions. This manuscript reveals that the U.S. authorities can have an impact on Mexicans' intention to cross the border based on their perception of new developments in U.S. immigration policies. After controlling for economic factors, such as the real exchange rate and economic activity, border crossings are also affected by people's sentiments toward new immigration policies. Even when those policies have, in principle, no impact on legal immigration, how people perceive policy changes affects their willingness to cross the border northbound. The findings are relevant for communities in the U.S. that receive a significant number of visitors from Mexico.

This study also offers some potential lines of further research. COVID-19 has changed the dynamics of border crossings. Mexican pedestrians and personal vehicle users now have new reasons to visit the U.S. border regions because the U.S. government is ahead of the Mexican government in COVID-19 vaccination. Moreover, the situation has not precluded Mexican citizens from obtaining a vaccine in the U.S. before receiving it in their own country. The U.S.–Mexican border was closed in March 2020 and reopened in November 2021. The reopening offers new motives for Mexicans to cross: to look for booster shots, to be immunized with the vaccine of their choice, or to obtain vaccines for the underage population, whom the Mexican government has been reluctant to include in its vaccination policies. These new motives can also complement the previous motives for crossing discussed in this study. Indeed, the impact of closing and reopening the border can also offer an interesting setting for testing hypotheses about the nature of border crossings along the US–Mexico border.

#### **7. Conclusions**

This study tested the hypothesis that the anti-immigrant environment in the U.S. may influence the sentiment of economic agents and thus affect the crossings along the US–Mexico border. Unlike previous studies, variables constructed from online search volume data were used. In addition, a rolling regression analysis was performed to examine the relationship between the sentiment variables and border crossings over time. Finally, all the PoEs along the border were included and grouped into three geographic regions.

The elasticities obtained suggest a negative relationship between the real exchange rate and both types of border crossings, which is consistent with previous literature indicating that the depreciation of the Mexican peso has an adverse effect on border crossings. The influence of the Mexican economic cycle was slightly larger for pedestrian crossings (1.39%) than for personal vehicle crossings (1.09%). Therefore, the border is crossed more frequently when the Mexican economy grows.

When Imm was included, the relationship between the real exchange rate and personal vehicle crossings was not statistically significant in Texas; this might imply that an anti-immigrant environment reduces the economic incentive for crossing. In contrast, for the pedestrian crossings in California, the inclusion of Imm did not reduce the statistical significance.

Following the studies of Ettredge et al. [39] and Guzman [42], it can be argued that the BEMS index variable captures people's sentiment toward crossing the border at the aggregate level. The BEMS was statistically significant in all models for both types of crossings, except for pedestrians in California, which might indicate that the other factors analyzed have a greater influence in this region than the sentiment captured by the BEMS.

As suggested by Baruca and Zolfagharian [6], if a hedonic shopping motive drives customers to cross the border to pursue fun and enjoyment, the preceding findings may have some practical implications. Border crossings in Texas are more sensitive to changes in the anti-immigration environment and people's sentiment, as captured by the BEMS index; hence, it is vital to implement public policies that enhance a friendlier environment for those interested in crossing the border because of the economic impact of Mexicans' shopping trips on the U.S. border-city economies [25].

The rolling regression results demonstrate that the relationship between the sentiment proxy variables and border crossings is negative, except for short-run subperiods in which it is positive, although with small coefficients. Moreover, pedestrian crossings are more sensitive to changes in the sentiment proxy variables than personal vehicle crossings. In addition, the elasticities of border crossings with respect to the sentiment proxy variables increase during economic turmoil.

**Author Contributions:** Conceptualization and methodology, R.C., F.G.-F. and E.S.; software, F.G.-F.; validation and formal analysis, R.C., F.G.-F. and E.S.; data curation, F.G.-F.; writing—original draft preparation, F.G.-F.; writing—review and editing, R.C., F.G.-F. and E.S.; supervision, R.C. and E.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

#### **References**


## *Article* **Isolating the Role of the Transport System in Individual Accessibility Differences: A Space-Time Transport Performance Measure**

**Alberto Dianin 1,2,\*, Michael Gidam and Georg Hauger 1,\***


**Abstract:** Accessibility differences across individuals are a core topic in the transport equity debate. Space-Time Accessibility measures (*STAs*) have often been used to show such differences, given their sensitivity to individual spatial and temporal constraints. However, given their complexity, *STAs* cannot properly isolate the specific role of the transport system in individual accessibility differences, since it is mixed with several other spatial, individual and temporal factors. To isolate the role of the transport system, this study introduces a Space-Time Transport Performance measure (*STTP*) that (a) is grounded on the individual daily schedule of fixed activities, (b) calculates the generalised transport costs each individual has to bear to perform such a schedule, and (c) weights them against the Euclidean distance between the activities of such a schedule. *STTP* is tested together with *STA* for a small sample of individuals living and performing their daily activities within the 22nd district of Vienna. This test provides two main findings: first, individual differences registered by *STTP* tend to be smaller than those highlighted by *STA*, in line with the former's narrower and transport-specific approach. Second, individuals with the highest *STA* do not necessarily register the highest *STTP* (and vice versa). Indeed, some may experience limited transport performances when running their mandatory daily schedule, while registering a high degree of access to discretionary activities according to their constraints and the opportunities at their disposal (and vice versa). Considering these results, *STTP* may be seen as a complementary indicator to be used together with *STA* to analyse both general and transport-specific individual accessibility differences. Its role is particularly important for transport policy makers, who should understand which accessibility differences are directly linked to the performances of the transport system and could be remediated through transport policies.

**Keywords:** transport equity; distributional analysis; accessibility; space-time model; transport policy

#### **1. Introduction**

As highlighted by van Wee and Mouter [1], "in the transport policy literature, there is consensus that 'sound' policies have to meet three criteria: they should be effective, efficient and fair" [2]. Effectiveness and efficiency have received significant attention in the last decades [1], while the same does not apply to fairness except for contributions addressing social exclusion (e.g., ref. [3]). In recent years, the attention to transport fairness has increased thanks to the growing importance of inequality reduction at the international level, e.g., among the Sustainable Development Goals of the United Nations [4]. The lack of studies in this field is linked to the normative nature of fairness, which makes it difficult to measure and to apply in cost–benefit analysis (one of the most widespread policy assessment tools [1,5]). Due to this normative issue, most studies in transport fairness focus on distributional analyses, i.e., how transport effects (such as air pollution variations or safety variations) are distributed over people [1,6]. One of the most addressed effects is the variation of *individual accessibility differences* [5]. Indeed, accessibility is one of the critical pillars for a sustainable mobility paradigm [7]. Its improvement is one of the core concerns for transportation ministries throughout the world [8].

**Citation:** Dianin, A.; Gidam, M.; Hauger, G. Isolating the Role of the Transport System in Individual Accessibility Differences: A Space-Time Transport Performance Measure. *Appl. Sci.* **2022**, *12*, 3309. https://doi.org/10.3390/app12073309

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti, Dimitrios S. Paraforos and Stefania Lanza

Received: 18 February 2022 Accepted: 21 March 2022 Published: 24 March 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

However, most accessibility measures developed in the literature are not suitable for pointing out such individual differences, since they focus on the physical separation among places and overlook the accessibility differences that could exist among people living in the same area (e.g., because of their different modal restrictions or daily schedules [9]). To fill this gap, so-called *person-based* accessibility measures have been developed [10,11]. Among them, the *Space-Time Accessibility* measure (*STA*) is one of the most widespread given its sensitiveness to individual space-time constraints [12–14]. Although *STA* is very suitable for investigating individual accessibility differences in general, its high complexity does not allow a clear understanding of the specific role the transport system plays in such differences, which are fed by a multitude of spatial, transport, individual and temporal factors. This is a relevant limit of *STA*, especially from the perspective of transport policy makers, who should introduce transport policies aimed at reducing such differences. Indeed, accessibility differences are often unavoidable and the transport system cannot remediate them [1]. For instance, people living close to a large facility (e.g., a hospital) inevitably have better access to it than people living far away. This is unavoidable due to land-use constraints (hospitals cannot be built in every municipality), and transport policies cannot eliminate distance. Given this issue, it is necessary to complement *STA* with transport performance measures able to isolate the specific role of the transport system in individual accessibility differences, which could be avoided or remediated through transport policy interventions. For this purpose, this paper introduces a so-called *Space-Time Transport Performance* measure (hereinafter *STTP*). By "transport performance" we mean a set of transport indicators that describe the efficiency of the transport system, such as commercial speed, network capacity, service period, average waiting or transfer time, or monetary cost of travel.

The rest of the article is organised as follows. Section 2 reviews the main place- and person-based accessibility measures used in the literature and points out the elements that prevent *STA* from isolating the role of the transport system in individual accessibility differences. On this basis, Section 3 introduces *STTP* by describing its key features and its computation process. *STTP* is then tested in Section 4 together with *STA* to determine the complementary results that are achievable with the proposed measure. Section 4 also discusses the limits of *STTP*. Section 5 concludes the contribution by highlighting potential contexts of application.

#### **2. Place- and Person-Based Accessibility Measures**

In general terms, place-based measures calculate accessibility for a location by assuming that all people in that location register the same accessibility. Conversely, person-based calculations analyse accessibility for individuals living in the same area to understand how accessibility varies across them [15]. Sections 2.1 and 2.2 summarise the main place- and person-based measures developed and used in the literature, while Table 1 displays their definition, mathematical formulation and main conceptual and operational pros and cons. Afterwards, Section 2.3 discusses the factors that prevent *STA* from isolating the role of the transport system in individual accessibility differences.

#### *2.1. Place-Based Measures*

Place-based measures calculate how easy it is for people departing from a place to reach opportunities located in another [9,16]. Following this definition, three main types of place-based measures have been developed: *cumulative-opportunity*, *gravity-based* and *adapted gravity-based* measures ([9,17]; Table 1). All of these are a function of two core elements: (a) an attraction factor given by the amount, spatial distribution and quality of opportunities to access; and (b) an impedance factor given by the effort needed to reach these opportunities [18]. These are combined in different ways. The *cumulative-opportunity measure* (e.g., refs. [19–21]) counts the number of opportunities reachable from an origin location within a predetermined threshold, usually defined in terms of travel time or cost. It is a very straightforward approach, but it shows some limits since counted opportunities have the same importance regardless of the effort needed to reach them. The *gravity-based measure* (e.g., refs. [22–24]) addresses these limits since it calculates the accessibility of an origin as a function of the number and importance of opportunities at the destination, weighed against the travel effort needed to reach them. However, it neglects the competition between the demand for and supply of opportunities. The *adapted gravity-based measure* (e.g., refs. [25–27]) incorporates this competition through a doubly constrained spatial interaction model with two mutually dependent balancing factors [17].
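The first two measures can be illustrated with a minimal sketch. It assumes the common textbook forms (a step function for the cumulative-opportunity count and a negative exponential impedance for the gravity-based sum). The function names, the `beta` parameter and the sample values are hypothetical and are not taken from the studies cited above.

```python
import math

def cumulative_opportunities(opportunities, travel_costs, threshold):
    """Count the opportunities O_j whose travel cost TC_ij is within the threshold T."""
    return sum(o for o, tc in zip(opportunities, travel_costs) if tc <= threshold)

def gravity_accessibility(opportunities, travel_costs, beta=0.1):
    """Sum the opportunities O_j weighted by a negative exponential impedance f(TC_ij).

    beta is an assumed impedance coefficient; other forms of f (linear,
    Gaussian, logistic) could be substituted.
    """
    return sum(o * math.exp(-beta * tc) for o, tc in zip(opportunities, travel_costs))

# Hypothetical origin with three destinations: opportunity counts and travel minutes.
O = [10, 5, 20]
TC = [5.0, 15.0, 30.0]

print(cumulative_opportunities(O, TC, threshold=20))  # opportunities within 20 min
print(gravity_accessibility(O, TC))                   # distance-decayed opportunities
```

The cumulative count treats the 5-minute and the 15-minute destination alike, whereas the gravity-based sum discounts the farther one, which is the limitation and the remedy described above.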

These three measures have two main limitations that person-based measures aim to overcome [13,28]. First, they assume that all individuals who depart from the same origin location experience the same level of accessibility regardless of their different spatial, temporal and modal constraints. Second, they calculate accessibility for a single reference location (typically home place) overlooking the fact that people generally perform a sequence of daily activities that are differently located in space and time, and this sequence affects their accessibility.

#### *2.2. Person-Based Measures*

Three main types of person-based measures address the gaps of the place-based measures: the *utility-based*, *individual integral* and *space-time* measures ([11,17]; Table 1). The *utility-based measure* (e.g., refs. [9,29,30]) is grounded in economic theory and calculates accessibility as the maximal economic utility individuals can get from access to spatially distributed opportunities, based on their perception of the utility of the options at their disposal. The *individual integral measure* (e.g., refs. [31–33]) is a gravity-based measure adjusted to be person-specific. The adjustment is performed either by disaggregating data and analysis by, e.g., trip purposes, transport modes, age or income groups; or by using a non-zonal method such as the point-based approach, which allows a focus on specific point locations and the measurement of point-to-point travel costs at the individual level. Although person-based, this last measure focuses on a single reference location, and it overlooks the spatio-temporal constraints affecting people on a daily basis [34]. The *space-time measure* addresses these limitations in the most comprehensive manner (e.g., refs. [14,35,36]).

For this reason, this is considered an effective approach to measure accessibility at the individual level and to discuss individual accessibility differences [10]. It derives from the time geography framework elaborated by Hägerstrand [37] and focuses on the set of discretionary opportunities (i.e., non-mandatory daily activities) that individuals could reach on a daily basis given the spatio-temporal constraints posed by their fixed daily activity chain (i.e., mandatory daily activities) [38]. This set is called a Feasibility Opportunity Set (*FOS*), and is obtained in three steps. First, the daily sequence of fixed activities constrained in space and time for a person is schematised. This sequence generates a so-called Space-Time Path (*STPA*). Second, based on the *STPA*, the Potential Path Areas (*PPAs*) are calculated for each pair of consecutive fixed activities in the *STPA*. Each *PPA* includes all the locations that an individual could visit between two subsequent fixed activities, given the mandatory departure time from the former, the mandatory arrival time at the latter, the time needed to travel between them, and the time required to visit such locations. Third, by extending the calculation of the *PPA* to all pairs of sequential fixed activities, the Daily Potential Path Area (*DPPA*) is obtained. All the opportunities that belong to the *DPPA* constitute the *FOS* and define the space-time accessibility measure (*STA*).


\* Where: **Cumulative opportunity:** $i$ is an origin location; $j$ is a destination location; $O_j$ are the opportunities available at destination $j$; $TC_{ij}$ is the cost of travelling from $i$ to $j$; $f(TC_{ij})$ is the travel cost function, which may assume different forms such as linear, Gaussian, logistic or negative exponential; $T$ is the travel cost threshold set in the analysis. **Gravity-based:** $i$, $j$, $O_j$, $TC_{ij}$ and $f(TC_{ij})$ are defined above. **Adapted gravity-based:** $i$, $j$, $O_j$, $TC_{ij}$ and $f(TC_{ij})$ are defined above; $a_i$ is the balancing factor for demand in location $i$; $b_j$ is the balancing factor for supply in location $j$; $D_i$ is the demand for opportunities in $i$. **Utility-based:** $u$ is a user for whom accessibility is calculated; $\lambda$ is the travel cost coefficient; $z$ is one of the choices that $u$ can make; $C_u$ is the set of choices $z$ that $u$ can make; $V_{zu}$ is the systematic utility of the choice $z$ for $u$. **Individual integral:** $u$, $i$, $j$, $O_j$ and $f(TC_{ij})$ are defined above; $TC_{ij}^{u,k}$ is the travel cost for $u$ from $i$ to $j$ by transport mode $k$. **Space-time:** $u$ is defined above; $w_{1-n}$ are the locations of discretionary opportunities; $O_w$ are the discretionary opportunities available in $w_{1-n}$; $DPPA$ is the Daily Potential Path Area; $t$ is the time needed to participate in a discretionary opportunity $O_w$; $t_a$ is the ending time of a fixed activity $a$; $t_{a+1}$ is the starting time of the following fixed activity $a+1$; $d_{a,w}$ is the physical distance between $a$ and $w$; $d_{w,a+1}$ is the physical distance between $w$ and $a+1$; $v$ is the average speed on the transport network.
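Using the space-time symbols defined above, the membership test for a single *PPA* can be sketched as follows. This is a minimal illustration that assumes the usual time-budget condition (travel time to and from the opportunity plus the stay must fit between the end of one fixed activity and the start of the next); it is not a reproduction of the exact formulation in Table 1. Function names, coordinates and the sample values are hypothetical.

```python
import math

def in_ppa(a, w, a_next, t_end_a, t_start_next, stay, speed):
    """Return True if location w can be visited between fixed activities a and a+1.

    a, w, a_next: (x, y) coordinates in metres (assumed projected CRS);
    t_end_a, t_start_next: clock times in minutes; stay: time t needed at w
    in minutes; speed: average network speed v in metres per minute.
    """
    travel = (math.dist(a, w) + math.dist(w, a_next)) / speed
    return travel + stay <= t_start_next - t_end_a

def feasible_opportunity_set(opportunities, **window):
    """Collect the opportunities that fall inside the PPA of one activity pair."""
    return [w for w in opportunities if in_ppa(w=w, **window)]

# Hypothetical 30-minute window between two fixed activities 1 km apart.
window = dict(a=(0, 0), a_next=(1000, 0), t_end_a=0, t_start_next=30,
              stay=10, speed=100)
candidates = [(500, 0), (500, 2000)]
print(feasible_opportunity_set(candidates, **window))
```

Repeating the test for every pair of consecutive fixed activities and taking the union of the results would give the *FOS* over the whole *DPPA*.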

#### *2.3. Limits of STA in Describing the Role of the Transport System*

Thanks to its individual sensitiveness and capacity to comprise all the four accessibility components (land-use, transport, individual and temporal; [17]), *STA* is often adopted in the analysis of accessibility differences (e.g., refs. [10,12,39]). Nevertheless, *STA* presents some limits when it comes to isolating the role of the transport system in individual accessibility differences. In particular:

• **Limited relevance of the transport system performances in the** *FOS***:** *STA* is represented by the *FOS*, which mostly depends on the amount and spatial distribution of the discretionary opportunities and the spatio-temporal constraints of the *STPA* [11]. Therefore, *STA* focuses highly on spatial and temporal accessibility components and less on transport performances [17]. This is a gap when the aim is to isolate the specific role of a transport system in individual accessibility differences. For instance, let us assume two people who both have one fixed activity during their day, depart from and head to the same locations at the same time, and use the same transport system with the same performances. The former works full-time while the latter works part-time. According to the *STA* concept, the different time constraints of the two individuals would lead to an accessibility difference since the part-time worker has more opportunities to engage in discretionary activities than the full-time worker. However, the performance of the transport system plays no role in this accessibility difference.

• **Unsuitability of the** *FOS* **to represent accessibility differences:** As stressed by Pritchard et al. [40,41], the choice of the accessibility measure may significantly influence the outcomes of the analysis. Therefore, it is crucial to deploy a measure that is as suitable as possible to discuss accessibility distribution. Specifically, the estimate should represent an optimisation factor for the observed individuals, i.e., a good they generally aim to increase [42]. This is the case, e.g., with income, which is one of the critical indicators for distributional analyses in socio-economic sciences [43]. *STA* cannot be easily labelled as an optimisation factor since it is not straightforward to state that individuals aim to maximise the number of discretionary opportunities they could reach on a daily basis. For instance, a person could have a small *FOS* because (s)he has a tight schedule of fixed activities and no room to engage in discretionary ones. Nevertheless, (s)he may have little interest in further activities. At the same time, the transport system could be efficient in allowing him or her to reach all the fixed activities with a reasonable effort [5].

Based on these limits, we introduce the so-called Space-Time Transport Performance measure (*STTP*) in order to complement *STA* by isolating the role of the transport system in individual accessibility differences.

#### **3. Space-Time Transport Performance Measure (***STTP***)**

#### *3.1. Key Features of STTP*

*STTP* aims to measure the performances of the transport system based on the spatiotemporal and individual constraints characterising the daily life of each individual. To meet this purpose, *STTP* grounds on three key features, described below in detail.


belonging to an individual's daily schedule (see Section 3.2 for further details). This choice makes *STTP* a measure of transport performance rather than a measure of the daily transport effort of individuals (which would also be influenced by the distance covered each day).

• **Estimation of** *IDTC* **based on temporal and individual constraints:** To incorporate the temporal constraints, *STTP* calculates *IDTC* by considering the actual location and timing of the fixed activities performed daily by an individual. The individual constraints are incorporated in the *IDTC* computation in two ways. First, the actual modal choices of individuals for each daily trip are considered according to individual constraints such as the ability to drive or car ownership. Second, the non-monetary cost part of *IDTC* (i.e., travel-time costs) is estimated at the individual level based on income (as described in detail in Section 3.2).

These three features shape *STTP* as a transport performance indicator that stems from spatio-temporal and individual constraints. On the one hand, this allows *STTP* to isolate the role of the transport system in individual accessibility. On the other hand, it suggests that *STA* and *STTP* should be deployed together to get both a comprehensive picture of space-time accessibility and narrower information on the transport component. Figure 1 summarises the relation between *STA* and *STTP*, while Section 3.2 describes how *STTP* is calculated.

**Figure 1.** Relation between *STA* (**left**) and *STTP* (**right**) and their key features.

#### *3.2. Calculation of STTP*

*STTP* (Formula (1)) is calculated by following three steps: (A) the setup of the *STPA*; (B) the calculation of the *IDTC* figures; (C) the calculation of the *IDD* figures.

(A) The setup of the *STPA* is made in the same way as for *STA*. The daily sequence of fixed activities ($a_{1-n}$) constrained in space and time for the observed individual is schematised. This includes the location where each $a$ takes place (address); its category (home-stay, work, education or other); the duration of each $a$ given by its mandatory starting and ending time; the transport mode(s) usually used to travel between each pair of subsequent activities; and the degree of fixity of each $a$ according to the flexibility of its location and timing. The fixity is expressed on a 1–5 Likert scale, where 1 indicates maximum flexibility of the location and/or timing of $a$, while 5 indicates minimum flexibility. The data needed to reconstruct the *STPA* are collected from observed individuals using travel diaries. Interviewed individuals are asked to fill them out by considering a typical weekday of their daily life. Table 2 shows an illustrative (and fictional) *STPA*.
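The attributes listed in step (A) map naturally onto a small record type. The following is a minimal sketch of how one *STPA* could be stored; the class name, field names, coordinate convention and sample values are assumptions made for illustration and do not come from the travel diaries used in the study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CATEGORIES = {"home-stay", "work", "education", "other"}

@dataclass
class FixedActivity:
    location: Tuple[float, float]  # hypothetical projected coordinates in metres
    category: str                  # one of CATEGORIES
    start: str                     # mandatory starting time, "HH:MM"
    end: str                       # mandatory ending time, "HH:MM"
    fixity: int                    # Likert scale: 1 = most flexible, 5 = most fixed
    mode: Optional[str] = None     # mode used to reach the activity; None for the first

    def __post_init__(self):
        # Reject values outside the categories and the 1-5 scale described in the text.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 1 <= self.fixity <= 5:
            raise ValueError("fixity must be between 1 and 5")

# A fictional STPA: home, work, home.
stpa = [
    FixedActivity((0, 0), "home-stay", "00:00", "07:30", 5),
    FixedActivity((3000, 4000), "work", "08:00", "17:00", 4, mode="car"),
    FixedActivity((0, 0), "home-stay", "17:30", "24:00", 3, mode="car"),
]
print(len(stpa), "fixed activities")
```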


**Table 2.** Exemplificative *STPA* to be set up for the calculation of *STTP*.

**\* Notes:** the transport mode used to reach the activity.

(B) Once the *STPA* is set, *IDTC* is calculated (Formula (2)). *IDTC* is the sum of the Individual Transport Costs incurred by an individual for each daily trip performed by the transport mode $k$ between each pair of subsequent fixed activities $a$, $a+1$ ($ITC^{k}_{a,a+1}$). Each $ITC^{k}_{a,a+1}$ value is calculated through a generalised cost function, including a series of monetary and non-monetary (but monetizable) costs [42]. This consists of the monetary cost of travel (*Cm*), the cost of the in-vehicle travel time (*Civtt*), and the cost of out-of-vehicle travel time (*Covtt*). *Cm* encompasses the costs for the usage of infrastructures (e.g., tolls and parking fees), the operating costs of vehicles (e.g., fuel, usage-related depreciation and insurance), and the costs for access to services (e.g., public transport; from now on PT). *Civtt* covers the cost of the time spent within private, shared or pooled vehicles. *Covtt* covers the cost of the time to access the first transport system (first mile), the waiting time for transport services, the transfer time among transport services, and the time to reach the final destination (last mile [42]). *Civtt* and *Covtt* are monetised based on unitary Values of Travel Time (*VTT*). As demonstrated in the literature, *VTT* may vary by income, country, travel purpose, mode of transport and distance. It also depends on the approach with which it is estimated (e.g., stated vs. revealed preference surveys [44]). The wage rate method is used for *STTP*: different wage rate shares are assumed depending on the country of investigation, travel purpose, and transport mode. Moreover, the actual wage rates of observed individuals are used to make the estimation individual.

(C) Once *IDTC* is calculated, it has to be weighed against *IDD* (Formula (3)). This is the sum of the distances measured between each pair of subsequent fixed activities $a$, $a+1$ ($D_{a,a+1}$). Each $D_{a,a+1}$ value is measured as the Euclidean distance and not as the distance travelled along the transport network. Indeed, the travelled distance may be influenced by the design of the transport system and not only by the land-use system. For instance, this is the case with fast transport systems such as motorways and high-speed railways, which tend to generate many more detours than slower systems (a phenomenon called "spatial inversion" by Bunge [45]). This detour is an aspect that transport planners can potentially address, e.g., by modifying the shape of the PT lines and the distribution of stops. Therefore, it is a factor to be included in the accessibility computation. Conversely, the Euclidean distance solely depends on the land-use system (i.e., the amount and location of opportunities in space). It is therefore used to weight *IDTC* and point out the role of the transport system in accessibility differences. Figure 2 summarises the calculation process for *STA* (in red) and *STTP* (in blue) based on *STPA* (in black), which is the common element between them.

$$STTP = \frac{IDTC}{IDD} \tag{1}$$

$$IDTC = \sum_{a_1}^{a_n} ITC^{k}_{a,a+1} \quad \text{with} \quad ITC^{k}_{a,a+1} = Cm^{k}_{a,a+1} + Civtt^{k}_{a,a+1} + Covtt^{k}_{a,a+1} \tag{2}$$

$$IDD = \sum_{a_1}^{a_n} D_{a,a+1} \tag{3}$$

where: $a_{1-n}$ are the fixed activities performed by an individual on a daily basis, $k$ is the mode(s) of transport used by an individual between each pair of subsequent $a$s, $IDTC$ is the Individual Daily Travel Cost incurred by an individual on a daily basis, $IDD$ is the Individual Daily Distance between the $a$s performed by an individual on a daily basis, $ITC^{k}_{a,a+1}$ is the individual transport cost by mode $k$ between each pair of subsequent $a$s, $Cm^{k}_{a,a+1}$ is the monetary cost of transport by mode $k$ between each pair of subsequent $a$s, $Civtt^{k}_{a,a+1}$ is the cost of in-vehicle travel time by mode $k$ between each pair of subsequent $a$s, $Covtt^{k}_{a,a+1}$ is the cost of out-of-vehicle travel time by mode $k$ between each pair of subsequent $a$s, and $D_{a,a+1}$ is the Euclidean distance between each pair of subsequent $a$s.
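Formulas (1)–(3) can be sketched in a few lines. The function name, the dictionary keys and the sample figures below are hypothetical. Costs are assumed to be in euros and distances in kilometres, so the result reads as euros per Euclidean kilometre.

```python
def sttp(trips):
    """Compute STTP = IDTC / IDD for one individual's daily trips.

    Each trip is a dict holding the three cost components of ITC (in euros)
    and the Euclidean distance D (in km) between two subsequent fixed activities.
    """
    idtc = sum(t["cm"] + t["civtt"] + t["covtt"] for t in trips)  # Formula (2)
    idd = sum(t["d"] for t in trips)                               # Formula (3)
    if idd == 0:
        raise ValueError("IDD is zero: all fixed activities share one location")
    return idtc / idd                                              # Formula (1)

# Hypothetical day with two trips (e.g., home to work and back).
trips = [
    {"cm": 1.0, "civtt": 2.0, "covtt": 1.0, "d": 4.0},
    {"cm": 1.0, "civtt": 2.5, "covtt": 0.5, "d": 4.0},
]
print(sttp(trips))  # generalised cost per km of Euclidean distance
```

Because the denominator is Euclidean rather than travelled distance, a detour raises the cost components without raising *IDD*, so it shows up as a higher (worse) *STTP*.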

**Figure 2.** Process for the computation of *STPA* (black), *STA* (red) and *STTP* (blue).

#### **4. Joint Test of** *STA* **and** *STTP* **in the City of Vienna**

The test aims to show how *STTP* may lead to complementary results for *STA*, providing insights on the role of the transport system in individual accessibility differences. Similar to other studies focused on the methodological integration of the space-time approach (e.g., refs. [36,46]), we run the test for a small sample of five individuals, for whom *STA* and *STTP* are calculated. We perform such a small test for two reasons: first, because the purpose is to provide a methodological test and not to get statistically relevant results about a specific phenomenon (e.g., gender or income-related accessibility differences). Second, focusing on a few individuals allows a more detailed reflection on results, e.g., pointing out the main differences for each individual (Section 4.3). This would not be feasible with a test involving many individuals, which would be more suitable for statistical comparison. Nevertheless, focusing on such a small test also has some limits, which are discussed in Section 4.4.

#### *4.1. Study Area and STPAs*

The test is run in the City of Vienna, Austria. The analysed individuals (A–E) live and perform their fixed activities within the 22nd district (Donaustadt). This is the second northernmost district of Vienna, with the largest area (ca. 102 km²), the second-highest population (ca. 198,800 inhabitants), and the second-lowest population density out of the 23 city districts (1943 inhabitants/km²). The district is served by 26 bus lines, two light rail lines and two subway lines. Moreover, it is served by a road network that is denser in the core part of the district, which is characterised by a higher urban density, and much sparser at the fringes, where green areas are predominant. Given the heterogeneous availability of transport means, the area represents a suitable case study to explore the individual accessibility differences related to the transport system. Figure 3 displays the road and PT network of the study area, the location of the available opportunities, the location of the home places, and the fixed activities of the individuals.

**Figure 3.** Study area and key locations of the *STPAs* of the individuals A–E.

After defining the study area, the first step for calculating both *STA* and *STTP* is the setup of the *STPAs* (summarised in Table 3 for individuals A–E). Each *STPA* describes the categories of fixed activities performed daily, their location (not listed in Table 3), their starting and ending time, their fixity degree (expressed with a 1–5 Likert scale), and the transport mode(s) used to reach them. Individual A is a full-time worker who takes their child to school before reaching the workplace, stays at the workplace until late afternoon, and then comes back home in the evening. (S)He always travels by car. Individual B is a part-time worker who works in the morning and picks up their child after school on the way home at lunchtime. In the afternoon, (s)he has to stay at home for some household duties in the timespans 13:00–15:00 and after 17:30, while (s)he has free time to engage in discretionary activities between 15:00 and 17:30. (S)He travels by car for all the mandatory trips. Individual C is a part-time worker too. In the early morning and at lunchtime (s)he takes their child to school and back home on foot. In between, (s)he has some household duties to perform at home (from 08:20 to 09:30 and from 12:00 to 14:00) and free time to engage in discretionary activities (from 09:30 to 12:00). From the afternoon until the evening, (s)he works part-time. (S)He travels by PT to and from the workplace. Individual D is a pensioner who visits the hospital in the morning on a daily basis and travels to and from the hospital by PT. In the afternoon, (s)he has free time to engage in discretionary activities (between 15:00 and 18:00), while (s)he has to stay at home for the other hours of the afternoon. Finally, Individual E is a teenager who goes to school. In the morning, a parent takes them to school by car. In the early afternoon, (s)he has to come back home by PT and stays at home for their mandatory activities until 18:00. Afterwards, (s)he has free time in the late afternoon to engage in discretionary activities, before having to be at home again at 20:00.


**Table 3.** The *STPAs* of individuals A–E.

**Notes: Activity category**: We define four activity categories: Home-stay, Work, Education, and Other. For individuals B–E, two consecutive home-stays occur in their *STPA*s. This is because they have two consecutive mandatory activities to perform at home and a free-time span in between to potentially engage in discretionary activities. **Activity timing**: This indicates the starting and ending time of each activity. The first activity always starts at 00:00 and the last one always ends at 24:00. **Activity fixity degree**: 1–5 Likert scale, with 1 indicating maximum flexibility and 5 maximum fixity. **Transport mode**: This indicates the transport mode(s) used to reach the related activity. We define four transport modes: Car driver, Car passenger, PT, and Walking. Since the first activity is always the early-morning home-stay, there is no transport mode assigned to reach it.

#### *4.2. STA and STTP Calculation*

To calculate *STA*, (A) the travel-time performances of the transport mode(s) used by the individuals A–E have to be estimated, and (B) the discretionary opportunities in the study area have to be mapped. These steps are implemented in ArcGIS by calculating different route analyses and service-area analyses through the Network Analyst extension. The estimation of these two components is described below in detail.

(A) **Travel-time performances ($tt_{a,a+1}$):** $tt_{a,a+1}$ by car, PT and walking is estimated via GIS by using the GTFS dataset of the Wiener Linien and the Austrian Graphenintegrations-Plattform GIP [47,48]. Road network performances include speed limits, one-way streets, turn prohibitions and actual traffic conditions. PT performances, derived from the time schedules, include travel time between stops and waiting time at the stops. The transfer time between lines or modes is not yet available for the city of Vienna. Therefore, we assume an average value of one minute for buses and light rail and three minutes for the subway, plus the related waiting time. Finally, $tt_{a,a+1}$ by walking is estimated based on the existing network of sidewalks and an assumed walking speed of 5 km/h.

(B) **Discretionary opportunities ($O_w$):** The set of $O_w$ available in the study area is georeferenced using OpenStreetMap as the core data source. It comprises all amenities in the study area apart from workplaces, schools, and other educational facilities. Therefore, it mainly includes groceries, shopping facilities, healthcare facilities, leisure facilities and other services such as post offices and banks. For the *STA* computation, we consider all $O_w$ to have the same importance for all individuals. Additionally, we assume that all $O_w$ need at least a 10-min stay to be considered in the *PPAs*.

To calculate *STTP*, it is necessary to estimate (A) the unitary value of travel time for both $Civtt^{k}_{a,a+1}$ and $Covtt^{k}_{a,a+1}$; (B) the unitary monetary cost for $Cm^{k}_{a,a+1}$; and (C) the Euclidean distances among fixed activities, i.e., $D_{a,a+1}$. These components are also computed in ArcGIS, through route analyses in the Network Analyst extension, and are described below in detail.

(A) **Unitary value of travel time (***VTT***):** *VTT* is estimated with the wage rate method [44]. According to this approach, the value of travel time outside working hours (called Off-The-Clock Travel Time) for the driver is empirically found to be approximately 60% of the wage rate, excluding benefits. This percentage decreases to 45% for passengers (of cars and PT) and increases to 100% for any kind of out-of-vehicle travel time (i.e., walk-access, waiting, and transfer times). These differences depend on the perceived disutility of travel time for different modes of transport. Generally, travel time by PT or as a car passenger has a higher utility since it is possible to make profitable use of that time (e.g., to read, work or relax). When focusing on travel time during working hours (called On-The-Clock Travel Time), a percentage of 100% is considered for any kind of in-vehicle and out-of-vehicle travel time. Since we do not have individual income data at our disposal, we rely on the average hourly wage rates registered in Austria in 2018 for four different categories of people relevant for our case study, i.e., full-time workers (€16.22/h), part-time workers (€13.78/h), pupils (€9.88/h) and pensioners (€8.89/h) [49]. Accordingly, *VTT* is calculated for each individual and transport mode as summarised in Table 4. Combining these values with the travel-time performances, we obtain the $Civtt^{k}_{a,a+1}$ and $Covtt^{k}_{a,a+1}$ figures for each individual.
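The wage rate method described in (A) reduces to a multiplication of the hourly wage by a mode-dependent share. The sketch below uses the wage rates and the Off-The-Clock shares quoted in the text; the dictionary keys and function names are hypothetical, and the resulting figures are computed here rather than copied from Table 4, so rounding may differ.

```python
# Average hourly wage rates (EUR/h) for Austria in 2018, as quoted in the text.
WAGE_RATE = {
    "full-time": 16.22,
    "part-time": 13.78,
    "pupil": 9.88,
    "pensioner": 8.89,
}

# Off-The-Clock shares of the wage rate, as quoted in the text.
SHARE = {"car_driver": 0.60, "passenger": 0.45, "out_of_vehicle": 1.00}

def vtt(category, travel_type):
    """Unitary value of travel time in EUR/h for a person category and travel type."""
    return WAGE_RATE[category] * SHARE[travel_type]

def time_cost(category, travel_type, minutes):
    """Monetised cost (EUR) of a travel-time component lasting the given minutes."""
    return vtt(category, travel_type) * minutes / 60

print(round(vtt("full-time", "car_driver"), 2))             # EUR/h for a driving full-time worker
print(round(time_cost("pensioner", "out_of_vehicle", 30), 2))  # EUR for 30 min of waiting or walking
```

Summing `time_cost` over the in-vehicle and out-of-vehicle parts of a trip would give its *Civtt* and *Covtt* components.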


**Table 4.** *VTT* values applied to the different modes of transport and categories of individuals.
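The wage rate method described above can be illustrated with a minimal sketch. The wage rates and percentages are those quoted in the text; the function names, the dictionary keys and the conversion to €/min are assumptions made for illustration and do not reproduce the ArcGIS workflow used in the study.

```python
# Average hourly wage rates for Austria in 2018, as quoted in the text (EUR/h).
WAGE_RATES = {
    "full_time": 16.22,
    "part_time": 13.78,
    "pupil": 9.88,
    "pensioner": 8.89,
}

# Share of the wage rate applied to off-the-clock travel time (from the text).
# The key names are hypothetical labels chosen for this sketch.
OFF_THE_CLOCK_SHARES = {
    "driver_in_vehicle": 0.60,
    "passenger_in_vehicle": 0.45,  # car passengers and PT users
    "out_of_vehicle": 1.00,        # walk-access, waiting and transfer times
}


def vtt_per_minute(category, time_type, on_the_clock=False):
    """Return the value of travel time in EUR/min for one person category."""
    # On-the-clock travel is valued at 100% of the wage rate for any time type.
    share = 1.0 if on_the_clock else OFF_THE_CLOCK_SHARES[time_type]
    return WAGE_RATES[category] * share / 60.0


def travel_time_cost(category, time_type, minutes, on_the_clock=False):
    """Return the travel-time cost (EUR) of a trip segment lasting `minutes`."""
    return vtt_per_minute(category, time_type, on_the_clock) * minutes
```

For example, `vtt_per_minute("full_time", "driver_in_vehicle")` values one minute of off-the-clock driving by a full-time worker at 60% of €16.22/h divided by 60.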

(B) **Unitary monetary cost (***UMC***):** *UMC* is estimated differently for private cars and PT. For private vehicles, we rely on the average kilometric Vehicle Operating Cost (VOC) for passenger cars in Austria. This includes the average cost of fuel and oil, maintenance and repair, tyres, and distance-dependent depreciation. According to the EU report by Infras [50] and the yearly values provided by ACEA for all EU countries [51], a VOC of €0.42/km is assumed for Austria. This is multiplied by the distance travelled to obtain the $\mathit{Cm}^{k}_{a,a+1}$ figures for each individual travelling by car. As for PT, the transport operator of the city of Vienna offers different yearly subscriptions covering the whole urban transport system [52]. Given the age and mobility habits of the individuals, three subscriptions are considered: the annual ticket for adults (€365/year), for seniors aged 65 and over (€235/year), and for students up to 24 years old (€79/year). According to these fares, a *UMC* of €1/day, €0.64/day and €0.21/day is taken as $\mathit{Cm}^{k}_{a,a+1}$ for individuals C, D and E, respectively.
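The two ways of estimating *UMC* can be sketched as follows. The VOC and the annual fares are taken from the text; the function names and the assumption of a 365-day year for the daily PT cost are illustrative choices. Note that, under this assumption, the student fare works out to about €0.216/day, which the text reports as €0.21/day.

```python
# Kilometric Vehicle Operating Cost assumed for Austria in the text (EUR/km).
VOC_EUR_PER_KM = 0.42

# Annual PT subscriptions for Vienna quoted in the text (EUR/year).
# The key names are hypothetical labels chosen for this sketch.
ANNUAL_PT_FARES = {"adult": 365.0, "senior": 235.0, "student": 79.0}

# Assumption: the annual fare is spread evenly over a 365-day year.
DAYS_PER_YEAR = 365.0


def car_monetary_cost(distance_km):
    """Monetary cost (EUR) of a car trip: VOC times the distance travelled."""
    return VOC_EUR_PER_KM * distance_km


def pt_daily_cost(ticket):
    """Daily monetary cost (EUR/day) of PT for a given annual subscription."""
    return ANNUAL_PT_FARES[ticket] / DAYS_PER_YEAR
```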

(C) **Euclidean distances ($D_{a,a+1}$):** The Euclidean distances are first measured for each pair of consecutive fixed activities and then summed to obtain the total daily Euclidean distance (*IDD*). Each $D_{a,a+1}$ value is obtained via GIS and then aggregated for each individual.
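As a minimal sketch of this step, the total daily Euclidean distance can be obtained by summing the straight-line distances between consecutive fixed-activity locations. The sketch assumes that locations are given as planar coordinates in kilometres (e.g., from a projected coordinate system); the function names are hypothetical, and the study itself derives these values in GIS.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points in the same planar units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def total_daily_euclidean_distance(locations):
    """Sum the distances between consecutive fixed-activity locations (IDD)."""
    return sum(
        euclidean_distance(a, b) for a, b in zip(locations, locations[1:])
    )
```

For a schedule visiting `(0, 0)`, `(3, 4)` and `(3, 0)` in that order, the two segments measure 5 and 4 units, giving a total of 9.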

Table 5 presents the values of the components discussed above for the individuals A–E. Based on these components, Figure 4 displays the *STA* and *STTP* results. Figure 4 (left side) focuses on *STA* by showing the extension of the *DPPA* and the related *FOS* for each individual. For individuals B–E (who have a wide free-time span available for discretionary activities either in the morning or afternoon), results are divided into two clusters. The first includes the *DPPA* and *FOS* resulting from the time available between fixed activities occurring in different locations ($\mathit{DPPA}_{fa}$ and $\mathit{FOS}_{fa}$). The second includes the additional *DPPA* and *FOS* deriving from the free-time span available between consecutive mandatory home stays ($\mathit{PPA}_{hs}$ and $\mathit{FOS}_{hs}$). Figure 4 (right side) shows the results of *STTP*: for each individual, the map displays the $\mathit{ITC}^{k}_{a,a+1}$ and $D_{a,a+1}$ segments, together with the overall *IDTC* and *IDD* figures.

**Figure 4. Left**, results of the *STA* computation; **right**, results of the *STTP* computation.


**Table 5.** Components for the calculation of *STA* and *STTP* for individuals A–E.

**Notes:** \* The time span between the end of a fixed activity and the start of the following one. \*\* Consecutive fixed activities occurring at the same location (i.e., home) with a free-time span in between. Travels that could potentially be performed between them are considered in the *STA* computation, but not in the *STTP* one. \*\*\* For the values regarding PT, the unit of measure is €/day and not €/km. † *PPA* and *Ow* are equal to 0 because less than 10 min is available to engage in discretionary activities.

#### *4.3. Discussion of Results*

The results of *STA* and *STTP* are summarised in Table 6. Since they are expressed in different measurement units, they are also converted into percentages (*rSTA* and *rSTTP*): the highest *STA* and *STTP* figures across the five individuals are set to 100.00%, and the other values are rescaled accordingly. The coefficient of variation (CV) is computed for both *STA* and *STTP* to show how differently they are distributed across individuals. To better understand these differences in distribution, we take into account a set of time, space and transport variables that have a relevant influence on *STA* and *STTP* (reported in Table 6). These include: the amount of constrained time in a day, i.e., the number of hours/day spent in fixed activities (*CT*); the density of discretionary opportunities available within the *DPPA* of each individual (*OD*); the average speed of the travels linking the consecutive fixed activities of each individual (*AS*); the average kilometric monetary cost (considering the travelled distance) incurred by individuals to travel among their fixed activities (*AKC*); the average travel-time cost to perform daily travels by all modes (*ATC*); the detour effect experienced by each individual, expressed as the ratio between the travelled distance and the Euclidean distance (*DE*); and finally the preferred mode(s) of transport used by the individuals for their fixed activities (*PMT*). The first two variables are particularly relevant to explain *STA* results, while the others have a higher influence on *STTP*. The following two paragraphs first discuss the overall accessibility differences registered by *STA* and *STTP*, and then explore the main reasons for these differences for each individual.
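The rescaling and the coefficient of variation mentioned above can be sketched as follows. The text does not state whether the population or the sample standard deviation is used for the CV, so the population form is an assumption here, and the function names are hypothetical.

```python
import statistics


def rescale_to_percent(values):
    """Rescale a dict of scores so that the highest score equals 100%."""
    top = max(values.values())
    return {key: 100.0 * value / top for key, value in values.items()}


def coefficient_of_variation(values):
    """Return the CV in percent: standard deviation divided by the mean.

    Assumption: the population standard deviation (pstdev) is used.
    """
    data = list(values)
    return 100.0 * statistics.pstdev(data) / statistics.mean(data)
```

Applied to the *STA* or *STTP* scores of the five individuals, `rescale_to_percent` yields the relative figures and `coefficient_of_variation` a single dispersion value per measure.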


**Table 6.** Results of *STA* and *STTP* for individuals A–E.

**Overall accessibility differences:** Accessibility differences measured by *STA* are considerably higher than those registered by *STTP* (CV equal to 153% and 19%, respectively). This is consistent with the different approaches of the two measures. The primary purpose of *STTP* is to isolate the individual accessibility differences that are directly related to the performance of the transport system by excluding other variables not affected by changes in the transport system. Accordingly, *STTP* varies depending on the travel-time and monetary costs incurred to travel between fixed activities (*ATC* and *AKC*). Moreover, it depends on the level of the detour (*DE*), since travel costs are weighted according to the Euclidean (and not travelled) distance among fixed activities. Accordingly, the individuals with the lowest *rSTTP* (B and C) register the highest average time cost (*ATC* = €0.18/min for individual C), monetary cost (*AKC* = €0.42/km for individual B), detour effect (*DE* = 182% for individual B), and one of the lowest average speeds between fixed activities (*AS* = 11 km/h for individual C). In contrast, *STA* considers a wider range of factors, including the availability and distribution of discretionary opportunities across space and the individual amount of constrained and free time on a daily basis. These two factors play a crucial role in determining the higher differences registered by *STA*. Indeed, the three individuals scoring the lowest *STA* (E, A and C) also have the highest amount of daily constrained time (*CT* equal to 23 h/day and 20 h/day for individuals A and E) and the lowest density of discretionary opportunities within their *DPPAs* (*OD* equal to 14.9/km<sup>2</sup> and 31.3/km<sup>2</sup> for individuals E and C).

**Accessibility differences at the individual level:**


corresponds to about 50% of their daily travel time. This has great impacts on their *AS* and *ATC*.


#### *4.4. Added Value of STTP for STA and Its Limits*

According to the developed methodology and test, the main added value of *STTP* for *STA* consists in its capacity to focus explicitly (and exclusively) on the performance of the transport system. Although this allows *STTP* to point out the accessibility differences that are strictly linked to the transport system, it also makes individual differences less evident (as demonstrated by the test results). Indeed, other aspects such as the daily amount of free time, the daily distance travelled, and the availability of discretionary opportunities are excluded from the computation. This confirms that *STTP* should be seen as a complementary (and not alternative) measure to be used together with *STA* to deal with the broad topic of accessibility equality. On the one hand, *STA* measures individual accessibility from a broad perspective by taking into account spatial, transport, individual and temporal factors and by capturing the potential dimension of accessibility. On the other hand, *STTP* narrows the focus down by isolating the performance of the transport system in allowing individuals to carry out their schedule of daily fixed activities.

Besides this key added value of *STTP* for *STA*, some limits of *STTP* need to be addressed in future applications. First, *STTP* results depend heavily on the estimation of *VTT*, which has to be as accurate as possible and performed at the individual level. However, estimating *VTT* at the individual level with the wage rate method requires personal income data. Collecting such data is not always feasible, since people tend to be unwilling to answer income-related questions [53]. An alternative may be to estimate *VTT* through stated and revealed preference methods [44]. However, this approach is very data-demanding and time-consuming, and it requires individuals to be available to answer a lengthy questionnaire that should include (a) preliminary socioeconomic and demographic questions; (b) the travel diary for *STPA* computation; and (c) stated-preference questions to estimate *VTT*. Second, regarding the test, some computational limits need to be refined. First, the opening and closing times of the discretionary opportunities should be integrated to properly select those to be counted in *STA*. Second, discretionary opportunities could be weighted according to the importance assigned by individuals to different categories of opportunities such as groceries, shopping facilities, or post/bank offices. However, the inclusion of these elements in the computation depends heavily on the data available in the study area and on the willingness of individuals to answer longer questionnaires. Third, the walking speed assigned to individuals (5 km/h in our test) could be differentiated in specific cases, e.g., for elderly people with physical impairments that make them walk more slowly. These limits need to be addressed to identify accessibility differences precisely at the individual level (e.g., refs. [54–56]).

#### **5. Conclusions**

Considering that many accessibility differences are unavoidable and irremediable by transport policies [1], and that *STA* measures tend to incorporate a broad variety of factors, this study has introduced a transport performance measure isolating the role of the transport system in individual accessibility differences. As suggested by the test, the results obtained by *STTP* may be considerably different from those of *STA*. On the one hand, individual differences tend to be smaller, in line with the narrower, transport-specific approach of *STTP*. On the other hand, the individuals registering the highest *STA* do not necessarily correspond to those with the highest *STTP* and vice versa. Indeed, an individual might experience some transport difficulties in carrying out their daily schedule but at the same time have enough time to engage in several discretionary activities (like Individual D in Section 4). The opposite may also apply: a person could easily reach their usual daily destinations but have so little free time and so few surrounding amenities that they register a low space-time accessibility (like Individual E in Section 4).

These results highlight the complementary value of *STTP* for *STA*. Indeed, researchers and policymakers might gain relevant benefits from the combined analysis of these two measures since they may evaluate both the overall and transport-specific impacts of various transport policies on individual accessibility differences (e.g., refs. [57–59]). This would be particularly relevant to assess transport policies that are expected to trigger controversial impacts on individual accessibility differences. This is the case, for instance, of high-speed railways, autonomous vehicle applications and transport sharing services, which are often found to increase accessibility in general but also to widen the accessibility differences across (groups of) people (e.g., refs. [57,59,60]). At the same time, *STTP* might be used to discuss the individual accessibility impacts of growing mobility trends, such as the usage of individual slow mobility solutions (e.g., e-scooters), which are progressively replacing walking, especially for the first and last mile.

To prove the suitability of the combination of *STA* and *STTP*, it is necessary to refine the estimation of user transport costs on the one hand and extend the application of this methodology to a broader case study on the other. The former challenge is mainly related to the reliable estimation of the value of travel time (*VTT*). This study relied on the wage rate method, given its popularity and ease of application. However, more articulated estimation approaches could be used (such as stated and revealed preferences) to better differentiate *VTT* values depending on, e.g., the travel purposes, length of the travel and transport modes [61,62]. The latter challenge implies collecting *STPA* information from a substantial sample of individuals and applying sound statistical distribution analyses to the results. The first aspect is problematic for many studies using space-time measures, and it is one of the main practical reasons for the low usage of this kind of measure compared with, e.g., gravity-based measurements [15]. To overcome this limit, recent studies have deployed multi-stage stratified random sampling approaches, reducing the survey sampling rates while maintaining high accuracy [12]. The usage of statistical distribution analyses is problematic for a conceptual reason, since they imply a focus on one or more equity typologies [63]. For instance, calculating the coefficient of variation (CV) over the whole sample is closely linked to an egalitarian point of view, since the CV measures how dispersed the distribution of a good is across a population. Conversely, calculating the percentile ratio (PR) by comparing the accessibility values scored at the median (50%) with those scored at, e.g., 10% is linked to the vertical-equity point of view, since it focuses on the individuals with the lowest accessibility values. To guarantee a scientifically solid distributional analysis, future applications of *STTP* and *STA* to a broader case study would require a methodological effort in selecting a broad set of statistical distribution analyses to apply to the results.
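The percentile ratio mentioned above can be sketched as follows. The text does not specify how percentiles are interpolated or whether the ratio is taken as median over 10th percentile, so both choices (linear interpolation, P50/P10) are assumptions, and the function names are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    rank = (len(data) - 1) * p / 100.0   # fractional position in the sorted data
    lower = int(rank)                    # index just below (or at) the position
    upper = min(lower + 1, len(data) - 1)
    fraction = rank - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


def percentile_ratio(values, upper=50, lower=10):
    """Ratio between an upper percentile (median by default) and a lower one."""
    return percentile(values, upper) / percentile(values, lower)
```

A ratio close to 1 indicates that the individuals at the bottom of the distribution score almost as high as the median individual; larger values indicate a wider gap.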

Despite these challenges, *STTP* provides an added value for *STA*, and it may be deployed to complement the analysis of individual accessibility and of the distributional implications of transport policies, which represent an increasingly relevant priority of policymakers and transport planners.

**Author Contributions:** Conceptualization, A.D., G.H.; methodology, A.D., G.H.; software, M.G.; validation, A.D., M.G.; formal analysis, A.D., M.G.; investigation, A.D.; resources, A.D.; data curation, M.G.; writing—original Draft Preparation, A.D.; writing—review and editing, A.D., M.G., G.H.; visualization, A.D., M.G.; supervision, G.H.; project administration, G.H.; funding acquisition, A.D., M.G., G.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in whole or in part by the Austrian Science Fund (FWF) [I 5224-G Internationale Projekte]. For Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission. The research leading to these results has received funding from the Department of Innovation, Research and University of the Autonomous Province of Bozen/Bolzano, within the framework of the FWF Joint Projects between Austria and South Tyrol, under Project number I 5224-G. The APC was funded by the Austrian Science Fund (FWF) [I 5224-G Internationale Projekte].

**Data Availability Statement:** Data used and developed in this study will be uploaded to an institutional repository in case of approval. In that case, details for data access will be provided in this section.

**Acknowledgments:** Open Access Funding by the Austrian Science Fund (FWF). The authors warmly thank Elisa Ravazzoli (Eurac Research, Institute for Regional Development) for her support to the conceptualisation and revision of the article and her contribution within the framework of the RAAV research project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Analytical Model for Enhancing the Adoptability of Continuous Descent Approach at Airports**

**Emad A. Alharbi 1,\*, Layek L. Abdel-Malek 2, R. John Milne <sup>3</sup> and Arwa M. Wali <sup>4</sup>**


**Featured Application: We present an analytical model, using a queueing theory framework, that identifies periods of time for air traffic controllers when they can permit the vast majority of approaching aircraft to land using Continuous Descent Approach, thereby reducing noise, fuel consumption, and pollution, while enhancing air transportation sustainability.**

**Abstract:** Continuous Descent Approach (CDA) is the flight technique for aircraft to continuously descend from cruise altitude with an idle thrust setting and without level-offs, contrary to the staircase-like Step-down Descent Approach (SDA). Important for air transportation sustainability, using CDA reduces noise, fuel consumption, and pollution. Nevertheless, CDA has been limited to low traffic levels at airports, often at night, because it requires more separation distance between aircraft arrivals and, thus, could decrease throughput. Insufficient attention has been given to helping air traffic controllers decide when CDA may be used. In this paper, we calculate the probability that an aircraft arriving during a particular brief period of time (e.g., 15 min) will need to revert to SDA when the controller tentatively plans to permit CDA for all aircraft arriving during that time period. If this probability is low enough, the controller may plan to permit CDA during that time period. We utilize an analytical approach and a queueing theory framework that considers factors such as traffic and weather conditions to estimate the probability. We also provide the number of aircraft that can be accommodated within the airport's stacking space using CDA. This number provides insight into whether a particular aircraft may use CDA.

**Keywords:** green transport; continuous descent approach; optimized profile descent; climate change; terminal maneuvering area; environmental impact; applied queueing theory; air traffic management; air transportation sustainability

#### **1. Introduction**

The air transportation and aviation industry faces several challenges due to projected increases in demand for air travel and freight, accompanied by airspace congestion and limited airport capacity. The International Air Transport Association expects 7.2 billion passengers to travel in 2035, almost doubling the 3.8 billion air travelers in 2016, with the U.S. as the second-fastest-growing market, after China, with 484 million additional passengers per year forecasted for a total of 1.1 billion passengers [1]. With increased pressure on the infrastructure of terminals, runways, airspace around airports, and air traffic control operations, the industry is struggling to cope with this demand, yet it has to limit the harm that aircraft cause to the environment through carbon emissions and noise levels.

With regard to aircraft emissions, the U.S. Environmental Protection Agency finalized its determination that greenhouse gas emissions from certain types of aircraft engines, primarily engines used on large commercial jets, contribute to the pollution that causes climate change and endangers Americans' health and the environment [2]. Other countries are taking strict measures to limit emissions from aviation operations at airports by setting penalties for emission levels above a specified limit. Under the European Union Emissions Trading System, all airlines operating in Europe, European and non-European alike, are required to monitor, report, and verify their emissions and to surrender allowances against those emissions that exceed certain levels from their flights per year [3]. Aircraft noise, on the other hand, is the biggest concern for airport officials at 29 of the 50 busiest U.S. airports [4]. Airport support personnel who work in proximity to aircraft idling on the ground or taking off and landing may suffer hearing loss. Residents of communities surrounding airports suffer sleep disorders and interference with speech, both of which may lead to reduced productivity in learning and work. Furthermore, recent studies have linked noise to non-auditory health effects, such as hypertension, heart disease, and stroke [5]. These issues represent critical challenges to air transportation and aviation industry sustainability, development, and prosperity.

**Citation:** Alharbi, E.A.; Abdel-Malek, L.L.; Milne, R.J.; Wali, A.M. Analytical Model for Enhancing the Adoptability of Continuous Descent Approach at Airports. *Appl. Sci.* **2022**, *12*, 1506. https://doi.org/10.3390/app12031506

Academic Editor: Giovanni Randazzo

Received: 31 December 2021 Accepted: 26 January 2022 Published: 30 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The Continuous Descent Approach (CDA), also known as Optimized Profile Descent (OPD), is an advanced flight technique in which an aircraft descends continuously from cruising altitude to the Final Approach Fix (FAF) or touchdown without level-offs and with an idle, or near-idle, thrust setting. Descending with the CDA procedure, an aircraft can stay higher for longer than with a conventional descent, thereby increasing the vertical distance between the aircraft's noise sources and the ground and thus significantly reducing noise levels for populated areas near airports. Furthermore, descending with an idle, or near-idle, engine setting decreases fuel burn, reducing both fuel consumption and harmful emissions. A study that conducted flight trials of CDA at Kentucky's Louisville International Airport using an aircraft fleet of the United Parcel Service (UPS), an express package delivery company, quantified the benefits of CDA as fuel savings of 400 lb to 500 lb per flight and a noise reduction of 3.9 A-weighted decibels (dBA) [6]. Another study conducted at San Francisco International Airport estimated a reduction in CO<sub>2</sub> emissions of between 700 lb and 10,000 lb per flight with CDA [7]. To cut down on aircraft emissions, the airplane manufacturer Airbus has recently been working on techniques inspired by birds and on emerging concepts such as tandem flying that could reduce fuel burn by up to 10% [8]. When compared with the widely used Step-down Descent Approach (SDA), in which the arriving aircraft descends in a step-like fashion, CDA reduces flight time by around two minutes [9]. FedEx, another express transport and delivery company with one of the largest civil aircraft fleets in the world, has been using CDA at its World Hub, Memphis International Airport. Its use of CDA at Memphis reduced flight time by 2.5 min for each flight, which translated into cost savings of \$105 million based on a field study from 2006 to 2009 [10].

These operational, economic, and environmental benefits of CDA procedures have made them a cornerstone of some aviation modernization programs at the national (e.g., FAA's Next Generation Air Transportation System, "NextGen"), continental (e.g., EU's Single European Sky Air Traffic Management Research, "SESAR"), and international (e.g., United Nations' International Civil Aviation Organization, "ICAO", Continuous Descent Operations, "CDO", initiative) levels. Although considered an effective Noise Abatement Procedure, CDA is not widely implemented, especially during high-density operations [11,12]. Due to safety considerations [13–15], CDA procedures may require more separation between aircraft arrivals, which may affect the airport arrival rate and runway throughput [16,17]. The larger separation for a CDA aircraft is mainly due to two reasons: the difficulty for air traffic controllers of predicting the future position of an aircraft with significantly variable speed [6] and the inability of the pilot to decelerate quickly during descent [18]. Although CDA has been shown to be feasible without increasing the required spacing between aircraft under light traffic conditions, such as night-time operations [6], aircraft flying CDA will most likely need to be spaced further apart under heavy traffic conditions.

Thus, CDA implementation has been limited to low to moderate traffic levels. During these low traffic conditions, CDA has been used at more airports. To increase the use of CDA, several studies in the literature have used various approaches, such as simulation [19], mathematical modeling [15,20], and flight trials [6,14], to quantify CDA's benefits and/or suggest solutions to the problem of increasing CDA's usage at airports through the analysis of sequencing and merging [21], merging and spacing [18], scheduling and conflict detection and resolution [16], time and aircraft energy management during descent [22], fuel and flight-path management [23,24], and ground-to-air air traffic network vulnerability [25]. Other literature has applied quantitative methods to improve aircraft operation [26,27]. However, insufficient attention has been given to developing a quantitative measure to enable air traffic controllers to make informed decisions on safely accepting more CDA operations.

The contribution of this work is the development of a model that addresses this gap in CDA research and that helps air traffic controllers determine brief periods of time (e.g., 15-min periods) in which the vast majority of arriving flights may land using CDA. These time periods are based on the time of aircraft arrival into the terminal maneuvering area (TMA). An aircraft entering the TMA during one of those time periods may begin its continuous descent upon entering the TMA and complete it during a later time period in which then-arriving aircraft are no longer using CDA. In fact, although the time to descend using CDA depends on several factors (e.g., aircraft weight), it may be longer than 15 min. Data from previous studies [28,29] imply that the time to descend using CDA may be 20–30 min.

Special attention is dedicated to factors that have a significant impact on CDA implementation, such as airspace structure around airports, airport arrival rate, and distance requirements for longitudinal separation between approaching aircraft. Analyzing airspace structure around an airport offers a systematic way of developing an analytical model that adequately captures the elements associated with descent and approach procedures.

In particular, we calculate the probability that an aircraft arriving during a particular brief period of time will need to revert to SDA under the initial modeling assumption that the controller will permit CDA for all aircraft arrivals during that period. If this calculated probability is low enough, the controller may be comfortable planning to permit CDA for all aircraft arriving during that time period; otherwise, the controller will not permit any of them to use CDA. Our model utilizes an analytical approach and a queueing theory framework that considers factors such as traffic and weather conditions to estimate the probability. The non-queueing portion of our modeling provides the number of aircraft that can be accommodated within the airport's stacking space when CDA is used for all aircraft arriving during the time period. This number gives the controller insight into whether to permit a particular aircraft to use CDA during a period when all (or nearly all) arriving aircraft will be permitted to descend with CDA (due to the low probability that an aircraft will need to use SDA instead). Through the use of this modeling, it is our hope that CDA will be used more often, thereby reducing noise, fuel consumption, and harmful emissions and providing greener and more sustainable air transportation operations. This paper should draw attention to the opportunity to systematically increase the use of CDA for green aviation operations and air transportation sustainability.
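To make the decision rule above concrete, the following sketch computes the probability that an arriving aircraft finds the stacking space full, and compares it with a tolerance chosen by the controller. It is only an illustration under strong assumptions: a single-server M/M/1/K queue with Poisson arrivals and exponential service, which is a textbook model and not necessarily the queueing model developed in this paper. The function names, the parameters and the 5% default threshold are hypothetical.

```python
def blocking_probability(arrival_rate, service_rate, capacity):
    """Probability that an arrival finds an M/M/1/K system full.

    arrival_rate and service_rate are in aircraft per unit time; capacity is
    the maximum number of aircraft the system can hold (K >= 1).
    """
    if arrival_rate <= 0 or service_rate <= 0 or capacity < 1:
        raise ValueError("rates must be positive and capacity at least 1")
    rho = arrival_rate / service_rate
    if abs(rho - 1.0) < 1e-12:
        # Limiting case rho = 1: all K + 1 states are equally likely.
        return 1.0 / (capacity + 1)
    return (1.0 - rho) * rho**capacity / (1.0 - rho ** (capacity + 1))


def permit_cda(arrival_rate, service_rate, capacity, threshold=0.05):
    """Return True if the blocking probability is within the tolerance."""
    return blocking_probability(arrival_rate, service_rate, capacity) <= threshold
```

In this simplified setting, `capacity` plays the role of the number of aircraft that the stacking space can accommodate under CDA, and a period is flagged as suitable for CDA only when the computed probability does not exceed the threshold.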

The paper is organized as follows. Section 2 describes the airspace around airports with respect to the terminal maneuvering area and describes descent and approach operations. In particular, two approach operations (CDA and SDA) are discussed and compared. In Section 3, we present the adoptability of CDA and the factors that impact aircraft landing time; the Base of Aircraft Data (BADA) Aircraft Performance Calculation (APC) is estimated and validated against actual landing times of flights operated at Nashville International Airport (BNA). Section 4 presents a background on queueing theory and the main assumptions and fundamental components used to develop our model, while Section 5 presents the concept of the probability of an aircraft being denied CDA entry as a key output of the queueing model. In Section 6, we illustrate the calculation of the model probabilities using standard industry data (e.g., separation distance requirements) and actual flight data from the BNA airport. Finally, Section 7 presents our main findings and conclusions.

#### **2. Preliminaries and Process Description**

We begin this section by describing the airspace around airports, then generally describe aircraft descent and approach operations at airports, and we conclude with a comparison between the two most-commonly used descent approaches: CDA and SDA.

#### *2.1. Structure of Airspace around Airports*

Terminal Maneuvering Area (TMA). TMA refers to the designated area of airspace managed by air traffic control services around major airports that have high volumes of traffic. Normally, TMA airspace is designed in a cylindrical configuration, including all altitudes, centered on the geographical coordinates of the airport. Geographical positions that define the boundaries of the TMA, known as entry fixes (or arrival fixes), serve as entry points to the TMA (although each fix includes all altitudes and thus may be conceptualized as a line), and arriving aircraft enter the TMA airspace via these fixes. When an aircraft crosses the TMA boundary over one of these entry fixes, responsibility for separating it is usually handed over from a controller at the air traffic control center responsible for separating en route aircraft (e.g., Air Route Traffic Control Center, "ARTCC") to a controller at the facility responsible for separating aircraft approaching an airport (e.g., Terminal Radar Approach Control, "TRACON"). A simplified structure of a TMA is illustrated in Figure 1.

**Figure 1.** Typical structure of a TMA (top view).

As an arriving aircraft nears an entry fix, the air traffic controller may clear the pilot for the approach or, depending on traffic congestion and the separation and sequencing method used, may place the aircraft in a holding pattern. While aircraft separation aims to have the controller apply and maintain the separation distance requirements between aircraft for safety purposes, aircraft sequencing aims to have the controller organize a stream of aircraft to provide an orderly sequence of continuous traffic flow towards the final approach path. In practice, there are a number of aircraft-sequencing methods for approach traffic management, but generally, all of them fall under two broad categories: procedural control (published procedures with altitude change and speed instructions) and radar vectoring (controller-generated instructions in terms of headings, altitudes, and speeds to optimize traffic flow in order to maximize the number of aircraft handled with the least average delay). Today, radar vectoring is one of the main methods to achieve efficient sequencing for aircraft flying towards the final approach path.

Once an aircraft has been cleared by the controller to approach the airport or to leave a holding pattern, it proceeds towards the merging fix. On the way, it flies through the stacking space, the portion of the available terminal airspace that controllers use to stack arriving aircraft, that is, to align aircraft arrivals in an orderly manner for approach. In the stacking space, the controller manages air traffic and enhances airspace capacity by stacking arriving aircraft using techniques such as minimal speed adjustments and path-stretching. This management of air traffic flow enables the controller to bring together aircraft that have crossed entry fixes from different directions so that they are stacked and merged at the merging fix. For instance, an aircraft may enter the TMA through one of about 12 entry fixes and then proceed to one of about four merging fixes as it gets closer to the airport. The merging fix provides a transition for arriving aircraft from the stacking space to the approach, as it combines traffic from different directions into one stream of arrivals that follows a standard published arrival procedure. This way, arrivals from several directions can be accommodated, and traffic flow is efficiently managed within a congested airspace. To safely merge arriving aircraft, the controller synchronizes them based on the joining time window on the air route leading to the merging fix, leaving sufficient spacing for other aircraft to fit into the traffic stream while maintaining at least the minimum required separation between aircraft.

#### *2.2. Description of the Aircraft Descent and Approach Process at Airports*

In this subsection, we first describe the aircraft descent operations. Then, we present SDA and finally introduce CDA and compare it with SDA.

#### 2.2.1. Descent and Approach Operations

Aircraft descent could be initiated to attain an optimal profile from the cruise altitude all the way down to landing to minimize fuel burn, emissions, and noise exposure. However, due to Air Traffic Control (ATC) restrictions and aircraft performance limitations, an optimal descent profile may not be attained all the time. For an aircraft operating at typical cruise altitudes, descent will normally initiate at 100 to 130 nautical miles (nmi) from the destination airport. This distance varies primarily due to ATC service restrictions, aircrafts' equipment and performance capabilities, and weather conditions. The controller may issue crossing restrictions during the descent, as part of a Standard Terminal Arrival Route (STAR) or as a requirement for traffic sequencing. These crossing restrictions are generally issued to the cockpit crew in terms of altitude over a fix, and they may include a speed restriction as well [30].

A stabilized descent requires minimum control adjustments by the pilot in maintaining the planned descent path; more specifically, excessive corrections or control inputs indicate that the descent was improperly planned. Thus, planning the descent from cruise altitude is important because descending early results in more of the flight at a low altitude with increased fuel consumption and noise impacts, and starting the descent late results in problems with controlling both airspeed and descent rates later in the approach phase.

Prior to flight, pilots need to compute the fuel, time, and distance required to descend from the cruising altitude to the approach gate (an imaginary point used by the controller to provide headings (i.e., vectoring) for aircraft arrivals to the final approach course), with the objective of determining the most economical distance from the airport to begin descent. This distance is referred to as the Top of Descent (TOD) point. The computations for the TOD point could be done manually prior to flight or automatically during flight using the Flight Management System (FMS). Conversely, in flight prior to the descent, pilots plan the descent from cruise by reviewing and verifying landing weather to include winds in their consideration, since tempestuous weather at the landing airport can cause slower descents. Furthermore, pilots need to know the cruise altitude and approach gate altitude (otherwise known as the initial approach fix (IAF) altitude), descent rate, and ground speed during descent.

Based on aircraft performance, approach constraints, aircraft weight, and weather data (such as winds, temperature, and icing conditions), the vertical component of the flight plan, which is referred to as Vertical Navigation (VNAV), is computed. Usually, the VNAV path is computed from the TOD point down to the waypoint at which the descent ends, which is generally the runway or the Missed Approach Point. The FMS uses only two types of VNAV paths: the performance path and the geometric path. The performance VNAV path is computed using idle or near-idle thrust from the TOD point to the first constrained waypoint, which is constrained by speed and/or altitude, and represents a typical CDA. The geometric VNAV path is computed point-to-point between two constrained waypoints or when a vertical angle is assigned; it may represent a typical SDA, as it is shallower than the performance VNAV path and typically uses non-idle thrust. Detailed descriptions of SDA and CDA are presented in the following subsections.

#### 2.2.2. Step-Down Descent Approach (SDA)

In air navigation, if the aircraft flies under Instrument Flight Rules (IFR), a set of rules governing the navigation of aircraft using instruments, then the instrument approach procedures (IAP) must be followed. The IAP consists of four approach segments along the aircraft flight path, namely the initial, intermediate, and final approach segments and, as a backup plan to use if needed, a missed approach segment. Typically, the initial approach segment starts at the en route (i.e., cruise) altitude from an IAF and ends when the aircraft joins the intermediate approach segment, which in turn ends at the final approach fix (FAF).

SDA is the conventional arrival procedure that pilots and air traffic controllers have been accustomed to for many years. In SDA, an aircraft begins its initial descent at the TOD point and continues descending gradually in a series of steps along the descent path. This step-down descent occurs because the aircraft descends over a stair-like path from the current altitude to a new altitude, due to controller instructions and/or airspace constraints. During the SDA, the aircraft repeatedly levels off as it transitions from the initial to the intermediate to the final approach segment through predefined fixes that indicate the start and end of each approach segment. To fly from the fix that marks the end of the previous approach segment to the fix that marks the start of the subsequent one, the aircraft must increase speed by employing thrust to maintain altitude [31]. Figure 2 illustrates the SDA profile and the approach segments of the IAP.

SDA also requires communication between the pilot and controller to inform and authorize air movement, which means more workload on both the aircrew and controller during a critical phase of flight that requires situational awareness and additional concentration. Once an aircraft has reached the fix or waypoint that marks the end of the previous approach segment and marks the subsequent one at the new altitude assigned by the controller, the pilot needs to utilize engine thrust to maintain altitude and prepare for further instructions from the controller with respect to approach. Air traffic may be expedited during periods of high demand at airports when using SDA through radar vectoring; however, the utilization of engine power increases fuel burn, which, in turn, increases emissions and noise levels at lower altitudes [31].

**Figure 2.** The vertical profile of SDA based on the IAP and approach segments.

#### 2.2.3. Comparison between CDA and SDA

In this subsection, we provide a comparison between CDA and SDA from an operational perspective.

Considering aircraft approach speed, if a pair of aircrafts are approaching an airport for landing heading for the same runway, both aircraft approach speeds may not be the same when CDA is used, even with the same aircraft type. This is due to the fact that during descent, pilots make efforts to stabilize their approaches by controlling and balancing several parameters such as rate of descent, approach speed, thrust, and the aircraft's attitude. With CDA, landing is conducted with idle thrust as the aircraft approach speed decreases just before touchdown [32]. With SDA, the pilot utilizes thrust and adjusts speed more frequently along the descent path, and the aircraft approach speed increases just before touchdown. This comparison is illustrated in Figure 3 below. Table 1 highlights some of the differences between CDA and SDA operations.

**Figure 3.** The vertical profile of CDA compared with SDA.


**Table 1.** Summary of the Differences between CDA and SDA.

#### **3. CDA Adoptability and Aircraft Descent Times**

This section introduces the factors that influence our model and then estimates and validates the time aircrafts take to land under CDA and SDA operations. The estimation and validation are essential for developing our model. In our model, we assume that during a given brief period of time, say 15 min or 30 min, for example, the air traffic controller will enable all arriving aircrafts to use CDA or not permit any aircrafts to use CDA. This assumption simplifies the controller's duties and avoids the complexity associated with a significant number of aircrafts within the TMA using CDA and a significant number using SDA. That being said, it remains possible for the controller to plan on permitting all aircrafts arriving in the brief period of time to use CDA, if suggested by the model, and yet on a case-by-case basis, decide whether to deny CDA for a particular aircraft. The case-by-case basis analysis is beyond the scope of our queueing model, but it is considered by the maximum number of aircrafts that may reside in the stacking space at any given time, as we determine much further below.

#### *3.1. Factors Impacting CDA Adoptability and Aircraft Descent Time*

Before we present the details of our model [33], we briefly discuss the concept of acceptance, and rejection, in the context of landing operations at an airport and, particularly, with CDA operations. In general, our queueing model assumes that at a given airport, the air traffic controller will either accept all CDA requests from aircraft arrivals to approach and land using CDA over a specified brief time period, say 15 min, or reject them all. Before the queueing model details are calculated, we first calculate the CDA Adoptability Factor (CDA\_AF). This factor is a function of *λss*, the average arrival rate of the aircraft that request CDA at an airport (which our model assumes is all the aircraft arriving at the TMA's stacking space, due to the advantages of continuous descent), and *AAR*, the Airport Arrival Rate, which is defined as the dynamic parameter that specifies the number of arriving aircraft that an airport can accept during any consecutive 15-min period of time [34]. As shown in the equation below, CDA\_AF represents the ratio of *λss* to *AAR*:

$$\text{CDA\_AF} = \frac{\lambda_{ss}}{AAR} \tag{1}$$

If the value of CDA\_AF during the brief time period is high (e.g., over 100%), then there is no need to continue the analysis. It is obvious in that case that CDA will not be permitted during that time period. Conversely, if the value of CDA\_AF is low (e.g., under 10%), then it is obvious that CDA will be permitted during the time period. The queueing calculations are performed only for those periods of time when the value of CDA\_AF does not make it obvious whether or not CDA should be permitted during the time period.
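The screening step described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 10% and 100% cut-offs are the example values quoted in the text, not calibrated thresholds.

```python
def cda_adoptability_factor(arrival_rate_ss: float, aar: float) -> float:
    """Equation (1): ratio of the stacking-space arrival rate to the AAR.

    Both rates must refer to the same time period.
    """
    if aar <= 0:
        raise ValueError("AAR must be positive")
    return arrival_rate_ss / aar


def triage(cda_af: float, low: float = 0.10, high: float = 1.00) -> str:
    """Screen a time period before running the queueing calculations.

    The default cut-offs are the illustrative 10% and 100% values from the
    text (assumptions, not calibrated thresholds).
    """
    if cda_af > high:
        return "reject CDA"
    if cda_af < low:
        return "permit CDA"
    return "run queueing model"


# Hypothetical period: 6 CDA requests against an AAR of 12 for the same period.
factor = cda_adoptability_factor(6, 12)
print(factor, triage(factor))
```

Only periods that fall between the two cut-offs would be passed on to the queueing calculations.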

#### Factors Impacting CDA Adoptability and Aircraft Descent Times

There are a number of factors that could impact the nature of aircraft arrivals at airports. Such factors could be operational, meteorological, planning, technological, or related to airspace structure and procedures design. We discussed some of these factors in Section 2 and discuss others briefly in the following subsections. Technology factors (e.g., the level of Air Traffic Management automation at an airport) are beyond the scope of this work. Other factors, however, such as traffic at neighboring airports and wind speed and direction, can be managed by reducing aircraft stacking space and increasing the minimum separation distance between aircrafts.

The Airport Arrival Rate (AAR) states the hourly capacity of airplane arrivals at an airport, and thus, it is critical to our model.

The aircraft fleet mix, or more generally fleet mix, refers to the ratio of various aircraft types that, based on wake turbulence categories, make up the total arrival traffic that operates at an airport. Fleet mix is essential in airport planning to determine the likely average landing speed and separation requirements on final approach, which are important factors that affect the AAR and, in turn, our model. Generally speaking, and from the perspective of runway capacity, which is defined as the expected number of landings that can be performed per hour on a runway, a relatively homogenous fleet mix, consisting of two dominant aircraft classes, is more favorable than a heterogeneous fleet mix.

The aircraft separation requirements determine the maximum number of aircraft that can navigate each part of the airspace or can use a runway system per unit of time. The separation requirements for aircraft landing on the same runway specify the minimum separation in longitudinal distance, or time, that must always be maintained between two aircraft operating consecutively on the runway. These requirements are also specified for every possible pair of classes and every possible sequence of movements [35]. Table 2 exhibits the ICAO's minimum wake turbulence separation standards [36]; evidently, the larger the separation required, the lower the AAR. Furthermore, the more heterogeneous the fleet mix at an airport, the more influence there will be on AAR and our model. These separation distances are based upon SDA being used. If CDA is used, then the minimum separation distances are the same as with SDA when the leading and trailing aircraft are from the same wake turbulence category but longer than with SDA when the aircraft categories differ. Furthermore, the air traffic controller will be inclined to use longer minimum separation distances with CDA because of the greater challenge of controlling aircraft using CDA.


**Table 2.** ICAO Minimum Wake Turbulence Separation Standards.

Among the usually considered weather conditions at airports, such as cloud ceiling and visibility, wind speed and direction are the most influential conditions on ATM operations in general and on approach operations, in particular. The two components of the wind, headwinds and tailwinds, have a significant impact on AAR. In fact, wind speed and direction dictate the availability and orientation of runways at any given time. Adverse wind conditions can reduce AAR due to the increased complexity of merging arrival traffic streams and separating aircrafts as they descend and change heading under intense or varying winds. Specifically, winds aloft may result in a phenomenon called compression, in which the separation between pairs of arriving aircrafts decreases rapidly as they descend to the final approach [37]. The results from applying our model indicate it captures the effect of wind speed.

In general, airport and airspace constraints refer to limitations that hinder airport capacity by creating difficulties for arriving aircraft, largely due to airspace considerations. Often, such constraints stem from the original airspace design, which gradually became less efficient due to increasing demand and fluctuating traffic patterns, or from airspace redesign, which necessitates consideration of nearby restricted airspace. For example, a restricted airspace, which is an area of airspace typically used for military operations, could be close to an airport and would impose a specific airspace design that affects the pattern of arriving aircraft. Other airspace constraints include topography and terrain (e.g., an airport close to mountainous terrain). Estimating the effect of airspace constraints, collectively as a single factor, is beyond the scope of this work; nevertheless, our model's behavior reflects its ability to capture such an effect.

Growth in air traffic at airports within close geographical proximity is likely to create congestion, especially if these airports are in a large, busy metropolitan area. The impact of air traffic at neighboring airports comes from systems of airports commonly referred to as *metroplexes*. Operationally, air traffic that flows into and out of the airports within a metroplex needs to be coordinated among those airports to maintain efficient air traffic and each airport's throughput, so that one airport has little (if any) impact on the AAR of another [38,39]. This coordination may limit or even prevent the use of our model. The FAA has ongoing efforts to accommodate CDAs within metroplexes, with plans to deploy them during a later phase of NextGen.

#### *3.2. Estimation of Aircraft Descent Time*

In this section, we estimate the time an aircraft takes to descend, starting from the TOD point at cruise altitude down to the runway, under CDA and SDA operations, using version 3.11 of Base of Aircraft Data's (BADA) Aircraft Performance Model (APM) [40]. Estimating aircraft descent time under the two distinct approach operations is a fundamental step towards developing our model.

BADA is an APM developed and maintained by the European Organization for the Safety of Air Navigation, commonly known as EUROCONTROL, through active cooperation with aircraft manufacturers and operating airlines. To estimate aircraft landing time at airports using BADA, we used BADA's web-based Calculation Tool, the Aircraft Performance Calculation (APC), to calculate aircraft performance for the descent phase of flight. Essentially, BADA's application software provides access to an online implementation of BADA APM, which consists of a database of aircraft operational performance files and formulas derived from the Total-Energy Model that EUROCONTROL relied on to model aircraft performance in categories such as aircraft, aerodynamics (e.g., drag), and engine thrust [41], as follows:

#### 3.2.1. Aircraft Velocity and Lift Model

For a straight-and-level flight at cruise altitude, the aircraft speed (velocity) is given by

$$V_{TAS} = a_o M_{cruise} \sqrt{\frac{T}{T_o}} \tag{2}$$

where VTAS is aircraft's true airspeed (TAS) in nautical miles per hour (knots), ao is the speed of sound at sea level in knots, *Mcruise* is the aircraft's Mach number at cruise altitude, and *T* and *To* are the temperatures at cruise altitude and at sea level, respectively. The lift coefficient, *CL*, can be calculated using the classical formula for the lift force, *L*:

$$L = C_L \frac{1}{2} \rho V^2 S \tag{3}$$

where *ρ* is the density of air in kilograms per cubic meter, *V* is the aircraft speed in meters per second, and *S* is the aircraft's wing area in square meters.

In cruise flight, the lift force, *L*, in Newtons, may be assumed to be equal to the aircraft's weight, *mg*, where *m* is the aircraft's mass in kilograms. Combining this relationship with Equation (3) and rearranging terms results in

$$C_L = \frac{2mg}{\rho V^2 S} \tag{4}$$

where *g* is the acceleration due to the earth's gravity. Assuming a no-wind scenario and that the flight path's angle in degrees is γ, then the relationship between ground speed and true airspeed is given by

$$V_{ground} = V_{TAS} \cdot \cos \gamma \tag{5}$$
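As a numerical illustration of Equations (2), (4), and (5), the following Python sketch evaluates them at a single cruise point. The aircraft mass, wing area, and Mach number are hypothetical values chosen for illustration, and the atmospheric constants are standard-atmosphere figures rather than values taken from BADA.

```python
import math

G = 9.80665          # standard gravity, m/s^2
A0_KNOTS = 661.47    # standard-atmosphere speed of sound at sea level, knots
T0_KELVIN = 288.15   # standard-atmosphere sea-level temperature, K
KNOT_TO_MS = 0.514444


def true_airspeed_knots(mach_cruise, temp_kelvin):
    """Equation (2): V_TAS = a_o * M_cruise * sqrt(T / T_o)."""
    return A0_KNOTS * mach_cruise * math.sqrt(temp_kelvin / T0_KELVIN)


def lift_coefficient(mass_kg, rho, speed_ms, wing_area_m2):
    """Equation (4): C_L = 2 m g / (rho V^2 S), valid when lift balances weight."""
    return 2.0 * mass_kg * G / (rho * speed_ms ** 2 * wing_area_m2)


def ground_speed(v_tas, gamma_deg):
    """Equation (5): no-wind ground speed for a flight path angle in degrees."""
    return v_tas * math.cos(math.radians(gamma_deg))


# Hypothetical cruise point: Mach 0.78 at 35,000 ft in the standard atmosphere
# (T = 218.81 K, rho = 0.3796 kg/m^3), 60,000 kg aircraft, 122.6 m^2 wing.
v_tas = true_airspeed_knots(0.78, 218.81)
c_l = lift_coefficient(60_000, 0.3796, v_tas * KNOT_TO_MS, 122.6)
print(round(v_tas, 1), round(c_l, 3), round(ground_speed(v_tas, 3.0), 1))
```

Note that Equation (2) is evaluated in knots while Equation (4) expects meters per second, so the sketch converts units explicitly between the two.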

#### 3.2.2. Drag Model

Drag is the aerodynamic force that resists an aircraft's motion through the air. Similarly to the lift force, the aerodynamic drag, *D*, is the product of the dynamic pressure, the wing area, and the drag coefficient, as follows:

$$D = C_D \frac{1}{2} \rho V^2 S \tag{6}$$

The drag coefficient is given by the sum of zero-lift, *CDo*, and induced drag, *CDi*, coefficients, where the latter is a quadratic function of the lift coefficient, as follows:

$$C_D = C_{Do} + C_{Di} C_L^2 \tag{7}$$

Typically, *CDo* and *CDi* are functions of the aerodynamic configuration of the aircraft flight phase. Generally, drag coefficients are functions of the aircraft's Mach number and the Reynolds number (*Re* = *ρVL*/*μ*, where *μ* is the absolute viscosity coefficient of air). For each aerodynamic configuration, BADA models these coefficients as constants to provide computations for altitude and speed profile thresholds at pre-determined flight phases (i.e., takeoff, initial climb, clean, approach, and landing).
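Equations (6) and (7) can be chained as in the short Python sketch below. The coefficient values are hypothetical placeholders for one aerodynamic configuration; actual values are aircraft-specific and come from the BADA performance files.

```python
def drag_coefficient(c_d0, c_di, c_l):
    """Equation (7): zero-lift term plus induced term, quadratic in C_L."""
    return c_d0 + c_di * c_l ** 2


def drag_force_newtons(c_d, rho, speed_ms, wing_area_m2):
    """Equation (6): D = C_D * (1/2) * rho * V^2 * S."""
    return 0.5 * c_d * rho * speed_ms ** 2 * wing_area_m2


# Hypothetical coefficients (C_Do = 0.025, C_Di = 0.035) and flight condition.
c_d = drag_coefficient(0.025, 0.035, 0.47)
print(c_d, drag_force_newtons(c_d, 0.3796, 231.3, 122.6))
```

Because the coefficients are held constant within a configuration, switching flight phase amounts to calling the same functions with a different coefficient pair.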

#### 3.2.3. Thrust Model

BADA uses a general formula to calculate the maximum climb thrust, *Thr*max,climb, at a standard atmosphere for three different types of engines: jet, turboprop, and piston engines. For jet engines, the general equation is given as

$$Thr_{\text{max,climb}} = C_{Tc,1} \times \left(1 - \frac{H_p}{C_{Tc,2}} + C_{Tc,3} \times H_p^2 \right) \tag{8}$$

Since BADA uses this maximum climb thrust for both take-off and climb phases, the descent thrust is then calculated from the maximum climb thrust using adjustment coefficients for cruise, approach, and landing configurations [41], respectively, as follows:

$$Thr_{\text{des,low}} = C_{Tdes,\text{low}} \times Thr_{\text{max,climb}} \tag{9}$$

$$Thr_{\text{des,app}} = C_{Tdes,\text{app}} \times Thr_{\text{max,climb}} \tag{10}$$

$$Thr_{\text{des,ld}} = C_{Tdes,\text{ld}} \times Thr_{\text{max,climb}} \tag{11}$$

where *CTc*,1, *CTc,*2, *CTc,*3, *CTdes,*low, *CTdes,*app, and *CTdes,*ld are aircraft-specific coefficients, and *Hp* is the geo-potential pressure altitude, in feet. The rate, in feet per minute, at which an aircraft's altitude changes with respect to time when descending and approaching the runway for landing is the Rate of Descent (ROD). ROD is given by

$$ROD = \frac{dh}{dt} = \frac{(Thr_{des} - D)V_{TAS}}{mg} - \frac{V}{g}\frac{dV}{dt} \tag{12}$$

where *dh/dt* is the aircraft's vertical speed, in feet per minute, and *dV/dt* is the rate of change of the aircraft's speed along the flight path. Given that the typical target descent path is about 3 degrees, the flight path angle, *γ*, in degrees, over the descent path is

$$\gamma = \sin^{-1}\left(\frac{ROD}{V_{app}}\right) \tag{13}$$

where *Vapp* is the aircraft approach speed, in knots. The distance, in nautical miles, that the aircraft covers over the descent path is given as follows:

$$\text{Distance} = \frac{(\Delta h \div 100)}{\gamma} \tag{14}$$

where Δ*h* is the difference between the altitude that the aircraft is currently flying at and the altitude that the aircraft will descend to, in feet. Finally, the time, in minutes, that the aircraft takes to descend and land can be estimated by dividing the difference in altitude, in feet, by the rate of descent, in feet per minute, as follows:

$$\text{Aircraft Landing Time} = \frac{\Delta h}{ROD} \tag{15}$$
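A rough numerical check of Equations (13) to (15) is sketched below in Python. The 35,000 ft altitude change, the 3-degree path, and the 300-knot average approach speed are assumed values for illustration. The conversion from knots to feet per minute is an added step, since Equation (13) requires ROD and *Vapp* in the same units.

```python
import math

FT_PER_NMI = 6076.12  # feet in one nautical mile


def rate_of_descent_fpm(v_app_knots, gamma_deg):
    """Equation (13) solved for ROD: ROD = V_app * sin(gamma), in ft/min."""
    v_app_fpm = v_app_knots * FT_PER_NMI / 60.0
    return v_app_fpm * math.sin(math.radians(gamma_deg))


def descent_distance_nmi(delta_h_ft, gamma_deg):
    """Equation (14): distance = (delta_h / 100) / gamma, gamma in degrees."""
    return (delta_h_ft / 100.0) / gamma_deg


def landing_time_min(delta_h_ft, rod_fpm):
    """Equation (15): time = delta_h / ROD."""
    return delta_h_ft / rod_fpm


# Assumed scenario: descend 35,000 ft on a 3-degree path at 300 knots.
rod = rate_of_descent_fpm(300.0, 3.0)
print(round(rod), round(descent_distance_nmi(35_000, 3.0), 1),
      round(landing_time_min(35_000, rod), 1))
```

Under these assumptions the descent distance falls inside the 100 to 130 nmi range quoted earlier for descents from typical cruise altitudes.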

#### *3.3. Evaluation of Aircraft Estimated Descent Time*

To evaluate the outputs of the BADA APM, Figure 4 shows a comparison between the estimated landing times computed by BADA APC and the actual landing times for aircraft that operated with CDA at Nashville International Airport (BNA) on 17 June 2015. There is some variation between the estimated and actual landing times. For example, for a CRJ9 aircraft with estimated and actual landing times of 38 min and 27 min, respectively, there is an error of almost 29%, while for a CRJ7 aircraft with estimated and actual landing times of 20 min and 19 min, respectively, there is an error of 5.3%. On average, BADA APC estimated the landing time for an aircraft with CDA operations to be 20 min. Compared with the actual average landing time for an aircraft with CDA at BNA, which is 21 min, this estimate has an error of 4.7%.

**Figure 4.** Evaluation of aircraft landing times with CDA at BNA airport.

Similarly, Figure 5 shows a comparison between the estimated landing times computed by the BADA APM, using BADA APC, and the actual landing times for aircraft that operated with SDA at BNA. It shows that for a B737 aircraft with estimated and actual landing times of 32 min and 36 min, respectively, BADA APC produced an error of about 11%, and for an E135 aircraft with estimated and actual landing times of 39 min and 36 min, respectively, an error of about 8%. However, there are SDA instances where the BADA APC estimate matched the actual landing time, such as with the MD88 aircraft, or came close to it, such as with the FA50 aircraft. On average, BADA APC estimated the landing time for an aircraft with SDA operations to be 21.7 min. Compared with the actual average landing time for an aircraft with SDA at BNA, which is 24 min, this estimate has an error of 9.6%.

**Figure 5.** Evaluation of aircraft landing times with SDA at BNA.

#### **4. Model Development**

#### *4.1. Background on Queueing Theory*

In this subsection, we briefly discuss the fundamentals of queueing theory. Queues, or waiting lines, are common in people's daily lives. Queueing theory is the field of study within operations research (OR) that concerns the study of queueing models to represent the different types of queueing systems (systems that involve some sort of queue) that appear in real-world applications. Thus, these queueing models are helpful for determining how to operate a queuing system [42].

The basic process of most queueing models is that customers requiring service are generated over time by an input source (also known as a calling population). The arrival pattern by which customers are generated from the input source is statistically defined to accommodate the randomness of the customer arrival pattern. A common assumption is that customers arrive according to a Poisson process, that is, customers arrive at random but at a fixed mean rate, or equivalently, the time between consecutive customer arrivals, that is, interarrival time, follows an exponential distribution. These customers enter the queueing system and form a queue to wait for the required service.
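The equivalence between Poisson arrivals and exponential interarrival times can be illustrated with a small simulation. The Python sketch below is a generic illustration with a hypothetical arrival rate; it is not part of the model developed in this paper.

```python
import random


def simulate_arrival_times(rate_per_min, horizon_min, seed=0):
    """Generate Poisson-process arrival times by summing exponential gaps."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)  # exponential interarrival time
        if t > horizon_min:
            return arrivals
        arrivals.append(t)


# Hypothetical rate of 0.5 arrivals per minute observed over 10,000 minutes.
arrivals = simulate_arrival_times(0.5, 10_000.0, seed=1)
print(len(arrivals) / 10_000.0)  # empirical rate, close to 0.5
```

Counting the generated arrivals in fixed windows (for example, 15 min) would give counts that follow a Poisson distribution with mean equal to the rate multiplied by the window length.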

Queues could be infinite or finite according to the maximum permissible number of customers that they can contain. At certain times, a customer is selected from the queue for service according to some defined rule referred to as the queue discipline (usually first-come-first-serve, shortly known as FCFS, or some priority-based rule). The service is then provided to the selected customer by a service mechanism that may consist of one or more service facilities, each of which contains one or more servers. The time elapsed from the commencement to the completion of service for a customer is referred to as the service time. Collectively, characteristics of queueing systems include arrival patterns of customers, service patterns of server(s), the number of servers, system capacity, queuing discipline, and the number of service stages, if more than one service stage exists [43].

A convenient notation for summarizing the basic characteristics of the queueing systems was developed by D. G. Kendall and is known in the literature as the Kendall notation. It follows the notation of (a/b/c), where a = customer arrivals distribution, b = service time distribution, and c = number of servers [44]. For instance, the queuing model (M/D/5) uses Markovian (or Poisson) arrivals (or equivalently, exponential interarrival time distribution), deterministic (constant) service time, and five parallel servers.

Generally, there are three basic measures of performance for queuing systems: the waiting time that a typical customer endures, the number of customers that may accumulate in the queue or system, and the idle time of the servers. Since most queuing systems follow random processes (i.e., stochastic processes), these measures are represented as random variables, and thus, their probability distributions need to be defined. Depending on whether the main objective of modeling a queuing system is to determine some measure of effectiveness for a given process or to design the optimal system based on some defined criterion, the measures of performance could include the expected number of customers in the system, expected number of customers in the queue, expected waiting time in the system, expected waiting time in the queue, and expected number of busy servers.

Beyond the previously mentioned measures of performance, there is a measure of particular interest that indicates the percentage of time the service facility within the queuing system is being utilized. This measure represents the traffic intensity or utilization factor, which is the expected arrival rate of the customers to the queuing system divided by the expected service rate, assuming one server in the service facility. If more than one server is available, then the expected service rate must be multiplied by the number of servers. The utilization factor is an important performance measure of the queuing system.
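For concreteness, the standard definition of the utilization factor, the arrival rate divided by the product of the number of servers and the service rate per server, can be written as a short Python function. The rates used in the example are hypothetical.

```python
def utilization(arrival_rate, service_rate, servers=1):
    """Utilization factor: arrival rate / (servers * service rate per server)."""
    if service_rate <= 0 or servers < 1:
        raise ValueError("service_rate must be positive and servers >= 1")
    return arrival_rate / (servers * service_rate)


# Hypothetical rates: 30 arrivals per hour, 40 services per hour per server.
print(utilization(30, 40))             # single server
print(utilization(30, 40, servers=2))  # two parallel servers
```

A value at or above 1 indicates that customers arrive at least as fast as they can be served, so the queue would grow without bound unless its capacity is finite.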

#### *4.2. Adopting Queueing Theory to Our Model*

In this section, we introduce the fundamental parameters and essential conceptual elements for developing our model. In our model, aircrafts arriving at the TMA are viewed as customers of a queuing system. The aircraft within the stacking space are modeled as the customers waiting for the service and being served. Aircrafts leaving the queueing system are viewed as customers completing service at a single server.

#### 4.2.1. Assumptions and Parameters of the Model


Our model assumes that the number of aircraft arrivals over the period of time considered follows the Poisson probability distribution. This distribution has high variability and, thus, is likely to lead to the model results being conservative. Moreover, the fleet mix is assumed to be homogeneous, that is, dominated by two aircraft wake turbulence classes. The following parameters represent the fundamental components of our model: the space available to stack aircraft arrivals, the minimum allowable horizontal separation distance between a pair of consecutive same-weight-class aircraft arrivals, and the number of aircrafts that can be stacked for the approach. Figure 6 illustrates these components in our model. In this regard, while stacking space can be viewed as three dimensional, we model it as one dimensional, reflecting the longitudinal separation of stacked aircrafts as if the aircrafts within the space are all positioned in a single line.

**Figure 6.** Parameters of our model for aircrafts approaching airports with CDA.

#### 4.2.2. Capacity of the Stacking Space for Aircraft Arrivals

To maximize airport capacity, especially during periods of high demand, the controller longitudinally aligns and separates approaching aircraft (i.e., positions arriving aircraft in the queue) for landing on the same runway in a predetermined airspace, according to a predefined requirement for minimum separation between aircraft that typically operate under IFR [36].

Optimal spacing refers to the efficient implementation of separation requirements by the controller, such that spacing delivers seamless and efficient air traffic control services while maintaining safety. Although the controller often emphasizes sequencing (ordering approaching aircraft based on their sizes), this should not be the case with CDA operations, during which the optimal spacing between aircraft is more important than optimal sequencing [45]. Thus, we principally assume that the separation distance (mapped as the horizontal distance in Figure 6) between two same-weight-class, consecutively arriving aircraft conducting CDA is *greater* than when these two aircraft are conducting SDA, that is,

$$d\_{\rm CDA} > d\_{\rm SDA} \tag{16}$$

where:

*dCDA* = minimum separation distance between aircraft conducting CDA; and

*dSDA* = minimum separation distance between aircraft conducting SDA.

Assuming that the space available to stack aircraft arrivals (i.e., the maximum number of aircrafts in the queuing system) at an airport is *Sp*, and the minimum allowable horizontal separation distance between a pair of same-weight-class aircraft arrivals is *d*, then the number of aircrafts stacked for approach, *k*, must fit safely within the allowable stack space, as follows:

$$k \le \frac{S\_p}{d} \tag{17}$$

The largest integer value of *k* that satisfies Equation (17) is a key output of our modeling. When CDA is used, that largest integer value of *k* is the maximum number of aircraft that can fit within the stacking space under the CDA assumption. The air traffic controller can compare this largest integer with the number of CDA aircraft presently in the stacking space to ascertain whether there is enough stacking space available to permit the next arriving aircraft to use CDA.
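The capacity rule in Equation (17) can be sketched in a few lines of Python. This is only an illustration: the function names and the 40 NM and 6 NM inputs are hypothetical, and a real application would take the separation distance from the applicable wake turbulence rules.

```python
import math


def stack_capacity(stack_space_nm: float, separation_nm: float) -> int:
    """Largest integer k satisfying k <= Sp / d (Equation (17))."""
    if stack_space_nm < 0 or separation_nm <= 0:
        raise ValueError("need stack_space_nm >= 0 and separation_nm > 0")
    return math.floor(stack_space_nm / separation_nm)


def has_room_for_cda(aircraft_in_stack: int, stack_space_nm: float, separation_nm: float) -> bool:
    """True if one more aircraft still fits within the capacity k."""
    return aircraft_in_stack < stack_capacity(stack_space_nm, separation_nm)


# Hypothetical values: 40 NM of stacking space and 6 NM minimum separation.
k = stack_capacity(40.0, 6.0)
print("capacity k:", k)
print("room with 5 aircraft in the stack:", has_room_for_cda(5, 40.0, 6.0))
print("room with 6 aircraft in the stack:", has_room_for_cda(6, 40.0, 6.0))
```

Rounding down rather than to the nearest integer keeps the result on the safe side of the inequality.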

Assuming that the aircraft approach speed, measured in knots, on average, is *Vapp* and that the distance the aircraft covers during descent from the TOD point to touchdown, measured in nautical miles, is *ddes*, then the *average* descent time, *tdes*, could be estimated as

$$t\_{des} = \frac{d\_{des}}{V\_{app}}\tag{18}$$

We assume that an airport's nominal capacity, *AAR*, is sufficiently large to handle all aircraft that can fit safely within the stacking space. Therefore, when implementing CDA, this assumption is represented as follows:

$$AAR \times t\_{des} \ge k \tag{19}$$
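Equations (18) and (19) can be checked together with a short Python sketch. Because a knot is one nautical mile per hour, dividing a distance in nautical miles by a speed in knots yields a time in hours. The function names and the input values below are assumptions made for illustration only.

```python
def descent_time_hours(descent_distance_nm: float, approach_speed_kt: float) -> float:
    """Equation (18): average descent time in hours (knots are NM per hour)."""
    if descent_distance_nm < 0 or approach_speed_kt <= 0:
        raise ValueError("need descent_distance_nm >= 0 and approach_speed_kt > 0")
    return descent_distance_nm / approach_speed_kt


def aar_covers_stack(aar_per_hour: float, t_des_hours: float, k: int) -> bool:
    """Equation (19): the airport can absorb the stack if AAR * t_des >= k."""
    return aar_per_hour * t_des_hours >= k


# Hypothetical inputs: 90 NM from TOD to touchdown at an average of 180 kt.
t_des = descent_time_hours(90.0, 180.0)
print(f"descent time: {t_des:.2f} h")
print("AAR of 32/h covers k = 6:", aar_covers_stack(32.0, t_des, 6))
print("AAR of 10/h covers k = 6:", aar_covers_stack(10.0, t_des, 6))
```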

Essentially, stacking space is a contained airspace with predefined boundaries, set by traffic and/or obstacle limitations, whose purpose is to stack aircraft arrivals up to a certain capacity. As the separation distance between aircraft increases, the stacking space capacity, in terms of the number of aircraft that could be stacked, *k*, will decrease. Moreover, as the airport arrival rate increases, typically during periods of high demand when many airport staff are working and the airport operates at near capacity, stacking space capacity may decrease as well. This is because the high level of traffic places stress and cognitive pressure on the air traffic controllers who, thus, may decide to increase the minimum separation distance between aircraft as a safety buffer to reduce stress and the possibility of a safety error.

Furthermore, we assume that almost all aircraft arrivals at the airport are expected to successfully land on a runway, regardless of their descent profile type. To attain this operationally, the runway, as a critical element in ATM and airport operations, is assumed to have an arrival capacity that is at least as large as the *AAR*. The maximum hourly runway arrival capacity is calculated by dividing the average ground speed, *GS*, in knots, of aircraft crossing the runway threshold by the longitudinal separation distance, *d*, in nautical miles, required between successive arrivals, as follows:

$$RwyCap = \frac{GS}{d} \tag{20}$$
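A brief Python sketch of Equation (20), together with the stated assumption that the runway capacity is at least as large as the *AAR*, is given below. The names and numbers are hypothetical and serve only to show the arithmetic.

```python
def runway_arrival_capacity(ground_speed_kt: float, separation_nm: float) -> float:
    """Equation (20): maximum arrivals per hour = GS / d."""
    if ground_speed_kt <= 0 or separation_nm <= 0:
        raise ValueError("ground speed and separation must be positive")
    return ground_speed_kt / separation_nm


def runway_supports_aar(ground_speed_kt: float, separation_nm: float, aar_per_hour: float) -> bool:
    """Check the modeling assumption RwyCap >= AAR."""
    return runway_arrival_capacity(ground_speed_kt, separation_nm) >= aar_per_hour


# Hypothetical values: 140 kt over the threshold, 5 NM between successive arrivals.
print("runway capacity per hour:", runway_arrival_capacity(140.0, 5.0))
print("supports an AAR of 26/h:", runway_supports_aar(140.0, 5.0, 26.0))
```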

Observe that the stacking space capacity, *k*, was determined above based on the minimum allowable separation distances between a pair of aircraft of a similar weight class. Thus, the value of *k* represents a bound on the number of aircraft that may safely fit within the stacking space. Consequently, an upper bound of *k* should be determined based on the worst-case sequencing of aircraft within the stacking space. That worst case can be determined by assuming a sequence in which the lightest aircraft scheduled to land that day is followed by the heaviest aircraft scheduled that day, followed by the lightest aircraft scheduled, and so forth. Following these calculations, the air traffic controller may be provided with the lower and upper bounds of the stacking space capacity under the assumption that all aircraft arriving during that time period use CDA.

When a particular aircraft arrives during the period, the controller may compare the number of aircraft presently in the system with the lower and upper bounds. If the number in the system is less than the lower bound, then that is a favorable indicator that the arriving aircraft may be admitted using CDA. If the number is above the upper bound, then that indicates that CDA should be denied. If the number of aircraft within the stack is at least the lower bound and yet below the upper bound, then the controller will need to consider other factors in reaching a decision on whether CDA is safe for that arriving aircraft. Even if the number in the system is below the lower bound, those other factors may need to be considered by the controller to ensure there is sufficient separation distance between the arriving aircraft and the aircraft it follows most closely within the stacking space. Providing the controller with lower and upper bounds on stacking space is an optional tactic; whether to do so depends on whether it would be too much additional information for the often busy controller to absorb.
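The comparison against the lower and upper bounds can be sketched as a simple three-way rule. This is a hedged illustration rather than an operational procedure: the names `CdaAdvice` and `cda_advice` are hypothetical, and the handling of a count exactly equal to the upper bound (treated here as a denial) is an assumption, since the text only specifies the cases below and above that bound.

```python
from enum import Enum


class CdaAdvice(Enum):
    FAVORABLE = "favorable for CDA"
    CONTROLLER_JUDGEMENT = "controller must weigh other factors"
    DENY = "CDA should be denied"


def cda_advice(aircraft_in_stack: int, lower_bound: int, upper_bound: int) -> CdaAdvice:
    """Compare the current number of stacked aircraft with the two capacity bounds."""
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    if aircraft_in_stack < lower_bound:
        return CdaAdvice.FAVORABLE
    if aircraft_in_stack < upper_bound:
        return CdaAdvice.CONTROLLER_JUDGEMENT
    # Assumption: a count equal to the upper bound is treated like a count above it.
    return CdaAdvice.DENY


# Hypothetical bounds of 5 and 7 aircraft.
for n in (3, 5, 6, 7, 8):
    print(n, "->", cda_advice(n, lower_bound=5, upper_bound=7).value)
```

Even a "favorable" result is only an indicator; as noted above, the controller may still need to verify the separation from the aircraft immediately ahead.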

#### **5. The Applied Queueing Model**

In this section, we present our model and its key output, the probability of CDA blocking. By blocking, we mean that the controller denies an aircraft permission to conduct CDA. The probability of CDA blocking is therefore the probability that an aircraft would need to revert to SDA even though the initial plan was for all aircraft arriving during the brief period of time considered (e.g., 15 min) to use CDA. However, we first discuss how the concept of traffic intensity, which is borrowed from queuing theory, applies in the context of our model.

#### *5.1. Traffic Intensity*

Queuing theory presents a key parameter known as the traffic intensity, also referred to as the utilization factor. It is denoted here by *ρss* (the Greek letter "rho") and defined as the average hourly demand rate of the stacking space divided by the average hourly capacity (or *service*) of the stacking space. If the average demand rate (the rate at which aircraft arrive at the stacking space, i.e., the aircraft arrival rate) is denoted by *λss* and the average service rate is denoted by *μss*, then the *utilization of airspace factor*, *ρss*, for the stacking space within the TMA is as follows:

$$
\rho\_{\rm ss} = \frac{\lambda\_{\rm ss}}{\mu\_{\rm ss}} \tag{21}
$$

where the demand rate is expressed in terms of the number of aircraft that arrive per hour at the stacking space, and the service rate is expressed in terms of the number of aircraft per hour that may enter the stacking space. The value of the service rate, *μss*, is conceptually equivalent to the airport arrival rate (AAR). However, because *λss* is expressed in our examples per 15-min time period, the value of the service rate, *μss*, represents the number of aircraft that can be processed (served) per 15-min time period, and thus, it is one fourth of the value of AAR. The symbol *μss* is typically used in publications on queueing theory, while AAR is commonly used in air traffic management. Observe that the service rate, *μss*, will be lower when CDA is assumed than when SDA is assumed due to the longer separation distances required between aircraft using CDA.
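The unit conversion described above, from an hourly AAR to a service rate per 15-min period, can be illustrated as follows. The function names and the sample values (an AAR of 32 aircraft per hour and 6 arrivals in a period) are assumptions for illustration, not data from the study.

```python
def service_rate_per_period(aar_per_hour: float, period_minutes: float = 15.0) -> float:
    """Scale the hourly AAR to the analysis period (one fourth for 15 min)."""
    if aar_per_hour <= 0 or period_minutes <= 0:
        raise ValueError("AAR and period length must be positive")
    return aar_per_hour * period_minutes / 60.0


def traffic_intensity(arrivals_per_period: float, served_per_period: float) -> float:
    """Equation (21): rho_ss = lambda_ss / mu_ss, both per the same period."""
    if arrivals_per_period < 0 or served_per_period <= 0:
        raise ValueError("need arrivals >= 0 and a positive service rate")
    return arrivals_per_period / served_per_period


# Hypothetical values: AAR of 32 aircraft per hour, 6 arrivals in one 15-min period.
mu_ss = service_rate_per_period(32.0)
rho_ss = traffic_intensity(6.0, mu_ss)
print(f"mu_ss = {mu_ss:.1f} per period, rho_ss = {rho_ss:.2f}")
```

Keeping both rates in the same time unit is the essential point; mixing an hourly service rate with a per-period arrival rate would understate the utilization by a factor of four.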

#### *5.2. Probability of Aircraft Blocking*

In a queueing system of finite capacity, the probability of "blocking" is the probability that a customer arriving at the queue finds it full and thus exits the system. In our context, where aircraft are the customers, the interpretation of the probability of blocking depends on whether CDA or SDA is assumed for all aircraft during the brief time period being analyzed. If CDA is assumed for the time period, then a full queue suggests that CDA may not be used, and thus, SDA will be used instead. If SDA is assumed for the time period, then a full queue suggests that the aircraft will enter a holding pattern until the queue has available space to accommodate it. The probability of blocking is the percentage of time an aircraft's request to embark on CDA (or SDA) is denied, principally for safety reasons, because the stacking space within the TMA is busy and congested. This probability is denoted by *Pk* and could be specified for an airport and its TMA to define a threshold beyond which CDA is unsafe to implement; in the case of SDA, it is the probability that an aircraft will need to enter a holding pattern. Since the approach operations are limited to two profiles, namely CDA and SDA, *Pk* should be determined for both. Theoretically, *Pk* is expressed based on the *M/M/1/k* queuing model, which has a Poisson arrival process with rate *λss*, a Poisson service process with rate *μss*, a single server (that is, the stacking space), and a finite system capacity of *k* aircraft, as follows:

$$P\_k = \frac{1 - \rho\_{\rm ss}}{1 - \rho\_{\rm ss}^{k+1}} \, \rho\_{\rm ss}^{k} \tag{22}$$

Because the Markov (Poisson) process distribution has high variability and is assumed for both arrivals and service, the resulting probability is likely to be higher than the value experienced in practice, and thus, the Markov assumption may be viewed as conservative.
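To make the blocking probability concrete, the sketch below evaluates the standard *M/M/1/k* closed form, (1 − ρ)ρ^k / (1 − ρ^(k+1)), which applies when ρ differs from 1; at ρ = 1 the expression is undefined and its limit, 1/(k + 1), is used instead. The function name and the sample traffic intensity and capacities are illustrative assumptions.

```python
import math


def blocking_probability(rho: float, k: int) -> float:
    """Probability that an arrival finds an M/M/1/k system full."""
    if rho < 0 or k < 0:
        raise ValueError("need rho >= 0 and k >= 0")
    if math.isclose(rho, 1.0):
        # Limit of the closed form as rho -> 1: all k + 1 states are equally likely.
        return 1.0 / (k + 1)
    return (1.0 - rho) * rho**k / (1.0 - rho ** (k + 1))


# Hypothetical traffic intensity of 0.75 and a range of stack capacities.
for capacity in (2, 4, 6, 8):
    print(f"k = {capacity}: P_k = {blocking_probability(0.75, capacity):.4f}")
```

For a fixed traffic intensity, the blocking probability falls as the capacity *k* grows, which is the mechanism by which longer separation distances (and hence a smaller *k*) make blocking more likely.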

#### **6. Numerical Results to Illustrate the Model**

This section illustrates the application of our model through the use of industry standard data (e.g., minimum separation distance rules) and a stream of aircrafts arriving for landing at a mid-sized international airport during an afternoon level of demand. In particular, we used actual flight data from flights operated at Nashville International Airport (BNA) from 1200 to 1759 local time, on 17 June 2015.

Using BNA flight data and standard industry data, we calculated the probability of blocking for each type of descent profile, namely, CDA and SDA. These probabilities are denoted as *PkCDA* and *PkSDA*. *PkCDA* is the probability of blocking if all the aircraft in the stacking space conduct CDA, and *PkSDA* is the probability of blocking if all the aircraft in the stacking space conduct SDA. As previously mentioned, the probability of blocking is the percentage of time an aircraft's request to embark on CDA is denied for safety considerations and due to the stacking space being congested and busy; similarly, for SDA, it is the fraction of time an aircraft landing with SDA would need to enter a holding pattern.

The relevant parameters in building the model are described as follows. The rate of aircrafts arriving per time period, *λss*, is determined based upon the actual number of arrivals at BNA within each of the 24 15-min time periods analyzed. The stacking space, *Sp*, was estimated by visually examining the data and assuming its value as a constant value throughout the six hours. The wind speed, *Ws*, is the average value at BNA during each time period. The aircraft approach speed, *Vapp*, is determined based on averages, across all arriving aircrafts, of their initial approach velocity and final approach velocity. The fleet mix of heavy-, medium-, and light-weight aircrafts arriving in each time period and their sequence of arrivals is determined from what actually occurred in each time period.

The minimum separation distances under SDA and CDA assumptions, *dSDA* and *dCDA*, respectively, were chosen for each time period based on ICAO's wake turbulence application [36] and the matrix model proposed in [35] for calculating the separation distance between a mix of aircraft. As mentioned previously, these distances vary based on the wake turbulence categories of the leading and trailing aircraft and, thus, depend on the fleet mix in each time period. Furthermore, the values of *dSDA* and *dCDA* are identical when the leading and trailing aircraft have the same weight class and differ otherwise.

Figure 7 shows the values of the aircraft arrival rate, *λss*, over time for BNA.

**Figure 7.** The number of aircrafts arriving in each time period at BNA airport.

Consistent with Equation (17), we calculated the number of aircraft that can fit in the stacking space if CDA is used, *kCDA*, and the number of aircraft that can fit in the stacking space if SDA is used, *kSDA*, as follows:

$$k\_{\rm CDA} = S\_p / d\_{\rm CDA} \tag{23}$$

$$k\_{\rm SDA} = S\_p / d\_{\rm SDA} \tag{24}$$

Those values of *kCDA* and *kSDA* for BNA are shown in Figure 8. This figure may be provided as systematic output for the air traffic controllers, thus enabling them to know the bounds on the maximum number of aircraft that may be permitted within the stacking space for each descent method. Because the minimum separation distances for CDA are longer than with SDA, the value of *kCDA* is always lower than *kSDA*, while the difference between the two varies due to the varying fleet mix and arrival sequences in each period.

Using Equation (18) to determine *tdes*, we determined the service rates, μ*CDA* and μ*SDA*, under the CDA and SDA conditions in the following manner, using a calculation similar to Equation (19), described earlier:

$$
\mu\_{\rm CDA} = k\_{\rm CDA} / t\_{des,\,{\rm CDA}} \tag{25}
$$

$$
\mu\_{\rm SDA} = k\_{\rm SDA} / t\_{des,\,{\rm SDA}} \tag{26}
$$

Because CDA requires greater minimum separation distances between aircraft than SDA, this reduces the rate at which CDA arrivals can be processed through the stacking space and, thus, results in μ*CDA* being lower than μ*SDA*.
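The relationship in Equations (25) and (26) can be illustrated with a small Python sketch. The capacities of 5 and 7 aircraft and the common descent time of 0.5 h are hypothetical inputs; in practice the descent time may differ between the two profiles, so it is kept as a separate argument.

```python
def service_rate(k: int, t_des_hours: float) -> float:
    """Equations (25) and (26): aircraft served per hour = k / t_des."""
    if k < 0 or t_des_hours <= 0:
        raise ValueError("need k >= 0 and a positive descent time")
    return k / t_des_hours


# Hypothetical capacities and descent times for the two descent profiles.
mu_cda = service_rate(k=5, t_des_hours=0.5)
mu_sda = service_rate(k=7, t_des_hours=0.5)
print(f"mu_CDA = {mu_cda:.1f} per hour, mu_SDA = {mu_sda:.1f} per hour")
print("mu_CDA is lower than mu_SDA:", mu_cda < mu_sda)
```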

**Figure 8.** Capacity of the TMA's stacking space *k*.

Using the above parameters and Equation (22), we calculated the values of the probabilities of blocking, *PkCDA* and *PkSDA*, for CDA and SDA, respectively, and show the results in Figure 9.

**Figure 9.** Probability of blocking of the two descent methods.

Observe in Figure 9 that the probability of CDA blocking (i.e., CDA requests being denied), *PkCDA*, is consistently less than the probability of an SDA request requiring the aircraft to enter the holding pattern, *PkSDA*. By looking at graphs such as Figure 9, the air traffic controller can see those periods of time when the vast majority of aircraft will be able to use CDA, i.e., when the probability of CDA blocking, *PkCDA*, is sufficiently low. For example, except for the late afternoon periods from 15:00 to 15:15 and from 15:30 to 16:45, there is less than a one percent probability of CDA blocking. Aside from those late afternoon exceptions, the other time periods would be favorable for permitting CDA arrivals for all (or nearly all) arriving aircraft. Even during the late afternoon time periods, the probability of CDA blocking is low enough that the air traffic controller may be able to permit many of the aircraft to land using CDA. Different air traffic controllers will be comfortable using CDA at different probability thresholds, and these thresholds may change over time as they gain experience using the model. Overall, the information should help air traffic controllers by focusing their attention on opportunities to increase the use of CDA and thus lead to more sustainable air transportation operations.

After deciding on the periods of time when CDA may be used for all (or nearly all) arriving aircraft, knowing the bounds on the number of aircraft that may fit safely in the stacking space, *kCDA* and *kSDA*, as shown in Figure 8, may help the air traffic controller to decide on an aircraft-by-aircraft basis whether to admit a particular aircraft's descent using CDA. For instance, if an aircraft arrives during a period of time when the controller plans on permitting CDA and the actual number of aircraft within the stacking space upon that aircraft's arrival at the TMA is less than *kCDA*, then the controller knows it is likely fine to accept that aircraft's request to use CDA; otherwise, it is not.

#### **7. Conclusions**

Based on an analysis of the parameters that govern CDA implementation during high traffic levels, such as the terminal maneuvering area (TMA) and the size of the stacking space used to arrange aircraft arrivals, we developed an analytical model that aims to accommodate more CDA operations than at present. Our analysis shows that the parameters with significant impacts on CDA usage include the airport arrival rate, the capacity of the stacking space, and the minimum separation distance between aircraft arrivals. Although CDA usage is also affected by other parameters, such as wind speed, types of arriving aircraft, and traffic levels at contiguous airports, we were able to capture the underlying relationship between these parameters and CDA usage, potentially helping air traffic controllers to decide, using more quantitative information, whether to adopt CDA more often. In particular, we calculated the probability that an aircraft arriving within a brief period of time (say 15 or 30 min) would be denied CDA as a function of airport conditions. This may enable controllers to identify those periods of time when CDA should be anticipated, in other words, when the controller should plan on permitting CDA. This may lead to an increased use of CDA and, thus, result in lower noise, fuel consumption, and pollution. Furthermore, we established bounds for the maximum number of CDA descending aircraft that would fit within the stacking space, providing insight to the controllers on whether to permit a particular aircraft to use CDA or not. Finally, we illustrated our model using actual data from flights operated at Nashville International Airport (BNA).

Future research opportunities include better support for air traffic controllers in analytically determining which arriving aircraft may be able to use CDA for a portion of their descent and when they should switch from SDA to CDA and vice versa. Another research possibility would be to analytically identify opportunities to put arriving aircraft into a holding pattern for a short while if doing so would result in their being able to use CDA after holding. Calculations could be performed to identify whether the additional fuel burned from holding (or adjustments to en route speed) would be more than compensated by the fuel saved from using CDA. In addition to the work presented in this paper, these research opportunities and others should be explored to improve green aviation operations and ensure air transportation sustainability.

**Author Contributions:** Conceptualization, L.L.A.-M., E.A.A. and R.J.M.; methodology, L.L.A.-M., E.A.A. and R.J.M.; software, E.A.A. and A.M.W.; validation, L.L.A.-M., R.J.M. and E.A.A.; formal analysis, E.A.A., R.J.M. and A.M.W.; investigation, E.A.A., R.J.M. and A.M.W.; resources, E.A.A.; data curation E.A.A. and A.M.W.; writing—original draft preparation, E.A.A. and R.J.M.; writing—review and editing, R.J.M.; visualization, E.A.A. and A.M.W.; supervision, L.L.A.-M.; project administration, E.A.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Mode Choice Effects on Bike Sharing Systems**

**Matthias Kowald \*, Margarita Gutjar, Kai Röth, Christian Schiller and Till Dannewald**

Department of Architecture and Civil Engineering, RheinMain University of Applied Sciences, 65197 Wiesbaden, Germany; margarita.gutjar@hs-rm.de (M.G.); kai.roeth@hs-rm.de (K.R.); christian.schiller@hs-rm.de (C.S.); till.dannewald@hs-rm.de (T.D.)

**\*** Correspondence: matthias.kowald@hs-rm.de

**Abstract:** Bike-sharing systems (BSS) are offered in many cities and urban municipalities, and urban areas without such systems are considering their introduction. Many studies on BSS are available; however, neither mode nor route choice parameters are available for station-based BSS, although they are required for the implementation of BSS in local and regional transport demand models. This makes it impossible to simulate the demand-model-based effects of these systems on other transport modes and, e.g., to calculate scenario-guided modal shifts. The paper presents results from a survey study that aims to estimate BSS-related choice parameters. The study combined computer-assisted telephone interviews (CATI) for a collection of revealed preferences (RP) on the use of BSS with a follow-up paper-and-pencil survey on stated preferences (SP) of 220 BSS users and non-users from the Rhine-Neckar area in mid-west Germany. Considering the three transport modes BSS, public transport (PT), and private motorized transport (PMT), results from this choice experiment and the corresponding behavioural parameters allow integration of BSS in transport demand models and a simulation of modal shifts. Survey design, mode-choice experiment, and choice models are presented in this paper.

**Keywords:** bike-sharing system (BSS); mode choice; stated choice experiment; multinomial logit model; transport demand model

#### **1. Introduction**

The first bike-sharing systems (BSS) were introduced around five decades ago. Over the past two to three decades, the number of BSS has increased, and such systems are nowadays available in many cities around the globe. It is, however, surprising, that parameters for neither BSS-related mode nor route choices are currently available. This results in a lack of knowledge of behavioural patterns. Amongst other purposes, such parameters are needed for the implementation of BSS as a transport mode in transport demand models and to calculate, e.g., modal shifts.

To estimate such parameters, a survey study was conducted in the field in Germany and collected information on mode- and route choices from around 220 participants in an existing station-based BSS. This BSS was introduced in 2012, and is located in the Rhine-Neckar area in mid-west Germany, including 20 municipalities in total; 4 of them are major cities (Mannheim, Ludwigshafen, Heidelberg, Kaiserslautern), 11 are mid-size municipalities and 5 are smaller towns. The survey study combined computer-assisted telephone interviews (CATI) for a collection of revealed preferences (RP) on the use of BSS with a follow-up paper-and-pencil survey on stated preferences (SP) for BSS users and non-users. The choice experiment considers the three transport modes: BSS, PT, and PMT.

This effort resulted in a rich data set, which allows an analysis of behavioural patterns in terms of BSS-related mode and route choices and a quantification of the needed parameters. With that, the study closes a knowledge gap and allows the implementation of station-based BSS in local or regional transport demand models and corresponding simulations with these tools.

**Citation:** Kowald, M.; Gutjar, M.; Röth, K.; Schiller, C.; Dannewald, T. Mode Choice Effects on Bike Sharing Systems. *Appl. Sci.* **2022**, *12*, 4391. https://doi.org/10.3390/app12094391

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti, Dimitrios S. Paraforos and Stefania Lanza

Received: 16 March 2022 Accepted: 25 April 2022 Published: 27 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The paper focuses on mode choices exclusively. Section 2 presents results from a literature analysis and aims to identify attributes of relevance for choices for or against BSS use. Section 3 introduces the study area and an existing BSS, which was used to recruit survey participants and collect information on BSS use. Furthermore, the recruitment strategy, survey protocol, and experimental design for the choice experiment are presented. Next, descriptive statistics on respondents' socio-demographics and their choices are provided in Section 4 together with the model formulation. Section 5 presents results on model development and the final multinomial logit models on choices between the alternatives BSS, public transport (PT), and private motorized transport (PMT). The models are estimated for mandatory and leisure trips, and model results are accompanied by interpretations. Finally, conclusions and an outlook on future research are given in Section 6.

#### **2. Literature Review**

BSS were introduced in many cities around the globe in the last decade. In parallel, the number of studies on BSS has increased as well (for comparative meta-studies see, e.g., Refs. [1–5]). Most of these studies indicate that BSS use substitutes sustainable modes such as walking and PT; however, some report that BSS use also reduces trips by car and other PMT means. The effect on PMT is closely related to the synergetic effects of multi- and intermodal combinations of BSS with PT and of promoting cycling in general [5]. In order to incorporate BSS into transport demand models and thus estimate more comprehensively the potential for reducing car trips, it is necessary to understand the mode and route choice behaviour of (potential) BSS users. So far, only a few studies have focused on the mode and route choice behaviour of BSS users (see Refs. [6–9]), while many more studies exist for cycling with private bikes. Filling this gap is the aim of the survey, which is presented in the following sections.

Several research studies investigated the influence of different attributes on cycling. Buehler and Dill [10] reviewed the effects of cycling infrastructure; among other things, they found that cyclists tend to prefer separated bike lanes, lower speed limits and volumes of motorized traffic, trees along the route, and routes with fewer intersections; further, they tend to avoid routes with on-street parking and many variations in altitude.

Other studies investigated the effects of the built environment on walking and cycling levels [11,12]. Their main findings highlight the importance of short distances to destinations, mixed land use with high densities of population and facilities (for groceries, retail, service, and recreation), charges for car parking, and a network of convenient cycling infrastructure for increased shares of bike use. What cyclists perceive as convenient depends strongly on the study location, but amongst other factors, they prefer segregation or protection from motor traffic, few or only small detours, avoidance of intersections with motor traffic, and bike parking facilities.

Studies have also investigated how the built environment affects the use of bike-sharing stations [13,14]. It was found that high densities and proximity to cycling-friendly routes and to PT stations play a crucial role; furthermore, a study from Lisbon described an algorithm-based approach for the identification of optimal locations for stations and fleet dimensions for BSS by taking into account user demand, renting costs, and a mixed fleet of regular and electric bikes [15].

In addition, some studies explicitly investigated the mode choice behaviour of cyclists on both private and shared bikes. Hamre and Buehler [16] studied the mode choice behaviour of around 4600 commuters in Washington; they found that bike parking facilities, paid car parking, and showers at work increase the utility of commuting by bike. In another study, cycling was compared with bus riding and driving with respect to travel time reliability [17]; it was found that for habitually repeated trips, travellers rate reliability higher than travel time. Campbell et al. [6] investigated the factors influencing the choices for bike-sharing (conventional bike and e-bike) in Beijing; they found that trip distance, rain, temperature, and poor air quality negatively impact the choice for non-electric BSS, whereas users' socio-demographic characteristics only play a minor role. A study from Switzerland [18,19] compared several mode-specific effects on choices between PMT, PT, cycling, and walking, including individual and area-specific characteristics, whereby for walking and cycling only travel time was considered as a mode-specific attribute; they found that car availability, younger age, low fuel and parking costs, and low parking search times increase the utility of PMT, while low access and egress times, low ticket costs, and a low utilized capacity increase the utility of PT. An investigation of the Dutch mobility panel with almost 2000 participants focused on characteristics that affect choices between PMT, PT, cycling, and walking [20]. Among others, these characteristics include individual and household characteristics (such as socio-demographics and mobility ownership), weather, trip characteristics (such as distance and travel time), effects from the built environment, and work characteristics. Results indicate that, among other things, higher education, transit subscription, cycling to high school, weekdays, and certain trip purposes increase the utility of cycling, while owning a company car and travelling in larger groups decrease its utility. A recent study compared shared modes in Zurich, namely station-based BSS, e-bike sharing systems (e-BSS), and e-scooters [7]; they found that station density and morning hours increased the utility of BSS, while variation in topographical altitude and night hours decreased its utility.
Ilahi et al. [21] undertook an extensive survey with more than 5000 participants from the Greater Jakarta area in which they also incorporated less-established modes such as urban air mobility (including electric-based aircraft and autonomous vertical take-off and landing vehicles currently under development), on-demand transport, and bus rapid transit; they found that motorcycles have the highest baseline utility, while bikes have the second-lowest.

In addition, numerous studies have focused on the route choices of cyclists. Although this paper is primarily focused on mode choices, it can be expected that route choice effects in general also affect mode choices. A recent study with 662 participants from Greece and Germany investigated the effects of cycling infrastructure, speed limit, surface, on-street parking, trees, and travel time on route choices [22]; they found that in both countries protected bike lanes are preferred over other forms of infrastructure. Furthermore, asphalt is preferred over cobblestone, while the utility of a speed limit depends on the country. Other attributes that were incorporated in older studies are the number of car lanes [23], stop signs and crowding of cyclists [24], number and type of crossings [25], the width of bike lanes and traffic volumes [26], as well as sharing space with pedestrians and the availability of secure parking and showers at the destination [27].

With reference to the presented studies, it can be assumed that attributes relevant for cycling with private bikes are relevant for BSS, too. Furthermore, additional attributes, such as rental costs and access and egress times to rental stations, are relevant in the case of BSS. Evaluating which effects are of relevance for BSS and quantifying their influence is the aim of our survey study. The employed survey design and choice experiment are introduced in the following section. The study aims to provide an overview of the most relevant effects to allow the implementation of BSS in multi-modal transport demand models and, with that, to provide a basis for forecasts on urban transport that consider BSS as an alternative transport mode.

#### **3. Survey Design and Choice Experiment**

In 2020, the transport association Rhine-Neckar (VRN) decided to evaluate the performance of its BSS named VRNnextbike, with a particular focus on users and their behavioural patterns, e.g., origins and destinations of rental bike trips, trip purposes, average renting times, and distances. Figure 1 provides an overview of the supply area of VRNnextbike, which lies in Mid-West Germany. It includes 20 municipalities in total: 4 of these are major cities (Mannheim, Ludwigshafen, Heidelberg, Kaiserslautern) and 16 are minor cities, comprising 11 mid-size municipalities and 5 smaller towns.

**Figure 1.** The supply area of VRNnextbike.

Figure 2 provides an overview of the absolute development of bike rents between 2015 and 2020; it documents a positive development trend and repeating temporal patterns between annual seasons. Furthermore, it shows the effect of the first Corona-Lockdown in the second quarter of 2020 and the quick recovery of the system in the third quarter.

**Figure 2.** Quarterly development of the absolute number of rents.

Evaluating the BSS VRNnextbike and analyzing user behaviour in terms of, e.g., socio-demographics made it necessary to design a survey study. This provided the opportunity to include items on the effects of mode and route choices in terms of a BSS and to quantify mode and route choice parameters. Collecting data on BSS-related mode and route choices, however, made it necessary to extend the survey framework and include a stated choice experiment. In addition, the target population had to be extended. While the VRN survey was meant to focus on BSS users exclusively, the choice experiment made it necessary to include non-users as well, to understand differences in perception between these two groups and to allow a later simulation of, e.g., modal shifts. For this reason, a hybrid recruitment strategy and a specific survey protocol for BSS users on the one side and non-users on the other side were employed, and a stated preference experiment was developed and included in the survey.

The following descriptions and data analyses focus on the mode choice experiment exclusively. The route choice experiment and recording results will be presented in the future. Readers who are interested in the results are encouraged to contact the research team.

#### *3.1. Recruitment Strategy*

Table 1 presents an overview of the frame population and recruitment strategy of the survey study, which was split and specifically designed for BSS users and non-users. BSS users, on the one side, were recruited electronically either before or after renting a bike from VRNnextbike. During the electronic check-in or check-out of the rented bike in the VRNnextbike smartphone app, they were presented with an invitation to participate in a computer-assisted telephone interview (CATI) on their travel behaviour, bike-rent habits, and the BSS trip they performed either before or after the recruitment. To participate, they were asked to state their preferred time of day for a 30-min telephone interview within the next week and, in addition, to report their age, gender, and city of residence in an electronic recruitment questionnaire. Questions on socio-demographics aimed to keep control over the distribution of socio-demographic characteristics in the CATI sample. Participation in the electronic recruitment questionnaire took about two minutes on a smartphone. Accompanying the questions, information on data protection and the aim of the research project was provided on linked websites, and an incentive of EUR 20 for participation in the CATI was mentioned. After participants completed the CATI, they were recruited for the subsequent paper-and-pencil questionnaire, which was sent via postal mail to the respondents. To make participation convenient, an addressed and postpaid mail-back envelope was sent together with the questionnaire, along with additional information on the study and data protection.

**Table 1.** Frame population and recruitment strategy.


BSS non-users, on the other side, were recruited from a sample of randomly generated phone numbers for the supply area of VRNnextbike. These phone numbers might in some cases have resulted in interviews with BSS users; however, this approach was chosen because only 2.9% of all inhabitants in the supply area are BSS users, so sampling a user of VRNnextbike was rather unlikely. Furthermore, a control question on whether respondents are BSS users was added to allow their identification in the analysis. Recruitment of this subsample was done via telephone by asking for gender, age, and city of residence to keep control over the distribution of socio-demographic characteristics. In addition, the telephone recruitment aimed to provide information on the study and data protection issues and to discuss potential respondents' questions. An incentive of EUR 20 was offered for a filled-out survey instrument. As in the sample of BSS users, the printed paper-and-pencil questionnaire was sent via postal mail to the respondents, together with an addressed and postpaid mail-back envelope.

#### *3.2. Revealed and Stated Preferences: Survey Protocol*

Choices on, e.g., transport modes and routes can be observed either in real-life situations or in hypothetical choice situations. Real-life observations result in information on revealed preferences (RP), whereas observations of hypothetical choices result in data on stated preferences (SP). RP data have the advantage of representing people's actual behaviour; however, they also have disadvantages, such as little variation in the attributes that could be used to explain choices (e.g., travel times by bus on a specific route) and issues of multicollinearity (e.g., travel times, distances, and costs are often highly correlated for a specific transport mode). In addition, future effects, demands, and supplies cannot be addressed with RP data. SP data and the corresponding hypothetical choices overcome these issues. Here, an experimental design is employed that focuses on selected and potentially future attributes. It asks respondents to exclusively consider these selected effects when making their choices. Attribute variation is controlled by the experimental design, which makes it possible to overcome the challenge of multicollinearity. The disadvantages are, however, the hypothetical character of the choice tasks and the reduction of complexity (for a more detailed discussion on RP and SP choices see, e.g., Refs. [19,28,29]).

In transport planning, RP and SP data are often combined to overcome the mentioned limitations of the SP approach. One way to increase the reliability of SP choices is to employ an individual's RP decisions as the basis for her or his SP choice situations [30] (for a discussion of combined RP–SP studies see Ref. [19]). This means, on the one hand, increased complexity for the fieldwork, as choice situations are individually tailored for each participant of the SP survey. On the other hand, the RP–SP combination increases the quality of a survey, as the choice situations are based on a respondent's former choices and transport familiarity and thus are easy to imagine. The more realistic and familiar a choice situation is, the less effort it takes to contextualize it, resulting in more reliable answers from respondents (for choice situations see Ref. [31]; for a more general discussion on response burden see Ref. [32]).

In the CATI for BSS users, information on mobility-tool ownership, trip characteristics of the BSS use at the time of recruitment, attitudes towards BSS, and socio-demographics was collected. During the interview, the chosen route of the rental bike trip was traced electronically by employing an online routing tool [33] to gather additional information on the trip, such as travel time and road surface. The collected information and RP data were employed as a basis for the mode choice experiment. Based on these BSS attributes, trip characteristics for the alternative modes PT and PMT were collected using an online routing provider [34] and an electronic PT-schedule service [35]. The choice experiment itself was designed as a follow-up survey and presented to those CATI participants who agreed to fill out the paper-and-pencil questionnaire.

As RP data were not available for BSS non-users, aggregated RP characteristics on BSS usage were employed as the basis for the SP experiment. The aggregated figures for travel time were calculated from the automatically recorded data of the BSS VRNnextbike by discriminating between short, middle, and long trips for both major and minor cities (average trip lengths of 0.8, 1.5, and 3.2 km for major cities, and 0.7, 1.4, and 4.4 km for minor cities), while averages for access and egress times were obtained from the BSS user survey. In this case, data on mode alternatives were based on aggregated figures from the national travel survey in Germany [36], taking into account differences between major and minor cities. To design an individual questionnaire for this subsample, firstly, each respondent was assigned to a town size group based on his or her postal address. Secondly, every participant was sequentially assigned to a short, middle, or long trip distance and a trip purpose, either a leisure or a mandatory activity at the trip destination. The resulting RP values for each BSS non-user were employed as the basis for the SP experiment and its variations of attributes.
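The two-step assignment for non-users described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the average trip lengths are taken from the text, while the function name `assign_profiles`, the single shared rotation, and its order are assumptions made for the example.

```python
from itertools import cycle, product

# Average BSS trip lengths in km as reported in the text, by town size group.
TRIP_KM = {
    "major": {"short": 0.8, "middle": 1.5, "long": 3.2},
    "minor": {"short": 0.7, "middle": 1.4, "long": 4.4},
}
DISTANCE_CLASSES = ("short", "middle", "long")
PURPOSES = ("leisure", "mandatory")


def assign_profiles(town_groups):
    """Assign each respondent a (distance class, purpose) pair in rotation.

    town_groups: list of "major"/"minor" labels, one per respondent, in
    recruitment order. The rotation order is an assumption for illustration.
    """
    rotation = cycle(product(DISTANCE_CLASSES, PURPOSES))
    profiles = []
    for group in town_groups:
        distance_class, purpose = next(rotation)
        profiles.append({
            "town_group": group,
            "distance_class": distance_class,
            "purpose": purpose,
            "distance_km": TRIP_KM[group][distance_class],
        })
    return profiles


profiles = assign_profiles(["major", "minor", "major", "minor", "major", "minor", "major"])
```

With six distance and purpose combinations, the seventh respondent in this sketch starts the rotation again.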

The recruitment procedure of both subsamples is visualized in Figure 3.

**Figure 3.** Schema of survey protocol.

#### *3.3. Experimental Design and Choice Situations*

Attributes of the mode choice experiment were taken from former studies on mode choices and studies on selected transport modes such as bikes, PMT, and PT (see Literature Review in Section 2). Furthermore, they were discussed with external academic partners and practitioners from transport planning offices.

As described above, RP information (for BSS users) or aggregated empirical figures (for BSS-non-users) were employed and empirical values for the alternative modes were collected. Next, an individually tailored SP questionnaire was created by varying the mode-specific characteristics in accordance with a predefined experimental design. An overview of the attributes and variation of attribute levels is provided in Table 2.

**Table 2.** Stated preference experiment: transport modes, attributes and variation.


To reduce the number of possible combinations of the presented variation levels, an efficient design [37] was generated with the software Ngene [38]; it resulted in 60 combinations to design the choice tasks, which were split into six blocks (the experimental design is available upon request). Each participant was assigned to one block and the empirical values were varied accordingly. Finally, each participant was asked to complete ten choice tasks in the paper-and-pencil questionnaire. Participation in the study was restricted to adults (18 years and older) owning a driver's license to make the alternative PMT realistic.
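The arithmetic behind the blocking (60 design rows in six blocks, hence ten tasks per participant) can be sketched in a few lines. The contiguous split and the round-robin assignment below are simplifying assumptions for illustration; a design tool such as Ngene typically chooses block membership to balance attribute levels, and the actual assignment rule is not stated in the text.

```python
N_COMBINATIONS = 60  # design rows reported in the text
N_BLOCKS = 6         # blocks reported in the text


def split_into_blocks(n_rows, n_blocks):
    """Partition design row indices into equally sized, contiguous blocks."""
    if n_rows % n_blocks != 0:
        raise ValueError("rows must divide evenly into blocks")
    size = n_rows // n_blocks
    return [list(range(b * size, (b + 1) * size)) for b in range(n_blocks)]


def assign_block(participant_index, n_blocks):
    """Rotate participants through blocks so block usage stays balanced."""
    return participant_index % n_blocks


blocks = split_into_blocks(N_COMBINATIONS, N_BLOCKS)
tasks_per_participant = len(blocks[0])
```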

The individually tailored paper-and-pencil questionnaires were created, printed and sent within one week after recruitment to avoid fatigue effects. To increase reliability in terms of PT utilized capacity, an illustration accompanied the questionnaire (see Figure 4; for more details on reliability see Ref. [19]).

**Figure 4.** Illustration of capacity utilization (illustrations taken from Weis et al. [19]).

Both the survey protocol and instrument followed the suggestions by Dillmann [39]. After four weeks of non-response, reminders were sent with a new copy of the questionnaire. An exemplary choice situation is shown in Figure 5.


**Figure 5.** Example of a choice task.

#### **4. Descriptive Statistics and Modelling Approach**

The survey was in the field from September 2021 to February 2022. After data cleaning, information from 220 respondents, who filled out and returned the questionnaire, was collected. On average, respondents answered 9.93 (median = 10) mode choice tasks, which resulted in a total of 2184 observations for the analysis. 27 respondents (12.3%) showed non-trading behaviour, meaning they chose an identical transport mode in all presented choice tasks.
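Non-trading behaviour as defined above can be flagged directly from the choice records. The sketch below assumes a hypothetical data layout of `(respondent_id, chosen_mode)` pairs and uses toy data; only the counts 27 and 220 come from the text.

```python
from collections import defaultdict


def non_traders(observations):
    """Return respondent ids that chose the same mode in every task.

    observations: iterable of (respondent_id, chosen_mode) pairs.
    """
    chosen = defaultdict(set)
    for respondent_id, mode in observations:
        chosen[respondent_id].add(mode)
    return sorted(rid for rid, modes in chosen.items() if len(modes) == 1)


# Toy data (hypothetical): respondent 1 always picks BSS, respondent 2 trades.
toy = [(1, "BSS"), (1, "BSS"), (2, "BSS"), (2, "PT"), (2, "PMT")]
flagged = non_traders(toy)

# Share reported in the text: 27 of 220 respondents.
share_non_trading = round(100 * 27 / 220, 1)
```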

#### *4.1. Descriptive Statistics*

Respondents from the subsample recruited via randomly generated phone numbers (see subsample 2 in Table 3) who indicated that they use a BSS are considered BSS users in the following analytical procedure (*n* = 11). Information on the distribution of selected socio-demographic characteristics and the frequency of chosen transport modes in the choice experiment is presented in Table 3 for both subsamples and the whole sample.


**Table 3.** Relative frequency distribution of selected sample characteristics.

Around 58% of all participants were BSS users and 42% were non-users. Users were more often male (64.1%) than non-users (41.3%). For the whole sample, the gender proportion was more balanced, with 54.5% males and 45.5% females. BSS users belonged remarkably more often to younger age groups than non-users. Again, the proportion between young adults (18–30 years; 47.3%) and middle-aged persons (31–65 years; 41.8%) was more balanced for the whole sample. There are only a few observations of people in the retired age group (66–94 years): 0% for BSS users, 26% for non-users and around 11% for the whole sample. Furthermore, around 89% of the user sample live in cities, 9% in larger towns and around 2% in small municipalities. This fits well with the automatically tracked renting numbers of VRNnextbike [40]. The non-user sample includes more respondents from larger towns (25%) and small municipalities (around 11%). In summary, there is socio-demographic variation in the data, and it can be assumed that evaluations from people with different socio-demographic characteristics are considered in the analysis.

Concerning mode choice situations, respondents from the user sample most often chose the BSS (around 60%), followed by PT (around 32%) and PMT (9%). Non-users also preferred the BSS (64%), followed, however, by PMT (37%) and PT (22%).

Covariates in the behavioural experiment are based on RP-data for subsample 1, respondents recruited during the CATI, and on aggregated empirical figures for subsample 2, respondents recruited from the random phone number sample. Table 4 presents the empirical distribution of these covariates as included in the experiment.


**Table 4.** Empirical distribution of covariates in a choice experiment.

In terms of the BSS, access and egress time has a minimum of 1 min, a median of 6 min, a mean of around 6 min and a maximum of 49 min. This distribution is comparable to access and egress for PT. BSS trips are short with a median travel time of 10 min and 15 min for the 3rd quartile. A few trips, however, are long with a maximum of 108 min. Travel costs are low, with EUR 1.80 for the 3rd quartile.

The overall travel time (including parking search time) of PMT is substantially higher than the travel time for BSS and PT. The minimum travel time is 6 min, the 1st quartile 12 min, the median 15 min and the 3rd quartile 19 min. The maximum travel time, however, is 46 min and substantially lower than the maximum travel times for BSS and PT. Fuel costs for the trip range between EUR 0.10 for a very short trip and EUR 6.20. Parking costs lie between EUR 0.60 and EUR 4.80, with a median of EUR 3.20.

Access and egress times and travel times for PT are overall comparable to BSS. In terms of travel costs, PT shows moderate values between BSS and PMT.

#### *4.2. Model Formulation*

Discrete choice data, where respondents choose between a limited number of alternatives, are commonly analyzed by applying random utility maximization theory. The theory assumes rational behaviour in which respondents choose the alternative with the highest utility [29,41–43]. Namely, an individual *n* faced with *J* alternatives in *T* choice tasks associates an indirect utility *Unjt* for an alternative *j* in a choice task *t* and chooses the alternative with the highest utility. The utility of an alternative *j* is therefore decomposed as

$$
U_{njt} = V_{njt} + \varepsilon_{njt} = x'_{njt}\beta + \varepsilon_{njt} \tag{1}
$$

where *Unjt* is not observed, *Vnjt* is the deterministic utility of alternative *j*, and *εnjt* is a random component not included in *Vnjt*. The deterministic utility *Vnjt* can be specified by the term *x'njt β*, where *x* is a vector of explanatory variables (e.g., attribute levels), and *β* is the vector of corresponding coefficients to be estimated.

For each alternative, a utility function (*Vnjt*) is specified, whereby the alternative-specific attributes, characteristics of the respondent or the choice situation are included as explanatory variables. When specifying the utility function, it is important to understand that only the differences in utility matter, while the scale of utility is arbitrary [29] (p. 19). Therefore, to capture the differences in the utility of the alternatives, *J* − 1 alternative-specific constants (ASC) are specified, whereby the estimated ASCs are interpreted relative to the omitted alternative, which is normalized to zero [29,43]. For the categorical attributes, street type, surface type, and utilized capacity, the *L* levels of each attribute were transformed into *L* − 1 dummy variables. This means that the utility for one level per attribute is normalized to zero and serves as a reference category, while the parameter estimates for the *L* − 1 dummy variables capture the utility differences to this reference category [28,29,43].
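The transformation of *L* levels into *L* − 1 dummy variables can be illustrated with a small helper. This is a sketch only: the function name `dummy_code` is hypothetical, and the street type levels and the reference category (arterial road) are taken from the results reported in Section 5.

```python
def dummy_code(value, levels, reference):
    """Encode one categorical value as L - 1 indicator variables.

    The reference level is dropped, so it maps to all zeros.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value}")
    return {lvl: int(value == lvl) for lvl in levels if lvl != reference}


STREET_TYPES = ("arterial road", "side street", "cycleway")

cycleway_row = dummy_code("cycleway", STREET_TYPES, reference="arterial road")
reference_row = dummy_code("arterial road", STREET_TYPES, reference="arterial road")
```

A coefficient estimated for the `cycleway` indicator is then read as the utility difference between a cycleway and an arterial road.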

#### **5. Results**

The 2184 observations (choice tasks) were analyzed by estimating multinomial logit models (MNL) [44] in R [45,46], whereby BSS was chosen as the reference alternative when specifying the equations for estimation. Firstly, an initial MNL was estimated by including exclusively the effects of attributes from the choice experiment (see Table 2; for a documentation of this work see Ref. [47]). With reference to previous studies on mode choice [18–21], effects of socio-demographics (age, gender, education, student status, car availability, PT season ticket availability), home municipality, and season (winter vs. autumn) were expected. Consequently, as recommended in the methodological literature [29,44], the initial model was sequentially built up by including these effects as alternative-specific attributes in at most *J* − 1 alternatives (one alternative as reference category), testing the hypotheses, and comparing the models (restricted vs. unrestricted) to omit parameters without significant effects and/or substantial improvement in the model fit. Further, it was assumed that the effects of travel time and travel costs depend on household income and the distance of the trip, which is why corresponding continuous interactions were specified [18,19]; however, these interactions neither had a significant effect nor substantially improved the model and thus are not presented. All analytical steps along with the estimated models can be made available on request. Table 5 provides an overview of the model fit of the initial model, which exclusively included effects from attributes of the choice experiment, and the extended, final model, which is presented below. The likelihood ratio test indicates a significantly better fit for the final model, which is supported by the increase in adjusted Rho-square and by the decrease in AIC and BIC (for evaluation of model fit indices and model comparison, please review the methodological literature, Refs. [29,44]).
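For readers less familiar with these fit measures, the sketch below shows how they are commonly computed from log-likelihoods. The formulas are the standard textbook definitions and may differ in detail from the software output used in the study; all log-likelihood values and parameter counts in the example are hypothetical, and only the number of observations (2184) and the three alternatives come from the text.

```python
import math


def fit_indices(ll_null, ll_model, n_params, n_obs):
    """Return AIC, BIC and adjusted rho-square for one estimated model."""
    return {
        "aic": 2 * n_params - 2 * ll_model,
        "bic": n_params * math.log(n_obs) - 2 * ll_model,
        "adj_rho2": 1 - (ll_model - n_params) / ll_null,
    }


def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic; compared against a chi-square distribution
    with df equal to the difference in the number of parameters."""
    return -2 * (ll_restricted - ll_unrestricted)


N_OBS = 2184
# Null model: equal choice probabilities across the three alternatives.
ll_null = N_OBS * math.log(1 / 3)

# Hypothetical log-likelihoods and parameter counts, for illustration only.
initial = fit_indices(ll_null, ll_model=-1900.0, n_params=15, n_obs=N_OBS)
final = fit_indices(ll_null, ll_model=-1800.0, n_params=30, n_obs=N_OBS)
lr = lr_statistic(-1900.0, -1800.0)
```

A better-fitting extended model shows a higher adjusted Rho-square and lower AIC and BIC, which is the pattern described for Table 5.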


**Table 5.** Model fit comparison between the initial and the final model.

Employing the estimated parameters in transport demand models at a later stage of the project requires a distinction between mandatory and leisure trips. Mandatory trips are those with destinations for purposes such as education, work, business, or home. Leisure trips are those with destinations such as shopping, private activities and tasks, or any leisure activities. Usually, people have more degrees of freedom in destination choice for leisure than for mandatory activities. Therefore, the trip purpose was included as an alternative-specific attribute in the final model on the total sample. In addition, segregated models were estimated on a subsample for mandatory and a subsample for leisure trips. The results of all three models, the overall (total) model on all observations and the segregated models on mandatory and leisure trips, are presented in Table 6.

All parameters for the final model show the expected sign and reasonable differences in parameter values. For all modes (BS, PMT, and PT), the estimated parameters for travel time and time for access and egress show a negative effect. Hereby the negative effect is stronger for access and egress, which was expected as ride-times in or on a vehicle are often considered less negatively than waiting times or access and egress-times [19]. Travel costs demonstrate a negatively associated utility for all modes. In addition, parking costs and fuel costs for PMT are negative, too.

For BS, the data do not support any significant difference in utility for the street type; however, with arterial road as the reference category, the cycleway has a higher positive estimated utility (β = 0.195, t-value = 1.339) and the side street has a negative utility (β = −0.013, t-value = 0.089). This negative utility of side streets can be explained by an association with detours in comparison to the probably more direct and thus shorter route on an arterial road. Relative to a macadam surface, cobblestones do not show differences in utility (β = 0.004, t-value = 0.028), while asphalt is a more preferred surface type; however, the effect is also not significant (β = 0.239, t-value = 1.634).
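To see why these estimates are described as not significant, the reported t-values can be converted into approximate two-sided p-values. The sketch assumes a standard normal approximation for the test statistic, which is a common large-sample convention and not something stated in the text.

```python
import math


def two_sided_p(t_value):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_value) / math.sqrt(2))


# t-values reported in the text for the cycleway and asphalt effects.
p_cycleway = two_sided_p(1.339)
p_asphalt = two_sided_p(1.634)
```

Under this approximation, both p-values lie above the conventional 0.05 threshold, which corresponds to |t| of about 1.96.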

For PMT and PT, the estimated ASCs show the differences in utility of a given alternative from the reference BS when everything else is equal [44]. The utility of PMT is higher than for BS (β = 0.669, t-value = 1.059), whereby the direction of the effect changes when comparing mandatory trips (β = −1.677, t-value = 1.389) to leisure trips (β = 0.980, t-value = 1.233). This can be explained by the high share of commuters and students mainly using the system for trips to work and education. For these people, BS has a higher utility than PMT. This interpretation is also supported by the negative sign for mandatory trips in the overall (total) model (β = −0.608, t-value = −3.739). In addition, PT has a higher positive utility than BS, too (β = 0.756, t-value = 1.552), whereby there is no change in sign between mandatory and leisure trips. Influences from the spatial typology are limited to PMT, where the utility of PMT decreases with increasing size of the home municipality. For PT, a mode-specific effect results from capacity utilization: increasing utilization results in decreasing utility of PT.
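How utilities translate into choice probabilities in an MNL can be sketched with the standard logit formula, in which the probability of an alternative is its exponentiated utility divided by the sum over all alternatives. The example uses only the ASCs of the total model quoted above and sets every other term to zero, so the resulting shares merely illustrate the mechanics and are not forecasts (for instance, an age of zero lies outside the data).

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from deterministic utilities."""
    v_max = max(utilities.values())  # subtract the max for numerical stability
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}


# ASCs from the total model as quoted in the text; BS is the reference (0).
ascs = {"BS": 0.0, "PMT": 0.669, "PT": 0.756}
shares = mnl_probabilities(ascs)
```

Adding the same constant to all three utilities leaves the shares unchanged, which is why only utility differences to the reference alternative are identified.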


**Table 6.** Results of MNL models for all observations (total) and trip specific models.

In terms of socio-demographics, the effect of age and age-squared shows a u-shaped distribution of utility for both PMT and PT (see Figure 6). Choosing PMT has a negative utility from 18 to 64 years, whereby the smallest value is reached between 32 and 33 years. From this age on, the utility of PMT increases again. A somewhat similar picture is observed for PT, where PT has a negative utility in comparison to BS between 18 and 55 years. The lowest utility is calculated for an age of 28 years. From this age on, the utility of PT increases in comparison to BS.

For women, PMT has a higher utility than BS (β = 0.745, t-value = 4.976). This effect is different for PT, where the utility for women is negative (β = −0.109, t-value = 0.840). This pattern fits the results of other studies, which show that women appreciate the privacy of cars (for a general discussion on car use and gender see, e.g., Ref. [48]) and perhaps BSS in comparison to PT.

Furthermore, in comparison to BS, having a car always available increases the utility of PMT (β = 0.627, t-value = 3.056), while owning a PT season ticket increases the utility of PT (β = 0.433, t-value = 3.241) and decreases the utility of PMT (β = −0.494, t-value = −2.897). In winter, both PMT (β = 0.360, t-value = 2.230) and PT (β = 0.643, t-value = 4.322) are more preferred than BS.
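The turning point of such a quadratic age effect follows directly from the two coefficients. The sketch below assumes that the age contribution to utility has the form b_age · age + b_age_sq · age²; the coefficient values are hypothetical, chosen only so that the minimum falls near the age reported for PMT, since the estimates themselves are not quoted in the text.

```python
def age_profile(b_age, b_age_sq):
    """Turning point and non-trivial zero of V(age) = b_age*age + b_age_sq*age**2."""
    if b_age_sq <= 0:
        raise ValueError("a u-shape requires a positive squared-age coefficient")
    turning_point = -b_age / (2 * b_age_sq)  # where dV/d(age) = 0
    zero_crossing = -b_age / b_age_sq        # where V(age) returns to 0
    return turning_point, zero_crossing


# Hypothetical coefficients, for illustration only.
turning_point, zero_crossing = age_profile(b_age=-0.065, b_age_sq=0.001)
```

In this functional form the utility contribution returns to zero at twice the age of the minimum, which is broadly in line with the ranges described above.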

**Figure 6.** Effect of age on the utility of PMT and PT in comparison to BS.

#### **6. Conclusions**

The survey resulted in behavioural parameters, which show the expected signs and allow a straightforward interpretation. It has to be kept in mind that the survey was in the field between September 2020 and February 2021. In this rather cold season of the year, BSS usage figures are low, and it can be assumed that experienced users are overrepresented, whilst occasional users are underrepresented in comparison to the warmer season. This, however, does not necessarily lead to bias in the data.

In addition, our study has a regional character. Topographically, the supply area of VRNnextbike is rather flat with some hills. Mountains and large altitudinal differences are rare, if present at all. Even though altitude was not considered in the choice experiment, it is correlated with travel time and with travel-time-related costs; this has to be considered when the statistical results are employed in other regions.

In general, results can be used to implement BSS in transport demand models. The main empirical findings are:


Analyses, however, are not finished yet. Future work will calculate willingness-to-pay (WTP) values as well as values of travel time savings (VTTS). These values will allow a comparison to similar studies for PT and PMT and will show to what extent the results presented above are similar and reasonable. In addition, parameters for route choices have to be estimated. Once this is done, BSS parameters will be implemented in an existing regional transport demand model and three scenarios will be simulated:


Estimation of route-choice parameters, implementation of BSS in a transport demand model and calculation of modal shifts along the above-mentioned scenarios will be documented later and published elsewhere. The present work, however, represents one necessary next step towards a better understanding of a well-established transport mode in cities and urban areas. In terms of data collection and analysis, it would be good to combine survey data on cycling in general with sensor-based data on, e.g., cycling safety [49].

**Author Contributions:** Conceptualization and methodology: M.K., T.D., C.S.; writing: M.K.; data acquisition: K.R., M.G., M.K.; data analysis: M.G.; literature review: K.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the state of Hesse and the House of Logistics (HOLM) in the program "Innovations in logistics and mobility" of the Hessian Ministry for Economic Affairs, Energy, Transport and Housing, grant number HA-No. 1013/21-15. Open Access funding provided by the Open Access Publication Fund of RheinMain University of Applied Sciences.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The survey data can be obtained from the corresponding author. Interested researchers are asked to send a data request via e-mail.

**Acknowledgments:** This project (HA-No. 1013/21-15) is funded by the state of Hesse and the House of Logistics (HOLM) in the program "Innovations in logistics and mobility" of the Hessian Ministry for Economic Affairs, Energy, Transport and Housing.

**Conflicts of Interest:** The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Multi-Factor Rear-End Collision Avoidance in Connected Autonomous Vehicles**

**Sheeba Razzaq 1, Amil Roohani Dar 1, Munam Ali Shah 1, Hasan Ali Khattak 2,\*, Ejaz Ahmed 3, Ahmed M. El-Sherbeeny 4, Seongkwan Mark Lee 5, Khaled Alkhaledi 6 and Hafiz Tayyab Rauf 7**


**Abstract:** According to the World Health Organization (WHO), the leading cause of fatalities and injuries is rear-end collisions between vehicles. The critical challenge of a technologically rich transportation system is to reduce the chances of accidents between vehicles. For this purpose, it is especially important to analyze the factors that cause accidents. Based on an analysis of these factors, this paper presents a driver assistance system for collision avoidance. The existing literature describes many factors involved in collisions, from which we identified some that can affect the probability of an accident occurring. With advancements in autonomous vehicle technologies, these factors can be controlled using an onboard driver assistance system. We used MATLAB's Fuzzy Inference System Tool to analyze the categories of accident-contributing factors. Fuzzy results are validated using the VOMAS agent in the NetLogo simulation model. The proposed system can inform the vehicle's automated system when the chances of an accident are higher so that the vehicle may take control from the driver. The proposed research helps in handling the various kinds of factors involved in accidents. The results of the experiments demonstrated that multi-factor-enabled vehicles could avoid collisions better than other vehicles.

**Keywords:** collision avoidance; fuzzy logic; on board driver assistance; semi-autonomous; multifactor; VANET

#### **1. Introduction**

According to the WHO [1], around 20 to 50 million people suffer from severe injuries in road traffic crashes, with many experiencing disabilities because of their injury. Road traffic injuries were the leading cause of death for children and adults between the ages of 5 and 29 years [1]. To reduce fatalities and injuries from road traffic crashes, the World Health Organization (WHO) works with partners responsible for technical support to countries. The leading cause of fatalities and injuries is the rear-end collision, which makes up 70% of all vehicle collisions [2]. According to the authors in [3], 1.078 million injuries in the USA are due to rear-end collisions alone. So, an efficient collision avoidance system in vehicles is needed to reduce the death rate [1]. In the existing literature, many researchers have proposed solutions for rear-end collision avoidance.

**Citation:** Razzaq, S.; Dar, A.R.; Shah, M.A.; Khattak, H.A.; Ahmed, E.; El-Sherbeeny, A.M.; Lee, S.M.; Alkhaledi, K.; Rauf, H.T. Multi-Factor Rear-End Collision Avoidance in Connected Autonomous Vehicles. *Appl. Sci.* **2022**, *12*, 1049. https://doi.org/10.3390/app12031049

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti and Dimitrios S. Paraforos

Received: 18 December 2021 Accepted: 13 January 2022 Published: 20 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The authors in [4] proposed a rear-end collision avoidance controller based on proportional–integral–derivative (PID) control. The authors in [5] proposed another approach to vehicle rear-end collision avoidance using the linear quadratic optimal control technique. The problem with these solutions is that they are highly dependent on mathematical models [6]. This dependence on mathematical models can be overcome using fuzzy logic [7].

We next identify important factors from the literature that can be used in fuzzy logic to address rear-end collision avoidance. The authors in [8] used physical, environmental, and mental factors to reduce the chances of accidents (COA). The authors in [9] analyzed and discussed road and weather condition factors in accident occurrence. The authors in [10] also used environmental factors, such as road and weather conditions, in rear-end crash avoidance. Driver characteristics can be added to reduce accident chances and increase the flexibility of the algorithm. The authors in [11] proposed an algorithm that takes the characteristics of the driver as input and showed a significant improvement. Driver characteristics are also important in decision making because the warning thresholds can be improved by adding driver experience [12], age, and time of accident, along with the factors used in [8]. Different factors can change the results of accident occurrence. The authors in [12] discuss different single factors and multi-factors involved in road accidents. Among the factors discussed in [12], the time of the accident can also play an important role in accident avoidance.

The combinations of these factors, as discussed in [8,11,12], can improve decision making while driving. We will use fuzzy logic to check how these factors can increase the chances of accidents. First, we examine whether these fuzzy rules can be verified and validated. If the fuzzy rules are not properly validated, serious problems can arise, e.g., false warnings when none is needed. An existing model in this regard is the Virtual Overlay for Multi-Agent System (VOMAS), which can test a system for accurate results. VOMAS can be applied using the NetLogo tool to validate different simulations. The authors in [13] used VOMAS in their proposed system.

The proposed system requires the output of the fuzzy system to take actions for accident prevention. The actions can be simulated with the help of the NetLogo tool. This is possible if we feed multiple levels of factors, such as environmental and physical conditions [8], driver [11], and weekday and time [12], into the simulation model.

The contributions of the proposed system are as follows:


#### **2. Related Work**

This literature review covers research on collision avoidance between vehicles and the different factors involved. We sought to identify these factors for our research and to understand how they contribute to accident occurrence or collision avoidance. We also sought to identify the tools and techniques used by researchers for this work. In recent times, many researchers have conducted research on collision avoidance between vehicles. Due to an increase in the number of vehicles, it has become a challenge to reduce deaths from vehicle collisions [14]. Collision-avoidance-based warning systems [15] have a significant impact on road traffic safety. In the existing literature, many algorithms under the names collision avoidance, collision warning, collision assessment, collision prediction, or collision risk assessment have been extensively studied for reducing collisions in vehicular ad hoc networks (VANETs). Figure 1 depicts the VANET.

**Figure 1.** Ad-hoc Network using Vehicles (VANET) [16].

The authors in [11] proposed a safety collision avoidance algorithm. This algorithm takes the characteristics of the environment and the driver and assigns weights to these factors. These characteristics (health, mental index, age, visual acuity, driver age, etc.) are the inputs to the algorithm, and an output is generated in the form of a warning. MATLAB and VISSUM were used for the implementation. The authors stated that more experimental data are required for the improvement and optimization of the algorithm. Another collision warning system was proposed by the authors in [17]. In this collision warning model, the authors discussed the impact of weather factors on human-related factors. Their intent was to consider the low-visibility factor, and they proposed a Visibility-based Collision Warning System. MATLAB was used for the implementation and the results. According to the authors in [18], a collision warning system is necessary for avoiding collisions, as depicted in Figure 1. Their proposed model consisted of three steps. They used the PreScan commercial software for simulation tests. The results of the proposed system were better than those of the time-to-collision-based system [19].

The authors in [20] proposed a new methodology for finding crash risk. They studied the driver factors and time factors involved in accidents from police reports recorded during 2002–2012 in Great Britain. They found that driver age and the time of the accident have a potentially large impact on accidents. Their study helps in finding new factors involved in accidents. The authors in [21] proposed an ANN-based self-learning control framework which can improve the strength of the vehicle during collision avoidance as the experience of the driver increases. Their study added a new factor called driver experience. They performed the experiments using the CarSim software. The authors in [22] assessed the impact of V2V communication for road safety applications. Instead of DSRC devices, the authors used laptops as a test bed with the necessary equipment. Their main interest was in broadcasting V2V messages for collision avoidance without DSRC. Tests were performed using a Linux-based laptop and a Scapy add-on.

Frontal obstacle detection is a challenging task in collision avoidance. The authors in [23] proposed Mamdani and Sugeno fuzzy logic methods to overcome this challenge. They said that Mamdani and Sugeno can obtain the same efficiency. Experiments were performed using MATLAB. The authors in [24] have proposed an obstacle avoiding system based on a fuzzy logic controller. It allows the vehicle to move independently while avoiding collision with an obstacle. The controller was implemented in real time with an underwater vehicle. Based on tests, the authors proved that fuzzy logic can be useful for collision avoidance. The authors in [9] proposed a methodology for avoiding rear-end collisions. In this study, the focus was on visibility and road alignment factors.

The researchers in [25] describe different factors involved in accident occurrence. According to their study, the most influential factors are environmental and human factors. In addition, high-speed driving, cell phone use, and substance use also increase the risk of accidents. The authors suggested some strategies to reduce the chance of accidents. Faisal et al. [26] proposed a novel approach for collision avoidance between autonomous vehicles following social norms and emotions. The authors used fuzzy logic to compute the results of the factors involved. A simulation was created for the proposed model using NetLogo. Xiang et al. [27] proposed a forward collision avoidance algorithm where fuzzy logic rules were used for initiating critical brake control.

The researchers in [28] have proposed a rear-end collision avoidance scheme between vehicles. They have considered factors such as the road, vehicle type, driver, and external environment. For implementation purposes, Fuzzy Logic, VISSIM, and MATLAB were used. The authors in [29] deal with two key aspects of road transport: efficiency and safety. The proposed system detects the obstacles and generates the warnings and then sends them to the driver. In case the driver fails to perform an action, the control shifts to cruise control system. The authors in [30] proposed software-based collision avoidance systems using Dedicated Short-Range Communication (DSRC). They performed the timing analysis of events based on the DSRC detection range, communication latency, and road condition. Zhao et al. [31] also proposed a collision warning system based on DSRC.

The authors in [32] proposed a collision warning algorithm in which they analyze different factors (human, road condition, time, and position) that can affect collision performance. The authors in [33] proposed a collision avoidance system where traffic lights communicate with nearby smartphones. These smartphones then share warnings with other smartphones. Although they performed collision avoidance, no factors were used in their proposed system. The authors in [34] described a rear-end collision system using a Bayesian Network model. The model depends on ego factors of drivers and the braking intention of the front vehicle. The authors left implementation and testing to future work. The authors in [35] proposed a framework for space-based collision avoidance. V2V communication and a machine learning approach were used to accurately detect the collision and avoid its occurrence.

According to the authors in [36], features of human drivers have been used to control rear-end collisions using fuzzy logic, and according to the authors in [37], fuzzy logic can resolve the mathematical issues of rear-end collision avoidance. In our previous work [8], we proposed a V2V rear-end collision avoidance algorithm based on fuzzy rules that take into account environmental, physical, and mental factors. These factors contribute to road collisions. A Multi-Factor-Based Road Accident Prevention System (MFBRAPS) was proposed to avoid collisions. MATLAB and NetLogo were used for the implementation. In this paper, we draw on existing research to add new important factors that can make collision avoidance algorithms more effective. We add these new factors to MFBRAPS for a better V2V collision warning system.

#### **3. Proposed Methodology**

There are six levels (level 0 to level 5) of autonomy defined by the Society of Automotive Engineers (SAE) [38]. In level 0, all tasks are performed by the driver. Level 1 assists drivers with an advanced driver assistance system (ADAS) [39]. In level 2, the driver is still present and responsible for driving and monitoring the environment with the assistance of more than one ADAS. Level 2 is also known as partial driving automation [40]. Levels 3, 4, and 5 are under the responsibility of the system software, in which an autonomous system monitors the environment continuously [41]. The driver is still required in level 3 and level 4. Level 5 is called fully autonomous. Due to legislative factors and technological limitations [42–44], the human driver is still mandatory in AVs. The proposed architecture is applicable to semi-autonomous vehicles [41], in which control can shift between the driver and the vehicle. Figure 2 describes the proposed methodology and shows how control will be shifted between the human driver and the vehicle after calculating the chances of accidents. Figure 3 describes the factors involved at a fuzzy level.

The system will apply fuzzy membership functions to fuzzify the input values for every possible combination. Time, environmental, physical, weekday, and driver factors are the inputs to the Mamdani Fuzzy Inference System of MATLAB. The combination of these factors' values determines the chance of a collision occurring. If the chance of collision is high, then the vehicle will take control from the driver and apply the brakes automatically with the help of an agent-based system SIM Connector. When the situation returns to normal, control will be handed back to the driver. If the chance of an accident is low, then the vehicle's control will remain with the human driver.

**Figure 2.** Architectural Diagram of Proposed Methodology.

**Figure 3.** Five Level Factors of Collision in V2V.

#### *3.1. Five Factors Description*

We have classified accident causes into five categories, i.e., environmental, physical, driver characteristics, time, and weekday factors. Each category has further sub-member functions. These are described in past studies but not combined in one algorithm. For the selected factors, the fuzzy values range from low to high according to the authors in [8,37] and as shown in Figure 3. There are five intensity levels, from very low, meaning the value is zero, to very high, meaning the value is one. This gradual variation in values from low to high is what makes a fuzzy system suitable. Table 1 defines the numeric range division for the factors described in Figure 3.
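As an illustration of how a factor value in [0, 1] can be mapped onto five intensity levels, the following Python sketch uses evenly spaced triangular membership functions. The peak positions, the triangle width, and the names `triangle`, `fuzzify`, and `dominant_level` are assumptions made for this example; the ranges actually used in the paper are those of Table 1, and the paper's own implementation uses MATLAB rather than Python.

```python
# Hypothetical, evenly spaced five-level partition of [0, 1].
# The actual numeric ranges are defined in Table 1 of the paper.
LEVELS = ["very low", "low", "medium", "high", "very high"]
PEAKS = [0.0, 0.25, 0.5, 0.75, 1.0]
WIDTH = 0.25  # assumed half-width of each triangle


def triangle(x, peak, width=WIDTH):
    """Triangular membership: 1 at the peak, falling linearly to 0 at peak +/- width."""
    return max(0.0, 1.0 - abs(x - peak) / width)


def fuzzify(x):
    """Return the degree of membership of x in each intensity level."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("factor values are expected to lie in [0, 1]")
    return {level: triangle(x, peak) for level, peak in zip(LEVELS, PEAKS)}


def dominant_level(x):
    """Return the intensity level with the highest membership degree."""
    degrees = fuzzify(x)
    return max(degrees, key=degrees.get)
```

With this partition, a value such as 0.6 belongs partly to "medium" and partly to "high", which is the kind of graded membership that a crisp threshold cannot express.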

**Table 1.** Quantitative values of intensity levels for All Factors.


#### 3.1.1. Environmental Factors

Weather conditions [45], road conditions [46], light intensity [47], and traffic volume [48] are contributing factors in accidents. Each of these factors will be checked individually. Weather can be rainy, snowy, foggy, or a dust storm, and the chance of an accident varies accordingly. Good road conditions along with rain will produce different results compared with bad road conditions with rain. Light intensity can be very good, average, or bad. Traffic volume can be high, average, or low, with few vehicles on the road. All these factors and sub-factors are the inputs to Environmental Factors. The result will show how much environmental factors contribute to accidents.

#### 3.1.2. Physical Factors

This factor includes the speed of the vehicle [49], distracting activities [50] of the driver, and the current vehicle condition [51]. A higher vehicle speed carries a higher risk. Distracting activities include the use of mobile phones while driving. The vehicle can be in very good condition, average condition, or very bad condition. All these factors will contribute to accident risk.

#### 3.1.3. Driver Factor

Impaired focus on driving due to alcohol or drugs [12], behavioral situation [52], fear [53], and behavior in an emergency [54] are contributing factors that may cause accidents. These are all inputs to the mental factors which affect driving. According to the authors in [11,12], driving experience matters a lot in accident occurrence. The age and gender of the driver [12] also contribute to the chance of an accident. These are the inputs for measuring accident chances. The result will be the output of the driver factor.

#### 3.1.4. Time Factor

The authors in [12] described the effects of day and nighttime on the chance of accident. From 18:00 to 20:00 h, there is a high accident rate. According to their study, accident rate during the daytime is high.

#### 3.1.5. Weekdays Factor

According to authors in [12], weekdays, especially Friday and Saturday, have high chances of accidents because drivers behave differently on different days.

Our proposed system will check which category contributes the most to accident occurrence, and by combining the result of all the categories, the system will calculate the chance of an accident. Every category and its sub-functions will compute the fuzzy values from the vehicle's sensors and other pre-defined fuzzy values. These values will be processed in MATLAB, and if the chance of an accident is high, then the vehicle will apply an immediate brake and send messages to the relevant persons and organizations. The proposed simulation will show how brakes will be applied when chances of accidents are high. If chances of accidents are not high, then control remains with the human driver.

#### **4. Proposed Algorithm**

In semi-autonomous vehicles [55–57], driving control can be shifted between the vehicle and the human driver. The proposed algorithm describes how control of the vehicle will be shifted between the human driver and the vehicle's automatic driving system. Sensor values are the basis for this implementation. Variables store these values, and functions operate on these variables to generate the chance-of-accident value. Depending on this calculated value, the required function call takes place, as shown in Algorithm 1.

**Algorithm 1:** Control Structure for transferring vehicle control to and from vehicle


The variables and functions of the proposed algorithm are described in Table 2, which gives a brief description of each. A table-based description makes the purpose of the variables and functions easier to understand. The proposed algorithm presented in this section shows the collision avoidance mechanism in a simulation environment. The function which obtains sensor values in the proposed algorithm reads the values of the different factors. After the integrated result of all factors is calculated, it is passed into a result variable. The selected vehicle in the proposed system takes the necessary action on the basis of the result variable's value. In the upper right corner of Figure 10, we set the variables of the different factors, whose values are passed into the function that obtains sensor values in the proposed algorithm. Figure 11 shows the selected vehicle in blue.
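The control-transfer step described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `ControlDecision` and `decide_control` are hypothetical, the chance-of-accident (COA) value is assumed to be already computed and to lie in [0, 1], and the takeover threshold of 0.75 is the value stated in Section 5.

```python
from dataclasses import dataclass

# Takeover threshold stated in Section 5 of the paper.
TAKEOVER_THRESHOLD = 0.75


@dataclass(frozen=True)
class ControlDecision:
    """Hypothetical container for the outcome of one control step."""
    controller: str    # "driver" or "vehicle"
    alert: str         # "normal" or "danger beep"
    apply_brake: bool  # whether the vehicle brakes automatically


def decide_control(coa, threshold=TAKEOVER_THRESHOLD):
    """Decide who controls the vehicle for a given chance of accident (COA)."""
    if not 0.0 <= coa <= 1.0:
        raise ValueError("COA is expected to lie in [0, 1]")
    if coa > threshold:
        # High chance of accident: alert, take control, and brake.
        return ControlDecision("vehicle", "danger beep", True)
    # Otherwise control stays with (or is handed back to) the driver.
    return ControlDecision("driver", "normal", False)


# The two sample COA values reported in Section 5.1.
for coa in (0.05, 0.854):
    print(coa, decide_control(coa))
```

Because the decision depends only on the current COA, control is handed back to the driver as soon as the value falls to the threshold or below, which matches the behavior described in Section 3.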



#### **5. Experiments**

We divided this section into two parts: first, finding the chances of an accident based on multi-factors using the Fuzzy Logic Toolbox and, second, Net-Logo-based simulation experiments to show the effect of a multi-factor-enabled vehicle on accident results.

The Fuzzy Logic Toolbox provides MATLAB functions and a Simulink block for analyzing, designing, and simulating systems based on fuzzy logic, as described in Figure 4. The authors in [58–62] have also used the Fuzzy Logic in the modeling of their proposed work.

We have used the Fuzzy Inference System (FIS) as presented in [63] to apply membership functions on pre-defined input values ranging from 0 to 1. A chance of accident value approaching 0 is considered a low chance of an accident and 1 as a high chance.

Our proposed system is based on five categories of factors which are the causes of accidents and fatalities. We performed the experiments in MATLAB using FIS, and Figures 5–8 are the samples of experiments with input–output relationships.

In Figure 5, multi-factors are given as inputs, and the chances of accidents are calculated. Every factor is evaluated using membership functions of the Mamdani fuzzy inference system. Whenever the integrated factor value is greater than 0.75, the system takes control from the driver to avoid a collision between vehicles. The switching of control based on the values of the different factors is described in the second part of the experiments, the Net-Logo simulations.

**Figure 4.** Fuzzy Inference System to apply membership functions.

**Figure 5.** Input & Output Relationship of Integrated Factors.


**Figure 6.** Rules Used in Fuzzy Logic.

**Figure 7.** Testing the COA on Rule Viewer with Input as Day Time (DT), Night Time (NT), Driver's Experience (DE), Driver's Age (DA), Weekday (WD), Environmental Factors (EF), Physical Factors (PF).


**Figure 8.** Testing the COA on Rule Viewer.

#### *5.1. Fuzzy-Logic Based Experiments*

The figures in this section show how the inputs and rules are fed into the model and how the output takes a value between 0 and 1. Each input factor consists of five membership functions from very low to very high, as described in Table 1. The rules for computing the COA are defined in the FIS Rule Editor, as shown in Figure 6.
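To make the inference steps concrete, the sketch below implements a small Mamdani-style system in plain Python: fuzzification, rule firing with AND as minimum, clipping and max-aggregation of the output sets, and centroid defuzzification. It is only an illustration under assumptions. It uses two inputs instead of the paper's full set, evenly spaced triangular membership functions, and a hypothetical rule base in which the output level is the worse of the two input levels; the paper's actual rules are those shown in Figure 6 and were built in MATLAB's Fuzzy Logic Toolbox.

```python
LEVELS = ["very low", "low", "medium", "high", "very high"]
PEAKS = dict(zip(LEVELS, [0.0, 0.25, 0.5, 0.75, 1.0]))  # assumed peaks


def tri(x, peak, width=0.25):
    """Triangular membership function centred on `peak` (assumed shape)."""
    return max(0.0, 1.0 - abs(x - peak) / width)


def mamdani_coa(env, phys, steps=101):
    """Chance of accident from two factor values in [0, 1] (illustrative only)."""
    # Steps 1-2: fuzzify both inputs and fire every rule with AND = min.
    # Hypothetical rule base: output level = worse of the two input levels.
    strength = {level: 0.0 for level in LEVELS}
    for i, env_level in enumerate(LEVELS):
        for j, phys_level in enumerate(LEVELS):
            fire = min(tri(env, PEAKS[env_level]), tri(phys, PEAKS[phys_level]))
            out_level = LEVELS[max(i, j)]
            strength[out_level] = max(strength[out_level], fire)

    # Steps 3-4: clip each output set at its rule strength, aggregate with
    # max, and take the centroid of the aggregated set over a sampled grid.
    num = den = 0.0
    for k in range(steps):
        y = k / (steps - 1)
        mu = max(min(strength[level], tri(y, PEAKS[level])) for level in LEVELS)
        num += y * mu
        den += mu
    return num / den


print(round(mamdani_coa(0.2, 0.2), 3), round(mamdani_coa(0.8, 0.9), 3))
```

One property of centroid defuzzification visible here is that the output never reaches exactly 0 or 1, even when all inputs are at their extremes, which is consistent with the intermediate COA values reported for the rule viewer experiments.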

The rule viewer in Figure 7 shows how the inputs contribute to finding the COA. The nighttime input factor is selected as none in Figure 6 and has no effect on the COA. The authors in [12] described the importance of the time factor in collisions. Likewise, when we select the daytime input as none, this input will not affect the COA. We performed the experiments by scaling the input variables, and the results were verified against the rules defined in the FIS Rule Editor. Figures 7 and 8 show the results of the experiments performed in FIS.

One experiment in Figure 9 shows a very low chance of accident (COA = 0.05), which means control remains in the driver's hands and there is no need for control switching. Another experiment in Figure 9, however, shows a very high chance of accident (COA = 0.854), and control needs to switch from the driver to the autonomous mode. This variation in the COA from very low to very high validates the proposed fuzzy logic for multi-factor inputs into the proposed system. Further experiments and their results are provided in Table 3 in the Results section.

**Figure 9.** Sample Experiments Results from VLow to VHigh for COA.


**Table 3.** Different Experiments Results Computed Using Rule Viewer.


#### *5.2. Simulation-Based Experiments*

The approach used to model the complex systems in engineering and technologies, etc., is known as agent-based modeling [64–66], and for this purpose, we used a Net-Logo simulation model. Net-Logo provides an observer which can monitor and validate the simulation scenario. The results achieved using the simulation validate the proposed algorithm. Figure 10 shows the simulation interface for the experiments' setup. The selected blue car is enabled with multi-factors to avoid the collision, and the experiments' results validate that the selected car collisions are much less as compared with other cars in the simulation environment.

This can be seen in the plot generated by the Net-Logo simulator, which shows the number of collisions. An alert message box indicates whether the vehicle is in a normal condition or in danger (danger beep), based on the readings of the factors involved, so that the necessary actions can be taken. The control box in the simulation shows whether control of the vehicle is in the driver's hands or in the autonomous mode. As the simulation runs, the collision plot, the alert box status, and the vehicle control change according to the multi-factor values, as seen in Figures 10 and 11. In Figure 10, control is in the driver's hands, but once the blue car finds a very high chance of accident, control is shifted to the autonomous mode, as shown in Figure 11. In the simulation, we performed eight experiments with different values of multi-factors ranging from very low to very high. The experiments and their results are presented in Table 4. To illustrate the effect of multi-factor-enabled vehicles, we discuss experiments 1 and 6 here. In experiment 1, when the daytime factor is very high and the weekday factor is also very high, the collision count of the vehicle without factors is 65 and that of the vehicle with factors is 4. In experiment 6, when the environmental factor is high, the physical factor is very high, and daytime is also very high, the collision count of the vehicle without factors is 314, which is very high, and that of the vehicle with factors enabled is 23. The results of the two vehicles differ significantly because of the proposed system implemented in the simulation environment. The results of the experiments are given in Figure 12 in the Results section. Figures 10 and 11 show the experimental setup with the results plotted as the number of collisions.
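The size of the difference in experiments 1 and 6 can be expressed as a relative reduction in collision count. The short Python sketch below computes it from the counts quoted above; the function name `collision_reduction` is introduced here for illustration only and does not appear in the paper.

```python
def collision_reduction(without_factors, with_factors):
    """Percentage reduction in collisions relative to the vehicle without factors."""
    if without_factors <= 0:
        raise ValueError("baseline collision count must be positive")
    return 100.0 * (without_factors - with_factors) / without_factors


# Collision counts reported for experiments 1 and 6: (without, with) factors.
reported = {1: (65, 4), 6: (314, 23)}

for experiment, (baseline, assisted) in reported.items():
    reduction = collision_reduction(baseline, assisted)
    print(f"experiment {experiment}: {reduction:.1f}% fewer collisions")
# prints:
# experiment 1: 93.8% fewer collisions
# experiment 6: 92.7% fewer collisions
```

In both experiments the multi-factor-enabled vehicle records more than 90% fewer collisions than the vehicle without factors in the simulation.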

**Figure 10.** Main Interface of Simulation with Control in Driver Hand.

**Figure 11.** Main Simulation Interface and Control Shifted to AV.

**Figure 12.** Simulation Results with and without Multi Factors.

**Table 4.** Experiments Results Using Net-Logo Simulation with Inputs (EF = Environmental Factor, PF = Physical Factor, DF = Daytime Factor, TF = Time Factor, WF = Weekday Factor).



#### **6. Results**

We divided the Results section into two parts. In the first part, we explain the calculation of the chances of accident using Table 3 and a combo chart. In the second part, we explain the results of the simulation-based experiments in Net-Logo.

The fuzzy experimental results in Figure 7 show a very low chance of accident compared with the experiment in Figure 8, which shows a high chance. Table 3 lists the contribution of the input factors to the chance of accident, and Figure 9 shows the combo chart, which illustrates how the COA varies with input values ranging from 0 to 1. Experiments 5 and 12 in Table 3 yield the very low and very high COA results, respectively, generated from the fuzzy logic. The results ranging between very low and very high are validated by the proposed fuzzy logic. For a graphical representation of the results in Table 3, we used the combo chart in this section.

Figure 12 shows the number of collisions with and without factors. The red line in the plot shows vehicles without multi-factors, and the blue line shows the vehicle with multi-factors enabled. The difference in the number of collisions can be seen clearly in the plots. We also list the experiments in Table 4 with their collision results. The experiments show very good collision avoidance results when the proposed algorithm is used in the simulation environment. The speed plot in the simulation setup shows the speed of both vehicles, and it clearly shows that the speed of the blue car with multi-factors enabled is kept under control by the speed control function used in our proposed algorithm and described in Table 2. The simulation results in Table 4 show a significant difference in collisions between vehicles with and without multi-factors enabled.

#### **7. Discussion**

The primary focus of this research work is to highlight the importance of the different factor combinations involved in vehicle collisions and to avoid such collisions with the help of driver assistance software. Existing research describes many factors, and it is not possible to cover all of them here. We used some of them and achieved a satisfactory result. We used the Net-Logo tool for simulation and designed a multi-factor-based simulation environment. In the simulation section, when different factor values change from high to very high, the system shows how control is shifted from the human driver to the vehicle's autonomous mode. In addition, the alert box shows danger beeps for the driver's assistance. Our proposed system computes the quantitative values and calculates the chances of accident. The driver assistance system then takes the necessary action and avoids the collision. After calculating the chances of accident, the driver assistance system with the proposed algorithm activates the different functions to control the speed and apply the brake. These functions are described in Table 2 with their names and meanings. When required, the driver assistance system generates an alert. These alerts are warnings for the driver to take the necessary action and control the vehicle. If the driver ignores the alert, then the control shift function takes place. When the system generates an alert, which is the danger beep, it means that the system is ready to apply the brake and reduce the speed to avoid the collision if the driver does not take action to handle the emergency. Control shifts from the human to the driver assistance system after the alert message changes from "normal" to "danger beep". In Figures 10 and 11, we can see the alert message status change from normal to danger beep. This switching helps the proposed system to apply the brake and reduce the speed in time because it prepares the system to handle the situation. The change of the alert message status from normal to danger beep means that the proposed logic is working, monitoring the environment, and calculating the chances of collision and is ready to act if the driver does not take the necessary action.

Experiments were performed with different values of the factors involved in vehicle collisions. The values of the factors can be changed from very low to very high in the simulation model. According to the proposed model, control is shifted from the human to the autonomous mode of the vehicle when the chance of an accident is high, in order to avoid a collision. The algorithm used in the implementation of the proposed model shows the shifting of control between the human and the autonomous mode. When the system computes a very high result, an alert is also generated to inform the driver about the current situation. The results support the importance and correctness of the proposed model for collision avoidance. The graphs generated from the simulation-based results show that a vehicle with multi-factors enabled achieves a significant improvement in collision avoidance and indicate the value of the proposed model for a driver assistance system.

#### *Limitations*

There are some limitations in this study which can be addressed in future research. One limitation of the proposed system is that we performed the experiments in a simulation-based environment and not in a real-time environment. The second limitation is that if the driver does not respond to alerts and the software is also not in working condition, the proposed system will not be effective at avoiding the collision. The third limitation is related to hardware failure. If any of the hardware in the vehicle fails, it can affect the performance of the proposed system. In addition, when two vehicles are driving very close together and the leading driver brakes suddenly, the collision is very difficult to avoid.

#### **8. Conclusions**

Our contribution in this work is a step towards improving human safety using modern techniques. If the risks of these factors are calculated in time, lives may be saved. Existing studies have shown that different factors, individually and combined, can play a role in collisions between vehicles. However, to our knowledge, none has addressed them using simulation-based results to avoid collisions. We first calculated the chances of accident using fuzzy logic and demonstrated the importance of multi-factors. The proposed system first calculates the chances of accident and then avoids the collision by shifting control from the human driver to the vehicle's automated system while generating an alert for the human driver. The simulation designed in the Net-Logo tool demonstrated that the vehicle enabled with multi-factors avoids collisions better than the other vehicles without multi-factors. Automakers can use this research in the near future to improve collision avoidance because, in existing studies, many authors are working on finding the factors which cause accidents and collisions.

#### *8.1. Future Work*

The current work is performed on a semi-autonomous vehicle. In the future, fully autonomous vehicles can be accommodated with the fuzzification of accident-causing factors. Message passing in times of emergency can also be processed according to privacy protection rules. Emotional factors may also improve the results to avoid collision. In addition, this model can be enhanced to work on the T-junctions. Another important research direction is that time to collision avoidance can be incorporated in the future to enhance the proposed model.

#### *8.2. Recommendations*

Collision avoidance systems are key for connected autonomous vehicles and are very helpful in reducing road traffic injuries and fatalities. In this regard, the proposed system provides a foundation for handling the different factors involved in vehicle collisions. This system can be extended to many other causes of collision. These include planning and decision factors, e.g., illegal maneuvers, following too closely, stopping suddenly, or accelerating very rapidly from a stop. Factors which are unavoidable by the driver include brake failure, suspension failure, steering failure, wheel failure, and transmission failure. Incapacitation issues include heart attack or physical impairment of the ability to act. The factors highlighted in these recommendations can be addressed with the proposed system, or new techniques can be applied in the future.

**Author Contributions:** Conceptualization, M.A.S., S.R. and A.R.D.; methodology, S.R.; software, H.T.R.; validation, S.R., A.R.D. and H.A.K.; formal analysis, H.A.K. and E.A.; investigation, A.M.E.-S.; resources, S.M.L.; data curation, H.A.K.; writing—original draft preparation, S.R., H.A.K. and A.R.D.; writing—review and editing, A.R.D. and H.A.K.; visualization, K.A.; supervision, M.A.S. and H.A.K.; project administration, H.A.K.; funding acquisition, K.A., A.M.E.-S. and S.M.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** Researchers Supporting Project number (RSP-2021/133), King Saud University, Riyadh, Saudi Arabia.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** The authors extend their appreciation to King Saud University for funding this work through Researchers supporting project number (RSP-2021/133), King Saud University, Riyadh, Saudi Arabia.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **ROAD Statistics-Based Noise Detection for DME Mitigation in LDACS**

**Miziya Keshkar 1, Raja Muthalagu 2,\* and Abdul Rajak 1**


**Abstract:** Interference mitigation in L-band Digital Aeronautic Communication Systems (LDACS) from legacy users is extremely important as any error in data retrieval of aeronautic communication can adversely affect flight safety. This paper proposes an LDACS receiver prototype which uses rank-ordered absolute differences (ROAD) statistics to detect the distance measuring equipment (DME) interference. The detected DME interference is reduced in the next stage by pulse blanking. The performance of the proposed ROAD pulse blanking method (ROAD PB) is compared with the existing interference mitigation methods which use the amplitude of the received signal for the detection of DME interference. In-depth analysis of the obtained results affirms that the proposed ROAD value-based interference detection outperforms amplitude-based detection. For an SNR value of 0 dB, the proposed method of detection could achieve a 3% increase in terms of accuracy with a reduction of 4% in false alarms. With the advantage of ROAD statistics detection, the proposed ROAD PB could achieve an SNR saving of 2.7, 1.1, 0.7, 0.25 and 0.2 dB at a BER of 10<sup>−1</sup> in comparison with pulse blanking, Genie-aided estimation enhanced pulse peak attenuator (GAEPPA), GAE enhanced pulse peak limiter (GAEPPL), optimum Bayesian estimator enhanced pulse peak attenuator (OBEPPA) and OBE enhanced pulse peak limiter (OBEPPL), respectively. The comparative results show that the proposed ROAD pulse blanking outperformed the other techniques for the optimum threshold value of the operation.

**Keywords:** OFDM; LDACS; aeronautical communication; impulse noise; pulse blanking; ROAD statistics

#### **1. Introduction**

Air traffic growth is happening at a very rapid rate. As per Eurocontrol's latest study report about European aviation in 2040, the air traffic growth will be limited by the available capacity at the airports. This can lead to a rapid increase in congestion at the airport, which in turn can cause extra pressure on the network and more delays [1]. To accommodate this huge increase in air traffic, an efficient air traffic management system (ATM) supported by a secure and spectrum-efficient Communications, Navigation and Surveillance (CNS) framework is needed.

The existing air traffic management system is supported by voice and data communications systems. The main voice communication media for air to ground communication is still analog. The existing analog VHF double sideband amplitude modulation (DSB-AM) will remain in service for many more years as it ensures safe and reliable communication with the use of low-cost communication equipment. However, this technology becomes a hindrance in deploying new ATM applications, such as flight centric operation with point-to-point communications [2].

Similar to voice communication, data communication to the cockpit is also ensured by ground-based equipment operating within HF or VHF radio bands. The communication is through narrowband radio channels, which limits the data throughput to some kilobits per second. These data links are insufficient to provide broadband services now or in the future with the existing VHF and HF spectrum [2]. Hence, the International Civil Aviation Organization (ICAO) has recommended Future Communications Infrastructure (FCI) to modernize the existing communication links with new spectrum-efficient and reliable infrastructure, which can support new ATM applications and broadband services. This led to the development of the L-band (960–1164 MHz) Digital Aeronautical Communication Systems (LDACS) [3].

**Citation:** Keshkar, M.; Muthalagu, R.; Rajak, A. ROAD Statistics-Based Noise Detection for DME Mitigation in LDACS. *Appl. Sci.* **2022**, *12*, 3774. https://doi.org/10.3390/app12083774

Academic Editors: Giovanni Randazzo, Dimitrios S. Paraforos, Stefania Lanza and Anselme Muzirafuti

Received: 17 January 2022 Accepted: 1 April 2022 Published: 8 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In 2009, the specification of LDACS was proposed. ICAO suggested two possible standards: LDACS1 derived from the IEEE 802.16 wireless system [3] and LDACS2 derived from the global system for mobile communication (GSM) [4]. LDACS1 uses advanced network protocols of current commercial standards. It is a broadband multicarrier system based on orthogonal frequency division multiplexing (OFDM). LDACS2 uses protocols that offer high QoS communications. It is a narrow band single carrier system based on Gaussian minimum shift keying modulation (GMSK). It is expected to accommodate the huge increase in air traffic with the deployment of any of these subsystems of LDACS.

As shown in Figure 1, L-band is already providing services to legacy users such as distance measuring equipment (DME), military tactical air navigation (TACAN) system and joint tactical information distribution system (JTIDS), which are used for navigation aids. Apart from these, universal access transceiver (UAT) at 978 MHz and secondary surveillance radar (SSR) and airborne collision avoidance system at 1030 and 1090 MHz are also allotted with fixed channels [5–8]. However, studies about spectrum occupancy revealed that large portions of the L-band spectrum are used less frequently or underutilized [3,4]. Hence, the LDACS system is deployed in the L-band either as an inlay system between the legacy users or as an overlay system in the unoccupied spectrum [9]. The overlay method is selected for LDACS2 (960–975 MHz). Though the method is less complex, spectrum scarcity is a noticeable challenge [10–12]. The inlay approach, chosen in LDACS1, is expected to overcome the challenge of spectrum scarcity by utilizing the 1 MHz spectral gap between legacy user DME, thereby increasing the spectrum utilization.

Comparative studies between LDACS1 and LDACS2 affirmed that LDACS1 is the preferred choice over LDACS2. LDACS1 is highly capable of supporting high-speed delay-sensitive multimedia services and is also compatible with cellular communication standards. LDACS1 is often referred to simply as LDACS [13,14], and this paper uses LDACS to refer to LDACS1 hereafter.

LDACS involves two-way communication: a forward link (FL) from the ground station (GS) to the air station (AS) and a reverse link (RL) from the AS to the GS. It provides frequency division duplexing (FDD) with 63 MHz spacing between FL (962–1213 MHz) and RL (1025–1150 MHz) with opportunistic access to the paired spectrum. The deployment of LDACS in the L-band gives rise to interference to licensed users and vice versa. The possible interference scenario for LDACS is portrayed in Figure 2. In words, it can be summarized as follows: (a) the LDACS FL is impaired by the DME GS (FL) but not by the DME airborne station (AS), as its RL is not active in this part of the spectrum; (b) the LDACS RL is impaired by both the DME GS (FL) and the DME AS (RL); (c) the DME FL is impaired by interference from both the LDACS GS (FL) and the LDACS AS (RL); and (d) the DME RL is impaired by interference from the LDACS AS (RL) but not from the LDACS GS (FL), as it is not active in this part of the spectrum.

**Figure 1.** L-band Spectrum Occupancy [9].

**Figure 2.** Interference Between DME and LDACS [15].

Given the possible interference scenarios, interference mitigation is critically important, as any malfunctioning of the licensed system can affect flight safety. The rest of the paper is organized as follows: Section 2 surveys the literature on existing DME interference mitigation techniques in LDACS with their advantages and drawbacks. Section 3 describes the system and noise model used for this work. Section 4 elaborates the theory and functioning of the proposed method in detecting and reducing DME interference. The section also details the nonlinear methods used for comparison with the proposed method in DME mitigation. Section 5 presents the results obtained for the proposed method in terms of reducing the bit error rate in received data, in comparison with other DME interference reduction techniques. The following is the list of abbreviations used in this paper.

#### **2. Related Work**

Several interference mitigation techniques for LDACS are available in the literature; most of the proposed schemes focus on mitigating DME interference. In 2011, two methods capable of detecting and mitigating DME interference were put forward. Both exploit the regular structure and spectral shape of the DME pulse. The methods are simple, but a portion of the data is lost as a result of pulse blanking [16]. In 2012, Hailiang Wang et al. put forward a combination of pulse blanking and a notch filter to reduce the interference. Though the method offers flexibility in the time and frequency domains of operation, it is safer and more effective for the B2 band (1900 MHz) signal [17]. Further, Yun Bai et al. proposed a DME mitigation scheme that takes advantage of complementary code keying (CCK). Though CCK encoding has the merit of better gain, it requires low phase distortion and a wideband channel. The latency of the system is high due to the large acquisition time. Moreover, the modulation employed is not power-efficient [18].

In 2014, Q. Li et al. proposed an iterative receiver design [19]. The design employs iterative decoding between the demodulator and decoder based on the Turbo principle. Another selective pulse blanking method to curtail DME interference is put forward in [20]; it uses a specially designed fast filter bank for this purpose. In 2016, Li Douzhe et al. proposed a method based on detecting deformed pulse pairs and subtracting them from the actual signal [21]. Later, Khodr A. Saaifan and Werner Henkel proposed lattice signal sets to resist DME interference. Precoding based on a lattice signal set at the transmitter changes the shape of the DME signal spectrum. A simple clipping technique is then applied for DME mitigation [22].

The major flaw of pulse blanking is recognized as intercarrier interference. In [9], decision-directed noise estimation is put forward to eliminate the intercarrier interference. Reduced transmission throughput and wastage of power are the drawbacks of this scheme. LDACS-OFDM based on the discrete wavelet transform (DWT) is discussed in [23]. The scheme makes use of the real nature of DME. The signal affected by DME is selectively transmitted through the quadrature channel. By using direct sequence spread spectrum to combine the in-phase and quadrature components, the method claims to eliminate DME interference. Computational complexity and resource requirements are the drawbacks of this scheme. In [24], an energy-based DME detector has been proposed. The detector works with an adaptive threshold value in order to obtain the best trade-off between DME signal detection and false alarms. In 2021, a deep clipping-based DME noise reduction technique was proposed in [25]. It is a linear clipping method that uses two threshold levels for the recognition and reduction of DME. A detailed study of the existing interference mitigation techniques in LDACS shows that this is an area of active research.

The Genie-aided estimator (GAE) uses the statistical description of the side information to generate the design parameters needed to achieve lower bounds on the bit error at the receiver [26]. However, details of the side information are required to achieve this lower-bound performance. Other examples of side information are the correlation of the impulsive noise and the frequency of its arrival times [27]. When a Gaussian source is influenced by uncorrelated impulse noise, it is possible to attain optimum system performance with the use of a Bayesian signal estimator. In 2013, P. Banelli proposed an optimal Bayesian estimator (OBE) mainly for real-valued Gaussian mixture noise [28]. The method was extended to complex signals in 2015 [29]. With the estimation knowledge obtained from GAE and OBE, it is possible to propose different types of pulse peak attenuators and pulse peak limiters for DME mitigation [30]. In this paper, we have used GAE and OBE enhanced pulse peak attenuators and limiters to compare the performance of the proposed ROAD pulse blanking.

ROAD statistics-based impulse detector was proposed in 2005 to detect the impulse pixels in an image. The idea can be extended to remove any mix of Gaussian and impulse noise [31]. In [32], the ROAD value of the received signal is used as one of the inputs to train the deep neural network for the detection of signal instances corrupted with impulse noise. The most affected or least acceptable data are present on those subcarriers whose powers are much different from neighboring subcarriers at each time epoch. Hence, it is possible to use ROAD statistics to identify the subcarriers affected with impulse noise. To the best of our knowledge, no work has been reported that employs ROAD statistics for the detection of DME interference to date.

In this paper, ROAD pulse blanking is proposed which uses ROAD statistics for the detection of DME interference and pulse blanking for noise mitigation. The performance of the proposed method is compared with absolute value-based DME interference detection methods such as pulse blanking, GAE enhanced pulse peak attenuators/limiters and OBE enhanced pulse peak attenuators/limiters.

All these methods include two basic operations:


The advantage of the proposed method compared to other methods is that it could identify affected subcarriers more accurately and hence could eliminate noise more effectively. The performance of pulse blanking is observed to be improved when ROAD statistics-based noise detection has been employed. The improvement in performance is such that it outperformed GAE enhanced pulse peak attenuators/limiters and could stand with OBE enhanced pulse peak attenuator/limiter.

#### **3. System and Noise Model**

#### *LDACS System Model*

The system model includes the LDACS transmitter, the channel which imparts additive white Gaussian noise (AWGN) and DME interference, and the LDACS receiver. Figure 3 shows the detailed block diagram of the LDACS ground station transmitter. The data source creates random data of 91 bytes and passes them to the Reed–Solomon (RS) coder, which adds an extra 10 bytes of redundant data for error correction and detection. A 6-bit zero padding is performed on the output of the RS coder before it is passed to the convolutional coder. The coded output of the convolutional coder is passed through a permutation interleaver to reduce burst errors. The output of the permutation interleaver is arranged into the standard LDACS data format (F) after symbol mapping, modulation with Quadrature Phase Shift Keying (QPSK) and frame composing. All the variables shown in Figures 3 and 4 are generated for the standard LDACS data frame format (F). The same variables with suffix '$t$' signify the same signal for an instant '$t$', i.e., the $t$th OFDM symbol. In other words, $F\_t = [F\_t[0], F\_t[1], \dots, F\_t[N-1]]^T$ stands for the $t$th symbol of the LDACS forward link frame (F) with $N$ orthogonal subcarriers. $F\_t$ carries the random data $F\_t[m]$, $m = 0, 1, \dots, N-1$, with zero mean and variance $\sigma\_F^2$. The OFDM symbol $S\_t = [S\_t(0), S\_t(1), \dots, S\_t(N-1)]^T$ is generated in the time domain by calculating the 64-point IFFT of the data $F\_t$. Further, $N\_{CP}$ cyclic prefix bits are added to the total $N$ subcarriers, resulting in the transmitted vector $X\_t' = [X\_t'(0), X\_t'(1), \dots, X\_t'(N + N\_{CP} - 1)]$.

**Figure 3.** LDACS transmitter block diagram.

The transmitted signal $X$ changes to $r$ when it passes through the channel in the presence of DME. For each instant $t$, the transmitted vector $X\_t'$ is affected by a noise component $i\_t = [i\_t(0), i\_t(1), \dots, i\_t(N + N\_{CP} - 1)]^T$, which is a mixture of the additive white Gaussian noise $A\_t = [A\_t(0), A\_t(1), \dots, A\_t(N + N\_{CP} - 1)]^T$ and the impulse noise (DME) $p\_t = [p\_t(0), p\_t(1), \dots, p\_t(N + N\_{CP} - 1)]^T$. Thus, the received signal for an instant '$t$' is $r\_t' = [r\_t'(0), r\_t'(1), \dots, r\_t'(N + N\_{CP} - 1)]^T$ and can be denoted as in (1).

$$r\_t' = X\_t' + i\_t \tag{1}$$

where

$$
i\_t = A\_t + p\_t \tag{2}
$$
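The signal model in (1) and (2) can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the cyclic prefix length `N_CP = 11`, the noise scale and the single impulse standing in for a DME pulse are hypothetical values chosen here, and the coding, interleaving and framing stages of Figure 3 are omitted.

```python
import numpy as np

def ofdm_symbol_with_cp(freq_data, n_cp):
    """Return the time-domain OFDM symbol with a cyclic prefix prepended (n_cp > 0)."""
    s_t = np.fft.ifft(freq_data)                # N-point IFFT (N = 64 in the text)
    return np.concatenate([s_t[-n_cp:], s_t])   # copy the last n_cp samples to the front

rng = np.random.default_rng(0)
N, N_CP = 64, 11   # N_CP is an illustrative value, not taken from the text

# QPSK symbols on all N subcarriers (unit average power)
bits = rng.integers(0, 2, size=(N, 2))
F_t = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

X_t = ofdm_symbol_with_cp(F_t, N_CP)            # transmitted vector of length N + N_CP

# Channel noise i_t = A_t + p_t: complex AWGN plus a sparse impulse term
A_t = 0.05 * (rng.standard_normal(N + N_CP) + 1j * rng.standard_normal(N + N_CP))
p_t = np.zeros(N + N_CP, dtype=complex)
p_t[20] = 3.0                                   # hypothetical impulse standing in for DME
r_t = X_t + A_t + p_t                           # received vector, as in (1)
```

Removing the first `N_CP` samples of `X_t` and applying a 64-point FFT recovers `F_t` exactly in the noise-free case, which is the property the receiver relies on.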

As discussed in Section 1, the prime contributor of interference to LDACS is DME. These signals are pairs of Gaussian-shaped pulses, separated by a duration of $\Delta t$. The transmission rate (30 or 50 pulse pairs per second, ppps), as well as the duration $\Delta t$ (12 or 36 μs) of DME signals, varies with the mode of operation of the distance measuring equipment. A pair of DME pulses in the baseband can be expressed as in (3) [33].

$$P\_d(t) = e^{\frac{-\zeta t^2}{2}} + e^{\frac{-\zeta(t-\Delta t)^2}{2}} \tag{3}$$

where $\zeta = 4.5 \times 10^{11}\ \mathrm{s}^{-2}$.

Each pulse has a width of 3.5 μs at half of the maximum amplitude. The frequency domain representation of the DME signal is given in (4). The spectrum is modulated with a cosine because the pulses always occur in pairs [34].

$$I\_{pd}(f) = \sqrt{\frac{8\pi}{\zeta}}\, e^{\frac{-2\pi^2 f^2}{\zeta}}\, e^{-j\pi f \Delta t} \cos(\pi f \Delta t) \tag{4}$$
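The pulse-pair model can be checked numerically with a short sketch. It assumes that the two Gaussian pulses add (the form that yields the cosine factor in the spectrum) and that the spectral envelope is the decaying Gaussian exp(−2π²f²/ζ); the function names, the 12 μs spacing and the 100 kHz test frequency are choices made here for illustration only.

```python
import numpy as np

ZETA = 4.5e11      # s^-2, pulse-shape constant from the text
DELTA_T = 12e-6    # s, pulse spacing for one DME mode (12 or 36 us in the text)

def dme_pulse_pair(t, delta_t=DELTA_T, zeta=ZETA):
    """Baseband DME pulse pair: two Gaussian pulses spaced delta_t apart."""
    return np.exp(-zeta * t**2 / 2) + np.exp(-zeta * (t - delta_t)**2 / 2)

def dme_spectrum(f, delta_t=DELTA_T, zeta=ZETA):
    """Closed-form Fourier transform of the pulse pair."""
    return (np.sqrt(8 * np.pi / zeta) * np.exp(-2 * np.pi**2 * f**2 / zeta)
            * np.exp(-1j * np.pi * f * delta_t) * np.cos(np.pi * f * delta_t))

# Width of a single pulse at half maximum: solve exp(-zeta * t^2 / 2) = 1/2
fwhm = 2 * np.sqrt(2 * np.log(2) / ZETA)

# Numerical check of the closed form by direct integration (Riemann sum)
t = np.linspace(-20e-6, 40e-6, 60001)
dt = t[1] - t[0]
f_test = 100e3     # Hz, arbitrary test frequency
numeric = np.sum(dme_pulse_pair(t) * np.exp(-2j * np.pi * f_test * t)) * dt
```

With these constants `fwhm` evaluates to about 3.5 μs, in line with the stated pulse width, and `numeric` agrees with `dme_spectrum(f_test)` to within the accuracy of the integration.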

The baseband DME pulse pairs are modulated to the relative carrier frequency of the channel, 0.5 MHz to the left and 0.5 MHz to the right of the LDACS system bandwidth. The DME interfering signal that affects the LDACS system is expressed in (5). $I\_{pd}(t)$ is the total interference signal for a time interval '$t$' caused by the $N\_{pd}$ DME stations that operate at the 0.5 MHz offset to the center frequency of the LDACS system [35].

$$I\_{pd}(t) = \sum\_{i=0}^{N\_{pd}-1} \sum\_{l=0}^{N\_i - 1} \sqrt{P\_{i,l}}\, P\_d(t - t\_{i,l})\, e^{j2\pi f\_{c,i}t + j\chi\_{i,l}} \tag{5}$$

where $N\_{pd}$ is the total number of interfering DME stations, $N\_i$ is the total number of pulse pairs in the particular time interval for the $i$th interfering DME station, $P\_{i,l}$ and $\chi\_{i,l}$ are the power and phase of the pulse pair, respectively, $f\_{c,i}$ is the relative carrier frequency of the $i$th interfering DME station and $t\_{i,l}$ is the starting time of the $l$th pulse pair of the $i$th DME station. The methods used to reduce impulse noise also work well for reducing DME noise.

**Figure 4.** Proposed LDACS receiver block diagram.

The detailed block diagram of the proposed LDACS FL AS receiver is sketched in Figure 4. The first block in the receiver removes the cyclic prefix bits associated with the received signal $r\_t' = [r\_t'(0), r\_t'(1), \dots, r\_t'(N + N\_{CP} - 1)]^T$, resulting in the signal $r\_t = [r\_t(0), r\_t(1), \dots, r\_t(N-1)]^T$. The nonlinear device ROAD PB detects the DME interference in the LDACS signal $r$ using ROAD statistics and performs pulse blanking to reduce the bit error rate in the received data. In general, the resulting vector can be defined as $x = f(r)$ or $x\_t = f(r\_t)$, where $f(\cdot)$ is the nonlinear function that senses DME interference. The nonlinear devices discussed in this paper process the signal $r\_t$ in different ways to sense the DME interference. Moreover, the nonlinear estimators used for the performance comparison of the proposed method utilize one more vector, $\pi\_t$, to estimate the received signal data. Hence, the definition of the function $f(\cdot)$ varies with different nonlinear devices.

The nonlinear device is operated on the signal *r* before the DFT processing to block the dispersion of sparse time domain impulses *pt*[*n*] over all the OFDM carriers in the frequency domain.

#### **4. Nonlinear Estimators**

As discussed in the system model, the nonlinear device is designed to detect and eliminate DME interference from the LDACS AS receiver. In this paper, the proposed nonlinear device ROAD PB uses ROAD statistics for the detection of DME interference and pulse blanking for the mitigation of interference. The performance of the method is compared with the conventional pulse blanking, which uses the amplitude of the received signal for the detection of DME interference. In addition, nonlinear estimators such as GAE enhanced pulse peak processors and OBE enhanced pulse peak processors are also used for the mitigation of DME interference to compare the performance of ROAD PB.

The functioning of the nonlinear devices ROAD PB, pulse blanking, GAEPPA, GAEPPL, OBEPPA and OBEPPL in the detection and elimination of DME noise are discussed below.

#### *4.1. Proposed ROAD PB*

When the LDACS signal is affected by DME interference, the amplitude of the received data exhibits a large variation from that of neighboring data. The variation in amplitude due to bit changes or additive white Gaussian noise is smaller than that due to DME interference. Therefore, the most affected data are those whose amplitude differs strongly from that of their neighbors. The proposed ROAD PB uses ROAD statistics to quantify the variation in amplitude of particular LDACS (OFDM) data from the neighboring data for each time epoch. Further, the most basic threshold-based detection approach is utilized to identify the data signal affected by DME interference. The goal of fixing a suitable threshold is to recognize the OFDM data that are significant outliers.

The calculation of ROAD value [32] for a one-dimensional LDACS OFDM symbol involves the following steps:

1. The received OFDM symbol for each time epoch is considered as a one-dimensional vector. For a one-dimensional window of size 1 × (2$f$ + 1), the absolute differences $f\_d(k)$ between the center sample and each sample in the window are calculated as in (6):

$$f\_d(k) = |r\_k - [r\_{k-f}, \dots, r\_{k+f}]| \tag{6}$$

2. The difference vector ( *fd*(*k*)) is sorted in increasing order.

$$Q(k) = \operatorname{sort}(f\_d(k)).\tag{7}$$

3. The ROAD value is calculated as the sum of first *f* values of *Q*(*k*)

$$ROAD = \Sigma\_{k=1}^f Q(k). \tag{8}$$
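The three steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `road_values`, the default `f = 2` and the reflective handling of the window at the vector edges are assumptions made here. The sketch follows the text literally, so the window of 2f + 1 samples includes the center sample, whose zero self-difference is among the sorted values.

```python
import numpy as np

def road_values(r, f=2):
    """ROAD value for every sample of a 1-D (complex) vector r.

    For sample k, a window of 2f+1 samples centred on k is used. Edges are
    handled by reflecting the signal (an assumption, not stated in the text).
    """
    r = np.asarray(r)
    padded = np.pad(r, f, mode="reflect")
    road = np.empty(len(r))
    for k in range(len(r)):
        window = padded[k:k + 2 * f + 1]   # r[k-f] ... r[k+f]
        diffs = np.abs(r[k] - window)      # step 1: absolute differences, as in (6)
        ordered = np.sort(diffs)           # step 2: increasing order, as in (7)
        road[k] = ordered[:f].sum()        # step 3: sum of the first f values, as in (8)
    return road

# A flat signal with one strong impulse: only the impulse gets a large ROAD value
r = np.ones(16, dtype=complex)
r[8] = 10.0
road = road_values(r, f=2)
```

In this toy example the samples next to the impulse keep a ROAD value of zero, because their smallest differences come from the undisturbed neighbors, which is why the statistic isolates the outlier itself.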

A simple way to assess the effectiveness of ROAD statistics is to incorporate the method into an existing DME mitigation method. Hence, the proposed ROAD PB incorporates ROAD statistics into pulse blanking. The mathematical depiction of ROAD pulse blanking is given in (9). Here, $R\_p$ is the lower threshold value used to discriminate LDACS signals affected by DME interference.

$$\hat{x}\_{R\_p}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } ROAD(r) \le R\_p \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

The reduction in bit error rate at the receiver is compared with normal pulse blanking in Section 5.

The extra computational complexity introduced by ROAD PB compared to conventional pulse blanking is the sum of the computational complexity of the steps involved in calculating the ROAD value, as in (6)–(8). As these three steps involve no complex multiplication, the extra complex multiplications contributed by ROAD PB are zero. Hence, there is no change in the number of complex multiplications compared to the LDACS OFDM receiver (without any mitigation) or to pulse blanking. As the fast Fourier transform is used in the LDACS OFDM receiver, the total number of complex multiplications involved is $(N/2) \cdot \log\_2(N)$, where $N$ is the number of subcarriers in the OFDM signal [36].

The step in (6) introduces $N \cdot (N-1)$ extra complex additions, or $2N \cdot (2N-1)$ real additions. Hence, the total number of complex additions of the LDACS OFDM receiver with ROAD PB is $N \cdot \log\_2 N + N \cdot (N-1)$. The number of real additions introduced by sorting depends on the sorting algorithm used. For instance, if selection sort is used, it introduces $\frac{N^2 \cdot (N-1)}{2}$ real additions. Finally, the number of real additions introduced by step (8) is $\frac{N \cdot (N-1)}{2}$. The proof is included in Appendix A.

#### *4.2. Conventional Pulse Blanking*

Pulse blanking sets the received signal to zero if its absolute value is above a particular threshold value $\alpha\_p$. The mathematical depiction of pulse blanking is given in (10):

$$\hat{x}\_{P}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } |r| \le \alpha\_p \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
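The blanking rules in (9) and (10) differ only in the detection statistic that is compared with the threshold. The sketch below makes this explicit; the helper names are hypothetical, and the ROAD values are passed in as a precomputed array so that the example stays independent of any particular ROAD implementation.

```python
import numpy as np

def blank(r, statistic, threshold):
    """Keep samples whose detection statistic is <= threshold; zero the others."""
    r = np.asarray(r, dtype=complex)
    return np.where(np.asarray(statistic) <= threshold, r, 0)

def pulse_blanking(r, alpha_p):
    """Conventional pulse blanking, as in (10): the statistic is the amplitude |r|."""
    return blank(r, np.abs(r), alpha_p)

def road_pulse_blanking(r, road, R_p):
    """ROAD pulse blanking, as in (9): the statistic is the ROAD value of each sample."""
    return blank(r, road, R_p)

# Toy input: the middle sample is treated as DME-affected by both detectors
r = np.array([1 + 1j, 5.0, -0.5j])
road = np.array([0.1, 9.0, 0.2])     # hypothetical ROAD values for the three samples
x_pb = pulse_blanking(r, alpha_p=2.0)
x_road = road_pulse_blanking(r, road, R_p=1.0)
```

Samples that pass the test are returned unchanged, which is equivalent to the $|r|e^{j\arg(r)}$ branch of the two equations.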

#### *4.3. Pulse Peak Processors*

As discussed in the system model, data estimation needs one more vector, $\pi\_t$, along with $r\_t$ in performing the nonlinear function $f(\pi\_t, r\_t)$. The nonlinear operations performed by pulse peak attenuators and limiters are depicted in the block diagrams of Figures 5 and 6, respectively. The received data $r$ (after CP removal) are passed through a 2-GMM estimation block to extract the vector $\pi\_t$. The parameters contained in the vector $\pi\_t$ are further used to compute the scaling factor $\mu\_t$ for each time epoch. From Figure 5, it is clear that the pulse peak attenuator uses this parameter $\mu$ for processing the signal $r\_t$.

From Figure 6, it is to be noted that pulse peak limiters have one extra block compared to pulse peak attenuators. It is denoted as a decision device and holds the algorithm to change or update the scaling factor. The decision device determines whether the scaling factor needs to be modified. When the scaling is performed with a modified scaling factor, pulse peak attenuators become pulse peak limiters. In this paper, we have included two types of pulse peak attenuators and four types of pulse peak limiters to compare the performance of the proposed ROAD statistics-based method.

It is necessary to discuss K-GMM modeling to understand the vector $\pi\_t$ in more detail. This section outlines how GAE and OBE enhanced pulse peak processors exploit 2-GMM estimation to scale the signals affected by DME.

Any random variable can be expressed as the combination of K-number mutually exclusive Gaussian variables with K-GMM modeling [37,38]. Hence, this model can be effectively applied to any ImpN distribution (Class A, S-*α*-S noises, etc.), either estimated [39–41] and approximated by a K-GMM [42] or modeled with the actual equation [43]. The K-GMM model is mathematically expressed with pdf,

$$f\_W(i) = \sum\_{k=0}^{K-1} p\_k \cdot G(i, \sigma\_k^2) \tag{11}$$

where $\{p\_k\}\_{k=0,1,\dots,K-1}$, with $\sum\_{k=0}^{K-1} p\_k = 1$, are the probabilities of occurrence of each Gaussian component $k$. For $k = 0$, the component $i\_0 \sim G(i\_0, \sigma\_0^2)$ represents the thermal noise with variance $\sigma\_0^2$ and probability of occurrence $p\_0$. For $k = 1$ to $K - 1$, the statistical combinations of components characterize the impulse noise with probability $p\_I = 1 - p\_0$ and noise power $\sigma\_I^2$. The ratio of thermal noise to impulse noise is expressed as $\Gamma = \frac{\sigma\_0^2}{\sigma\_I^2}$. For $K = 2$, this model reduces to 2-GMM with a thermal noise component $i\_0 \sim G(i\_0, \sigma\_0^2)$ and an impulse noise component $i\_1 \sim G(i\_1, \sigma\_1^2)$.

The 2-GMM model is simple and treats the presence of a strong impulsive noise as one of only two mutually exclusive events, with probabilities $p\_0$ and $p\_1$. Hence, 2-GMM is exploited to employ GAE and OBE enhanced pulse peak processors as DME interference mitigators in LDACS receivers. The received signal $r$, when passed through 2-GMM estimation, yields the thermal noise component and the impulse noise component, with probabilities $p\_0$ and $p\_1$, for each time epoch. Thus, the vector $\pi$ holds the parameters $\sigma\_0^2$, $\sigma\_1^2$, $p\_0$ and $p\_1$ obtained from 2-GMM estimation. These parameters are used to calculate the instantaneous scaling parameter $\mu\_t$ to apply instantaneous nonlinearity to the affected subcarriers in the time domain. The parameter $\mu$ varies with different types of pulse peak processors.
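A two-component mixture of this kind can be sketched as follows. The example is restricted to real-valued, zero-mean noise and uses hypothetical parameter values (p0 = 0.9, thermal variance 0.1, impulsive variance 10); it illustrates the model only and does not implement the 2-GMM estimation block of the receiver.

```python
import numpy as np

def gmm2_pdf(i, p0, sigma0_sq, sigma1_sq):
    """pdf of a real-valued, zero-mean two-component Gaussian mixture."""
    def gauss(x, var):
        return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p0 * gauss(i, sigma0_sq) + (1 - p0) * gauss(i, sigma1_sq)

def gmm2_samples(n, p0, sigma0_sq, sigma1_sq, rng):
    """Draw n samples; each is thermal (prob. p0) or impulsive (prob. 1 - p0)."""
    impulsive = rng.random(n) >= p0
    var = np.where(impulsive, sigma1_sq, sigma0_sq)
    return rng.standard_normal(n) * np.sqrt(var), impulsive

rng = np.random.default_rng(1)
p0, sigma0_sq, sigma1_sq = 0.9, 0.1, 10.0   # illustrative values only

# The mixture pdf integrates to one when the probabilities sum to one
x = np.linspace(-50, 50, 200001)
area = np.sum(gmm2_pdf(x, p0, sigma0_sq, sigma1_sq)) * (x[1] - x[0])

# Total noise power of the samples approaches p0*sigma0^2 + p1*sigma1^2
noise, is_impulse = gmm2_samples(200000, p0, sigma0_sq, sigma1_sq, rng)
```

The two events are mutually exclusive per sample, so the boolean mask `is_impulse` plays the role of the side information that a genie-aided scheme is assumed to know.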

**Figure 5.** General block diagram for pulse peak attenuator.

**Figure 6.** General block diagram for pulse peak limiter.

GAE Enhanced Pulse Peak Processors

The pulse peak processors included in this section are GAE enhanced PPA and GAE enhanced PPLs (Type I and Type II). These pulse peak processors have better performance than GAE by utilizing other side information, such as impulsive noise arrival time or relationship of the impulsive noise [30].

The received signal $r$ can be expressed as the sum of the transmitted signal $X$ and the noise $i\_{|k}$, where $X$ and $i\_{|k}$ are two zero-mean independent Gaussian random variables with variances $\sigma\_X^2$ and $\sigma\_i^2$. With any impulse noise modeled as properly weighted mutually exclusive Gaussian events, the GAE claims to know which is the $k$th Gaussian component of the pdf that actually affects the transmitted signal at each time epoch. Once the signal affected by DME interference is identified, all three GAE enhanced pulse peak processors use the same knowledge of GAE for pulse peak processing. The amplitude of the received signal is used for the detection of DME, as in pulse blanking.

The GAE enhanced PPA attenuates the nonlinear input *r* when the amplitude of the received signal exceeds the threshold value. The operation of GAE enhanced PPA can be expressed as follows:

$$\mathfrak{A}\_{|kPA|}(r) = \begin{cases} |r| & \text{if } |r| \le a\_p \\ \rho\_k \cdot |r| & \text{otherwise.} \end{cases} \tag{12}$$

where *<sup>ρ</sup><sup>k</sup>* <sup>=</sup> *<sup>σ</sup>*<sup>2</sup> *x σ*2 *<sup>x</sup>*+*σ*<sup>2</sup> *k*

and

$$\sigma_k^2 = \left(1 + \frac{k}{A\Gamma}\right)\sigma_0^2 = \frac{k/A + \Gamma}{1 + \Gamma}\sigma_i^2 = \frac{k}{A}\sigma_I^2 + \sigma_0^2 \tag{13}$$

As the device is a pulse peak attenuator, the scaling factor *μ* is *ρ<sub>k</sub>*. The scaling factor changes at each instant *t*, as it is a function of *r<sub>t</sub>* and *π<sub>t</sub>*.

GAE enhanced PPA has better performance than the pulse blanking method as it attenuates the DME affected signal rather than losing the data by blanking. The device is well suited to process complex data signals as in LDACS with the modified equation as follows:

$$\hat{x}_{kPA}^{*}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } |r| \le \alpha_p \\ \rho_k \cdot |r|e^{j\arg(r)} & \text{otherwise.} \end{cases} \tag{14}$$
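The complex-valued attenuator in (14) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `gae_ppa`, the argument names and the example values are assumptions, and the noise variance of the active Gaussian component is taken as already known (as the GAE presumes).

```python
import numpy as np

def gae_ppa(r, alpha_p, sigma_x2, sigma_k2):
    """Scale samples whose magnitude exceeds alpha_p by rho_k, keeping their phase."""
    r = np.asarray(r, dtype=complex)
    # Scaling factor rho_k = sigma_x^2 / (sigma_x^2 + sigma_k^2)
    rho_k = sigma_x2 / (sigma_x2 + sigma_k2)
    # Samples at or below the threshold pass unchanged (scale 1)
    scale = np.where(np.abs(r) <= alpha_p, 1.0, rho_k)
    # scale * r equals scale * |r| * exp(j*arg(r)), so the phase is preserved
    return scale * r

# Illustrative values only (not taken from the simulation study)
samples = np.array([0.1 + 0.1j, 3.0 + 4.0j])
out = gae_ppa(samples, alpha_p=0.3, sigma_x2=1.0, sigma_k2=3.0)
```

Unlike blanking, the second sample is not set to zero; it is reduced to one quarter of its amplitude (here *ρ<sub>k</sub>* = 0.25) with its phase intact.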

GAE enhanced PPL is a modified or improved form of GAE enhanced PPA. It employs a decision device to update the scaling factor so that the received signal is attenuated continuously until its amplitude reaches the threshold value (*α<sub>p</sub>*). The repeated attenuation does not affect other subcarriers that are not affected by DME noise. This nonlinear device has better performance at high SNR values compared to the GAE enhanced PPA. The GAE enhanced PPL processes the input signal *r* and delivers the output *x̂*<sub>*kPL*</sub>(*r*) as stated in (15).

$$\hat{x}_{kPL}(r) = \begin{cases} r & \text{if } |r| \le \alpha_p \\ \rho_{mod} \cdot r & \text{otherwise,} \end{cases} \tag{15}$$

where *ρ<sub>mod</sub>* = *σ*<sub>*x*</sub><sup>2</sup> / (*σ*<sub>*x*</sub><sup>2</sup> + *N* · *σ*<sub>*k*</sub><sup>2</sup>).

Here, the value of *N* varies directly with the difference between the power of the received signal and the peak detection threshold at each instant. The maximum value of *N* occurs when the resulting signal holds no values greater than *α<sub>p</sub>*. With this knowledge, the value of *N<sub>max</sub>* is as in (16). The algorithm supporting this derivation is from [30].

$$N_{\max} = \frac{\sigma_x^2 (|r|_{\max} - \alpha_p)}{\alpha_p \sigma_k^2} \tag{16}$$
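As a short check, (16) can be recovered by requiring that the largest received sample is scaled exactly onto the threshold. The steps below use only the scaling factor *ρ<sub>mod</sub>* = *σ*<sub>*x*</sub><sup>2</sup> / (*σ*<sub>*x*</sub><sup>2</sup> + *N* · *σ*<sub>*k*</sub><sup>2</sup>) of (15) and are a sketch of the reasoning, not a reproduction of the derivation in [30]:

$$\rho_{mod}\,|r|_{\max} = \alpha_p \;\Longrightarrow\; \sigma_x^2\,|r|_{\max} = \alpha_p\left(\sigma_x^2 + N_{\max}\,\sigma_k^2\right) \;\Longrightarrow\; N_{\max} = \frac{\sigma_x^2\left(|r|_{\max} - \alpha_p\right)}{\alpha_p\,\sigma_k^2}$$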

Similar to GAE enhanced PPA, GAE enhanced pulse peak limiters also reduce the drawback of the pulse blanking method with less complexity. The same operation can be performed in another way, as in (17):

$$\hat{x}_{kPLs}(r) = \begin{cases} r & \text{if } |r| \le \alpha_p \\ M \cdot \rho_k \cdot r & \text{otherwise,} \end{cases} \tag{17}$$

where the maximum value of *M* is derived as in (18) [30]:

$$M_{\max} = \frac{\alpha_p}{\rho_k \cdot |r|_{\max}} \tag{18}$$

From Equations (15) and (17), the updated scaling factors (*μ*) for GAEPPL Type I and Type II are identified as *ρ<sub>mod</sub>* and *M* · *ρ<sub>k</sub>*, respectively. Both methods can scale complex-valued data with a slight change in Equations (15) and (17), resulting in (19) and (20), respectively.

$$\hat{x}_{kPL}^{*}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } |r| \le \alpha_p \\ \rho_{mod} \cdot |r|e^{j\arg(r)} & \text{otherwise,} \end{cases} \tag{19}$$

where *ρ<sub>mod</sub>* = *σ*<sub>*x*</sub><sup>2</sup> / (*σ*<sub>*x*</sub><sup>2</sup> + *N* · *σ*<sub>*k*</sub><sup>2</sup>), and

$$\hat{x}_{kPLs}^{*}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } |r| \le \alpha_p \\ M \cdot \rho_k \cdot |r|e^{j\arg(r)} & \text{otherwise.} \end{cases} \tag{20}$$

In both cases, the definition for *ρmod* and *M* remains the same as that used in Equations (15) and (17).
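The two limiter variants can be compared with a small NumPy sketch. It assumes that *N* and *M* are chosen per sample, i.e., (16) and (18) are evaluated with |*r*| in place of |*r*|<sub>max</sub>, so that every sample above the threshold is brought exactly onto *α<sub>p</sub>*; the function names and example values are illustrative and not taken from [30].

```python
import numpy as np

def gae_ppl_type1(r, alpha_p, sigma_x2, sigma_k2):
    """Type I limiter: boost the noise variance N times inside rho_mod, Eq. (19)."""
    r = np.asarray(r, dtype=complex)
    mag = np.abs(r)
    over = mag > alpha_p
    # Per-sample N from Eq. (16), with |r| used instead of |r|_max (assumption)
    n = np.where(over, sigma_x2 * (mag - alpha_p) / (alpha_p * sigma_k2), 0.0)
    rho_mod = sigma_x2 / (sigma_x2 + n * sigma_k2)
    return np.where(over, rho_mod, 1.0) * r

def gae_ppl_type2(r, alpha_p, sigma_x2, sigma_k2):
    """Type II limiter: multiply rho_k by M, Eq. (20)."""
    r = np.asarray(r, dtype=complex)
    mag = np.abs(r)
    over = mag > alpha_p
    rho_k = sigma_x2 / (sigma_x2 + sigma_k2)
    # Per-sample M from Eq. (18); the inner where avoids dividing by zero
    m = np.where(over, alpha_p / (rho_k * np.where(over, mag, 1.0)), 1.0)
    return np.where(over, m * rho_k, 1.0) * r

# Illustrative values only
samples = np.array([0.2, 1.0j, -2.0])
y1 = gae_ppl_type1(samples, alpha_p=0.3, sigma_x2=1.0, sigma_k2=3.0)
y2 = gae_ppl_type2(samples, alpha_p=0.3, sigma_x2=1.0, sigma_k2=3.0)
```

Under this per-sample choice both types reduce to the same operation: samples above the threshold are clipped to magnitude *α<sub>p</sub>* while keeping their phase.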

#### *4.4. OBE Enhanced Pulse Peak Processors*

Bayesian estimators are useful for any Gaussian source affected by any Gaussian-mixture noise [28]. The time domain OFDM signal *x* can be approximated by a Gaussian pdf, *f<sub>X</sub>*(*x*) = *G*(*x*; *σ*<sub>*x*</sub><sup>2</sup>) = exp(−*x*<sup>2</sup>/2*σ*<sub>*x*</sub><sup>2</sup>) / √(2π*σ*<sub>*x*</sub><sup>2</sup>). The complex-valued received signal *r<sub>t</sub>*[*n*] at the receiver side has real and imaginary parts *r*<sub>*t*,*R*</sub>[*n*] and *r*<sub>*t*,*I*</sub>[*n*], respectively. Let *r* denote either the real or the imaginary part of *r<sub>t</sub>*[*n*]. When the received signal of interest is modeled or approximated as a Gaussian pdf or a *K*-component pdf, the minimum mean square error Bayesian estimators can be effectively utilized along with the knowledge of the signal *x<sub>t</sub>*. By exploiting the statistical independence between *X* and *i*, it is possible to write *f*<sub>*r*|*X*</sub>(*r*) = *f<sub>i</sub>*(*r* − *x*) and *G*(*r*; *σ*<sub>*rk*</sub><sup>2</sup>) = *G*(*r*; *σ*<sub>*x*</sub><sup>2</sup>) ∗ *G*(*r*; *σ*<sub>*k*</sub><sup>2</sup>). Here, ∗ stands for the convolution operation. Thus, the received power *σ*<sub>*rk*</sub><sup>2</sup> is the sum of the signal power *σ*<sub>*x*</sub><sup>2</sup> and the *k*th Gaussian component noise power *σ*<sub>*k*</sub><sup>2</sup>:

$$
\sigma_{rk}^2 = \sigma_x^2 + \sigma_k^2 \tag{21}
$$

The OBE enhanced pulse peak processors perform Bayesian estimation only when the received signal is identified as DME affected signal. The pulse peak processors included in this section are OBE enhanced pulse peak attenuator and OBE enhanced pulse peak limiters (Type I and Type II).

The device attenuates the nonlinear input *r* when the amplitude of the received signal is above *α<sub>p</sub>*. The mathematical expression of the OBE enhanced pulse peak attenuator is as in (22).

$$\hat{x}_{kOPA}(r) = \begin{cases} |r| & \text{if } |r| \le \alpha_p \\ \beta_o(r) \cdot r & \text{otherwise,} \end{cases} \tag{22}$$

where

$$\beta_o(r) = \frac{\sum_{k=0}^{K-1} \rho_k p_k G(r; \sigma_{rk}^2)}{\sum_{k=0}^{K-1} p_k G(r; \sigma_{rk}^2)} \tag{23}$$

As the device is a pulse peak attenuator, the scaling factor *μ* is *βo*(*r*). It is possible to use this OBE enhanced PPA for processing complex valued signal as well. The mathematical statement for this operation is as given in (24).
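The scaling factor *β<sub>o</sub>*(*r*) is the ratio of weighted Gaussian sums on the right-hand side of (23). A minimal NumPy sketch for a real-valued input (the real or imaginary part of the received sample) is given below. The function names and the two-component example are assumptions for illustration, and samples below the threshold are passed through unchanged here.

```python
import numpy as np

def gaussian_pdf(r, var):
    """Zero-mean Gaussian pdf G(r; var)."""
    return np.exp(-r**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def obe_scaling(r, sigma_x2, sigma_k2, p_k):
    """beta_o(r): posterior-weighted average of rho_k over the K components."""
    r = np.asarray(r, dtype=float)[..., None]        # broadcast over components
    sigma_k2 = np.asarray(sigma_k2, dtype=float)
    p_k = np.asarray(p_k, dtype=float)
    rho_k = sigma_x2 / (sigma_x2 + sigma_k2)
    # p_k * G(r; sigma_rk^2) with sigma_rk^2 = sigma_x^2 + sigma_k^2, Eq. (21)
    g = p_k * gaussian_pdf(r, sigma_x2 + sigma_k2)
    return (rho_k * g).sum(axis=-1) / g.sum(axis=-1)

def obe_ppa(r, alpha_p, sigma_x2, sigma_k2, p_k):
    """Apply beta_o(r) only to samples whose magnitude exceeds alpha_p."""
    r = np.asarray(r, dtype=float)
    beta = obe_scaling(r, sigma_x2, sigma_k2, p_k)
    return np.where(np.abs(r) <= alpha_p, r, beta * r)

# Illustrative 2-GMM: thermal noise (k = 0) and impulsive noise (k = 1)
beta_large = obe_scaling(10.0, sigma_x2=1.0, sigma_k2=[0.1, 10.0], p_k=[0.9, 0.1])
```

For a very large sample the impulsive component dominates the posterior, so *β<sub>o</sub>*(*r*) approaches the *ρ<sub>k</sub>* of that component (1/11 in this example).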

$$\hat{x}_{kOPA}^{*}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } |r| \le \alpha_p \\ \beta_o(r) \cdot |r| \cdot e^{j\arg(r)} & \text{otherwise.} \end{cases} \tag{24}$$

OBE enhanced PPL is an altered or upgraded form of OBE enhanced PPA, which reduces the amplitude of the received signal repeatedly until the amplitude of the DME affected subcarrier reaches the threshold value. As the repeated attenuation is performed only for the subcarriers that exceed the threshold value, it does not disturb the subcarriers that are not affected by DME interference. The OBE enhanced PPL processes the input signal *r* and delivers the output *x̂*<sub>*kOPL*</sub>(*r*) as stated in (25).

$$\hat{x}_{kOPL}(r) = \begin{cases} |r| & \text{if } |r| \le \alpha_p \\ \beta_{mod}(r) \cdot |r| & \text{otherwise,} \end{cases} \tag{25}$$

where

$$\beta_{mod}(r) = \frac{\alpha_p}{|r|} \tag{26}$$

Here, *β<sub>mod</sub>*(*r*) is the modified scaling factor *μ*. The modification can be performed in two ways so that *β<sub>mod</sub>*(*r*) · |*r*| becomes equal to *α<sub>p</sub>*.

In one method, *P* times *β<sub>o</sub>*(*r*) is taken as *β<sub>mod</sub>*(*r*). In this situation, the maximum value of *P* for limiting the output (*P<sub>max</sub>*) can be expressed as in (27) [30].

$$P_{\max} = \frac{\alpha_p}{\beta_o(r)\,|r|_{\max}} \tag{27}$$

In the second method, the value of the noise power component *σ*<sub>*k*</sub><sup>2</sup> is boosted *R* times so that the output of the nonlinear device is limited to the threshold value *α<sub>p</sub>*. In this case, the modified scaling factor *β<sub>mod</sub>*(*r*) can be expressed as in (28) [30].

$$\beta_{mod}(r) = \frac{\sum_{k=0}^{K-1} \rho_{mod}\, p_k G(r; \sigma_{rkmod}^2)}{\sum_{k=0}^{K-1} p_k G(r; \sigma_{rkmod}^2)} \tag{28}$$

where *ρ<sub>mod</sub>* = *σ*<sub>*x*</sub><sup>2</sup> / (*σ*<sub>*x*</sub><sup>2</sup> + *R* · *σ*<sub>*k*</sub><sup>2</sup>) and *σ*<sub>*rkmod*</sub><sup>2</sup> = *σ*<sub>*x*</sub><sup>2</sup> + *R* · *σ*<sub>*k*</sub><sup>2</sup>. Both methods are adaptable to complex-valued OFDM data signals as in LDACS. This can be stated mathematically as in (29).

$$\hat{x}_{kOPL}^{*}(r) = \begin{cases} |r|e^{j\arg(r)} & \text{if } |r| \le \alpha_p \\ \beta_{mod}(r) \cdot |r|e^{j\arg(r)} & \text{otherwise.} \end{cases} \tag{29}$$

#### **5. Results and Discussions**

This section presents the advantages of ROAD statistics-based sensing over amplitude-based sensing in LDACS FL communication. The results of a detailed analysis of threshold ROAD value-based sensing, for different threshold values under different SNR conditions, are presented. The section also discusses the performance of the proposed ROAD statistics-based nonlinear device (ROAD PB) in reducing DME interference when employed in OFDM-based LDACS communication. The discussion is based on results obtained from a MATLAB simulation of the LDACS forward link communication prototype. The performance of the proposed method is compared with the conventional pulse blanking method, which uses the amplitude of the received signal to detect DME interference. Nonlinear devices such as GAEPPA, GAEPPL, OBEPPA and OBEPPL are also included in the comparison with ROAD PB. The mathematical models of the LDACS FL GS transmitter (Figure 3) and the LDACS FL AS receiver (Figure 3) are developed as per the standards of the LDACS system for all the inner building blocks.

At the transmitter side, random data of 91 bytes are generated by the data source and given as the input to the RS(101,91) coder for external coding. Once external encoding is performed by the RS encoder, 6-bit zero padding is applied before internal encoding by the convolutional encoder (171,133). The encoded bits at the output of the convolutional coder, with a native code rate of 1/2, are interleaved (using a permutation interleaver) and then mapped to symbols (using a symbol mapper). The mapped symbols take complex values after passing through the QPSK modulation block. The frame composer block forms the LDACS FL data/CC frame by inserting pilot values (158), null values (728) and complex data values (2442) over a total of 3328 subcarriers. The time domain composite waveform of this OFDM frame is then generated by passing the frame through the IFFT block of length 64. The effect of the introduction of IFFT (windowing) is canceled by adding 16 cyclic prefix bits. Table 1 holds the OFDM system parameters used in this simulation study.
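The last stages of this chain (QPSK mapping, 64-point IFFT and a 16-sample cyclic prefix) can be illustrated with a short NumPy sketch. It is a simplified stand-in and not the MATLAB prototype used in the study: coding, interleaving and the actual pilot/null layout are omitted, and the bit-to-symbol mapping shown is an assumed Gray mapping.

```python
import numpy as np

N_FFT, N_CP = 64, 16                      # IFFT length and cyclic prefix length
PILOTS, NULLS, DATA = 158, 728, 2442      # frame composition quoted in the text

def qpsk_map(bits):
    """Map bit pairs to unit-energy QPSK symbols (assumed Gray mapping)."""
    b = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

def ofdm_symbol(freq_bins):
    """Time-domain OFDM symbol: 64-point IFFT followed by cyclic prefix insertion."""
    x = np.fft.ifft(freq_bins, n=N_FFT)
    return np.concatenate([x[-N_CP:], x])  # prefix is a copy of the symbol tail

total = PILOTS + NULLS + DATA              # 3328 frame positions
rng = np.random.default_rng(0)
symbols = qpsk_map(rng.integers(0, 2, size=2 * N_FFT))
tx = ofdm_symbol(symbols)
```

The counts are self-consistent: 158 + 728 + 2442 = 3328, which is an integer number (52) of 64-point blocks.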

To analyze the performance degradation of the LDACS FL AS receiver due to DME interference, the AWGN channel is considered. The BER of the received signal when passed through the AWGN channel without DME interference is shown in Figure 7. For the study of interference on LDACS, DME signals are generated by (3) for a duration Δ*t* of 12 μs, as shown in Figure 8. The baseband DME pulse pairs are modulated to relative carrier frequencies of 0.5 MHz to the left and 0.5 MHz to the right of the LDACS1 system bandwidth. A reduction in the performance of the LDACS FL AS receiver can be observed when DME interference is allowed to affect the transmitted data. Figure 7 also shows how the existing simple noise reduction method (pulse blanking) improves the performance of the receiver. The threshold value used for the pulse blanking method is 0.3. Careful analysis of Figure 7 reveals that the pulse blanking technique gives a significant improvement in receiver performance at high SNR values and a slight decrease at low SNR values. The reason for the reduced performance of pulse blanking at low SNR values is the false detection caused by amplitude-based sensing and the resulting extra loss of data. It is possible to reduce the number of false detections by increasing the detection threshold. However, a high threshold value can lead to more missed detections and more interference power at the output of the receiver. The false detection caused by amplitude-based sensing for an SNR value of 15 dB is visible (sample values between 2500 and 3000) in Figure 9.
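Conventional pulse blanking with amplitude-based sensing amounts to zeroing every sample whose magnitude exceeds the threshold. The NumPy sketch below uses the threshold of 0.3 quoted above; the function name and sample values are illustrative assumptions.

```python
import numpy as np

def pulse_blanking(r, threshold=0.3):
    """Zero out samples whose amplitude exceeds the threshold.

    Returns the blanked signal and the boolean detection mask.
    """
    r = np.asarray(r, dtype=complex)
    detected = np.abs(r) > threshold       # amplitude-based sensing
    return np.where(detected, 0.0, r), detected

# Illustrative samples: the third is a clean but large OFDM peak,
# the fourth carries a strong interference pulse.
rx = np.array([0.05 + 0.1j, -0.2j, 0.25 + 0.25j, 2.0 - 1.0j])
blanked, mask = pulse_blanking(rx)
```

The third sample shows the false detection problem described above: its amplitude (about 0.35) crosses the threshold without any interference present, so useful data are blanked along with the pulse.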

**Figure 7.** Performance of conventional pulse blanking technique vs. without DME interference and with DME interference.

**Figure 8.** Standard DME pulse pair.

The amplitude of the DME interference signal, the amplitude of the received signal along with the sensing threshold, and the calculated ROAD values of the received signal are plotted in Figure 9. The signals are plotted together to show how amplitude-based and ROAD value-based sensing detect DME interference pulses. It can be observed from Figures 9 and 10 that the ROAD values of the received signal are a magnified (though not exact) version of the absolute difference of each sample from its neighboring samples. Figure 11 depicts how much better the ROAD values of the received data identify the exact location and shape of the DME pulses than amplitude sensing does. The performance of both amplitude-based sensing and ROAD statistics-based sensing for a low SNR value (0 dB) is shown in Figure 10. Comparison of Figures 9 and 10 shows that amplitude-based sensing performs worse at the low SNR value of 0 dB due to the increased number of false detections. ROAD value-based sensing not only performs better than amplitude-based sensing but also largely preserves its performance irrespective of the SNR value, as the rank-ordered difference is used for sensing. It has been observed that the performance of both amplitude-based and ROAD value-based sensing may vary with both the SNR level and the threshold value. Hence, a detailed study of amplitude-based sensing and ROAD value-based sensing has been performed to gain more insight into the process.

**Figure 9.** DME interference (**Top**), amplitude of the received signal without interference mitigation (**Middle**) and ROAD value of the received signal (**Bottom**) for an SNR of 15 dB.

**Figure 10.** Amplitude of the received signal without interference mitigation and ROAD value of the received signal for an SNR of 0 dB.

**Figure 11.** DME interference (**Top**), amplitude of the received signal without interference mitigation (**Middle**) and ROAD value of the received signal (**Bottom**) for an SNR of 15 dB.

A comprehensive analysis of amplitude-based sensing and ROAD value-based sensing at low and high SNR levels is portrayed in Figures 12 and 13, respectively. The different probability measures (accuracy, false detection and missed detection) are characterized in Figures 14–16. The findings from the comparative study of amplitude-based sensing and ROAD value-based sensing are as follows:


The above-mentioned statements (2, 3, 5 and 9), which follow from the obtained results, affirm that ROAD value-based sensing is superior to amplitude value-based sensing in detecting DME interference.

Furthermore, the ROAD statistic-based detection has been characterized for different threshold ROAD values and SNRs. The results are displayed in Figures 17–19. It has been observed that the probability of false detection decreases with an increase in threshold value (Figure 17), whereas the probability of missed detection increases with an increase in threshold value (Figure 18). Hence, there is a trade-off between false detection and missed detection across threshold values. To identify the optimum threshold, we have therefore analyzed the variation in the probability of correct detection (accuracy) for different threshold values. From Figure 19, it is noted that the accuracy of ROAD value-based sensing increases from threshold ROAD value 5 to 8. The reason for this behavior is the considerable reduction in false detection in ROAD value-based sensing. The performance then starts diminishing from 8 to 12. This can be attributed to the increase in missed detection that occurs for high threshold values.
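The trade-off can be made concrete with a small sketch that sweeps the threshold and computes the three measures from a known interference mask. The definitions used here are assumptions, since the exact definitions are not restated in this section: false detection is the fraction of interference-free samples that are flagged, missed detection is the fraction of interfered samples that are not flagged, and accuracy is the fraction of correct decisions. The ROAD values are toy numbers.

```python
import numpy as np

def detection_rates(truth, detected):
    """Return (P_false, P_missed, accuracy) for boolean masks of equal length."""
    truth = np.asarray(truth, dtype=bool)
    detected = np.asarray(detected, dtype=bool)
    clean = ~truth
    # max(..., 1) guards against empty classes
    p_false = (detected & clean).sum() / max(clean.sum(), 1)
    p_missed = (~detected & truth).sum() / max(truth.sum(), 1)
    accuracy = (detected == truth).mean()
    return p_false, p_missed, accuracy

def sweep_thresholds(road, truth, thresholds):
    """Evaluate threshold-based detection (road > t) for each threshold t."""
    road = np.asarray(road, dtype=float)
    return {t: detection_rates(truth, road > t) for t in thresholds}

# Toy example: two interfered samples with large ROAD values
road = [10.0, 9.0, 6.0, 1.0, 1.0]
truth = [True, True, False, False, False]
results = sweep_thresholds(road, truth, thresholds=[5, 8, 12])
```

In this toy case the lowest threshold produces a false detection, the highest misses both pulses, and the middle one is the most accurate, mirroring the shape of the trade-off described above.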

**Figure 12.** DME interference (**Top**), threshold amplitude-based sensing (**Middle**) and threshold ROAD value-based sensing (**Bottom**) for an SNR of 0 dB.

**Figure 13.** DME interference (**Top**), threshold amplitude-based sensing (**Middle**) and threshold ROAD value-based sensing (**Bottom**) for an SNR of 15 dB.

**Figure 14.** Probability of detection vs. SNR.

**Figure 15.** Probability of false detection vs. SNR.

For better clarity of the results, the variation in the probability of false detection and missed detection has been plotted for a constant SNR value. The rate of decrease in false detection with increasing threshold value is clearly visible in Figure 20. The variations in accuracy and missed detection are plotted separately to verify the optimum detection threshold and the reason behind it. From Figure 21, it is evident that the optimum detection threshold occurs at a value of 8. The reduction in detection accuracy beyond a threshold value of 8 is due to the increase in missed detection, as shown in Figure 22.

Once the significance of ROAD value-based sensing and its optimum threshold ROAD value are identified (based on experimental results), the method is incorporated with an existing pulse blanking method to propose a new DME mitigation scheme.

**Figure 16.** Probability of missed detection vs. SNR.

**Figure 17.** Probability of false detection in ROAD statistics-based detection for different threshold value vs. SNR.

The block diagram of the proposed LDACS receiver is shown in Figure 4. Initially, the cyclic prefix bits are removed from the received data. The resulting data are then converted into the frequency domain using the fast Fourier transform. The frame decomposer then separates the pilot symbols and complex data from the corresponding subcarriers. The segregated complex data values undergo QPSK demodulation and symbol demapping to obtain the bitstreams. The bitstreams are de-interleaved and decoded using the de-interleaver and the Viterbi decoder, respectively. From the output of the Viterbi decoder, redundant bits are removed, and the result is decoded using the RS decoder to obtain the original data.

Figure 23 shows the variation in BER with different transmit SNR values. It has been observed that the proposed ROAD PB exhibits a much improved performance for a threshold value of 8. Figure 24 shows the variation in performance of ROAD PB for threshold values ranging from 7 to 11. The performance of ROAD PB initially improves with a rise in threshold value and then starts diminishing. The reason for this behavior is as follows: ROAD detectors with low threshold values perceive small variations from the neighboring carrier as DME interference. The actual data can cause a variation in ROAD values, which can lead to false detection of DME interference. Moreover, once DME interference is detected in a subcarrier with a low threshold value, blanking the subcarrier causes the loss of more data. As OFDM systems have a self-removal noise mechanism due to the principle of orthogonality, the focus of detection is on large variations. Hence, there is an optimum high threshold value for which leaving the data is better than maintaining or estimating. From the results shown in Figure 24, the optimum threshold value is noted as 8. The accuracy of interference detection starts diminishing for threshold values greater than the optimum. In this situation, the ROAD interference detector senses only a very large variation from the neighboring carrier as DME interference, leading to missed detection.

With the introduction of pulse blanking, a possibility of change in optimum threshold exists, if one considers the trade-off between interference power and signal distortion. Excess interference power may exist in mitigated data if the threshold value is high. On the other hand, reducing the threshold value can cause more distortion and loss of data due to blanking. In our work, the optimum threshold value of ROAD value-based sensing is recognized as the optimum value of operation to obtain data with minimum BER (Figures 21 and 24). As ROAD value-based noise detection is more accurate than the conventional amplitude-based method, it introduces less distortion in the mitigated data and prevents the extra loss of falsely detected data.

**Figure 18.** Probability of missed detection in ROAD statistics-based detection for different threshold value vs. SNR.

The performance of the proposed ROAD PB is further compared with GAE enhanced pulse peak processors, and the results are shown in Figure 25. The proposed ROAD PB outperformed the GAE enhanced pulse peak attenuators and limiters (Type I and II). Similarly, Figure 26 depicts the performance comparison of the proposed ROAD PB with OBE enhanced pulse peak processors. It has been observed that ROAD PB outperforms OBE PPA. Moreover, ROAD PB has similar or slightly better performance than OBE PPLs for low SNR values. For SNR values of 8 dB and above, OBE enhanced pulse peak limiters performed better than ROAD PB. Figure 27 compares the performance of ROAD PB with pulse blanking, GAE PPL (Type 2) and OBE PPL (Type 2). It has been observed that GAE PPL and OBE PPL have an improved performance compared to pulse blanking, as data estimation is performed instead of blanking the noise affected signal. The threshold value used for all types of pulse peak processors is 0.3 [30]. When ROAD statistics are incorporated with pulse blanking, the performance is better than that of GAE PPL and comparable to that of OBE PPL.

**Figure 19.** Accuracy in ROAD statistics-based detection vs. SNR.

**Figure 20.** Probability of detection vs. threshold for SNR = 5 dB.

**Figure 21.** Accuracy in ROAD statistics-based detection (SNR = 5 dB) vs. threshold.

**Figure 22.** Probability of missed detection in ROAD statistics-based sensing (SNR = 5 dB) vs. threshold.

**Figure 23.** Performance of ROAD PB vs. pulse blanking.

**Figure 24.** Variation of ROAD PB with different threshold value.

**Figure 25.** Performance of ROADPB vs. GAE enhanced pulse peak processors.

**Figure 26.** Performance of ROADPB vs. OBE enhanced pulse peak processors.

**Figure 27.** Performance of ROADPB vs. pulse blanking, GAE and OBE enhanced pulse processors.

**Table 1.** OFDM Parameters for LDACS1 [9].


#### **6. Conclusions**

In this paper, a new DME mitigation scheme named ROAD PB is proposed to mitigate DME interference using pulse blanking. ROAD PB detects DME interference with a method named ROAD statistics-based detection, which detects the interference from the ROAD values of the received signal. The performance of the new detection method is compared with the conventional amplitude-based sensing method. The results led to the following conclusions:


From the results obtained with the comparative study of the proposed method (ROADPB) with amplitude-based detection methods such as pulse blanking and pulse peak processors, we observed the following:


In comparison to pulse blanking, ROAD PB could achieve an SNR saving of 2.7 dB at a BER of 10<sup>−1</sup> by introducing some amount of complexity in the receiver. Moreover, at a BER of 10<sup>−1</sup>, ROAD PB could accomplish SNR savings of 2.7, 1.1, 0.7, 0.25 and 0.2 dB compared to GAEPPA, GAEPPL, OBEPPA and OBEPPL, respectively. The proposed ROAD PB is significant due to its improved performance in low SNR regions in comparison to pulse blanking. Moreover, ROAD value-based detection can be used to sense impulse noise in any type of OFDM-based communication system where threshold-based detection can be used. In the future, ROAD value-based detection can be incorporated with any other threshold-based DME mitigation scheme, such as the GAE enhanced methods. The ROAD PB-based LDACS receiver can be extended to the en-route channel. The performance of this method in LDACS RL can also be analyzed. Though this method is investigated against the LDACS background, it is suitable for reducing impulse noise in any OFDM-based communication system.

**Author Contributions:** Conceptualization, M.K. and R.M.; methodology, M.K. and R.M.; software, M.K.; validation, A.R. and R.M.; formal analysis, M.K.; writing (original draft preparation; review and editing), M.K.; supervision, R.M. and A.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**




#### **Appendix A**

In the following, we provide the derivation of the extra computational complexity introduced by ROAD PB (Section 4.1).

#### *Appendix A.1*

The extra computational complexity put forward by ROAD PB compared to conventional pulse blanking is the sum of the computational complexity put forward by the steps to calculate ROAD value. The following steps are used to calculate ROAD value of a center sample in one OFDM symbol:

1. The received OFDM symbol for each time epoch is considered as a one-dimensional vector. For a one-dimensional vector of size (1 × *N*), the absolute differences *f<sub>d</sub>*(*k*) between a center sample and the other received samples are calculated for each time epoch as in (A1)

$$f_d(k) = \left| r_k - \left[ r_{k - \lceil (N-1)/2 \rceil}, \cdots, r_{k + \lceil (N-1)/2 \rceil} \right] \right| \tag{A1}$$

No complex multiplications are introduced in this step.

Here, the number of complex additions put forward is *N* − 1 for a center sample, i.e., for one subcarrier. It is the same as (2 · *N* − 1) real additions.

Thus, for *N* subcarriers, the number of complex additions *C*<sub>*a*</sub><sup>*d*</sup> involved in the calculation of the absolute difference from a center sample is as follows:

$$C_{a}^{d} = N \cdot (N - 1). \tag{A2}$$

It is the same as 2*N* · (2 · *N* − 1) real additions and can be expressed as in (A3)

$$R_a^d = 2N \cdot (2N - 1). \tag{A3}$$

2. The difference vector (*f<sub>d</sub>*(*k*)) is sorted in increasing order.

$$Q(k) = \operatorname{sort}(f_d(k)). \tag{A4}$$

No complex multiplications or additions are introduced in this step.

For a one-dimensional vector of size *N*, the number of real additions put forward by sorting depends on the type of sorting that is used. For instance, if selection sort is used, it introduces *N* · (*N* − 1)/2 real additions for a single center sample. Thus, for *N* samples, the number of real additions involved in selection sort is as in (A5)

$$R_a^s = \frac{N^2 \cdot (N-1)}{2}. \tag{A5}$$

3. The ROAD value is calculated as the sum of the first (*N* − 1)/2 values of *Q*(*k*)

$$ROAD = \sum_{k=1}^{(N-1)/2} Q(k). \tag{A6}$$

No complex multiplications or additions are introduced in this step. Finally, the number of real additions put forward by adding the first half of the sorted output is (*N* − 1)/2.

For *N* samples, the number of real additions involved is as in (A7)

$$R_a^a = \frac{N \cdot (N-1)}{2}. \tag{A7}$$
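The three steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the study: the ROAD value is computed over a sliding window of odd length `window` centered on each sample, the zero self-difference is excluded before sorting, and the edges are handled by repeating the first and last samples. The function name `road_values` is hypothetical.

```python
import numpy as np

def road_values(r, window):
    """Rank-ordered absolute difference (ROAD) value of every sample in r."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    r = np.asarray(r, dtype=complex)
    half = (window - 1) // 2
    padded = np.pad(r, half, mode="edge")         # edge handling is an assumption
    out = np.empty(r.size)
    for k in range(r.size):
        neighborhood = padded[k:k + window]
        center = padded[k + half]
        diffs = np.abs(neighborhood - center)     # step 1, Eq. (A1)
        diffs = np.delete(diffs, half)            # drop the center's own zero difference
        q = np.sort(diffs)                        # step 2, Eq. (A4)
        out[k] = q[:half].sum()                   # step 3, Eq. (A6): first (N-1)/2 values
    return out

# A single impulse in an otherwise flat signal
road = road_values([0, 0, 0, 10, 0, 0, 0], window=5)
```

Only the impulsive sample receives a large ROAD value, because even its closest neighbors differ strongly from it. Samples adjacent to the impulse keep a ROAD value of zero, since the outlier difference falls in the discarded upper half of the sorted vector.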

#### **References**


## *Article* **IMU: A Content Replacement Policy for CCN, Based on Immature Content Selection**

**Salman Rashid \*, Shukor Abd Razak \* and Fuad A. Ghaleb \***


abdulgaleel@utm.my (F.A.G.)

**Abstract:** In-network caching is the essential part of Content-Centric Networking (CCN). The main aim of a CCN caching module is data distribution within the network. Each CCN node can cache content according to its placement policy. Therefore, it is fully equipped to meet the requirements of future network demands. The placement strategy decides to cache the content at the optimized location and minimize content redundancy within the network. When cache capacity is full, the content eviction policy decides which content should stay in the cache and which content should be evicted. Hence, network performance and cache hit ratio almost equally depend on the content placement and replacement policies. Content eviction policies have diverse requirements due to limited cache capacity, higher request rates, and the rapid change of cache states. Many replacement policies follow the concept of low or high popularity and data freshness for content eviction. However, when content loses its popularity after becoming very popular in a certain period, it remains in the cache space. Moreover, content is evicted from the cache space before it becomes popular. To handle the above-mentioned issue, we introduced the concept of maturity/immaturity of the content. The proposed policy, named Immature Used (IMU), finds the content maturity index by using the content arrival time and its frequency within a specific time frame. Also, it determines the maturity level through a maturity classifier. In the case of a full cache, the least immature content is evicted from the cache space. We performed extensive simulations in the simulator (Icarus) to evaluate the performance (cache hit ratio, path stretch, latency, and link load) of the proposed policy against different well-known cache replacement policies in CCN. The obtained results, with varying popularity and cache sizes, indicate that our proposed policy can achieve up to 14.31% more cache hits, 5.91% reduced latency, 3.82% improved path stretch, and 9.53% decreased link load, compared to the recently proposed technique. Moreover, the proposed policy performed significantly better compared to other baseline approaches.

**Keywords:** content replacement; content placement; content-centric networking; cache networks; immaturity; stretch reduction

### **1. Introduction**

Due to advancements in technology, things are becoming more integrated and intelligent, leading to a rapid increase in Internet usage. Internet usage patterns demonstrate that new-era applications are becoming more sensitive to bandwidth and latency. IP video traffic is expected to account for 82% of overall IP traffic by 2022 [1], up from 74% in 2017 [2]. Internet users are not interested in the location of the storage server. Their primary interest is having Internet connectivity that assures fast and reliable retrieval of desired information. Content-centric networking (CCN) has proven to be a promising solution to meet the needs of future networks [3]. CCN assigns each piece of data a unique identity and addresses data objects at the network level, in contrast to the Internet's host-centric architecture. CCN naturally supports in-network caching and many-to-many communication [4]. When the user request contains the name or identification of the desired data object, the network attempts to respond to the request with the data object. The name can also belong to a location or a host machine. This mechanism makes CCN more general than the host-to-host communication model [5].

**Citation:** Rashid, S.; Razak, S.A.; Ghaleb, F.A. IMU: A Content Replacement Policy for CCN, Based on Immature Content Selection. *Appl. Sci.* **2022**, *12*, 344. https://doi.org/10.3390/app12010344

Academic Editor: Yangquan Chen

Received: 18 November 2021 Accepted: 27 December 2021 Published: 30 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In-network caching provides a solution for the traditional Internet architecture that works in the application layer [3], whereas CCN is a solution that works at the network layer [6]. It allows a CCN node to cache a temporary copy of the requested content, and the content of the CCN cache changes rapidly due to enormous data demands. CCN can minimize network traffic and maximize content availability by providing the desired content closer to the consumer [7]. However, it is difficult to decide the cache location of the content so as to satisfy consumer requests and improve network performance [8]. In addition, it is important to determine which content should be removed from the cache space to accommodate new content. Improper content selection degrades network performance [9]. In-network caching faces several challenges, including limited cache storage, caching replacement and placement, caching traffic, and complex network topology [10,11].

The performance of the CCN depends on the content placement and replacement policies. The content placement policy decides the appropriate cache location of each content [1]. Hence, the node selection for content caching should be optimized to satisfy consumer requests with minimum overhead. Because the cache capacity of a node is limited, some cached content must be removed to accommodate new content [12]. The content replacement policy is responsible for choosing the content to remove according to defined criteria [13]. Network performance and the cache hit ratio decrease if popular content is removed from the cache or unpopular content remains in the cache for a long time [14–17].

Although caching content at every node along the routing path increases network performance and the cache hit ratio, it is not a practical approach because cache space is finite. That is, if the cache space is full and new content arrives, one of the cached contents must be removed. Most existing replacement policies follow the concept of the Least Frequently Used (LFU) or Least Recently Used (LRU) policy to replace content, which is not effective for CCN [18]. Newly arrived content may become popular over time due to high demand, whereas content that was once popular may stay in the cache on the strength of its previous popularity after demand has fallen. Network performance may therefore decrease, either because previously popular content that is currently unpopular overstays or because currently popular content is evicted. To solve this issue and improve network performance, we introduce a new concept of content maturity and immaturity. Content that loses its popularity over a specific time frame and stays in the cache for a long time is called immature content. In contrast, content is considered mature if it has high popularity and has also been requested recently within a specific time frame. Every new content is initially neither popular nor mature; it should stay in the cache for some time so that its maturity level can be determined. Hence, content that has yet to become popular is not evicted prematurely. In addition, this concept removes content from the cache that loses its popularity after being highly popular for some time.

A content replacement policy is proposed in this work called IMU (Immaturity used). This policy removes content from the cache that is immature within a limited time frame. Therefore, most of the contents in the cache are recently used and highly popular, leading to a better cache hit ratio and network performance. The key contributions are summarized below:

• A new concept of content maturity/immaturity has been introduced to design and develop an effective content eviction policy. The proposed content eviction policy evicts the content from cache through the immature content selection to improve the cache hit ratio, latency, path stretch, and link load.


The rest of this paper is organized as follows: We discuss the related work in Section 2. The proposed policy is described in Section 3, which highlights its contribution. Section 4 describes the simulation environment and parameters as well as the result analysis and discussion. Finally, the conclusion and future work are in Section 5.

#### **2. Related Work**

A content eviction policy comes into play when the cache space is filled with content. It provides a mechanism to replace existing contents in the cache with requested contents, and it must keep popular contents in the cache with the least processing complexity. In general, an eviction policy should have two properties. First, it should not remove popular content from the cache. Second, it should keep the most frequently used contents in the cache by applying some sort of priority. Several eviction policies were proposed in the past [9,19–25]. Some of the most popular eviction policies include First in First out (FIFO), Random Replacement (RR), Least Recently Used (LRU), Least Frequently Used (LFU), Window-LFU (W-LFU), Least Frequent Recently Used (LFRU), Popularity Prediction Caching (PPC), Network-oriented Information-centric Centrality for Efficiency (NICE), NC-SDN, and Least Fresh First (LFF). A brief description of each cache eviction policy is given below.

As the name suggests, FIFO replaces content in the cache on a first-come, first-served basis. The content item that entered the cache first is evicted first when a replacement is needed [20]. It does not consider the importance or priority of the content being replaced. The RR policy randomly selects existing content from the cache to replace with new content [21]; it has no particular criteria for content selection. LRU is a typical policy that is used extensively in cache eviction [22]. LRU keeps track of the usage of each content in the cache. When a request is received, LRU checks whether the requested content is in the cache. If it is not, and the cache is full, LRU evicts the least recently used content to accommodate the requested content. LRU is simple to implement and has low computational delay. On the other hand, LRU does not consider the content frequency (dynamic changes of popularity over time), which plays a significant role in network performance and the cache hit ratio.
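The LRU behavior described above can be illustrated with a minimal sketch. The class name `LRUCache` and its methods are hypothetical names chosen for this illustration; it is not taken from any CCN implementation and ignores packet handling entirely.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used item (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first, most recent last

    def request(self, name):
        """Return True on a cache hit; on a miss, cache the item and return False."""
        if name in self._items:
            self._items.move_to_end(name)  # mark as most recently used
            return True
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item
        self._items[name] = True
        return False

    def contents(self):
        """Cached names, ordered from least to most recently used."""
        return list(self._items)
```

With a capacity of two, requesting A, B, A, and then C evicts B rather than A, because A was used more recently. Note that the number of earlier requests plays no role in the decision, which is the limitation pointed out above.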

LFU keeps track of the request frequency of each content in the cache [23]. LFU aims to keep the most popular content in the cache statically. It maintains a counter of how many times each content is requested; whenever a request for content is received, the counter is incremented by one. When the cache space is full and content must be replaced, the content with the lowest counter value is selected for eviction. LFU keeps popular content in the cache, but it requires a very high processing time that leads to performance degradation in CCN. Further, when content that has been popular for some time loses its popularity, it stays in the cache, causing severe performance losses. W-LFU is an eviction policy that uses a limited number of access requests over a time window [24]. This technique tries to solve the LFU problem by keeping a history of the requested contents; this record is referred to as a window. The size of this window is directly proportional to the total number of contents and the cache size in the network. This policy demonstrates considerable improvements, but it fails to evict suitable content in the case of bursty requests. Moreover, this policy only observes a small portion of the cache, making it impractical for full cache capacity.
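The overstay problem of plain LFU can be reproduced with a small sketch. The class `LFUCache` below is a hypothetical, deliberately naive counter-based implementation written for this illustration; it is not the implementation evaluated in the cited works.

```python
class LFUCache:
    """Fixed-capacity cache that evicts the item with the smallest request counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # content name -> number of requests seen while cached

    def request(self, name):
        """Return True on a cache hit; on a miss, cache the item and return False."""
        if name in self.counts:
            self.counts[name] += 1
            return True
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # lowest counter
            del self.counts[victim]
        self.counts[name] = 1
        return False
```

If content A is requested five times and then never again, while B and C are requested alternately in a cache of capacity two, A keeps its slot because of its old counter, and B and C keep evicting each other and never produce a hit.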

LFRU is a combination of LRU and LFU [25]. Under the LFRU eviction policy, the cache is divided into an unprivileged and a privileged partition; the privileged partition is also known as the protected partition. Popular content is pushed into the privileged partition. If the privileged partition is fully occupied and no more space is available to store content, LFRU evicts content from the unprivileged partition and transfers content from the privileged to the unprivileged partition. Filtering out the locally popular contents and placing the popular contents in the privileged partition are the key features of the LFRU eviction policy. This policy demonstrates considerable improvements, but it fails to evict suitable content in the case of bursty requests. Moreover, this technique also requires a large processing time to manage the partitions.

PPC is a chunk-level in-network caching eviction policy [26]. It is capable of predicting the popularity of video chunks and stores content based on the popularity that it predicts. Conversely, the contents with the lowest predicted popularity are evicted. This eviction scheme is also termed the Assist-predict model. It is based on the request information of the neighboring chunks, and it predicts future popularity from past experience with the popularity of the content. If the predicted popularity of a newly incoming chunk is lower than that of the cached content, the chunk is not cached. Otherwise, PPC evicts content based on its predicted future popularity. This model-based prediction technique works well but fails to predict properly against frequently changing requests. Moreover, this policy leads to high network load due to control signaling overhead and high computational workload. The NC-SDN eviction model was introduced as a cache eviction algorithm that relies on SDN (software-defined networking) [16]. The NC-SDN model uses three arguments. First, it calculates data popularity; second, it learns the location of the cache management switches; third, it facilitates cooperation among different nodes in the network. When the cache is fully occupied, it checks the popularity of each content and replaces the least popular content with new content. Although the replacement technique is straightforward, the control traffic and the exchange of information between the switches are very high, leading to performance losses.

LFF is a content replacement policy that predicts the time of the next event [27]. Based on the prediction, it controls the residual life of retrieved content. When the cache capacity is full, this policy measures the time for which the content is considered invalid. In addition, to check the validity of each content, this policy checks whether the source has been updated since the content was retrieved. This policy ignores the high replacement rate in the central node and performs excessive computation, making it impractical for a large CCN. NICE has been introduced as a new metric for cache management in ICN [28]. This policy uses a method that computes centrality, and the centrality is used in the replacement phase to manage cache contents. The method is based on the number of caches instead of the number of contents. Content is replaced when the NICE value is high, as the contents move from one cache to another due to the centrality of the content. However, it causes high network load and computational complexity.

Most of the replacement strategies [27–34] for CCN focus on content frequency, popularity, and time freshness. These policies ignore the concept of content immaturity in content eviction. When new content is cached, it is neither popular nor mature, and some time is needed to evaluate whether it has become popular. If that content is removed from the cache too early, the consumer has to retrieve it from the publisher, which affects network performance. Conversely, content may become popular for a certain period, after which its popularity starts decreasing [29]; if that content is not removed from the cache, network performance and the cache hit ratio also degrade. When the cache space is small and the popularity of content changes frequently, it becomes challenging for the content eviction policy to decide which content should be evicted from the cache space. A content eviction policy should give each content an equal opportunity to become mature. Therefore, we introduce the concept of maturity and immaturity of content, and our proposed cache replacement policy uses this concept to accommodate requests for new content. The proposed policy evicts immature content to solve content popularity issues.

#### **3. Proposed Content Replacement Policy**

The content replacement policy is an integral part of CCN cache management. Because cache space is limited, the nodes in a CCN need to free up space over time so that new contents can be cached. The decision about which content to evict is crucial, since it in turn increases or decreases network performance and the cache hit ratio. Numerous content replacement policies decide which content to evict using various criteria, such as time in the cache, frequency, popularity, and node centrality. These policies do not use content immaturity for eviction. The proposed policy selects immature content, that is, content that has stayed in the cache for a long time and has a lower frequency in a particular time window. Thus, the proposed policy avoids unnecessary occupation of the cache space. Because immature content is evicted, network nodes hold more of the requested content within their cache space, and more consumer interests are satisfied within the network.

The proposed technique determines which contents are mature and which are immature. Algorithm 1 sets out the procedure for labeling a content $s_i$ as mature or immature. The proposed policy keeps track of each content's arrival time and frequency at each node. The arrival time and the frequency of content $s_i$ are denoted by $T^{a}_{s_i}$ and $F_{s_i}$, respectively. The proposed technique calculates the content period, $T^{p}_{s_i}$, from the content frequency $F_{s_i}$ and the content arrival time $T^{a}_{s_i}$; this determines the duration of content $s_i$ in the cache space. The policy then calculates the maturity index $M_{s_i}$ by dividing the frequency of the content by its content period, $M_{s_i} = F_{s_i} / T^{p}_{s_i}$. The maturity classifier $M_c$ is the median of the maturity indexes of the cached contents, $M_c \leftarrow \mathrm{Median}(M_{s_n})$. Content $s_i$ whose maturity index $M_{s_i}$ exceeds $M_c$ is classified as mature; otherwise, it is immature. The median is used as the representative central value of the maturity indexes because it is not affected by extremely low or extremely high values, and thus it provides a fair value for the maturity classifier $M_c$.
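The labeling step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `maturity_index` and `classify` are hypothetical, the content period is taken as a given input because its exact computation is not spelled out in the text, and the example values for A, B, and C are invented for the demonstration.

```python
from statistics import median


def maturity_index(frequency, period):
    """Maturity index: request frequency divided by the content period."""
    return frequency / period


def classify(contents):
    """Label each content as mature or immature.

    contents maps a content name to a (frequency, period) pair.
    Returns the maturity classifier (median index) and a name -> label dict.
    """
    indexes = {name: maturity_index(f, p) for name, (f, p) in contents.items()}
    classifier = median(indexes.values())
    labels = {
        name: "mature" if index > classifier else "immature"
        for name, index in indexes.items()
    }
    return classifier, labels


# Hypothetical cache state: name -> (frequency, content period)
classifier, labels = classify({"A": (2, 1), "B": (1, 3), "C": (1, 2)})
```

For this hypothetical state the indexes are 2.0, 0.33, and 0.5, so the classifier is 0.5 and only A is labeled mature; C sits exactly at the median and, because the comparison is strict, is labeled immature.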


Algorithm 2 describes the next part of our proposed policy. When a node *v* receives an interest packet for content $s_i$ and the time window $W_T$ has not expired, the proposed policy looks up the requested content $s_i$ in the local cache. In the case of a cache hit, the policy increments the frequency $F_{s_i}$ of content $s_i$ by one and associates a new arrival time $T^{a}_{s_i}$ with it. Moreover, node *v* discards the interest packet from the Pending Interest Table (PIT) and replies to the requesting consumer with the data packet. Otherwise, a cache miss means that the requested content $s_i$ is being cached for the first time in the CS; thus, its frequency $F_{s_i}$ is one and it is associated with the current timestamp $T^{a}_{s_i}$. When the cache is full, the policy selects the content $s_k$ with the minimum maturity index $M_{s_k}$ and evicts it from the cache space. Then, the proposed technique checks the time window $W_T$; if $W_T$ has expired, the frequency $F_{s_n}$ of every content is set to one, and the previously associated timestamps $T^{a}_{s_n}$ remain the same.


```
Input: Request for a content s_i at node v
Output: Content selection for replacement of newly arrived content
Notation: F = frequency, Ta = arrival time, M = maturity index, W_T = time window
1. if W_T is not expired
      check local cache
      if cache hit
            F(s_i) ← F(s_i) + 1
            Ta(s_i) ← current time
      else if cache_size == full
            s_k ← select the cached content with min M
            evict s_k
            place s_i in cache
            F(s_i) ← 1
            Ta(s_i) ← current time
      else
            place s_i in cache
            F(s_i) ← 1
            Ta(s_i) ← current time
2. else
      for each cached content s_n
            F(s_n) ← 1
      update W_T
      go to step 1
```
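
The listing above can be turned into a small runnable sketch. Several details are assumptions made only for this illustration and are not confirmed by the text: the class name `IMUCache`, the use of integer time steps, and in particular the rule that the content period runs from the arrival time to the end of the current window (`window_end - arrival + 1`), which was chosen because it reproduces the periods quoted for contents A and D in the worked example.

```python
class IMUCache:
    """Illustrative immature-content eviction over fixed time windows."""

    def __init__(self, capacity, window):
        self.capacity = capacity
        self.window = window
        self.window_end = window  # last time step of the current window
        self.freq = {}            # content name -> frequency in this window
        self.arrival = {}         # content name -> time of the latest request

    def period(self, name):
        # Assumption: period spans from arrival to the end of the window.
        return self.window_end - self.arrival[name] + 1

    def maturity_index(self, name):
        return self.freq[name] / self.period(name)

    def request(self, name, now):
        """Process one interest at time `now`; return the evicted name or None."""
        while now > self.window_end:      # window expired
            for cached in self.freq:
                self.freq[cached] = 1     # reset frequencies, keep arrival times
            self.window_end += self.window
        evicted = None
        if name in self.freq:             # cache hit
            self.freq[name] += 1
        else:                             # cache miss
            if len(self.freq) >= self.capacity:
                evicted = min(self.freq, key=self.maturity_index)
                del self.freq[evicted]
                del self.arrival[evicted]
            self.freq[name] = 1
        self.arrival[name] = now
        return evicted
```

With `IMUCache(capacity=2, window=4)`, requesting A at t = 1 gives an index of 1/4 = 0.25, and a hit on A at t = 4 gives 2/1 = 2.0. If B was cached at t = 2 and C arrives at t = 5, the window has expired, the frequencies are reset to one, and B (index 1/7) is evicted in preference to A (index 1/5).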
For simplicity, we assume that all the CCN-based routers (nodes) have the same cache size and cached content, and that interests arrive at discrete instants of time. CS denotes the local cache space, and the window size is denoted by $W_T$. Several events are related to content $s_i$: the received interest packet (RIP), received data packet (RDP), reply data packet (REDP), forward interest packet (FIP), cached content (CC), eviction from the cache (EC), and look-up of content in the local CS (LU). They are denoted by $R^{ip}_{s}$, $R^{dp}_{s}$, $Re^{dp}_{s}$, $F^{ip}_{s}$, $C_{s}$, $E_{s}$, and $L_{s}$, respectively. These notations help in following the whole process of the proposed policy. For example, we initially assume a cache space (CS) of 6, $W_T$ = 4 s, t = [1, 2, 3, 4, . . . , 13], and S ∈ {A, B, C, D, E, F, G, H, I}, as presented in Figure 1.

**Figure 1.** Node cache space, window size, and caching events.

The consumer's requested content (RC) follows the sequence, window size, and cache space illustrated in Figure 1. The colors indicate three caching processes: content that is cached, hit, and evicted from the cache. With the help of these colors, new entries in our tables and variations in the values are easy to follow.

We assume that the cache of a node is initially empty. The detailed caching process from t = 1 to t = 4 is given in Table 1, and Table 2 maps the IMU process to the corresponding values while the time window $W_T$ has not expired. The router receives an interest packet for content A ($R^{ip}_{A}$) but does not find that content in its CS after the look-up ($L_{A}$), so it forwards the same interest packet ($F^{ip}_{A}$) to the next router. The next router holds content A and replies with the data packet ($Re^{dp}_{A}$) to the router from which it received the interest packet. The first router receives the data packet for A ($R^{dp}_{A}$) and caches it ($C_{A}$) in its CS, with arrival time $T^{a}_{s_A} = 1$, frequency $F_{s_A} = 1$, content period $T^{p}_{s_A} = 4$, and maturity index $M_{s_A} = 0.25$. The same process applies to contents B and C. At t = 4, the router again receives an interest packet for content A ($R^{ip}_{A}$); this is a cache hit, which changes the values to $T^{a}_{s_A} = 4$, $F_{s_A} = 2$, $T^{p}_{s_A} = 1$, and $M_{s_A} = 2.00$. The window $W_T$ also expires at t = 4, while the cache space is not yet full.
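As a quick check on the values quoted for content A, and assuming (as in the description of Algorithm 1) that the maturity index is the frequency divided by the content period, the two states work out as follows. The symbols $F_A$, $T^{p}_A$, and $M_A$ are shorthand used here for the frequency, content period, and maturity index of A.

$$M_A\big|_{t=1} = \frac{F_A}{T^{p}_A} = \frac{1}{4} = 0.25, \qquad M_A\big|_{t=4} = \frac{F_A}{T^{p}_A} = \frac{2}{1} = 2.00$$

The hit at t = 4 both raises the frequency and shortens the content period, which is why the index rises from 0.25 to 2.00.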

**Table 1.** Caching process at t = 1 to t = 4.


**Table 2.** IMU process at t = 1 to t = 4.


Table 3 describes the detailed caching process from t = 5 to t = 8, and Table 4 maps the IMU process to the corresponding values. Table 4 shows that a new window $W_T$ starts at t = 5, at which point every frequency $F_{s_i}$ is reset to 1 while every arrival time $T^{a}_{s_i}$ is retained. Content D is cached ($C_{D}$) at t = 5 with frequency $F_{s_D} = 1$, arrival time $T^{a}_{s_D} = 5$, content period $T^{p}_{s_D} = 4$, and maturity index $M_{s_D} = 0.25$, as displayed in Table 4. At this stage, four contents occupy the cache space (CS = 4). After the hit at t = 6, the values for D change to $T^{a}_{s_D} = 6$, $F_{s_D} = 2$, $T^{p}_{s_D} = 3$, and $M_{s_D} = 0.67$. New contents E and F are cached ($C_{E}$ and $C_{F}$) at t = 7 and t = 8, respectively.


**Table 3.** Caching process at t = 5 to t = 8.

**Table 4.** IMU process at t = 5 to t = 8.


Contents E and F are cached ($C_{E}$ and $C_{F}$) at t = 7 and t = 8, respectively, and the new values are presented in Table 4. Table 3 displays the caching process step by step, and the number attached to each event indicates its position in the sequence.

Table 5 reflects the caching events from t = 9 to t = 12. At t = 9, the router receives an interest packet for G ($R^{ip}_{G}$). After the look-up ($L_{G}$), the content is not found in the CS, so the interest packet is forwarded ($F^{ip}_{G}$) to the next router. This time, the CS is full when the router receives the data packet ($R^{dp}_{G}$). The policy therefore finds the lowest maturity index, $M_{s_C} = 0.10$, and removes that content ($E_{C}$) from the CS. It then caches the new content ($C_{G}$) in the CS with arrival time $T^{a}_{s_G} = 9$, frequency $F_{s_G} = 1$, content period $T^{p}_{s_G} = 4$, and maturity index $M_{s_G} = 0.25$. Table 6 shows how IMU works when the memory is full and new content arrives.

We repeat the same process at t = 10 for content H. Hits occur at t = 11 and t = 12 for requested contents E and G, respectively, and update the corresponding values (for E, the arrival time $T^{a}_{s_E}$, frequency $F_{s_E}$, content period $T^{p}_{s_E}$, and maturity index $M_{s_E}$), as illustrated in Table 6. Table 5 illustrates that caching and forwarding operations are minimized when a hit occurs.

**Table 5.** Caching process at t = 9 to t = 12.

| Time t | RC $s_i$ | RIP $R^{ip}_{s}$ | LU $L_{s}$ | FIP $F^{ip}_{s}$ | REDP $Re^{dp}_{s}$ | RDP $R^{dp}_{s}$ | CC $C_{s}$ | EC $E_{s}$ |
|---|---|---|---|---|---|---|---|---|
| 9 | G | $R^{ip}_{G}$ (**1**) | $L_{G}$ (**2**) | $F^{ip}_{G}$ (**3**) | $Re^{dp}_{G}$ (**4**) | $R^{dp}_{G}$ (**5**) | $C_{G}$ (**7**) | $E_{C}$ (**6**) |
| 10 | H | $R^{ip}_{H}$ (**1**) | $L_{H}$ (**2**) | $F^{ip}_{H}$ (**3**) | $Re^{dp}_{H}$ (**4**) | $R^{dp}_{H}$ (**5**) | $C_{H}$ (**7**) | $E_{A}$ (**6**) |
| 11 | E | $R^{ip}_{E}$ (**1**) | $L_{E}$ (**2**) | | $Re^{dp}_{E}$ (**3**) | | | |
| 12 | G | $R^{ip}_{G}$ (**1**) | $L_{G}$ (**2**) | | $Re^{dp}_{G}$ (**3**) | | | |

**Table 6.** IMU process at t = 9 to t = 12.


Table 7 describes the detailed caching process at t = 13. The cache space CS is full and the time window $W_T$ has expired; Table 8 demonstrates that when the new time window starts, every frequency $F_{s_i}$ becomes one (1) while the arrival times $T^{a}_{s_i}$ are retained. The process performed at t = 9 and t = 10 is repeated at t = 13. IMU uses the arrival time $T^{a}_{s_i}$ and the frequency $F_{s_i}$ to calculate the maturity index $M_{s_i}$ of content $s_i$. This value indicates the maturity of the content within the specific time window $W_T$.

The tables demonstrate that a lower content maturity index $M_{s_i}$ represents a longer stay in the cache space with a lower frequency (popularity) over a particular time window $W_T$. Such content is therefore evicted when the cache space is full. It takes some time to establish the maturity or immaturity of newly cached content, so content should not be evicted without checking its maturity index; the tables indicate that the maturity index of newly cached content is greater than that of other content. Content that became popular over time but has since lost its popularity retains a higher frequency than other content and would therefore stay in the CS for a long time and waste cache space. To prevent this, the window $W_T$ equalizes the frequency of all contents after a specific time, and immature content is selected for eviction from the CS according to its maturity index $M_{s_i}$. The proposed policy has significantly improved the cache hit ratio, bandwidth usage, latency, and path stretch.


**Table 7.** Caching process at t = 13.

**Table 8.** IMU process at t = 13.


#### **4. Performance Evaluation**

We performed a simulation on the GEANT network topology using the Icarus simulator [13] to evaluate the performance of our policy. The GEANT topology consists of 40 nodes and 60 edges. The cache capacity of each node in the network is the same and ranges from 4% to 20% of the total content population. To minimize experimental errors, we used warm-up requests to settle the caches before running the actual experiment: 40,000 warm-up requests were followed by 40,000 measured requests, and the measured requests were used for performance evaluation. Content popularity follows Zipf's law with exponent α ∈ {0.6, 0.8, 1.0}; for a fair comparison with state-of-the-art replacement policies, this range from 0.6 to 1.0 is the one presented in [10]. Lower and higher values of α indicate a low and a high correlation between content requests, respectively [30]. The parameters of our simulation setup are listed in Table 9.
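To make the role of the Zipf exponent concrete, the following minimal sketch generates a request trace in which the content of rank r is requested with probability proportional to 1/r^α. It is not the workload generator of the Icarus simulator; the function names `zipf_weights` and `generate_requests`, the content population of 100, and the seed are assumptions made for this illustration.

```python
import random


def zipf_weights(n_contents, alpha):
    """Normalized Zipf probabilities for content ranks 1..n_contents."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_contents + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]


def generate_requests(n_contents, alpha, n_requests, seed=0):
    """Draw n_requests content ranks according to a Zipf popularity law."""
    rng = random.Random(seed)  # fixed seed so the trace is reproducible
    weights = zipf_weights(n_contents, alpha)
    ranks = range(1, n_contents + 1)
    return rng.choices(ranks, weights=weights, k=n_requests)


# Hypothetical population of 100 contents and a short trace of 1000 requests
trace = generate_requests(n_contents=100, alpha=0.8, n_requests=1000)
```

A larger α concentrates the requests on the few highest-ranked contents, so a small cache can serve a larger share of them; with α = 0.6 the requests are spread more evenly and caching is harder.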

**Table 9.** Simulation Parameters.


The obtained results have been compared with state-of-the-art content replacement policies, including LRU, LFU, FIFO, and LFRU. To check the effectiveness of our approach, we evaluated our proposed replacement policy (IMU) under popular cache placement policies, including Leave Copy Everywhere (LCE) [27], Cache Less for More (CL4M) [31], ProbCache [32], Leave Copy Down (LCD) [33], and opt-Cache [10]. These placement policies are listed in order from more redundant to less redundant data in the network [10]. The results demonstrate the effectiveness of our proposed technique with different cache sizes and populations, using various performance metrics such as cache hit ratio, latency, link load, and path stretch. These performance metrics are compared one by one, as explained below.

#### *4.1. Cache Hit Ratio*

The cache hit ratio is an essential metric for evaluating the performance of the CCN cache. It measures how often requests are answered from network cache storage, in which content is cached locally within a specific time frame. Two terms are important in the cache hit ratio. The first is the cache hit (the requested content is found in the cache), and the second is the cache miss (the requested content is not found in the cache). When content is available in the cache, the content request is not forwarded to the publisher. Therefore, a higher hit ratio indicates good cache performance and represents low bandwidth utilization, reduced latency, and low server load. The cache hit ratio is defined as follows:

$$\text{Hit Ratio} = \frac{\text{Cache}_{\text{hits}}}{\text{Cache}_{\text{hits}} + \text{Server}_{\text{hits}}} \tag{1}$$
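Equation (1) translates directly into code. In the sketch below, the function name `hit_ratio` and the choice to return 0.0 when no request has been observed are assumptions made for this illustration.

```python
def hit_ratio(cache_hits, server_hits):
    """Fraction of requests served from caches rather than from the server."""
    total = cache_hits + server_hits
    if total == 0:
        return 0.0  # assumption: no requests observed counts as a ratio of zero
    return cache_hits / total
```

For example, 30 requests answered from caches and 70 answered by the server give a hit ratio of 0.3.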

Our proposed strategy, IMU, is compared with existing well-known replacement strategies in terms of the cache hit ratio. We extracted the results for low to high popularity and for different cache sizes. We first note that the content eviction policies behave consistently across the different caching strategies. Regardless of the content eviction policy, we observe in Figure 2 that opt-Cache performs best and LCE performs worst in terms of the cache hit ratio. Moreover, the choice of eviction policy affects the cache hit ratio.

Figure 2 illustrates that IMU performs better than the existing replacement strategies because it considers not only the arrival time $T^{a}_{s_i}$ but also the frequency $F_{s_i}$ of the requested content within the specific period $W_T$. When $W_T$ expires, every frequency is re-initialized to its starting value ($F_{s_i} = 1$). This helps to evict content whose popularity rises for a while and then quickly declines. When the cache is full, the content with the lowest maturity index $M_{s_i}$ is selected and evicted from the cache. The advantage of evicting immature content is that most of the remaining cached content is mature, which leads to a higher cache hit ratio.

**Figure 2.** *Cont*.

We observed that FIFO underperformed because contents are removed from the cache in the same order in which they were cached, regardless of how many times they were previously accessed. Increasing the cache space, or a workload with many similar content requests, improves FIFO's performance because content stays in the cache for longer, which raises the chance of a cache hit. LFU performs better than LRU when the cache size is large and content is repeatedly requested, because LFU considers the frequency of the requested content while LRU does not; LFU caches popular content and evicts unpopular content. When the cache size is small, however, contents are evicted often and LFU performs poorly. LFRU performs better because it couples LRU and LFU. When the content request rate is lower than the maximum normalized request rate, content is evicted from the unprivileged partition and the new content is cached there. If the content request rate is higher than the maximum normalized request rate, LFRU chooses the least recently used content from the privileged partition and pushes it into the unprivileged partition; the new content is then cached in the privileged partition, and a hit counter is associated with each partition. However, content that loses popularity stays in the unprivileged partition for a long time due to its high frequency. IMU outperformed FIFO, LRU, LFU, and LFRU in terms of the cache hit ratio by 48.33%, 30.07%, 26.34%, and 14.31%, respectively.

The percentage improvement of IMU at different popularity levels and from low to high cache sizes, with different content placement strategies, is presented in Table 10. We observed that IMU performs better at low popularity because, when content has been popular for some time and its popularity then decreases while its frequency remains high, IMU evicts this content from the cache space. When the cache space is small and the popularity of content changes frequently, it becomes very difficult for a content eviction policy to decide which content should be removed from the cache space. The IMU policy evicts immature content and gives each content an equal opportunity to establish its maturity or immaturity level, so content that is gaining popularity is not removed from the cache space.

#### *4.2. Path Stretch (Hop Count)*

Path stretch indicates the distance traveled by the consumer's interest toward the content provider. The path stretch is low when the content requested by the interest packet is found along the routing path. A better content replacement policy therefore identifies content that users are interested in and that is mature, and does not evict it from the cache. If such content is evicted, the publisher's load and bandwidth utilization will be high. A better content replacement strategy should thus minimize the hops between the consumer and the publisher. Path stretch is defined as follows:

$$\text{Path Stretch} = \frac{\sum_{i=1}^{n} \text{Hop-Traveled}}{\sum_{i=1}^{n} \text{THop-Hop}} \tag{2}$$

where $\sum_{i=1}^{n} \text{Hop-Traveled}$ is the number of hops between the consumer and publisher nodes covered by the consumer's interests, $\sum_{i=1}^{n} \text{THop-Hop}$ denotes the total number of hops between the consumer and the provider, and *n* represents the total number of generated interests for specific content.
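Equation (2) can be computed from per-interest hop counts as in the following sketch. The function name `path_stretch`, the list-based inputs, and the example hop counts are assumptions made for this illustration.

```python
def path_stretch(hops_traveled, total_hops):
    """Ratio of hops actually traveled to the full consumer-to-publisher hops.

    hops_traveled[i] is the number of hops interest i traveled before it was
    satisfied; total_hops[i] is the hop distance from that consumer to the
    publisher.
    """
    if len(hops_traveled) != len(total_hops):
        raise ValueError("both lists must describe the same interests")
    denominator = sum(total_hops)
    if denominator == 0:
        return 0.0  # assumption: no distance to cover counts as zero stretch
    return sum(hops_traveled) / denominator


# Hypothetical example: three interests, publisher always four hops away
stretch = path_stretch([1, 2, 4], [4, 4, 4])
```

In this hypothetical example the interests travel 7 of a possible 12 hops, so the path stretch is 7/12; a value of 1 would mean that every interest had to travel all the way to the publisher.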


**Table 10.** IMU cache hit ratio percentage improvement.

Figure 3 illustrates that IMU's performance in terms of path stretch is better than that of the other existing replacement policies. The placement strategy chooses the location of the cache, which may reduce the number of hops. IMU removes content that has been in the cache for a long time but has not matured. Therefore, when immature content is removed from the cache and new content is cached so that the consumer's requested content is available nearby, the request is not forwarded to the publisher. As a result, the content cached on nearby routers is mostly popular or close to being popular.

**Figure 3.** Path stretch with different cache sizes and α, using different placement policies. (**a**) Path stretch with LCE. (**b**) Path stretch with CL4M. (**c**) Path stretch with ProbCache. (**d**) Path stretch with LCD. (**e**) Path stretch with opt-Cache.

FIFO, LRU, and LFU show high path stretch because they select content for eviction based on a single factor. FIFO evicts content in the order in which it was cached. No matter how many times content has been accessed, popular and unpopular content have the same lifetime in a FIFO cache, which increases the path stretch value. Figure 4 indicates that LRU is better than LFU when the cache size is smaller; however, as the cache size increases, the performance of LFU improves because LFU considers the popularity of content, so popular content stays longer in the cache. LRU ignores the popularity of the content and evicts the least recently used content from the cache. As a result, content that is not popular but keeps receiving requests over time stays in the cache space, making the path stretch higher. LFRU divides the cache space into two parts: a privileged partition managed by LRU and an unprivileged partition managed by LFU. At a higher request rate, the least recently used content is evicted from the privileged partition and pushed to the unprivileged partition. When unpopular content is pushed into the unprivileged partition, it stays in the cache space for a long time. Further, none of these techniques considers the maturity of the content. IMU outperformed FIFO, LRU, LFU, and LFRU in terms of path stretch by 11.33%, 6.16%, 5.77%, and 3.82%, respectively.
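To make the single-factor behavior of the baseline policies concrete, the sketch below shows which item FIFO, LRU, and LFU would each pick for eviction after the same request sequence. It is a simplified illustration, not the simulator's implementation: it assumes that all requested items fit in the cache until the moment an eviction is needed, and the function names and the request trace are hypothetical.

```python
from collections import Counter, OrderedDict


def fifo_victim(requests):
    """FIFO evicts the item that was cached first, regardless of later hits."""
    return requests[0]


def lru_victim(requests):
    """LRU evicts the item whose most recent request is the oldest."""
    recency = OrderedDict()
    for name in requests:
        recency.pop(name, None)  # drop the old position, if any
        recency[name] = True     # re-insert as most recently used
    return next(iter(recency))   # first key = least recently used


def lfu_victim(requests):
    """LFU evicts the item with the fewest requests so far."""
    counts = Counter(requests)
    return min(counts, key=lambda name: counts[name])


# Hypothetical trace: "a" is requested most often, "b" was used longest ago,
# and "c" has been requested only once.
trace = ["a", "b", "b", "a", "c", "a"]
print("FIFO evicts:", fifo_victim(trace))
print("LRU evicts:", lru_victim(trace))
print("LFU evicts:", lfu_victim(trace))
```

In this trace FIFO would drop the most frequently requested item simply because it arrived first, which is the kind of behavior the paragraph above associates with higher path stretch.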

Table 11 shows the improvement of IMU in terms of path stretch when different content placement strategies are combined with the content eviction policies. We observed that IMU performs better from low to high cache space and that it is better at high popularity. When the cache space is full, IMU selects immature content and evicts it from the cache. Therefore, popular content and content that may become popular remain in the cache, and the consumer's request for specific content is fulfilled from the nearest node.


**Figure 4.** Latency with different cache sizes and α, using different placement policies. (**a**) Latency with LCE. (**b**) Latency with CL4M. (**c**) Latency with ProbCache. (**d**) Latency with LCD. (**e**) Latency with opt-Cache.



#### *4.3. Latency*

Latency indicates the delay in the delivery of requests and content from consumers. It is a vital metric for evaluating the performance of the CCN cache, and it is defined as follows:

$$\text{Latency} = \text{Required Travel Delay} + \text{Content Travel Delay} \tag{3}$$

IMU provides low latency because it evicts the most suitable content from the cache, based on immaturity. If the cache is full, IMU jointly considers frequency and time to select the content for potential eviction. Hence, the cache holds more popular and mature content, together with content that may become popular. As a result, most consumer requests are satisfied along the routing path, which reduces latency. Figure 5 illustrates that IMU's performance is better than that of the other content replacement policies regarding latency with different cache sizes and popularity.

Figure 5 illustrates that FIFO shows high latency because popular and unpopular content stay in the cache for the same duration, and latency increases when popular content is evicted from the cache. LRU ignores the popularity of the content; when requests for less popular content arrive before eviction, that content remains in the cache, which causes high latency. LFU considers the frequency of the content, so content whose frequency rose over a short period of time but which is no longer popular continues to occupy cache space due to its high frequency. Therefore, the amount of fresh content in the cache is reduced, which increases the latency. LFRU performs better than LRU and LFU because it uses both together. When the request rate is high, the required processing is also high, because the least recently used content is evicted from the privileged partition and pushed to the unprivileged partition together with its access history. In addition, low-frequency content is evicted from the unprivileged partition. However, content with a high access history that is no longer popular spends more time in the cache space, reducing the freshness of the content. IMU outperformed FIFO, LRU, LFU, and LFRU in terms of latency by 12.32%, 9.97%, 9.08%, and 5.91%, respectively.

When α equals 0.8 and the cache size is 0.04, the latency of IMU is 64.44 ms, which is 9.45% lower than LFRU (71.16 ms), 13.90% lower than LFU (74.84 ms), 13.34% lower than LRU (74.35 ms), and 15.60% lower than FIFO (76.34 ms). When α equals 0.8 and the cache size is 0.12, the latency of IMU is 61.03 ms, which is 5.40% lower than LFRU (64.51 ms), 9.65% lower than LFU (67.55 ms), 12.05% lower than LRU (69.39 ms), and 14.50% lower than FIFO (71.38 ms). When α equals 0.8 and the cache size is 0.2, the latency of IMU is 60.05 ms, which is 3.71% lower than LFRU (62.37 ms), 5.58% lower than LFU (63.60 ms), 8.93% lower than LRU (65.94 ms), and 11.59% lower than FIFO (67.92 ms). Therefore, as the cache size increases, the latency is reduced because more content in the network can be cached.
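The relative reductions quoted above can be recomputed from the latency values with a short calculation. The sketch below assumes each percentage is computed as (baseline − IMU) / baseline × 100; the helper name `percent_lower` is hypothetical. Because the quoted latencies are already rounded to two decimals, the recomputed values may differ from the quoted percentages by a few hundredths of a percentage point.

```python
def percent_lower(value, baseline):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


# Latency values (ms) quoted in the text for alpha = 0.8 and cache size 0.04.
latency_ms = {"LFRU": 71.16, "LFU": 74.84, "LRU": 74.35, "FIFO": 76.34}
imu_ms = 64.44

for policy, baseline in latency_ms.items():
    reduction = percent_lower(imu_ms, baseline)
    print(f"IMU vs {policy}: {reduction:.2f}% lower latency")
```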

Table 12 shows the latency improvement achieved by IMU, using different content placement strategies with low to high popularities and cache sizes. We observed that IMU's performance improves as the popularity of content increases. IMU evicts content that has been in the cache for a long time and has received few requests. Furthermore, content that was in high demand for some time but has since declined is also evicted from the cache. Therefore, the cache contains mostly mature content, and when a consumer requests specific content, the request does not reach the publisher because it is satisfied along the routing path.

**Figure 5.** Link load with different cache sizes and α, using different placement policies. (**a**) Link load with LCE. (**b**) Link load with CL4M. (**c**) Link load with ProbCache. (**d**) Link load with LCD. (**e**) Link load with opt-Cache.


**Table 12.** IMU latency percentage improvement.

#### *4.4. Link Load*

Link load indicates the total number of bytes (consumer's request size and content size) that traverse the links to retrieve the content of interest within a specific time period. It measures bandwidth usage in the network and is defined as follows:

$$\text{Link Load} = \frac{(\text{request}_{\text{size}} \times \text{request}_{\text{link\_count}}) + (\text{content}_{\text{size}} \times \text{content}_{\text{link\_count}})}{\text{Duration}} \tag{4}$$

$$\text{Duration} = \text{Content Retrieval Time} - \text{Content Requirement Time} \tag{5}$$

where $\text{request}_{\text{size}}$ denotes the request's size in bytes, $\text{request}_{\text{link\_count}}$ is the number of links the request traverses to reach the source, $\text{content}_{\text{size}}$ is the size of the content to retrieve, and $\text{content}_{\text{link\_count}}$ is the number of links the content traverses to reach the request's originator.
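Equations (4) and (5) can be combined into a small helper, sketched below in Python. The function name, the argument names, and the sample values are hypothetical and chosen only for illustration; sizes are assumed to be in bytes and times in milliseconds, so the result is in bytes/ms.

```python
def link_load(request_size, request_link_count,
              content_size, content_link_count,
              retrieval_time, request_time):
    """Link load per Equations (4) and (5): bytes carried over links by the
    request and the returned content, divided by the retrieval duration."""
    duration = retrieval_time - request_time  # Equation (5)
    if duration <= 0:
        raise ValueError("content must be retrieved after it is requested")
    traversed_bytes = (request_size * request_link_count
                       + content_size * content_link_count)
    return traversed_bytes / duration         # Equation (4)


# Hypothetical example: a 150-byte request and a 1500-byte content object each
# cross 3 links; the content arrives 90 ms after the request was issued.
load = link_load(150, 3, 1500, 3, retrieval_time=100.0, request_time=10.0)
print(f"link load = {load:.2f} bytes/ms")
```

When a request is satisfied by a nearby cache, both link counts shrink, which lowers the numerator and hence the link load.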

Figure 5 illustrates that IMU performs better than other existing strategies in terms of link load. IMU does not evict content that has maintained its request frequency over a certain period of time. Therefore, consumer requests are mostly satisfied along the routing path or close to the consumer, and most of the content in the cache is of interest to users. In addition, IMU removes content whose frequency increases for some time but which does not become popular later. IMU thus maintains both fresh and mature content in the cache.

FIFO does not account for the popularity of content because this technique only considers the order in which the content is cached and evicts the content from the cache in that order. Therefore, popular content is evicted from the cache, and most consumer requests are satisfied by the publisher. Figure 5 demonstrates that LRU is better than LFU when the cache size is smaller. LFU performance improves as the cache size increases, as LFU takes into account the popularity of the content. Therefore, popular content stays in the cache for a long time, and the consumer's request is satisfied from the cache space and not forwarded to the publisher. In addition, content whose frequency has increased stays in the cache space even if it is no longer popular, which is a misuse of cache space and leads to a higher link load. LRU ignores the popularity of the content as well as its maturity. Popular content requested in the past is likely to be used in the future, but LRU may replace it with recently requested content of lower popularity; thus, it does not adapt to changing workloads. When the request rate is high in LFRU, the least recently used content is evicted from the privileged partition and pushed towards the unprivileged partition, with its complete access history. Content that is no longer popular but has a high access history therefore spends more time in the cache space, which causes high link load. IMU outperformed FIFO, LRU, LFU, and LFRU in terms of link load by 18.04%, 13.61%, 12.49%, and 9.53%, respectively.

When α equals 0.8 and the cache size is 0.04, the link load of IMU is 55.48 bytes/ms, which is 16.41% lower than LFRU (66.37 bytes/ms), 19.60% lower than LFU (69.01 bytes/ms), 17.85% lower than LRU (67.53 bytes/ms), and 20.57% lower than FIFO (69.85 bytes/ms). When α equals 0.8 and the cache size is 0.12, the link load of IMU is 48.99 bytes/ms, which is 15.25% lower than LFRU (57.81 bytes/ms), 18.57% lower than LFU (60.16 bytes/ms), 19.21% lower than LRU (60.64 bytes/ms), and 22.19% lower than FIFO (62.96 bytes/ms). When α equals 0.8 and the cache size is 0.2, the link load of IMU is 44.93 bytes/ms, which is 10.40% lower than LFRU (50.15 bytes/ms), 10.68% lower than LFU (50.30 bytes/ms), 11.51% lower than LRU (50.78 bytes/ms), and 15.37% lower than FIFO (53.09 bytes/ms). We observed that the link load decreases as the cache size increases, as the proposed scheme removes immature content from the cache. Therefore, IMU maintains both data freshness and popularity within the network. None of the previous eviction policies has adopted the concept of immaturity for content selection.

Table 13 shows IMU's percentage (%) improvement in link load when different content placement strategies are combined with the content eviction policies. We observed that IMU outperformed the other content eviction policies from low to high popularity and cache space. It performed better in both fully redundant and low-redundancy environments. IMU keeps the most popular and mature content in the cache and makes better use of cache space. Moreover, the consumer is mostly satisfied along the routing path when requesting content. Therefore, the link load value is low because the request is not sent to the publisher.


**Table 13.** IMU link load percentage improvement.

#### **5. Conclusions and Future Work**

In-network caching is one of the essential features of the CCN architecture, allowing content items to be cached in the router nodes for some time to meet subsequent consumer requests. Due to the limited cache capacity of a node, cached content needs to be evicted to accommodate new content. The content replacement policy is responsible for choosing the right content to evict against defined criteria. Existing cache replacement policies use the concept of popularity or time for content eviction. However, when content loses its popularity after becoming very popular in a certain period, it remains in the cache space. Moreover, content may be evicted from the cache space before it becomes popular. Therefore, the proposed policy handles cached items that lose their popularity over a specific time frame and remain in the cache for a long time. We introduced the new concept of content maturity and immaturity for content eviction in CCN. The proposed content replacement policy (IMU) uses the concept of maturity/immaturity of the content. This policy finds the content maturity index by using the content arrival time and its frequency. It also determines the maturity level through a maturity classifier. We performed extensive simulations to evaluate the proposed content replacement policy, using the Icarus simulator under different cache sizes and content popularity. The simulation results indicate that the proposed policy outperformed recent and baseline content replacement policies (FIFO, LRU, LFU, and LFRU). The results demonstrate that the proposed policy is better in terms of the cache hit ratio, latency, path stretch, and link load. In the future, this work can be extended to use the content replacement policy (IMU) with different constraints in different use cases. Another potential direction for future work is an in-depth investigation of content diversity for nodes with very high content popularity.

**Author Contributions:** Conceptualization, S.R., S.A.R. and F.A.G.; methodology, S.R.; software, S.R.; validation, S.R., S.A.R. and F.A.G.; formal analysis, S.R. and F.A.G.; investigation, S.R., S.A.R. and F.A.G.; writing—original draft preparation, S.R.; writing—review and editing, S.R., S.A.R. and F.A.G.; supervision, S.A.R. and F.A.G.; funding acquisition, S.A.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Universiti Teknologi Malaysia (UTM) under UTM Research University Grant Scheme: (VOT Q.J130000.3613.03M41), Universiti Teknologi Malaysia, Johor Bahru, Malaysia.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.



## *Article* **Attitudes toward Applying Facial Recognition Technology for Red-Light Running by E-Bikers: A Case Study in Fuzhou, China**

**Yanqun Yang 1, Danni Yin 1, Said M. Easa 2 and Jiang Liu 3,\***


**Abstract:** The application of facial recognition technology (FRT) can effectively reduce the red-light running behavior of e-bikers. However, the privacy issues involved in FRT have also attracted widespread attention from society. This research aims to explore the public and traffic police's attitudes toward FRT to optimize the use and implementation of FRT. A structured questionnaire survey of 270 people and 94 traffic police in Fuzhou, China, was used. In the research, we use several methods to analyze the investigation data, including Mann–Whitney U test, Kruskal–Wallis test, and multiple correspondence analysis. The survey results indicate that the application of FRT has a significant effect on reducing red-light running behavior. The public's educational level and driving license status are the most influential factors related to their attitudes to FRT (*p* < 0.001). Public members with these attributes show more supportive attitudes to FRT and more concerns about privacy invasion. There are significant differences between the public and traffic police in attitudes toward FRT (*p* < 0.001). Compared with the public, traffic police officers showed more supportive attitudes to FRT. This research contributes to promoting the application of FRT legitimately and alleviating people's concerns about the technology.

**Keywords:** facial recognition technology; e-biker; red-light running behavior; privacy invasion

**Citation:** Yang, Y.; Yin, D.; Easa, S.M.; Liu, J. Attitudes toward Applying Facial Recognition Technology for Red-Light Running by E-Bikers: A Case Study in Fuzhou, China. *Appl. Sci.* **2022**, *12*, 211. https://doi.org/10.3390/app12010211

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti and Dimitrios S. Paraforos

Received: 14 December 2021 Accepted: 22 December 2021 Published: 26 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

The e-bike is a vital means of transportation in many Chinese cities [1], given its convenience and speed. As of 2021, the number of e-bikes in China has reached nearly 300 million. The rapidly increasing number of e-bikes has resulted in more accidents. In 2019, there were approximately 8639 deaths and 44,677 injuries caused by e-bike accidents, which is close to 70% of non-motorized vehicle casualties [2]. In China, e-bikes are categorized as non-motorized vehicles, and riders must drive on non-motorized lanes and comply with the same regulations as bicycles [3]. However, red-light running, illegal use of motor vehicle lanes, and over-speed cycling are the main reasons for accidents involving e-bikes [4]. These violations are often caused by low traffic safety awareness [5], among which running the red light is the leading cause of e-bike accidents [6,7]. Previous research points out that e-bikers run a red light more frequently than traditional bicycle riders [8], and e-bikes are faster than bicycles before collisions, with a higher risk ratio at intersections [9].

To reduce the red-light running behavior of e-bikers, many cities in China, such as Shenzhen, Shanghai, Jinan, and Fuzhou, have launched the Red-light Record System to regulate traffic violations. The system can capture and recognize the red-light running behavior of pedestrians and e-bikers and display images of the violation on an on-site screen. The application of this system has achieved satisfying results. Since the Red-light Record System trial in Jiangbei, Chongqing, the violation rate of pedestrians and e-bikes has dropped from 40% to less than 3%. With facial recognition technology (FRT), traffic police need not face the violators, and the evidence provided by FRT reduces the difficulty of enforcement. However, there is no specific law on applying FRT in traffic enforcement, and China has not yet established a unified standard for the application of FRT in transportation; thus, different cities apply different standards. The application of FRT has aroused public concerns about privacy invasion. Opinions differ regarding the extent to which violators' information should be exposed, whether personal privacy is being disclosed, and whether such exposure constitutes a punishment beyond the law.

Thus, to understand the application effects of FRT, we investigate the attitudes of two significant stakeholders (the public and traffic police) on applying FRT in Fuzhou, China. The study aims to determine: (1) The public's opinion on the privacy violation of exposing personal information of red-light running behavior, (2) how personal characteristics of the public affect their attitudes toward FRT, and (3) the attitudes of traffic police toward FRT. Based on the above analysis, we propose several practical suggestions to improve the efficiency and rationality of FRT.

The methodology of the study is shown in Figure 1. The methodology consists of a literature review, experimental design, questionnaire design, data collection, and statistical analysis. In the analysis, all statistical calculations and plots were performed using SPSS 22.0.

**Figure 1.** Study methodology.

#### **2. Literature Review**

#### *2.1. Red-Light Running Behavior of E-Bikers*

Many studies have been conducted to determine the factors that affect the red-light running behavior of e-bikers, mainly from external and internal perspectives. In terms of external factors, the higher acceleration rate and weight of an e-bike enable riders to reach a higher speed than on a bicycle. Thus, e-bikers are more likely to run a red light [10,11]. Traffic conditions and situational factors have been verified to impact red-light running behavior [12]. E-bikers are also prone to accidents when the speed of an e-bike is underestimated by other road users [13]. As to internal factors, the attitudes of e-bikers are closely related to red-light running behavior. Red-light running intention and willingness could be predicted by the attitudes and past behaviors of e-bikers [14]. Self-discipline to follow traffic regulations, herd tendency, and past behaviors of e-bikers are crucial factors that affect the likelihood of accidents [15]. Higher safety awareness and more concern about their traffic risk could reduce dangerous riding behaviors [16]. The acceptable waiting time for e-bikers at signalized intersections is shorter than that of bicycle riders, which may also be one reason for the higher probability of red-light running behavior [17]. Some scholars have found that gender and age may affect red-light running behavior. In terms of gender, males are more likely to run a red light than females [18]. Although the effect of age on red-light running behavior is still not clear, young and middle-aged people are more likely to run a red light [19–21]. Whether or not the rider holds a driving license could also affect the red-light-running violation rate [22]. To sum up, the complexity of these influential factors poses a significant challenge to reducing red-light running behavior.

#### *2.2. Preventive Measures*

To prevent red-light running behavior, different intervention measures have been taken, such as educational programs, enforcement activities, and social marketing [23]. These interventions could use the positive influence of e-biker groups to promote law-obeying behavior [21]. Educating and training e-bikers is fundamental to reducing red-light running behavior [24]. E-bikers were recommended to participate in training programs that provide relevant skills [25]. Education and training programs tailored to e-bikers with different characteristics reduce their unsafe behavior [26]. In addition, comprehensive e-bike management requires enforcement [27]. Some scholars have recommended launching an e-bike license system with point-based penalties by factoring in China's unique regional and political characteristics [28]. Police enforcement of traffic regulations could effectively curb the red-light running behavior of e-bikers [24].

Technical equipment is widely used as an essential supplementary measure to monitor red-light running behavior. The equipment includes red-light cameras for motor vehicle drivers [29] and red-light running detectors implemented as a system that consists of a camera and a computer embedded in a motor vehicle [30]. Recognition systems using different technologies are used to monitor the red-light running behavior of cyclists and pedestrians. These technologies include video sequences, adaptive mapping techniques, and trained classifiers. Most of these technologies are related to image recognition. Video sequences are applied to detect red-light running behavior [31]. A real-time pedestrian recognition system that ensures high accuracy using a deep learning classifier and zebra-crossing recognition techniques has been proposed, based on an adaptive mapping technique and a dual camera mechanism [32]. Finally, a system for recognizing people at a pedestrian crossing has been developed, which includes a trained classifier and two sets of images taken from an open database containing images of city streets from outdoor cameras [33]. These technologies can be used for image recognition of pedestrians and cyclists.

Among this technical equipment, FRT could be the most advanced means of monitoring the red-light running behavior of pedestrians and non-motorized vehicles. Systems that use FRT include the red-light automatic early warning system and the red-light snapping system. The former is used for cyclists and pedestrians and provides automatic crossing reminders, red-light recording, exposure, and information inquiry [34,35]. To address the issue that the targeted face is subject to varying conditions, particularly of illumination, a novel pedestrian detection algorithm with multi-source face images has been proposed [36]. With the red-light snapping system, the tracking success rate is increased to 85%, and up to 25 people can be tracked simultaneously [37].

However, due to the sensitivity of biometric data and the heterogeneity and openness of the network environment, the privacy leakage of biometric data is difficult to avoid [38]. Therefore, how to improve face recognition accuracy while ensuring high security of private data has provoked fierce public discussion.

#### *2.3. Regulations and Privacy Concerns about FRT*

Although advanced technologies could improve traffic safety, they also have drawbacks. The main problem is the risk of privacy invasion since these technologies can collect, store, and share personal information [39]. For example, privacy and safety are the main concerns expressed about traffic enforcement drones, and citizens in Los Angeles once opposed this technology because they felt the department would use drones to track and observe them [40]. Privacy concerns are also raised by in-vehicle data recorders. Such concerns tend to hinder the acceptance of innovations [41].

There are limited studies on feedback about the application of FRT in recognizing red-light running behavior. However, numerous studies have conducted public surveys about FRT application, indicating the public's concerns about privacy invasion. In many cases, people's facial information is collected involuntarily [42], which may lead to undesirable intrusions of privacy [43]. Privacy concerns are affected by privacy control, which means giving users the autonomy to control their private information [44]. The legitimacy of FRT helps to allay, deaden, or possibly circumvent privacy concerns. In other words, FRT with less legitimacy could heighten people's concerns about privacy [45]. FRT also raises concerns about control over personal information, where it is used, and the potential for misrecognition [46]. These concerns about the privacy invasion that FRT may cause have attracted worldwide attention.

The legal regulation of FRT applications has become the focus of legislative protection in various countries. Many states in the US have issued bills about FRT. Government agencies in the US are cautious about using FRT and focus on prohibitive regulations. For example, the Body Camera Accountability Act states that the operation of FRT with a camera is an invasion of personal privacy [47]. Non-governmental organizations in the US are more open to using FRT, and they allow the restricted use of FRT to a certain extent. For example, Illinois proposed the Biometric Information Privacy Act (BIPA) to regulate the collection, storage, use, retention, and destruction of biometric information, including facial feature information, through individual empowerment and enhanced obligations [48]. The EU also strictly restricts the application of FRT. The General Data Protection Regulation (GDPR) incorporates different types and properties of personal information and protects personal information through civil, administrative, and criminal measures. In exceptional circumstances, the processing must meet the requirements of legality, legitimacy, consent, and voluntariness [49].

In China, provisions protecting facial feature information in the context of FRT are scattered across several laws and regulations. The Civil Code, which became effective on 1 January 2021, stipulates that a natural person's personal information is protected; such personal information mainly includes name, birthday, and ID number. However, the Civil Code does not expressly stipulate the contents and methods of protection. China has announced more detailed rules on the facial feature information involved in applying FRT in administrative regulations, rules, and other normative documents. The Information Security Technology-Personal Information Security Specification, revised in March 2020, explicitly states that personal biometric information is sensitive personal information and that sensitive personal information needs special protection. For example, before collecting personal biometric information, the subject should be informed of the purpose, method, and scope of the collection, the storage time, and other rules, and the subject's consent should be obtained. Personal biometric information should be stored separately from personally identifiable information. In principle, original personal biometric information should not be stored. However, these regulations are only recommended and not mandatory [50]. The regulations about the application of FRT in China need to be further improved.

#### **3. Questionnaire Design and Data Collection**

#### *3.1. Research Background*

The research was conducted in Fuzhou, China. The application of FRT in Fuzhou dates back to 2016, when the Fuzhou Traffic Police Department launched the first Red-light Record System at the intersection of Yangqiao Road and Daming Road. The system can automatically capture images of violators when they run a red light and recognize their personal information. Figure 2 shows the screen component of this system, and Figure 3 shows the red-light running behavior of e-bikers at the intersection. The violators then receive a message from the system on their mobile phones, including the time and place of the violation. When the violators pay the fine, their images disappear from the screen.

**Figure 2.** Part of the red-light monitoring system.

**Figure 3.** Red-light running behavior of e-bikers.

At the end of 2019, about 2.09 million e-bikes were registered in the five districts of Fuzhou, China, which has made regulation increasingly difficult. However, the application of FRT faces conflicting opinions. On the one hand, the effectiveness of FRT is recognized by part of the public, who believe that FRT is a stronger deterrent than a fine alone, and by the traffic police, for whom the technology substantially reduces the need for on-site supervision and provides reasonable evidence for punishment. On the other hand, some members of the public hesitate to accept FRT, as they are unsure whether their privacy is infringed and whether the collected information can be effectively protected.

#### *3.2. Measures*

#### 3.2.1. Public Investigation

Referring to the Motor Vehicle Risky Driver Behavior Scale [51], and according to the characteristics and behavior of e-bikers, we designed a public investigation questionnaire. The questionnaire consists of two parts. The first part consists of basic personal information, including gender, age, education level, and driving license (Table 1). The second part covers the public's attitudes toward FRT, which includes three variables: attitudes toward red-light running behavior, the application effect of FRT, and whether FRT violates privacy (Table 2). The first part of the questionnaire uses a single-choice form, and the second part uses a Likert five-level scale (from "strongly disagree = 1" to "strongly agree = 5").


**Table 1.** Items of basic personal information.

**Table 2.** Survey items of public and traffic police attitudes toward FRT.



#### 3.2.2. Traffic Police Investigation

At the same time, we designed a questionnaire for traffic police from law enforcement officials' perspectives to understand their attitude towards FRT (Table 2). All questions use a Likert five-level scale (from "strongly disagree = 1" to "strongly agree = 5").

#### *3.3. Participants*

In July 2019, the questionnaires were distributed to the public and traffic police in Fuzhou, China. All ethical norms and standards were strictly followed during the survey. The survey randomly selected 270 people from the public. The requirements were: (1) They are between 18 and 70 years old and use e-bikes more than three times a week; (2) have lived in Fuzhou, China for more than 6 months, and (3) are able to understand and answer the questionnaire. Among the 270 public questionnaires, we excluded 12 partially unanswered questionnaires, and the remaining 258 questionnaires were valid. In addition, 94 traffic police officers in Fuzhou, China, were randomly selected. Four partially unanswered questionnaires were excluded, and the remaining 90 questionnaires were valid. Therefore, in the subsequent data analysis, only the valid questionnaires of public and traffic police are discussed.

#### *3.4. Questionnaire Data Reliability*

The reliability and validity tests of the datasets indicate that the Cronbach's α coefficient of each of the two questionnaires is greater than 0.7 [52], indicating good reliability. Furthermore, the KMO values are greater than 0.6 and Bartlett's tests of sphericity are significant, so both requirements are met. Thus, the two questionnaire datasets used in this research are credible.
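
As an illustration of the reliability check, the following minimal sketch computes Cronbach's α from a respondents-by-items table of Likert scores. The function name `cronbach_alpha` and the sample data are hypothetical and are not taken from the survey; the formula is the standard one based on item variances and the variance of the total score.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Transpose so that each tuple holds all answers to one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in items)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to three Likert items (1 to 5).
demo = [
    [4, 4, 5],
    [3, 3, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(demo)
print(round(alpha, 3))
```

A value above 0.7, the threshold cited in the text, would be read as acceptable internal consistency.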

#### *3.5. Demographic Data*

Table 3 shows the demographic information of the 258 interviewees who returned valid questionnaires. The proportions of males and females surveyed are almost equal. Most of the people surveyed are in two age groups: 18–36 and 37–54. In terms of education level, most of them hold college or undergraduate degrees (43.4%) or high school degrees (33.7%), while other education levels account for a relatively low proportion. In addition, most of the interviewees have driving licenses (62.0%).


**Table 3.** Demographic information of the public.

#### **4. Analysis and Results**

#### *4.1. Statistical Analysis of Public Questionnaire Data*

Table 4 shows the average and standard deviation of each item in the public questionnaire. The scores of the three variables are all between 3.2 and 3.3. Variable A has a score of 3.238, indicating that the public generally regards running a red light as dangerous behavior (A1 ~ A6), but two items of variable A (A5 and A6) have lower scores. The scores of variable A reflect that the public's awareness of observing traffic rules is relatively poor. The score of variable B is 3.278, indicating that the public is more supportive of the effect of using FRT (B1 ~ B6). Among the items of variable B, only the score of B2 is lower than the average score, which suggests that FRT is less effective in improving the safety awareness of the public. Variable C has the highest score of 3.297, indicating that the public generally believes that FRT does not violate their privacy (C1 ~ C5). However, C5 has the lowest score of 3.019, suggesting that the public is less concerned about the right to know the use of FRT. Overall, the public generally supports monitoring red-light running behavior by using FRT without worrying too much about privacy invasion.

**Table 4.** The average and standard deviation of each item in the public questionnaire, *n* = 258.



The research uses Mann–Whitney U and Kruskal–Wallis tests. The Mann–Whitney U test is used to explore whether gender and driving license status result in differences in the three variables: (a) attitudes toward red-light running behavior of e-bikers, (b) the application effect of FRT, and (c) whether FRT violates privacy. There is no significant difference between genders in any of the three variables. However, driving license status is associated with a significant difference in the public's attitudes toward red-light running behavior (U = 998.000, *p* < 0.001) and in the perceived application effect of FRT (U = 2865.5, *p* < 0.001), whereas it is not associated with a significant difference in attitudes toward privacy invasion.
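
The two-group comparison can be sketched as follows. This is a minimal illustration of how the U statistic is obtained from pooled mid-ranks, not the authors' analysis script: the function names and the sample scores are hypothetical, and no p-value is computed. Statistical packages differ in which of the two U values they report, so both are returned.

```python
def midranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    return u_x, u_y


# Hypothetical variable scores for respondents without and with a driving license.
without_license = [2, 3, 3, 2, 1]
with_license = [4, 4, 5, 3, 5]
u_without, u_with = mann_whitney_u(without_license, with_license)
print(u_without, u_with)
```

The two values always sum to the product of the group sizes, so a U close to zero for one group means that almost all of its scores lie below those of the other group.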

The Kruskal–Wallis test is used to explore whether the public's different ages and education levels resulted in differences in the three variables of public attitudes. There is no significant difference in terms of age and the three variables. There is a significant difference in terms of education level and the three variables, i.e., attitudes toward red-light running behavior of e-bikers (χ2(3) = 114.730, *p* < 0.001), application effect of FRT (χ2(3) = 103.534, *p* < 0.001), whether FRT violates privacy (χ2(3) = 90.292, *p* < 0.001).
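
For comparisons across more than two groups, the H statistic can be sketched as below. This is an illustrative implementation with the usual tie correction; the function names and data are hypothetical. With four education levels the statistic has 4 − 1 = 3 degrees of freedom, which matches the χ2(3) notation in the text.

```python
from collections import Counter


def midranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H; compare with chi-square, df = len(groups) - 1."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: each set of t tied values reduces the rank variance.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical scores for three groups.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_stat, 3))
```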

Then, we use Multiple Correspondence Analysis (MCA) to study the correspondence between the public's characteristics and the three variables. In order to meet MCA's data requirements, the scope of variables (A, B, and C), values, and the classification values are shown in Table 5. Figure 4 is the joint plot of the category points. Table 6 and Figure 5 present the discrimination measures of the variables. The MCA transforms all variables of the original data through the optimal scale transformation to obtain two dimensions (Dimension 1 and Dimension 2).


**Table 5.** The scores of variables (A, B, and C) and the values after classification.

**Figure 4.** Joint plot of the category points. Correspondence between the variables: (**a**) The correspondence between the public's characteristics and variable A; (**b**) the correspondence between the public's characteristics and variable B; and (**c**) the correspondence between the public's characteristics and variable C. The education level in the figures is abbreviated (JHS and below stands for junior high school and below; SHS stands for senior high school; college and HG stands for college and undergraduate; PG and above stands for postgraduate and above).

Figures 4 and 5 and Table 6 show the correspondence between the public's characteristics and the three variables. In Figure 5 and Table 6, the public's education level (x = 0.509, y = 0.186) and whether a respondent holds a driving license (x = 0.630, y = 0.030) are related to the value of variable A, and the two characteristics also have greater explanatory power for variable B (x = 0.533, y = 0.289; x = 0.541, y = 0.009). However, variable C is related only to the public's education level (x = 0.787, y = 0.269).


**Table 6.** Discrimination measures of the variables.

**Figure 5.** Discrimination measures of the variables: (**a**) Correspondence between the public's characteristics and variable A. (**b**) Correspondence between the public's characteristics and variable B. (**c**) Correspondence between the public's characteristics and variable C.

Figure 4 presents the points of the various categories. Categories of different variables that lie in the same direction and in the same area of the graph may be related. In Figure 4a,b, the category points of education level and of driving license status are close to specific scores of variables A and B. Specifically, the points of higher education levels are closer to the higher scores of variables A and B. For example, "PG and above" is close to "A: (4, 5]" and "B: (4, 5]", and "College and HG" is close to "A: (3, 4]" and "B: (3, 4]".

In Figure 4c, "C: (1, 2]" and "JHS and below" are far apart. Otherwise, the relationship between education level and variable C is similar to that in Figure 4a,b. In general, there are positive correlations between education level and the three variables. In addition, the points for driving license status are close to the points of variables A and B. Specifically, "Do not have" is close to "A: (2, 3]" and "B: (2, 3]", and "Have" is close to "A: (3, 4]" and "B: (3, 4]". However, driving license status does not have an obvious relationship with the points of variable C. The results indicate that people with a driving license score higher on variables A and B than those without one. In short, holding a driving license is positively associated with variables A and B.

Table 7 and Figure 6 show the correspondence among the three variables. In Figure 6b, the position of variable A (x = 0.752, y = 0.464) is close to that of variable B (x = 0.724, y = 0.419), and variable C (x = 0.305, y = 0.300) is farther from both. In Figure 6a, the point positions of the three variables with the same scores are also similar, except for the score (1, 2]. The results illustrate that the scores of the three variables correspond to one another when the scores are higher. In other words, the scores of the three variables reach a higher level at the same time.

**Table 7.** Discrimination measures of the three variables.


**Figure 6.** MCA results of the three variables: (**a**) Joint plot of the category points. (**b**) Discrimination measures of the three variables.

#### *4.2. Statistical Analysis of Traffic Police Questionnaire Data*

Table 8 shows the average and standard deviation of each item in the traffic police questionnaire. The results indicate that the average of most items is between 3.6 and 3.7, and the average of all items is 3.651. Among the traffic police questionnaire items, P7 has the lowest score, which is similar to the public's results. Thus, from the perspective of the traffic police, FRT cannot entirely improve the safety awareness of e-bikers. Nevertheless, overall, the traffic police strongly support the use of FRT.


**Table 8.** Average and standard deviation of each item in the traffic police questionnaire, *n* = 90.

<sup>1</sup> The average of all items is 3.651.

#### *4.3. Comparative Analysis of Questionnaire Datasets*

Extracting the same items in the two questionnaires, the Mann–Whitney U test was used to explore the attitude differences between the public and traffic police toward FRT. The test results in Table 9 illustrate the two groups differ in attitudes toward FRT (U = 6958.500, *p* < 0.001).

**Table 9.** Differences in attitudes toward FRT between the public and traffic police.


Table 10 shows the Mann–Whitney U test results of the same items by public and traffic police groups. There is a significant difference in the same items of the two groups, including B2/P7, B3/P4, B5/P6, C1/P8, and C3/P9, but there is no significant difference in B1/P2. The average of the same items also has a significant difference between the public and traffic police. Based on the results, it is concluded that there is a significant difference between public and traffic police attitudes toward FRT in general.


**Table 10.** Mann–Whitney U test results of the same items by public and traffic police groups.

#### **5. Discussion**

#### *5.1. Public Attitude toward FRT*

In general, the public supports using FRT to manage the red-light running behavior of e-bikers. To understand which public characteristics are related to the attitudes toward FRT, we analyzed the correlation between the four individual characteristics and the three variables using the method of the Mann–Whitney U test, Kruskal–Wallis test, and MCA.

The results of the Kruskal–Wallis test and MCA indicate that members of the public with higher education levels are more resistant to the red-light running behavior of e-bikers (χ2(3) = 114.730, *p* < 0.001; Figure 4a). This finding is consistent with Wang et al. [53]. Under-educated e-bikers lack safety knowledge [53], and people with higher education backgrounds comprehend more traffic safety knowledge [39,45]. Members of the public with higher education levels are more supportive of the application effect of FRT (χ2(3) = 103.534, *p* < 0.001; Figure 4b), and they also show more trust in privacy protection (χ2(3) = 90.292, *p* < 0.001; Figure 4c). Because they have more safety knowledge, people with higher education pay more attention to red-light running behavior and strongly support FRT, perhaps also due to their greater acceptance of new technologies. Moreover, their acceptance of FRT affects their trust in privacy protection.

Holding a driving license affects the public's attitudes toward red-light running behavior and FRT (U = 998.000, *p* < 0.001; Figure 4a): people with driving licenses appear to be more resistant to red-light running behavior. This is because e-bikers with driving licenses have lower perceived behavioral control and higher moral norms than those without driving licenses [23]. Moreover, e-bikers with driving licenses are also more supportive of the use of FRT (U = 2865.5, *p* < 0.001; Figure 4b). The strong correlation between attitudes toward red-light running behavior and the application effect of FRT may explain why people with driving licenses are more supportive of FRT.

#### *5.2. Comparison of Public and Traffic Police Attitudes on FRT*

The traffic police generally support the application of FRT (the average of all items is 3.651). Comparing the results of the same questions in public and the traffic police questionnaires shows that there are significant differences between the two groups in many items (U = 6958.500, *p* < 0.001), including "raise safety awareness, support for FRT applications, privacy issues of FRT, and information protection". The support from the traffic police to FRT is significantly higher than that of the public.

For the traffic police, reducing the red-light running behavior of e-bikers has long been difficult [54], and the appearance of FRT has largely solved the problem [55]. Thus, reducing management difficulty may be the main reason why the traffic police support FRT. For example, Shenzhen started to use FRT in 2017, which reduced the number of red-light running cases at intersections from about 150 per hour to about 8 per hour within half a year [56]. In addition, FRT enables real-time monitoring, which is difficult for traffic police to carry out themselves [57]. The application of FRT can also protect traffic police from personal injury caused by violators [58].

#### *5.3. Measures to Protect Public's Privacy*

The public generally believes that FRT does not violate their privacy (the average score of variable C is 3.297), indicating that FRT is trustworthy for the public. However, the attitudes toward privacy violations differ in the education level of e-bikers (χ2(3) = 90.292, *p* < 0.001). Overall, highly-educated e-bikers have more confidence in privacy protection involved in FRT. Thus, under the circumstance that information can be completely protected, the public's concerns about privacy violation can be alleviated.

The privacy about personal data (e.g., facial images) consists of the right to control the access to and use of these data [59]. Regulation of the use of FRT is vital for privacy protection. In China, FRT used at signalized intersections ensures traffic safety and protects public interest. However, laws and regulations to standardize FRT use in China are still not complete. The official privacy-preserving policy could mitigate some of the privacy concerns which seem to be most troubling for the public, such as blurring people's faces, allowing officers to access only violation footage, and so on [40]. Besides, the public should be well informed about the facial recognition systems and should have consented to use these systems for the specific and justified purposes in question [59].

Updated technologies are conducive to privacy protection. For instance, FRT based on temporal features could preserve privacy [60]. A face recognition protocol named PEEP protects privacy by utilizing differential privacy [61]. The principal components of adversarial segmented image blocks can protect people's privacy and prevent the distinct face-related features of images from being easily extracted [62].

#### **6. Conclusions**

This research developed two questionnaires for the public and traffic police and analyzed their attitudes toward applying FRT and its effects and privacy issues. The results indicate that:


This study investigates the application of FRT from the perspectives of the public and traffic police. We analyzed the rationality of the use of FRT in combination with the public's attitudes toward personal privacy invasion. Our research contributes to both society and science. Based on the research results, we recommend that government departments carry out the following tasks for the public, especially people with low education levels and without driving licenses: (1) conduct safety education and training regularly, and (2) make clear that the final purpose of FRT application is to improve the public's safety awareness, and publish information without violating privacy.

**Author Contributions:** Conceptualization, Y.Y. and J.L.; methodology, Y.Y. and J.L.; validation, Y.Y., J.L., and D.Y.; formal analysis, D.Y.; investigation, D.Y.; writing—original draft preparation, D.Y.; writing—review and editing, Y.Y., J.L., and D.Y.; visualization, J.L. and S.M.E.; supervision, J.L. and S.M.E.; All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The study did not report any data.

**Acknowledgments:** The authors are grateful to four anonymous reviewers for their thorough and most helpful comments. They also thank the Fujian Traffic Police Corps for funding this research and Shihua Ni from the Fuzhou Traffic Police Detachment for his valuable support of this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Identifying the Importance of Criteria for Passenger Choice of Sustainable Travel by Train Using ARTIW and IHAMCI Methods**

**Lijana Maskeliūnaitė \* and Henrikas Sivilevičius**

Department of Mobile Machinery and Railway Transport, Faculty of Transport Engineering, Vilnius Gediminas Technical University, LT-10105 Vilnius, Lithuania; henrikas.sivilevicius@vilniustech.lt

**\*** Correspondence: lijana.maskeliunaite@vilniustech.lt

**Abstract:** Nowadays, travelers can use different modes of transport, and they usually choose the most suitable and reliable mode available. The choice of one mode of transport as an alternative to another is subjective. It is usually built upon passenger attitude toward the advantages and disadvantages of using a particular mode. This article proposes analytical methods for and research results on passenger choices for sustainable train journeys as an alternative to traveling by bus. The rank averages of all criteria and their normalized subjective weights were calculated with reference to new linear (ARTIW-L) and nonlinear (ARTIW-N) methods of average rank transformation into weight. A correlation between sub-criteria rank averages and normalized weights is presented, based on the minimum number of passengers required to be interviewed to provide reliable results. The average ranks assigned by passengers to the evaluation sub-criteria and their global weights were used for determining and describing the most and least important key criteria by applying the inverse hierarchy for assessment of main criteria importance (IHAMCI) method. The analysis shows that the most important key criterion belonged to the sub-criteria characterizing economy, while the less important key criteria included ride comfort. The least important key criteria described safety and environmental protection, whose normalized subjective overall weights were the lowest. Rail transport authorities and companies involved in transporting passengers can make this mode of transport more attractive to people by giving priority to improving the services they provide to passengers.

**Keywords:** railway transport; passengers; sustainable travel; ARTIW method; IHAMCI method; MCDM

#### **1. Introduction**

The sustainable development of the European Union (EU) is strongly influenced by transport, which is polluting from an environmental point of view. Transport infrastructure accessibility criteria and accessibility distance have a positive effect on sustainable development [1]. In order to mitigate transport sector greenhouse gas emissions, it is necessary to assess the efficiency of transport policy [2]. One of the most effective ways of improving the sustainability of the transport sector is the choice to use a less-polluting mode of transport. Passenger satisfaction is an important factor in choosing a mode of transport to travel in municipalities and especially in big cities [3].

Mobility is crucial for the development of a country's internal market and for maintaining the desired life quality of citizens, as it is important for people to exercise their freedom to travel. This means more frequent travel by bus, rail, and air [4]. Viable options can only be available through better integration of modal networks, which means that airports, railway stations, and metro and bus stations should be increasingly interconnected and transformed into multimodal passenger transport platforms [4].

There will be more possibilities for passengers to choose a particular mode of transport when various systems of transportation become more closely integrated. Experience has shown that passengers usually choose the mode of transport that best suits their personal needs, habits and understanding of the expected quality of traveling. The mode of transport that is the most acceptable to a passenger can be determined by developing a system of criteria and calculating their weights using multi-criteria decision-making (MCDM) methods [5–9]. Passenger loyalty development increases the profits of airline, rail, and road transport companies. This is undoubtedly important for the success of the enterprise [10].

**Citation:** Maskeliūnaitė, L.; Sivilevičius, H. Identifying the Importance of Criteria for Passenger Choice of Sustainable Travel by Train Using ARTIW and IHAMCI Methods. *Appl. Sci.* **2021**, *11*, 11503. https://doi.org/10.3390/app112311503

Academic Editor: Anselme Muzirafuti

Received: 27 October 2021; Accepted: 2 December 2021; Published: 4 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Traveling allows us to learn more about the world. However, it brings about some inconveniences and discomfort. People who travel long distances by bus spend much time on board and become tired. For many people, such trips cause stress, unless they are inveterate travelers. Therefore, when planning a trip, people should learn more about particular transport modes, trip duration, possibilities for rest, and the cost of the trip. Various transport modes (bus, train or airplane) have advantages and disadvantages [11]. It is important to inform urban public transport passengers of the estimated time of arrival, as that influences the outlook for traveling by bus [12]. Train traffic has increased over the last decade and is likely to continue to do so, as more passengers and freight are transported by train rather than by car. This will reduce energy consumption and pollutant emissions [13].

Travel by train is still popular worldwide. The railways, rolling stock, and services provided to passengers as well as safety are being constantly improved, while the environmental pollution is being reduced. This helps rail transport to compete with other modes of transport. Matuška [14] examined the accessibility of rail transport by providing ways to assess the accessibility of railway infrastructure and trains. He applied a two-step model to assess the availability of departure halls. Train services are barrier-free for passengers, especially those traveling long distances, but they are still less accessible to disabled passengers in the suburbs and outer regions.

#### **2. Literature Review**

A U.S. interregional travel study focused on regional long-distance (100 to 500 miles) passenger transportation. Consideration was given to travel by car, plane, motor bus, and train, with attention paid to both high-speed and conventional passenger trains [15]. In the Slovak Republic, the level of competition in the rail transport market differs between the freight and passenger sectors. The total number of passenger-kilometers has increased by 12 percent due to an increase in free tickets for students and retired people [16].

Studies aimed at evaluating technical parameters of roads and rolling stocks and improving their interaction and safety of travel as well as risk factors and accidents have been performed. Rail transport must ensure a high level of reliability and safety of travel. Since the wheel is one of the main subsystems of the railway vehicle, it can make a significant contribution to the reliability and safety of the train. One of the main measures to meet the requirements is to implement proper maintenance procedures. The quality of railway tracks has an impact on train safety and passenger comfort. In practice, the quality of railway tracks is measured by a track recording vehicle, which measures seven key geometrical parameters of tracks. Traditionally, track gauge, vertical and lateral alignments, and cross-level (angular variation in the track section, i.e., cant or superelevation) are measured [17]. Xin et al. [18] presented a model for predicting railway track damage. Railway violations have been proved to be the most important determinants of train safety and passenger travel comfort [19]. Unauthorized changes in railway track geometry can have a negative impact on train traffic safety [20].

Compared to other modes of transport, rail travel is safe. Afazov et al. [21] provided a more detailed understanding of modern modeling techniques that can be used in the design of railway vehicles. Lin et al. [22] presented a probabilistic risk assessment methodology for analyzing adjacent-track accident risk. Research in [23] was conducted to establish the safety of glazing systems for passenger railway equipment. Kovandová and Válka [24] investigated traffic safety as a major social problem related to accidents on railways and at track crossings.

Some studies have been performed regarding the possibilities of increasing the power and frequency of rail vehicles. Xu et al. [25] presented strategies to increase train frequency and rail capacity that would be helpful to metro dispatchers.

The model presented by Sun and Schönefeld [26] made it possible to identify gaps in the capacity of the train network and to assess the impact of schedule adjustments on passenger route choice. Xiang and Zhu [27] proposed multifunctional optimization to improve the economic performance of heavy rail.

Increasing the volume of passenger transportation by rail and its cost-effectiveness is a priority task. Determining the market shares and new offers for passenger service requires the study of the specific regional features of demand and the relevant state of transport services provided by different types of passenger transportation companies. Makarova and Muktepavel [28] presented a system of calculation and analytical indicators for analyzing regional passenger traffic. It allows specialists in this area to investigate passenger flow tendencies, to determine the demand for and cost of transport services as well as their dispersion across the region, and to perform a comparative analysis of internal passenger traffic and the total amount of passenger traffic in the network. In practice, this analytical information can be used to determine the optimal passenger train length, to assess the profitability of introducing local trains into operation, to define the optimal number of stops, and to calculate the number of passenger cars needed to satisfy the demand. The route in the model offered by Tang et al. [29] is divided into sections that can be independently updated, and the target function is expressed in terms of minimizing driving time. This model can help to quickly and efficiently develop a strategic plan to reduce running time in passenger rail corridors.

Liao and Liu [30] used microscopic simulation models to investigate passenger behavior in the non-payment area.

Allen and Levinson [31] studied passenger train schedules and their average speed on North American railways in the period between 1965 and 2015. These train traffic parameters were used because their values were easy to obtain.

Passenger transportation systems are being upgraded and expanded around the world. Experimental studies have been carried out to improve the quality and efficiency of high-speed train services. Lee et al. [32] investigated the aerodynamic properties of a high-speed train pantograph and made suggestions for their improvement. Ou et al. [33] investigated the reasons for the development of a comprehensive railway system in China and its impact on the development of intercity passenger railways. Teixeira and Prodan [34] reviewed railway taxation systems and their development in 2007–2012. They assessed the importance of taxation for the single European railway market.

Oh et al. [35] conducted an analysis of covariance and analysis of regression and identified the effect of wagon door width on passenger boarding time on Korean city railways. Holloway et al. [36] presented the results obtained from an experiment in determining the time required for passengers to board or deboard a train. They found out that steps had little or no effect on the time to board a train for younger luggage-carrying people, while senior passengers, on the contrary, needed more time.

Multi-criteria decision making (MCDM) methods are used to solve problems related to the use of different modes of transport [37–45]. Chen et al. [46] investigated the process of rail passenger transfer at large terminals by comparing different alternatives. Stoilova [47] presented a combination of multi-criteria models for rating railway passenger transport development.

The MCDM methods used for modeling and evaluating the quality of passenger transportation on an international route allow researchers to identify the opinions of passengers, staff and the administration of the train about the weights (significances) of various criteria describing this complicated process [48,49]. Improving various aspects of this process can help rail transport to compete with other modes of transport more effectively. Based on an investigation of existing market research practices, three main approaches were identified for a comparative analysis of the influence of different parameters of transport services on passenger satisfaction in order to define priority directions for implementation of administrative decisions concerning service quality. They included the method of obtaining priorities from passengers, calculations based on correlation and regression analysis using the method of least squares, and calculations based on the application of various nonparametric statistical methods. Methodical and practical approaches based on modeling and intended for identifying promising areas for improving the quality of public services have also been presented [50]. The comparison and visualization of the results of assessing the impact of different transport service parameters on the overall quality of service by the methods of ordinal logistic regression were also presented. Customer perceptions of the quality of service provided by the operator and the level of satisfaction are key parameters for monitoring performance. Kesten and Öğüt [51] provided a practical way to monitor the functioning of the public transport system as a result of passenger evaluation. The passenger-oriented efficiency index was developed, and it employed 22 indicators and 6 different tools. Time, cost, ease of transfer, security and quality of service were assessed.

The aim of this study is to provide a set of criteria and show the advantages of rail transport compared to road transport (buses). By using MCDM methods, we determined the mean ranks, global, and overall weights of these criteria, employed a reverse hierarchy model and correlation of values. Finally, we calculated indicators showing the consistency of passenger views.

#### **3. The Methods of the Average Rank Transformation into Weight (ARTIW-L and ARTIW-N)**

The weights of the evaluation criteria (sub-criteria or key criteria) largely determine the evaluation result. In practice, the subjective weights assigned by experts or respondents to the considered criteria are commonly used. These weights present the judgments of highly qualified experts with long-term practical experience and theoretical knowledge in the considered field [52,53]. Passengers themselves make decisions about the mode of transport they choose for travel and, therefore, are experts themselves. However, because of their low competence, they should be referred to as respondents answering the survey questions rather than experts.

Most of the widely known and used methods for evaluating the weights of multiple criteria (factors) are based on experts' judgments. These methods embrace a thorough problem analysis by experts, the organization of this process as well as quantitative evaluation of decisions, and the arrangement of the obtained results. Therefore, the problem of practical determination of the accurate weights of the considered criteria arises. The subjective weights of the evaluation criteria can also be found from the ranks assigned to these criteria by experts. The estimates (judgments) of various experts differ considerably, often being inconsistent, which implies that the obtained weights (significances) of the criteria as well as their order of preference may be different.

The result of the experts' evaluation largely depends on their qualifications and experience in assessing the objectives, as well as their responsibility for providing the appropriate estimates of criterion significance and readiness to take part in the experimental study. The judgments of specialists and respondents about the relative significance of the criteria and their arrangement by order of priority (preference) often differ; therefore, the ranks and weights expressed in terms of the average values of the experts' estimates can be used in multi-criteria evaluation only if the consistency of the estimates has been proved. The consistency of the estimates given by a group of experts in terms of ranks is based on the idea of compactness.

In the case of expert evaluation, the average estimate obtained from a group of experts (respondents) is the problem solution (a result of decision-making) only when the judgments of all the experts are consistent. If a decision should be made based on the average estimate of the experts or respondents, the level of consistency of the experts' estimates is described by the concordance coefficient *W*. To determine the concordance coefficient *W*, the ranks of the evaluation criteria assigned by the experts or respondents are required. If their estimates are given in other units (for example, in points), they should be ranked.

The consistency of the weights of the criteria describing an object and the estimates provided by experts are usually determined by using the analytic hierarchy process (AHP) approach [54–56]. The consistency of group evaluation results is determined by using the method of rank correlation [57,58].

The AHP approach is rather complicated [59,60] because not all of the experts can properly fill in the questionnaire (i.e., a pairwise comparison matrix) from which the weights of the criteria and the consistency ratio (C.R.) are calculated. The AHP method also allows for determining the consistency ratio of each expert's estimates, which should not exceed 0.1 (C.R. ≤ 0.1). Moreover, the AHP approach involves calculating the eigenvector of the criteria weights, which is approximated by normalizing the geometric means of the rows of the matrix. The maximal eigenvalue (*λ*max), the consistency index (C.I.) and the consistency ratio (C.R.) should also be calculated [41,53,55,61,62].
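The AHP consistency check described above can be illustrated with a minimal sketch. It assumes the row geometric-mean approximation of the priority vector and the commonly tabulated random index values attributed to Saaty. The function name, the `RANDOM_INDEX` table and the 3 × 3 example matrix are illustrative assumptions and are not taken from the study.

```python
import math

# Random consistency index (RI) by matrix order n, as commonly tabulated
# for Saaty's AHP. Treat these values as an assumption and verify them
# against the cited AHP sources before use.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights_and_cr(matrix):
    """Return (weights, lambda_max, CI, CR) for a reciprocal comparison matrix."""
    n = len(matrix)
    # Priority vector: normalized geometric mean of each row.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][k] * weights[k] for k in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)   # consistency index
    cr = ci / RANDOM_INDEX[n]         # consistency ratio
    return weights, lambda_max, ci, cr


# Hypothetical, perfectly consistent matrix (a_ij = w_i / w_j with w = 4 : 2 : 1).
example = [[1, 2, 4],
           [1 / 2, 1, 2],
           [1 / 4, 1 / 2, 1]]
weights, lambda_max, ci, cr = ahp_weights_and_cr(example)
```

For a perfectly consistent matrix such as this one, the consistency ratio is zero up to rounding. A matrix filled in by a respondent would be accepted only if C.R. ≤ 0.1.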

Experts usually assign the ranks *Rij* to the criteria by arranging them according to their significance and giving them the appropriate numbers. This method of determining the criteria weights is logical; however, its accuracy is low. Therefore, it can be used only at the initial stage of analysis. Using more accurate and complicated methods still requires preliminary ranking of the criteria.

The weights *ω<sub>j</sub>* of the ranked indicators (criteria) can be determined by applying different methods (algorithms) that do not have theoretical advantages over one another. However, the general principle of all algorithms is the same: the most important criterion is assigned the highest weight. The values of the weights *ω<sub>j</sub>* must correspond to the criteria ranks (the lower the rank, the higher the weight). The sum of the weights *ω<sub>j</sub>* of all the criteria describing the research object must be equal to one, i.e., the weights must be normalized.

It is convenient to transform the ranks assigned to the criteria by a group of experts into weights by using the new ARTIW-L and ARTIW-N methods, whose sequence of operations and calculations is given below (see Figure 1). The average rank *Rj* of the ranks *Rij* assigned by all *m* experts (*i* = 1, 2, ..., *m*) is calculated for each *j*-th criterion (*j* = 1, 2, ..., *n*) by the formula:

$$
\overline{R}\_{j} = \frac{\sum\_{i=1}^{m} R\_{ij}}{m}.\tag{1}
$$

The more important the criterion, the smaller its average rank *Rj*. In practice, it is more convenient to use estimates of criteria significance whose larger numerical values indicate higher importance. For this purpose, the normalized weights *ω<sub>j</sub>* of the criteria, expressing their relative importance, are used.

Significances (weights) of the evaluation criteria of an object can be determined in the process of their normalization (setting their sum equal to one) by transforming the average ranks into weights (the ARTIW method). This method was first proposed in 2011 [63]; however, at that time it was not called ARTIW. The relative weight *ω<sub>j</sub>* of the *j*-th criterion is calculated as follows:

$$
\omega\_{j} = \frac{(n+1) - \overline{R}\_{j}}{\sum\_{j=1}^{n} \overline{R}\_{j}}, \tag{2}
$$

where *n* is the number of criteria describing the quality of the considered object, *Rj* is the average *j*-th criterion rank calculated by Equation (1).

**Figure 1.** The algorithm of ARTIW-L and ARTIW-N methods of the average rank transformation into weight used for calculating normalized subjective weights of the criteria describing the research object.

The normalized weights *ω<sub>j</sub>* of the criteria calculated according to Formula (2) have an inverse linear (functional) relationship with the average ranks *Rj* of these criteria calculated according to Formula (1). Therefore, this method is called the average rank transformation into weight-linear (ARTIW-L).

The normalized weights *ω′<sub>j</sub>* of the criteria can be calculated by using another method of transforming rank averages into weights. The criterion weights calculated according to Formulas (3) and (4) are related to the criteria rank averages *Rj* by a non-linear inverse (functional) dependence. Therefore, this method is called the average rank transformation into weight-non-linear (ARTIW-N).

Using the ARTIW-N method, the ratio of the smallest average rank min *Rj* (that of the most important criterion) to the average rank *Rj* of each *j*-th criterion is calculated first:

$$
u\_{j} = \frac{\min\_{j} \overline{R}\_{j}}{\overline{R}\_{j}}.\tag{3}
$$

After normalizing the values *uj* for each criterion, their subjective significances *ω′<sub>j</sub>* are calculated:

$$
\omega'\_j = \frac{u\_j}{\sum\_{j=1}^n u\_j}.\tag{4}
$$

Neither of these two methods (ARTIW-L and ARTIW-N) can be considered more accurate than the other, and neither of them can be regarded as the reference method. The average *ω̄<sub>j</sub>* of the weights *ω<sub>j</sub>* and *ω′<sub>j</sub>* calculated for each criterion by these two methods can be taken as the final result:

$$
\overline{\omega}\_j = \frac{\omega\_j + \omega'\_j}{2}.\tag{5}
$$
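Equations (1) to (5) can be traced step by step in a short sketch. It assumes complete rankings without ties, so that the average ranks sum to *n*(*n* + 1)/2 and the ARTIW-L weights sum to one. The function name and the small rank matrix (3 respondents, 4 criteria) are hypothetical and serve only as an illustration.

```python
def artiw_weights(ranks):
    """ranks[i][j] is the rank given by respondent i to criterion j (1 = most important).

    Returns (average ranks, ARTIW-L weights, ARTIW-N weights, mean weights).
    """
    m = len(ranks)
    n = len(ranks[0])
    # Equation (1): average rank of each criterion.
    avg = [sum(row[j] for row in ranks) / m for j in range(n)]
    # Equation (2): ARTIW-L, linear in the average rank.
    total_avg = sum(avg)
    w_lin = [((n + 1) - r) / total_avg for r in avg]
    # Equations (3) and (4): ARTIW-N, inverse in the average rank.
    r_min = min(avg)
    u = [r_min / r for r in avg]
    u_total = sum(u)
    w_non = [x / u_total for x in u]
    # Equation (5): mean of the two weight sets.
    w_mean = [(a + b) / 2 for a, b in zip(w_lin, w_non)]
    return avg, w_lin, w_non, w_mean


# Hypothetical data: 3 respondents ranking 4 criteria.
ranks = [[1, 2, 3, 4],
         [2, 1, 3, 4],
         [1, 3, 2, 4]]
avg, w_lin, w_non, w_mean = artiw_weights(ranks)
```

In this toy example the criterion with the smallest average rank receives the largest weight under both transformations, and each weight vector sums to one.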

The consistency of expert group estimates is determined by the concordance coefficient *W.* The concordance coefficient *W* in the absence of tied ranks is expressed in terms of the relationship between the obtained sum *S* and the respective largest sum *S*max [58]:

$$W = \frac{12S}{m^2 n (n^2 - 1)} = \frac{12S}{m^2 (n^3 - n)}.\tag{6}$$

When the estimates provided by the experts or respondents are in agreement, the Kendall coefficient of concordance, *W*, is close to one. When the estimates differ considerably, the value of *W* is close to zero.

The sum *S* of the squared deviations of the rank sums of each criterion from their overall mean ½*m*(*n* + 1) is calculated as follows:

$$S = \sum\_{j=1}^{n} \left[ \sum\_{i=1}^{m} R\_{ij} - \frac{1}{2} m(n+1) \right]^2,\tag{7}$$

where *n* is the number of criteria (*j* = 1, 2, ..., *n*), *m* is the number of experts (respondents) (*i* = 1, 2, ..., *m*).

The random value for *S* was calculated by Equation (7), adding the squared values of all the criteria given in parentheses.
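As an illustration of Equations (6) and (7), the following sketch computes *S* and *W* from a matrix of ranks without ties. The function name and the example rank matrix are hypothetical and are not data from the survey.

```python
def kendall_w(ranks):
    """Return (S, W) for ranks[i][j] = rank given by respondent i to criterion j.

    Assumes complete rankings without tied ranks.
    """
    m = len(ranks)
    n = len(ranks[0])
    # Sum of ranks received by each criterion.
    col_sums = [sum(row[j] for row in ranks) for j in range(n)]
    # Mean rank sum, the term m(n + 1)/2 in Equation (7).
    mean_sum = m * (n + 1) / 2
    # Equation (7): sum of squared deviations of the rank sums.
    s = sum((c - mean_sum) ** 2 for c in col_sums)
    # Equation (6): concordance coefficient.
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return s, w


# Hypothetical data: 3 respondents ranking 4 criteria.
ranks = [[1, 2, 3, 4],
         [2, 1, 3, 4],
         [1, 3, 2, 4]]
s, w = kendall_w(ranks)

# Identical rankings from all respondents give full concordance.
_, w_full = kendall_w([[1, 2, 3, 4]] * 3)
```

Identical rankings yield *W* = 1, while partially differing rankings such as the hypothetical ones above yield a value between zero and one.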

The concordance coefficient may be used in practice when its critical value, above which expert estimates can be considered consistent, has been found. Kendall [57] has shown that if the number of criteria is *n* > 7, the significance of the concordance coefficient *W* can be determined using Pearson's chi-squared test statistic *χ*<sup>2</sup>. The random value

$$\chi^2 = Wm(n-1) = \frac{12S}{mn(n+1)},\tag{8}$$

is distributed according to the *χ*<sup>2</sup> distribution with *ν* = *n* − 1 degrees of freedom.

When the number of the compared criteria *n* ranges from 3 to 7, the *χ*<sup>2</sup> distribution cannot be used in all cases because sometimes the critical value *χ*<sup>2</sup><sub>*ν*,*α*</sub> may be larger than the calculated value (even though the consistency of the estimates is still sufficiently high). In this case, the probability tables of the concordance coefficient or the tables of critical values of *S* (with 3 ≤ *n* ≤ 7) can be used [64].

The smallest value of the concordance coefficient *W*min for which the estimates of *m* experts of the quality of the research object based on *n* criteria can be considered consistent, at the required significance level *α* and with *ν* = *n* − 1 degrees of freedom, can be calculated as follows [63]:

$$W\_{\min} = \frac{\chi^2\_{\nu,\alpha}}{m(n-1)},\tag{9}$$

where *χ*<sup>2</sup><sub>*ν*,*α*</sub> is the critical value of Pearson's statistic found in the table [65] for *ν* = *n* − 1 degrees of freedom and the significance level *α*.

The quality of the research object is evaluated by the additive mathematical model used for calculating its comprehensive quality index, which allows for describing the quality of the object by a single number and for comparing it with the quality of other similar objects. In this model, the normalized criteria weights *ω<sub>j</sub>* are used rather than the average criteria ranks *Rj*, which cannot show how much more important one criterion is than another.

The weights of the criteria describing the research object (the selection of rail transport rather than road transport by passengers) can be calculated by using a very popular but complicated approach referred to as the analytic hierarchy process (AHP) offered by T. L. Saaty [54–56,66]. Passengers are not highly qualified experts and, therefore, can hardly fill in a pairwise comparison matrix properly, particularly if the number of the criteria compared is large. This number may exceed nine (e.g., fifteen criteria). In the study [67], passengers completed 32 pairwise comparison matrices, 22 of which were rejected because their consistency ratio (C.R.) was greater than 0.1. Only 10 matrices were applicable for the study on the quality of passenger transport by train. Therefore, it is not rational to apply the AHP method in passenger interviews: not every passenger can complete the pairwise comparison matrix properly, and the method should only be applied when interviewing highly qualified experts.

The objective weights of the criteria and sub-criteria can be calculated by using the entropy method [61,68] as well as the new IDOCRIW method [52], which combines (integrates) the entropy and the criterion impact loss (CILOS) methods.

#### **4. The Structure of the Hierarchy Model, the Questionnaire, and the Respondents**

The famous American writer Mark Twain wrote that "Travel is fatal to prejudice, bigotry, and narrow-mindedness". On the other hand, people become tired when traveling and, therefore, the choice of an appropriate mode of transport is very important. Now, there is a wide choice of modes of travel, which include walking, cycling and traveling by automobile, by bus as well as by rail, air or water transport. A passenger decides which mode of transport is the safest and most comfortable for travel. The criteria determining the choice of a particular mode of transport can be identified when a set of the evaluation criteria is defined and a certain number of passengers are surveyed. The passengers who chose a particular mode of transport (e.g., rail transport) as an alternative to another mode of transport assign ranks to the considered criteria. All the criteria describing a particular mode of transport have some advantages over the criteria describing another means of transport.

The significance of hierarchically unstructured criteria or sub-criteria is identified using a two-level model (Figure 2a). In a three-level hierarchy model, which is used in multiple criteria decision-making, the goal of the study is given first, then the criteria are presented, and, finally, sub-criteria are provided [69–75] (see Figure 2b). In this work, the inverse (not classical) hierarchy model (see Figure 2c) was used for determining the ranks of the criteria and their weights. Level 1 of the model presents the goal, Level 2 the factors and sub-criteria, and Level 3 provides a group of factors and criteria. First, the average ranks and global weights of particular sub-criteria were calculated without their division into groups (Figure 2a). Then, they were grouped into three groups, and the reduced weights of the criteria groups were calculated, considering the fact that each group had a different number of criteria (Figure 2c).

The study was based on a survey of passengers traveling from Vilnius (Lithuania) to Moscow (Russia) and back to Vilnius. There is a regular rail and road service between the capital of Lithuania, one of the Baltic states (and a member-state of the EU) and the capital of Russia (Moscow). Therefore, passengers can choose between the two modes of transport in covering a distance of 944 km between these cities.

A set of criteria (sub-criteria) was defined to determine their influence on passengers' choice to travel by train rather than by bus. For this purpose, passengers had to rank the considered criteria according to their importance for their choice of this mode of transport. The following sub-criteria were included in the questionnaire presented to the passengers (Figure 2a):


A questionnaire for ranking the sub-criteria by using the method of correlation was prepared by the authors. It was also translated into Russian. An anonymous survey was carried out, with 52 questionnaires presented to passengers on the Vilnius–Moscow–Vilnius train. About 48% of the trip, which lasts for 14 h 05 min (944 km), took place during the night. Respondent characteristics are presented in Table 1.

The same passengers completed questionnaires and assessed the sub-criteria that determine the choice to travel by train as an alternative to aircraft. The results of this research were published in the article [11].


**Table 1.** Details of 52 respondents who gave the judgement on ranks.

The number of respondents (52) was more than three times the number of criteria (sub-criteria) (15) and thus satisfied the commonly applied condition *m* ≥ *n*. A description of the 15 sub-criteria was presented in the questionnaire, and the respondents assigned a different rank to each of them (no tied ranks were allowed).

When applying expert research methods to assess the significance of criteria, there is a problem of determining the required minimum number of experts. In practice, the mathematically unsound principle that the number of experts must be equal to or greater than the number of criteria is often observed. Another position often applied in practice, that the amount of data required for studies is *m* ≥ 30, is also not substantiated, because in some cases this number is sufficient (if the range of the group's estimates is small), while in other cases it is too small. The credibility of expert group assessments depends on the level of knowledge of individual experts and their number. Assuming that the experts are accurate assessors, it can be stated that as their number increases, the reliability of the expertise of the whole group of experts (the average of their estimates) also increases. The minimum number of experts to be interviewed *m*min can be calculated according to the sample size formula [69]:

$$m\_{\rm min} = \frac{t^2 \sigma\_{R\_j}^2}{\Delta\_j^2},\tag{10}$$

where *t* is the value of Student's *t* distribution, which depends on the probability adopted for assessing the importance of the criterion in deciding to go by train as an alternative to the bus (for the probability *P* = 95%, i.e., the significance level *α* = 0.05 for a two-sided test, *t* = 1.96); *σRj* is the standard deviation of the ranks *Rij* of the evaluated *j*-th criterion; and Δ*j* is the absolute error of the average rank of the *j*-th criterion obtained from the passengers (respondents), indicating the accuracy of the survey results.

**Figure 2.** Calculating the weights of criteria determining the choice of passengers to travel by train as an alternative mode of transport to travel by bus using: (**a**)—non-hierarchical model; (**b**)—a classical (direct) hierarchy model; (**c**)—an inverse hierarchy model.

The absolute error of the survey shows how much the average rank *Rj* calculated for the *j*-th criterion from *m* surveyed passengers may differ from the population average *Rjp* that would be determined by surveying all passengers. Because the sample size *m* is limited, *Rj* differs from *Rjp*, with the chosen probability, by no more than ±Δ*j*. This difference is greater the smaller *m* is and the larger *σRj* is.

By interviewing *m* passengers and calculating the standard deviation *σRj* of the ranks of the *j*-th criterion, the absolute error Δ*j* of the *j*-th criterion value can be determined from Formula (10) with the 95% probability recommended in practice and compared with the permissible value (if any).
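Formula (10) can be used in both directions, as the following sketch suggests: solved for Δ*j* given *m*, or solved for the minimum *m* given a permissible Δ*j*. The function names are hypothetical, and the standard deviation of 4.0 ranks is an assumed illustrative value rather than a figure from Table 2. Only *m* = 52 and *t* = 1.96 come from the text.

```python
import math


def absolute_error(sigma, m, t=1.96):
    """Formula (10) solved for the absolute error of an average rank."""
    return t * sigma / math.sqrt(m)


def min_respondents(sigma, delta, t=1.96):
    """Formula (10): smallest whole number of respondents for a permissible error."""
    return math.ceil((t * sigma / delta) ** 2)


sigma_example = 4.0  # assumed standard deviation of ranks, for illustration only
delta_52 = absolute_error(sigma_example, m=52)
m_needed = min_respondents(sigma_example, delta=1.0)
```

With the assumed standard deviation, 52 respondents give an error slightly above one rank, and a permissible error of exactly one rank would require somewhat more respondents.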

#### **5. Calculating the Average Rank, the Consistency of Expert Estimates and the Criteria Weights**

All 15 sub-criteria presented in the questionnaire, which determined the choice of passengers to travel by train rather than by bus, were divided into three groups and named key criteria (Figure 2b,c). The group of safety and environmental protection included two sub-criteria (A and C), the economy group embraced five sub-criteria (B, D, H, L, M) and the ride comfort group consisted of eight sub-criteria (E, F, G, I, J, K, N, O).

The ranks of the significance of sub-criteria, determining the choice by passengers of a trip by train rather than traveling by bus, were used for calculating the average values of the ranks *Rj*, the concordance coefficient *W*, Pearson's chi-square statistic *χ*<sup>2</sup> and the sub-criteria weights *ω<sub>j</sub>* and *ω′<sub>j</sub>* (Table 2). Table 2 also presents the mean values *ω̄<sub>j</sub>* of the sub-criteria weights *ω<sub>j</sub>* and *ω′<sub>j</sub>* and the standard deviations of the ranks *Rij*.

The total of the ranks assigned to the 15 sub-criteria by all respondents was 6240, while the sum of the average ranks *Rj* of all 15 sub-criteria was 120.0. The mean rank sum per sub-criterion, 416, is the term ½*m*(*n* + 1) = ½ · 52 · 16 in Equation (7), or equivalently 6240/15 = 416. The sum of squared deviations was *S* = 147,172 (Equation (7)). The concordance coefficient *W* = 0.194, showing the consistency of the estimates of the respondents (52 passengers), was calculated by Equation (6).

Based on the data from the passengers' survey and using Equation (8), *χ*<sup>2</sup> = 141.5 was obtained. The critical value *χ*<sup>2</sup><sub>*ν*,*α*</sub> taken from the table of the chi-squared distribution with *ν* = 15 − 1 = 14 degrees of freedom and the significance level *α* = 0.01 was equal to 29.1413. The empirical value *χ*<sup>2</sup> = 141.5 was about 4.9 times the critical value *χ*<sup>2</sup><sub>*ν*,*α*</sub> = 29.1, which allowed the researchers to assume that the respondents' estimates were consistent.

The smallest value of the concordance coefficient *W*min, with the significance level *α* = 0.01 and the degree of freedom *ν* = *n* − 1 = 15 − 1 = 14, allowing the authors to assume that the respondents' estimates were consistent, was calculated by Equation (9). The smallest value of the concordance coefficient *W*min = 0.0400 corresponded to only about one-fifth of the calculated concordance coefficient *W* = 0.194.

The estimates of 52 passengers that took part in the survey were in agreement (or consistent) because the calculated concordance coefficient was equal to 0.194, while the value of Pearson's chi-squared statistic, equal to 141.5, was considerably larger than the critical value of 29.14, corresponding to degrees of freedom of 14 and a significance level of 0.01. The smallest concordance coefficient still allowing the estimates of all respondents to be considered consistent was equal to 0.0400, which was equivalent to only one-fifth of 0.194. It was hardly possible to expect very high consistency of the respondents' estimates because of their highly different experiences, wishes, habits and means.
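The reported figures can be re-derived from Equations (6), (8) and (9) with a few lines of arithmetic. The sketch below uses only values quoted in the text (*m* = 52, *n* = 15, *S* = 147,172 and the tabulated critical value 29.1413). The variable names are illustrative.

```python
m, n = 52, 15               # respondents and sub-criteria
S = 147_172                 # sum of squared deviations, Equation (7)
chi2_critical = 29.1413     # tabulated value for nu = 14, alpha = 0.01 (from the text)

W = 12 * S / (m ** 2 * (n ** 3 - n))      # Equation (6)
chi2 = W * m * (n - 1)                    # Equation (8)
W_min = chi2_critical / (m * (n - 1))     # Equation (9)

print(round(W, 3), round(chi2, 1), round(W_min, 4))
```

The computed values agree with the reported *W* = 0.194, *χ*<sup>2</sup> = 141.5 and *W*min = 0.0400 to the stated precision.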

A bar diagram of the calculated average ranks *Rj* of the 15 sub-criteria determining the passengers' choice of traveling by train rather than by bus is given in Figure 3.

By applying the new ARTIW-L and ARTIW-N methods, the passengers' reasons for selecting a trip by train rather than a trip by bus, which were described by criteria (sub-criteria) and their weights *ω<sub>j</sub>*, *ω′<sub>j</sub>*, and *ω̄<sub>j</sub>*, were determined. The calculation data for the sub-criteria, from the most important (E) to the least important (D), are shown in Figure 4.

The calculated average ranks *Rj* (Figure 3) and global weights *ω<sub>j</sub>* and *ω′<sub>j</sub>* (Figure 4) of sub-criteria, determining the choice by the respondents to travel by train rather than by bus, show that sub-criteria E, M and H were much more important than sub-criteria N, C and D. This implies that their priority order should be as follows:

Moreover, there should be an inverse straight-line relationship between the average ranks *Rj* and the global weights *ω<sub>j</sub>* calculated by the ARTIW-L method. The determination coefficient of the regression equation for the 15 sub-criteria, R<sup>2</sup> = 1, and the coefficient of correlation *r* = −1 show that this is a functional linear relationship, *ω<sub>j</sub>* = −0.0083*Rj* + 0.1333. The correlation between *Rj* and the weights *ω′<sub>j</sub>* calculated using the ARTIW-N method is non-linear (Figure 5a). Ranks and weights are related by the quadratic regression equation *ω′<sub>j</sub>* = 0.0009*Rj*<sup>2</sup> − 0.0234*Rj* + 0.1932, with the coefficient of determination R<sup>2</sup> = 0.9955.
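The coefficients of the linear relationship follow directly from Equation (2). As a brief check, assuming complete rankings so that the average ranks of the *n* = 15 sub-criteria sum to *n*(*n* + 1)/2 = 120:

$$
\omega\_{j} = \frac{(n+1) - \overline{R}\_{j}}{\sum\_{j=1}^{n} \overline{R}\_{j}} = \frac{16 - \overline{R}\_{j}}{120} \approx 0.1333 - 0.0083\,\overline{R}\_{j}.
$$

The slope −1/120 and the intercept 16/120 are fixed by *n* alone, which is why the fit is exact (R<sup>2</sup> = 1).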




**Figure 5.** Correlation between the means of the sub-criteria ranks and the normalized weights of these sub-criteria calculated using: (**a**)—ARTIW-L and ARTIW-N methods; (**b**)—average values of two methods.

The data obtained in the performed study show that the estimates of the significance (importance) of sub-criteria determining the passenger choice of traveling by train rather than by bus were consistent (in agreement) and reflect their general opinion, shown by the averages *ω̄<sub>j</sub>* of the criteria weights *ω<sub>j</sub>* and *ω′<sub>j</sub>* calculated according to Equation (5) and their correlation with the rank averages *Rj* (Figure 5b).

With reference to the principle of determining the sample size, the absolute error Δ*j* for determining the average rank *Rj* of each *j*-th sub-criterion was calculated from Formula (10) (Table 3). For the calculation of Δ*j*, the values of the standard deviation *σRj* of each sub-criterion were taken from Table 2, with *m* = 52 interviewed passengers and the significance level *α* = 0.05.

**Table 3.** The absolute error Δ*j* in determining the averages of the 15 sub-criteria ranks *Rj* of the group of 52 surveyed passengers.


The results (Table 3) show that the mean ranks *Rj* of the 15 sub-criteria were identified with an absolute error Δ*j* whose range was Δ*j*max − Δ*j*min = Δ*jA* − Δ*jK* = 1.20 − 0.88 = 0.32 and whose mean value was 1.06. By taking Δ*j* for each sub-criterion or the mean value of Δ*j* over the 15 sub-criteria, it was possible to calculate the confidence interval *Rj* ± Δ*j* that contains the population mean rank *Rjp* with 95% confidence. For example, the population mean rank *RjA* of sub-criterion A, which had the highest rank variation, was in the range 6.751 ± 1.20, i.e., between 5.551 and 7.951. The range would decrease if more than 52 passengers were interviewed. We consider the mean error of 1.06 to be close to one rank, so the number of passengers *m* = 52 who completed the survey was sufficient and allowed us to reliably assess the factors determining the choice of passengers to travel by train as an alternative to bus.

#### **6. Calculating the Overall Weights of Key Criteria**

It was rather difficult to determine the weights *ω<sub>j</sub>* of 15 sub-criteria by using the AHP approach because the optimal number of criteria for this method is seven plus or minus two [55,66].

When the global weights *ω<sub>j</sub>* of all 15 sub-criteria were determined and the sub-criteria were divided into three groups (key criteria) as shown in Figure 2c, the overall weights *ω̃<sub>g</sub>* of the key criteria (see Table 4) were calculated as follows using the inverse hierarchy for assessment main criteria importance (IHAMCI) method:

$$
\tilde{\omega}\_{g} = \frac{\sum\_{j=1}^{k} \omega\_{j}/k}{\sum\_{b=1}^{g} \sum\_{j=1}^{k} \omega\_{j}/k}, \tag{11}
$$

where *ω<sub>j</sub>* is the global weight of the *j*-th sub-criterion, *k* is the number of sub-criteria in the group (*j* = 1, 2, ..., *k*), and *g* is the number of groups of criteria describing the research object (*b* = 1, 2, ..., *g*).

The overall weight *ω̃<sub>Sa</sub>* of the two sub-criteria, A + C, included in the travel safety and environmental protection group (key criterion), which was calculated by Equation (11), was the smallest:

$$
\tilde{\omega}\_{\rm Sa} = \frac{0.1205 / 2}{0.0602 + 0.0684 + 0.0672} = 0.3077.
$$

The overall weight *ω̃<sub>Ec</sub>* of the five sub-criteria, B + D + H + L + M, included in the key criterion describing economy, was calculated in the same way and was the largest:

$$
\tilde{\omega}\_{\rm Ec} = \frac{0.3419 / 5}{0.0602 + 0.0684 + 0.0672} = 0.3492.
$$

The overall weight *ω̃<sub>Co</sub>* of the key criterion of eight sub-criteria, E + F + G + I + J + K + N + O, describing ride comfort, was only slightly smaller:

$$
\tilde{\omega}\_{\rm Co} = \frac{0.5376 / 8}{0.0602 + 0.0684 + 0.0672} = 0.3431.
$$

The results of calculation show that the choice of passengers to travel by rail transport rather than by road transport (a bus) was determined by the criteria describing economy (about 35%) and ride comfort (about 34%) as well as safety and environmental protection (only about 31%).
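The group calculation of Equation (11) can be reproduced with a minimal sketch that takes the summed global weights of each group and the number of sub-criteria in it, both as quoted in the worked equations above. The function name and dictionary keys are hypothetical.

```python
def ihamci_overall_weights(group_sums, group_sizes):
    """Equation (11): normalize the mean global weight of each group.

    group_sums[g] is the sum of the global weights of the sub-criteria in group g,
    group_sizes[g] is the number of sub-criteria in that group.
    """
    means = {g: group_sums[g] / group_sizes[g] for g in group_sums}
    total = sum(means.values())
    return {g: value / total for g, value in means.items()}


# Sums of global weights and group sizes as given in the text.
group_sums = {"safety_environment": 0.1205, "economy": 0.3419, "comfort": 0.5376}
group_sizes = {"safety_environment": 2, "economy": 5, "comfort": 8}

overall = ihamci_overall_weights(group_sums, group_sizes)
```

Dividing each group sum by its size before normalizing removes the advantage a group would otherwise gain simply by containing more sub-criteria. With these inputs, economy comes out marginally ahead of ride comfort, and safety and environmental protection is last.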

The obtained global weights *ω̄<sub>j</sub>* of sub-criteria and the overall weights *ω̃<sub>g</sub>* of criteria divided into three groups, as shown in Figure 6, allowed the authors to identify the criteria determining the choice of passengers to travel by train (as an alternative to travel by bus). The obtained data can be used by companies engaged in passenger transportation by rail to enhance the quality of services provided by this more environmentally friendly mode of land transport so that it will have a competitive edge over rival modes of transportation.

**Figure 6.** The calculated subjective weights of sub-criteria and the key criteria.


*Appl. Sci.* **2021** , *11*, 11503

**Table 4.** The calculated global, local and overall weights of the criteria determining the choice of passengers to travel by train.

The sub-criteria and key criteria weights calculated in this study, which show why passengers choose train travel as an alternative to the bus, are not absolutely accurate and constant. When interviewing citizens of other countries traveling on international trains, the significance of the sub-criteria and key criteria may differ. Their values can be influenced by the economic development of the country, passenger habits, the reliability factors relating to different modes of transport, and risks.

The most important part of the study consisted of the original sub-criteria system and their weighting methodology, applying the new ARTIW-L and ARTIW-N methods, as well as the method of inverse hierarchy for assessment main criteria importance (IHAMCI). These methods can be used by other researchers to calculate normalized subjective weights when ranking the results of an expert or respondent survey.

#### **7. Discussion and Conclusions**

Passengers usually decide to travel by a particular mode of transport by evaluating the criteria describing it, and the weights reflecting the significance of these criteria differ from passenger to passenger. The selection of a particular (alternative) mode of transport is based on the significances (subjective weights) of the considered criteria, which can be determined by using expert evaluation methods. The average value of the estimates given by a considerable number of passengers (respondents) in ranking the criteria can be used as a result presenting public opinion about a particular transport mode chosen for a particular route, provided that their opinions (judgments) are consistent.

In the present work, the reasons behind the passengers' choice to travel by train rather than by bus were identified by considering fifteen sub-criteria. The significances of these sub-criteria for choosing travel by train were evaluated by 52 respondents (passengers on the Vilnius–Moscow–Vilnius train) on a 15-rank scale. The subjective total normalized weights of sub-criteria based on the new ARTIW-L and ARTIW-N methods allowed the authors to rank them by order of priority (preference). A functional or close and strong correlation between the means of the sub-criteria ranks and the normalized weights of the sub-criteria calculated from them indicated that the ARTIW-L and ARTIW-N methods were satisfactory for assessing the significance of the sub-criteria. The mean of the sub-criteria weights calculated by these two methods was taken as the final significance of the sub-criteria. The most important sub-criteria determining the passengers' choice of traveling by train rather than by bus (as an alternative mode of transport) included ride comfort (the availability of berths in passenger cars for sleeping and relaxation) (0.0939), the selected time of travel (0.0896) and a negligible effect of weather conditions on it (0.0764). The sub-criteria describing rail transport as safer than road transport, as well as such advantages as a smaller number of stops and delays on the way (0.0760), the availability of WCs and places for smoking (0.0699), a lower probability that passengers would disturb each other (0.0682), and simpler border control (0.0679) were less important for passengers. Even less important for them were the sub-criteria describing the priority given to rail vehicles when crossing motor roads (0.0667) and freedom of movement (0.0647), the availability of a dining car (0.0607) and unpleasant slight rocking and vibration (0.0603).
The least important sub-criteria for passengers included the possibility to order food or newspapers and magazines to the compartment (0.0520), lower environmental pollution by rail transport (0.0445), and sometimes cheaper railway tickets (0.0332). The ratio of the largest total weight of sub-criteria (0.0939) to their smallest total weight (0.0332), which was equal to 3.36, showed that the significance of particular sub-criteria was different for passengers choosing a particular mode of transport.

The fifteen considered sub-criteria were divided into three groups using the inverse hierarchy for assessment main criteria importance (IHAMCI) method suggested by the second author. The normalized overall weights for these groups (criteria) were calculated. The overall weight of two sub-criteria describing safety and environmental protection was equal to 0.3077, while the overall weight of five sub-criteria forming the 'economy' group was 0.3492, and the overall weight of eight sub-criteria referring to ride comfort was the largest, at 0.3431. The overall weights of the key criteria of any sub-criteria group were calculated using a new IHAMCI method, which allowed assessment of different numbers of sub-criteria in a key criterion.

The minimum number of experts or respondents to be consulted in order to obtain reliable results was calculated using the sample size principle. After interviewing 52 passengers, the standard deviations of the sub-criteria ranks were identified and then used to determine the absolute error of the mean rank of each sub-criterion. The results showed that, with 95% probability, the sample rank averages of the sub-criteria differed from the population averages by no more than 0.88–1.20 ranks (1.06 ranks on average). This difference was close to unity and indicated that the significance of the sub-criteria was determined with sufficient accuracy.
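The error calculation described above can be sketched as follows. This is only an illustration under the usual normal-approximation assumption (absolute error = z·s/√n with z = 1.96 for 95% probability); the source does not state its exact formula, the function names are hypothetical, and the standard deviation of 3.9 ranks is a made-up value chosen so that the result lands near the reported average error.

```python
import math


def margin_of_error(std_dev: float, n: int, z: float = 1.96) -> float:
    """Absolute error of a sample mean under the normal approximation."""
    return z * std_dev / math.sqrt(n)


def required_sample_size(std_dev: float, max_error: float, z: float = 1.96) -> int:
    """Smallest sample size whose margin of error does not exceed max_error."""
    return math.ceil((z * std_dev / max_error) ** 2)


# Hypothetical standard deviation of one sub-criterion's ranks (not from the paper).
error = margin_of_error(std_dev=3.9, n=52)
needed = required_sample_size(std_dev=3.9, max_error=1.0)
print(f"absolute error with 52 respondents: {error:.2f} ranks")
print(f"respondents needed for an error of at most 1 rank: {needed}")
```

Because the error shrinks only with the square root of the sample size, halving it would require roughly four times as many respondents.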

The decision-makers in the countries engaged in passenger transportation by the considered international train should primarily improve the services described by the criteria that most strongly influence decisions by passengers to choose a trip by train rather than by bus. A company providing passenger transportation by any particular mode of transport can win the competition in this area only if its provided services are of the highest quality and satisfy the ever-growing demands of passengers.

**Author Contributions:** Conceptualization, H.S.; methodology, H.S.; investigation, H.S. and L.M.; writing–original draft preparation, H.S.; writing–review and editing, L.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Analyzed datasets are available from the corresponding author.

**Conflicts of Interest:** The authors declare that there is no conflict of interest regarding the publication of this article.


## *Article* **Semi-Markov Model of the System of Repairs and Preventive Replacements by Age of City Buses**

**Klaudiusz Migawa, Sylwester Borowski \*, Andrzej Neubauer and Agnieszka Sołtysiak**

Faculty of Mechanical Engineering, UTP University of Science and Technology, Al. Prof. S. Kaliskiego 7, 85-796 Bydgoszcz, Poland; klaudiusz.migawa@pbs.edu.pl (K.M.); aneub@umk.pl (A.N.); agnieszka.soltysiak@pbs.edu.pl (A.S.)

**\*** Correspondence: sylwester.borowski@pbs.edu.pl

**Abstract:** The paper presents a mathematical model of the system of repairs and preventive replacements by age of city buses, developed using the theory of semi-Markov processes. The model considers four types of city bus renewal processes: three types of corrective repair and preventive replacement. Corrective repairs fall into two categories: minimal repairs (carried out by the Technical Service units) and perfect repairs (carried out at the Service Station). Models of restoration systems based on semi-Markov processes that combine minimal repairs, perfect repairs, and preventive replacements by age have been examined in the literature only to a limited extent. The system under consideration is analysed from the point of view of two criteria: profit per time unit and availability of city buses to carry out the assigned transport tasks. Conditions for the existence of an extremum (maximum) of the criterion functions are formulated for the adopted assumptions. The considerations presented in the paper are illustrated by example calculation results.

**Keywords:** city buses; semi-Markov processes; preventive maintenance; corrective maintenance; age-replacement; minimal repair; perfect repair; profit per time unit; availability

#### **1. Introduction**

The basic task of transport systems is to transport people, animals and goods. The transport tasks, due to their particular specification, are carried out by different types of transport means. A very important branch of the transport system is the road passenger transport, which can generally be divided into international transport, interurban transport and urban transport. Urban transport systems usually operate in medium and large cities, in suburban areas and in industrialised areas. One of the important types of urban transport is the urban bus transport. The task of this type of transport system is to reliably and punctually carry out transport tasks along defined routes in accordance with an accepted timetable of courses [1]. The basic characteristics for evaluating the functioning of this type of transport system are economic efficiency characteristics (e.g., profit per unit time) and operational-technical efficiency characteristics (e.g., readiness of the city buses to carry out the assigned transport tasks) [2,3]. Any kind of disruption in the implementation of the assigned transport tasks, including downtime caused by damages to the means of transport (city buses) causes a decrease in the reliability and readiness of technical facilities and generates additional costs (losses). These losses arise as a result of corrective maintenance (CM) conducted after damage, losses caused by fines for non-performance of transport services, and costs related to the maintenance of reserve buses whose task is to replace damaged buses. One of the ways of ensuring the correct and efficient fulfilment of transport tasks in urban bus operation systems is the implementation of preventive maintenance (PM). The implementation of these activities consists of planning the timing and scope of preventive maintenance in such a way as to keep their costs lower than the costs of repairs after the damage. For this reason, the determination of optimal times for preventive maintenance of technical objects is an important problem of planning strategies in the systems of exploitation of the means of transport [4].

**Citation:** Migawa, K.; Borowski, S.; Neubauer, A.; Sołtysiak, A. Semi-Markov Model of the System of Repairs and Preventive Replacements by Age of City Buses. *Appl. Sci.* **2021**, *11*, 10411. https://doi.org/10.3390/app112110411

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti and Dimitrios S. Paraforos

Received: 5 October 2021; Accepted: 1 November 2021; Published: 5 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

During the operation of technical objects, their elements are subject to wear and tear processes and the impact of external factors, which causes damage to these objects. The resulting damages are the cause of lowering the effectiveness of functioning of the analysed systems. In order to ensure an appropriate level of reliability of technical facilities, different types of strategies are applied in the subsystems for ensuring serviceability. These activities are divided into two types: preventive maintenance and corrective maintenance. In practice, corrective maintenance is carried out in two variants. Firstly, the perfect repair (PR), which makes the system "As-Good-As-New" (AGAN), and secondly the minimal repair (MR), which makes the system "As-Bad-As-Old" (ABAO) are conducted. Normally, CM corrective repair costs and times are higher than PM preventive repair costs. This is due to the fact that, in general, CM activities require prior identification of the damage and high skills of personnel (diagnosticians and mechanics). In addition, there are the costs associated with the unplanned downtime of technical facilities caused by the damage to them. From this, it follows that it is possible to plan preventive actions (the scope and frequency of preventive repairs and replacements) in such a way as to ensure the required level of readiness of technical facilities and to reduce system maintenance costs. This requires the development and application of rational preventive repair and replacement strategies. For these reasons, the development of various preventive action strategies with the application of optimal decision models to reduce the system maintenance costs and the risk of adverse events is an important research topic in reliability engineering.

The concept of minimal repair first appeared in Morse's study [5], which considers a repair model whose criterion function is the monthly revenue generated by the technical facility under consideration. That model, however, was developed on the basis of queueing theory rather than reliability theory. The concept of minimal repair in relation to reliability theory was introduced by Barlow and Hunter [6]. Their paper considers a model of periodic replacements and minimal repairs in which it is assumed that, after each minimal repair, the damaged technical system is restored only to the same failure condition as before the damage. The concept of minimal repair was formally defined by Nakagawa and Kowada [7]. Brown and Proschan [8] also consider minimal repair, assuming that when a technical object is damaged, a perfect repair is performed with probability p, while a minimal repair is performed with probability q = 1 − p. A modified version of this model was proposed by Fontenot and Proschan [9]: the object is replaced with a new one after time T, and either a perfect repair or a minimal repair is performed with probabilities p and q, respectively, for intermittent failures.

In the literature one can find descriptions of models of systems with minimum repairs, which have been developed with the use of various methods and mathematical tools. An overview of the used modelling methods and the construction of criterion functions in models of minimum repairs with preventive maintenance can be found, for example, in the papers [10,11]. The papers classify and discuss models of maintenance strategies for technical objects developed for both finite and infinite time horizons, in which the criterion functions are total costs, unit costs, reliability and readiness. Most of the models presented in the literature have been developed on the basis of renewal theory, while less frequently with the use of stochastic processes, including Markov and semi-Markov process models. For example, in the paper [12] the criterion functions cost per unit time and system availability were determined on the basis of a semi-Markov model in an infinite time horizon, and in the paper [13] the model of the imperfect maintenance system was developed using the theory of Markov processes, and the readiness function is a criterion for optimisation.

In practice, the effectiveness of the realised repair is between AGAN and ABAO repair and it concerns the so-called imperfect maintenance/repairs. The methods concerning preventive repairs and replacements using the repair mechanism with an imperfect maintenance model with the (p, q) rule are extensively discussed in the paper [10]. The paper [14] presents the problem of imperfect repair with periodic preventive replacement. Models of preventive replacements by age are presented in papers [15,16]. In this type of model, it was assumed that the probabilities p and q depend on the age of the technical object at the time of failure, and that a thorough repair restores the technical object to the reliability state as for a new object, while a minimal repair restores the technical object to the reliability state just before failure. In the paper [17] it was shown that the PM policy limiting the possibility of failure can be more cost-effective than the PM policy implemented according to age, while the authors of the papers [18,19] analysed the imperfect repair system model with a delayed time concept.

Models of imperfect maintenance systems, using different age replacement policies that take into account different types of repairs after failure and their cost structures, have been presented in a number of papers. In [20], the authors consider replacement policies depending on the age of the system and the minimisation of repair costs. The authors of the papers [21,22] consider the age replacement policy of system subject to shocks in their models. Other papers consider policies that assume randomness of model parameters, e.g., papers [23,24] assume random repair costs. On the other hand, the paper [25] describes an age replacement policy with Bayesian imperfect repair model, in which the probability of an exact repair is a random variable with a specified distribution. A similar approach is adopted in the paper [26], where the optimal age replacement policy is determined in the case of minimising the cost per unit time. The results were obtained both for an infinite time horizon and for a single replacement cycle. The sequential imperfect preventive maintenance model for city buses is presented in the paper [27]. In this model, the optimal decision-making concerning the efficiency of maintenance of city buses is realized on the basis of the evaluation of the difference between the actual and expected increments of the intensity of damage.

The results presented in this paper are a continuation of the considerations presented in papers [4,28,29]. As in this paper, the results presented there were obtained from the study of semi-Markov models. In the paper [4], a 4-state model of replacements according to the age of technical objects with a guarantee was analysed, in which the criterion function is the cost of preventive replacement determined per unit of time, while in the paper [28] a multi-state model of exploitation decisions was developed, in which corrective repairs (after damage) and preventive replacements are carried out, and the profit per unit of time was used as the criterion function. A direct continuation of this research is presented in the paper [29], in which a 4-state model of a service system with minimal repair was considered. That model was developed using the theory of semi-Markov processes, and the theoretical considerations were illustrated with numerical examples based on assumed sample data. This paper, in turn, considers a 5-state semi-Markov model of the system of repairs and preventive replacements according to the age of city buses. The aim of the paper is to develop and study this model using two criterion functions, the profit per unit time and the coefficient of readiness of city buses to carry out the assigned transport tasks, and to formulate sufficient conditions for the existence of the maximum of these functions. The results of studies of the model, developed on the basis of real data, can be the basis for decision-making in the analysed system of exploitation of urban transport means. In the developed mathematical model, the basis for the construction of the criterion function is the limit theorem for semi-Markov processes [30,31].
In the developed model, four types of implemented urban bus renewal processes are considered. The conditions for the existence of the extremum (maximum) of the criterion functions were formulated for the adopted assumptions. The theoretical considerations presented in the paper are illustrated by the results of calculations. The calculation examples have been developed on the basis of operational data obtained from a real urban bus operation system. In the first example, for the estimated input data, the profit per unit time and the value of the readiness factor are maximised. In the second example, the analysed criterion functions are investigated for the case in which the number (frequency) of preventive replacements is higher than in the first example. In both calculation examples, it is assumed that the time to failure of the technical object (city bus) has a Weibull distribution.

#### **2. Description of the States of the Model of the Repair and Preventive Replacement System**

In the study the object of research is the bus exploitation system of public transport, in which technical objects (city buses) can stay in one of the five states of the considered model of the renewal system (repairs and preventive replacements):

State 1—the state of operational availability of the technical object—it is the state, when the technical object (a city bus) is fully fit and supplied and can perform the assigned transport task in accordance with the accepted assumptions, i.e., in accordance with the accepted plan and schedule of realization of the courses (in the paper this state is considered as the state of failure-free work of the technical object);

State 2—the state of repair by the Technical Rescue Service without the loss of the course—this is the state, when a damaged technical object (a city bus) is repaired during the realization of the transport task (on the route), which is carried out by the Technical Rescue Service; this type of repair is carried out in a "short" time interval or in the gaps between the successive realizations of the transport task (between the next courses), i.e., it does not cause loss of the course in accordance with the accepted plan and schedule of transport realization—it is assumed that this condition does not cause any disturbance (break) in realization of assigned transport tasks (in the paper this condition is considered as a condition of minimal repair of a technical object);

State 3—the state of the repair by the Technical Rescue Unit with the loss of the course—it is the state, when a damaged technical object (a city bus) is repaired during the realization of the transport task (on the route), which is realized by the Technical Rescue Unit; this type of repair is realized in a "longer" time interval than the repair realized in the state 2 of the model, thus the repair causes the loss of the course in accordance with the plan and schedule of transportation realization—it is assumed that this state causes a disruption (break) in the realization of the assigned transportation tasks (in the paper this state is considered as the state of the minimal repair of the technical object);

State 4—state of repair at the Service Station—it is the state, when a damaged technical object (a city bus) is subject to repair at the specialized service and repair stands of the Service Station assigned for this purpose—it is assumed that this state causes a disturbance (break) in the realization of the assigned transport tasks (in the paper, this state is considered as the state of perfect repair of a technical object);

State 5—the state of preventive replacement—it is the state in which the technical object (a city bus) is subject to preventive maintenance after a specific hourly mileage and in accordance with the adopted operational strategy (according to the resource), and the elements and sub-assemblies of the technical object are replaced; it is assumed that this state does not cause a disruption (break) in the realization of the assigned transport tasks (in the paper, this state is considered as the state of preventive age-replacement).

Figure 1 shows a directed graph of the mapping of the state changes of the renewal system model (preventive repairs and replacements) of the considered technical objects (city buses).

**Figure 1.** Directed graph representation of the state changes of a city bus renewal system model with state space S = {1, 2, 3, 4, 5}.

#### **3. Determination of the Criterion Function**

For the directed graph shown in Figure 1, a mathematical model was built assuming that it is a stochastic process X(t). The mathematical model was developed using the theory of semi-Markov processes [30,31]. The paper considers a 5-state semi-Markov model of renewals (preventive repairs and replacements) with a state space S = {1, 2, 3, 4, 5}. If X(t) = i, then the technical object under consideration at time t is in state i.

In the case when the transition probabilities between the states of the modelled process are known, it is possible to determine the Markov chain embedded in the semi-Markov process. The transition matrix of the Markov chain for the model under consideration has the form

$$\mathbf{P} = \begin{bmatrix} 0 & \mathbf{p}\_{12} & \mathbf{p}\_{13} & \mathbf{p}\_{14} & \mathbf{p}\_{15} \\ \mathbf{p}\_{21} & 0 & 0 & 0 & 0 \\ \mathbf{p}\_{31} & 0 & 0 & \mathbf{p}\_{34} & 0 \\ \mathbf{p}\_{41} & 0 & 0 & 0 & 0 \\ \mathbf{p}\_{51} & 0 & 0 & 0 & 0 \end{bmatrix} \tag{1}$$

where

pij, i, j = 1, 2, 3, 4, 5—probability of transition from state i to state j.
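As an illustration only, the structure of Matrix (1) can be written down in a few lines of Python. The helper name `embedded_chain_matrix` and all numerical probabilities below are hypothetical and are not taken from the paper. The sketch relies on the fact that every row of a transition matrix sums to one, which forces p21 = p41 = p51 = 1 and p31 = 1 − p34.

```python
from typing import List


def embedded_chain_matrix(p12: float, p13: float, p14: float, p15: float,
                          p34: float) -> List[List[float]]:
    """Build the 5x5 transition matrix with the zero pattern of Equation (1).

    Rows 2, 4 and 5 have a single non-zero entry (a return to state 1),
    so that entry must equal 1; row 3 splits between states 1 and 4.
    """
    first_row = [0.0, p12, p13, p14, p15]
    if abs(sum(first_row) - 1.0) > 1e-12:
        raise ValueError("p12 + p13 + p14 + p15 must equal 1")
    if not 0.0 <= p34 <= 1.0:
        raise ValueError("p34 must lie in [0, 1]")
    return [
        first_row,
        [1.0, 0.0, 0.0, 0.0, 0.0],        # state 2 -> state 1
        [1.0 - p34, 0.0, 0.0, p34, 0.0],  # state 3 -> state 1 or state 4
        [1.0, 0.0, 0.0, 0.0, 0.0],        # state 4 -> state 1
        [1.0, 0.0, 0.0, 0.0, 0.0],        # state 5 -> state 1
    ]


# Hypothetical probabilities, chosen only for illustration.
P = embedded_chain_matrix(p12=0.50, p13=0.25, p14=0.15, p15=0.10, p34=0.40)
row_sums = [sum(row) for row in P]
print(row_sums)
```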

To determine the limiting probabilities for a Markov chain, the following matrix system must be solved:

$$\mathbf{P}^{\mathrm{T}} \cdot \Pi = \Pi, \qquad \text{i.e.,} \qquad \begin{bmatrix} 0 & \mathbf{p}_{21} & \mathbf{p}_{31} & \mathbf{p}_{41} & \mathbf{p}_{51} \\ \mathbf{p}_{12} & 0 & 0 & 0 & 0 \\ \mathbf{p}_{13} & 0 & 0 & 0 & 0 \\ \mathbf{p}_{14} & 0 & \mathbf{p}_{34} & 0 & 0 \\ \mathbf{p}_{15} & 0 & 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} \pi_{1} \\ \pi_{2} \\ \pi_{3} \\ \pi_{4} \\ \pi_{5} \end{bmatrix} = \begin{bmatrix} \pi_{1} \\ \pi_{2} \\ \pi_{3} \\ \pi_{4} \\ \pi_{5} \end{bmatrix} \tag{2}$$

where

πi, i = 1, 2, 3, 4, 5—the limiting probability of the Markov chain embedded in the semi-Markov process.

The matrix system (2) can be replaced by the system of linear equations (4), in which, in order to obtain a unique solution, the normalization condition (3) is introduced

$$\sum\_{\mathbf{i}} \pi\_{\mathbf{i}} = 1 \tag{3}$$

The system of linear equations (4) then takes the form

$$\begin{cases} \pi\_1 + \pi\_2 + \pi\_3 + \pi\_4 + \pi\_5 = 1 \\ \mathbf{p}\_{12} \cdot \pi\_1 = \pi\_2 \\ \mathbf{p}\_{13} \cdot \pi\_1 = \pi\_3 \\ \mathbf{p}\_{14} \cdot \pi\_1 + \mathbf{p}\_{34} \cdot \pi\_3 = \pi\_4 \\ \mathbf{p}\_{15} \cdot \pi\_1 = \pi\_5 \end{cases} \tag{4}$$

As a result of solving the system of linear equations (4), formulas representing the limiting probabilities for the analysed Markov chain were obtained:

$$\begin{aligned} \pi_1 &= \frac{1}{\mathbf{m}} \\ \pi_2 &= \frac{\mathbf{p}_{12}}{\mathbf{m}} \\ \pi_3 &= \frac{\mathbf{p}_{13}}{\mathbf{m}} \\ \pi_4 &= \frac{\mathbf{p}_{13} \cdot \mathbf{p}_{34} + \mathbf{p}_{14}}{\mathbf{m}} \\ \pi_5 &= \frac{\mathbf{p}_{15}}{\mathbf{m}} \end{aligned} \tag{5}$$

where

$$\mathbf{m} = 1 + \mathbf{p}_{12} + \mathbf{p}_{13} \cdot (1 + \mathbf{p}_{34}) + \mathbf{p}_{14} + \mathbf{p}_{15}$$
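The closed-form probabilities can be checked numerically. The following sketch, with hypothetical function names and made-up transition probabilities, computes the πi as the unnormalised terms 1, p12, p13, p13·p34 + p14 and p15 divided by their sum (which is what the normalization condition (3) requires), and then verifies that they satisfy the balance equations of System (2).

```python
from typing import List, Sequence


def limiting_probabilities(p12: float, p13: float, p14: float, p15: float,
                           p34: float) -> List[float]:
    """Return [pi_1, ..., pi_5] of the embedded Markov chain."""
    weights = [1.0, p12, p13, p13 * p34 + p14, p15]
    m = sum(weights)  # normalising constant, so that the pi_i sum to one
    return [w / m for w in weights]


def is_stationary(pi: Sequence[float], P: Sequence[Sequence[float]],
                  tol: float = 1e-12) -> bool:
    """Check pi_j == sum_i pi_i * p_ij for every state j (P^T * pi = pi)."""
    n = len(pi)
    return all(
        abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) <= tol
        for j in range(n)
    )


# Hypothetical probabilities, chosen only for illustration.
p12, p13, p14, p15, p34 = 0.50, 0.25, 0.15, 0.10, 0.40
P = [
    [0.0, p12, p13, p14, p15],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0 - p34, 0.0, 0.0, p34, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]
pi = limiting_probabilities(p12, p13, p14, p15, p34)
print(pi, is_stationary(pi, P))
```

Since p12 + p13 + p14 + p15 = 1, the normalising constant in this example reduces to 2 + p13·p34 = 2.1.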

In this paper, a semi-Markov model of renewals (corrective maintenance and preventive age-replacement) is analysed. A 5-state semi-Markov process X(t) with state space S = {1, 2, 3, 4, 5} is considered. Let zi, i = 1, 2, 3, 4, 5, denote the profit (cost) per time unit for state i. It is assumed that z1 > 0 and zi < 0 for i = 2, 3, 4, 5. This means that if the technical object is in state 1, a profit is generated, while if the technical object is in state i = 2, 3, 4, 5, a cost (loss) is generated. In the paper [28] it was proved that the total profit (loss) per unit of time generated in the system is expressed by the formula

$$\mathbf{Z} = \frac{\sum\_{\mathbf{i}} \pi\_{\mathbf{i}} \cdot \mathbf{E} \mathbf{T}\_{\mathbf{i}} \cdot \mathbf{z}\_{\mathbf{i}}}{\sum\_{\mathbf{i}} \pi\_{\mathbf{i}} \cdot \mathbf{E} \mathbf{T}\_{\mathbf{i}}} \tag{6}$$

where

ETi, i = 1, 2, 3, 4, 5–average time spent in state i.
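Formula (6) is a time-weighted average of the unit profits zi, with weights πi·ETi. A minimal sketch of its evaluation is given below; the function name and all input values are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def profit_per_unit_time(pi: Sequence[float],
                         mean_times: Sequence[float],
                         unit_profits: Sequence[float]) -> float:
    """Evaluate Z of Equation (6); index 0 corresponds to state 1."""
    if not (len(pi) == len(mean_times) == len(unit_profits)):
        raise ValueError("all inputs must have one entry per state")
    numerator = sum(p * t * z for p, t, z in zip(pi, mean_times, unit_profits))
    denominator = sum(p * t for p, t in zip(pi, mean_times))
    return numerator / denominator


# Hypothetical inputs (not taken from the paper).
pi = [1 / 2.1, 0.5 / 2.1, 0.25 / 2.1, 0.25 / 2.1, 0.1 / 2.1]
mean_times = [40.0, 0.2, 0.5, 3.0, 1.0]            # ET_i, e.g. in hours
unit_profits = [50.0, -20.0, -40.0, -60.0, -30.0]  # z_1 > 0, z_i < 0 otherwise
Z = profit_per_unit_time(pi, mean_times, unit_profits)
print(f"Z = {Z:.3f} per time unit")
```

Because the common factor of the πi cancels between numerator and denominator, Z depends on the limiting probabilities only through their ratios.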

A technical object is subject to renewal at age x or when it is damaged, whichever comes first. Let T1 denote the time to damage of the technical object, and let T1(x) denote the time to its replacement or damage (repair). The variable T1(x) is defined as follows

$$\mathbf{T}_1(\mathbf{x}) = \begin{cases} \mathbf{T}_1, & \text{if } \mathbf{T}_1 < \mathbf{x} \\ \mathbf{x}, & \text{if } \mathbf{T}_1 \ge \mathbf{x} \end{cases} \tag{7}$$

It is assumed that after time x, if the technical object has not failed, it transitions to the preventive replacement state. The process of changing states i = 1, 2, 3, 4, 5, given preventive replacement after time x, is a new semi-Markov process with a matrix P(x) of transition probabilities of the Markov chain embedded in the semi-Markov process. Compared with the matrix P given by (1), only the first row changes, so the matrix P(x) takes the form

$$\mathbf{P}(\mathbf{x}) = \begin{bmatrix} 0 & \mathbf{p}\_{12}(\mathbf{x}) & \mathbf{p}\_{13}(\mathbf{x}) & \mathbf{p}\_{14}(\mathbf{x}) & \mathbf{p}\_{15}(\mathbf{x}) \\ \mathbf{p}\_{21} & 0 & 0 & 0 & 0 \\ \mathbf{p}\_{31} & 0 & 0 & \mathbf{p}\_{34} & 0 \\ \mathbf{p}\_{41} & 0 & 0 & 0 & 0 \\ \mathbf{p}\_{51} & 0 & 0 & 0 & 0 \end{bmatrix} \tag{8}$$

while the limiting probabilities of the Markov chain (determined analogously to Formulae (5)) can be presented as:

$$\begin{aligned} \pi\_1(\mathbf{x}) &= \frac{1}{\mathbf{n}} \\ \pi\_2(\mathbf{x}) &= \frac{\mathbf{p}\_{12}(\mathbf{x})}{\mathbf{n}} \\ \pi\_3(\mathbf{x}) &= \frac{\mathbf{p}\_{13}(\mathbf{x})}{\mathbf{n}} \\ \pi\_4(\mathbf{x}) &= \frac{\mathbf{p}\_{13}(\mathbf{x}) \cdot \mathbf{p}\_{34} + \mathbf{p}\_{14}(\mathbf{x})}{\mathbf{n}} \\ \pi\_5(\mathbf{x}) &= \frac{\mathbf{p}\_{15}(\mathbf{x})}{\mathbf{n}} \end{aligned} \tag{9}$$

where

$$\mathbf{n} = 1 + \mathbf{p}_{12}(\mathbf{x}) + \mathbf{p}_{13}(\mathbf{x}) \cdot \left(1 + \mathbf{p}_{34}\right) + \mathbf{p}_{14}(\mathbf{x}) + \mathbf{p}_{15}(\mathbf{x})$$

Based on the paper [28], the Criterion Function (6) is of the form:

$$\mathbf{Z} = \mathbf{g}(\mathbf{x}) = \frac{\pi\_1(\mathbf{x}) \cdot \mathbf{ET\_1(x)} \cdot \mathbf{z\_1} + \pi\_2(\mathbf{x}) \cdot \mathbf{ET\_2} \cdot \mathbf{z\_2} + \pi\_3(\mathbf{x}) \cdot \mathbf{ET\_3} \cdot \mathbf{z\_3} + \pi\_4(\mathbf{x}) \cdot \mathbf{ET\_4} \cdot \mathbf{z\_4} + \pi\_5(\mathbf{x}) \cdot \mathbf{ET\_5} \cdot \mathbf{z\_5}}{\pi\_1(\mathbf{x}) \cdot \mathbf{ET\_1(x)} + \pi\_2(\mathbf{x}) \cdot \mathbf{ET\_2} + \pi\_3(\mathbf{x}) \cdot \mathbf{ET\_3} + \pi\_4(\mathbf{x}) \cdot \mathbf{ET\_4} + \pi\_5(\mathbf{x}) \cdot \mathbf{ET\_5}} \tag{10}$$
where

ET1(x)–average value of time spent in state 1, calculated from the formula [30,31]

$$\mathbf{ET}_1(\mathbf{x}) = \int_0^{\mathbf{x}} \mathbf{t} \, \mathbf{dF}_1(\mathbf{t}) + \mathbf{x} \cdot \mathbf{P}\{\mathbf{T}_1 \ge \mathbf{x}\} = \int_0^{\mathbf{x}} \mathbf{R}_1(\mathbf{t}) \, \mathbf{dt} \tag{11}$$

ET2, ET3, ET4 and ET5–average values of times spent in states 2, 3, 4 and 5, respectively. In particular, based on the paper [28], it can be written:

$$\begin{aligned} \mathbf{p}_{12}(\mathbf{x}) &= \mathbf{p}_{12} \cdot \mathbf{F}_{12}(\mathbf{x}) \\ \mathbf{p}_{13}(\mathbf{x}) &= \mathbf{p}_{13} \cdot \mathbf{F}_{13}(\mathbf{x}) \\ \mathbf{p}_{14}(\mathbf{x}) &= \mathbf{p}_{14} \cdot \mathbf{F}_{14}(\mathbf{x}) \\ \mathbf{p}_{15}(\mathbf{x}) &= \mathbf{p}_{15} \cdot \mathbf{F}_{15}(\mathbf{x}) + \mathbf{R}_1(\mathbf{x}) \end{aligned} \tag{12}$$

where:

F1j(x), j = 2, 3, 4, 5—conditional distributions of the time spent in state 1, provided that the next state is state j, defined as follows [30,31]

$$\mathbf{F}_{\mathbf{ij}}(\mathbf{t}) = \mathbf{P}\{\tau_{\mathbf{k}+1} - \tau_{\mathbf{k}} < \mathbf{t} \mid \mathbf{X}(\tau_{\mathbf{k}+1}) = \mathbf{j}, \mathbf{X}(\tau_{\mathbf{k}}) = \mathbf{i}\}, \quad \text{for } \mathbf{i}, \mathbf{j} = 1, 2, 3, 4, 5 \tag{13}$$

R1(x) = 1 − F1(x)—reliability function of the random variable T1.
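The mean time in state 1 is the integral of the reliability function R1 from 0 to x, which usually has to be evaluated numerically. A minimal sketch follows, assuming a Weibull time to failure (as in the paper's calculation examples) and a composite Simpson rule; the function names, the shape and scale values and the number of integration steps are illustrative assumptions.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """R1(t) = exp(-(t / scale) ** shape) for a Weibull time to failure."""
    return math.exp(-((t / scale) ** shape))


def mean_time_in_state_1(x: float, shape: float, scale: float,
                         steps: int = 2000) -> float:
    """ET1(x) = integral of R1(t) over [0, x], by the composite Simpson rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of sub-intervals
    h = x / steps
    total = weibull_reliability(0.0, shape, scale) + weibull_reliability(x, shape, scale)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * weibull_reliability(k * h, shape, scale)
    return total * h / 3.0


# Hypothetical parameters: shape 2 (increasing failure rate), scale 100 time units.
et1 = mean_time_in_state_1(x=50.0, shape=2.0, scale=100.0)
print(f"ET1(50) = {et1:.3f}")
```

For shape = 1 the Weibull distribution is exponential and the integral has the closed form scale·(1 − exp(−x/scale)), which gives a convenient check of the numerical routine.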

Additionally, in order to simplify further considerations, it has been assumed that the following equations are true

$$\mathbf{F\_{12}(x) = F\_{13}(x) = F\_{14}(x) = F\_{15}(x) = F\_1(x)}\tag{14}$$
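A short check, added here as a worked step rather than taken from the source, shows that under Assumption (14) the first row of P(x) still sums to one. It uses only p12 + p13 + p14 + p15 = 1 and R1(x) = 1 − F1(x):

$$\begin{aligned} \sum_{\mathbf{j}=2}^{5} \mathbf{p}_{1\mathbf{j}}(\mathbf{x}) &= \left(\mathbf{p}_{12} + \mathbf{p}_{13} + \mathbf{p}_{14} + \mathbf{p}_{15}\right) \cdot \mathbf{F}_1(\mathbf{x}) + \mathbf{R}_1(\mathbf{x}) = \mathbf{F}_1(\mathbf{x}) + 1 - \mathbf{F}_1(\mathbf{x}) = 1 \\ \mathbf{p}_{15}(\mathbf{x}) &= \mathbf{p}_{15} \cdot \mathbf{F}_1(\mathbf{x}) + 1 - \mathbf{F}_1(\mathbf{x}) = 1 - \left(\mathbf{p}_{12} + \mathbf{p}_{13} + \mathbf{p}_{14}\right) \cdot \mathbf{F}_1(\mathbf{x}) \end{aligned}$$

The second line is the weight attached to state 5 when the criterion function is written out in terms of F1(x).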

Considering the above, the Criterion Function (10) is expressed by the formula

$$\mathbf{g}(\mathbf{x}) = \frac{\begin{aligned} &\mathbf{ET}_1(\mathbf{x}) \cdot \mathbf{z}_1 + \mathbf{p}_{12} \cdot \mathbf{F}_1(\mathbf{x}) \cdot \mathbf{ET}_2 \cdot \mathbf{z}_2 + \mathbf{p}_{13} \cdot \mathbf{F}_1(\mathbf{x}) \cdot \mathbf{ET}_3 \cdot \mathbf{z}_3 \\ &+ \left(\mathbf{p}_{13} \cdot \mathbf{p}_{34} + \mathbf{p}_{14}\right) \cdot \mathbf{F}_1(\mathbf{x}) \cdot \mathbf{ET}_4 \cdot \mathbf{z}_4 + \left[1 - \left(\mathbf{p}_{12} + \mathbf{p}_{13} + \mathbf{p}_{14}\right) \cdot \mathbf{F}_1(\mathbf{x})\right] \cdot \mathbf{ET}_5 \cdot \mathbf{z}_5 \end{aligned}}{\begin{aligned} &\mathbf{ET}_1(\mathbf{x}) + \mathbf{p}_{12} \cdot \mathbf{F}_1(\mathbf{x}) \cdot \mathbf{ET}_2 + \mathbf{p}_{13} \cdot \mathbf{F}_1(\mathbf{x}) \cdot \mathbf{ET}_3 \\ &+ \left(\mathbf{p}_{13} \cdot \mathbf{p}_{34} + \mathbf{p}_{14}\right) \cdot \mathbf{F}_1(\mathbf{x}) \cdot \mathbf{ET}_4 + \left[1 - \left(\mathbf{p}_{12} + \mathbf{p}_{13} + \mathbf{p}_{14}\right) \cdot \mathbf{F}_1(\mathbf{x})\right] \cdot \mathbf{ET}_5 \end{aligned}}$$

or after regrouping, it can be represented as

$$\mathbf{g}(\mathbf{x}) = \frac{\begin{aligned} &\mathbf{z}_1 \cdot \mathbf{ET}_1(\mathbf{x}) + \left[\mathbf{p}_{12} \cdot \mathbf{ET}_2 \cdot \mathbf{z}_2 + \mathbf{p}_{13} \cdot \mathbf{ET}_3 \cdot \mathbf{z}_3 + \left(\mathbf{p}_{13} \cdot \mathbf{p}_{34} + \mathbf{p}_{14}\right) \cdot \mathbf{ET}_4 \cdot \mathbf{z}_4 \right. \\ &\left. - \left(\mathbf{p}_{12} + \mathbf{p}_{13} + \mathbf{p}_{14}\right) \cdot \mathbf{ET}_5 \cdot \mathbf{z}_5\right] \cdot \mathbf{F}_1(\mathbf{x}) + \mathbf{ET}_5 \cdot \mathbf{z}_5 \end{aligned}}{\begin{aligned} &\mathbf{ET}_1(\mathbf{x}) + \left[\mathbf{p}_{12} \cdot \mathbf{ET}_2 + \mathbf{p}_{13} \cdot \mathbf{ET}_3 + \left(\mathbf{p}_{13} \cdot \mathbf{p}_{34} + \mathbf{p}_{14}\right) \cdot \mathbf{ET}_4 \right. \\ &\left. - \left(\mathbf{p}_{12} + \mathbf{p}_{13} + \mathbf{p}_{14}\right) \cdot \mathbf{ET}_5\right] \cdot \mathbf{F}_1(\mathbf{x}) + \mathbf{ET}_5 \end{aligned}}$$

Representing the numerator and the denominator of the criterion function as follows:

$$\mathbf{L}(\mathbf{x}) = \mathbf{A}\_1 \cdot \mathbf{E} \mathbf{T}\_1(\mathbf{x}) + \mathbf{B}\_1 \cdot \mathbf{F}\_1(\mathbf{x}) + \mathbf{C}\_1$$

$$\mathbf{M}(\mathbf{x}) = \mathbf{A} \cdot \mathbf{E} \mathbf{T}\_1(\mathbf{x}) + \mathbf{B} \cdot \mathbf{F}\_1(\mathbf{x}) + \mathbf{C}$$

the Criterion Function (10) can be represented analogously as

$$\mathbf{g}(\mathbf{x}) = \frac{\mathbf{A}\_1 \cdot \mathbf{E} \mathbf{T}\_1(\mathbf{x}) + \mathbf{B}\_1 \cdot \mathbf{F}\_1(\mathbf{x}) + \mathbf{C}\_1}{\mathbf{A} \cdot \mathbf{E} \mathbf{T}\_1(\mathbf{x}) + \mathbf{B} \cdot \mathbf{F}\_1(\mathbf{x}) + \mathbf{C}}.$$

where:

$$\mathbf{A}\_{1} = \mathbf{z}\_{1}$$

$$\mathbf{B}\_{1} = \mathbf{p}\_{12} \cdot \mathbf{E} \mathbf{T}\_{2} \cdot \mathbf{z}\_{2} + \mathbf{p}\_{13} \cdot \mathbf{E} \mathbf{T}\_{3} \cdot \mathbf{z}\_{3} + \left(\mathbf{p}\_{13} \cdot \mathbf{p}\_{34} + \mathbf{p}\_{14}\right) \cdot \mathbf{E} \mathbf{T}\_{4} \cdot \mathbf{z}\_{4} - \left(\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14}\right) \cdot \mathbf{E} \mathbf{T}\_{5} \cdot \mathbf{z}\_{5}$$

$$\mathbf{C}\_{1} = \mathbf{E} \mathbf{T}\_{5} \cdot \mathbf{z}\_{5}$$

$$\mathbf{A} = 1$$

$$\mathbf{B} = \mathbf{p}\_{12} \cdot \mathbf{E} \mathbf{T}\_{2} + \mathbf{p}\_{13} \cdot \mathbf{E} \mathbf{T}\_{3} + \left(\mathbf{p}\_{13} \cdot \mathbf{p}\_{34} + \mathbf{p}\_{14}\right) \cdot \mathbf{E} \mathbf{T}\_{4} - \left(\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14}\right) \cdot \mathbf{E} \mathbf{T}\_{5}$$

$$\mathbf{C} = \mathbf{E} \mathbf{T}\_{5}$$
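The compact form of the criterion function can be evaluated numerically once the coefficients are known. The following is a minimal sketch, not the authors' implementation: it assumes a Weibull time to failure and takes ET1(x) to be the integral of the reliability function R1(t) = 1 − F1(t) over [0, x]. The function names and all parameter values are hypothetical placeholders, since the model's input data are not reproduced in this section.

```python
import math

def weibull_cdf(x, scale, shape):
    """F1(x): probability of failure before age x (Weibull assumption)."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def truncated_mean(x, scale, shape, steps=2000):
    """ET1(x): integral of R1(t) = 1 - F1(t) over [0, x], trapezoidal rule."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    total = 0.5 * (1.0 + (1.0 - weibull_cdf(x, scale, shape)))
    for k in range(1, steps):
        total += 1.0 - weibull_cdf(k * h, scale, shape)
    return total * h

def coefficients(p12, p13, p14, p34, ET, z):
    """Return (A1, B1, C1, A, B, C); ET and z are dicts keyed by state number."""
    w4 = p13 * p34 + p14          # weight of state 4
    s = p12 + p13 + p14           # weight of the preventive-replacement term
    A1 = z[1]
    B1 = (p12 * ET[2] * z[2] + p13 * ET[3] * z[3]
          + w4 * ET[4] * z[4] - s * ET[5] * z[5])
    C1 = ET[5] * z[5]
    A = 1.0
    B = p12 * ET[2] + p13 * ET[3] + w4 * ET[4] - s * ET[5]
    C = ET[5]
    return A1, B1, C1, A, B, C

def g(x, coef, scale, shape):
    """Criterion function g(x) = L(x) / M(x)."""
    A1, B1, C1, A, B, C = coef
    et1 = truncated_mean(x, scale, shape)
    f1 = weibull_cdf(x, scale, shape)
    return (A1 * et1 + B1 * f1 + C1) / (A * et1 + B * f1 + C)

# Hypothetical inputs (illustration only, not the paper's data).
ET = {2: 0.5, 3: 1.0, 4: 3.0, 5: 0.4}                 # mean sojourn times [h]
z = {1: 20.0, 2: -5.0, 3: -15.0, 4: -30.0, 5: -10.0}  # unit profits (costs)
coef = coefficients(0.5, 0.3, 0.2, 0.4, ET, z)
profit_at_9h = g(9.0, coef, scale=10.0, shape=11.5)
```

Setting z1 = 1 and all other unit profits to zero turns the same function into the availability factor discussed later in this section.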

#### **4. Conditions for the Existence of a Maximum of the Criterion Function**

*4.1. Maximum of the Criterion Function—General Analysis*

The conditions for the existence of an extremum (maximum) of the Criterion Function (10) will be formulated depending on the parameters of the developed semi-Markov model of the renewal system (repairs and preventive replacements), i.e., the elements of the matrix of probabilities of changes in the states of the model P = [pij], i, j = 1, 2, 3, 4, 5, the average staying times in the states of the model ETi, i = 1, 2, 3, 4, 5, and the unit profits (costs) generated in the states of the model zi, i = 1, 2, 3, 4, 5. The considered parameters are the input data of the model, and their values depend on the category and type of the analysed technical objects, the adopted operation strategy, and specific operating conditions in which the repair and preventive replacement processes are carried out.

The assumptions regarding the values of the parameters of the examined system are defined below. The adopted assumptions must take into account the real relations occurring between the parameters characterizing the implemented processes of repair of the damaged technical objects and the preventive replacement:


The above assumptions do not define the relationship between the state of repair performed by the Technical Service unit with loss of the course (state 3) and the state of preventive replacement (state 5). In the analysed system it is very difficult to unambiguously define the relation between the average values of ET3 and ET5 times and unit costs z3 and z5. However, based on the results of research on other systems of exploitation of this class of technical objects (means of transport), an additional assumption can be made regarding the unit costs generated in states 3 and 5:

• Z12: z3 < z5; means that the unit cost generated in the state 3 (the state of repair performed by the Technical Service unit with the loss of the course) is lower than the unit cost generated in the state 5 (the state of preventive replacement).

In the subsequent part of the paper, the following coefficients have been introduced to formulate the conditions for the existence of the extremum (maximum) of the criterion function (10):

$$\alpha = \mathbf{A}\mathbf{B}\_1 - \mathbf{A}\_1\mathbf{B} = \mathbf{B}\_1 - \mathbf{z}\_1\mathbf{B}$$

$$\beta = \mathbf{A}\_1\mathbf{C} - \mathbf{A}\mathbf{C}\_1 = \mathbf{z}\_1\mathbf{C} - \mathbf{C}\_1$$

$$\gamma = \mathbf{B}\_1\mathbf{C} - \mathbf{B}\mathbf{C}\_1$$

where:

$$\mathbf{A}\_{1} = \mathbf{z}\_{1}$$

$$\mathbf{B}\_{1} = \mathbf{p}\_{12}\mathbf{E}\mathbf{T}\_{2}\mathbf{z}\_{2} + \mathbf{p}\_{13}\mathbf{E}\mathbf{T}\_{3}\mathbf{z}\_{3} + \left(\mathbf{p}\_{13}\mathbf{p}\_{34} + \mathbf{p}\_{14}\right)\mathbf{E}\mathbf{T}\_{4}\mathbf{z}\_{4} - \left(\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14}\right)\mathbf{E}\mathbf{T}\_{5}\mathbf{z}\_{5}$$

$$\mathbf{C}\_{1} = \mathbf{E}\mathbf{T}\_{5}\mathbf{z}\_{5}$$

$$\mathbf{A} = 1$$

$$\mathbf{B} = \mathbf{p}\_{12}\mathbf{E}\mathbf{T}\_{2} + \mathbf{p}\_{13}\mathbf{E}\mathbf{T}\_{3} + \left(\mathbf{p}\_{13}\mathbf{p}\_{34} + \mathbf{p}\_{14}\right)\mathbf{E}\mathbf{T}\_{4} - \left(\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14}\right)\mathbf{E}\mathbf{T}\_{5}$$

$$\mathbf{C} = \mathbf{E}\mathbf{T}\_{5}$$

The coefficients α, β, and γ play an important role in formulating sufficient conditions for the existence of an extremum of the criterion function. To this end, sufficient conditions for the inequalities α < 0, β > 0, and γ < 0 to hold are formulated below:

• the coefficient α is determined by the formula

$$\begin{aligned} \alpha &= \mathbf{p}\_{12} \mathbf{E} \mathbf{T}\_2 (\mathbf{z}\_2 - \mathbf{z}\_1) + \mathbf{p}\_{13} \mathbf{E} \mathbf{T}\_3 (\mathbf{z}\_3 - \mathbf{z}\_1) + (\mathbf{p}\_{13} \mathbf{p}\_{34} + \mathbf{p}\_{14}) \mathbf{E} \mathbf{T}\_4 (\mathbf{z}\_4 - \mathbf{z}\_1) \\ &+ (\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14}) \mathbf{E} \mathbf{T}\_5 (\mathbf{z}\_1 - \mathbf{z}\_5) \end{aligned} \tag{15}$$

The inequality α < 0 is equivalent to the inequality

$$\begin{aligned} \left(\mathbf{p}\_{13} \mathbf{p}\_{34} + \mathbf{p}\_{14}\right) \text{ET}\_{4} &> \mathbf{p}\_{12} \text{ET}\_{2}(\mathbf{z}\_{2} - \mathbf{z}\_{1})/(\mathbf{z}\_{1} - \mathbf{z}\_{4}) + \mathbf{p}\_{13} \text{ET}\_{3}(\mathbf{z}\_{3} - \mathbf{z}\_{1})/(\mathbf{z}\_{1} - \mathbf{z}\_{4}) \\ &+ (\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14}) \text{ET}\_{5}(\mathbf{z}\_{1} - \mathbf{z}\_{5})/(\mathbf{z}\_{1} - \mathbf{z}\_{4}) \end{aligned} \tag{16}$$

• the coefficient β is determined by the formula

$$\beta = \text{ET}\_{\mathsf{5}}(\mathbf{z}\_{1} - \mathbf{z}\_{\mathsf{5}}) \tag{17}$$

Based on assumption Z1, it follows that β > 0.

• the coefficient γ is determined by the formula

$$\gamma = \left[ \mathbf{p}\_{12} \mathbf{E} \mathbf{T}\_2 (\mathbf{z}\_2 - \mathbf{z}\_5) + \mathbf{p}\_{13} \mathbf{E} \mathbf{T}\_3 (\mathbf{z}\_3 - \mathbf{z}\_5) + \left( \mathbf{p}\_{13} \mathbf{p}\_{34} + \mathbf{p}\_{14} \right) \mathbf{E} \mathbf{T}\_4 (\mathbf{z}\_4 - \mathbf{z}\_5) \right] \mathbf{E} \mathbf{T}\_5 \tag{18}$$

The inequality γ < 0 is equivalent to the inequality

$$\left(\mathbf{p}\_{13} \mathbf{p}\_{34} + \mathbf{p}\_{14}\right) \mathbf{E} \mathbf{T}\_4 > \mathbf{p}\_{12} \mathbf{E} \mathbf{T}\_2 (\mathbf{z}\_2 - \mathbf{z}\_5) / (\mathbf{z}\_5 - \mathbf{z}\_4) + \mathbf{p}\_{13} \mathbf{E} \mathbf{T}\_3 (\mathbf{z}\_3 - \mathbf{z}\_5) / (\mathbf{z}\_5 - \mathbf{z}\_4) \tag{19}$$

In practice, it is difficult to determine unambiguously the relation between the state of repair by the Technical Service unit with course loss (state 3) and the state of preventive replacement (state 5), i.e., the relations between the average times ET3 and ET5 and between the unit costs z3 and z5 are unknown. With respect to the coefficient γ, inequality (19) must therefore be considered in the same way as inequality (16) for the coefficient α. Let δ1 and δ2 denote the right-hand sides of inequalities (16) and (19), respectively, and let δ = max{δ1, δ2}. Then the condition (p13 p34 + p14) ET4 > δ, together with formulas (15), (16), (18) and (19), implies the inequalities α < 0 and γ < 0. From this, the following conclusion can be made:

**Conclusion 1.** *If p34 > (δ/ET4 − p14)/p13, then the inequalities α < 0, γ < 0 are true.*
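The sign conditions can be checked numerically for a given set of inputs. The sketch below evaluates formulas (15), (17) and (18), the right-hand sides δ1 and δ2 of inequalities (16) and (19), and the resulting lower bound on p34 obtained by solving (p13 p34 + p14) ET4 > δ for p34. It assumes z1 > z4 and z5 > z4 (needed for the divisions in (16) and (19) to preserve the inequality direction); the helper names and the numerical values are hypothetical and serve only as an illustration.

```python
def alpha_beta_gamma(p12, p13, p14, p34, ET, z):
    """Coefficients from formulas (15), (17) and (18)."""
    w4 = p13 * p34 + p14
    s = p12 + p13 + p14
    alpha = (p12 * ET[2] * (z[2] - z[1]) + p13 * ET[3] * (z[3] - z[1])
             + w4 * ET[4] * (z[4] - z[1]) + s * ET[5] * (z[1] - z[5]))
    beta = ET[5] * (z[1] - z[5])
    gamma = (p12 * ET[2] * (z[2] - z[5]) + p13 * ET[3] * (z[3] - z[5])
             + w4 * ET[4] * (z[4] - z[5])) * ET[5]
    return alpha, beta, gamma

def p34_lower_bound(p12, p13, p14, ET, z):
    """Bound on p34 from (p13*p34 + p14)*ET4 > max(delta1, delta2)."""
    s = p12 + p13 + p14
    # Right-hand side of (16); assumes z1 > z4.
    delta1 = (p12 * ET[2] * (z[2] - z[1]) + p13 * ET[3] * (z[3] - z[1])
              + s * ET[5] * (z[1] - z[5])) / (z[1] - z[4])
    # Right-hand side of (19); assumes z5 > z4.
    delta2 = (p12 * ET[2] * (z[2] - z[5])
              + p13 * ET[3] * (z[3] - z[5])) / (z[5] - z[4])
    delta = max(delta1, delta2)
    return (delta / ET[4] - p14) / p13

# Hypothetical inputs (illustration only, not the paper's data).
ET = {2: 0.5, 3: 1.0, 4: 3.0, 5: 0.4}
z = {1: 20.0, 2: -5.0, 3: -15.0, 4: -30.0, 5: -10.0}
p12, p13, p14, p34 = 0.5, 0.3, 0.2, 0.4

alpha, beta, gamma = alpha_beta_gamma(p12, p13, p14, p34, ET, z)
bound = p34_lower_bound(p12, p13, p14, ET, z)
conditions_hold = p34 > bound and alpha < 0 and beta > 0 and gamma < 0
```

For these placeholder values the bound is negative, so every admissible p34 satisfies it; with other cost structures the bound may become a genuine restriction.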

#### *4.2. The Maximum of the Criterion Function—The Distributions of the Random Variable of the IFR Classes and MTFR*

In this subsection of the paper, sufficient conditions for the existence of the maximum of the Criterion Function (10) will be formulated in two cases. In the first case, the considerations apply to a class of random variable distributions for which the time to failure of a technical object T1 is assumed to be a random variable with increasing damage intensity function λ1(t), i.e., T1 ∈ IFR (Increasing Failure Rate). In the second case, a class of random variable distributions with a unimodal failure intensity function, the T1 ∈ MTFR (Mean Time to Failure or Repair), is considered. The results of testing the properties of the random variable distributions of the MTFR class are presented in detail in the papers [32–34].

**Conclusion 2.** *If T1* ∈ *IFR, λ1(t) is differentiable, α < 0, β > 0, γ < 0, β + γ f1(0+) > 0, λ1(*∞*) α ET1 + β − α < 0, then the criterion function g(x) reaches its maximum value.*

**Proof of Conclusion 2.** The derivative of the criterion function g(x) has the form

$$\mathbf{g}'(\mathbf{x}) = \left\{ \alpha[\mathbf{f}\_1(\mathbf{x}) \mathbf{E} \mathbf{T}\_1(\mathbf{x}) - \mathbf{R}\_1(\mathbf{x}) \mathbf{F}\_1(\mathbf{x})] + \beta \mathbf{R}\_1(\mathbf{x}) + \gamma \mathbf{f}\_1(\mathbf{x}) \right\} / \mathbf{M}^2(\mathbf{x})$$

where M(x) is the denominator of the criterion function g(x).
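As a sketch of the intermediate step, the expression for g'(x) follows from the quotient rule applied to g(x) = L(x)/M(x). The derivation assumes (as the form of g'(x) implies, although it is not restated in this section) that ET1(x) is the integral of R1(t) over [0, x], so that its derivative is R1(x), and that the derivative of F1(x) is f1(x):

$$\begin{aligned} \mathbf{L}'(\mathbf{x}) &= \mathbf{A}\_1 \mathbf{R}\_1(\mathbf{x}) + \mathbf{B}\_1 \mathbf{f}\_1(\mathbf{x}), \qquad \mathbf{M}'(\mathbf{x}) = \mathbf{A} \mathbf{R}\_1(\mathbf{x}) + \mathbf{B} \mathbf{f}\_1(\mathbf{x}), \\ \mathbf{L}'(\mathbf{x}) \mathbf{M}(\mathbf{x}) - \mathbf{L}(\mathbf{x}) \mathbf{M}'(\mathbf{x}) &= (\mathbf{A}\mathbf{B}\_1 - \mathbf{A}\_1\mathbf{B}) [\mathbf{f}\_1(\mathbf{x}) \mathbf{E}\mathbf{T}\_1(\mathbf{x}) - \mathbf{R}\_1(\mathbf{x}) \mathbf{F}\_1(\mathbf{x})] + (\mathbf{A}\_1\mathbf{C} - \mathbf{A}\mathbf{C}\_1) \mathbf{R}\_1(\mathbf{x}) + (\mathbf{B}\_1\mathbf{C} - \mathbf{B}\mathbf{C}\_1) \mathbf{f}\_1(\mathbf{x}) \\ &= \alpha [\mathbf{f}\_1(\mathbf{x}) \mathbf{E}\mathbf{T}\_1(\mathbf{x}) - \mathbf{R}\_1(\mathbf{x}) \mathbf{F}\_1(\mathbf{x})] + \beta \mathbf{R}\_1(\mathbf{x}) + \gamma \mathbf{f}\_1(\mathbf{x}). \end{aligned}$$

Dividing by M²(x) gives the stated form of g'(x), and dividing the numerator by R1(x) > 0 gives a function with the same sign expressed through the intensity λ1(x) = f1(x)/R1(x).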

It is known that if the time to failure T1 belongs to the MTFR class of distributions, then the inequality H(x) = λ1(x) ET1(x) − F1(x) ≥ 0 holds for x ≥ 0. The MTFR class has been studied in the papers [33,34]; some lifetime distributions with a unimodal damage intensity function belong to it [33,34]. Since the derivative is H'(x) = λ1'(x) ET1(x), it follows that if the damage intensity function λ1(t) increases, the function H(x) also increases. The class of distributions with a non-decreasing damage intensity function (IFR) is contained in the MTFR class. The sign of the derivative g'(x) is the same as the sign of the function

$$\mathbf{h}(\mathbf{x}) = \alpha[\lambda\_1(\mathbf{x}) \mathbf{E} \mathbf{T}\_1(\mathbf{x}) - \mathbf{F}\_1(\mathbf{x})] + \beta + \gamma \lambda\_1(\mathbf{x})$$

It is known that H(0+) = 0, hence h(0+) = β + γ f1(0+) > 0. From the fact that α < 0, β > 0, γ < 0 and the function H(x) increases, it follows that the function h(x) decreases from the value h(0+) = β + γ f1(0+) > 0 to the value h(∞) = λ1(∞) α ET1 + β − α < 0. It follows that the derivative g'(x) changes sign exactly once, from "+" to "−". Hence, the criterion function g(x) has exactly one maximum.

If λ1(∞) = ∞, then the following conditions suffice for the existence of the maximum of the criterion function g(x): T1 ∈ IFR, differentiability of λ1(t), α < 0, β > 0, γ < 0, β + γ f1(0+) > 0. An example of such a distribution of a random variable is a Weibull distribution with an increasing damage intensity function.

From Conclusions 1 and 2, the following sufficient condition for the existence of a maximum of the criterion function follows:

**Conclusion 3.** *If T1* ∈ *IFR, λ1(t) is differentiable, β + γ λ1(0+) > 0, p34 > (δ/ET4 − p14)/p13, λ1(*∞*) α ET1 + β − α < 0, then the criterion function g(x) reaches the maximum value.*

A sufficient condition for the existence of the asymptotic maximum of the availability factor is formulated below. To obtain the availability factor from the criterion function g(x), it suffices to assume the following conditions: z1 = 1, z2 = z3 = z4 = z5 = 0. After considering these conditions in formula (10), B1 = 0, C1 = 0 are obtained. Hence, based on (6), (7) and (9) for α, β, γ, one can calculate:

$$\alpha = -\mathbf{B} = -\mathbf{p}\_{12}\mathbf{E}\mathbf{T}\_2 - \mathbf{p}\_{13}\mathbf{E}\mathbf{T}\_3 - (\mathbf{p}\_{13} \mathbf{p}\_{34} + \mathbf{p}\_{14})\mathbf{E}\mathbf{T}\_4 + (\mathbf{p}\_{12} + \mathbf{p}\_{13} + \mathbf{p}\_{14})\mathbf{E}\mathbf{T}\_5$$

$$\beta = \mathbf{C} = \mathbf{E}\mathbf{T}\_5, \quad \beta > 0$$

$$\gamma = 0$$

The inequality α < 0 is equivalent to the inequality:

$$\mathbf{p\_{34}} > \left[\mathbf{p\_{12}}(\text{ET}\_5 - \text{ET}\_2) + \mathbf{p\_{13}}(\text{ET}\_5 - \text{ET}\_3) + \mathbf{p\_{14}}(\text{ET}\_5 - \text{ET}\_4)\right] / \left(\mathbf{p\_{13}}\text{ET}\_4\right)$$

Given that β > 0 and γ = 0, one can now formulate a sufficient condition for the existence of a maximum of the availability factor.

**Conclusion 4.** *If T1* ∈ *IFR, λ1(t) is differentiable, λ1(*∞*) α ET1 + β − α < 0, p34 > [p12 (ET5 − ET2) + p13 (ET5 − ET3) + p14 (ET5 − ET4)]/(p13 ET4), then the availability factor reaches exactly one maximum value.*

**Proof of Conclusion 4.** For the availability factor, the derivative of the criterion function has the form:

$$\mathbf{g}'(\mathbf{x}) = \left\{ \alpha[\mathbf{f}\_1(\mathbf{x})\mathbf{E}\mathbf{T}\_1(\mathbf{x}) - \mathbf{R}\_1(\mathbf{x})\mathbf{F}\_1(\mathbf{x})] + \beta \mathbf{R}\_1(\mathbf{x}) \right\} / \mathbf{M}^2(\mathbf{x})$$

where M(x) is the denominator of the criterion function g(x).

If the damage intensity function λ1(t) is increasing, then the function H(x) is increasing. The sign of the derivative is the same as the sign of the function:

$$\mathbf{h}(\mathbf{x}) = \alpha[\lambda\_1(\mathbf{x})\mathbf{E}\mathbf{T}\_1(\mathbf{x}) - \mathbf{F}\_1(\mathbf{x})] + \beta$$

It is known that H(0+) = 0, hence h(0+) = β > 0.

From the fact that p34 > [p12 (ET5 − ET2) + p13 (ET5 − ET3) + p14 (ET5 − ET4)]/(p13 ET4), it follows that α < 0 and the function h(x) decreases from h(0+) = β > 0 to h(∞). If h(∞) = λ1(∞) α ET1 + β − α < 0, the derivative g'(x) changes sign exactly once, from "+" to "−". Hence, the availability factor g(x) has exactly one maximum.

If λ1(∞) = ∞, then the following conditions are sufficient for the existence of a maximum of the availability factor:

$$\mathbf{T\_1} \in \mathrm{IFR}, \quad \mathbf{p\_{34}} > \left[\mathbf{p\_{12}}(\mathbf{E}\mathbf{T\_5} - \mathbf{E}\mathbf{T\_2}) + \mathbf{p\_{13}}(\mathbf{E}\mathbf{T\_5} - \mathbf{E}\mathbf{T\_3}) + \mathbf{p\_{14}}(\mathbf{E}\mathbf{T\_5} - \mathbf{E}\mathbf{T\_4})\right] / \left(\mathbf{p\_{13}}\mathbf{E}\mathbf{T\_4}\right)$$

#### **5. Exemplary Calculation Results**

**Example 1.** *In Figure 2, the plots of the criterion function g(x) are shown when g(x) represents profit per unit time, and in Figure 3, when g(x) represents availability for transportation tasks*.

**Figure 2.** The graphs of the function g(x)—profit per unit time as a function of time to preventive maintenance x [h], determined for the Weibull distribution with the following parameter values: scale = 10 and shape = 9 (curve a), shape = 11.5 (curve b), shape = 14 (curve c), the values of the parameters of the distribution of the random variable ET1 were determined for the considered values of the service life of the tested city buses, respectively, 9.25, 9.5, 9.75 [h].

**Figure 3.** The graphs of the function g(x)—availability to perform transport tasks as a function of time to preventive maintenance x [h], determined for the Weibull distribution with the following parameter values: scale = 10 and shape = 9 (curve a), shape = 11.5 (curve b), shape = 14 (curve c), the values of the parameters of the distribution of the random variable ET1 were determined for the considered values of the service life of the tested city buses, respectively, 9.25, 9.5, 9.75 [h].

The calculations were conducted for the following data:

(1) values of the matrix of probabilities of changes of states of the model P:



**Example 2.** *Figures 4 and 5 show, respectively, the graphs of the criterion function g(x) in the case where g(x) represents profit per time unit and where g(x) represents availability to complete transportation tasks. The calculations were performed for the data of Example 1, assuming that the uptime (time to failure) ET1 has a Weibull distribution, for which the value of the scale* *parameter = 10 and the shape parameter = 11.5. The graphs show four cases: case d—when the number of preventive replacements is the same as in Example 1, and cases a, b, and c, when the number of preventive replacements is increased by 10%, 20%, and 30%, respectively, with respect to case d.*

**Figure 4.** The graphs of the function g(x)—profit per time unit as a function of time to preventive replacement x [h], determined when the number of preventive replacements is as in Example 1 (curve d) and when the number of preventive replacements is increased by 10%, 20%, and 30%, respectively (curves a, b, and c).

**Figure 5.** Graphs of the function g(x)—availability to perform transportation tasks as a function of time to preventive replacement x [h], determined when the number of preventive replacements is as in example 1 (curve d) and when the number of preventive replacements is increased by 10%, 20%, and 30%, respectively (curves a, b, and c).

Based on the analysis of the graphs presented in Figures 4 and 5, it can be seen that as the number of preventive replacements increases (i.e., as their frequency increases), the value of the criterion function g(x) increases, both for profit per time unit and for availability to perform transport tasks, and the maximum of g(x) is reached at increasingly smaller values of xmax (the optimum time to preventive replacement).

#### **6. Conclusions**

The mathematical model presented in the article makes it possible to determine the optimum value of the preventive replacement time, i.e., the value at which the criterion functions (profit per unit time and readiness to carry out transport tasks) reach their maximum. The analysis of the results shows that, for the considered input data of the model, an increase in the serviceability time of the examined city buses (an increase in the value of the shape parameter of the Weibull distribution) causes an increase in the value of both criterion functions (profit per time unit and readiness), while also increasing the optimum time to preventive replacement (Figures 2 and 3, respectively). Decreasing the time to preventive replacement (increasing the frequency of preventive replacements) by 10, 20 and 30% causes a significant increase in the values of the criterion functions: profit per unit time rises from 8.9 to 15.5 [PLN/h], and readiness to carry out transport tasks rises from 0.821 to above 0.86 (Figures 4 and 5, respectively). In the paper, the criterion functions are considered over an infinite time horizon. The formulation of stronger conditions requires the establishment of relations between the average stay times of a technical object and the unit profits (costs) in the states of repair by the Technical Emergency Service with the loss of course (state 3) and the preventive replacement (state 5). It has been proved that under general assumptions the criterion functions considered in the paper have exactly one extremum (maximum). On the basis of the conducted analysis, sufficient conditions were formulated for the existence of the maximum of these functions when the time to failure of a technical object is a random variable with an increasing damage intensity function.
The assumptions adopted in the model and the formulated conditions define the relations between the input parameters of the developed model and make it possible to verify whether a specific set of input data can be used to determine the optimum preventive replacement times (i.e., the maximum of the criterion functions). The presented research results constitute the next stage of work on modelling the exploitation systems of technical objects in which preventive replacements by age are carried out. In the next stages, models of preventive replacements will be developed for technical objects of classes other than means of transport, e.g., for power equipment. These models will use criterion functions describing economic efficiency (e.g., profit per unit time), operational and technical efficiency (e.g., readiness), and safety (e.g., risk of loss). On this basis, it is planned to develop a comprehensive control method for the subsystem for ensuring the serviceability of technical objects using decision-making semi-Markov processes and non-deterministic methods for determining optimal (sub-optimal) solutions.

**Author Contributions:** Conceptualization, K.M. and S.B.; methodology, K.M. and A.S.; software, A.N.; validation, K.M., investigation, S.B. and A.S.; data curation, S.B. and A.S.; writing—original draft preparation, K.M., writing—review and editing, S.B.; visualization, A.S. and A.N.; supervision, K.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Long-Range Dependence and Multifractality of Ship Flow Sequences in Container Ports: A Comparison of Shanghai, Singapore, and Rotterdam**

**Chan-Juan Liu 1,\*, Jinran Wu 2,\*, Harshanie Lakshika Jayetileke 3 and Zhi-Hua Hu 4**


**Abstract:** The prediction of ship traffic flow is an important preparatory step for the layout and design of ports as well as for the management of ship navigation. However, the temporal characteristics and accurate prediction of ship flow sequences in ports have rarely been studied. Therefore, in this study, we investigated the presence of long-range dependence in container ship flow sequences using Multifractal Detrended Fluctuation Analysis (MF-DFA). We considered three representative container ports in the world (Shanghai, Singapore, and Rotterdam) as the study sample, from 1 January 2013 to 31 December 2017. Empirical results suggested that the ship flow sequences deviate from a normal distribution, and that the sequences at different time scales exhibit varying degrees of long-range dependence. Furthermore, the ship flow sequences possess a multifractal nature: the larger the time scale of the ship flow time series, the stronger the multifractal characteristics. The weekly ship flow sequence in the port of Singapore had the highest degree of multifractality. The multifractality present in the ship flow sequences of container ports is due to both the correlation properties and the probability density function of the sequences. The study outlines the importance of adopting these features for accurate modeling and prediction of maritime ship flow series.

**Keywords:** container ship traffic flow; volatility; generalized Hurst exponents; long-range dependence; multifractality

#### **1. Introduction**

The analysis of the time series characteristics of port ship flow sequences and the accurate prediction of port ship flow can provide references for the port layout design and the management of ship navigation. Port congestion has been recognized as a serious problem in all large ports in the world, which has a significant effect on the shipping date, the transportation cost, economic loss of the owner of the goods, and even the development of ports [1,2]. Nevertheless, understanding the arrival laws and accurate prediction of the port ship traffic flow are two keys to solve this problem. Therefore, this paper aims to study the arrival laws of ship traffic flow in container ports based on the long-range correlation and multifractality, and then, to provide a reference and theoretical basis for effective modeling and prediction of port ship flows.

There is an abundance of literature on long-range dependence for time series data, such as biomedical data [3–5], stock returns [6–8], hydrology [9–11], and climatology [12–14]. However, only a few studies focus on traffic flow sequences [15,16], and research on maritime traffic flow sequences is very limited. This is mainly due to the difficulty of obtaining data on port ship traffic flow in the maritime sector. In recent years, however, a growing number of studies have applied ship AIS data [17,18], which makes it possible to study the time series of maritime ship flow.

**Citation:** Liu, C.-J.; Wu, J.; Jayetileke, H.L.; Hu, Z.-H. Long-Range Dependence and Multifractality of Ship Flow Sequences in Container Ports: A Comparison of Shanghai, Singapore, and Rotterdam. *Appl. Sci.* **2021**, *11*, 10378. https://doi.org/10.3390/app112110378

Academic Editor: Giovanni Randazzo

Received: 28 September 2021 Accepted: 29 October 2021 Published: 5 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Studies on maritime traffic flow, especially the ship flow sequences in ports, are very important because they can inform decisions on the allocation of port operation infrastructure, rational port planning, and port investment. Furthermore, if the ship flow sequences in ports are long-range dependent and multifractal, then traditional ship flow prediction models should be re-evaluated to incorporate this type of volatility. Unfortunately, there is no universally accepted theory to define the volatility of traffic flow sequences. In order to understand the irregular patterns of ship flow time series, especially for prediction, we need to know whether the maritime traffic flow system follows chaotic, random, or deterministic structural patterns. This complex pattern is the motivation behind the study of maritime ship flow series through Multifractal Detrended Fluctuation Analysis (MF-DFA).

The contributions of this study are threefold. First, we present descriptive statistics of the ship traffic flow time series. Second, we analyze the long-range dependence characteristics of the ship traffic flow series using the Hurst exponent. Third, we determine the degree of multifractality of the ship traffic flow of the different ports through Multifractal Detrended Fluctuation Analysis and compare the ports based on the MF-DFA results. Our results suggested that the ship flow sequences at different time scales in the ports of Shanghai, Singapore, and Rotterdam showed different degrees of long-range dependence. Therefore, ship flow prediction models should incorporate long-range dependence in forecasting. In addition, the results indicated that the ship flow sequences in container ports are multifractal, and that the degree of multifractality is much higher for the port of Singapore than for Shanghai and Rotterdam.

The rest of the paper is organized as follows: Section 2 presents a brief review on the literature. Section 3 introduces the methodology used in this paper. Section 4 describes the sample data. Section 5 presents empirical results. Finally, Section 6 provides concluding observations based on the findings of the study.

#### **2. Literature Review**

In the last few decades, researchers discovered more characteristics of volatility in terms of long-range dependence and multifractality of the data in numerous fields including DNA sequences [3,4,19], climatology and hydrological time series [11,20,21], and stocks and other financial market data [6,7,22,23]. In most of these studies, the complex long-range correlations and multifractality behaviors of the time series were measured by the so called Hurst exponent, which was originally developed in hydrology for the practical matter of determining optimum dam sizing for the Nile river's volatile rain and drought conditions [22,24].

Long-range correlations can be captured using several methods, including Rescaled Range Analysis (R/S) [22], Detrended Fluctuation Analysis (DFA) [3], Wavelet analysis [25], and Multifractal Detrended Fluctuation Analysis [25]. Coronado et al. [26] compared various methods for estimating the Hurst exponent and pointed out that DFA is superior to the other methods because it is less influenced by the finite size of the time series. As a generalization of DFA, MF-DFA is a popular method for nonstationary time series and has been applied with great success in several areas of research. Several studies have also demonstrated the possibility of detecting multifractal properties in time series through the MF-DFA method [9,27].

MF-DFA is a suitable method for studying the characteristics of stock market time series and complex traffic flow. For example, Mensi et al. [28] and Ali et al. [29] stated that MF-DFA is an acceptable choice for studying the comparative efficiency and multifractality of stock markets. They found that the Islamic stock markets' adjustment to speculative activity is, in fact, higher than that of their conventional counterparts, and that all stock market returns exhibited multifractal features. In addition, some scholars used MF-DFA to examine highway traffic flow time series in Beijing, Shanghai and other places and discovered that long-range dependence is ubiquitous in time series of road traffic flows. Moreover, the length of the time scale significantly affected the multifractal characteristics of traffic flow sequences [30–32].

At present, a broad consensus has emerged that long-range dependence and multifractality are somewhat realistic phenomena in traffic flow series [15,33,34]. However, there is no research on the long-range correlations of the time series of maritime traffic flow. Therefore, motivated by the importance of temporal structure and long-range correlations for modeling and prediction of maritime ship flow series, we investigated the complex temporal structure and long-range correlation behaviors of the ship flow sequences of container ports from the multifractal perspective using the MF-DFA method. Evaluation of such results for container ship flow in different time scales and different container ports will facilitate the production of more insights on evolution dynamics of these ports and global trade.

#### **3. Methodology**

We analyzed the ship flow sequences of three representative container ports by the MF-DFA method, which is a generalization of DFA method [35,36]. To obtain the generalized Hurst exponent, we followed the five-step procedure introduced by [9,36] to measure the multifractality and nonstationary behavior of Brazilian rivers. Rego et al. [9] pointed out that the periodic components in the sequence should be removed at the first stage before beginning with the general procedure of Kantelhardt et al. [36].

For a record *x*(*i*), *i* = 1, 2, ..., *N*, where *N* denotes the length of the record, the MF-DFA consists of the following steps [35,36]:

Step 1: We first integrate the series and obtain the profile *y*(*j*),

$$y(j) = \sum\_{i=1}^{j} [x\_i - \mu],\tag{1}$$

where *μ* is the mean value of the entire series.

Step 2: The integrated series *y*(*j*) is divided into boxes of equal length *s*.

Step 3: In each box of length *s*, we calculate a polynomial fitting of *y*(*j*), which represents the trend in that box. The shape of the polynomial trend is defined by the order *m*. A higher order *m* yields a more complex shape of the trend, but might lead to overfitting for a time series within small segment sizes. Therefore, in this study, we choose *m* equal to 2 as suggested by Ihlen (2012) [37]. The *y* coordinate of the fit line in each box is denoted by *ys*(*j*).

Step 4: The integrated series *y*(*j*) is detrended by subtracting the local trend *ys*(*j*) in each box of length *s*.

Step 5: For a given box size *s*, the *q*th-order fluctuation function is calculated as

$$F\_{q}(s) = \left[ \frac{1}{N} \sum\_{j=1}^{N} |y(j) - y\_{s}(j)|^{q} \right]^{\frac{1}{q}}.\tag{2}$$

If the series is long-range power-law correlated, then for large values of the time scale *s* the fluctuation function *F<sub>q</sub>*(*s*) scales as in Equation (3):

$$F_q(s) \sim s^{H(q)}, \tag{3}$$

where *H*(*q*) is the generalized Hurst exponent. The generalized Hurst exponent *H*(*q*) can be obtained as the slope of the log–log plot of *F<sub>q</sub>*(*s*) against the scale *s*, estimated by ordinary least squares (OLS). If *H*(*q*) is independent of *q*, the time series is monofractal; otherwise, it is multifractal. Table 1 shows the relationship between the long-range dependence of time series and the generalized Hurst exponent.
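The five steps above can be sketched in Python. This is a minimal sketch rather than the authors' implementation: it assumes NumPy, the function names `fluctuation_function` and `generalized_hurst` are hypothetical, and it averages the detrended variance of each box before raising it to the power *q*/2 (the common formulation of Kantelhardt et al. [36]), which may differ in detail from Equation (2). The scales and *q* values in the usage lines are illustrative only.

```python
import numpy as np


def fluctuation_function(x, scales, q_values, order=2):
    """Return F_q(s) as an array of shape (len(q_values), len(scales))."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())              # Step 1: integrated profile
    fq = np.empty((len(q_values), len(scales)))
    for k, s in enumerate(scales):
        n_boxes = len(profile) // s                # Step 2: non-overlapping boxes
        t = np.arange(s)
        box_var = np.empty(n_boxes)
        for v in range(n_boxes):
            box = profile[v * s:(v + 1) * s]
            # Steps 3-4: fit a polynomial trend of the given order and remove it
            trend = np.polyval(np.polyfit(t, box, order), t)
            box_var[v] = np.mean((box - trend) ** 2)
        for i, q in enumerate(q_values):           # Step 5: q-order average
            if q == 0:
                # Limit q -> 0 is a logarithmic average (assumed convention)
                fq[i, k] = np.exp(0.5 * np.mean(np.log(box_var)))
            else:
                fq[i, k] = np.mean(box_var ** (q / 2.0)) ** (1.0 / q)
    return fq


def generalized_hurst(x, scales, q_values, order=2):
    """Estimate H(q) as the OLS slope of log F_q(s) against log s."""
    fq = fluctuation_function(x, scales, q_values, order)
    log_s = np.log(scales)
    return np.array([np.polyfit(log_s, np.log(row), 1)[0] for row in fq])


# Illustrative usage on synthetic white noise (not the ship flow data)
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
h_q = generalized_hurst(noise, scales=[16, 32, 64, 128, 256], q_values=[-2, 0, 2])
```

For an uncorrelated series such as the synthetic noise above, the estimated exponents would be expected to lie near 0.5, while a clear dependence of the estimates on *q* would point to multifractality.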


**Table 1.** The relationship between the long-range dependence of time series and the generalized Hurst exponent.

The value of the generalized Hurst exponent equal to 0.5 indicates an uncorrelated time series. A generalized Hurst exponent value larger than 0.5 indicates a positive long-range dependence and persistence of the series. In other words, the larger the *H* value is, the stronger the persistence is. A generalized Hurst exponent value smaller than 0.5 indicates a negative long-range dependence and antipersistence of the series. This means the closer the *H* value is to 0, the stronger the antipersistence is.

When *q* is equal to two, *H*(2) is identical to the well-known Hurst exponent. Generally, the Hurst exponent is between 0 and 1. However, it is worth noting that the generalized Hurst index obtained by applying the MF-DFA method in this study may be greater than 1 [36,38].

The singularity spectrum *f*(*α*) is introduced to measure the degree of multifractality of the series and can be obtained through Legendre transform:

$$\alpha = H(q) + qH'(q), \tag{4}$$

$$f(\alpha) = q[\alpha - H(q)] + 1. \tag{5}$$

Here, *α* is the singularity strength and used to characterize the singularities of the time series. *f*(*α*) indicates the dimension of the subset of sequences that is characterized by *α*. The strength of multifractality can be estimated by the spans of singularity given by

$$
\Delta \alpha = \alpha_{\max} - \alpha_{\min}. \tag{6}
$$
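Equations (4)–(6) can be evaluated numerically once *H*(*q*) has been estimated on a grid of *q* values. The following is a minimal sketch, assuming NumPy and approximating the derivative *H*′(*q*) by finite differences; the function name `singularity_spectrum` and the example *H*(*q*) curve are hypothetical and are not taken from the study.

```python
import numpy as np


def singularity_spectrum(q, h):
    """Return (alpha, f_alpha, delta_alpha) from H(q) sampled on a grid q."""
    q = np.asarray(q, dtype=float)
    h = np.asarray(h, dtype=float)
    dh = np.gradient(h, q)               # finite-difference estimate of H'(q)
    alpha = h + q * dh                   # Eq. (4)
    f_alpha = q * (alpha - h) + 1.0      # Eq. (5)
    delta_alpha = alpha.max() - alpha.min()  # Eq. (6)
    return alpha, f_alpha, delta_alpha


# Hypothetical H(q) that decreases linearly with q, for illustration only
q_grid = np.arange(-10, 11)
alpha, f_alpha, width = singularity_spectrum(q_grid, 0.8 - 0.02 * q_grid)
```

A constant *H*(*q*) collapses the spectrum to a single point (Δ*α* = 0), which corresponds to the monofractal case described above.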

#### **4. Data**

As this research investigates the long-range dependence and multifractality for ship flow sequences in container ports, we extracted required container ship flow data from the Automatic Identification System (AIS) database. The data span is from 1 January 2013 to 31 December 2017 for three representative container ports in the world—that is, Shanghai, Singapore, and Rotterdam. Figure 1 represents the original data of ship flow at different time scales for these three container ports.

The port of Shanghai is not only the largest container port in China, but also the world's largest container port. Its shipping routes reach the world's 12 largest shipping areas, and it has established business contacts with more than 500 ports in nearly 200 countries and regions. As the world's second largest container port, Singapore is also the largest transit port in the Asia-Pacific region. The Port of Rotterdam is the largest port in Europe, as well as the European Gateway. Therefore, in this study, the ports of Shanghai, Singapore, and Rotterdam were chosen to represent all the ports around the globe.

We analyzed and compared the long-range correlation and multifractality characteristics of the ship flow sequences of both Asian and European ports and the gateway ports and transit ports. In general, the ship flow sequences for the three ports depicted different characteristics at different time scales. Among them, the ship flow series of Shanghai and Rotterdam with daily and weekly time scales showed a more significant upward trend than the ship flow series of Singapore port. Ship flow sequences of the three ports with monthly scale fluctuated up and down along the time line, indicating a nonlinear and nonstationary feature. Generally, these fluctuations are not random, but relate to the seasonal and monthly cycles.

**Figure 1.** Original data of ship flow of different time scales in three ports.

#### **5. Empirical Results**

#### *5.1. Descriptive Statistics*

It is a well-known fact that the statistical properties of time series vary with time and depend on time windows. Table 2 presents the descriptive statistics for the original ship flow series at different time scales.


**Table 2.** Descriptive statistics of the original ship flow series.

According to Table 2, the average number of ships arriving in the port of Singapore is greater than that of the Shanghai port at any time scale. Therefore, although Shanghai's container port ranks first in the world list, the port of Singapore is still the busiest container port in the world. This is mainly due to its unique geographical location and its role as the largest transit port in the Asia-Pacific region.

Besides, the standard deviation of ship flow sequences in port of Shanghai is larger than that of Singapore and Rotterdam ports, regardless of the time scale. This shows that the numbers of arriving ships in the ports of Rotterdam and Singapore are relatively stable compared to Shanghai.

The results from skewness and kurtosis analysis demonstrated that the different time scales significantly affect the temporal structure of the ship flow sequence. Skewness reflects the degree of asymmetry in the distribution. Table 2 shows that the skewness of almost all the ship flow sequences is negative, except for the daily ship flow sequence of Rotterdam (0.263) and the monthly ship flow sequence of Singapore (0.164). This indicates that all of the ship flow sequences have left-skewed distributions, except for these two.

On the other hand, the kurtosis reflects the sharpness of the peak of the distribution. The higher the kurtosis, the sharper the central peak; in this sense, the kurtosis measures the degree of data aggregation in the center. In this study, the traditional kurtosis is replaced by the excess kurtosis, which is obtained by subtracting the kurtosis of the normal distribution (3) from the original kurtosis so that the comparison benchmark is zero. Table 2 shows that the excess kurtosis values of the daily ship flow sequences are greater than 0, indicating that the daily ship flow sequence distributions are more concentrated and have a longer tail than the normal distribution. In the weekly ship flow sequences, only the excess kurtosis of Singapore is greater than 0. The excess kurtosis values of all the monthly ship flow sequences are less than 0, indicating that these sequences are scattered and have a shorter tail than the normal distribution. Therefore, only the daily ship flow sequences of the three ports and the weekly ship flow sequence of Singapore exhibited the characteristic of "sharp peak and fat tail". Further, the ship flow sequences of the three ports at all time scales deviated from the normal distribution. The results are further validated by the frequency and probability density distributions of ship flow in Figures 2–4.
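The skewness and the kurtosis measured against the normal benchmark of 3, as described above, can be computed from the central moments of a series. The snippet below is a minimal sketch using population (biased) moment estimators, which is an assumption, since the estimator used for Table 2 is not stated; the function name is hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, kurtosis - 3) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0            # zero for a normal distribution
    return skewness, excess_kurtosis


# Illustrative values only, not taken from the ship flow data
skew, kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
```

A negative skewness indicates a longer left tail, and a positive value of the second quantity indicates a sharper peak and heavier tails than the normal distribution.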

In addition, we investigated the stationarity of the above ship flow series with different time scales. In general, the ADF and KPSS tests are used as complementary tests of stationarity in time series models. Therefore, both methods were used in this study to test the stationarity of the ship flow time series with the intention of obtaining more robust results. The null hypothesis of the ADF test is the presence of a unit root, indicating nonstationarity; the null hypothesis of the KPSS test is the absence of a unit root, indicating stationarity. Table 2 shows that the *p* values of the ADF test of all the ship flow sequences were greater than 0.05, so the null hypothesis cannot be rejected; that is, the sequences are nonstationary time series. Similarly, the KPSS test results indicated *p* values less than 0.05, rejecting the null hypothesis of stationarity. Therefore, all the ship flow sequences of the three ports are nonstationary time series.

**Figure 2.** Frequency and probability density distribution of daily ship flow sequences.

**Figure 3.** Frequency and probability density distribution of weekly ship flow sequences.

**Figure 4.** Frequency and probability density distribution of monthly ship flow sequences.

#### *5.2. Long-Range Dependence and Multifractality*

Figure 5 depicts the fluctuation function *F* versus scale for the ship flow sequences of three ports with different time scales in log-coordinates and the OLS linear regression for these curves when *q* is equal to two. We calculated the generalized Hurst exponents from the slopes of these straight lines. The Hurst exponent of all the ship flow sequences was greater than 0.5, which indicated a positive long-range dependence and persistence in these ship flow sequences.

**Figure 5.** Generalized Hurst exponent (*q* = 2) for three ports with different time scales.

Table 3 presents the results for long-range dependence in ship flow sequences of three ports when *q* is equal to two. The empirical results show high Hurst exponents for all ship flow sequences. An important feature of these results is that, in general, the Hurst exponent becomes larger as the time scale increases. The port of Singapore has the highest Hurst exponent in the weekly ship flow sequence, which is close to 1. However, in the monthly ship flow sequences, Singapore has the lowest Hurst exponent, which is just 0.751. Therefore, ship flow sequences in the port of Singapore have a higher Hurst exponent when the time scale is small. In contrast, the ship flow sequences in the ports of Shanghai and Rotterdam have a higher Hurst exponent when the time scale is larger. Moreover, the differences in Hurst exponents across time scales seem to be larger for the port of Singapore.

**Table 3.** Generalized Hurst exponents (*q* = 2) for ship flow sequences with different time scales.


Figure 6 presents the results of the MF-DFA methodology for *q* = [−10 : 10]. Qualitatively, the generalized Hurst exponent for all ship flow sequences decreased with the increase in *q*; that is, the generalized Hurst exponent of these ship flow sequences depends on the selection of *q*. However, for the monthly ship flow sequence in the ports of Shanghai and Rotterdam, this dependence on *q* was not significant. The differences in generalized Hurst exponents of daily and weekly series for the three ports seem to be smaller as *q* increases. The generalized Hurst exponents for the port of Singapore decreased faster than those for Shanghai and Rotterdam with the increase in *q* at all time scales. In particular, there was no substantial change in the generalized Hurst exponent of the monthly ship flow series for Shanghai and Rotterdam; *H*(*q*) remained between 0.8 and 1.2 with the increase in *q*. For the monthly sequence, when *q* is less than zero, the generalized Hurst exponents of Singapore and Rotterdam are larger than that of Shanghai; meanwhile, when *q* is larger than zero, the situation may be reversed, and the generalized Hurst exponent of ship flow for Shanghai becomes the largest. These results suggest that these differences are not spurious or due to measurement error. An important additional comment is that the degrees of multifractality of ship flow sequences in Shanghai and Singapore container ports are much higher than the ones found for ship flow in Rotterdam port.

**Figure 6.** Hurst exponents of ship flow sequences calculated by Multifractal Detrended Fluctuation Analysis (MF-DFA) for *q* = [−10 : 10].

The leveling of the *q*-order Hurst exponent reflects that the *q*-order root-mean-square (RMS) is insensitive to the magnitude of local fluctuations. The multifractal spectrum will have a long left tail when the time series has a multifractal structure that is insensitive to local fluctuations with small magnitudes. In contrast, the multifractal spectrum will have a long right tail when the time series has a multifractal structure that is insensitive to local fluctuations with large magnitudes [37].

Figure 7 depicts the multifractal spectrum for ship flow sequences of three ports with different time scales. According to Figure 7, the multifractal spectrum of ship flow sequences for all ports can be divided into two sections. However, the spans of the multifractal singularity are different, implying that they have different multifractality strengths. For the daily ship flow sequences, the port of Rotterdam has the lowest multifractal strength, while the port of Singapore has the highest multifractal strength. Furthermore, the shape of multifractal spectrum of daily ship flow for Singapore shifts to the right and the spectrum is slightly right-skewed, indicating that the scaling behavior of small fluctuations dominates the fluctuation of the daily ship flow for Singapore port.

**Figure 7.** Multifractal spectra for ship flow sequences of three ports with different time scales.

Compared with the daily ship flow time series, the multifractal spectra of Shanghai and Rotterdam obtained from weekly ship flow sequences shifted to the left and are slightly left-skewed. This indicates that the scaling behavior of large fluctuations dominates the fluctuation of the weekly ship flow for the Shanghai and Rotterdam ports.

According to the multifractal theory, the strength of multifractality can be characterized by the span of the multifractal singularity strength function in Equation (6). The bigger the Δ*α* is, the stronger the degree of multifractality becomes.

Table 4 presents the quantitative strength of multifractality of all the ship flow sequences of the three ports. The degree of multifractality of the weekly series is the strongest for all three ports, followed by the monthly time series, and the daily time series has the weakest multifractality. The weekly ship flow sequence of Singapore port has the highest degree of multifractality. One interpretation of this result is that the weekly ship flow sequence of Singapore is very sensitive to changes in various influencing factors and is therefore very hard to predict. Consequently, compared with daily and monthly forecasts of ship traffic flow, the weekly forecast is the most difficult for ship flow at the container ports. This shows that the multifractal method is valuable for the analysis of ship flow sequences.


**Table 4.** The strength of multifractality for ship flow sequences.

#### *5.3. Type of Multifractality*

Another contribution of this study is to identify the type of multifractality present in the ship traffic flow data. We performed the same analysis on randomly shuffled versions of the original ship traffic flow sequences. The shuffled sequences retain the same distribution of values, but shuffling destroys any temporal correlations in the original data.

The process of shuffling can be depicted as the three steps presented by [39]. Firstly, (*p*, *q*) pairs are generated from random integer numbers with *p*, *q* ≤ *N*, where *N* is the length of the original time series. Secondly, *p* and *q* entries are swapped with each other. Finally, the above two steps are repeated *N* = 20 times to ensure that the original series is fully shuffled.
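The swap-based shuffling described above can be sketched as follows. This is a minimal sketch using only the Python standard library; the function name `shuffle_by_swaps` is hypothetical, and the number of swaps (here `swaps_per_point` times the series length, with a default of 20) is an assumption, because the repetition count in the description is ambiguous.

```python
import random


def shuffle_by_swaps(series, swaps_per_point=20, seed=None):
    """Shuffle a series by repeatedly swapping two randomly chosen entries."""
    rng = random.Random(seed)
    shuffled = list(series)               # work on a copy, keep the input intact
    n = len(shuffled)
    # Assumed repetition count: swaps_per_point * n random swaps in total
    for _ in range(swaps_per_point * n):
        p = rng.randrange(n)              # first random position
        q = rng.randrange(n)              # second random position
        shuffled[p], shuffled[q] = shuffled[q], shuffled[p]
    return shuffled


# Illustrative usage on a toy series
toy_series = list(range(10))
toy_shuffled = shuffle_by_swaps(toy_series, seed=42)
```

Because only positions are exchanged, the shuffled series keeps exactly the same set of values, and therefore the same distribution, while its temporal ordering is randomized.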

We shuffled the ship flow sequences and calculated *H*<sub>shuffle</sub>(*q*). As seen in Figure 8, *H*<sub>shuffle</sub>(*q*) is approximately 0.5 for most of the ship flow sequences, except for the weekly ship flow sequence of Singapore. For the sequences whose *H*<sub>shuffle</sub>(*q*) is about 0.5, the multifractality is caused by different long-range correlations of small and large fluctuations. However, the multifractality of the weekly ship flow sequence of Singapore port is caused by a broadening of the probability density function. This result is consistent with the results shown in Figure 3. Therefore, the multifractality present in the ship flow sequences of container ports is due to the correlation properties as well as the probability density function of the ship flow series.

**Figure 8.** Hurst exponents of the shuffled ship flow sequences for *q* = [−10 : 10].

#### **6. Conclusions**

In this study, we investigated the statistical properties of container ship flow time series and made a detailed investigation on long-range behaviors and fractal characteristics of ship flow sequences for three representative container ports in the world—Shanghai, Singapore, and Rotterdam. This study concludes three main findings.

Firstly, the empirical evidence given in this study emphasizes the significance of long-range dependence and multifractality in all ship flow sequences at different time scales for the three container ports.

Secondly, the empirical evidence from comparisons among these ship flow sequences at different time scales implies that the long-range dependence becomes larger for each port as the time scale increases, except for the port of Singapore. Shanghai and Rotterdam were identified as the ports with the highest degree of long-range dependence in monthly ship flow sequences, while Singapore was identified as having the highest degree of long-range dependence in weekly ship flow sequence.

Finally, the empirical evidence confirmed the multifractal property as an impact factor on the ship flow sequences of container ports. The analysis on the shuffled data indicated that the presence of multifractality in the ship flow sequences of container ports is due to the correlation properties as well as to the probability density function of the ship flow series.

The findings of this study provide some interesting implications. First, the existence of long-range dependence and multifractality in container ship flow could be exploitable and helpful for shipping companies and policy makers. In other words, the presence of complex structure such as long-range dependence and multifractality in container ship flow sequences implies that the volume and direction of container ship flow may follow certain regularities. Therefore, shipping companies can carry out short-term capacity allocation and adjustment according to container ship flow predictions. Second, the presence of long-range dependence and multifractality in the data suggests that container ship flow forecasting models should account for existing nonlinearities in the data; otherwise, their results may be biased and highly misleading.

Port groups can use these findings in forecasting the expected volatility in the number of arriving container ships, and thereby, in developing and carrying out the layout planning of the port infrastructure, shipping date planning, and even port expansion investment. Moreover, some advanced modeling approaches can be employed for ship flow sequence forecasting, such as statistical modeling [40,41] and machine learning methods [42,43].

In addition, we regarded the trends in the ship traffic flow time series in this study as caused by external conditions. We identified and filtered out these trends in MF-DFA analysis. However, these trends may not be completely caused by external conditions, and some trends may carry endogenous power from data. At this time, when we explore the long-range power-law dependence of the time series, whether to filter out the trend needs further discussion, as pointed out by Hu et al. [38]. The effect of trends on detrended fluctuation analysis for ship traffic flow time series of ports can be further studied in the future.

**Author Contributions:** C.-J.L.: conceptualization, methodology, software, validation, formal analysis, investigation, visualization; J.W.: writing—review and editing; H.L.J.: writing—review and editing; Z.-H.H.: resources, data curation, writing—original draft preparation, supervision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), under grant number CE140100049, the National Natural Science Foundation of China (No. 71871136 and 71801150), and the 2021 General project of Shanghai Philosophy and Social Science Planning (2021BJB006).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data used to support the findings of this study are available from the corresponding author upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Forecasting Taxi Demands Using Generative Adversarial Networks with Multi-Source Data**

**Hasan A. H. Naji, Qingji Xue \*, Huijun Zhu and Tianfeng Li**

School of Digital Media, Nanyang Institute of Technology, Chang Jiang Road No. 80, Nanyang 473004, China; hasanye1985@gmail.com (H.A.H.N.); zhuhj1201@163.com (H.Z.); tianfengli@163.com (T.L.) **\*** Correspondence: xue\_qj@sina.com

**Abstract:** As a popular transportation mode in urban regions, taxis play an essential role in providing comfortable and convenient services for travelers. For the sake of tackling the imbalance between supply and demand, taxi demand forecasting can help drivers plan their routes and reduce waiting time and oil pollution. This paper proposes a deep learning-based model for taxi demand forecasting with multi-source data using Generative Adversarial Networks. Firstly, main features were extracted from multi-source data, including GPS taxi data, road network data, weather data, and points of interest. Secondly, a Generative Adversarial Network, comprising a recurrent network model and a convolutional network model, is adopted for fine-grained taxi demand forecasting. A comprehensive experiment is conducted based on a real-world dataset of the city of Wuhan, China. The experimental results showed that our model outperforms state-of-the-art prediction methods, which validates the usefulness of our model. This paper provides insights into the temporal, spatial, and external factors in taxi demand-supply equilibrium based on the results. The findings can help policymakers alter the taxi supply and the taxi lease rents for periods and increase taxi profit.

**Keywords:** taxi demand; forecasting; multi-source data; generative adversarial networks

#### **1. Introduction**

Taxis play a vital role in the modern urban transportation system as they comfortably and conveniently serve many urban passengers. According to the annual report of urban passenger transport operation [1], more than 216 million passengers used a taxi service in 2020 in Wuhan city. However, a critical challenge emerged that there is a significant mismatch between the supply of taxis and passengers' demands. For instance, passengers may not find taxis for an extended period in an area at a specific time. In contrast, taxi drivers may cruise roads without getting passengers in another area at the same time. Therefore, this may lead to several problems, such as increasing passengers' wait time and oil consumption and decreasing taxi incomes. To this end, it becomes significant to accurately predict fine-grained taxi demands in advance to guide taxi drivers to areas with high demands [2].

There is a need for a deep understanding of the temporally varying taxi-passenger demand over various spatial areas that can guide and motivate drivers to be in the areas with a high potential of passenger demands and thus enhance the taxis' utilization rate [3].

Recently, there has been much research investigating the correlation between taxi demands and related dependencies. However, taxi demand forecasting is still an open problem, which is mainly affected by several kinds of factors [4–6]:


In this study, we investigate taxi demands and related factors extracted from multi-source data. The taxi demands, represented by taxi pick-up events, were analyzed using a generative adversarial network (GAN) to forecast taxi-passenger demand by considering significant factors, including temporal, spatial, and external factors. A novel deep learning-based approach is proposed for feature extraction and fine-grained taxi demand forecasting in urban areas.

**Citation:** Naji, H.A.H.; Xue, Q.; Zhu, H.; Li, T. Forecasting Taxi Demands Using Generative Adversarial Networks with Multi-Source Data. *Appl. Sci.* **2021**, *11*, 9675. https://doi.org/10.3390/app11209675

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti and Dimitrios S. Paraforos

Received: 23 August 2021 Accepted: 6 October 2021 Published: 17 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The main contributions of this article include the following:


The rest of the paper is structured as follows: Section 2 reviews recent taxi demand prediction methods. Section 3 introduces the mathematical definition of taxi demand forecasting. Section 4 introduces the proposed methodology, and Section 5 explains the conducted experiment in detail. The experimental results are discussed in Section 6. Section 7 concludes with the implications and value of the taxi demand forecasting findings and introduces future work.

#### **2. Related Work**

Taxi demand forecasting has attracted the attention of researchers and taxi service companies due to the massive number of GPS trajectories and the huge amount of spatio-temporal information produced every day by GPS sensors. In general, taxi demand forecasting methods can be categorized into three main classes: traditional methods, machine learning-based methods, and deep learning-based methods.

#### *2.1. Traditional Methods*

Traditional methods include statistical and time-series analysis-based methods. Statistical models have been used to predict taxi demands. For instance, Tang et al. [7] developed a probabilistic model to predict vehicle trip routes using a Hidden Markov Model (HMM). Moreira et al. proposed a data-driven method to predict passengers' spatial distribution in short-term periods [8]. Liu et al. developed three predictive methods for detecting hotspots and predicting taxi demands [9]. Chang et al. [10] investigated historical trajectory data to forecast the spatio-temporal patterns of taxi demands.

Time-series analysis-based methods are also considered in traffic data prediction. For example, in [11], the Automatic ARIMA Model is adopted for forecasting passenger hotspot regions using spatio-temporal data. In [12], the authors modeled taxi demand prediction as a time series problem and proposed an improved ARIMA method to predict taxi demands by using temporal dependencies. Tong et al. [13] introduced a linear regression-based model with high-dimensional features to forecast taxi demands in urban regions.

#### *2.2. Machine and Deep Learning-Based Methods*

Markou et al. [14] utilized the information extracted by unstructured data in taxi GPS data and adopted machine learning techniques to forecast taxi demands. In [15], a backpropagation neural network (BPNN) with an extreme gradient boosting (XGB) based method is proposed to predict taxi-hailing demand. In [16], the authors introduced a machine learning-based approach for identifying and predicting the short-term demand for on-demand ride-hailing services. The predicting methodology studied factors related to traffic, trip fare, and weather conditions.

Recently, deep learning-based methods have been widely adopted for traffic flow forecasting problems. Multi-layer perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and their variants have achieved strong results in taxi demand prediction.

CNN has been used to forecast traffic flow. For example, Ma et al. [17] split a city into several small grids, transformed city traffic speed into images, and adopted CNN for forecasting traffic speed. Zhang et al. [18] applied CNN by modeling temporal and spatial factors for predicting the traffic flow of bikes in the short-future periods.

The success of RNNs and their enhanced models, such as long short-term memory (LSTM), and gated recurrent units (GRU), led researchers to adopt these methods for predicting traffic flow [5,19,20]. Xu et al. utilized a sequence learning method for predicting future taxi demands in city regions, considering current taxi requests and relevant factors. Mixture density-based recurrent neural networks are developed to investigate historical taxi demand distribution and taxi demand predictions. Rossi in [19] obtained the sequences of historical pick-up and drop-offs and then employed the LSTM network to extract the sequential features for taxi request forecasting. Zhao et al. [20] used a cascaded-based LSTM combined with an origin–destination correlation matrix to capture spatial-temporal patterns. In [10], an LSTM neural network-based method is adopted for forecasting a future pick-demand of a given taxi stand by analyzing the spatial demand of a particular taxi stand and neighboring stands.

To make full use of the spatio-temporal correlation with taxi demands, many researchers combined both CNN and RNN for forecasting traffic flow.

A convolutional-recurrent network-based model is developed for forecasting fine-grained taxi requests [21]. The authors considered various factors related to taxi demands, including the spatial correlations between neighboring areas and function-similar areas, long and short-term periods, and external factors. A context-based attention method combines regions' predictions to improve the prediction results [22]. Niu et al. [6] used LSTM with CNN in a real-time prediction system by streaming taxi-passenger data. The CNN network is adopted to extract spatial features, whereas LSTM obtained temporal dimensions. Zhang et al. [23] proposed a deep-learning-based model to predict the in-flow and the out-flow of crowds in a city.

Previous works on taxi demand prediction led us to consider historical trip information to forecast future taxi requests. One of the differences between our work and the preceding works is that our model can fill this gap by integrating taxi GPS data with other related historical data sources, including weather conditions, temporal data, POI data, to fully understand taxi supply patterns and consider the significant factors affecting external dependencies of taxi demand simultaneously.

In this research, we employed the generative adversarial network (GAN) model to forecast taxi demands. In the generator network, the long short-term memory (LSTM) structure is selected, whereas a convolutional network is used in the discriminator network. Thus, the GAN model (GAN\_LSTM\_CNN) can handle the factors included in the taxi dataset and their related patterns. In addition, through successive iterations of adversarial learning between the discriminator and the generator, we can obtain better prediction results.

#### **3. Problem Definition**

In this section, firstly we define several key concepts, and then formally formulate the taxi demand forecasting problem.

Definition 1 (*Taxi Trip*). A taxi trip is defined as a tuple (ID, pickuptime, picklocation, droptime, droplocation, duration, distance, fare), where ID represents the trip identification, pickuptime is the pick-up time (in hours and minutes), picklocation is the pick-up location (longitude and latitude), droptime is the drop-off time, droplocation represents the drop-off location, and duration, distance, and fare are the calculated duration, distance, and fare of a trip.

Definition 2 (*Road Network*). A road network of a city is composed of a set of road segments. Each road segment is associated with two terminal points (i.e., intersections of crossroads), and connects with other road segments by sharing the same terminals. All road segments compose the road network in the format of a graph.

Definition 3 (*Temporal Data*). For the sake of a detailed description of the spatial and temporal dimensions of fine-grained taxi demands, the time is discretized into time slots *t* (date, hour, and minutes).

Definition 4 (*Point of Interest POI*). A POI represents a place (such as a restaurant) in an urban area *r*. Each POI *p<sub>i</sub>* is associated with a location *p<sub>i</sub>.l* and a POI category *p<sub>i</sub>.c* ∈ *C*, where *C* is a set of categories.

Definition 5 (*Weather*). Weather comprises several parameters describing the weather status of an urban area *r* in a given time slot (date, hour, and minutes). These parameters may include temperature, humidity, weather status (rain, sunny), wind, etc.

Here, we can provide a formal definition of fine-grained taxi demands as follows.

Definition 6 (*Taxi Demands*). We use *Y*<sub>*r*,*t*</sub> to represent the number of taxi pick-ups (demand for taxis) in an area *r* ∈ *R* at a time slot *t* ∈ *T*. Hence, taxi demands in a time slot *t* are defined as *Y*<sub>*t*</sub> = [*Y*<sub>0,*t*</sub>, *Y*<sub>1,*t*</sub>, ..., *Y*<sub>*n*,*t*</sub>].

To forecast future taxi demands, we first extract fine-grained taxi demands in past time slots from historical taxi trip data, as defined in Definition 1. As taxi demands have a strong correlation with other temporal, spatial, and external factors, we collect the significant factors related to taxi demands (as in Definitions 2–5) in the determined area *r* at time slot *t*.

Now, we can define the problem of forecasting the fine-grained taxi demands in a determined time slot *t* as follows.

Definition 7 (*Taxi Demand Forecasting Problem*). Consider an urban area divided into disjoint regions *R* by the road network. Given fine-grained taxi demands in the past time slots {*Y*<sub>*k*</sub> | *k* = 0, 1, ..., *t*} extracted from historical taxi trip data (pick-ups), we try to predict fine-grained taxi demands at the next time slot *t* + 1. The taxi demand forecast is denoted as *Ŷ*<sub>*t*+1</sub> = [*Ŷ*<sub>1,*t*+1</sub>, ..., *Ŷ*<sub>*n*,*t*+1</sub>].
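As a minimal sketch of Definitions 6 and 7, the following Python snippet counts pick-ups per region and time slot to form the demand vectors. The one-hour slot length, the region identifiers, and the function name `build_demand` are assumptions made for illustration and are not taken from the implementation used in this study.

```python
from collections import Counter
from datetime import datetime

def build_demand(pickups, regions, slot_minutes=60):
    """Count pick-ups per (time slot, region).

    pickups: iterable of (region_id, pickup_datetime) pairs.
    regions: ordered list of region ids that fixes the vector layout.
    Returns a dict mapping each slot start to a list of counts,
    one per region (the demand vector of that time slot).
    """
    counts = Counter()
    for region_id, ts in pickups:
        # Floor the timestamp to the start of its time slot.
        minute = (ts.hour * 60 + ts.minute) // slot_minutes * slot_minutes
        slot = ts.replace(hour=minute // 60, minute=minute % 60,
                          second=0, microsecond=0)
        counts[(slot, region_id)] += 1
    slots = sorted({slot for slot, _ in counts})
    return {s: [counts[(s, r)] for r in regions] for s in slots}

# Hypothetical pick-ups in two hypothetical regions "A" and "B".
pickups = [
    ("A", datetime(2013, 9, 1, 8, 5)),
    ("A", datetime(2013, 9, 1, 8, 40)),
    ("B", datetime(2013, 9, 1, 8, 59)),
    ("A", datetime(2013, 9, 1, 9, 10)),
]
demand = build_demand(pickups, regions=["A", "B"])
```

Each value of `demand` is one demand vector; a forecasting model would take the vectors up to slot *t* as input and output an estimate for slot *t* + 1.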

#### **4. Methodology**

This section introduces the proposed taxi demand forecasting model in detail, including raw data, features extraction and analysis, features pre-processing, and the GAN model. Figure 1 depicts the framework of the proposed model.

#### *4.1. Raw Data*

#### 4.1.1. Taxi GPS Data

The dataset used in this paper contains temporally ordered location records collected from 9124 GPS-enabled taxis during 91 days (historical GPS dataset of September, October, and November in 2013) of the city of Wuhan, China.

The temporal resolution of the dataset is 25 s; therefore, approximately 3500 GPS records are collected per taxi per day (24 h), and the total volume of the dataset approaches 320 million records. Each record sent by the GPS device to the taxi data center contains seven attributes, i.e., taxi (driver) ID, timestamp, longitude, latitude, speed, direction, and taxi status. A detailed description of these attributes is given in Table 1.

**Figure 1.** Framework of the proposed model.

**Table 1.** Description of the attributes of a record in the Taxi GPS dataset.


#### 4.1.2. Road Network Data

A dataset of the road network of the city of Wuhan, obtained from the OpenStreetMap website [24], is used in this research; it contains 94,214 intersections and 95,781 road segments. Each road intersection is stored as a vector point located by its latitude and longitude. The whole area of the city is divided into eight main regions and 623 disjoint small regions. Figure 2 shows the road network and the central areas in the city of Wuhan, China.

#### 4.1.3. Temporal Data

Besides the taxi GPS data, our study considers temporal data and related factors, including weekdays, weekends, and holidays, which significantly influence traffic flow and the distribution of taxi demand.

#### 4.1.4. POI Data

Information on points of interest (POIs) may provide useful details about the functions of urban regions relevant to taxi demands. We obtained a POI dataset of the city of Wuhan with a web crawler (using Selenium and PhantomJS in Python) from the guihuayun website [25], which yielded approximately 425,476 POIs grouped into 16 categories, as shown in Table 2.

**Figure 2.** (**a**) Road network, (**b**) Main regions of the city of Wuhan.


**Table 2.** Description of POIs categories in the city of Wuhan.

#### 4.1.5. Weather Data

Besides the above external factors, weather conditions have a notable influence on taxi demands [2]. A weather dataset covering 1 September 2013 to 30 November 2013 was also obtained with a web crawler from the timeanddate website [26]. For simplicity, the reported weather conditions are grouped into the categories shown in Figure 3.

Moreover, the weather data includes other variables, including temperature, wind speed, humidity information, etc. To enhance weather data accuracy, we adopted updated data for every hour unless there is a change in the existing conditions.

#### *4.2. Features Extraction and Analysis*

To understand the detailed information behind taxi demands and significant correlated factors which play a vital role in taxi demands forecasting, we need to extract and analyze features from the raw multi-source data introduced in Section 4.1.

**Figure 3.** Weather categories from historical Weather data.

#### 4.2.1. Taxi Trips

To enhance data accuracy and deal with invalid data such as incomplete records, noise, and outliers, the raw GPS data were carefully cleaned and prepared. The raw data were explored using packages and libraries in R and Python, such as Pandas, scikit-learn, and stats. Considering the geographical coordinates of Wuhan (longitude: 113°40′–115°04′ and latitude: 29°60′–31°20′), the GPS records located outside this range were deleted. Besides, GPS records with a taxi status of 0 or 2 (0: invalid device, 2: temporarily stopped) and repeated records were removed. Runs of more than five sequential timestamps with a zero speed value were also deleted.

In addition, data caused by the device errors and noises, such as GPS records with zero or null values of longitude, latitude, timestamp, were also deleted.
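The cleaning rules above can be sketched as follows. This is an illustrative filter, not the code used in this study: the field names (`ts`, `lon`, `lat`, `speed`, `status`), the function name `clean_gps`, and the decimal-degree bounding box format are hypothetical.

```python
def clean_gps(records, bbox, max_zero_run=5):
    """Filter the raw GPS records of one taxi, sorted by timestamp.

    records: list of dicts with keys 'ts', 'lon', 'lat', 'speed', 'status'
             (hypothetical field names).
    bbox: (lon_min, lon_max, lat_min, lat_max) in decimal degrees.
    """
    lon_min, lon_max, lat_min, lat_max = bbox
    seen = set()
    valid = []
    for r in records:
        # Drop records with zero or missing position or timestamp.
        if not r.get("ts") or not r.get("lon") or not r.get("lat"):
            continue
        # Drop records outside the study area.
        if not (lon_min <= r["lon"] <= lon_max
                and lat_min <= r["lat"] <= lat_max):
            continue
        # Drop invalid-device (0) and temporarily-stopped (2) statuses.
        if r["status"] in (0, 2):
            continue
        # Drop repeated records.
        key = (r["ts"], r["lon"], r["lat"], r["speed"], r["status"])
        if key in seen:
            continue
        seen.add(key)
        valid.append(r)

    # Drop runs of more than `max_zero_run` consecutive zero-speed records.
    cleaned, run = [], []
    for r in valid:
        if r["speed"] == 0:
            run.append(r)
            continue
        if len(run) <= max_zero_run:
            cleaned.extend(run)
        run = []
        cleaned.append(r)
    if len(run) <= max_zero_run:
        cleaned.extend(run)
    return cleaned
```

For example, `clean_gps(records, bbox=(113.0, 116.0, 29.0, 32.0))` would apply the rules with a hypothetical bounding box; the actual bounds should be taken from the coordinate range given above.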

As in [27,28], taking into account the taxi states during operation, we consider two kinds of trips: occupied (O) and vacant (V). In a vacant trip, the taxi status value is 1 and the driver is cruising without a passenger, whereas in an occupied trip, the taxi is traveling with a passenger. Figure 4 shows an example of the two kinds of trips, randomly extracted from a taxi GPS trajectory.

**Figure 4.** An example of the two trips randomly extracted from a GPS taxi trajectory. The occupied trip follows Tuanjie road and ends at the gate of Hubei university, where the passenger is dropped off; several meters further, another passenger is picked up and another occupied trip begins.

As shown in Figure 4, an occupied trip consists of small green circles, starting with a large green circle (a taxi pick-up) and ending with a large red circle (a taxi drop-off). In contrast, a sequence of small red circles forms a vacant trip. In this study, taxi demand prediction focuses on occupied trips. Table 3 shows the variables related to occupied trips.

**Table 3.** Attributes of an occupied trip.


For the sake of more accurate results, trips with a distance of less than 200 m, a duration of less than 3 min, or fewer than five GPS records were removed and are not considered in this study. After filtering, the dataset contains 29,984,635 trips, which is sufficient for performing taxi demand prediction. Figure 5 shows the three selected study areas in the city of Wuhan: (a) the Jiedaokou area, (b) the Guanggu area, and (c) the Wuhan railway station area.
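The three trip-level thresholds can be expressed as a simple predicate. The sketch below is illustrative only; the dictionary keys and the function name `is_valid_trip` are hypothetical, and the sample trips are made up.

```python
def is_valid_trip(distance_m, duration_s, n_records,
                  min_distance_m=200, min_duration_s=180, min_records=5):
    """Return True if an occupied trip passes all three thresholds."""
    return (distance_m >= min_distance_m
            and duration_s >= min_duration_s
            and n_records >= min_records)

# Hypothetical trips: only the first one satisfies every threshold.
trips = [
    {"id": 1, "distance_m": 1500, "duration_s": 600, "n_records": 24},
    {"id": 2, "distance_m": 150, "duration_s": 600, "n_records": 24},   # under 200 m
    {"id": 3, "distance_m": 1500, "duration_s": 120, "n_records": 24},  # under 3 min
    {"id": 4, "distance_m": 1500, "duration_s": 600, "n_records": 4},   # under 5 records
]
kept = [t["id"] for t in trips
        if is_valid_trip(t["distance_m"], t["duration_s"], t["n_records"])]
```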

#### 4.2.2. Taxi Demands and Temporal Factors

• Taxi demands distribution in Weekdays and Weekends

Figure 6 shows taxi demands distribution in Weekdays and Weekends of three study areas in the city of Wuhan.

We can see that these areas have similar taxi demand distributions and similar changes in taxi demands. This is especially visible on Fridays and Sundays, suggesting that people take taxis on the last weekday to travel or go out for entertainment, and again when coming back on Sundays. Therefore, we can leverage the weekday and weekend data to help forecast taxi demands.

• Taxi demands distribution in Holidays

The collected dataset contains eight holidays (National days 1–7 October and 24 November Christmas day). Here we study the taxi demand distribution in the three study areas on the holidays, as shown in Figure 7.

In general, there is a noticeable pattern during holidays in the three areas. Taxi demands were high on 1 and 2 October, decreased over the following days, and rose again on 7 October. A possible explanation is that people take taxis to travel at the beginning of the holiday period and on 24 November, whereas taxi demands increase again on 7 October, when people return home.

• Taxi demands distribution in daily hours

Figure 8 reports taxi demand distribution during a day (24 h) in the areas in Wuhan city.

**Figure 5.** Three study areas in the city of Wuhan, (**a**) Jiedaokou area, (**b**) Guanggu area, (**c**) Wuhan railway station area.

**Figure 6.** Taxi demand distribution on weekdays and weekends in the city of Wuhan. Weekdays show high taxi demands, as they are related to working time; Sundays also have a high number of taxi demands.

**Figure 7.** Taxi demands distribution in the city of Wuhan during holidays. Holidays show high taxi demands in the three areas.

**Figure 8.** Taxi demands distribution in the city of Wuhan during daily hours.

As we can see in Figure 8, the three study areas have some shared patterns. For example, all of them showed the minimum rates at 4 am and peak-rush at 8 am and 5 pm.

Taxi demand also shows high periodicity over the long term, since the changes in taxi demands on Thursdays and Fridays are almost the same from week to week in the three areas. Both short-term and long-term periodicity can assist in forecasting future taxi demands in city areas.

#### 4.2.3. Taxi Demands and POI

In general, the number and type of POIs may have a significant influence on taxi demand. Figure 9 shows the distribution of POIs and taxi demands in the three areas of Wuhan city.

**Figure 9.** Distribution of POIs and taxi demands in different regions.

From Figure 9, we can find that the distribution of POIs and taxi demand is unbalanced over the three areas, and the more POIs there are in each area, the more taxi demands. For instance, POIs and taxi demands in Jiedaokou and Guanggu are at high numbers, while those in Wuhan railway are at lower numbers.

#### 4.2.4. Taxi Demands and Weather Factors

Besides the above factors, taxi demands are also influenced by weather conditions [22]. This study considers various weather-related factors, including weather conditions (e.g., fog, clear, and rainy), temperature, wind speed, and humidity information, etc., which are reported each hour. Figure 10 shows the general distribution by percentages of different weather types during the study periods.

Figure 11 shows the overall taxi demands of the three study areas over a day under various weather-related factors.

From Figure 11, we can observe a distinct impact of weather on taxi demands. Taxi demands significantly increased on rainy days, and more people may prefer to take taxis when it rains between 7 and 9 am. This can help in understanding taxi traffic flow and human travel behavior. In addition, combined with the functions of each area (derived from POIs), it helps us understand the flow of passengers.

#### *4.3. Features Preparation*

After extracting and analyzing features and taxi demands, there is a need to prepare the features to be ready for utilization in the proposed model. Here we perform three steps: features normalization, transformation, and concatenation.

#### 4.3.1. Feature Transformation

To reduce the complexity of computing with these factors, we transformed the categorical factors, including Is weekend, Is holiday, region, weather, etc., into integer codes. The values of each factor are converted to integers that start at 0 and end at the number of distinct values minus one. For instance, the Is weekend factor is represented by 1 for a weekend and 0 for a weekday; the weather condition takes the values 0, 1, 2, 3, and 4 for Clear, Cloudy, Rain, Light Rain, and Heavy Rain, respectively.
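A minimal sketch of this transformation is given below. The weather codes follow the text; the helper names `is_weekend` and `make_codes`, and the first-appearance ordering used for the region codes, are assumptions made for illustration.

```python
from datetime import date

# Weather codes as listed in the text.
WEATHER_CODES = {"Clear": 0, "Cloudy": 1, "Rain": 2, "Light Rain": 3, "Heavy Rain": 4}

def is_weekend(day):
    """Return 1 for Saturday or Sunday and 0 for a weekday."""
    return 1 if day.weekday() >= 5 else 0

def make_codes(values):
    """Map distinct values to integers 0..k-1 in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes

# Assumed ordering: codes are assigned in order of first appearance.
region_codes = make_codes(
    ["Jiedaokou", "Guanggu", "Jiedaokou", "Wuhan railway station"]
)
```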

**Figure 11.** Weather Influence on Taxi demands.

#### 4.3.2. Feature Normalization

We use the Min-Max normalization [0, 1] standard to reduce the absolute scale's effects. The normalization process is performed on continuous values as follows:

$$x\_{norm} = \frac{x - x\_{min}}{x\_{max} - x\_{min}} \tag{1}$$

where *x*, *x<sub>min</sub>*, *x<sub>max</sub>*, and *x<sub>norm</sub>* are the original, minimum, maximum, and normalized values from the dataset (training dataset), respectively.
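Equation (1) can be sketched in a few lines of Python. The function names and the sample temperatures are hypothetical; the guard for a constant feature (zero range) is an added assumption, since the text does not say how that case is handled.

```python
def fit_min_max(train_values):
    """Return (x_min, x_max) computed on the training data only."""
    return min(train_values), max(train_values)

def min_max_scale(values, x_min, x_max):
    """Apply Equation (1) using previously fitted statistics."""
    span = x_max - x_min
    if span == 0:
        # Assumed convention: a constant feature maps to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

# Hypothetical temperature readings from a training set.
train_temp = [12.0, 18.0, 30.0]
x_min, x_max = fit_min_max(train_temp)
scaled = min_max_scale(train_temp, x_min, x_max)
```

Because the minimum and maximum are fitted on the training data, values in the test data that fall outside the training range would map outside [0, 1].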

#### 4.3.3. Feature Concatenation

To facilitate the dataset training, we embed the features in Table 4 into a 1 × 11 vector according to a time-step t. Figure 12 depicts Feature Concatenation.


**Figure 12.** Feature Concatenation.

After performing features normalization, transformation, and concatenation, the dataset was ready for forecasting by prediction models. The features included in the dataset are shown in Table 4.

#### *4.4. Generative Adversarial Network (GAN) Model*

A generative adversarial network (GAN) contains two deep neural networks, a generator and a discriminator. The generator network produces a (fake) dataset that is fed to the discriminator together with the real data. The discriminator network tries to distinguish the real data from the fake (generated) data. During the model's training process, the weights of both the generator and the discriminator are updated using their respective loss functions. Once the discriminator becomes unable to differentiate the two types of data, the training process terminates and the model is ready to be used. In GANs, both components can be any deep neural network. In the following, we introduce LSTM and CNN and then illustrate the proposed GAN model used in this study.

#### 4.4.1. LSTM and CNN

• LSTM

LSTM neural networks emerged to add a long-term memory function to recurrent neural networks, which enhanced their ability to deal with more complicated tasks, including prediction and classification [29]. In the LSTM network, the input vector sequence *X* is mapped to an output vector sequence *y* over successive iterations. Figure 13 depicts the architecture of an LSTM cell.

**Figure 13.** The architecture of a traditional LSTM network.

An LSTM cell contains three layers: an input layer, an output layer, and a memory block layer. The memory block includes three types of gates: the input gate *i<sub>t</sub>*, the output gate *o<sub>t</sub>*, and the forget gate *f<sub>t</sub>*. Besides, *c<sub>t</sub>* and *c̃<sub>t</sub>* denote the memory cell vector and the candidate value, respectively. The notation *t* represents an arbitrary time step. During the training process, the related information is updated according to the following formulas [30]:

$$i\_t = \sigma(W\_i[h\_{t-1}, X\_t] + b\_i) \tag{2}$$

$$f\_t = \sigma(W\_f[h\_{t-1}, X\_t] + b\_f) \tag{3}$$

$$\tilde{c}\_t = \tanh(W\_c[h\_{t-1}, X\_t] + b\_c) \tag{4}$$

$$c\_t = f\_t \times c\_{t-1} + i\_t \times \tilde{c}\_t \tag{5}$$

$$o\_t = \sigma(W\_o[h\_{t-1}, X\_t] + b\_o) \tag{6}$$

$$h\_t = o\_t \times \tanh(c\_t) \tag{7}$$

where *σ*(·) is the sigmoid function, computed as follows:

$$\sigma(\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x})} \tag{8}$$

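*W<sub>i</sub>*, *W<sub>f</sub>*, *W<sub>c</sub>*, and *W<sub>o</sub>* are the weight matrices of the input gate, the forget gate, the candidate value, and the output gate, respectively, and *b<sub>i</sub>*, *b<sub>f</sub>*, *b<sub>c</sub>*, and *b<sub>o</sub>* are the corresponding biases, whereas tanh(·) is the hyperbolic tangent function, computed as follows: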

$$\tanh(\boldsymbol{x}) = \frac{\exp(\boldsymbol{x}) - \exp(-\boldsymbol{x})}{\exp(\boldsymbol{x}) + \exp(-\boldsymbol{x})} \tag{9}$$
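To make the LSTM cell update concrete, the sketch below implements one time step of the standard LSTM formulation that Equations (2)–(7) describe, using plain Python lists. The parameter layout (`params` mapping gate names to a weight matrix and a bias) and the zero-valued weights in the demonstration are hypothetical choices for illustration, not trained values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update.

    params maps 'i', 'f', 'c', 'o' to (W, b); each W has len(h_prev)
    rows and len(h_prev) + len(x_t) columns.
    """
    W_i, b_i = params["i"]
    W_f, b_f = params["f"]
    W_c, b_c = params["c"]
    W_o, b_o = params["o"]
    z = h_prev + x_t  # list concatenation [h_{t-1}, X_t]
    i_t = [sigmoid(a) for a in affine(W_i, z, b_i)]       # input gate
    f_t = [sigmoid(a) for a in affine(W_f, z, b_f)]       # forget gate
    c_hat = [math.tanh(a) for a in affine(W_c, z, b_c)]   # candidate value
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]  # memory cell
    o_t = [sigmoid(a) for a in affine(W_o, z, b_o)]       # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]    # hidden state
    return h_t, c_t

# Hypothetical all-zero parameters: every gate evaluates to 0.5 and the
# candidate value to 0, so the cell simply halves the previous memory.
zero = ([[0.0, 0.0]], [0.0])
params = {name: zero for name in ("i", "f", "c", "o")}
h_t, c_t = lstm_step([3.0], [0.0], [2.0], params)
```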

• CNN

In general, Convolutional Neural Networks (CNNs) are composed of a convolutional cell group, pooling layers, and a set of fully connected layers, as depicted in Figure 14.

**Figure 14.** The architecture of traditional Convolutional Neural Network.

A convolutional layer *i* includes a group of filters *W<sub>i</sub>* ∈ ℝ<sup>*S*×*D*×*N*</sup> that is convolved with an input tensor, where *S* denotes the number of filters, *D* is the filter size, and *N* represents the number of input channels [31]. A pooling layer pools the output of a convolutional layer. Both convolutional and pooling layers are adopted to obtain the temporal patterns and to correlate temporally distant features. A set of fully connected layers follows the last convolutional/pooling layer to classify the input time series. The network output gives the classification result, assigning one of two labels (real, predicted) to each time step.

$$\sigma(\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x})} \tag{10}$$
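As an illustration of the convolution and pooling operations on a time series, the sketch below applies a single one-dimensional filter followed by max pooling. It is a simplified, single-channel example with hypothetical function names, a stride of 1, and no padding; the layers in the actual model operate on multi-channel tensors.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (stride 1, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool1d(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

# Hypothetical hourly demand values and a simple difference filter.
features = conv1d([1, 2, 3, 4, 5], [1, 0, -1])
pooled = max_pool1d([1, 3, 2, 5, 4], size=2)
```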

#### 4.4.2. GAN\_LSTM\_CNN Model

In the proposed model, the generator adopts LSTM networks and the discriminator adopts a CNN, both chosen for their stability. Figure 15 illustrates the architecture of the GAN\_LSTM\_CNN model.

**Figure 15.** The architecture of GAN\_LSTM\_CNN model.

• The generator

In our GAN model, we set the LSTM network as the generator because of its stability. The dataset contains 12 features (for 24 h of 91 days), as listed in Table 4. To build a robust generator network with good performance, we use four LSTM layers with 1024, 512, 256, and 128 neurons, followed by three fully connected layers. The activation function used in the LSTM layers is ReLU, the dropout rate is 20%, and the number of neurons in the last layer equals the number of output steps to be predicted.

• The discriminator

The discriminator in the proposed GAN model is a convolutional neural network that aims to distinguish whether its input is real or fake. The input of the discriminator is either real data or fake data from the generator. The discriminator network contains four convolutional layers with 32, 64, 128, and 256 filters, respectively. Besides, a flatten layer flattens the output of the convolutional layers into a single feature vector for classification, which is performed by the following four fully connected layers with 220, 220, 220, and 1 neurons, respectively. The Leaky Rectified Linear Unit (Leaky ReLU) is set as the activation function on all layers except the output layer, which uses the sigmoid function. The sigmoid function produces a single scalar output between 0 and 1, which is interpreted as real or fake in the final result.

#### 4.4.3. Training Process

In the proposed model, the generator generates fake data and tries to fool the discriminator, whereas the discriminator tries to distinguish the real data from the predicted data. Thus, there is a need to perform dataset training on the generator and the discriminator. During the training process, the cross-entropy loss is utilized in our GAN model to minimize the difference between the two data distributions.

In training the discriminator, we aim to maximize its objective function, i.e., the probability of assigning the correct label to the samples. Equivalently, we minimize the cross-entropy loss of the discriminator, which is defined as follows:

$$\text{D\\_loss} = -\frac{1}{m} \sum\_{i=1}^{m} \log D(y^i) - \frac{1}{m} \sum\_{i=1}^{m} \log\left(1 - D(G(x^i))\right) \tag{11}$$

where *m* is the number of samples, *x<sup>i</sup>* is the features vector, *y<sup>i</sup>* is the taxi demand from the real data, and *G*(*x<sup>i</sup>*) is the generated taxi demand produced by the generator. Then we train the generator to minimize its loss function, which is defined as follows:

$$\text{G\\_loss} = -\frac{1}{m} \sum\_{i=1}^{m} \log D(G(x^i)) \tag{12}$$
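The two losses can be sketched in Python under the assumption that they take the standard binary cross-entropy form, with real samples labelled 1 and generated samples labelled 0 for the discriminator, and generated samples labelled 1 for the generator (the non-saturating variant). The function names and the small constant `EPS` are hypothetical additions for illustration.

```python
import math

EPS = 1e-12  # assumed guard against log(0)

def discriminator_loss(d_real, d_fake):
    """Cross-entropy of the discriminator.

    d_real: discriminator outputs D(y) on real samples, each in [0, 1].
    d_fake: discriminator outputs D(G(x)) on generated samples.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Cross-entropy of the generator: small when D is fooled."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
```

With this formulation, a discriminator that outputs 1 on real data and 0 on generated data has a loss close to zero, while the generator loss decreases as the discriminator output on generated data approaches 1.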

Throughout the training process, the loss functions are minimized to obtain better results. In the proposed model, we adopted cross-entropy to calculate the loss for both the generator and the discriminator. For the discriminator, we concatenated the generated taxi demands with the historical taxi demands of the input steps as the input. This step increases the data length and improves the accuracy with which the discriminator learns the classification. In addition, the batch size is 30, and the number of epochs is 400. We apply Leaky ReLU as the activation function for the fully connected layers and use the Nesterov Accelerated Gradient (NAG) algorithm with a learning rate of 0.01 [31] as the optimizer. Besides, to avoid over-fitting, we implemented the dropout method with a probability of 0.2.
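For readers unfamiliar with NAG, the sketch below shows its update rule on a one-dimensional quadratic: the gradient is evaluated at a look-ahead point before the momentum step is taken. The learning rate matches the value stated above; the momentum coefficient of 0.9, the number of steps, and the toy objective are assumptions made for illustration.

```python
def nag_minimize(grad, theta, lr=0.01, momentum=0.9, steps=1000):
    """Minimize a scalar function with Nesterov Accelerated Gradient.

    grad: function returning the gradient at a given point.
    momentum=0.9 is an assumed value, not taken from the text.
    """
    v = 0.0
    for _ in range(steps):
        lookahead = theta + momentum * v      # peek ahead along the velocity
        v = momentum * v - lr * grad(lookahead)
        theta = theta + v
    return theta

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_opt = nag_minimize(lambda t: 2.0 * (t - 3.0), theta=0.0)
```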

#### **5. Experiment**

This section introduces the experimental setting, baselines, evaluation metrics and shows the results of the proposed model and baselines in detail.

#### *5.1. Experimental Settings*

The main aim of this paper is to predict taxi demand for the three study areas in the city of Wuhan (Jiedaokou, Guanggu, and Wuhan railway station) over the last 18 days using the data of the preceding 73 days (the whole dataset covers 91 days, 24 h a day). When training the forecasting model, the input data contain not only the historical taxi demands but also 11 related features that might have a significant effect on the taxi demands. The dataset is split into a training dataset of 80% (73 days with 3264 observations) and a testing dataset of 20% (18 days with 649 observations).
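A chronological split of this kind, in which the test period strictly follows the training period, can be sketched as below. The function name is hypothetical, and the example splits day indices rather than the actual observations.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples without shuffling."""
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]

# Day indices 1..91 stand in for the 91 days of data.
days = list(range(1, 92))
train_days, test_days = chronological_split(days)
```

Keeping the order intact avoids leaking future information into the training set, which matters for time-series forecasting.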

Our model is run on a hardware environment with two GPUs (NVIDIA GeForce RTX 2070) with 32 GB of memory, and is implemented with scikit-learn in conjunction with the TensorFlow framework.

#### *5.2. Model Comparison*

To illustrate the effectiveness of our model, we compare its performance with five mainstream baselines and tune the parameters for all methods. The models used for the comparison are as follows.

#### 5.2.1. Auto-Regressive Integrated Moving Average (ARIMA)

In ARIMA (*p*,*d*,*q*), the parameters *p* and *q* are the orders of the autoregressive term and the moving average term, respectively, while the parameter *d* indicates the order of differencing applied to the original data series, which removes the trend from the series [11]. In this study, these parameters were optimized by the auto-optimal function in the forecast functions in the scikit-learn package in Python.
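The role of the parameter *d* can be illustrated with a short differencing sketch. The function name and the sample series are hypothetical; the example only shows how repeated differencing removes a trend and is not the ARIMA fitting procedure itself.

```python
def difference(series, d=1):
    """Apply d-th order differencing to a list of numbers."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A hypothetical series with a quadratic trend.
trend = [1, 4, 9, 16, 25]
first = difference(trend, d=1)
second = difference(trend, d=2)
```

Here the first difference still grows linearly, whereas the second difference is constant, so *d* = 2 would remove the trend from this series.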

#### 5.2.2. Gradient Boosting Decision Tree (XGBoost)

Researchers widely use XGBoost to perform traffic flow prediction, where it provides state-of-the-art results [32].

#### 5.2.3. Multiple Layer Perceptron (MLP)

Our model is compared with the MLP method, which contains four hidden layers. The structure of every layer includes 128, 128, 128, and 64 hidden units, respectively.

#### 5.2.4. GAN\_LSTM Model

In this model, both the generator and the discriminator have a similar structure as in the generative network G in the GAN\_LSTM\_CNN model, i.e., four LSTM layers and three fully connected layers. A flattening layer would be added as in the discriminator network in the GAN\_LSTM\_CNN model. Figure 16 shows the structure of the GAN\_LSTM model.

**Figure 16.** The structure of GAN\_LSTM model.

#### 5.2.5. GAN\_CNN Model

In this model, the structure of the generator and the discriminator is the same as that of the discriminator network D in the proposed GAN\_LSTM\_CNN model, i.e., four convolutional layers and three fully connected layers. Figure 17 represents the structure of the GAN\_CNN model.

**Figure 17.** The structure of GAN\_CNN model.

#### *5.3. Experimental Metrics*

To verify and validate a predictive model, many measures of predictive accuracy have been used [14,22]. MAE, root MSE (RMSE), *r*, and *r*<sup>2</sup> are among the most commonly used or recommended measures [14,22,30]. Therefore, the commonly used measures, mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), are considered in this study. The metrics are computed with the following formulas [29]:

$$MAE(i) = \frac{1}{T} \sum\_{t=1}^{T} \left| \hat{Y}\_{i,t} - Y\_{i,t} \right| \tag{13}$$

$$RMSE(i) = \sqrt{\frac{1}{T} \sum\_{t=1}^{T} \left( Y\_{i,t} - \hat{Y}\_{i,t} \right)^2} \tag{14}$$

$$MAPE(i) = \frac{1}{T} \sum\_{t=1}^{T} \frac{\left| Y\_{i,t} - \hat{Y}\_{i,t} \right|}{Y\_{i,t} + a} \tag{15}$$

where *Y<sub>i,t</sub>* is the real taxi demand in area *i* at time step *t*, whereas *Ŷ<sub>i,t</sub>* is the predicted taxi demand. The constant *a* in Equation (15) is a small parameter (*a* = 1) that avoids division by zero when both *Y<sub>i,t</sub>* and *Ŷ<sub>i,t</sub>* are 0.
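The per-area metrics of Equations (13)–(15) can be sketched as follows. The function names and the sample demand values are hypothetical; `a = 1` follows the text.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over T time steps."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error over T time steps."""
    return math.sqrt(
        sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    )

def mape(y_true, y_pred, a=1.0):
    """Mean absolute percentage error with the constant a in the denominator."""
    return sum(abs(y - p) / (y + a)
               for y, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical real and predicted demands for one area over three time steps.
y_true = [0, 3, 9]
y_pred = [0, 5, 6]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Note that the first time step has zero real demand; the constant *a* keeps the MAPE term defined in that case.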

Thus, the forecasting performance over the three areas at time step *t* can be defined as follows:

$$\text{sMAPE}\_{all} = \sum\_{t=1}^{T} \frac{\left| Y\_{i,t} - \hat{Y}\_{i,t} \right|}{Y\_{i,t} + \hat{Y}\_{i,t} + a} \tag{16}$$

$$RMSE\_{all} = \sqrt{\sum\_{t=1}^{T} \left( Y\_{i,t} - \hat{Y}\_{i,t} \right)^2} \tag{17}$$

$$MAE\_{all} = \sum\_{t=1}^{T} \frac{\left| \hat{Y}\_{i,t} - Y\_{i,t} \right|}{Y\_{i,t}} \tag{18}$$
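As a small illustration of the symmetric error in Equation (16), the sketch below sums the symmetric absolute percentage terms as the equation is written, without averaging. The function name and the sample values are hypothetical, and `a = 1` is assumed to play the same role as in Equation (15).

```python
def smape_sum(y_true, y_pred, a=1.0):
    """Sum of symmetric absolute percentage errors.

    The denominator uses both the real and the predicted demand, so the
    term stays bounded when either of them is zero.
    """
    return sum(abs(y - p) / (y + p + a) for y, p in zip(y_true, y_pred))

# Hypothetical real and predicted demands.
total = smape_sum([0, 3, 9], [0, 5, 6])
```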

With an initial batch size of 36 and 500 epochs, training ran for 50 iterations per epoch, i.e., about 25,000 iterations in total.

#### **6. Results**

#### *6.1. Comparison with Baselines*

This section presents a detailed comparison among the models using the experimental metrics introduced in Section 5.3. Firstly, we report the performance in terms of MAE, sMAPE, and RMSE over the three areas together. Secondly, we show the prediction performance in each specific area separately.

#### 6.1.1. Performance Comparison over the Three Areas Together

To assess the forecasting performance over the three study areas together (Jiedaokou area, Guanggu area, and Wuhan railway station area), we analyze the performance of the ARIMA, XGBoost, MLP, GAN\_LSTM, GAN\_CNN, and GAN\_LSTM\_CNN models in terms of sMAPE, RMSE, and MAE, as defined in Equations (16)–(18). The comparison results are shown in Figures 18–20.

**Figure 18.** MAE comparison results for different time slots. MLP and XGBoost attain high MAE, whereas GAN\_LSTM\_CNN achieves a lower MAE.

**Figure 19.** RMSE comparison results for hours 0–23. The values increase from 6:00 to 22:00, and all deep learning methods achieve better results.

**Figure 20.** sMAPE comparison results over the time slots. Deep learning methods (GAN\_CNN, GAN\_LSTM, and GAN\_LSTM\_CNN) achieve better results than ARIMA, XGBoost, and MLP.

From the three figures above, we can notice that although they show different forecasting performance metrics, there are some common patterns. For example, all metrics reach local minima at hours 3, 10, and 23, which have fewer pick-ups than their neighboring time slots. Moreover, the GAN\_LSTM\_CNN model performs better than the other predictors.

Figure 18 plots the mean absolute error (MAE), which describes the prediction accuracy. There is a dramatic decrease in the early hours of the day until the MAE of the GAN\_LSTM\_CNN model reaches 1.4 (the lowest point) at 3 am, followed by a gradual increase until 22:00 and then a further sharp fall.

As shown in Figures 19 and 20, RMSE shows the root mean squared error, whereas sMAPE presents the symmetric mean absolute percentage error between the predicted demands and the real demands. After the minimum at 3 am, the values become low again at 10 am, and then a noticeable increase emerges. The GAN\_LSTM\_CNN model also provides the minimum sMAPEs and RMSEs, which shows the highest accuracy among the prediction models adopted in this paper.

#### 6.1.2. Performance Comparison over the Three Areas Separately

Our proposed model's performance and the baselines for the three study areas, namely Jiedaokou, Wuhan Railway Station Area, and Guanggu, are shown in Tables 5–7.

**Table 5.** Performance Comparison Jiedaokou Area.


**Table 6.** Performance Comparison Wuhan railway Station Area.


**Table 7.** Performance Comparison Guanggu Area.


As the three tables above show, the GAN\_LSTM\_CNN model presents the lowest MAE, RMSE, and MAPE in the three areas compared with the baselines. Specifically, MLP and XGBoost perform the poorest, whereas GAN\_LSTM and GAN\_CNN achieve good performance. This indicates that GAN-based models achieve better performance than the other baselines and confirms that deep neural networks can work efficiently in taxi demand forecasting. Compared with GAN\_LSTM, the GAN\_LSTM\_CNN model achieves 15.86% (2.31%, 13.46%) lower MAPE (RMSE, MAE) on the Jiedaokou area, 9.22% (3.86%, 6.84%) lower MAPE (RMSE, MAE) on the Wuhan railway station area, and 4.69% (10.02%, 6.84%) lower MAPE (RMSE, MAE) on the Guanggu area. Compared with GAN\_CNN, our model reduces MAPE (RMSE, MAE) by 13.79% (4.77%, 12.29%) on the Jiedaokou area, 11.67% (11.45%, 9.93%) on the Wuhan railway station area, and 4.38% (3.83%, 10.80%) on the Guanggu area.

#### *6.2. Comparison of Training Loss Function*

The optimizer used in our model is the Adam algorithm with a learning rate of 0.0001. The batch size is 128, and we trained the model for 400 epochs. Figure 21 compares the training loss functions of the GAN\_LSTM, GAN\_CNN, and GAN\_LSTM\_CNN models.

**Figure 21.** Training loss of the Generator(G\_loss) and Discriminator(D\_loss) of (**a**) GAN\_LSTM, (**b**) GAN\_CNN and (**c**) GAN\_LSTM\_CNN models. In GAN\_LSTM and GAN\_CNN, the loss of discriminator is higher than the loss of generator, and through the training process, both loss paths become flat. Compared with GAN\_LSTM and GAN\_CNN, the discriminator in GAN\_LSTM\_CNN learns better and faster.

Figure 21a,b show the losses of GAN\_LSTM and GAN\_CNN. The blue line represents the loss path of the discriminator, and the orange line is the generator's loss path. From the beginning, the loss of the discriminator is higher than the loss of the generator, and through the training process, both loss paths become flat.

In Figure 21c, the discriminator loss of GAN\_LSTM\_CNN decreases dramatically towards 0. Compared with GAN\_LSTM and GAN\_CNN, the discriminator in GAN\_LSTM\_CNN learns better and faster.

#### *6.3. Comparison of Time Efficiency*

We estimate the time efficiency of the GAN\_LSTM\_CNN model and the GAN-based baselines (i.e., GAN\_LSTM and GAN\_CNN) in terms of the time consumed by training and testing.

As shown in Table 8, the time spent on training and testing by GAN\_LSTM is significantly larger than that of GAN\_CNN and GAN\_LSTM\_CNN, even though it has the fewest trainable parameters. Moreover, our proposed GAN\_LSTM\_CNN takes more time during the training and testing processes than GAN\_CNN. This is because GAN\_LSTM\_CNN has almost 22 times as many trainable parameters as GAN\_CNN.

**Table 8.** Comparison of Time Efficiency.


#### **7. Discussions**

In this study, we investigate taxi demands and related factors extracted from multi-source data. A generative adversarial network (GAN) is adopted to forecast taxi-passenger demand by considering significant factors, including temporal, spatial, and external factors. The experiments showed that the GAN-based models achieve lower errors than the ARIMA, XGBoost, and MLP baselines.

Comparing this study to the study by [2,5], we highlight several similarities and dissimilarities. Regarding their methodology and results, in particular, this study has several differences and innovations.

Firstly, a comprehensive experiment was conducted by GANs using multi-source data collected for the city of Wuhan in China, which covers all variables associated with taxi demand. The obtained variables include time, Is rush-time, Is weekend, Is holiday, Region, temperate, Weather, Wind, Humidity, and POI counts. In contrast, key variables were not included in the methods of Kuang Li et al.'s study [2] and Jun Xu et al.'s study [5], such as the POI counts, Is rush-time, Is weekend, Is holiday, Region, temperate, etc. These variables undoubtedly play a vital role in forecasting the taxi-passenger demand, and some of these variables were significant according to the results of this study. Not surprising that these factors are considered significant in forecasting taxi-passenger demand. It conforms to the findings of a previous study [16,18,29].

Secondly, Kuang Li et al. [2] adopted a Convolutional Neural Network (CNN) to study the factors contributing to taxi demand, whereas this study adopted generative adversarial network models. Three kinds of GANs were adopted, namely the GAN\_LSTM\_CNN, GAN\_LSTM, and GAN\_CNN models. The comparison of results indicated that the GAN\_LSTM\_CNN model outperformed the GAN\_LSTM and GAN\_CNN models in MAE, RMSE, and sMAPE, as shown in Figures 18–20. This suggests that the GAN\_LSTM\_CNN model can present a more comprehensive view of the impact of significant factors on taxi demand.

Moreover, this study analyzes the training loss of the GAN\_LSTM\_CNN, GAN\_LSTM, and GAN\_CNN models, which matters when considering the computing time consumed during training. The discriminator loss in GAN\_LSTM\_CNN suggests that this model may learn better and faster, which can improve the efficiency and effectiveness of taxi demand prediction.

Finally, the training time consumed by GAN\_LSTM is considerably higher than that of GAN\_CNN and GAN\_LSTM\_CNN, even though it has the fewest trainable parameters. In addition, GAN\_LSTM\_CNN takes longer to train and test than GAN\_CNN. This is reasonable, as GAN\_LSTM\_CNN has about 22 times as many trainable parameters as GAN\_CNN.

The findings of this study support our hypothesis. The results highlight the vital role that deep learning-based methods can play in improving the accuracy of taxi demand forecasting. This can help policymakers who regulate the taxi industry to enhance the temporal taxi supply during specific periods, which in turn can increase taxi profit.

#### **8. Limitations**

Although the dataset contains various vital variables, three limitations were found during this study. First, data could be collected over a longer period, such as one or two years, which may improve the results and provide a more accurate understanding of taxi demand. Second, the dataset used in this study is historical, not real-time. A real-time dataset may help in real-time traffic situations, where some areas have large demand and drivers compete with each other for passengers in another area of the city. Third, as the dataset of the whole city is huge and beyond the capability of our computing resources, we investigated three specific areas. With more computing resources for handling such big data, a more comprehensive understanding of taxi demand could be obtained, which in turn may help in taxi demand prediction.

#### **9. Conclusions**

Accurate taxi demand forecasting can solve the traffic congestion problem caused by the supply-demand imbalance. Although many methods have been successfully employed to address the taxi demand prediction problem, most existing methods may not comprehensively consider various factors that influence the forecasting results. To fill the gap, we propose a deep learning-based model for forecasting taxi demands in the urban area by considering multi-source data.

Various factors have been considered, including trip factors, temporal factors, spatial factors, weather conditions, and POI. Firstly, in the proposed model, significant factors are extracted from raw data and then analyzed to understand their influence on taxi demand. Pick-up locations of taxi trips are derived from taxi GPS trajectories and combined with temporal factors, weather conditions, POI data, and road network data. All information is then integrated to explore the travel pattern of taxi demand and its related influences.

Secondly, the extracted factors are prepared for use in the forecasting model through normalization, transformation, and concatenation. Thirdly, the Generative Adversarial Networks (GANs) structure is introduced, followed by the training process settings. In the proposed convolutional recurrent neural network model, the LSTM network is adopted as the generator because of its stability, and the convolutional neural network (CNN) is employed to distinguish whether the input data is real or produced by the generator. Finally, comprehensive experiments are performed on real-world datasets. The proposed model can automatically learn various characteristics to understand spatiotemporal patterns and enhance forecasting performance. To assess its predictive accuracy, the proposed model is validated and compared with several benchmark algorithms, including ARIMA, XBoost, MLP, GAN\_LSTM, and GAN\_CNN, on real-world data from Wuhan city.

The results show that our model outperforms the other prediction approaches in terms of MAPE, RMSE, MAE, and time consumption.

The findings provide evidence that a deep learning-based approach that accounts for spatial-temporal, weather, and road network correlations can significantly improve predictive accuracy. The results can assist policymakers in regulating the taxi industry to adjust the temporal taxi supply and the taxi lease rents (which vary by shift) for specific periods, which may improve passenger satisfaction and the transportation capacity of the taxi industry in cities.

For future work, there is a need to investigate the impact of other similar types of traffic demand, such as online car-hailing services (for instance, DiDi and Uber), which reduce taxi demand. Moreover, the spatial correlation between city areas could be considered, with taxi demand then forecast by graph-based deep neural networks.

**Author Contributions:** H.A.H.N. designed and developed the methodology and collected and analyzed the data; Q.X. supervised the work and provided analysis tools; H.A.H.N. wrote the paper; H.Z. performed data curation and visualization; T.L. handled project administration. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by Foundation of Nanyang Institute of Technology (grant no.: 510102), Projects of Henan Provincial Department of Science and Technology (no. 212102310297).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **An Improved Multiple Features and Machine Learning-Based Approach for Detecting Clickbait News on Social Networks**

**Mohammed Al-Sarem 1,\*, Faisal Saeed 1,2,\*, Zeyad Ghaleb Al-Mekhlafi 3,\*, Badiea Abdulkarem Mohammed 3, Mohammed Hadwan 4,5, Tawfik Al-Hadhrami 6, Mohammad T. Alshammari 3, Abdulrahman Alreshidi 3 and Talal Sarheed Alshammari 3**


**Abstract:** The widespread usage of social media has led to the increasing popularity of online advertisements, which have been accompanied by a disturbing spread of clickbait headlines. Clickbait dissatisfies users because the article content does not match their expectation. Detecting clickbait posts in online social networks is an important task to fight this issue. Clickbait posts use phrases that are mainly posted to attract a user's attention in order to click onto a specific fake link/website. That means clickbait headlines utilize misleading titles, which could carry hidden important information from the target website. It is very difficult to recognize these clickbait headlines manually. Therefore, there is a need for an intelligent method to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied for this detection purpose. However, the obtained performance (accuracy) only reached 87% and still needs to be improved. In addition, most of the existing studies were conducted on English headlines and contents. Few studies focused specifically on detecting clickbait headlines in Arabic. Therefore, this study constructed the first Arabic clickbait headline news dataset and presents an improved multiple feature-based approach for detecting clickbait news on social networks in Arabic language. The proposed approach includes three main phases: data collection, data preparation, and machine learning model training and testing phases. The collected dataset included 54,893 Arabic news items from Twitter (after preprocessing). Among these news items, 23,981 were clickbait news (43.69%) and 30,912 were legitimate news (56.31%). This dataset was pre-processed and then the most important features were selected using the ANOVA F-test. Several machine learning (ML) methods were then applied with hyperparameter tuning methods to ensure finding the optimal settings. 
Finally, the ML models were evaluated, and the overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) with the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) obtained the best performance and achieved 92.16% of detection accuracy.

**Keywords:** ANOVA-test; clickbait news; feature selection; social network

**Citation:** Al-Sarem, M.; Saeed, F.; Al-Mekhlafi, Z.G.; Mohammed, B.A.; Hadwan, M.; Al-Hadhrami, T.; Alshammari, M.T.; Alreshidi, A.; Alshammari, T.S. An Improved Multiple Features and Machine Learning-Based Approach for Detecting Clickbait News on Social Networks. *Appl. Sci.* **2021**, *11*, 9487. https://doi.org/10.3390/app11209487

Academic Editors: Giovanni Randazzo, Anselme Muzirafuti and Dimitrios S. Paraforos

Received: 10 September 2021 Accepted: 6 October 2021 Published: 13 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Currently, social networks have become the main environment for communicating, sharing, and posting news on the Internet. Twitter, Facebook, and Instagram are the main social networks that are used to share our opinions and news. With this development, a huge amount of textual data are posted on these media, which increasingly become difficult to process manually. Although the social networks provide an easy way to express our opinions, this platform also can be used to share misinformation in the form of news and advertisements. This is a very serious issue, because this misinformation has the power to influence individuals and sway their opinions. Therefore, finding a way to protect users of social networks from the spread of this misinformation and develop a reliable mechanism to detect it is very important. This misinformation can take the form of clickbait, which aims at enticing the users into clicking a link to news items or advertisements, whose titles (headlines) do not completely reflect the inside contents. According to Chen et al. [1], clickbait is defined as "Content whose main purpose is to attract attention and encourage visitors to click on a link to a particular web page".

The automatic detection of clickbait headlines from the huge volume of news on social networks has become a difficult research issue in the field of data science. Some previous efforts have utilized machine learning to detect clickbait headlines automatically. For instance, Biyani et al. [2] applied Gradient Boosted Decision Trees (GBDT) on a dataset drawn from news sites such as Huffington Post, New York Times, CBS, Associated Press and Forbes. The dataset contains 1349 clickbait and 2724 non-clickbait webpages. The best results achieved were an F1-score of 61.9% with five-fold cross-validation for the clickbait class and an F1-score of 84.6% for the non-clickbait category. Potthast et al. [3] applied linear regression, Naïve Bayes, and random forest methods on a dataset gathered from Twitter. The dataset contained 2992 data points. The results recorded were relatively close, with an approximate precision of 75%.

Chakraborty et al. [4] built a browser extension that used support vector machine (SVM), decision tree, and random forest classifiers to detect clickbait headlines automatically. For training purposes, they collected a well-balanced dataset containing 30,000 headlines (clickbait and non-clickbait) from ViralStories, Upworthy, BuzzFeed, Wikinews, Scoopwhoop, and ViralNova. In addition, for each data point in the dataset, they extracted sentence structure, clickbait language, word patterns, and n-gram features. The results they achieved are as follows: SVM: 93% accuracy, 95% precision, 90% recall, 93% F1-score, and 97% ROC-AUC; Decision Tree: 90% accuracy, 91% precision, 89% recall, 90% F1-score, and 90% ROC-AUC; Random Forest: 92% accuracy, 94% precision, 91% recall, 92% F1-score, and 97% ROC-AUC, using a combination of all extracted features.

Khater et al. [5] proposed the use of logistic regression and linear SVM. They extracted 28 features from a dataset provided by Bauhaus-Universität Weimar at the time of a clickbait detection challenge. The most commonly extracted features were Bag of Words (BOW), noun extraction, similarity, readability, and formality. The best results achieved were 79% and 78% precision for logistic regression and linear SVM, respectively. Since such feature-based methods require extracting and labeling each feature before feeding the data into the machine learning tool, researchers have found deep learning techniques useful for avoiding the feature engineering phase. For instance, López-Sánchez et al. [6] combined metric learning with a CNN deep learning algorithm by integrating them with a case-based reasoning methodology. For feature selection, they used TF-IDF, n-gram, and 300-dimensional Word2vec on the dataset provided by [4]. The proposed approach achieved average areas under the ROC curve of 99.4%, 95%, and 90% using Word2vec, TF-IDF, and n-gram count, respectively. Agrawal [7] also used a CNN model to classify a manually constructed news corpus obtained from the Reddit, Facebook, and Twitter social networks into clickbait and non-clickbait. As feature selection methods, they used Click-Word2vec and Click-scratch. The highest results that they achieved were 89% accuracy with an 87% ROC-AUC score for Click-scratch features and 90% when Click-Word2vec was used. Kaur et al. [8] proposed a hybrid model in which a CNN model is combined with LSTM. They found that the CNN-LSTM model, when implemented with pre-trained GloVe embeddings, yields the best results based on the accuracy, recall, precision, and F1-score performance metrics. They also identified eight types of clickbait headlines: reaction, reasoning, revealing, number, hypothesis/guess, questionable, forward referencing, and shocking/unbelievable. They found that shocking/unbelievable, hypothesis/guess, and reaction were the most frequently occurring types of clickbait headlines published online.

Although several machine learning approaches have been proposed to detect clickbait headlines, most of these recent methods are not very robust. The previous studies used hybrid categorization techniques such as Gradient Boosted Decision Trees, linear regression, Naïve Bayes and random forest methods, SVM, decision tree, logistic regression, and convolutional neural network deep learning. Most of these studies used datasets with headlines written in English. However, this paper uses an Arabic language dataset and proposes a comprehensive approach that includes three main phases: data collection, data preparation, and machine learning model training and testing phases. This dataset was pre-processed and then the most important features were selected using ANOVA F-test. Several machine learning methods were then applied which include random forest (RF), stochastic gradient descent (SGD), Support Vector Machine (SVM), logistic regression (LR), multinomial Naïve Bayes (NB), and k-nearest neighbor (k-NN). Hyper-parameter tuning methods were applied to ensure finding the optimal settings. Finally, the ML models were evaluated and the overall performance is reported here. The key contributions of this paper are as follows:


#### **2. Related Works**

#### *2.1. Characteristics of Clickbait News*

Biyani et al. [2] define eight types of clickbait: exaggeration, teasing, inflammatory, formatting, graphic, bait-and-switch, ambiguous, and wrong. In exaggeration, the title overdraws the content on the target page. Teasing means hiding details from the title to build more suspense. In the inflammatory type, inappropriate or vulgar words are used. Formatting means overusing capitalization or punctuation in the headlines, for instance ALL CAPS or exclamation points. In the graphic type, the subject matter is disturbing or unbelievable. Bait-and-switch means the news promised in the title is not found on the target page. Ambiguous means the title is unclear or confusing, while wrong means using a plainly incorrect article. Kaur et al. [8] identify eight other types of clickbait headlines: reaction, reasoning, revealing, number, hypothesis/guess, questionable, forward referencing, and shocking/unbelievable. They also found that shocking/unbelievable, hypothesis/guess, and reaction were the most frequently occurring types of clickbait headlines published online.

According to Zheng et al. [9], the headlines of different article types attract users' attention in different ways, which means that the characteristics of clickbait vary between article types. This is different from traditional text-analysis issues. For instance, the headlines of forums or blogs are more colloquial than the headlines of traditional news. The main difference between these two types of headlines is the use of functional linguistic characteristics such as wondering, exaggerating, and questioning. In [9], two types of characteristics were used: general clickbait characteristics and type-related characteristics, while the main characteristics used by Naeem et al. [10] for the detection of clickbait were sensationalism, mystery, notions of curiosity, and shock.

In another approach, Potthast et al. [3] used three types of features for clickbait headlines, which are: the teaser message, the linked web page, and meta information. The first type includes basic text statistics and dictionary features, while the second type analyses the web pages linked from a tweet, and the third type includes meta information about the tweet's sender, medium, and time.

Bazaco, Redondo, and Sánchez-García [11] describe the characteristics of clickbait using six variables under two categories: presentation variables and content variables. The first category includes incomplete information, appealing expressions, repetition and serialisation, and exaggeration, while the second category includes the use of soft news and sensationalist content, and striking audiovisual elements. According to [1], the characteristics of curiosity used in clickbait are its intensity, tendency to disappoint, transience, and association with impulsivity. These lead to a knowledge gap that is exploited by clickbait headlines to encourage readers to click through and read the whole article.

#### *2.2. Machine Learning and Deep Learning Methods for Clickbait Detection*

Several machine learning and deep learning methods have been applied to detect clickbait headlines from different social networks, including Twitter, Facebook, Instagram, Reddit, and others. Table 1 summarizes recent studies on clickbait detection methods. The results in the table show that the performance of machine learning methods still needs to be improved. In the best cases, the highest accuracy obtained reached 0.87 by [12]. In contrast, the use of deep learning showed a good improvement in performance, where the accuracy obtained by [13] reached 0.97. Most of the existing studies used headlines written in English or other languages. Only a few studies focused on clickbait headlines in Arabic. Although Arabic and English scripts have some similarities, there are a number of characteristics that specify the uniqueness of Arabic script. These include: the direction of Arabic, which is written from right to left, and the fact that neither upper nor lower cases exist in Arabic, which is written cursively. In Arabic, all letters are connected from both sides, except six letters that can be connected from the right side only. Each of the 28 letters of Arabic script has different shapes, depending on its position in the word, and some letters are very similar, differing only in the number and/or the position of dots [14,15]. In addition, there are other special features which are unique to Arabic script such as elongation, morphological characteristics, word meters, and morphemes [16].

**Table 1.** Summary of recent studies on clickbait detection methods.




To address the lack of study of clickbait detection in Arabic texts, this paper focuses on improving the performance of machine learning methods for detecting clickbait headlines on social networks in the Arabic language.

#### *2.3. Problem Formulation for Clickbait Detection*

The clickbait detection problem is a subset of natural language processing that can be represented as a binary classification as follows:

Given a set of posts shared via social networking platforms (tweets) $T = \{t_1, t_2, \ldots, t_n\}$, let $t \in T$ denote a post that is classified into a class $\mathcal{C} = \{\mathcal{C}^{+}, \mathcal{C}^{-}\}$, where $\mathcal{C}^{+}$ is the class of the tweets $t_i \in T$ that are considered legitimate news, and $\mathcal{C}^{-}$ is the class of the clickbait news $t_j \notin \mathcal{C}^{+}$.

To solve the problem, let $D$ be a dataset of all posts, $D = \{V_1, V_2, \mathcal{C}\}$, where $V_1 = (v^1_1, v^1_2, v^1_3, \ldots, v^1_n)$ is a vector of features extracted from the user portfolio (user-based features (UFs)) and $V_2 = (v^2_1, v^2_2, v^2_3, \ldots, v^2_n)$ is a vector of features extracted from the post/tweet content (content-based features (CFs)). Let also $v^1_i$ and $v^2_i$ be the points of a specific feature $I$, with $v^1_i \in V_1$ and $v^2_i \in V_2$.

Let $D_{\mathrm{train}}$ be a training set and $D_{\mathrm{test}}$ be a testing set, where $D_{\mathrm{train}}, D_{\mathrm{test}} \subseteq D$. Let $\xi$ be a function that generates $I$ from $D_{\mathrm{train}}$ and $D_{\mathrm{test}}$ based on the feature space $V$: $\xi : T \times V \to I$. As the vector space can be high-dimensional, the clickbait detection problem is now formulated as follows:

Let $\chi$ be a function that maps a post $t_i \in T$ to $\mathcal{C} = \{\mathcal{C}^{+}, \mathcal{C}^{-}\}$, $\chi : T \to C$, where $C = (\mathcal{C}, r)$ and $r$ is a binary relation which takes the value 1 if a post $t_i \in T$ is a legitimate post ($t_i \in \mathcal{C}^{+}$), and 0 otherwise.

The function $\chi$ can now be set as an optimization problem as follows:

$$\text{optimize } f_{\chi}(V_1, V_2) \quad \text{subject to } c(V_1, V_2)$$

where $c$ is a constraint set on the search space.

#### **3. Materials and Methods**

The proposed multiple-feature-based approach for detecting clickbait news is presented in this section. Since clickbait and normal news can be distinguished directly by analysing the linguistic character of the news content [20], the proposed approach takes into consideration both the headlines and the content of the news (content-based features, CFs). In addition, to overcome the limitations of such an approach, these are combined with user-based features (UFs).

Figure 1 presents the methodology followed in this study, which consists of the following phases: data collection, data preparation, and machine learning model training and testing. For detecting clickbait news on social networks, both the investigated news and the profile of the user who shared the post are collected. We first constructed a baseline dataset from the raw dataset by labelling the news as clickbait or legitimate. Since the amount of collected data was huge, we used a pseudo-labelling learning (PLL) technique [21] to build a satisfactory labelled dataset. In the next phase, both the news headlines and contents are pre-processed, including text cleansing, normalization, stemming, stop word removal, and tokenization. These steps are necessary to enhance the overall performance of the ML-based model. We concatenated the processed text with the user-based features and then applied feature reduction using a one-way ANOVA test. The selected features were fed to the ML model. A set of ML models was tested, and their hyper-parameters were tuned to find the optimal settings. Finally, the ML model was evaluated, and the overall performance was reported.

**Figure 1.** The proposed multiple feature approach for detection of clickbait news.

#### *3.1. Data Collection*

We collected 72,321 Arabic news items from Twitter. The dataset can be obtained from GitHub (https://github.com/Moh-Sarem/Clickbait-Headlines#clickbait-headlines) (accessed on 1 October 2021). For this purpose, we implemented a dedicated crawler that accesses breaking news on social networks given the names of public breaking news agencies. Twitter APIs typically return tweets in JSON format. However, because many of the returned fields are not helpful for the proposed model, the crawler filters them out and saves the retained information from the user profile and shared content in comma-separated values (CSV) format. The details of the collection process through multiple feature analysis are shown in Algorithm 1. In addition, the full description of the features used is presented in Tables 2 and 3.

**Algorithm 1** Pseudocode of dataset collection process for extracting UFs and CFs

```
Input: A list of public Twitter breaking news agencies' profiles N
Output: Unlabelled dataset with UFs and CFs
For each profile p ∈ N do:
    Access public page of p
    Retrieve all shared tweets tp
    Pull out the tweet's features (UFs) using Twitter APIs
    If tp contains an external URL Then:
        Visit the external webpage pe
        For all html tags in pe do:
            Find html tag that contains news full text (CFs)
            Compute similarity score between tp and pe
        End
    End if
    Store the extracted features in csv format
End
```
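The loop in Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: `fetch_tweets` and `fetch_page_text` are hypothetical callables standing in for the Twitter API access and the webpage parsing, the column names are placeholders, and the Jaccard token overlap is an assumed stand-in, since the similarity measure is not specified here.

```python
import csv
import io
from typing import Callable, Iterable, Optional


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap score in [0, 1]; an assumed stand-in similarity measure."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def collect(
    profiles: Iterable[str],
    fetch_tweets: Callable[[str], Iterable[dict]],    # hypothetical: wraps the Twitter API
    fetch_page_text: Callable[[str], Optional[str]],  # hypothetical: downloads and parses a page
    out: io.TextIOBase,
) -> int:
    """Write one CSV row per tweet and return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["profile", "tweet_text", "url", "body_text", "similarity"])
    rows = 0
    for profile in profiles:
        for tweet in fetch_tweets(profile):
            url = tweet.get("url")
            # Only tweets with an external URL get a body text and a similarity score.
            body = fetch_page_text(url) if url else None
            score = jaccard_similarity(tweet["text"], body) if body else ""
            writer.writerow([profile, tweet["text"], url or "", body or "", score])
            rows += 1
    return rows
```

Passing the two fetchers as arguments keeps the sketch independent of any particular API client and makes it easy to exercise with canned data.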


**Table 3.** Content-based features.


#### *3.2. Data Annotation*

Once we obtained the final dataset using the implemented crawler, we prepared a baseline dataset from it. Three media professionals volunteered to judge 12,321 tweets and their associated news, labelling every shared tweet as clickbait or legitimate. They were asked to access the external webpage by following the URL provided with the tweet and to compare the tweet's body and headline with the full text on the destination webpage. To facilitate this job, we provided them with examples showing what clickbait news looks like. Table 4 shows a guideline for how to classify the content of the shared tweets.


**Table 4.** Example of clickbait news.

As shown in Table 4, there are seven categories that the volunteers could use to label a post as clickbait news. In case of doubt about which class a post belongs to, the post is labelled as "incomplete". Every content text in the baseline dataset has three labels, one provided by each annotator. To assign the final class label, we applied majority voting and labelled the content as clickbait or legitimate news. The news items labelled as incomplete were later removed from the dataset. Table 5 shows the details of the remaining baseline dataset, which contains 11,068 items: 4325 items of clickbait news and 6743 legitimate items.
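The majority-voting step can be illustrated with a minimal Python sketch. The function names and the label strings are illustrative assumptions, and the handling of a three-way disagreement (treated here as no majority, so the item is dropped) is also an assumption, since that case is not described above.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def build_baseline(annotations: Dict[int, Sequence[str]]) -> Dict[int, str]:
    """Keep only items whose majority label is 'clickbait' or 'legitimate'."""
    baseline = {}
    for item_id, votes in annotations.items():
        label = majority_label(votes)
        # Items voted 'incomplete', or without a majority, are removed.
        if label in ("clickbait", "legitimate"):
            baseline[item_id] = label
    return baseline
```

With three annotators and a strict majority rule, an item needs at least two matching votes to receive a final label.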

**Table 5.** Details of baseline dataset.


As the size of our final baseline dataset was quite small (17% of the original dataset), we decided to apply a pseudo-labelling learning (PLL) technique to enhance the performance of the ML model. PLL is an efficient semi-supervised technique that makes use of unlabeled data while training ML models. As shown in Figure 1, the ML model is first trained on the labeled data (in this case, the baseline dataset). The model then predicts the labels of the unlabeled data. The predicted pseudo-labels are assigned as target classes for the unlabeled data and combined with the original baseline dataset (labeled data). Finally, the resulting dataset is used to train the proposed ML models. After applying the PLL technique, the size of the labeled dataset increased to 54,893 instances. Table 6 shows the details of the final dataset after applying the PLL technique on 71.54% of the remaining unlabeled data.
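The train, predict, and retrain cycle of pseudo-labelling can be sketched as follows. This is a toy illustration rather than the study's implementation: a nearest-centroid classifier is swapped in as a simple stand-in for the ML models, all function names are hypothetical, and every unlabeled row is pseudo-labelled, whereas the study applied PLL to only part of the unlabeled data using a selection rule that is not detailed here.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, List[float]]:
    """Toy stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, List[float]], X: List[Vector]) -> List[int]:
    """Assign each row to the class with the nearest centroid (squared distance)."""
    def dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in X]


def pseudo_label(
    X_lab: List[Vector], y_lab: List[int], X_unlab: List[Vector]
) -> Tuple[Dict[int, List[float]], List[Vector], List[int]]:
    """Train on labeled data, pseudo-label the rest, then retrain on the union."""
    model = fit_centroids(X_lab, y_lab)
    pseudo = predict(model, X_unlab)
    X_all = list(X_lab) + list(X_unlab)
    y_all = list(y_lab) + pseudo
    return fit_centroids(X_all, y_all), X_all, y_all
```

In practice, a confidence threshold on the predictions is often used to decide which pseudo-labels to keep; that refinement is omitted here for brevity.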

**Table 6.** Final dataset after applying PLL technique.


#### *3.3. Pre-Processing and Numeric Representation*

Among the UFs and CFs described above in Tables 2 and 3, the "headline" (*CF*3), "tweet text" (*CF*4), and "body text" (*CF*5) features from the CFs required additional treatment.

#### 3.3.1. Pre-Processing

For many text classification systems, pre-processing is considered as an essential step to improve the quality of data as well as the efficiency and accuracy of ML models [22,23]. The common pre-processing steps include text cleansing, tokenization, removing stop words, stemming, and normalization. Since the obtained data is pulled out from Twitter and by accessing the external web pages following the URL links associated with the body of the tweets, additional pre-processing steps were performed, such as deletion of unnecessary, insignificant items from texts (e.g., digits, punctuation marks, URLs, special characters, non-Arabic characters, diacritics), and removal of emojis and hashtags.
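A minimal sketch of the cleansing and tokenization steps is given below, using only Python's standard `re` module. The regular expressions, the Unicode ranges chosen for Arabic letters and diacritics, and the decision to drop whole hashtag tokens are assumptions made for illustration. Stemming and normalization are not covered.

```python
import re
from typing import FrozenSet, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\S+")
# Tashkeel marks (U+064B to U+0652) and the tatweel elongation character (U+0640).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither a basic Arabic letter nor whitespace
# (digits, punctuation, emojis, Latin characters, and so on).
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")


def clean_text(text: str) -> str:
    """Remove URLs, hashtags, diacritics, and non-Arabic characters."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = NON_ARABIC_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def tokenize(text: str, stop_words: FrozenSet[str] = frozenset()) -> List[str]:
    """Clean the text, split on whitespace, and drop stop words."""
    return [tok for tok in clean_text(text).split() if tok not in stop_words]
```

The diacritics are stripped before the non-Arabic filter so that letters carrying marks stay joined instead of being split by inserted spaces.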

#### 3.3.2. Numeric Representation

By numeric representation, we mean converting the textual content into a form that can be fed into the ML model. In this work, the term frequency-inverse document frequency (TF-IDF) is used as the numeric representation. Mathematically, the TF-IDF can be calculated as in Equations (1)–(3):

$$tf\_idf_{t,D} = TF_{t,D} \times IDF_{t} \tag{1}$$

where

$$TF_{t,D} = \frac{\text{Number of occurrences of term } t \text{ in document } D}{\text{Number of terms in document } D} \tag{2}$$

and

$$IDF_{t} = \log \frac{\text{Number of documents}}{\text{Number of documents containing the term } t} \tag{3}$$

After applying the TF-IDF technique on the final dataset, the training time of ML models was long because of high dimensionality, where the number of extracted features reached 10,230.
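Equations (1)–(3) can be computed directly, as in the following sketch. The natural logarithm is an assumption, since the base of the logarithm is not specified, and the function name `tf_idf` is illustrative. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed variant by default, so their values will differ from this plain formula.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Number of documents containing each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n_docs / doc_freq[t]) for t in doc_freq}      # Eq. (3)
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * idf[t]                   # Eq. (2) x Eq. (3)
                        for t, c in counts.items()})
    return weights

docs = [["a", "b", "a"], ["b", "c"]]
weights = tf_idf(docs)
```

A term that occurs in every document (here `"b"`) receives a weight of zero, which is why such terms contribute nothing to the representation.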

#### *3.4. Feature Selection*

Feature selection (FS) is an effective way to reduce large data [23]. Its main purpose is to remove irrelevant and noisy data. It also enables a representative subset of all data to be chosen, which minimizes the complexity of the classification process. Several FS techniques can be found in the literature, including Mutual Information (MI), Information Gain (IG), improved Chi-square, and the one-way ANOVA F-test [24] (referred to hereafter as FV-ANOVA). This paper proposes FV-ANOVA as the feature selection method, statistically selecting the important features according to their F-values. The features are ranked by F-value so that the most relevant features appear at the top. Finding the best cut-point value is a challenge. Thus, we divided the features into a set of groups based on a given percentile (*p*%) of the original number of features. This step allows us to find the top-scoring features. Later, only the *p*% top-scoring features were used to train the ML classifiers. The process of selecting features with FV-ANOVA is presented in Algorithm 2.

**Algorithm 2** Pseudocode for feature selection based on the FV-ANOVA method.

**Input**: *D*: dataset; *V*: features extracted as numeric representation by TF-IDF; *C*: class label; *p*%: percentile.

**Output**: *DFS*: subset of top-scoring features based on the given *p*%.

*k* ← number of classes in *D*

*N* ← number of samples in *D*

**For each** pair *f<sub>j</sub>* ∈ (*V*, *C*) **do**:

- Count the number of samples per class
- Compute the mean, standard deviation, and standard error of *f<sub>j</sub>* with respect to each class *C<sub>i</sub>*
- Compute the degrees of freedom between and within classes (*k* − 1, *N* − *k*)
- Compute the sums of squares between and within classes (*SSB*, *SSW*)
- Find the mean square between groups as *MSB* = *SSB* / (*k* − 1)
- Find the mean square within groups as *MSW* = *SSW* / (*N* − *k*)
- *F*\_*value*(*f<sub>j</sub>*) ← *MSB* / *MSW*

**End for**

Sort *F*\_*value* in descending order

*DFS* ← Select the top-scoring features based on *p*%

**Return** *DFS*
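Algorithm 2 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the function names are hypothetical, features are ranked from the highest F-value down, and a constant feature is given an F-value of zero. scikit-learn provides `f_classif` and `SelectPercentile` for the same purpose; whether they were used here is not stated.

```python
def f_value(column, labels):
    """One-way ANOVA F-value of one feature column against the class labels."""
    groups = {}
    for x, c in zip(column, labels):
        groups.setdefault(c, []).append(x)
    n, k = len(column), len(groups)
    grand_mean = sum(column) / n
    # Sum of squares between classes and within classes
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    if msw == 0:
        # Assumption: constant feature scores 0, perfectly separating one scores inf
        return float("inf") if msb > 0 else 0.0
    return msb / msw

def select_top_percentile(X, y, p):
    """Return the indices of the top p% features by F-value, plus all scores."""
    n_features = len(X[0])
    scores = [f_value([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = max(1, round(n_features * p / 100))
    return sorted(ranked[:keep]), scores

X = [[1, 5, 0], [2, 5, 1], [8, 5, 0], [9, 5, 1]]
y = [0, 0, 1, 1]
selected, scores = select_top_percentile(X, y, p=34)
```

In this toy example only the first feature separates the two classes, so it is the one retained.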

#### *3.5. Machine Learning Classifiers*

Six ML classifiers were implemented: Random Forest (RF), Logistic Regression with Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Naïve Bayes (NB), and k-Nearest Neighbor (k-NN). To explore the effectiveness of the proposed feature selection method, we conducted different experiments and employed these classifiers on different subsets of features based on F-values.

For tuning hyper-parameters of the used ML classifiers, the grid search algorithm with k-fold cross-validation is used. Subsequently, the values of hyper-parameters that yield the highest performance measure are set to be the final values of hyper-parameters for each classifier. The set of values of hyper-parameters used in this work is presented in Table 7.
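The tuning procedure can be sketched generically as below. The `fit` and `score` callables, the contiguous (unshuffled) folds, and the default of five folds are assumptions made for illustration. scikit-learn's `GridSearchCV` offers the same functionality for its estimators.

```python
from itertools import product

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(fit, score, X, y, grid, k=5):
    """Return (best_params, best_mean_score) over every combination in grid."""
    folds = k_fold_indices(len(X), k)
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train_idx],
                        [y[i] for i in train_idx], **params)
            fold_scores.append(score(model,
                                     [X[i] for i in test_idx],
                                     [y[i] for i in test_idx]))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Here `fit(X_train, y_train, **params)` is expected to return a trained model and `score(model, X_test, y_test)` a number where higher is better, for example accuracy.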

**Table 7.** List of optimized hyper-parameters of each classifier.


#### *3.6. Model Evaluation*

To evaluate the performance of classifiers, we computed the accuracy (*Acc*), recall (*R*), precision (*P*), and f1-score (*F*1) metric of each classifier with those features that were selected by the proposed F-values of the one-way ANOVA test. The descriptions of these metrics are shown in Equations (4)–(7) respectively.

$$Acc. = \frac{TP + TN}{D} \tag{4}$$

where (*TP* + *TN*) is the number of correctly predicted items, whether clickbait or not, and *D* is the total number of samples in the dataset.

$$P = \frac{TP}{TP + FP} \tag{5}$$

where (*TP* + *FP*) is the total number of predicted clickbait content.

$$R = \frac{TP}{TP + FN} \tag{6}$$

where (*TP* + *FN*) is the total number of actual clickbait content.

$$F1 = 2 \times \frac{P \times R}{P + R} \tag{7}$$
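Equations (4)–(7) translate directly into code. The sketch below treats "clickbait" as the positive class and returns zero when a denominator is empty; both choices, like the function name, are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive="clickbait"):
    """Compute accuracy, precision, recall, and F1 as in Equations (4)-(7)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = (tp + tn) / len(pairs)                               # Eq. (4)
    precision = tp / (tp + fp) if tp + fp else 0.0             # Eq. (5)
    recall = tp / (tp + fn) if tp + fn else 0.0                # Eq. (6)
    f1 = (2 * precision * recall / (precision + recall)       # Eq. (7)
          if precision + recall else 0.0)
    return {"accuracy": acc, "precision": precision, "recall": recall, "f1": f1}

y_true = ["clickbait", "clickbait", "clickbait", "legitimate", "legitimate"]
y_pred = ["clickbait", "clickbait", "legitimate", "clickbait", "legitimate"]
metrics = classification_metrics(y_true, y_pred)
```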

#### **4. Experimental Design**

The experiments in this study were performed with Python 3.8 on the Windows 10 operating system. We used several Python packages, including sklearn 0.22.2 for implementing the classifiers, nltk 3.6.2 for pre-processing Arabic text, and Beautiful Soup 4.9.0 for scraping data from external web pages. The user-based features and content-based features were first fed into the classifiers separately. Later, we merged both types and measured the performance of the ML classifiers using the *p*% top-scoring features selected according to the F-values of the one-way ANOVA. To ensure a fair comparison between classifiers, the same pre-processing steps and the same set of features were used for each classifier. In addition, we considered four experimental scenarios per feature type, as illustrated in Table 8.

**Table 8.** Number of features per each experiment.


#### **5. Results and Findings**

This section describes and discusses the results for each experiment shown in Table 8. First, we present the findings that were obtained when only the user-based features were used. The accuracy of each classifier is presented in Table 9. The second type of features, content-based features, were then investigated, as shown in Table 10. Finally, we combined both types of features and the performance of classifiers is presented in Table 11.


**Table 9.** Accuracy of different experiments with user-based features only.

**Table 10.** Accuracy of different experiments with content-based features only.


**Table 11.** Accuracy of different experiments with combination of UFs and CFs.


Based on the results presented in Tables 9–11, the findings can be summarized as follows:


#### **6. Conclusions**

This paper has proposed a comprehensive approach that includes three main phases: data collection, data preparation, and machine learning modeling. After collecting the dataset, which is considered the first Arabic clickbait headline news dataset, the pre-processing tasks were performed, which included text cleansing, normalization, stemming, stop-word removal, and tokenization. The features of the processed text (content-based features) were then combined with the user-based features, and feature selection was applied using the one-way ANOVA test. Finally, the ML models were applied, which included Random Forest (RF), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Naïve Bayes (NB), and k-Nearest Neighbor (k-NN). Hyper-parameter tuning methods were applied to find the optimal settings. The experimental results showed a marked improvement when the CFs were used and also when a combination of UFs and CFs was used. The accuracy reached 92.12% using 10% of the top-scoring features, which is better than that reported in many previous studies (discussed in the related works). This improvement is particularly interesting, as we are dealing with Arabic content. Future work will investigate the application of several deep learning methods on this Arabic dataset in order to enhance the detection performance. Moreover, collecting more Arabic content to add to the dataset would benefit the analysis.

**Author Contributions:** Conceptualization, M.A.-S., F.S., T.A.-H.; methodology, M.A.-S., F.S.; software, M.A.-S.; validation, T.A.-H., A.A.; formal analysis, F.S., M.H.; investigation, M.T.A., T.S.A.; resources, Z.G.A.-M., B.A.M., M.H., A.A., T.S.A.; data curation, B.A.M.; writing—original draft preparation, M.A.-S., F.S.; writing—review and editing, M.A.-S., F.S.; visualization, F.S.; supervision, M.A.-S., F.S.; project administration, M.A.-S., Z.G.A.-M.; funding acquisition, Z.G.A.-M., B.A.M., M.T.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research has been funded by the Scientific Research Deanship at the University of Ha'il, Saudi Arabia, through project number RG-20 023.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The dataset can be obtained from https://github.com/Moh-Sarem/Clickbait-Headlines#clickbait-headlines (accessed on 1 October 2021).

**Acknowledgments:** We would like to acknowledge the Scientific Research Deanship at the University of Ha'il, Saudi Arabia, for funding this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **An Investigation of the Policies and Crucial Sectors of Smart Cities Based on IoT Application**

**Armin Razmjoo 1,\*, Amirhossein Gandomi 2,\*, Maral Mahlooji 3, Davide Astiaso Garcia 4, Seyedali Mirjalili 5,6, Alireza Rezvani 7, Sahar Ahmadzadeh 8 and Saim Memon 9,10**

	- Via Gramsci 53, 00197 Rome, Italy; davide.astiasogarcia@uniroma1.it
	- Engineering, London South Bank University, 103 Borough Road, London SE1 0AA, UK

**Abstract:** As smart cities (SCs) emerge, the Internet of Things (IoT) is able to simplify more sophisticated and ubiquitous applications employed within these cities. In this regard, we investigate seven predominant sectors that have a great effect on SC development: the environment, public transport, utilities, street lighting, waste management, public safety, and smart parking. Our findings show that, for the environment sector, IoT-driven sensors support cleaner air and water systems by detecting the amount of CO2, sulfur oxides, and nitrogen to monitor air quality and by detecting water leakage and pH levels. For public transport, IoT systems help with traffic management and prevent train delays. For the utilities sector, IoT systems are used to reduce overall bills and related costs and to manage electricity consumption. For the street-lighting sector, IoT systems are used for better control of streetlamps and for saving energy associated with urban street lighting. For waste management, IoT systems are effective for waste collection and for gathering data regarding the level of waste in the container. In addition, for public safety, these systems are important for preventing vehicle theft and smartphone loss. Finally, IoT systems are effective in reducing congestion in cities and helping drivers to find vacant parking spots using intelligent smart parking.

**Keywords:** smart cities; Internet of Things (IoT); strategy; monitoring

**Citation:** Razmjoo, A.; Gandomi, A.; Mahlooji, M.; Astiaso Garcia, D.; Mirjalili, S.; Rezvani, A.; Ahmadzadeh, S.; Memon, S. An Investigation of the Policies and Crucial Sectors of Smart Cities Based on IoT Application. *Appl. Sci.* **2022**, *12*, 2672. https://doi.org/10.3390/app12052672

Academic Editors: Dimitrios S. Paraforos, Anselme Muzirafuti, Giovanni Randazzo and Stefania Lanza

Received: 13 January 2022 Accepted: 1 March 2022 Published: 4 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

While old, crowded cities are under pressure from many issues such as population explosion and improper infrastructure, the rise of smart cities (SCs) has provided a good solution for solving many of the existing problems and overcoming urban challenges [1]. Therefore, the rapid development of SCs in recent years reflects their importance [2]. In fact, SCs with greater opportunities for citizens and proper services are becoming an
attractive choice for people and communities, and they can be a place for fostering success in health and businesses across the world with the help of smart infrastructure [3]. On the other hand, achieving SC development requires detailed and specific planning and the proper implementation and establishment of policies. Thus, identifying obstacles can grant us a deeper understanding of how to determine the best solutions with less difficulty [4]. Since, in developing SCs, we are faced with numerous barriers and problems in various areas, such as roadways, environment, utilities, parking, public safety, waste management, and public transport, it is pertinent to enhance these sectors through accurate investigation and practical actions [5]. To overcome such barriers, SC governments must implement appropriate strategies and present proper solutions to mitigate or eliminate these barriers [6]. In this regard, the role of the Internet of Things (IoT) and the utilization of these systems is an essential and beneficial strategy to appropriately develop SCs [7]. In fact, with the appearance of new technologies such as the IoT, the concept of SCs has changed and continues to evolve for the better, subsequently improving and accelerating urban management across various sectors [8]. This means that the utilization of the IoT leads to the development of smart cities [9]. In recent years, many types of research have been conducted on SC development [10]. For instance, the importance of the IoT for SC development was also reviewed by Badis Hammi et al., who demonstrated that a higher level of interaction between SCs and IoT development is essential, as it can integrate electronic devices. However, the safety risks and privacy issues of participating individuals, companies, and organizations should be considered carefully in such cases [11]. 
Ejaz et al. investigated efficient energy management for the IoT in the context of SCs and observed that such management is a key paradigm for monitoring complex energy systems. They also showed that efficient energy management can support wireless energy transfer for IoT devices and energy-efficient planning in smart homes [12]. Tanveer et al. investigated the growth of IoT markets across the energy systems of SCs. Given the importance of smart grid technology innovations in supporting smart energy systems in SCs, the study showed that investment in these systems has increased in recent years. According to the literature, the global market for the IoT in energy exceeded USD 6.8 billion in 2015 and is anticipated to reach USD 26.5 billion by 2023, which corresponds to an annual growth rate of 15.5% between 2016 and 2023 [13]. Bresciani et al. investigated how organizations innovate with the IoT and implement it in everyday business activities. The results from the 43 IoT SC project alliances across Italian cities they investigated demonstrated that multinational enterprises are building alliances to explore new technologies for cities and to exploit new IoT-based devices for economic profit. The study showed that, for companies to achieve the desired results, they must integrate different types of knowledge to ensure efficient management and effective support [14]. Evertzen et al. analyzed the effects of smart governance on the quality of life in SCs in the three well-known cities of Palo Alto, Nice, and Stockholm. This research emphasized the importance of innovative approaches across SCs, which should be implemented based on the IoT, and consequently, many services should be promptly digitalized. Therefore, in order to achieve these goals and successfully implement an SC model, strong leadership, citizen involvement, and business collaboration are required [15]. 
With regard to the importance of transportation systems in SCs, AlZubi et al. examined the prospect of handling considerable information using sensor data from the environment for better monitoring of transport systems in SCs. As the time data extracted from sensors is important, the researchers presented a responder-dependent add-on information fusion scheme concerning sensor data. This guided vehicle scheme can observe the responding sensor information in order to determine the success of the goal endorsed. This scheme, which is based on classification machine learning, can help us identify and subsequently reduce the errors caused by sensor information [16]. In light of the importance of the IoT in the development of smart cities, this article examines the problems and solutions of seven key sectors that have a significant impact on SC development, including the environment, public transportation, utilities, street lighting, waste management, public safety, and smart parking. We also considered certain important cities in Europe (Paris, London, Copenhagen, Barcelona, Amsterdam, and Oslo) and in the United States (Boston, New York, and San Francisco) based on the relevance of the IoT.

#### **2. Motivation and Objective of the Critical Review**

Creating and developing SCs is an important objective for many countries [17] seeking to enhance the quality of life of their populations through the optimal management of their resources [18]. In addition, SC development supports global mitigation strategies, especially across the environmental and energy sectors [19]. One of the most important factors for SC development is the IoT [20], which integrates different systems related to energy, transport, and waste and water management within SCs in order to enhance the inhabitants' quality of life [21]. Given that most of the global population resides in urban areas, cities account for the majority of global energy consumption and greenhouse gas (GHG) emissions [22]. Thus, a reduction in energy use and the maximization of renewable energy use, when available, can support these objectives. The use of the IoT in SCs provides an opportunity to make incremental changes in efficiency by harnessing new technologies and automating processes in applications [23]. It is important to recognize that the innovation, advancement, and implementation of the IoT across SCs have a dynamic impact on many other intertwined systems, including the environment, economy, and transportation. Therefore, it is crucial to develop an in-depth understanding of these interdependencies to ensure that negative impacts are not overlooked and that positive impacts are enhanced and used as an incentive for change across cities. The aims of this study include investigating the concept of SCs, identifying the IoT barriers across seven important sectors, and compiling appropriate solutions to tackle each barrier.

#### **3. Methodology**

To identify the potential barriers to IoT development in SCs, and given the importance of the IoT, we conducted an exhaustive review of more than 400 relevant publications related to the IoT, searching established scientific databases such as Google Scholar, Scopus, and Web of Science, as well as journal sites (Taylor & Francis, Elsevier, MDPI, Springer, Wiley, etc.).

In this regard, we searched for terms such as smart cities, environment, road traffic, public transport, utilities, smart lighting, public safety, waste management, street lighting, and smart parking. In the first step, between 2019 and 2020, we investigated more than 200 review papers to understand the concepts of the IoT and smart cities. Then, in 2021, we investigated more than 200 technical papers and eventually selected 121 papers. After these steps, we categorized the most important papers, which helped us to start writing this paper, and selected the methodology. Review articles helped us understand SC development and the IoT technologies that have come under the spotlight within a short period of time. Moreover, technical articles established a deeper understanding of effective policies in SC development relative to the IoT in order to obtain proper solutions to the barriers. Figure 1 shows the flowchart for the methodology of this study. After all the relevant papers were collected, the articles were categorized into two groups: review papers and technical papers. We based the methodology on the best of these. In the last step, we determined recommended actions and policies to achieve the goal of the paper.

**Figure 1.** The methodology flowchart.

#### **4. Results and Discussion**

*4.1. Recognizing the Existing Obstacles in the Development of SCs*

As we are faced with various barriers and problems across seven specific sectors in SCs, i.e., environment, public transport, utilities, street lighting, waste management, public safety, and parking, we believe that the utilization and implementation of the IoT will be effective in mitigating or resolving the problems associated with these areas. In the following sections, we comprehensively discuss these problems and the solutions that we obtained from the review articles and scientific research.

#### 4.1.1. Environment

Cleaner air and water systems are crucial elements of the environment [24]; to achieve them, a network of sensors should be used to monitor air [25] and water quality [26]. Specifically, sensors can be used to detect the amount of CO2, sulfur oxides, and nitrogen to monitor air quality and to detect water leakage, pH levels, and changes in the chemical composition of water. To this end, sensors can be installed along busy roads, around plants, and near houses, offices, and organizations [27]. Moreover, it is necessary to utilize sensors for detection and monitoring and to obtain data and results [28]. According to the McKinsey Global Institute, emissions can be reduced by 10–15% through applications that focus on building automation, mobility, and dynamic electricity pricing. Thus, SCs can support and contribute to a cleaner and more sustainable environment [29]. Nowadays, sensors in general, and environmental sensors in particular, have significantly affected people's lives: individual environmental sensors obtain data about the environment and then transform those data into electrical signals that feed the higher-level systems around them. The advantages of these sensors are lower cost, smaller size, and reliability [30].

#### 4.1.2. Public Transport

The safety and efficiency of citizens' travel is a crucial consideration in SCs, especially on roads [31]. Therefore, municipalities are attempting to implement smart traffic using IoT development solutions [32]. In this regard, the IoT will play a crucial role in traffic management. For instance, data from various types of sensors and GPS systems are sent by drivers' smartphones in order to determine the speed, number, and locations of vehicles on a particular road. Subsequently, smart traffic lights are immediately connected to a cloud-management platform and provide timing information to automatically and accurately monitor green lights, thereby preventing traffic congestion. Additionally, these methods can predict future traffic and offer prevention plans, with which the transport administration department is able to detect potentially dangerous situations in time and take the required actions to prevent traffic congestion [33]. Therefore, considering the obvious importance of transportation systems in SCs, specific and accurate planning to control these systems is necessary [34]. According to [35], IoT-driven technological development in transport will bring about a major revolution between 2020 and 2030, which will have a direct impact on toll operators and highways and provide safe and secure networks. In addition, traffic data from multiple sources, such as traffic information and ticket sales, can be used to perform sophisticated analyses and achieve better results, and train operators can maximize the capacity of tracks and easily prevent train delays [36]. Fortunately, many countries around the world, especially developing countries, are now trying to make use of new systems connected to the IoT for controlling their transportation systems [37].

#### 4.1.3. Utilities

IoT-equipped SCs give citizens more control over their home utilities, reducing overall bills and related costs [38]. By utilizing IoT technologies and effective approaches, such as smart meters for billing, monitoring consumption patterns, and remote monitoring, municipalities can achieve cost-effective connectivity to utility companies' IT systems. This helps customers consume energy and water based on improved monitoring and, therefore, provides better management services to citizens [39]. According to an Australian study, precooling optimization using IoT system data, while preserving the thermal comfort of the inhabitants, can lower the expenses and energy consumption (electricity costs) of cooling a building by up to 30% [40]. Other research shows that, in Arabian Gulf countries, a smart energy management system using the IoT can reduce costs, especially for air conditioning, which accounts for up to 60% of electricity consumption, while still meeting energy demand [41]. In addition, the use of the IoT in utilities improves efficiency (e.g., in the management of large-scale solar photovoltaic systems) [42] and the conservation of resources [43].

#### 4.1.4. Street Lighting

In SCs, the maintenance and control of streetlamps can be more cost-effective and straightforward through the use of the IoT [44]. In particular, IoT systems can be paired with sensors that connect to a cloud-management solution [45], providing reliable monitoring of illuminated transport paths such as streets and of the movement of people and vehicles. Measuring the environmental conditions can also allow for a more accurate analysis regarding the need to improve the lighting schedule and indicate whether lights should be brighter or dimmer [46]. Moreover, IoT systems have a remarkable effect on the energy savings associated with urban street lighting: using warmer lights and increasing light uniformity can result in a 30–50% energy saving on street lighting, and for medium-sized cities with populations of around 200,000–400,000 residents, energy savings on street lighting can reach 8–23 MWh per annum [47].

#### 4.1.5. Waste Management

Waste collection is one of the most important sectors of SCs [48], and the IoT can reduce many of the problems in this sector [49]. To achieve this, a sensor is placed on each waste container to gather data regarding the level of waste in the container; once the container is full, a notification is sent to truck drivers via a mobile app. By following this plan, truck drivers spend time and energy emptying only full containers rather than half-full ones [50]. A study in China of recycling and household waste segregation between 2018 and 2019 showed that integration of the IoT was effective in household waste management. During the study, collections of recyclable waste and biodegradable food waste rose by 431.8% and 88.8%, respectively, which had good environmental effects; in Shanghai, this macro policy increased recyclable waste collection by 431.8% [51].
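The fill-level notification logic described above can be sketched in a few lines. The `Container` record, the 0.9 threshold, and the function name are hypothetical choices made for illustration; they do not come from any of the reviewed systems.

```python
from dataclasses import dataclass

@dataclass
class Container:
    container_id: str
    fill_level: float  # fraction full (0.0 to 1.0), as reported by the sensor

def containers_to_collect(containers, threshold=0.9):
    """Return the IDs of containers full enough to notify truck drivers about."""
    return [c.container_id for c in containers if c.fill_level >= threshold]

# Hypothetical sensor readings
readings = [
    Container("bin-01", 0.95),
    Container("bin-02", 0.50),
    Container("bin-03", 1.00),
]
route = containers_to_collect(readings)
```

Only the containers at or above the threshold appear on the collection route, so half-full containers are skipped.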

#### 4.1.6. Public Safety

Motor vehicle theft throughout the world, coupled with the massive financial losses it causes, is a disaster for insurance companies. For instance, in the USA alone, about USD 6.4 billion was lost to motor vehicle theft in 2019 [52]. Likewise, 70 million smartphones are lost or stolen worldwide every year [53]. In this regard, IoT-based SC technologies are vital for offering real-time monitoring, enhancing public safety, and supporting proper decision-making, which will prevent a great deal of harm to people [54]. For example, testing of the motorcycle antitheft system (MATS) showed that the system was 100% accurate at speeds of up to 70 km/h and 94.4% accurate at speeds of up to 80 km/h [55].

#### 4.1.7. Smart Parking

Parking occupies a large share of the area of a city: 81% of the city area in Los Angeles, 23% in Munich, 23% in Paris, 19% in Copenhagen, and 18% in Zurich and Hamburg. Therefore, cities must use intelligent parking systems in order to reduce congestion and help drivers [56]. In this regard, a special IoT-based mobile application has been built to solve vehicle parking problems, and this has had a remarkable effect for drivers. According to research, downloads of the mobile application increased from 17 million to 80 million between 2013 and 2018, which shows the benefit of this application in solving parking-related problems [57]. Therefore, the importance of smart parking in SCs should be investigated by policymakers [58], considering that finding parking spots can improve the welfare of citizens [59]. This can be achieved by utilizing GPS data from drivers' smartphones and road-surface sensors embedded in the ground of parking spots. As a result, drivers can be notified of occupied and vacant parking spots via a real-time parking map [60].

#### *4.2. Strategic Policies for Boosting Economic Recovery of Smart Cities through the IoT*

The IoT technology inherent in smart cities promises effective options that will allow cities to be safer and more inclusive and resilient [61]. In this regard, the IoT helps cities to improve governance and privacy, which are important for the socio-economic dimensions of urban areas [62]. In addition, the advance of 5G technologies [63] and artificial neural networks (ANNs) will prompt further innovations in the IoT technologies of smart cities [64]. In fact, cloud-based IoT applications that contain information gathered from citizens could help smart cities to monitor and manage crime detection, proficiency, water supply systems, healthcare facilities, electric utilities, digital libraries, transportation networks, resource management, waste management, and security mechanisms [65]. Therefore, smart technologies such as the IoT are significant when developing SCs, while maintaining emphasis on the implemented strategies and policies [66]. It is clear that the implementation of targets related to SCs requires strong and calculated strategies and
policies [67]. In fact, achieving "smartness" is not a one-time action; it is a continuous process. Therefore, policymakers should aim to devise a plan [68] that considers the individual goals of each sector whilst also evaluating the dynamic and indirect impacts on other areas within an SC. Undoubtedly, to advance SCs and continue their expansion, officials and policymakers must vigorously strive to create a unique quality of life, work, and environment for the citizens of their cities [69]. On the other hand, since the concept of SCs falls in line with the smart grid, economic issues related to the programs that are used for demand-response management (DRM) and real-time pricing should be taken into consideration [70]. In addition, as SCs aim to improve the quality of life of urban citizens, their success depends on participation by private companies [71]. Therefore, through the use of new communication channels between the government and its citizens, policymakers should focus on the essential needs of stakeholders, such as affordable energy, urban security, and energy security [69], because public participation helps improve quality of life and establish trust between local governments and people [72]. This means that investment in developing SCs has advantages for both people and the community, including a reduction in the cost of living, improvement of living standards and environmental sustainability, improvement of operational efficiency, improvement of eco-friendly infrastructure, and development of smart technology through the IoT [73]. Moreover, investment by private companies can help governments overcome old issues pertaining to big cities or developing cities that have not been well planned [74]. In general, investment in IoT technologies is opening new possibilities for cities and helping them to become smart cities [75]. 
According to these cases, effective strategies and policies can accelerate the conversion of a standard city into an SC and, thus, should aim to attract investment, improve IT infrastructure, integrate connected local energy storage systems in order to support better renewable energy sources on the power grids, and adapt an IoT implementation strategy based on the city's size to reduce costs, support the utilization of smart LED streetlights in major metropolises, increase the collaboration between local governments and stakeholders, increase the utilization of new technologies such as sensors, change the mentality of the citizens, and redefine the governance model with proper politics. Based on the comprehensive explanations presented above, the most important barriers and the most appropriate solutions related to IoT-based SC development are presented in Table 1.


**Table 1.** The most important barriers related to SC development based on the IoT and the appropriate solutions.


To complete this table, other investigations can be added. For example, one of the greatest challenges at present is the low or inadequate quality of life in many areas of the world. This means that in many areas, energy use is not based on world standards and there is less use of modern technologies to manage it. This affects the quality of life of citizens, and, in particular, the economy of households [104,105]. Without a doubt, the collection of waste in crowded areas, especially cities, is important for citizens because it prevents illness. Utilization of the IoT is very important in providing more efficient waste management and overcoming other problems in this area [106]. Healthcare is one of the most important challenges for governments because poor health of citizens can have negative effects on people. In this regard, IoT systems can improve net health and increase people's health knowledge [107]. In addition, using e-health services, for instance in a global pandemic such as COVID-19, for data collection by citizens, for giving health advice through the Internet, and for increasing the health of medical staff is fruitful [108]. On the other hand, as mentioned previously, transportation systems are one of the most important sectors in many countries because transport has a large effect on the environment and the movement of people. Therefore, today, the emergence of IoT systems inside cars and the conversion to smart cars (vehicles) helps the environment and can also move people easily without loss of time [109,110]. In addition, in order to reduce traffic and help the environment, greater utilization of bicycles and an increase in bike-sharing services have been implemented through IoT systems [111].
Moreover, to improve the electrical energy saving of cars, increasing the lifetime of battery-operated devices (by up to a couple of years) by using IoT systems is possible because IoT systems are able, during inactive periods, to keep the device in a low-power state [112]. The issue of the elderly and their care is also important in many countries. Fortunately, however, IoT systems have provided assistance applications through the use of a single wearable device in both outdoor and indoor locations. These systems are able to recognize changes in the behavior of elderly people, are low-cost and unobtrusive, have low power consumption, and can easily prevent problems [113].

Table 2 shows a comparison between results of this work and a number of works in the literature. As can be seen, most previous work has investigated limited subjects related to the IoT and smart cities, while this work comprehensively investigated these subjects.


**Table 2.** A comparison of the results of this work and those of previous work.

#### **5. Conclusions**

A future with less CO2 [121] and reliance on renewable energy [122] as the main fuels are among the most important goals of scientists and researchers. In this regard, the role of the IoT in controlling CO2 emissions and managing energy consumption is important. This work investigated the problems related to seven important sectors of the IoT, namely environment, public transport, utilities, smart parking, public safety, waste management, and smart lighting. Each sector was analyzed carefully to identify the challenges to be mitigated or removed such that building SCs would be possible. For instance, in the environment sector, the utilization of air and water sensors allows us to monitor air and water quality and detect the amount of CO2, sulfur oxides, and nitrogen, water leakage, and changes in the pH level and chemical composition of water over time, as well as other factors that have potentially detrimental effects. In terms of road traffic, the determination of the speed, number, and locations of vehicles and monitoring of green-light timings can be achieved through the use of various types of sensors and GPS data collected from drivers' smartphones. Across the public-transport sector, IoT sensors can help enhance our travel experiences and achieve a higher level of safety and punctuality. In utility monitoring, the IoT allows users to control their home utilities for billing, consumption patterns, and remote monitoring. In particular, via cost-effective connectivity to utility companies' IT systems, customers can adjust their energy and water consumption more economically. For street lighting, utilizing both IoT systems and sensors connected to a cloud-management solution can ensure the confident monitoring of illuminance for the safe movement of people and vehicles.
In terms of environmental effects, we can improve the lighting schedule and determine which areas require different intensities of light (some streets may only need a dim light, so less electricity would be used). In the waste-management sector, the use of IoT technologies can lead to the optimization of waste-collection schedules by tracking the waste levels, providing route optimization, and ensuring useful operational analytics. In this regard, each waste container would be fitted with a sensor that gathers data on the level of waste in the container. Then, a notification of filled containers would be sent to truck drivers via a mobile app. This is a strategic plan to avoid emptying half-full containers, resulting in less travel by trucks and reducing GHG emissions. In the public-safety sector, IoT-based SC technologies have a crucial role in offering real-time monitoring, enhancing public safety, and developing decision-making tools and analytics through CCTV cameras and acoustic sensors. At the same time, data from social media feeds can be carefully analyzed to improve public safety in a city and predict potential crime scenes. For the smart-parking sector, IoT technologies can help drivers identify available parking spots on a real-time map based on GPS data extracted from drivers' smartphones or road-surface sensors embedded in parking spots.

**Author Contributions:** Several authors contributed to this research: methodology, validation, review, and editing, A.R. (Armin Razmjoo), A.G., and M.M.; formal analysis and investigation, S.A. and A.R. (Alireza Rezvani); resources, D.A.G.; writing, and final analysis, S.M. (Seyedali Mirjalili) and S.M. (Saim Memon). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not Applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **Review of the Methods to Optimize Power Flow in Electric Vehicle Powertrains for Efficiency and Driving Performance**

**Izhari Izmi Mazali 1,\*, Zul Hilmi Che Daud 1, Mohd Kameil Abdul Hamid 1, Victor Tan 1, Pakharuddin Mohd Samin 1, Abdullah Jubair 1, Khairul Amilin Ibrahim 1, Mohd Salman Che Kob 1, Wang Xinrui <sup>2</sup> and Mat Hussin Ab Talib <sup>1</sup>**


**Abstract:** Electric vehicles (EVs) are quickly gaining a foothold in global markets due to their zero tailpipe emissions and increasing practicality in terms of battery technologies. However, even though EV powertrains produce zero emissions during driving, their efficiency has not been fully optimized, particularly due to the commonly used single-speed transmission. Hence, this paper provides an extensive review of the latest works carried out to optimize the power flow in EV powertrains using multispeed discrete transmissions, continuously variable transmissions and multi-motor configurations. The relevant literature was shortlisted using a keyword search related to EV powertrains in the ScienceDirect and Scopus databases. The review focused on the related literature published from 2018 onwards. The publications were reviewed in terms of the methodologies applied to optimize the powertrain for efficiency and driving performance. Next, the significant findings from this literature were discussed and compared. Finally, based on the review, several future key research areas in EV powertrain efficiency and performance are highlighted.

**Keywords:** electric vehicle powertrain; multispeed discrete transmission; continuously variable transmission; two-motors configuration; four-motors configuration

#### **1. Introduction**

Electric vehicles (EVs), which offer zero emissions during driving, have recently been gaining market share quickly due to their increasing practicality, driven by the latest technological advancements, particularly in the areas of energy storage and charging systems. Together with recent developments in emission regulations worldwide, the market share of EVs is expected to increase further, in contrast to that of conventional vehicles with internal combustion engines (ICEs). The latest forecast conducted by [1] from Deloitte showed that the percentage of EVs in the global market share is expected to reach 32% by the year 2030. This forecast was made based on four factors, namely customers' changing sentiments regarding EVs due to their improved practicality and ownership cost, favorable government policies, mostly in terms of financial incentives and accessibility to charging facilities, car manufacturers' business strategy of putting more emphasis on EV-related technologies, and support from companies outside of the car industry in adopting EVs en masse. This trend, when viewed from a tailpipe emission perspective alone, presents a positive outlook for the global environment since the amounts of harmful CO, CO2 and NOx emissions from transportation are expected to be reduced gradually. At the same time, it also opens up possibilities to explore numerous frontiers like vehicle connectivity (vehicle-to-grid, vehicle-to-vehicle, and vehicle-to-infrastructure), autonomous technology as well as advanced materials for energy storage. However, new challenges will also emerge from the EVs' increasing popularity and they must be studied and addressed properly.

**Citation:** Mazali, I.I.; Daud, Z.H.C.; Hamid, M.K.A.; Tan, V.; Samin, P.M.; Jubair, A.; Ibrahim, K.A.; Kob, M.S.C.; Xinrui, W.; Talib, M.H.A. Review of the Methods to Optimize Power Flow in Electric Vehicle Powertrains for Efficiency and Driving Performance. *Appl. Sci.* **2022**, *12*, 1735. https://doi.org/10.3390/app12031735

Academic Editors: Anselme Muzirafuti, Giovanni Randazzo and Dimitrios S. Paraforos

Received: 11 January 2022 Accepted: 28 January 2022 Published: 8 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. New Challenges Emerging from EVs' Popularity**

EVs' increasing popularity leads to numerous new challenges that must not be ignored. These challenges can be categorized into three classes: challenges in ensuring the sustainability of EV production, challenges in meeting the increasing demand for electricity due to EV penetration, and challenges in managing the ICE-to-EV migration in terms of the number of vehicles and the industry ecosystem. In the context of EV production sustainability, it was argued in [2] that, although EVs produce zero emissions during driving, the same cannot be said for their production. This is because the production process involves a significant amount of depletable materials, like heavy rare earth materials, for the production of motors and batteries. Moreover, the process also leads to higher emissions of heavy metals like lead, nickel and molybdenum, as compared to the production of ICE vehicles, and this was claimed to be detrimental to human health. According to the study by [3], the carbon footprint from these activities is currently very high due to their localization. At the moment, these activities are mainly located in China, South Korea and Japan, where a significant portion of the power is generated by fossil fuels, resulting in a high carbon footprint. To address this, refs. [3,4] proposed either diversifying the production locations to places with a high concentration of renewable power generation, or intensifying the amount of renewable power generation at the existing locations. At the same time, ref. [3] also suggested stopping the trend of increasing the battery size because it has a direct relationship with the aforementioned carbon footprint issue. This suggestion can be achieved by improving the efficiency of EV powertrains.

The increasing demand for EVs also causes electricity demand to rise sharply, and this leads to the second challenge emerging from the increasing popularity of EVs. According to [5], the amount of electricity used for EVs, on a daily basis, is about the same as the average daily electricity usage of a typical household in the United States. As such, when EVs reach 20% of the total global vehicle market share, the peak electricity demand is expected to increase by 36%. In some countries, like China, research by [6] indicated that the popularity of EVs will strain not only the national grid, but also the national water supply. This is because in China, two major contributors to power generation are hydroelectric and coal power plants that rely heavily on the national water supply. Thus, building and operating additional hydroelectric dams and coal power plants to meet the demand for EVs will divert vast amounts of water away from household usage, causing water scarcity if not properly planned. To address this challenge, two fundamental strategies must be seriously evaluated: efficient power grid management, which can be achieved via either vehicle-to-grid technology or extensive battery swapping, and efficient, sustainable and economical EV powertrains, which include the application of optimum motors, transmissions and batteries, with, possibly, a significant amount of carry-over technologies from ICE vehicles.

Finally, it is also critical to properly manage the ICE-to-EV migration so that a smooth transition phase can be realized. Simply increasing the market share of EVs alone is not enough if the total number of existing ICE vehicles, especially those that have low emission standards, is not drastically reduced. Besides, such migration must also be managed from the perspective of the existing industrial supply chain. For instance, an appropriate strategy has to be planned for the existing ICE-related manufacturing plants, which are expected to face redundancy once EVs take over ICE vehicles' market share. In this aspect, one of the strategies is to repurpose the existing manufacturing plants to focus on EV-related products. This, however, is less popular due to the high costs involved in training the existing workers and upgrading the plants [7]. Market readiness is another major challenge in ICE-to-EV migration, especially for emerging countries. To address this, one option is to implement bridging technologies, like hybrid vehicles, which implement technologies from both ICE vehicles and EVs, or the use of biofuels. The advantage of the former is that it is more practical since it also uses gasoline for operation, which is widely available, especially in the emerging markets. The advantage of the latter, on the other hand, is its renewability. Nevertheless, implementing these technologies might not lead to the desired reduction target for carbon emissions [8,9].

One strategy that can be applied to accelerate the ICE-to-EV migration is EV powertrain retrofitting of existing ICE vehicles. The idea here is not only to accelerate the market penetration of EV, but also to utilize the existing resources; in this case, the existing ICE vehicles on the road, which leads to, ideally, no increase in the net number of vehicles on the road. A study by [10] investigated the potential as well as the challenges of widespread EV retrofitting with an emphasis on public and business perceptions. The investigation, conducted based on the current situation in Germany, highlighted some challenges in terms of public acceptance and vehicle homologations. In general, public acceptance of EV retrofitting can be improved gradually through effective communication between the government, technology providers and the public, by highlighting the benefits in terms of sustainability, long term financial savings and reduced emissions. Simultaneously, the compatibility and flexibility of EV powertrains should also be improved so that initial retrofitting cost can be reduced. Such powertrains can also contribute in the aspect of homologations, which is a major hurdle in implementing EV retrofitting.

In summary, although an increasing EV market share reduces carbon emissions globally from one angle, it still leads to several major economic and overall sustainability challenges. If these challenges are not properly addressed, they will negate the aforementioned benefits of EVs. Figure 1 shows a summary of these challenges, and based on the figure, optimizing the performance, efficiency and sustainability of EV powertrains is the key to guaranteeing positive economic effects and carbon neutrality in transportation.

**Figure 1.** Summary of potentials and challenges of EVs regarding environmental and other issues based on the literature [1–10].

#### **3. Main Components of EV Powertrains**

EV powertrains mainly consist of batteries, an electric motor and a transmission, and their performance can be defined in terms of efficiency and practicality. A highly efficient EV powertrain means that its power consumption (kWh) per distance (km) can be kept as low as possible, thus allowing the vehicle to increase its driving mileage. For practicality, the target is to ensure that the powertrain components are cost-effective, meaning that the cost of production and operation (i.e., maintenance) can be kept as low as possible, and sustainable, i.e., with a low carbon footprint from production until application.
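The efficiency measure described here, energy consumed per distance, can be illustrated with a short numerical sketch. The function names and all figures (15 kWh drawn over 100 km, a 60 kWh usable battery, a 2% consumption reduction) are illustrative assumptions and are not taken from the reviewed literature.

```python
def consumption_kwh_per_km(energy_used_kwh: float, distance_km: float) -> float:
    """Specific energy consumption of the powertrain in kWh/km."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return energy_used_kwh / distance_km


def driving_range_km(usable_battery_kwh: float, kwh_per_km: float) -> float:
    """Driving range obtainable from a given usable battery energy."""
    if kwh_per_km <= 0:
        raise ValueError("kwh_per_km must be positive")
    return usable_battery_kwh / kwh_per_km


# Illustrative (assumed) numbers: 15 kWh drawn from the battery over 100 km.
baseline = consumption_kwh_per_km(15.0, 100.0)
# A hypothetical 2 % reduction in consumption with the same battery.
improved = baseline * (1.0 - 0.02)

range_baseline = driving_range_km(60.0, baseline)
range_improved = driving_range_km(60.0, improved)
```

With the battery unchanged, lower consumption translates directly into a longer range, which is in line with the case for improving powertrain efficiency rather than increasing battery size.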

The purpose of the battery, the first component of an EV powertrain, is to store electricity for the electric motor's operation. To ensure that the powertrain is highly efficient and practical, the battery needs to have a high energy density so that it can store large amounts of electrical energy without increasing its weight. Achieving this involves the implementation of new cathode, anode and electrolyte materials. One of the options, suggested by [11], is to increase the nickel content in the cathode. However, this method inevitably reduces the cathode's thermal stability, hence risking thermal runaway or damage to the battery [12]. Nevertheless, according to [13], a high battery temperature, if properly managed, also presents an opportunity to enhance its performance in delivering electricity to the motor. Because of this, many researchers have proposed either active cooling methods, so that the battery's temperature can be optimized to suit various driving conditions, or emerging materials for the anode surface [14–17]. Nonetheless, such cooling methods require extra management complexity and additional power consumption for operation, while the usage of emerging materials, though promising, usually involves a significant investment in new mining and manufacturing processes [18]. This is consistent with the findings by [19], which estimated that a new investment of 100 Euros is required to increase the battery capacity by 1 kWh. In short, increasing the battery energy capacity, even though it can be done without increasing the weight, has its own challenges in terms of safety, complexity and cost.

The next major component of an EV powertrain is the electric motor, which is responsible for converting the electricity from the battery into mechanical power to move the vehicle using the electromagnetic induction principle. The motor is controlled by an inverter that regulates the required current flow from the battery to suit the driving conditions. There are two typical types of motor used in EVs: permanent magnet synchronous motors (PMSMs) and induction motors (IMs). In PMSMs, the magnetic field required to rotate the rotor is generated using permanent magnetic materials in either the stator or the rotor. In contrast, in IMs, the electromagnetic field is produced using a current flow in the rotor conductor. Compared to ICEs, both types are relatively compact and have a higher power-to-weight ratio. Even so, studies are still being carried out to explore the implementation of advanced materials, like ultraconductive copper for motor windings and grain boundary diffusion processed magnets, with the intention of increasing the motors' power density even further [20]. The compactness and high power density contribute positively to the power consumption of an EV. Not only that, but these motors also offer high torque capability at low motor speed (RPM), which eliminates the requirement for high gear ratios for vehicle start-stop. This explains the typical omission of multispeed transmissions in existing EVs. Between these two types of motor, some researchers have argued that IMs are more robust, sustainable and low cost, partially due to the absence of a permanent magnet, while others prefer PMSMs due to their high power density and the absence of the current losses incurred in the IMs' rotor to induce the magnetic field [21,22]. In terms of efficiency, both PMSMs and IMs have a very high peak efficiency, ranging from 85% up to 97% [23].
However, such efficiency is available only within a limited motor speed range; hence, for diverse driving conditions, the powertrain's efficiency usually falls significantly below that value. Not only that, but the construction of the motors also involves the usage of heavy rare earth materials, which causes issues of high cost and less sustainable production. Therefore, sustainable and cost-effective approaches to realize the actual potential of EV powertrains in terms of driving range and performance are desired.

The final major component of EV powertrains is the transmission, responsible for ensuring that the power can be transmitted from the motor to the wheels efficiently. Because of the characteristics of the typical electric motors used in existing EVs, the transmission used usually only provides a single speed ratio. The main benefit of using single-speed transmissions is their simple construction, which leads to relatively low cost for production and maintenance. However, this limits the flexibility of the electric motor to operate optimally to suit diverse driving conditions. Therefore, it is difficult to realize the actual potential of EVs in terms of driving mileage and power consumption. A summary of the areas that can be improved to enhance the performance of an EV powertrain is illustrated in Figure 2. This figure indicates that the transmission, or any method to optimally manage power flow between the motor and the wheels, is crucial in optimizing the EV powertrain performance. Once the power flow is optimized, the electric motor will have the flexibility to operate more efficiently and effectively, resulting in less power consumption from the battery. This presents a promising and cost-effective prospect of increasing EVs' driving mileage without expanding the battery size or capacity. Thus, this paper reviews and discusses the latest and most significant research works carried out to optimize the power flow in EV powertrains.


**Figure 2.** Various methods recently proposed to optimize EV powertrains' performance.

#### **4. Optimizing Power Flow in EV Powertrains**

Ensuring that a single-motor EV powertrain can operate optimally under various driving conditions, especially when the vehicle is travelling at high speed and low load, is very challenging, and because of that, its efficiency normally falls to only around 60% from about 90% in the best-case scenario [23]. One way to avoid this is to allow flexible power flow configurations in the powertrain. Studies [24–26] support this argument, finding that the powertrain efficiency and driving performance (in terms of acceleration time and comfort) can be optimized for the full EV driving experience if the driving loads can be properly distributed to two electric motors in the powertrain configuration with different transmission ratios. In the study, the possibility of implementing different hybrid EV (HEV) powertrain configurations was evaluated, and the configuration was defined in terms of the coupling between the motors and the ICE, and also in terms of different transmission ratios. When the loads are properly distributed, the motors' speed can be reduced drastically at high vehicle speed, and this contributes to increasing the powertrain efficiency while ensuring that acceleration can be performed smoothly. Hence, it can be summarized that a flexible motor power flow, optimized powertrain components and control are the keys to optimizing EV powertrains, and this can be achieved by optimizing multi-motor configurations, or by implementing a multispeed transmission in the EV powertrain.

In terms of design complexity, the multispeed transmission in an EV should be less complicated than the one used in existing ICE-powered vehicles. This is because of several factors, most notably the requirement for moving-off elements in conventional ICE vehicles. Generally, because of the ICE idling speed condition, a moving-off element, like a dry friction clutch or torque converter, is required to facilitate the vehicle's start-stop condition. For an EV, however, because the motor's torque is available from as low as 0 RPM, moving-off elements are no longer required. On top of that, the elimination of moving-off elements also opens up the chance to implement a much simpler transmission control algorithm, since it is no longer necessary to control the moving-off element to achieve desirable driving comfort during start-stop conditions (Figure 3). As a result, only ratio shifting control is required in an EV, although, if a discrete multispeed transmission is used, a clutch or brake system is still required for the shifting. This is contrary to conventional ICE vehicles, where it is absolutely critical to optimize both moving-off control and ratio shifting control. In this paper, the works related to the implementation of multispeed transmissions in EVs are divided into two categories: multispeed discrete transmissions and continuously variable transmissions (CVTs). In addition, the possibilities of implementing multi-motor configurations are also reviewed here, since this approach can also lead to optimization of the motor operation for various driving conditions, which, according to some scholars [25], is more effective than the implementation of multispeed transmissions.

**Figure 3.** Differences in the powertrain requirements for ICE vehicle and EV.

This paper focuses on reviewing research works published from 2018 until early 2022. Using the keywords "multispeed transmission electric vehicle", "continuously variable transmission electric vehicle", "multi motors electric vehicle" and "electric vehicle powertrain" in the ScienceDirect and Scopus databases, 60 references were identified and shortlisted as related to the topic of optimizing the power flow in EV powertrains. Among them, 24 papers describe work related to multispeed discrete transmissions, while 13 and 15 papers cover work on CVTs and multi-motor configurations, respectively. Lastly, eight of the 60 papers were review papers related to the topic of EV powertrains. The review conducted in this paper focuses on the methodologies applied and the significant findings, followed by a comparison between them. Finally, the expected key research areas in optimizing the power flow of EV powertrains are highlighted. Figure 4 illustrates the breakdown of the selected literature reviewed in this paper.

**Figure 4.** The number of shortlisted references related to the methods used for optimizing the power flow in an EV powertrain.

#### *4.1. Multispeed Discrete Transmission*

The main motivation for applying a multispeed discrete transmission in an EV, similar to ICE vehicles, is to provide the most suitable gear ratio in the powertrain so that the motor can operate efficiently under diverse driving conditions. Since EV motors are typically capable of producing high torque output from very low RPM, the number of gears for an EV is expected to be minimal, as few as two, as opposed to the number used in ICE vehicles. Recent works evaluating the efficiency difference between EV powertrains with a single-speed transmission and a two-speed transmission were described in [27,28]. In [27], the comparison was conducted using simulation model-based estimation by taking into account vehicle parameters, reference motor data and three driving cycles, namely the New European Driving Cycle (NEDC), the Worldwide Harmonized Light Vehicles Test Cycle (WLTC) and the US Environmental Protection Agency (EPA) Federal Test Procedure for city driving (FTP-75). Firstly, the simulation model was run using an EV powertrain with a single-speed transmission for the three driving cycles. Thereafter, the simulation results of the WLTC were used to determine the appropriate size of the two gear ratios for improving the power consumption, and then the model was rerun using the newly determined ratios. Using the WLTC results, instead of those of the NEDC and FTP-75, was logical, considering that it is the most power-demanding cycle and covers diverse driving phases of urban, suburban, rural and highway scenarios. Besides, it also saves a significant amount of work and computing time as opposed to using the results from all three driving cycles. The comparison results showed efficiency improvements in the range of 1.7 to 2.4% with the two-speed transmission for the three driving cycles. It must be highlighted, however, that [27] emphasized obtaining the gear ratios for powertrain efficiency only, without consideration of the driving performance.
In terms of driving performance, the typical target is to achieve fast acceleration with minimum jerking, a requirement that conflicts with achieving maximum powertrain efficiency. Besides, the gearshifting model must also be incorporated into the powertrain simulation model to allow a realistic evaluation of the jerking. Finally, an advanced optimization method must be implemented in the model for optimizing the gear ratios for powertrain efficiency and fast acceleration with minimum jerking.
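Jerk is commonly quantified as the time derivative of longitudinal acceleration. A minimal sketch of how it could be estimated is given below, assuming a uniformly sampled vehicle speed trace in m/s and simple forward differences. The function name and the sample data are hypothetical and are not taken from [27].

```python
def jerk_from_speed(speeds_m_s, dt_s):
    """Estimate jerk (m/s^3) from uniformly sampled speed using forward differences."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    # First difference: acceleration between consecutive speed samples.
    accel = [(v1 - v0) / dt_s for v0, v1 in zip(speeds_m_s, speeds_m_s[1:])]
    # Second difference: change of acceleration, i.e., jerk.
    return [(a1 - a0) / dt_s for a0, a1 in zip(accel, accel[1:])]


# Synthetic speed trace v = t^2 sampled every 1 s (not measured data).
speeds = [0.0, 1.0, 4.0, 9.0, 16.0]
jerk = jerk_from_speed(speeds, 1.0)
peak_jerk = max(abs(j) for j in jerk)
```

In a gearshift evaluation, the peak absolute jerk during each shift would be a natural quantity to compare between designs. Real signals would need filtering first, since differencing amplifies noise.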

In the subsequent work by [28], the comparison was conducted using an electric bus model running a specific city driving cycle with a two-speed dual clutch transmission (DCT). A DCT allows fast gearshifting thanks to its capability to pre-select the next gear before the shift is completed by engaging the second clutch. Such capability is not available in a single-clutch automated manual transmission (AMT) or a conventional automatic transmission. The driving cycle was obtained from an existing bus route in Espoo, Finland. The powertrain model took into account not only the vehicle parameters and the motor's data, but also the efficiency map of the inverter. Based on the model, an exhaustive search algorithm was implemented to determine the two gear ratios for optimum efficiency. The results showed that, first, the efficiency gain was in the range of 2 to 3.2%, which is consistent with the findings in [27]; second, the two-speed transmission opened up the option of using a more cost-effective motor with a narrow high-efficiency range; and, last, further studies are still required to evaluate the application, particularly in terms of maintenance cost, to assess how much higher the cost will be compared with a single-speed transmission. Nevertheless, unlike [27], ref. [28] focused on maximizing the efficiency during city driving at speeds below 60 km/h with frequent start-stops. Hence, the results are applicable only to a very specific city driving cycle. Moreover, no details of the gearshifting mechanism are provided, which means that further study is required to evaluate the jerking during gearshifting. This is particularly important since the driving cycle studied involves frequent start-stops. Finally, ref. [28] also considered a CVT with a metal belt, which was found to be less desirable due to the significant power losses in the belt. 
This is expected because of the high hydraulic pressure requirement to clamp the belt, especially since higher torque is required to move the bus as opposed to the passenger cars. Therefore, using such CVT in heavy vehicles, like a bus, is less practical as compared to using it in passenger cars.
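The exhaustive search over two gear ratios mentioned above can be illustrated with a minimal sketch. It is not the model used in [28]: the efficiency map, the wheel radius, the three-point driving cycle and the candidate ratio grid are all hypothetical placeholders chosen only to make the example self-contained.

```python
import itertools
import math

def motor_efficiency(speed_rpm, torque_nm):
    """Hypothetical smooth efficiency map peaking near 3000 RPM and 80 N*m."""
    speed_term = ((speed_rpm - 3000.0) / 4000.0) ** 2
    torque_term = ((torque_nm - 80.0) / 150.0) ** 2
    return max(0.60, 0.95 - 0.25 * speed_term - 0.25 * torque_term)

def cycle_energy(ratios, cycle, wheel_radius_m=0.5):
    """Electrical energy in joules, using the more efficient gear at every step."""
    total = 0.0
    for speed_mps, wheel_torque_nm, dt_s in cycle:
        wheel_rad_s = speed_mps / wheel_radius_m
        wheel_rpm = wheel_rad_s * 60.0 / (2.0 * math.pi)
        mech_power_w = wheel_torque_nm * wheel_rad_s
        # Motor speed scales with the ratio, motor torque scales inversely.
        total += dt_s * min(
            mech_power_w / motor_efficiency(wheel_rpm * g, wheel_torque_nm / g)
            for g in ratios
        )
    return total

def exhaustive_search(cycle, candidates):
    """Evaluate every (first gear, second gear) pair and keep the best one."""
    best_pair, best_energy = None, math.inf
    for g2, g1 in itertools.combinations(sorted(candidates), 2):
        energy = cycle_energy((g1, g2), cycle)
        if energy < best_energy:
            best_pair, best_energy = (g1, g2), energy
    return best_pair, best_energy

# Toy cycle: (vehicle speed in m/s, wheel torque in N*m, duration in s).
CYCLE = [(3.0, 900.0, 10.0), (8.0, 600.0, 20.0), (14.0, 400.0, 15.0)]
CANDIDATES = [4.0 + 0.5 * k for k in range(17)]  # hypothetical grid, 4.0 to 12.0

best_pair, best_energy = exhaustive_search(CYCLE, CANDIDATES)
print(best_pair, round(best_energy))
```

Because every pair is evaluated, the cost grows quadratically with the number of candidate ratios, which is acceptable for two gears but becomes expensive as gears or grid resolution increase.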

Another study covering the implementation of a multispeed discrete transmission in an electric bus was described in [29], where a four-speed automated manual transmission (AMT) was used. In the transmission, two DC electric motors were used for gearshifting: one motor selected gear 1 and gear 2, while the other selected gear 3 and gear 4. Because of this configuration, shifting from gear 2 to gear 3 required sequential operation of both motors, and is thus expected to take longer and require more actuation power than shifting from gear 1 to 2 or from gear 3 to 4. The shifting performance was evaluated experimentally on a test bench in terms of efficiency and shifting time. Based on that, a complete powertrain model for the electric bus was developed so that the optimized gearshifting strategy (defined as the optimal gearshifting points with respect to vehicle speed and throttle opening) could be determined for minimum power consumption and minimum shifting frequency. The optimized gearshifting strategy was formulated based on actual Beijing city driving data representing two driving conditions: high-urgency driving with frequent acceleration and higher average speed, and gentle driving with less frequent acceleration and lower average speed. Then, the four-speed AMT with the optimized gearshifting strategy was tested on a dynamometer to gauge its workability. Unlike the works in [27,28], ref. [29] provided the details of the gearshifting mechanism in the AMT, hence the shifting time can be analyzed realistically. However, further improvement of the shifting time to match the DCT's performance is challenging due to the operation of two DC motors in the mechanism. This means that the shifting time can be minimized only if the number of motors can be reduced, which is possible only by reducing the number of gears. 
Therefore, the next area that can be focused on in [29] is the optimization of the gear ratios so that the possibility to reduce the number of gears can be explored. Besides, study on the jerking during gearshifting can also be carried out here thanks to the availability of the AMT's prototype.

Subsequently, research works in [30–32] described the working principle of two-speed transmissions using a planetary gearset for application in an EV powertrain. In terms of the planetary gearset design, the transmission was similar to a conventional automatic transmission for ICE vehicles. However, for actuating its clutches and brakes, the proposed transmission used an electro-mechanical actuation system featuring a DC motor and a screw nut system. The significant benefit of the screw nut system is that it provides a self-locking mechanism, hence the desired gear can be maintained without exerting continuous hydraulic pressure on the clutches and brakes. This improves the transmission's efficiency since no power is required to generate the needed pressure. The challenge, however, is the complexity of integrating the screw nut system with the clutches and brakes. Besides, the system also has to handle a very high thrust force between the pulley (rotating under the motor's power) and the screw (rotating only during ratio shifting to move the pulley axially). If not properly optimized, this will lead to excessive wear and tear in the screw nut system, as well as power loss in the thrust bearing. In the research works conducted by [30–32], the focus was to minimize the jerking by implementing gearshifting strategies with different objectives: first, to maintain a constant transmission input torque; second, to maintain a constant transmission output torque; and third, to maintain a semi-constant transmission output torque. In terms of efficiency analysis, however, no results or comparisons were presented between the proposed transmission and the typical single-speed transmission in an EV.

A summary of the works described in [27–31] is presented in Table 1, highlighting the potential of multispeed discrete transmissions in improving the efficiency of EV powertrains. However, these works still insufficiently discussed the topic of gear ratio optimization which is crucial to achieve not only powertrain efficiency, but also desirable driving performance. In addition, details on the gearshifting mechanism were also rarely provided, which means that analyses of the jerking and actuation power consumption during gearshifting are still lacking.

**Table 1.** Summary of the literature review on efficiency evaluation of using two-speed discrete transmission in an EV powertrain.



Optimizing the two-speed gear ratios, however, is not straightforward due to the multi-objective nature of the problem. For instance, the best ratios should achieve the desired driving performance (in terms of acceleration rate and top speed) and minimum power consumption. These objectives generally contradict each other, and they are influenced by diverse parameters such as the road gradient and instantaneous vehicle speed. Thus, advanced optimization techniques are required. For instance, the work by [33] focused on the optimization of two gear ratios based on a specific gearshift scheduling strategy that took into account three parameters: vehicle speed, vehicle acceleration and road gradient. By comparison, the usual parameter used for gearshift scheduling is the throttle position. In the work, an AMT was used and its baseline gear ratios were set at 10.00 and 5.20 for the overall gear ratio 1 (G1) and 2 (G2), respectively. For the shifting strategy, a motor speed of 3000 RPM was set as the reference for driving due to the motor's high efficiency at that speed; G1 was reserved for low vehicle speeds (0 to 25 km/h), and G2 for high vehicle speeds (65 to 120 km/h). Between 25 and 65 km/h, the suitable ratio was selected based on the motor efficiency and power output at a particular vehicle speed, while a baseline buffer zone of 40% was set between the upshifting and downshifting lines to avoid overly frequent gear shifting. Subsequently, the baseline ratios and the shifting buffer zone were optimized using two methods: gradient descent and pattern search. Simulated under NEDC (to reflect flat road conditions) and the Economic Commission for Europe (ECE) Extra Urban driving cycle (to reflect a road with gradients), the optimized model produced 4% and 7.5% reductions in power consumption compared with the baseline model, respectively. Next, the performance of the optimized model was compared against a conventional gearshift model. 
The conventional gearshift model uses the same ratios as the optimized model, but with a conventional gearshifting strategy based on throttle. The comparison showed that the optimized model led to almost 18% energy saving over the conventional model for the gradient road driving cycle (ECE Extra Urban). However, for the flat road driving cycle (NEDC), the conventional model was slightly more efficient, by about 3 to 4%. These results highlighted the contribution of two different gearshifting strategies in optimizing powertrain efficiency for driving cycles involving diverse road gradients. Nevertheless, for the actual application of these strategies, an additional system is required to activate the suitable strategy. In this case, a gradient detection system is required so that the road gradient can be measured to activate the proposed gearshifting strategy. Therefore, further comparisons between the proposed strategy and the conventional strategy should be carried out on more driving cycles (instead of just NEDC and ECE Extra Urban) to provide a clearer picture of the importance of implementing two different gearshifting strategies. Figure 5 presents a graphical summary of the work performed by [33]. In short, it can be concluded from the work that a highly flexible gear shifting strategy is crucial in optimizing EV driving mileage, and such flexibility is possible with an optimized multispeed transmission in the EV powertrain.
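The role of a buffer zone between the upshifting and downshifting lines can be sketched with a simple speed-only hysteresis rule. This is an illustrative simplification, not the schedule of [33], which also uses acceleration and road gradient; the 50 km/h upshift speed, the speed trace, and the reading of the 40% buffer as a fraction of the upshift speed are assumptions.

```python
def select_gears(speeds_kmh, upshift_kmh=50.0, buffer_fraction=0.4):
    """Return the gear (1 or 2) chosen at each sample of a speed trace.

    The downshift line sits below the upshift line by `buffer_fraction`
    of the upshift speed (an assumed interpretation of the buffer zone).
    """
    downshift_kmh = upshift_kmh * (1.0 - buffer_fraction)
    gear = 1
    gears = []
    for v in speeds_kmh:
        if gear == 1 and v >= upshift_kmh:
            gear = 2
        elif gear == 2 and v <= downshift_kmh:
            gear = 1
        gears.append(gear)
    return gears

def count_shifts(gears):
    """Number of gear changes along the trace."""
    return sum(1 for a, b in zip(gears, gears[1:]) if a != b)

# Hypothetical speed trace (km/h) that hovers around the upshift line.
trace = [20, 48, 52, 47, 53, 46, 29, 55]

with_buffer = select_gears(trace, buffer_fraction=0.4)
without_buffer = select_gears(trace, buffer_fraction=0.0)
print(count_shifts(with_buffer), count_shifts(without_buffer))  # 3 5
```

With the buffer, the small oscillations around 50 km/h no longer trigger shifts; only the drop to 29 km/h causes a downshift.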

**Figure 5.** Summary of the work done in [33].

Other works involving the optimization of multispeed discrete transmissions in EVs can be found in [34–39]. Unlike [33], ref. [34] presented the optimization of a two-speed transmission in an electric truck operating on a specific route with gradients and without heavy traffic, and the motor's efficiency map also included regenerative braking efficiency. The two-speed transmission mainly consisted of two planetary gearsets with two brakes to select the desired gear ratio. The brakes were actuated by a DC motor through a worm gear and worm wheel (Figure 6). The worm wheel was designed with an inner spiral guide, allowing it to convert its rotation about the axis of the motor's shaft into an axial movement. Depending on its rotational direction, the worm wheel could press and lock either the first brake (engaging gear 1) or the second brake (engaging gear 2). The worm gear provides the advantage of large torque multiplication, which makes it possible to use a compact motor to engage the brakes. However, the worm gear is more vulnerable to wear and tear than ordinary gear wheels, which means frequent gear shifting will very likely lead to high maintenance cost. The shifting strategy, in contrast to that of [33], which took into account the road gradient, involved only throttle position and motor speed as parameters, and the driving cycle was designed to reflect operation in an iron mine. Apart from vehicle speed, the studied driving cycle also took into account changes in weight, considering the delivery of iron ore, and drastic changes of gradient, considering the geography of the mine. Based on the aforementioned shifting strategy and driving cycle, the two ratios of the transmission were optimized for efficiency and acceleration using particle swarm optimization (PSO). The results showed that, compared against a one-speed transmission, the optimized two-speed transmission reduced the overall power consumption by 6.1%, owing to efficient motor operation during driving and regenerative braking, but the gain in acceleration was minimal. In terms of shifting strategy, a future study could evaluate whether there is any efficiency improvement if the strategy described in [33] (which considers road gradient as a gearshifting parameter) is implemented in the powertrain of [34] (which does not). The efficiency difference between them is crucial for evaluating the viability of considering the road gradient in the gearshifting strategy.
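For readers unfamiliar with PSO, the following is a minimal, generic sketch of the technique applied to a pair of gear ratios. The cost function is a hypothetical stand-in for a powertrain simulation and is not the model of [34]; the bounds, swarm size and coefficients are likewise assumed values.

```python
import random

def toy_cost(ratios):
    """Hypothetical cost surface with its minimum at ratios (9.0, 4.0)."""
    g1, g2 = ratios
    return (g1 - 9.0) ** 2 + (g2 - 4.0) ** 2

def pso(cost, bounds, n_particles=20, n_iter=100, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5):
    """Basic global-best particle swarm optimization with box constraints."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_cost = [cost(p) for p in pos]        # personal best costs
    g = min(range(n_particles), key=best_cost.__getitem__)
    gbest_pos, gbest_cost = best_pos[g][:], best_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp the particle to the allowed ratio range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest_pos, gbest_cost = pos[i][:], c
    return gbest_pos, gbest_cost

BOUNDS = [(5.0, 15.0), (2.0, 8.0)]  # assumed search ranges for gear 1 and gear 2
best_ratios, best_value = pso(toy_cost, BOUNDS)
print([round(g, 2) for g in best_ratios], best_value)
```

In a real study the cost would combine simulated energy use and acceleration performance over the driving cycle; only the search loop would stay the same.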

**Figure 6.** (**a**) Schematic diagram and (**b**) CAD model highlighting the important components of the novel two-speed transmission proposed by [34].

Another study involving regenerative braking for a two-speed transmission was described in [35], where a two-speed planetary gearset was used. Here, the main objective, instead of maximum efficiency, was to minimize jerking during braking. The regenerative braking procedure was divided into three phases: driving phase, brake engaging phase, and braking phase. These phases were proposed to minimize the torsional oscillations that cause the jerking, by optimally synchronizing the application of hydraulic service braking and the motor's braking torque during the brake engaging phase. As a result, the jerking was reduced by around 55% compared with conventional regenerative braking that does not consider such oscillations.

Next, ref. [36] presented a work carried out to optimize the gear ratios of a two-speed DCT based on not only the motor's efficiency, but also the transmission efficiency. In the work, the transmission efficiency model was developed by taking into account the losses at the gear meshing, bearings, clutch and concentric shaft. Thus, different gear ratios produced different efficiencies in transmitting the torque between the input and output shafts. Based on the model, the optimum gear ratios were selected for the EV powertrain, and its performance was compared against a single-speed EV powertrain for the Worldwide Harmonized Light Vehicles Test Procedure (WLTP) cycle, which showed an efficiency improvement of around 10.7 to 12.1%. Regarding the transmission efficiency, research in [37] explained a possible method to improve it by modifying the tooth profile of the gears, which can potentially reduce not only the loss in the gear meshing, but also the effort required for gearshifting. In [38], on the other hand, a two-speed EV powertrain model was optimized using a genetic algorithm (GA) with the objectives of achieving quick 0–100 km/h acceleration (driving performance) and minimum power consumption (efficiency) under NEDC. The type of transmission used in the model was not specifically mentioned, hence the shifting mechanism involved was unknown. However, the model included a regenerative braking efficiency model. Unlike many papers that emphasized optimizing the gear ratios alone, the work in [38] optimized not only the gear ratios, but also the motor's maximum output torque in Nm and its maximum rotational speed in RPM. The motor's torque and speed were optimized within the ranges of 150–200 Nm and 8000–12,000 RPM, respectively. In order to obtain balanced optimization results, specific weightage was given to each of the two objectives: driving performance and efficiency. The results showed that it was possible to achieve a balanced (compromise) solution between driving performance (quick acceleration) and efficiency (power consumption) by optimizing the two gear ratios and the motor's maximum torque and speed. Nevertheless, a different set of gear ratios, or a continuous ratio range between 1.341 and 3.050, was required to achieve the fastest possible acceleration and the highest possible powertrain efficiency. This meant that, to achieve maximum performance and efficiency in a single powertrain system, the number of gears must be higher than two. In a discrete transmission, however, increasing the number of gears must be done together with redesigning the gearshifting mechanism, which leads to increased design complexity and cost. Figure 7 shows a summary of the optimization work done in [38] using GA to determine the optimum gear ratios and motor outputs for the driving objectives. Based on the results, if a continuous ratio range between 1.341 and 3.050 can be provided by one transmission (like a continuously variable transmission), then all the objectives can be met, instead of opting for two compromise gear ratios in a two-speed transmission.
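The idea of giving a weightage to each objective can be sketched as a weighted-sum scalarization over a handful of candidate designs. The sketch below does not reproduce the GA or the results of [38]; the three designs and their acceleration times and energy figures are invented purely for illustration.

```python
def normalise(values):
    """Scale a list of values to the range 0..1 (0 is best when minimizing)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def pick_design(designs, w_performance):
    """Pick the design minimizing a weighted sum of the two normalised objectives.

    w_performance is the weight on 0-100 km/h time; the remaining weight
    (1 - w_performance) goes to energy consumption.
    """
    accel = normalise([d["accel_s"] for d in designs])
    energy = normalise([d["energy_kwh"] for d in designs])
    scores = [w_performance * a + (1.0 - w_performance) * e
              for a, e in zip(accel, energy)]
    return designs[scores.index(min(scores))]

# Hypothetical candidate designs (values are illustrative, not from [38]).
DESIGNS = [
    {"name": "A", "accel_s": 9.0, "energy_kwh": 15.0},   # quickest, least efficient
    {"name": "B", "accel_s": 10.0, "energy_kwh": 14.0},  # compromise
    {"name": "C", "accel_s": 12.0, "energy_kwh": 13.5},  # slowest, most efficient
]

for w in (0.9, 0.5, 0.1):
    print(w, pick_design(DESIGNS, w)["name"])
```

Shifting the weight from performance to efficiency moves the selected design from A through B to C, which mirrors why a single pair of gear ratios ends up as a compromise between the two objectives.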

Another interesting work regarding EV powertrains with discrete gear transmission was presented in [40], which experimentally evaluated the performance of three different transmission ratios (6.00, 8.00 and 10.00) for one driving cycle. Among the ratios, 8.00 served as the benchmark for the analysis of the results. In the experiment, the test vehicle was driven on the same track using three different one-speed transmissions, corresponding to the three ratios. The results showed that, with the gear ratio of 10.00, the power consumption was 4.2% higher than the benchmark. The authors argued that the increase was caused by the fast acceleration made possible by the ratio, which gave the driver a tendency to accelerate suddenly more often. On the other hand, the power consumption was reduced by 2.4% when the ratio of 6.00 was used, since, it was claimed, that ratio forced the driver to accelerate more gently. From the work, three important conclusions can be drawn. First, driving style is critical for the power consumption of an EV with a one-speed transmission; thus, encouraging drivers to drive economically plays an important role in increasing EVs' efficiency. Second, the size of the gear ratio has some influence on a person's driving style, which ultimately affects the driving power consumption. Third, multispeed transmissions can offer the flexibility to suit drivers' driving preferences, which means that the EV can be driven either for maximum efficiency or with an aggressive driving style.

A summary of the works presented in [33–40] is provided in Table 2. They highlight the significance of optimizing the gear ratios and the gearshifting strategy to achieve powertrain efficiency and driving performance. Some of these works have started to discuss the gearshifting mechanism, but analysis of jerking and actuation power usage during shifting is still limited. Moreover, since some of the gearshifting mechanisms are novel, new study areas concerning their durability and practicality must also be covered in the future.

**Figure 7.** Summary of the work done in [38].

**Table 2.** Summary of the literature review on optimizing two-speed transmission for efficiency and driving performance in an EV powertrain.




The latest works related to the gearshifting mechanism and its control are described in [41–50]. Researchers in [41–43] argued that the criterion for the EV motor to operate efficiently is not just the application of a multispeed discrete transmission but also a smooth gearshifting process with minimum jerking and actuation power usage. Reference [44], on the other hand, explained jerking effects in relation to the friction clutch, the one-way clutch and the type of driveline. The jerking effects were evaluated under three common shifting scenarios: upshifting during driving, downshifting during driving, and downshifting during braking. In general, smooth shifting is not only beneficial for driving comfort, since it avoids excessive jerking and torque interruption, but also helps the overall powertrain efficiency. Thus, a novel clutchless AMT was proposed in [41–43] featuring a unique synchronizer called the bilateral Harpoon-shift synchronizer. This synchronizer uses a torque spring, constructed from multiple bent coil springs, inside the dog body's internal groove to keep the dog gear damped to the guide ring (Figure 8). This results in quick synchronization of the guide ring and the dog gear without using frictional cones, and also in smooth shifting due to the spring's damping effect. Additionally, the spring helps reduce the axial force required for shifting, minimizing the work the DC motor must do to actuate the shift fork. Hence, a faster and more efficient shifting process can be achieved with a compact DC motor. However, the spring also causes an additional normal force between the guide ring and the sleeve, which produces friction between them during the shifting process. This eventually leads to an extra load that must be overcome by the motor.

**Figure 8.** The proposed bilateral Harpoon-shift synchronizer by [41–43].

Another novel synchronizer design featuring internal springs was presented in [45]. Meanwhile, the work in [46] reported the optimization of gearshifting with the objectives of minimizing shifting time, friction work (due to the engagement and disengagement of the clutch during shifting) and jerking. The optimization was conducted using the Legendre pseudospectral method, and the gearshifting model was simplified as two-degree-of-freedom (2-DOF) and 4-DOF dynamic models based on a friction clutch and a sleeve shifting mechanism in a two-speed transmission. The results were divided into four different patterns: the least shifting time, the least friction work, the least jerking and, finally, the compromise solution. In the compromise solution, obtained in the 2-DOF model simulation, the shifting time was recorded at 0.92 s, with the square of continuous jerking measured at 0.48 (m/s³)², and the friction work at 1856 J. In the work, however, no detailed description of the actual actuators used for the clutch and the gearshift sleeve was provided, which can be the focus of future works.

Subsequent research analyzing the performance of the shifting mechanism was described in [47,48], where a two-speed dry clutch inverse AMT (I-AMT) was used to vary the gear ratios with minimal torque interruption with the help of two one-way clutches. The one-way clutches were integrated into the first gear and the second gear separately, hence shifting can be achieved by actuating only a single dry clutch. Prior to those works, another study had been carried out, as described in [50], to evaluate the clutch control of a wet dual clutch two-speed transmission for EV application. The objective of the study was to experimentally quantify the clutch control's performance in terms of jerking and engagement time. However, because a wet clutch was used, some power was lost to the clutch actuation. In addition, the gearshift quality was less desirable due to the high jerking of around 10 m/s³, signifying noticeable torque interruption during shifting. Besides, the sticking characteristics of the wet clutch, due to its hydraulic system, made it difficult to optimize the clutch control for minimum jerking and engagement time. As an improvement, another type of clutch, such as a dry clutch system with an electro-mechanical actuator, can be implemented, so that the clutching and gearshifting can be precisely controlled based on the motor's torque to minimize jerking and shifting time. Returning to the work explained in [47,48], a dry clutch was used, and its slip control was optimized using a high-order disturbance observer to minimize jerking and shifting time; the clutch control was then tested experimentally in a small EV during upshifting and downshifting. The dry clutch was actuated by a DC motor. The results were encouraging, with the jerking measured at no more than around 3 m/s³, which is significantly lower than the widely accepted threshold of 10 m/s³. 
Nevertheless, the operation of the I-AMT involved frequent slipping in the dry clutch, hence its durability is expected to be compromised. This may increase the maintenance cost relative to a simpler single-speed transmission EV powertrain. Thus, a detailed study of this aspect is crucial to quantify its long-term operation. Further studies on optimizing the gearshifting mechanism were explained in [49], in which a torque sensor was proposed for a two-speed DCT so that the clutch engagement force can be regulated precisely to match the desired clutch torque for optimum shift quality. The torque sensor allowed precise real-time torque measurement, which is crucial to regulate the clutch engagement for optimum engagement time with minimum jerking. However, the torque sensor is costly and would increase the transmission's cost tremendously. This unfortunately makes implementing the torque sensor in an actual transmission impractical.
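Jerk is the time derivative of acceleration, so the figures quoted above (for example 3 m/s³ against the 10 m/s³ threshold) can in principle be estimated from a sampled vehicle speed trace. The sketch below uses plain forward differences on a synthetic trace; the function names and the sample data are hypothetical, and measured data would normally be filtered before differentiating.

```python
def finite_difference(samples, dt_s):
    """Forward-difference derivative of equally spaced samples."""
    return [(b - a) / dt_s for a, b in zip(samples, samples[1:])]

def peak_jerk(speed_mps, dt_s):
    """Largest absolute jerk (m/s^3) along a speed trace sampled every dt_s."""
    accel = finite_difference(speed_mps, dt_s)   # m/s^2
    jerk = finite_difference(accel, dt_s)        # m/s^3
    return max(abs(j) for j in jerk)

def is_comfortable(speed_mps, dt_s, limit=10.0):
    """Check the trace against the 10 m/s^3 threshold cited in the text."""
    return peak_jerk(speed_mps, dt_s) <= limit

# Synthetic trace v(t) = t^2 sampled every 0.5 s; its jerk is 2 m/s^3.
DT = 0.5
speeds = [(k * DT) ** 2 for k in range(5)]

print(peak_jerk(speeds, DT), is_comfortable(speeds, DT))
```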

More advanced studies related to EV powertrains with two-speed discrete transmissions, focused on a shifting strategy that adapts to driver behavior, are described in [51]. Previous studies on methods to recognize driver behavior can be found in [52–54], covering applications in a fuel cell vehicle and an HEV, but none of them was conducted specifically for EV powertrains with two-speed transmissions. However, all of these studies share certain similarities, in the sense that the throttle opening rate was used as the indicator of the behavior, and fuzzy logic was then applied to predict the suitable driving style corresponding to standard driving cycles modified based on driving aggressiveness. Subsequently, the baseline driving style (usually established based on conventional practice) was optimized in these studies by embedding a correcting factor representing the fuzzy logic's output. Similarly, ref. [51] proposed a predictive model based on a fuzzy neural network (FNN) to recognize the driver's intention via the actual throttle opening rate. Simultaneously, the learning vector quantization neural network (LVQNN) method was used to select the appropriate driving cycle by comparing the actual vehicle speed data against three different predetermined driving cycles. These predetermined cycles were obtained offline based on samples generated from the New York City Cycle (NYCC), UDDS and HWFET driving cycles. Finally, a correcting factor, representing the outputs from the FNN and LVQNN, was introduced into the baseline shifting strategy to optimize it for efficiency and driving performance. The baseline shifting strategy was formulated by taking into account the motor's efficiency, throttle opening and the battery's SOC at 40% and 70%. 
Comparison between the baseline shifting strategy with and without the correcting factor, through simulation and dynamometer testing, showed an average efficiency improvement of up to 2%, proving the benefits of adapting driver behaviors in the shifting strategy.

A summary of the works related to the gearshifting mechanisms and the adaptation to driver behaviors in [41–51] is presented in Table 3. In terms of gearshifting mechanisms, the works reviewed here mostly discussed their standalone performance in terms of jerking and shifting time, while limited discussions were carried out to evaluate their performance when integrated in a powertrain system. This means that the question of the potential improvement in efficiency and driving performance in a complete powertrain system is still not properly answered. Nevertheless, with the numerous novel gearshifting mechanism designs proposed recently by researchers, the outlook for developing and implementing multispeed discrete transmissions, especially two-speed transmissions, in commercial EVs looks promising.


**Table 3.** Summary of the literature review on gearshifting mechanism of multispeed discrete transmission in EV powertrain.


Overall, it can be summarized that the research works on multispeed discrete transmissions for EVs mainly focused on the implementation of two-speed discrete transmissions, which can be in the form of an AMT or a DCT. The two-speed design is very compact, which means the additional weight relative to the usual single-speed transmission can be kept to a minimum. Besides, the two gear ratios provide the necessary flexibility in selecting EV driving modes for optimum efficiency and driving performance. The main challenge, however, is how to optimize the gear ratios and the shifting strategy so that the gains in powertrain efficiency and driving comfort can be maximized. Several optimization methods have been implemented to optimize the gear ratios and the shifting strategy, and the results highlighted the capability of the two-speed transmission to reduce the power consumption by up to 16% for some driving cycles. However, more work is still required to evaluate the operation and controls of the gearshifting mechanisms in a complete powertrain in terms of efficiency and driving performance. For now, the studies on gearshifting mechanisms have mostly focused on assessing jerking and shifting time, with very limited discussion of their contribution to the overall powertrain efficiency.

#### *4.2. Continuously Variable Transmission (CVT)*

The main motivation for utilizing a CVT in an EV, as with the multispeed discrete transmission, is to provide variable transmission ratios so that the motor can operate optimally under diverse driving conditions. In general, there are many types of CVT available for automotive applications, but in this review paper, the focus is on the CVT that uses pulleys and a metal belt, which is currently the most widely used type in automotive applications. Unlike multispeed discrete transmissions, CVTs with metal belts offer a continuous ratio range, which means more ratios are available to suit the driving conditions. In this sense, a CVT is more flexible than any multispeed discrete transmission, hence the motor has a much better chance of operating optimally for a longer portion of the drive. However, this type of CVT has certain limitations in terms of power loss in the metal pushing V-belt, or metal chain, used to transmit the torque between the primary pulley (connected to the motor) and the secondary pulley (connected to the vehicle's wheels). The loss arises because a portion of the motor power is consumed to produce the high clamping force required to clamp the belt for torque transmission between the pulleys. Research in [55] discussed the possibility of controlling the CVT ratio using fuzzy logic with the motor's efficiency map as the reference. The fuzzy control algorithm was tested in a simulation model developed based on three driving routes differentiated in terms of road gradients. While the controller helped enhance the motor's efficiency throughout the routes, more detailed studies are still required, particularly on the ratio and clamping force actuation system of the CVT, which was not explained in the paper. Subsequently, in 2017, ref. [56] suggested that a CVT in which the belt is clamped by an electromechanical actuation system with self-locking capability has the potential to increase the powertrain efficiency. 
They explained that, unlike a conventional CVT that uses engine power to generate hydraulic pressure to clamp its metal belt, such a CVT eliminates the power consumption required for clamping since the self-locking mechanism can hold the clamping force. Thus, more power can be transmitted to the wheels, and its ratio can also be selected more efficiently. For an EV, this is particularly beneficial since the motor can operate with high flexibility, resulting in improved powertrain efficiency and increased driving mileage. However, incorporating the self-locking mechanism requires extensive design modifications to the CVT's pulleys as well as integration of a DC motor to actuate the mechanism.

Next, refs. [57,58] reported an evaluation of four different DCTs and a CVT applied in an EV. The four DCTs differed in the number of gear ratios (from single to four-speed), and the gear ratios were sized based on the gradient climbing requirement (first gear), high-speed driving (top gear) and the progression factor for the intermediate gears. Thus, neither the gear sizes nor the gearshifting strategy were optimized for any particular driving cycle. The ratio range of the CVT, on the other hand, was defined as the continuous range between the first and the top gears of the DCTs. In addition, the CVT was assumed to use an electro-mechanical actuation system, instead of the conventional hydraulic system, to vary its ratio and to clamp the belt. Therefore, its efficiency was considered to be significantly higher than that of the existing CVTs used in ICE vehicles. Also, the manufacturing cost of the CVT was considered to be lower than that of the two-speed transmission in the research. All the transmissions were then simulated on a hybrid driving cycle established by combining FTP-75 and HWFET. The results showed that, for a B-segment car, the CVT was the best performer in terms of efficiency, reducing the power consumption by 31.9% against the single-speed transmission, followed by the three-speed DCT (19.1%), four-speed DCT (18.2%) and the two-speed DCT (16.4%). This result highlighted the magnitude of improvement that can be gained by eliminating the losses in the hydraulic actuation system conventionally used in a CVT with a metal belt. Moreover, it also revealed a saturation in the benefit of increasing the number of gears in a multispeed discrete transmission, which in this case can be observed in the reduced efficiency improvement from the three-speed DCT to the four-speed DCT. This means that the saturation point here is at three gears, and further increases in the number of gears will only cause significant actuation losses in the additional shifting mechanism added for the extra gear ratios. The rather low saturation point is typical for a small car (i.e., B-segment) because the range of power required is narrower than for a bigger car (i.e., E-segment). For an E-segment car, the results showed that the CVT was the best performer (23.6%), followed by the four-speed DCT (15.2%), three-speed DCT (9.0%) and two-speed DCT (9.6%). This result suggested that the saturation point for an E-segment EV could be higher than four gears for a multispeed discrete transmission, which is logical considering its wider range of required power compared with a B-segment car. Based on these results, the CVT seems more promising, provided that a reliable electro-mechanical actuation system can be successfully integrated into its pulley system. To achieve this, further research is still required, especially on the workability and durability of the electro-mechanical actuation system in the CVT, since such an actuation system is still relatively new and has not previously been implemented in any commercialized CVT with a metal belt.
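The saturation argument can be made concrete by differencing the reported improvements between successive gear counts. The short Python sketch below uses only the percentages quoted above for [57,58]; the helper names `marginal_gains` and `saturation_gear` are illustrative and do not come from the cited works.

```python
# Reported consumption reduction (%) versus the single-speed baseline,
# as quoted in the text for the DCTs evaluated in [57,58].
reported = {
    "B-segment": {2: 16.4, 3: 19.1, 4: 18.2},
    "E-segment": {2: 9.6, 3: 9.0, 4: 15.2},
}


def marginal_gains(by_gears):
    """Change in improvement (percentage points) for each added gear."""
    gears = sorted(by_gears)
    return {g: round(by_gears[g] - by_gears[p], 1) for p, g in zip(gears, gears[1:])}


def saturation_gear(by_gears):
    """Gear count with the largest reported improvement (illustrative helper)."""
    return max(by_gears, key=by_gears.get)


for segment, data in reported.items():
    print(segment, marginal_gains(data), "best gear count:", saturation_gear(data))
```

For the B-segment figures the gain turns negative when moving from three to four gears, which is the saturation described in the text, whereas for the E-segment figures the largest reported improvement is still at four gears.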

Other works discussing the application of CVTs in EV powertrains were presented in [59,60]. In [59], the potential of the CVT's continuous ratio range to improve EV power consumption was assessed against single-speed, two-speed AMT and two-speed DCT EV powertrains. In the assessment, all transmission types were assumed to have the same constant efficiency of 97%. The assessment was based on an analytical model of the motor's efficiency, and it showed that with a CVT, the powertrain efficiency can be improved by about 3% over the WLTP cycle against the discrete multispeed transmissions. However, a more detailed analysis, particularly of the CVT's efficiency, is required, because a CVT conventionally relies on high hydraulic pressure for clamping and ratio shifting. Thus, without optimization of the hydraulic actuation system, it is inappropriate to assume that the CVT has the same efficiency as a discrete multispeed transmission. Subsequently, in [60], the CVT was assumed to use an optimized electro-hydraulic actuation system (more compact and requiring less power to generate the belt's clamping force) and a novel single loopset belt (as opposed to the typical metal pushing V-belt, giving a more compact design and reduced power losses). In addition, the possibility of downsizing the motor was also studied, which was achieved by reducing the rotor's diameter and inertia. Nevertheless, the work did not take into account the optimization of the transmission ratios and the shifting strategy. When the powertrain was simulated under WLTC, it showed a 12.7% efficiency improvement against the EV powertrain with a single-speed transmission. That, however, was less than the two-speed AMT, which produced a 13.5% improvement. The lower improvement was very likely caused by the hydraulic actuation system. Even though the system was optimized, the required belt clamping force was still very high (depending on the EV motor's torque) and had to be provided continuously during operation. Hence, continuous power to generate the clamping force, albeit lower thanks to the optimization, was still needed. The AMT, by contrast, used a geartrain to transmit the power, and hence had no requirement for a belt clamping force. This means that the continuous power for the clamping force was eliminated entirely. This situation also affected the power flow in the powertrain, which can compromise the driving performance, as can be observed in the 0–100 km/h acceleration time, where the AMT and the single-speed transmission yielded 6.9 s, while the CVT achieved 7.4 s. Based on these results, the AMT appeared to be a better transmission for an EV than the CVT, although it must be noted that with the latter, it is possible to eliminate torque interruption during ratio shifting.

Research work described in [61] explained the optimization and discretization of the CVT ratios so that optimum power consumption can be realized with as little shifting as possible. The CVT featured an electro-hydraulic actuation system, in which an electric pump was used to precisely control the hydraulic pressure required for clamping and ratio shifting (Figure 9). The rationale for discretizing the ratios was that too frequent shifting would lead to uncomfortable driving due to jerking, as well as power losses in the hydraulic actuation system. The discretizing process started by first establishing the appropriate number of ratios based on the relation between the energy cost and the number of ratios. On this basis, the number of ratios was set at four, and the ratio sizes were determined through an equal ratio series method. Then, these ratios were optimized using GA for optimum efficiency over a combined driving cycle comprising UDDS, NYCC and HWFET. For the driving cycle simulation, three ratio shifting strategy models were employed: first, a continuously variable ratio shifting strategy, in which the best ratio was selected continuously during driving for maximum efficiency; second, a discrete ratio shifting strategy based on the ratios established through the equal ratio series method; and third, a discrete ratio shifting strategy based on the ratios optimized through GA. Comparison of the results confirmed that the third strategy performed the best, with the minimum total power consumption and average jerk, measured at 8.10 kWh and 4.32 m/s<sup>3</sup>, respectively. The first and second strategies, meanwhile, recorded 8.16 kWh and 5.35 m/s<sup>3</sup>, and 8.69 kWh and 4.65 m/s<sup>3</sup>, respectively. To summarize, the work reported in [61] highlighted two very important findings. Firstly, a CVT provides a continuous ratio range, hence the ratios can be discretized and optimized to suit diverse vehicle parameters, which means the same CVT can be implemented in several types of EV for optimum driving performance and efficiency. Secondly, a high number of ratios gives the motor better operating flexibility, but it also leads to complicated shifting logic that will cause too frequent shifting, resulting in power losses in the actuation and compromised driving comfort. This presents an opportunity to apply the same CVT with different sets of discretized ratios to suit the requirements of diverse EV segments, which can contribute to cost reduction in transmission production.
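As an illustration of the initial discretization step, the sketch below generates four ratios by a geometric progression, which is one plausible reading of the "equal ratio series method" (each ratio divided by the next gives the same constant). This interpretation, the function name `equal_ratio_series` and the ratio limits 2.4 and 0.4 are assumptions for illustration only; they are not taken from [61], and the subsequent GA refinement is not shown.

```python
def equal_ratio_series(first_ratio, last_ratio, count):
    """Return `count` ratios from first_ratio to last_ratio in geometric progression.

    Assumption: "equal ratio series" is read as a constant quotient between
    neighbouring ratios. The limits passed in are hypothetical.
    """
    if count < 2:
        raise ValueError("at least two ratios are needed to span a range")
    if first_ratio <= 0 or last_ratio <= 0:
        raise ValueError("transmission ratios must be positive")
    step = (last_ratio / first_ratio) ** (1.0 / (count - 1))
    return [first_ratio * step ** i for i in range(count)]


# Hypothetical CVT ratio limits, chosen only to make the example runnable.
ratios = equal_ratio_series(2.4, 0.4, 4)
print([round(r, 3) for r in ratios])
```

A set produced this way would only serve as the starting point; in the work described, the ratios were afterwards tuned by GA against the combined driving cycle.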

**Figure 9.** CVT with electro-hydraulic actuation system presented in [61].

One of the latest works describing the application of a CVT in an EV powertrain can be found in [62]. This work presented an optimization of the EV ownership cost by taking into account the component cost and the electricity cost for all the components involved in the powertrain: battery, motor (with inverter), and CVT. The optimization was carried out using convex programming design optimization, with the target of minimizing the cost by optimizing the size of the motor and battery without compromising the driving performance in terms of 0–100 km/h acceleration time (at most 11 s), top speed of 165 km/h and gradeability of 30%. In the work, three powertrain models were evaluated: first, the base powertrain model taken from the actual EV that used a single-speed transmission; second, the modified powertrain model, which was essentially the base model with a CVT instead of the single-speed transmission; and lastly, the optimized powertrain model, in which the CVT ratio as well as the size of the main powertrain components were optimized based on actual driving data obtained from road and dynamometer tests. Similar to [61], the CVT evaluated in [62] also used an electro-hydraulic actuation system for belt clamping and ratio changing. However, the design integrated the cooling systems of the CVT and the motor: the heat from the CVT fluid was dissipated to the motor's coolant through a heat exchanger, and the coolant was then cooled by the radiator (Figure 10). As a result, an extra radiator for the CVT was unnecessary, which led to a more compact and cost-effective thermal management system. By simulating all the powertrain models under WLTC, the results showed that the optimized powertrain model performed the best in terms of efficiency (11.19 kWh/100 km, a 2.1% improvement against the base model's 11.43 kWh/100 km) and cost (2% cost reduction against the base model). In terms of cooling power consumption, the optimized model also gained an improvement of 30% compared with the base model, which means that the integrated thermal management system was not only cost-effective but also very efficient in controlling the operating temperature of the motor and the CVT. Besides, the integrated system also presents an opportunity for further integration with the battery's thermal management system, which could lead to further improvements in cost and efficiency. Nevertheless, power losses in the hydraulic actuation system can still be expected, and it would be interesting to evaluate how the optimized CVT performs against a two-speed transmission with optimized gear ratios.
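The quoted efficiency improvement follows directly from the two consumption figures. A minimal Python check, using only the values reported above for [62] (the function name `percent_reduction` is illustrative):

```python
def percent_reduction(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


base_kwh_per_100km = 11.43       # base model, as quoted in the text
optimized_kwh_per_100km = 11.19  # optimized model, as quoted in the text

improvement = percent_reduction(base_kwh_per_100km, optimized_kwh_per_100km)
print(f"{improvement:.1f}%")  # prints "2.1%"
```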

**Figure 10.** Integrated thermal management system for motor and CVT proposed in [62].

Another paper that discussed the potential cost benefit of using a CVT in an EV can be found in [63]. Here, the application of the CVT was complemented by a hybrid battery technology that incorporated a supercapacitor, and the cost benefits considered not only the component and electricity costs, but also the battery replacement cost. The CVT used an electro-mechanical actuation system for clamping and ratio changing, which means it featured a self-lock mechanism to maintain the belt's clamping force without using hydraulic pressure. The actuation system comprised two DC motors, presumably with a power screw mechanism, one each for actuating the primary pulley and the secondary pulley. So, theoretically, it was more efficient than the CVTs described in [61,62] thanks to the self-lock capability. However, a detailed description of the CVT's actual electro-mechanical actuation system was not provided in the paper. A CVT loss model was developed to estimate its efficiency, and based on the model, the efficiency was estimated to lie between 78% and 89%. When simulated under a combined driving cycle of HWFET and FTP-75, the use of the electro-mechanical CVT reduced the motor's losses by 37.9%, which translated into an 8.3% improvement in the power consumption of the vehicle compared with the single-speed transmission. In terms of battery degradation, using the CVT reduced the degradation by 7.2% compared with the single-speed transmission, and, with the proposed hybrid battery technology, the improvement can be increased further to 17.5%. The battery degradation was defined in terms of capacity loss percentage, estimated using a dynamic model of the LiFePO<sub>4</sub> cell that took into account the charging rate and the battery temperature during driving [64,65]. The latest review paper providing further descriptions of the estimation techniques for battery state of health in an EV can be found in [66]. Finally, in terms of cost benefits over 11 years of operation, when the application of the CVT was compared against the single-speed transmission (both using the hybrid battery technology), a saving of around USD 4541 in battery cost can be expected for consumers, resulting from the reduction of the required battery capacity due to the improvement in the powertrain efficiency. In addition, a further saving of USD 1768 can be gained from the reduction in the electricity cost. Overall, after reflecting the battery replacement cost as well as the penalty cost for using the CVT, the total cost benefit was estimated at USD 3178 over 11 years of operation.
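The relation between the three cost figures can be sketched as simple arithmetic. The snippet below assumes that the two reported savings are additive and that the difference from the net benefit corresponds to the combined battery replacement and CVT penalty costs; this breakdown is an inference from the quoted numbers and is not stated in [63].

```python
# Figures quoted in the text for 11 years of operation (USD).
battery_saving_usd = 4541
electricity_saving_usd = 1768
net_benefit_usd = 3178

# Assumption: the two savings simply add up before deductions.
gross_saving_usd = battery_saving_usd + electricity_saving_usd

# Inferred (not reported): replacement cost plus CVT penalty cost combined.
implied_deductions_usd = gross_saving_usd - net_benefit_usd

print(gross_saving_usd, implied_deductions_usd)
```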

More advanced research on CVT application in an EV powertrain was explained in [67], which involved the optimization of an eco-driving strategy. The optimization objective was to minimize the reduction of the battery's SOC during driving by taking into account not only the motor's efficiency, but also the instantaneous SOC and the efficiency of the CVT. Here, the CVT efficiency model was developed using the mathematical equations introduced in [68–70]. NEDC was used as the driving cycle, and it was divided into three driving conditions, namely constant speed, acceleration and deceleration. Under these conditions, the powertrain efficiency was analyzed for different SOC levels, CVT ratios and ambient temperatures. One interesting aspect of this work is that it analyzed SOC down to the range below 10%. Such SOC levels are rarely discussed in the literature because, in actual applications, the battery cut-off set point is usually around 20% to avoid any risk of damage. The analysis showed that low SOC (at about 10%) decreased the powertrain efficiency by 33.12% and lengthened the acceleration time by 68.8%. Such inefficiency was caused by the degradation of the battery, which becomes significant from 10% SOC and below. Because of the degradation, the battery's internal resistance increases and its open-circuit voltage decreases, which increases the percentage losses in the current flow from the battery to the EV motor. Moreover, at 10% SOC, the battery also became more sensitive to ambient temperature, which caused the losses in the current flow to increase even further. Above 10% SOC, however, the ambient temperature had no significant influence on these parameters, and the battery degradation was negligible, hence the powertrain efficiency became more stable. By incorporating this knowledge in the eco-driving strategy, the constant driving speed condition in the NEDC can be increased from 61% to 70%, and the total driving time can be reduced by 12.1%, resulting in more economical driving. This study highlighted the importance of an eco-driving method, which can only be implemented if the EV powertrain has the required flexibility to provide diverse driving modes. In addition, the study also served as a starting point for integrating the battery's health condition into powertrain analysis.

To summarize, CVTs can provide better flexibility in ratio selection than any multispeed discrete transmission thanks to their continuous ratio range. This flexibility allows the EV's motor to operate optimally under various driving conditions. However, for the actual operation of a CVT in an EV, an appropriate shifting strategy must be formulated, either continuous shifting or discretized shifting. The first strategy leads to better motor efficiency, but requires higher actuation power and advanced shifting logic. The second strategy, on the other hand, compromises the motor's efficiency slightly compared with the first, but its shifting logic can be made simpler and the actuation power consumption can be reduced. Another area that has to be studied is the actuation system for ratio shifting and belt clamping in the CVT. Here, three possibilities can be explored: optimizing the hydraulic actuation system typically used in existing CVTs, implementing an electro-mechanical actuation system to replace the hydraulic actuation system, or developing a geartrain-based CVT design. By optimizing the hydraulic actuation system, the power required to generate continuous pressure for the CVT ratio and belt clamping force can be reduced. However, since the required belt clamping force is still very high (at least around 10 kN, and it increases with the motor's torque), the amount of required power will always be significant. In a multispeed discrete transmission, by contrast, this issue does not arise thanks to the use of a geartrain. Implementing an electro-mechanical actuation system, on the other hand, eliminates the power requirement for the continuous pressure thanks to its self-lock mechanism, but designing and integrating such a system in a CVT requires extensive study to confirm not only its workability but also its reliability. A summary of the literature review on the CVT application for EV powertrains can be found in Table 4.
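The trade-off between continuous and discretized shifting can be illustrated with a toy model. The Python sketch below is not taken from any of the reviewed works: the quadratic efficiency curve, the peak speed of 4000 rpm, the ratio limits and the discrete ratio set are all hypothetical values chosen only to show why a continuous ratio range can keep the motor closer to its best operating point than a fixed set of ratios.

```python
def motor_efficiency(speed_rpm, peak_rpm=4000.0):
    """Toy efficiency curve (assumption): 0.95 at peak_rpm, falling off quadratically."""
    deviation = (speed_rpm - peak_rpm) / peak_rpm
    return max(0.0, 0.95 - 0.3 * deviation ** 2)


def best_continuous_ratio(wheel_rpm, r_min, r_max, peak_rpm=4000.0):
    """Continuous strategy: pick the ratio that puts the motor at peak_rpm, within limits."""
    ideal = peak_rpm / wheel_rpm
    return min(max(ideal, r_min), r_max)


def best_discrete_ratio(wheel_rpm, ratios):
    """Discretized strategy: pick the most efficient ratio from a fixed set."""
    return max(ratios, key=lambda r: motor_efficiency(r * wheel_rpm))


wheel_rpm = 1000.0                       # hypothetical wheel-side speed
discrete_set = [2.5, 3.5, 5.0, 7.0]      # hypothetical discretized ratios

r_cont = best_continuous_ratio(wheel_rpm, r_min=2.5, r_max=7.0)
r_disc = best_discrete_ratio(wheel_rpm, discrete_set)
print(r_cont, motor_efficiency(r_cont * wheel_rpm))
print(r_disc, motor_efficiency(r_disc * wheel_rpm))
```

In this toy case the continuous strategy reaches the peak of the assumed curve while the discrete set lands slightly below it. A realistic comparison would also have to charge each strategy for its actuation power and shifting frequency, which this sketch leaves out.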

**Table 4.** Summary of the literature review on CVT application in an EV powertrain.




#### *4.3. Multi-Motor Configuration*

The idea of optimizing EV powertrains using a multi-motor configuration involves properly distributing the driving loads, typically between two or four motors, so that they can operate optimally under various driving conditions. Additionally, in some situations, these motors can also provide boosting power for faster acceleration and higher top speed. As a result, the capacity of the motors can be reduced without compromising the driving performance, which leads to potentially significant cost savings. Three types of multi-motor configuration are commonly studied for EVs: first, the two-motor configuration in which both motors are connected to a transmission before the wheels; second, the two-motor configuration in which both motors are directly coupled to the wheels; and finally, the four-motor configuration in which all motors are directly coupled to the wheels (Figure 11).

**Figure 11.** (**a**) Two-motor configuration where both motors are connected to a transmission before the wheels, (**b**) two-motor configuration where both motors are connected directly to the wheels, and (**c**) four-motor configuration where all motors are directly connected to the wheels.

In the two-motor configuration in which both motors are connected to a transmission before the wheels, allowing torque and speed coupling between the motors is crucial to suit diverse driving conditions. As shown in Figure 12, torque coupling means that the torque from the ICE and the motor is combined through direct gearwheel meshing, so that the torque requirement at the wheels is shared between them. The output torque and speed of the coupling are given by Equations (1) and (2). Speed coupling, on the other hand, means that the speed of the ICE is combined with the motor's speed through a planetary gearset, resulting in higher speed, and hence higher power, at the wheels (Figure 13). Equations (3) and (4) describe the relationship between the motors' inputs and the coupling's outputs. Torque coupling is generally useful for start-stop conditions, while speed coupling is usually applied to achieve fast acceleration and high-speed driving. Recent examples of the optimization of HEV powertrain configurations can be found in [71–73]. In Peng et al. [71], various HEV powertrain configurations based on a CVT with a metal belt and discrete-gear automatic transmissions as the torque coupling, and a planetary gearset as the speed coupling, were generated using a fundamental matrix. In the work, the feasible driving modes of these configurations were determined using an adjacency matrix, and based on these modes, the powertrain configurations were evaluated and compared against the benchmark configuration (Figure 14a) in terms of 0–100 km/h acceleration time and average power consumption under WLTC. The results demonstrated that the best configuration, depicted in Figure 14b, reduced the acceleration time and the average power consumption by 8.7% and 12.2%, respectively. Such improvements were possible because of the flexible driving modes provided by the proposed configuration, which reduced the ICE power required in some driving conditions (due to the planetary gearset at the motor's output) and gave more efficient regenerative braking (due to the several torque coupling possibilities at clutches C3 and C2). The results in [71] are also consistent with those discussed in [73], where it was found that the combination of a CVT and a planetary gearset is crucial to optimize the HEV's efficiency for various vehicle speeds.

$$T_{out} = \frac{R_{out}}{R_{motor1}} T_{motor1} + \frac{R_{out}}{R_{motor2}} T_{motor2} \tag{1}$$

$$
\omega_{out} = \frac{R_{motor1}}{R_{out}} \omega_{motor1} = \frac{R_{motor2}}{R_{out}} \omega_{motor2} \tag{2}
$$

$$T_{out} = \frac{2R_{carrier}}{R_{sun}} T_{motor1} = \frac{2R_{carrier}}{R_{ring}} T_{motor2} \tag{3}$$

$$
\omega_{out} = \frac{R_{sun}}{2R_{carrier}} \omega_{motor1} + \frac{R_{ring}}{2R_{carrier}} \omega_{motor2} \tag{4}
$$
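Equations (1) to (4) can be evaluated numerically to see how the two coupling types differ. The Python sketch below is a direct transcription of the four equations for ideal, lossless gearing; the function names and the numerical inputs are illustrative assumptions, and radii are in arbitrary consistent units.

```python
def torque_coupling(t_motor1, t_motor2, w_motor1, r_motor1, r_motor2, r_out):
    """Equations (1) and (2): gearwheel torque coupling (lossless).

    Torques add at the output; the output speed is tied to motor 1, and
    motor 2 must then turn at w_motor1 * r_motor1 / r_motor2.
    """
    t_out = (r_out / r_motor1) * t_motor1 + (r_out / r_motor2) * t_motor2
    w_out = (r_motor1 / r_out) * w_motor1
    return t_out, w_out


def speed_coupling(t_motor1, w_motor1, w_motor2, r_sun, r_ring, r_carrier):
    """Equations (3) and (4): planetary-gearset speed coupling (lossless).

    Speeds add at the carrier; the torque of motor 2 is constrained by
    Equation (3) to t_motor1 * r_ring / r_sun.
    """
    t_out = (2 * r_carrier / r_sun) * t_motor1
    w_out = (r_sun / (2 * r_carrier)) * w_motor1 + (r_ring / (2 * r_carrier)) * w_motor2
    return t_out, w_out


# Illustrative numbers only (torque in N·m, speed in rad/s).
print(torque_coupling(100.0, 50.0, 300.0, r_motor1=1.0, r_motor2=1.0, r_out=2.0))
print(speed_coupling(10.0, 400.0, 200.0, r_sun=1.0, r_ring=3.0, r_carrier=2.0))
```

With these inputs the torque coupling returns (300.0, 150.0) and the speed coupling returns (40.0, 250.0). In both cases the output power equals the sum of the input powers, as expected for gearing modelled without losses.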

**Figure 12.** Torque coupling realized through direct gearwheel meshing.

**Figure 13.** Speed coupling realized through planetary gearset.

**Figure 14.** (**a**) The benchmark and (**b**) the proposed HEV powertrain configuration described in [71].

Outcomes in [71–73] highlight the importance of a proper strategy for torque and speed coupling between two or more motors, which can lead to flexible driving modes for an EV. To evaluate its implementation in EVs, refs. [74–77] studied the effects of using multi-motor configurations in the EV powertrain on the power consumption over several driving cycles. In [74], two identical motors were used in a two-motor EV powertrain configuration, and the powertrain was evaluated with three different torque distribution strategies between the motors, the final one being optimized using adaptive non-linear PSO. In [75], by contrast, two motors with different maximum torque were used, connected through a planetary gearset. Here, only speed coupling was possible, and the motors were controlled using a combination of a speed feedback control strategy and a torque feedforward control strategy to minimize jerking during driving mode shifts. Other papers describing the implementation of a dual-motor configuration with a planetary gearset can be found in [76,77], and because they allowed only speed coupling, the flexibility in terms of driving modes was limited.

Meanwhile, in [78], three configurations of a two-motor EV powertrain without a multispeed transmission were considered: a configuration for torque coupling (Figure 15a), a configuration for speed coupling (Figure 15b), and a configuration for both torque and speed coupling (Figure 15c). The last configuration therefore offered a significant improvement in driving mode flexibility. In the first configuration, a single planetary gearset was used, in which the first motor was directly connected to the gearset's sun gear, while the second motor was connected to the same sun gear through a clutch. For the second configuration, two planetary gearsets were used with a brake at their ring gears. The first motor was meshed with the sun gear of the first planetary gearset, whose planet carrier was rotatable. The second motor, by contrast, was connected to the sun gear of the second planetary gearset, whose planet carrier was fixed to the casing. The engagement of the brake on the ring gears of both gearsets was used to control the speed coupling in the configuration. Finally, the third configuration was essentially a heavily redesigned second configuration, with a clutch between the sun gears and with the rotation of the second planet carrier controllable through another brake. As a result, the third configuration was more flexible, thanks to its capability to provide both torque coupling (through the clutch between the sun gears) and speed coupling (through the brake at the ring gears). The benchmark in the study was a single-motor EV powertrain with single-speed transmission. The gear ratios of all these powertrain configurations were optimized using the non-dominated sorting GA (NSGA-II) for optimum efficiency under UDDS, HWFET and NEDC. Compared to the single-motor EV powertrain with single-speed transmission, on average, the single-motor powertrain with two-speed transmission was about 2% more efficient, while the first, second and third two-motor EV powertrain configurations were 5.77%, 5.57% and 6.40% more efficient, respectively. It is interesting to note that the efficiencies of the first two-motor configuration (capable of torque coupling only) and the second configuration (capable of speed coupling only) were nearly the same, although mechanically the latter was significantly more complex than the former because it used two planetary gearsets. Next, for the third configuration, with both speed coupling and torque coupling options, the design was much more complex since it required three actuators for the two brakes and the clutch. With a difference of only 0.63% in efficiency gain between the first and the third configurations, it was logical to choose the first configuration for an actual implementation. Therefore, in future work following [78], more studies can be carried out to compare the driving and gearshifting performance of these configurations so that more aspects can be evaluated.

**Figure 15.** (**a**) First, (**b**) second and (**c**) third two motors EV powertrain configurations proposed in [78].

Next, in a recent work described in [79], a two-motor EV powertrain configuration based on a Simpson planetary gearset was proposed. Unlike the configurations described in [78], this proposal provided two gear ratios in the powertrain, resulting in more flexibility for the driving modes. The Simpson planetary gearset consisted of two planetary gearsets with a brake for each set's ring gear, and another brake at the first motor's shaft. The ring gear of the first planetary gearset was connected to the second motor, and the ring gear of the second planetary gearset was connected to the first gearset's planet carrier. The planet carrier of the second gearset, on the other hand, was connected to the wheels through a differential, while the sun gears of both planetary gearsets were connected to the first motor. Figure 16 depicts the diagram of the Simpson planetary gearset configuration proposed in [79], which was capable of providing six driving modes (two modes with two motors, and four modes with one motor). The two modes with two motors represented the possibility of torque coupling and speed coupling of the motors, and the four single-motor modes represented the power flow from the first and second motor through the two gear ratios. The powertrain configuration's motor power and gear ratios were optimized using GA for maximum average efficiency in six driving cycles (LA92, JP1015, NEDC, WLTP and HWFET), high gradeability (40% at 10 km/h), high top speed (at least 190 km/h) and fast 0–100 km/h acceleration time (around 10 s). The proposal was then compared with the typical parallel axle dual-motor configuration with fixed gear ratio for evaluation. In terms of acceleration, the proposal provided a faster 0–50 km/h acceleration time, but no significant difference was observed for 0–100 km/h. For the average efficiency, the proposed Simpson planetary gearset configuration was more efficient than the typical parallel axle dual-motor configuration by around 2.88% to 8.33% in driving cycles with frequent acceleration and deceleration. However, in a driving cycle with relatively constant vehicle speed (like HWFET), the typical configuration was slightly more efficient, by 0.45%, very likely due to the losses in the planetary gearsets. Moreover, the proposed configuration also required three actuators for the three brakes to properly control the coupling between the two motors. In this respect, an advanced control algorithm for the actuators is crucial to ensure that they can be operated systematically, not only for powertrain efficiency but also for driving comfort, by minimizing the jerking during driving mode shifts. Thus, in future studies, it is imperative to evaluate the workability and control of these actuators so that their performance in terms of powertrain efficiency, jerking and shifting time can be quantified and compared.

**Figure 16.** (**a**) Proposed two motors powertrain configuration with Simpson planetary gearset and (**b**) the typical parallel axle two motors configuration [79].

The subsequent work on multi-motor configuration is described in [80], which like [79], studied the possibility of applying an EV powertrain with a two motor configuration and a multispeed discrete transmission, in this case four speed AMT. The powertrain was applied in a city bus for a specific Nuremberg City Cycle and also for NYCC, where the focus was to optimize the driving strategy by properly coordinating the gearshifting. This was to avoid gear hunting, which not only affected the powertrain efficiency, but was also detrimental to the driving comfort. Next, in [81], the objective was to determine and compare the performance of the configuration against single motor configuration with four-speed AMT. The study started by generating two optimized configurations of two-motors-four-speed-AMT EV powertrain, where the optimization objectives were to obtain minimum operating cost (defined as minimum total power consumption for the aforementioned driving cycle) and high driving performance (defined as the minimum acceleration time for 0–40 km/h). The control variables of the optimization were motor scale factor; expressed as the ratio of motor 1's power divided by the power of motor 2, and the four gear ratios of the AMT. The optimum combinations of the variables to meet the optimization objectives were determined using NSGA-II. As a result, three powertrain configurations were finalized; Configuration 1 that consisted of one motor configuration and two optimized gear ratios, Configuration 2 that consisted of two motors configuration with the motor scale factor of 0.42 and four optimized gear ratios, and Configuration 3 that consisted of two motors configuration with the motor scale factor of 1.00 and four optimized gear ratios. 
All of them achieved the same acceleration time of 8.5 s, but in terms of power consumption, Configuration 1 recorded the worst at 7.48 kWh, while the second and third configurations managed to improve over the first one by 4.82% and 5.08%, respectively. Next, the optimum driving strategy was formulated for each configuration in terms of shifting schedule and motors coupling modes so that the driving efficiency and performance can be improved further. The simulation results of the three powertrain configurations plus the optimal driving strategy showed that, both Configurations 2 and 3 allowed the motors to operate with at least 85% efficiency rate for 65% of the driving cycle. As a comparison, in Configuration 1, the motor was allowed to operate at the same efficiency rate only for about 32% of the same driving cycle. Because of that, the total power consumptions obtained for Configurations 2 and 3 were lower than Configuration 1, at 7.219 kWh and 7.216 kWh, against 7.627 kWh, respectively. Nevertheless, it must be mentioned that in Configuration 1, the gear shifting occurred only 46 times during the cycle, which was significantly lower than 84 and 80 each for Configurations 2 and 3. These findings indicated that the one motor EV powertrain configuration with two-speed transmission is potentially advantageous in terms of overall cost (production, operation and maintenance costs) than those two motors configurations, even though it performed the worst in terms of efficiency. Not only that, the acceleration time was also the same for all configurations, and this reinforce Configuration 1 as the overall best choice as opposed to the other two. Therefore, further study on the operation cost can be carried out in the future by relating the data of the gearshifting frequency and the wear and tear of the shifting mechanism. 
Next, the shifting performance, especially in terms of jerking, must also be evaluated for all the configurations so that more aspects can be compared to determine the overall best configuration among the three. Other works describing the application of multi-motor configurations with multispeed transmission can be found in [82], which reviewed the methodologies of the multi-motor configurations, and [83], which presented an optimization of gear ratios and torque distribution of a two-motor EV powertrain configuration with two-speed transmission using a surrogate model developed based on an effective adaptive sampling method.
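
The trade-off reported for [81] can be made concrete with a small calculation. The sketch below uses only the figures quoted above (energy per cycle and gearshift count for each configuration after the driving strategy was applied). Treating the gearshift count as a second objective to be minimized is an assumption made here for illustration; it is not the objective set used in [81], and the helper names `relative_saving`, `dominates` and `non_dominated` are hypothetical.

```python
# Figures quoted in the text for the three configurations in [81]:
# (energy per cycle in kWh, number of gearshifts per cycle).
CONFIGS = {
    "Configuration 1": (7.627, 46),
    "Configuration 2": (7.219, 84),
    "Configuration 3": (7.216, 80),
}


def relative_saving(baseline_kwh, candidate_kwh):
    """Percentage energy saving of a candidate relative to a baseline."""
    return (baseline_kwh - candidate_kwh) / baseline_kwh * 100.0


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Both objectives are minimized (assumption: fewer shifts is better).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(configs):
    """Names of the configurations that no other configuration dominates."""
    return [
        name
        for name, objectives in configs.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in configs.items()
            if other_name != name
        )
    ]


if __name__ == "__main__":
    baseline = CONFIGS["Configuration 1"][0]
    for name in ("Configuration 2", "Configuration 3"):
        saving = relative_saving(baseline, CONFIGS[name][0])
        print(f"{name}: {saving:.2f} % less energy than Configuration 1")
    print("Non-dominated:", non_dominated(CONFIGS))
```

Under these two assumed objectives, Configuration 3 dominates Configuration 2 (less energy and fewer shifts), while Configurations 1 and 3 remain as genuine trade-offs between energy use and shifting frequency.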

Apart from two-motor powertrain configurations, other studies evaluated the application of four motors in the EV powertrain. For instance, in [84,85], four motors were used, one assembled in each of the EV's four wheels (Figure 17). The main idea here was to split the weight of the powertrain evenly among all wheels, thus increasing driving flexibility without requiring a bigger motor and transmission. Each motor was also coupled to a two-speed AMT designed based on a planetary gearset, where the appropriate gear ratio was actuated using a DC motor and worm gear mechanism. In addition, the gear ratio actuator featured a ball-ramp self-energizing mechanism that consisted of a translation plate with steel balls and spiral ramps. The purpose of this mechanism was to amplify the clutch engagement force during gearshifting, reducing the power required from the DC motor. To simulate the powertrain performance, the AMT's gear ratios as well as its complete parameters were determined based on a transmission-equipped wheel hub motor described in [86], while the gearshifting schedule was developed based on the vehicle speed and the throttle position. To simplify the simulation model, a dog clutch model was used as the surrogate model for the proposed gear ratio actuator. Two shifting approaches were applied, synchronous and asynchronous: in the synchronous approach, gearshifting occurred simultaneously for the front and the rear wheels, while in the asynchronous approach, gearshifting was done independently between the front and the rear wheels with a delay of 0.2 s. The key benefit lay in the asynchronous approach, which minimized jerking because the delay reduced the torque interruption during the front wheels' gearshifting by compensating for it with the torque at the unshifted rear wheels (and vice versa). As a result, the jerking can be kept within the range of 4 m/s<sup>3</sup> to 6 m/s<sup>3</sup>. However, the proposed powertrain configuration involved four independent two-speed AMT actuators, which means that, although it was highly flexible in terms of gearshifting, it required sophisticated control logic to avoid overly frequent shifting and gear hunting. If not properly controlled, these will cause driving discomfort and increased losses in the actuators.
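
The benefit of staggering the front and rear gearshifts can be illustrated with a deliberately idealized timeline. The sketch below is not the simulation model of [84,85]: it assumes that an axle delivers no torque at all while it is shifting, that each shift lasts 200 ms, and that each axle otherwise delivers a nominal 100 N·m. All of these values and the function names are hypothetical; only the 0.2 s delay between axles comes from the text.

```python
def axle_torque(t_ms, shift_start_ms, shift_duration_ms, nominal_nm):
    """Torque delivered by one axle; zero while its gearshift is in progress."""
    shifting = shift_start_ms <= t_ms < shift_start_ms + shift_duration_ms
    return 0.0 if shifting else nominal_nm


def min_total_torque(front_start_ms, rear_start_ms,
                     shift_duration_ms=200, nominal_nm=100.0,
                     t_end_ms=1000, step_ms=10):
    """Lowest combined front + rear torque over the simulated window.

    Times are integer milliseconds to avoid floating-point edge effects.
    """
    return min(
        axle_torque(t, front_start_ms, shift_duration_ms, nominal_nm)
        + axle_torque(t, rear_start_ms, shift_duration_ms, nominal_nm)
        for t in range(0, t_end_ms + 1, step_ms)
    )


if __name__ == "__main__":
    # Synchronous: both axles start shifting at the same instant.
    sync = min_total_torque(front_start_ms=300, rear_start_ms=300)
    # Asynchronous: the rear axle starts 200 ms (0.2 s) after the front axle.
    asyn = min_total_torque(front_start_ms=300, rear_start_ms=500)
    print("synchronous minimum total torque :", sync)
    print("asynchronous minimum total torque:", asyn)
```

In this toy model the synchronous case lets the total traction torque fall to zero, whereas the staggered case never drops below the torque of one axle, which is the qualitative mechanism behind the reduced jerk described above.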

**Figure 17.** Four motors EV powertrain configuration with two-speed AMT [84,85].

The latest studies on the four-motor EV powertrain configuration focused not only on achieving optimum efficiency, but also on enhancing driving safety and steering assist [87,88]. Unlike the powertrain configuration proposed in [83–85], the four-motor EV powertrain introduced in [87,88] lacked gearshifting options since it used a single-speed transmission. However, the omission of gearshifting options increases its simplicity in terms of operation and control. In [87], driving safety, together with optimum efficiency in terms of power consumption, was realized by implementing an integrated torque vectoring control strategy for the motors in the wheels. The integrated strategy was intended to achieve multiple objectives: reasonable traction torque distribution on the wheels for yaw stability control and steering assist, proper motor output torque so that each motor operates in its most efficient range for optimum power consumption, and reduced dynamic wheel slip for driving stability as well as optimum power consumption. The proposed integrated strategy was then compared against the conventional axis distribution and maximization of stability margin strategies through simulation under WLTC, where the results showed that the integrated strategy reduced the wheel slip by 14.38% compared to the axis distribution strategy, and this reduction led to a 5.37% improvement in power consumption. The simulation results were then validated experimentally using a single-seat EV prototype that was driven at 60 km/h on a slippery road and then executed a standard lane change maneuver. In the experiment, the wheel slip was reduced by 12.75% with the integrated torque vectoring control strategy as opposed to the axis distribution strategy. Meanwhile, in [88], a fuzzy logic algorithm was applied to the traction distribution on the wheels. The algorithm was responsible for ensuring that sufficient traction was given to each wheel to achieve the desired vehicle speed, while also ensuring that the vehicle trajectory followed the desired driving path. Components of the powertrain used here were optimized using PSO to minimize the weight of the battery and motors, minimize the drop in the battery's SOC, and reduce driver steering effort. With the optimized EV powertrain and the fuzzy logic algorithm for the wheels' traction distribution, driver steering effort can be reduced by 78.5% and the driving mileage can be increased even with the reduced size of the battery. The results obtained in [87,88] demonstrated the potential of implementing independent motors for all wheels not only for efficiency but also for vehicle handling, which can possibly contribute to vehicle safety and autonomous vehicle technologies. 
For the future scope, the work in [87,88] can be expanded to evaluate the proposed configuration's performance in controlling the vehicle maneuvering against conventional traction control methods.
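
To show what distributing traction torque for a yaw moment means in its simplest form, the sketch below splits a total drive torque evenly between the front and rear wheels and then shifts torque from the left side to the right side. This is a textbook-style even split, not the integrated strategy of [87] or the fuzzy logic algorithm of [88]. It assumes no wheel slip, no motor torque limits, a longitudinal force of F = T / r at each wheel, and a yaw moment of M = (F_right − F_left) · track / 2; the function names and the numerical values are hypothetical.

```python
def wheel_torques(total_torque_nm, yaw_moment_nm, wheel_radius_m, track_width_m):
    """Split a total drive torque over four wheels to produce a yaw moment.

    The yaw moment is taken as positive when the right-side wheels push
    harder than the left-side wheels.
    """
    base = total_torque_nm / 4.0
    # From M = (4 * delta / r) * (track / 2), so delta = M * r / (2 * track).
    delta = yaw_moment_nm * wheel_radius_m / (2.0 * track_width_m)
    return {
        "front_left": base - delta,
        "front_right": base + delta,
        "rear_left": base - delta,
        "rear_right": base + delta,
    }


def yaw_moment(torques, wheel_radius_m, track_width_m):
    """Yaw moment produced by a set of wheel torques (same assumptions)."""
    right = (torques["front_right"] + torques["rear_right"]) / wheel_radius_m
    left = (torques["front_left"] + torques["rear_left"]) / wheel_radius_m
    return (right - left) * track_width_m / 2.0


if __name__ == "__main__":
    # Hypothetical vehicle: 0.3 m wheel radius, 1.5 m track width.
    torques = wheel_torques(400.0, 300.0, wheel_radius_m=0.3, track_width_m=1.5)
    for wheel, torque in torques.items():
        print(f"{wheel:12s} {torque:7.2f} N·m")
    print("check yaw moment:", round(yaw_moment(torques, 0.3, 1.5), 6), "N·m")
```

A practical controller would add the constraints that this sketch omits, such as per-motor torque limits, tyre friction limits and motor efficiency maps, which is where the strategies reviewed above differ from one another.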

A summary of the reviewed literature on multi-motor configurations for EV powertrains is presented in Table 5. Two multi-motor configurations are commonly studied in the literature: the two-motor configuration and the four-motor configuration. The objective of the two-motor configuration is to allow the operation of one motor to be supported by the other motor through either torque coupling or speed coupling. As a result, the operation of these motors can be optimized for diverse driving conditions. The challenge, however, is to come up with a proper mechanism for the couplings, which typically involves multiple clutches and brakes. The next challenge is to control these clutches and brakes effectively and systematically through actuators so that the driving mode shifting can be executed smoothly and efficiently. Regarding the four-motor configuration, the objective is to minimize the transmission power loss, since a motor is coupled directly to each wheel. Besides, the traction on the wheels can also be distributed independently, which can improve not only the power consumption, but also the driving stability and safety. Nevertheless, to achieve these, an advanced control algorithm is required to effectively integrate the operation of the four motors at the wheels.


**Table 5.** Summary of the literature review on multi-motor configurations for EV powertrain.



#### **5. Comparison and Future Works Related to the Methods for Optimizing Power Flow in EV Powertrains**

In the previous section, the methods to optimize power flow in an EV powertrain were reviewed and divided into three: applying multispeed discrete transmissions, applying CVTs with metal belts, and implementing multi-motor configurations. Although these methods can lead to improvements in driving efficiency and performance, their advantages and disadvantages relative to each other must also be properly assessed. Therefore, in this section, the three methods are compared to discuss their potential advantages and disadvantages. Afterwards, key areas for future research works in the context of optimizing an EV's power flow are presented and discussed.

#### *5.1. Comparison of the Methods*

Among the three methods, multispeed discrete transmission is the most common one studied by scholars. Within this class, the two-speed transmission is the most popular, and it can take the form of either an AMT or a DCT. Most of the works related to the application of two-speed transmission reviewed here involved optimization of the gear ratios and shifting strategy to achieve optimum power consumption and driving performance, in terms of acceleration rate and top speed [33–40]. As a result, the EV powertrain becomes more efficient and more capable, and this opens up the possibility of optimizing the size and capacity of the motor and the battery. Such a possibility is beneficial for production sustainability, because the usage of heavy materials for motors and batteries can be reduced. At the same time, since the two-speed transmission is very compact and shares a significant degree of similarity with the traditional one in ICE vehicles, the existing transmission manufacturing process can also be utilized, which will be cost-effective for total EV production cost. The significant challenges, however, are the jerking in the gearshifting mechanism, the limited flexibility and the additional maintenance cost for the two-speed transmission.

Similar to multispeed discrete transmission, CVTs with metal belts also involve providing multiple ratios for optimum efficiency and driving performance in an EV powertrain. However, unlike a multispeed discrete transmission, CVTs are capable of providing a continuous ratio range, which addresses the limited flexibility problem faced by the multispeed discrete transmission approach. This presents an opportunity to implement it for diverse driving conditions and various vehicle segments. At the same time, it is also beneficial in terms of production sustainability and technology migration, because of the possibility to reuse the existing manufacturing processes and facilities. This is because the metal belt-based CVT for EV shares a significant number of common components with the existing ones used in ICE vehicles. The main challenges for this type of CVT, however, are its metal belt's operation and maintenance cost. The metal belt's operation requires high hydraulic pressure, significantly higher than the requirement for AMT and DCT, for maintaining its clamping force and ratio. In addition, the belt's operation also inevitably involves micro slippage between its components. These two factors cause transmission loss which is higher than that suffered in AMT and DCT. Also, these factors require slightly costlier maintenance than the other two types of discrete transmission commonly studied for EV application.
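
The flexibility argument can be illustrated numerically: for a given vehicle speed, a discrete transmission can only pick the fixed ratio that brings the motor closest to its preferred speed, whereas a CVT can select the exact ratio as long as it lies within its range. The sketch below is a minimal illustration under stated assumptions. The wheel radius, the overall ratios (12.0 and 6.0), the CVT range (5.0 to 13.0) and the 4000 rpm target are hypothetical values, not data from the reviewed works, and transmission losses are ignored.

```python
import math


def wheel_rpm(speed_mps, wheel_radius_m):
    """Wheel rotational speed in rpm for a given vehicle speed."""
    return speed_mps / (2.0 * math.pi * wheel_radius_m) * 60.0


def discrete_choice(speed_mps, target_motor_rpm, ratios, wheel_radius_m=0.3):
    """Pick the fixed overall ratio whose motor speed is closest to the target."""
    w = wheel_rpm(speed_mps, wheel_radius_m)
    return min(ratios, key=lambda ratio: abs(w * ratio - target_motor_rpm))


def cvt_choice(speed_mps, target_motor_rpm, ratio_min, ratio_max,
               wheel_radius_m=0.3):
    """Pick the continuous ratio that hits the target, clamped to the range."""
    w = wheel_rpm(speed_mps, wheel_radius_m)
    if w == 0.0:
        return ratio_max  # standstill: use the largest reduction
    ideal = target_motor_rpm / w
    return max(ratio_min, min(ratio_max, ideal))


if __name__ == "__main__":
    speed, target = 15.0, 4000.0  # m/s and rpm, both hypothetical
    w = wheel_rpm(speed, 0.3)
    fixed = discrete_choice(speed, target, ratios=(12.0, 6.0))
    cont = cvt_choice(speed, target, ratio_min=5.0, ratio_max=13.0)
    print(f"two-speed: ratio {fixed:.2f}, motor at {w * fixed:.0f} rpm")
    print(f"CVT      : ratio {cont:.2f}, motor at {w * cont:.0f} rpm")
```

Whether the closer match in motor speed translates into a net efficiency gain depends on the CVT's own hydraulic and belt losses, which this sketch does not model.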

The last method reviewed here is the implementation of multi-motor configurations, which can be divided into two typical approaches: two-motor configurations and four-motor configurations. For the two-motor configuration, the possibility for both torque and speed couplings certainly leads to flexibility for various driving modes. However, in some cases, providing either speed coupling or torque coupling can already be sufficient to achieve higher powertrain efficiency than the application of two-speed discrete transmission. The two-motor configuration can also be applied with a multispeed transmission, which can be particularly useful for optimizing the power consumption of heavy vehicles such as trucks and buses. The four-motor configuration allows even more diversity in the driving modes than the two-motor configuration, thanks to the possibility to distribute traction to each wheel independently. This presents an opportunity to optimize not only the powertrain efficiency, but also the driving dynamics of the EV (which is not possible for either the multispeed discrete transmission or the CVT) and the motors' capacity and size. The main challenges of the multi-motor configurations, however, are the complexity in terms of the mechanical design and control due to the application of multiple clutches, brakes and EV motors, as well as the production sustainability due to the high number of motors involved. A summary of the comparison between these three proposed methods of optimizing the EV powertrain is presented in Table 6.

**Table 6.** Comparison between the application of multispeed discrete transmission, CVT and multimotor configuration in EV powertrain.


#### *5.2. Key Areas for Future Research Works*

The latest research works on two-speed transmission for application in EV powertrains reviewed in this paper mostly focused on optimizing the gear ratios for powertrain efficiency and performance. Besides, these works also emphasized the proper gearshifting strategy, which should be formulated accurately by taking into account the road conditions and driver's input so that a balance between the powertrain efficiency and performance can be realized. This was consistent with the research trends previously discussed and reviewed in [89,90]. However, these works still did not sufficiently discuss the control algorithm of the gearshifting mechanism in detail. Examples of works on this topic can be found in [41–43,47,48], though these works evaluated only the jerking of the mechanism in isolation, without integrating it with the multispeed transmission inside the powertrain. Besides, they also did not assess the effect in terms of transmission efficiency and actuation power consumption. Therefore, in the future, works on the control of the gearshifting mechanism are expected to be intensified.

On the CVT with metal belt for EV powertrains, most of the previous related literature focused on analyzing and comparing its efficiency and performance against multispeed discrete transmissions and multi-motor configurations. However, due to the application of a hydraulic actuation system, the CVT suffered significant power losses, which prompted some scholars to study the practicality of replacing it with an electro-mechanical actuation system. At the moment, very few works have thoroughly analyzed the application of electro-mechanical CVT for EV. Based on the latest review paper on an electro-mechanical CVT with metal belt in [91], some of the designs are capable not only of self-locking the ratio and the belt's clamping force, but also of precisely controlling them. Controlling the belt's clamping force, in particular, is the key to optimizing the CVT efficiency as well as the durability of the electro-mechanical actuation system, as extensively explained in [92]. This means that transmission losses can be minimized, though thorough studies still need to be performed since electro-mechanical CVT is not yet a mature technology and has so far not been commercialized. Therefore, the key research area here is the optimization of the ratio and clamping control algorithms in the electro-mechanical actuation system for CVT with metal belt. Another area that can be focused on is the possibility of implementing geartrain-based CVT, which eliminates the metal belt entirely.

In the context of multi-motor configurations, key research areas that can be pursued are the control algorithm for traction distribution on the wheels and the durability of the powertrain system. Apart from optimum power consumption and driving performance, the multi-motor configuration offers the chance to implement steering assist and wheel traction control. However, since the powertrain is attached directly to the wheels, it becomes part of the unsprung mass. As a result, the powertrain is subjected to harsh operating conditions involving direct vibrations caused by the road surface, as well as water splash and debris from the road surface. This factor will very likely affect the durability and maintenance routine of the powertrain, which requires further study to evaluate its significance to the overall ownership cost of the EV.

In terms of maintenance, refs. [93,94] discussed the gap between the operation cost for ICE and EV. In general, the purchasing cost for EV is higher due to the high battery cost, while the cumulative maintenance cost for ICE will be greater over time due to the frequent maintenance requirement for its powertrain. Depending on the vehicle segment, EV can achieve cost parity with the equivalent ICE model in around 8 years of ownership. Therefore, detailed studies to determine the acceptable cost parity between ICE vehicles and EVs with multispeed discrete transmission, CVT with metal belt, multi-motor configuration and the conventional single-speed transmission are still required.
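
The cost-parity idea reduces to a simple break-even calculation: a higher purchase price is recovered through lower yearly running costs. The sketch below illustrates only the arithmetic. The prices and annual costs are hypothetical placeholders chosen so that the result lands on the roughly 8-year figure quoted above; they are not taken from [93,94], and the function name `break_even_year` is likewise an assumption.

```python
def break_even_year(ev_price, ice_price, ev_annual_cost, ice_annual_cost,
                    horizon_years=30):
    """First whole year in which cumulative EV cost is no higher than ICE cost.

    Costs are simple linear totals (purchase price plus a constant annual
    cost); discounting, resale value and battery replacement are ignored.
    Returns None if parity is not reached within the horizon.
    """
    for year in range(horizon_years + 1):
        ev_total = ev_price + ev_annual_cost * year
        ice_total = ice_price + ice_annual_cost * year
        if ev_total <= ice_total:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical figures in an arbitrary currency unit.
    year = break_even_year(
        ev_price=38_000, ice_price=30_000,
        ev_annual_cost=1_500, ice_annual_cost=2_500,
    )
    print("cost parity reached in year:", year)
```

Extending the comparison to the powertrain variants discussed here would mainly mean replacing the constant annual cost with a maintenance model per transmission type, which is the kind of detailed study the paragraph above calls for.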

Another topic that is relevant for future research work here is the lubrication and cooling of the powertrain, which is particularly crucial for the CVT with metal belt. If an integrated cooling and lubrication system can be developed for all components (i.e., battery, motor and transmission) in an EV powertrain, the ownership cost of the EV can be reduced significantly. The study in [62] has started to evaluate the possibility of integrating the cooling system for the CVT fluid and the motor's fluid, which helps in making the powertrain system more compact and cost-effective. The latest reviews on lubrication for EV powertrains can be found in [95,96], which explored the possibility of integrating the cooling system for all components of an EV powertrain. These reviews also evaluate various possible lubricants for specific EV powertrains, which have different requirements from the conventional ICE powertrain.

Another research area worth studying in the future is the possibility of implementing the same EV powertrain configuration for diverse vehicle segments for cost savings in production [97,98]. As highlighted in [99], CVTs with metal belts or chains present the opportunity for application in various EV segments thanks to their continuous ratio range. However, the powertrain's performance when applied in different segments has to be properly analyzed so that the advantages and disadvantages of its application can be determined. Besides, certain modifications to this type of CVT must also be studied, since different segments normally involve different motor power requirements, which will require different specifications for the metal belt. To address this, the possibility of using geartrain-based CVT, which can eliminate the belt, should be explored.

The final research area proposed here is the implementation of a holistic eco-driving method. According to [100], the fundamental aspect of eco-driving is to maximize the constant vehicle speed range so that losses in the powertrain can be reduced; based on that study, the losses can be reduced by as much as 27% depending on the vehicle segment. A holistic eco-driving method, for future study, should consider not only maximizing the constant vehicle speed range, but also optimizing the motor's efficiency range, minimizing the transmission power losses, optimizing regenerative braking, as well as maximizing the battery's health and durability without compromising much on driving comfort [101]. Table 7 summarizes the potential key research areas that can be pursued in optimizing EV powertrains in the near future.

**Table 7.** Expected key research areas on optimizing power flow in EV powertrain.


#### **6. Conclusions**

EV market penetration globally is expected to intensify in the near future thanks to their improved practicality, reduced ownership cost and governments' policy on emissions, among other factors. In one aspect, this development is expected to reduce greenhouse gas emissions from new vehicles. In other aspects, however, the increased EV market share also leads to new challenges such as production sustainability, excessive increase in electricity demand and technology migration issues. If not properly addressed, these challenges will cause not only excessive cost to the manufacturers and customers, but also potentially reverse the environmental gains from the reduced tailpipe emissions. Therefore, optimizing the power flow of the EV powertrain is the key to addressing those challenges, which can be divided into three methods: multi-speed discrete transmission, CVT and multi-motors configuration.

In this paper, the latest literature on the three methods has been reviewed extensively in terms of the methodology and significant findings. Next, the methods were compared to assess their advantages and disadvantages. In short, multispeed discrete transmission, especially two-speed discrete transmission, features an advantageous compact design which makes it very practical for EV powertrains. As a result, the extra weight due to the inclusion of the transmission in an EV can be minimized, and the shifting strategy can be made simpler and more effective to avoid overly frequent gearshifting that will compromise driving comfort. However, the two-speed discrete transmission lacks flexibility due to its limited number of gears, hence it is not practical for diverse driving modes and vehicle segments. In this aspect, CVTs and multi-motor configurations are more flexible, due to their continuous ratio range and options for independent traction distribution on the wheels, respectively. Nevertheless, CVT suffers from significant losses in its hydraulic actuation system and belt, while the multi-motor configuration requires an advanced control algorithm to precisely distribute the wheels' traction, as well as extra cost due to the high number of motors being used.

From the review, several key research areas have been identified for future study. The latest literature mostly focused on optimizing the gear ratios considering the motor's efficiency and driving conditions (for multi-speed discrete transmission and CVT), optimizing the shifting strategy for diverse driving cycles (for multi-speed discrete transmission, CVT and multi-motor configuration), and optimizing the traction distribution on the wheels for reduced power consumption and improved vehicle dynamics (for multi-motor configurations). Thus, the identified key research areas are: optimizing the gearshifting mechanism and its control (for multi-speed transmission and multi-motor configurations that feature two-speed AMT), evaluating and optimizing the electro-mechanical actuation system for CVT, optimizing the wheels' traction distribution for steering assist and driving safety (for multi-motor configuration), developing an optimized and integrated lubrication and cooling system for all EV powertrain components, carrying out detailed cost and environmental assessments of their application in EV, and finally, implementing an advanced eco-driving strategy considering not only the motor's efficiency, but also transmission losses and battery SOC. These areas are crucial for optimizing EV powertrains' efficiency and performance for a more sustainable and cost-effective EV.

**Author Contributions:** Provision of resources in terms of the relevant literature on EV powertrain, I.I.M., Z.H.C.D., M.K.A.H., V.T. and A.J.; Research and review on the literature related to EV powertrain optimization using multispeed discrete transmission, I.I.M., M.S.C.K. and M.H.A.T.; Research and review on the literature related to EV powertrain optimization using continuously variable transmission, I.I.M., Z.H.C.D. and M.S.C.K.; Research and review on the literature related to EV powertrain optimization using multi-motor configuration, I.I.M., P.M.S., K.A.I. and W.X.; Research project supervision, I.I.M., Z.H.C.D. and M.K.A.H.; Writing of the original draft, I.I.M., V.T. and A.J.; Reviewing and editing of the final article, I.I.M., P.M.S. and M.H.A.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Malaysian Ministry of Higher Education (MOHE) through the Fundamental Research Grant Scheme (FRGS) (FRGS/1/2021/TK0/UTM/02/44).

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We are grateful to Opia Anthony Chukwunonso for reviewing and editing the final article.

**Conflicts of Interest:** The authors declare no financial or personal conflicts of interest that could have influenced the work presented in this paper.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Applied Sciences* Editorial Office E-mail: applsci@mdpi.com www.mdpi.com/journal/applsci


ISBN 978-3-0365-4858-6