# **Smart Energy and Intelligent Transportation Systems**

Edited by Albert Y.S. Lam, Bogusław Łazarz and Grzegorz Peruń Printed Edition of the Special Issue Published in *Energies*

www.mdpi.com/journal/energies

# **Smart Energy and Intelligent Transportation Systems**

# **Smart Energy and Intelligent Transportation Systems**

Editors

**Albert Y. S. Lam Bogusław Łazarz Grzegorz Peru ´n**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Albert Y. S. Lam The University of Hong Kong Hong Kong

Bogusław Łazarz Silesian University of Technology Poland

Grzegorz Perun´ Silesian University of Technology Poland

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Energies* (ISSN 1996-1073) (available at: https://www.mdpi.com/journal/energies/special issues/ energy intelligent transportation).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-4453-3 (Hbk) ISBN 978-3-0365-4454-0 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


## **About the Editors**

#### **Albert Y. S. Lam**

Albert Y. S. Lam received his BEng degree (First Class Honors) in Information Engineering and his PhD degree in Electrical and Electronic Engineering (EEE), both from The University of Hong Kong (HKU), in 2005 and 2010, respectively. He was a postdoctoral scholar at the Department of Electrical Engineering and Computer Sciences of University of California, Berkeley. He is now the CTO at Fano Labs, a deep-tech startup specializing in speech and language technologies. He also serves as an Adjunct Assistant Professor in both EEE and Geography at HKU, and a Croucher research fellow. His research interests include optimization, artificial intelligence, and smart cities. He is an Associate Editor of IEEE Transactions on Evolutionary Computation and IEEE Transactions on Intelligent Transportation Systems. He also serves as a co-Editor-in-Chief of EAI Endorsed Transactions on Energy Web.

#### **Bogusław Łazarz**

Bogusław Łazarz—Professor at Silesian University of Technology (SUT), Faculty of Transport and Aviation Engineering, Department of Automotive Vehicle Construction, the Vice-Rector for General Affairs of SUT, previous Dean of the Faculty of Transportation; supervisor of many engineering, master's and doctoral theses; winner of numerous awards of the Rector of the SUT for individual and team scientific, didactic and organizational achievements; member of the Presidium of Transport Committee Polish Academy of Sciences.

#### **Grzegorz Peru ´n**

Grzegorz Perun received his M.Sc. degree from the Silesian University of Technology, Katowice, ´ Poland in 2004. His Ph.D. degree was received in 2010. The thesis was defended with honors at the Silesian University of Technology, and also for which he received an award from the Fiat Group (Gliwice - Turyn 2011). Habilitation "Modeling dynamic of power transmission system with planetary gear in computer-aided design and diagnostics" obtained on Air Force Institute of Technology, Warsaw, Poland in 2018. Author and co-author of over 200 published scientific works. Author of the monograph, academic textbook "Engineering design with software HiCAD" (2014), and co-author of two patents and two academic textbooks: "Aircraft Airframe" (2014), and "Aircraft Systems" (2015). His research interests include mechanical engineering, construction of machines, especially modeling of toothed gears, vibroacoustics, gearbox diagnostic, signal processing, non-destructive testing techniques, GIS, and transport. Since 2019 he is a professor SUT in the Faculty of Transport and Aviation Engineering, Department of Road Transport. Member of The Polish Association of Technical Diagnostics (PTDT), Vice-Chairman of PTDT Audit Committee. He is proficient in several programming languages, which he uses effectively in his academic work and research, and is the author of, among others, software for resonance defectoscope.

### *Editorial* **Smart Energy and Intelligent Transportation Systems**

**Albert Y. S. Lam 1,2,\* , Bogusław Łazarz <sup>3</sup> and Grzegorz Peru ´n <sup>3</sup>**


#### **1. Introduction**

With the Internet of things and various information and communication technologies, a city can manage its assets in a smarter way, constituting the urban development vision of a smart city. This facilitates a more efficient use of physical infrastructure and encourages citizen participation. Smart energy and smart mobility are among the key aspects of the smart city, in which the electric vehicle (EV) is believed to take a key role. EV adoption in the market can clearly provide supporting evidence. When comparing to 2019, the 2020 global EV stock showed a 43 % increase, hitting the 10 million mark.

EVs are powered by various energy sources or the electricity grid. With proper scheduling, a large fleet of EVs can be charged from charging stations and parking infrastructures. Although the battery capacity of a single EV is small, an aggregation of EVs can perform as a significant power source or load, constituting a vehicle-to-grid (V2G) system. V2G refers to a system allowing EVs to communicate with the power system for demand response services by either discharging their excessive energy to the grid or by being charged with the excessive electricity from the grid. Besides acquiring energy from the grid, in V2G, EVs can also support the grid by providing various demand response and auxiliary services. We can reduce our reliance on fossil fuels and utilize renewable energy more effectively. V2G enables EVs to store electricity generated from renewable energy sources so as to overcome the intermittency of renewable due to different weather and time of day conditions.

The EV market is growing very quickly, and there will likely be an abundance of EVs running on the road in the near future. EVs are also important building blocks to developing intelligent transportation systems. The self-control of autonomous vehicles (AVs) and the systematic remote control of AV fleets will bring smart energy and intelligent transportation systems into new dimensions. We can develop a public transportation system with AVs, in which a fleet of AVs is managed to accommodate transportation requests, offering pointto-point services with ride sharing. AVs can also participate in V2G to support various V2G services. Through properly coordinating AVs in appropriate parking facilities, it has been shown that AVs can facilitate better V2G services with higher efficiency. On the other hand, an energy delivery system can be built upon the transportation network and EVs utilized as energy carriers to transport energy over a large geographical region. With proper routing, energy can be transmitted from sources to destinations as in a packet-switched network. Such a system can complement the power network and enhance the overall power system performance.

This Special Issue contributes to the smart energy and intelligent transportation system agenda through enhanced scientific and multi-disciplinary knowledge intended to improve performance and deployment by bringing some focus to electric and autonomous vehicles in order to meet technical, socio-economic, and environmental goals, as well as for energy security. We are particularly interested in investigating how smart energy technologies contribute to intelligent transportation systems, and vice versa.

**Citation:** Lam, A.Y.S.; Łazarz, B.; Peru ´n, G. Smart Energy and Intelligent Transportation Systems. *Energies* **2022**, *15*, 2900. https:// doi.org/10.3390/en15082900

Received: 8 March 2022 Accepted: 12 April 2022 Published: 15 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **2. Summary of the Contributions**

Research works by scholars from divergent backgrounds are included in this Special Issue. Refs. [1,2] focus on the health of the vehicle and road–rail accidents. Ref. [3] investigates the control of charging in V2G while [4] develops an EV recommendation system. Ref. [5] studies the loading hub placement in an electric bicycle-oriented city logistic system.

Ref. [1] makes use of artificial neural networks in diagnosing the technical condition of driving systems operating under variable conditions, in which the effects of temperature and load variations on the values of diagnostic parameters were taken into consideration. A new approach is proposed to train the network using a learning set from the efficient system only. The responses to new data from the undamaged system are compared to the response to data recorded for three damage states: misalignment, unbalance, and simultaneous misalignment. As a normalized measure of the deviations, a diagnostic parameter for the faulted system is derived from the result for the undamaged condition.

Ref. [2] focuses on assessing the likelihood of the occurrence of various effects of road–rail accidents in different selected situations.The specificity of the road–rail accidents requires a separate characteristic to categorize types of incidents and to specify the affecting factors with an assessment of the strength of this impact. Classification trees are adopted to deal with the ordinal form of the dependent variables. The experiment results facilitate the characterization and assessment of the danger and constitute guidelines for taking preventive actions.

The massive adoption of EVs creates an unprecedented energy load for the power system, in which the stability and quality are compromised by multiple simultaneously connected vehicles, especially on a local distribution level. In [3], a choice-based pricing algorithm is proposed to indirectly control the charging and V2G activities of EVs in non-residential facilities. Two metaheuristic methods are applied to solve the proposed optimization problem with a comparative analysis for performance evaluation.

Due to the Act on Electromobility and Alternative Fuels, EVs are becoming popular in Poland, in which local government units and state administration are expanding their EV fleets. The expansion of the fleet should be well-planned and supported by economic analyses. Ref. [4] develops an EV recommendation system which meets the needs of the local and state administration to the greatest extent. A multi-criteria decision analysis method is designed with the Monte Carlo method, and it allows of promoting more sustainable vehicles with high technical, economic, environmental and social parameters.

Electric cargo bicycles are popular transport for last-mile goods deliveries in urban areas with restricted traffic. In a cargo bike delivery system, loading hubs serve as intermediate points between vans and bikes ensuring loading, storage, and e-vehicle charging operations. The loading hub placement a key problem for designing city logistic systems, which heavily rely on electric bicycles. In [5], the authors propose a mathematical model by considering consignees and loading hubs as vertices in the graph constituting a transport network. This allows determination of the location of a loading hub under stochastic demands for transport services in the closed urban area.

**Author Contributions:** All authors contributed equally to the conceptualization and realization of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **The Use of Deep Learning Methods in Diagnosing Rotating Machines Operating in Variable Conditions**

**Paweł Pawlik 1,\*, Konrad Kania <sup>2</sup> and Bartosz Przysucha <sup>2</sup>**


**Abstract:** This paper presents the use of artificial neural networks in diagnosing the technical condition of drive systems operating under variable conditions. The effects of temperature and load variations on the values of diagnostic parameters were considered. An experiment was conducted on a testing rig where a variable load was introduced corresponding to the load of the main gearbox of the bucket wheel excavator. The signals of vibration acceleration on the gearbox body, rotational speed, and current consumption of the drive motor for different values of oil temperature were measured. Synchronous analysis was performed, and the values of order amplitudes and the corresponding values of current, speed, and temperature were determined. Such datasets were the learning vectors for a set of artificial deep learning neural networks. A new approach proposed in this paper is to train the network using a learning set consisting only of data from the efficient system. The responses of the trained neural networks to new data from the undamaged system were performed against the response to data recorded for three damage states: misalignment, unbalance, and simultaneous misalignment and unbalance. As a result, a diagnostic parameter as a normalized measure of the deviation of the network results was developed for the faulted system from the result for the undamaged condition.

**Keywords:** condition monitoring; vibroacoustic diagnostics; gearbox; power transmission systems; neural networks; deep learning

#### **1. Introduction**

Monitoring systems using vibration signals are often used in the industry to diagnose the technical condition of machinery. Systems based on parameters such as RMS (root mean square), crest factor, and peak of vibration signal are perfect for machines operating in constant operating conditions, where one can assume threshold values for these parameters, whereby exceeding them signals a system fault. However, these parameters are not sufficient to diagnose machines operating under variable loads caused by varying operating conditions [1]. Variable loading causes a change in speed, which makes diagnosis difficult using classical methods based on spectral analysis of the time signal. Synchronous methods are used in the diagnosis of machines operating at variable speed, which are based on the synchronization of the signal carrying diagnostic information with the rotational speed of the tested object [2–5]. However, varying operating conditions such as load or operating temperature can also affect the amplitude of the diagnostic signal [6–10]. An increase in amplitude due to a variable load can be misinterpreted by monitoring systems. Therefore, this impact must be considered in the monitoring process. A previous study [1] investigated the effect of load and speed on the amplitude values of the characteristic components and proposed a method for scaling these parameters. It is also possible to find papers on diagnosing a planetary gearbox operating under variable conditions, which addressed

**Citation:** Pawlik, P.; Kania, K.; Przysucha, B. The Use of Deep Learning Methods in Diagnosing Rotating Machines Operating in Variable Conditions. *Energies* **2021**, *14*, 4231. https://doi.org/10.3390/ en14144231

Academic Editors: Davide Astolfi and Bogusław Łazarz

Received: 20 June 2021 Accepted: 9 July 2021 Published: 13 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the problem of changing parameter values from load [7,11]. The solution to this problem can be realized using artificial neural networks [12–15]. The authors of [16] presented a one-dimensional multi-scale domain adaptive network in bearing diagnostics for different degrees of load. The preliminary preparation of diagnostic signals is important in using neural networks to diagnose machines working in variable conditions. In [17], higher-order spectral analysis and multitask learning were used. However, in [18], a method of learning a convolutional network was proposed using a labeled date and pseudo-labeled date. The measurement of drive motor current in diagnosing gear damage and bearings [19–21] is increasingly being used. The presented cases used artificial neural networks as classifiers. In the learning process, vectors for the undamaged and damage states were used as inputs of the network. However, in industrial conditions, especially in the heavy industry, there are no measurement data registered during damage because not all machines are monitored, and the drive system components are produced in small amounts. Moreover, it is impossible to predict all types of faults and introduce them into the network learning process. Accordingly, we propose a solution based on learning neural networks with data recorded only during fault-free operation, which analyzes the responses of the trained networks to signals recorded when faults are introduced.

This paper presents the use of artificial neural networks in diagnosing the technical condition of drive systems operating under variable conditions. The effects of temperature and load variations on the values of diagnostic parameters were considered. An experiment was conducted on a testing rig, where a variable load was introduced, corresponding to the load of the main gearbox of the bucket wheel excavator. The vibration acceleration, speed, and current signals feeding the drive motor were recorded for different values of oil temperature. Synchronous analysis was performed, and the values of order amplitudes and the corresponding values of current, speed, and temperature were determined. Such datasets were the learning vectors for a set of artificial neural networks.

This article presents a new approach to diagnosing machines working in variable conditions. Instead of analyzing the order spectrum, an analysis of the amplitudes of orders as a function of changes in speed, current, and temperature was carried out. A new approach proposed in this paper is to train the network using a learning set consisting only of data from the efficient system. In the next step, the behavior of such a trained neural network on new data from the undamaged system was investigated in comparison with the response to data recorded during three fault conditions of the drive system. As a result, a diagnostic parameter as a normalized measure of the deviation of the network results was developed for the faulted system from the results for the fault-free condition.

#### **2. Materials and Methods**

#### *2.1. Testing Facility Description*

The testing rig (Figure 1) consisted of a TRAMEC EP 90/1 planetary gearbox driven by an electric motor controlled by a frequency converter. The load consisted of a second induction motor connected to the gearbox by a jaw coupling. The load motor was also controlled by a frequency converter, which allowed setting any load function of the system.

Vibration acceleration on the planetary gear case was measured using a PCB 356B08 triaxial acceleration sensor, the temperature was measured using an LM35 sensor, speed was measured using a laser tachometer, and electrical current was measured using an ACS714 sensor. All measurements of the experiment were obtained using a specially built measuring system based on the PXI platform (PCI Extension for Instrumentation). The platform comprised the PXI Trigger Bus, which allows measurement synchronization between individual measuring cards.

**Figure 1.** Laboratory bench, (**1**) drive motor, (**2**) planetary gearbox, (**3**) load motor, (**S1**) acceleration sensor, (**S2**) temperature sensor, (**S3**) tachometer, (**5**) frequency inverter for the drive motor, and (**6**) frequency inverter for the load motor.

#### *2.2. Diagnostic Experiment*

Measurements were made for four drive system conditions, which are shown in Table 1.


**Table 1.** Designation and measurement time for the different conditions of the drive system.

The misalignment condition (*F*1) consisted of placing 0.5 mm thick shims under the front feet of the drive motor. The unbalance condition (*F*2) was introduced by placing additional mass (13 g) on the output shaft coupling. Condition *F*3 consisted of the simultaneous introduction of misalignment and unbalance. Signals of 30 min duration were recorded for each condition. Measurements were carried out when warming up a system for a temperature in the range of 35–40 ◦C.

The variable load changes from 1.3 Nm to 4.0 Nm were entered into the system. This caused a change in the rotational output shaft speed of 730–758 RPM and in the RMS value of drive motor current in the range of 2.48–2.51 A. The system was subjected to a variable load corresponding to the load occurring on the main gear of the bucket wheel excavator. The reference signal was obtained from the KWK 1500 s excavator main gear monitoring system, the subject of the patent in [22]. The main gear of the bucket wheel excavator is a bevel-planetary gearbox that drives a bucket wheel. The input of the bucket in the ground causes the system load. With a fixed set voltage value on the drive engine, the load causes a change in speed and vibration amplitude. Diagnosing the main gear of the wheel excavator has already been addressed in [7,11,23], which investigated the signals coming from industrial conditions. However, in industrial conditions, a controlled experiment with given damage cannot be carried out. Therefore, the experiment was carried out in laboratory conditions. The variable load-induced velocity change signals recorded under industrial conditions were scaled up to the capabilities of the laboratory bench.

#### *2.3. The Method of Determining Diagnostic Parameters*

The proposed signal analysis method is based on the order analysis algorithm. The order spectrum is determined using a method based on resampling the vibration acceleration signal against the input shaft speed. Figure 2 shows the schematic of the order analysis algorithm. In the first phase, the tachometer signal is subjected to an interpolation procedure using a cascaded integrator–comb filter (CIC). Next, on the basis of filtered tachometer signal, a vibration signal resampling procedure is performed to determine the vibration signal against the rotation angle (even angle signal). In the resampling method, time samples are converted to angle samples. The time samples are samples of the physical signal that are equally spaced in time. The angle samples are samples that are equally spaced in the rotation angle. The signal resampled as such can be subjected to a fast Fourier transform (FFT), the result of which is an order spectrum. An order spectrum represents amplitude as a function of order rather than as a function of frequency. The orders correspond to multiples of the rotational frequency of the shaft on which the speed measurement is performed [24]. In the case under consideration, the rotational speed measurement is carried out at the output shaft of the gearbox.

**Figure 2.** Schematic of the order analysis algorithm [24].

In addition to the averaged order spectrum, the time course of the amplitudes of the individual orders can be obtained from the order analysis. Information about the technical condition of the tested object can be obtained by monitoring the amplitudes of the characteristic orders. However, the change in amplitudes can also be caused by a change in the load on the system [6,25]. The load can be measured indirectly by measuring the electrical current consumption of the drive system. However, it is not always possible to accurately measure electrical current in industrial conditions, especially a measurement synchronized with a vibration measurement. However, in industrial settings, the driving motor is often supplied with a constant voltage and frequency, and any variation in speed is due to a change in load. The relationship between load change and speed recorded on the testing rig is shown in Figure 3. The rotational speed decreased as a result of the increase in the system load, while the load increase resulted in a higher power consumption by the driving motor.

In the test object, the rotational speed was determined by measuring the keyphasor signal. This measurement was synchronized with the vibration acceleration measurement, allowing load variations to be included in diagnostic estimates, such as characteristic component amplitudes. Figure 4 shows the speed waveform of the input shaft of the diagnosed object during operation under load. Speed changes were caused by changes in the system load because the frequency and voltage applied to the motor driving the system were constant. A similar case can be encountered in industrial settings. In the experiment conducted, rotational speed variation waveforms recorded on the main gear drive system of a bucket wheel excavator were used. The shape of the rotational speed variation waveform shown in Figure 4 corresponds to the waveform recorded on the main gear of a wheeled excavator operating in a brown coal mine.

**Figure 3.** The relationship between rotational speed and current consumption of the drive motor.

**Figure 4.** Input shaft rotational speed waveform.

Changes in rotational speed associated with varying load resulted in a significant change in the amplitude of the characteristic orders. The value of the order amplitude for the loaded and unloaded system changed significantly, hindering the diagnosis based on this parameter. Figure 5 shows the meshing order amplitude (no. 72) as a function of rotational speed for the efficient system, which varied under load. For maximum rotational speed value, a small load was applied to the system. As the load increased, the speed decreased, while the value of the meshing order amplitude increased because of the interaction between the teeth increasing.

**Figure 5.** Dependence of order no. 72 amplitude as a function of speed.

The values of ordinate amplitudes are also affected by temperature, and this was shown in [9]. Diagnostics should be performed for a constant temperature or by considering its influence during inference. Therefore, temperature waveforms measured on the gearbox body were also recorded.

Changes in amplitudes of characteristic orders in the order spectrum are usually analyzed in vibroacoustic diagnostics. According to the literature [6,26–28], individual faults or installation defects are characterized by changes in the amplitudes of the corresponding orders, e.g., a change in the amplitude of order no. 1 is symptomatic of unbalance, while a change in the order corresponding to the number of claws of the jaw coupling is due to system misalignment. Therefore, each order obtained from the order spectrum was considered separately because the characteristics of the order amplitudes as a function of load (rotational speed) would change with failure for each order in a different way [25].

As a result of the order analysis, 100 consecutive orders of vibration acceleration recorded in the direction parallel and perpendicular to the shaft axis were obtained. Current, temperature, and speed were recorded simultaneously. Signals of 30 min duration were recorded for each condition. Next, all the recorded signals were divided into time frames of 30 s each. Sets of signals of this length were sequentially subjected to the processing algorithm shown in Figure 6. The processing time of 30 s signals was shorter than 30 s; therefore, the algorithm can be implemented in continuous monitoring systems.

This procedure was performed for data recorded during correct operation (*F*0) and for signals recorded for individual faults *F*1, *F*2, and *F*3.

First, the vibration signal was subjected to order analysis. The results of this analysis were the waveforms of order amplitudes and speed over time. Figure 6 shows the case for one order. The waveforms of order amplitude, speed, temperature, and current were then sorted in ascending order with respect to speed. The results of such sorting were the courses of these parameters relative to the rotation speed. The next step was to determine the moving average for *N* consecutive elements. Averaging was used to reduce data scatter. The data thus prepared were input vectors for artificial neural networks.

**Figure 6.** Measurement signal processing algorithm for a single order.

#### *2.4. Using the Artificial Neural Network for Condition Assessment*

Deep learning artificial neural networks were used to evaluate the condition of the diagnosed drive train. The idea behind the method was to design the learning process of the neural network such that it detects abnormal situations in systems where faults have not historically occurred or have occurred so infrequently that typical classifiers based on supervised learning could not be used. Accordingly, only data from the undamaged system were used for the learning process.

In addition, it was considered that neural network learning is more efficient when there are significantly fewer outputs than the input data variables [29–31]. For this reason, instead of one model evaluating all 100 orders simultaneously, a separate model was built for each order.

From a diagnostic perspective, the best solution would be to reduce the diagnostic decision to a single numerical parameter. Therefore, the system was designed to produce a synthetic diagnostic parameter for each order by exploiting the ability of deep neural networks to recognize patterns autonomously [32,33]. Moreover, some theoretical results suggested that network-based systems can lead to proper robustness [34].

The input data vector for each network were *xk* ∈ *<sup>R</sup>*5, with its components including the following values, in respective order:


The input layer, therefore, consisted of five neurons, followed by two hidden layers [35,36]:


The output layer of the individual networks consisted of a single neuron returning a numerical value, which, for the correct state of the machine (*F*0), should return values close to the constant value determined in the network learning stage. In the experiment conducted, the conventional value of this constant was taken as *m* = 1.

The expected effect of modeling was that the applied neural network should realize some continuous transformation [32] *Hk* : *<sup>R</sup>*<sup>5</sup> → *<sup>R</sup>* with the property described in Equation (1).

$$H\_{k,i}(\mathbf{x}\_{k,i}) = \begin{cases} \begin{array}{c} m + \varepsilon \ \text{when } \mathbf{x}\_{k,i} \in F0\\ t(\mathbf{x}\_{k,i}) \ \text{when } \mathbf{x}\_{k,i} \notin F0 \end{array} \end{cases} \tag{1}$$

where *<sup>t</sup>*(*xk*,*i*) is the function from *<sup>R</sup>*<sup>5</sup> in *<sup>R</sup>* with the property that <sup>|</sup>*<sup>m</sup>* <sup>+</sup> *<sup>ε</sup>* <sup>−</sup> *<sup>t</sup>*(*xk*)<sup>|</sup> is significantly different from 0, *xk*,*<sup>i</sup>* ∈ *F*0 is the vector containing data from the undamaged system for order *k*, *xk*,*<sup>i</sup>* ∈/ *F*0 is the vector containing data from the faulty system (*F*1, *F*2, or *F*3) for order *<sup>k</sup>*, *<sup>i</sup>* is the index of the input vector, *<sup>m</sup>* <sup>=</sup> 1.00, and *<sup>ε</sup>* <sup>∼</sup> *<sup>N</sup>* 0, *σ*<sup>2</sup> 0 .

This means that, if *xk*,*<sup>i</sup>* ∈ *F*0, then the value of the function *Hk*,*<sup>i</sup>* should be close to the value of *m* = 1. However, if the vector *xk*,*<sup>i</sup>* corresponds to the operation of the defective transmission, the neural network should return a value different enough from *m* such that, taking into account the amount of variance, *σ*<sup>2</sup> <sup>0</sup> this difference could be detected by statistical analysis. Since the transformation in Equation (1) is continuous, this guarantees the stability of the network (similar values of the network response come from similar input values). There are some papers where similar observations were made for obtaining stabilization of randomized systems [37].

The primary task in constructing such an architecture is to avoid network overfitting. Due to the fact that we only used the data of a well-functioning machine during the network learning process, such a network may tend to adjust the weights in such a way that, regardless of the output, it returns a response equal to the learned value (i.e., returning a parameter close to *m* regardless of the input state). Therefore, the network training process used as input a set of vectors *xk*,*<sup>i</sup>* associated with the state *F*0. On the other hand, random values from the distribution *N*(*m* = 1, *σ* = 0.1) were generated as the output vector. Adding a perturbation with a small variance to the output vector, as early as at the network learning stage, reduced the risk of trivializing the function realized by the neural network.

Further reducing the risks associated with overfitting involved the following steps:


In addition, the use of these operations renders the network training process nondeterministic, such that the process of training the network for each order can be repeated independently, each time obtaining slightly different values of the weights in each layer, for the same input data. One can notice that this feature of the training process can be exploited to implement the bagging technique. The learning process was, therefore, repeated 120 times, resulting in 120 network models for each order. This approach provided a key mechanism to reduce the risk of system overfitting, as it dispersed this risk across multiple, independent models. In order to achieve that dispersion, evaluation values for 120 models were considered instead of considering the evaluation of a single model. The algorithm of the learning process is shown in Figure 7.

**Figure 7.** Learning process algorithm.

#### *2.5. Verification of Artificial Neural Network Functionality*

The functionality of the obtained model was verified after the network learning process. The responses of the proposed solution were investigated when the vectors obtained from the measurements for the machine fault conditions *F*1, *F*2, and *F*3 and for the correct operation state *F*0 were input. Vectors that were not used during network learning were used to verify the undamaged conditions.

The network response values *Hk*,*<sup>i</sup>* were determined for all orders, *k* ∈ {1...100}. Next, the average value was determined *H*ˆ *<sup>k</sup>* for each order according to the following relationship:

$$
\hat{H}\_k = \frac{1}{n} \sum\_{i=0}^n H\_{k,i}(\mathbf{x}\_{k,i})\_\prime \tag{2}
$$

where *n* is the number of input vectors, *i* is the index of the input vector, and *Hk*,*i*(*xk*,*i*) is the *i*-th output value of the network for the *k*-th order.

This process was then repeated for 120 trained models. The 120 values of the parameter *H*ˆ *<sup>k</sup>* were obtained for each order. This process was repeated for each condition of the tested drive train. The process of analyzing the functionality of the learned neural network for the *F*1 condition is shown in Figure 8.

For verification, ratings for states with damage were compared to ratings for a defectfree system. Verification of whether the average of 120 evaluations *H*ˆ *<sup>k</sup>* for each order differed significantly from the average of 120 evaluations *H*ˆ <sup>0</sup> for condition *F*0 was conducted using an ANOVA test.

Two independent input datasets of the undamaged system were used when testing the response for the undamaged system. This approach allowed not only determining whether there was a statistically significant difference between the network response to the fault condition and the response to the no-fault condition, but also measuring the magnitude of the observed discrepancy in a normalized way using the parameter *ω*2. According to the recommendations [38], negative values of this coefficient, which occur when the size of the noise exceeds the size of the effect, were treated as zeros. Taking this correction into account, the value *ω*<sup>2</sup> is described by Equation (3).

$$\omega^2 = \begin{cases} \begin{array}{c} \frac{SS\_{effact} - df\_{effact} \cdot MS\_{curr}}{MS\_{curr} + SS\_{total}}, & \text{when } \cdot SS\_{effect} > df\_{effact} \cdot MS\_{error} \\ 0, & \text{otherwise} \end{array} \end{cases} \tag{3}$$

where *SSef f ect* is the sum of squares of the effect from the ANOVA test, *d fef f ect* is the number of degrees of freedom of the effect from the ANOVA test, *SStotal* is the total sum of squares from the ANOVA test, and *MSerror* is the mean squared error of the residuals from the ANOVA test.

**Figure 8.** Algorithm of neural network functional analysis process for the *F*1 condition.

The parameter *ω*<sup>2</sup> allows a numerical evaluation normalized to the interval [0, 1] of the deviation of the results obtained from the neural networks for a given technical condition from the result for the condition without faults. The values of the parameter *ω*<sup>2</sup> can be classified by determining the power of the effect on the Cohen scale [39] (Table 2).


#### **3. Results and Discussion**

*3.1. Results of the Order Analysis*

When analyzing the spectrum of orders of vibration acceleration signals recorded during machine operation at variable load (Figure 9and Figure 11), a significant standard deviation can be observed for the gear coupling band (orders around coupling order no. 72). This scatter is caused by the high dynamics of the load (Figure 4) applied to the planetary gearbox.

**Figure 9.** Order spectra of vibration acceleration signal (vertical direction) for the undamaged system (**black**) and for the misaligned system (**red**).

Figure 9 shows the spectra of vibration acceleration orders (vertical direction) for the system without damage (black) and with the introduced misalignment (red).

In the case of misalignment, one would expect an increase in the amplitude of order no. 4 corresponding to the number of clutch teeth. However, there is a noticeable increase in amplitude near the coupling band—orders no. 60 to 70. On the other hand, observing the dependence of the amplitude of the order no. 4 as a function of rotational speed (Figure 10), we can observe more than a twofold increase for the speed equal to 745 RPM and for the speed over 755 RPM. Moreover, in the range of 748–755 RPM, the values for the amplitudes for the misaligned system are smaller than those for the efficient system. On the order spectrum, the difference is smaller because the order spectrum represents a value averaged over a period of over a dozen rotations.

**Figure 10.** Order no. 4 amplitudes of vibration acceleration signal (vertical direction) versus rotational speed for the system without damage (**black**), for the misaligned system (**red**), and for the unbalanced system (**orange**).

Comparing the order spectrum from the undamaged system with the order spectrum from the imbalanced system, it is difficult to see the increase in order no. 1 amplitude, which is symptomatic of this damage (Figure 11). These differences are relatively small when compared to the values of the coupling order amplitudes. However, the order no. 1 has a much smaller standard deviation than the orders in the coupling band of Figure 11.

**Figure 11.** Order spectra of vibration acceleration signal (vertical direction) for the undamaged system (**black**) and for the unbalanced system (**red**).

A much smaller scatter of amplitudes of the order no. 1 for unbalance than for misalignment is also seen in Figure 12. However, the amplitude for the alignment signal increased for the entire speed range by a value of 0.04 m/s2, constituting more than 70% of the initial value.

**Figure 12.** Order no. 1 amplitudes of vibration acceleration signal (vertical direction) versus rotational speed for the system without damage (**black**), for the misaligned system (**red**), and for the unbalanced system (**orange**).

In the case of variable load with a dynamic character causing speed changes shown in Figure 4, the analyzed damage is hardly visible in the order spectrum. Significant changes are visible only after careful analysis of the dependence of the amplitudes of the individual orders on the changes in rotational speed, which are caused by the change in load. Therefore, it is necessary to take into account the changes in ordinate amplitudes due to loading in the diagnosis process.

Temperature also has a significant effect on the values of order amplitudes in the interlocking band. Figure 13 shows the order spectra of vibration acceleration for the signal recorded for different temperatures. There is a clear increase in order amplitudes occurring with increasing temperature. This confirms the need to consider the temperature in vibroacoustic diagnostics.

**Figure 13.** Order spectra of vibration acceleration signal (vertical direction) for a system without damage for different temperatures.

#### *3.2. Results Obtained from Deep Learning Neural Networks*

(Figures 14–16) show the values of the *ω*<sup>2</sup> parameter for each order. This parameter is a numerical evaluation of the deviation of the results obtained from the set of neural networks for a given fault from the results obtained for the no-fault condition.

**Figure 14.** Values of the *ω*<sup>2</sup> parameter for individual orders—the system in the *F*1 condition (misalignment).

**Figure 15.** Values of the *ω*<sup>2</sup> parameter for particular orders—the system in the *F*2 condition (unbalanced).

**Figure 16.** Values of the *ω*<sup>2</sup> parameter for individual orders—the system in the *F*3 condition (misalignment and unbalance).

First, the control data from the measurement for the undamaged system (*F*0) were provided to the input of the trained networks. These values for all orders were 0, indicating no effect according to the Cohen scale.

Figure 14 shows the values of the parameter *ω*<sup>2</sup> for the misalignment condition (*F*1). A large effect size can be observed for orders no. 1, 2, 3, 4, and 68, while a moderate effect can be observed for orders near the coupling order. The increase in the parameter for order no. 4 is most reasonable and is related to the number of teeth in the coupling used. On the other hand, large values of the *ω*<sup>2</sup> parameter for the coupling band may be due to increased inter-tooth interaction during system misalignment.

For the unbalanced condition (*F*2) (Figure 15), a large effect of the Cohen scale is observed only for order no. 1. The values of the *ω*<sup>2</sup> parameter for the other orders indicate no effect (differences between the response of the network to the condition *F*2 and *F*0). This is consistent with theory because unbalance affects the order no. 1 amplitude values with respect to the shaft on which it occurs.

In the case of the *F*3 condition (Figure 16), where both faults were introduced, similar values of the *ω*<sup>2</sup> parameter can be observed as in the *F*1 condition. However, the effect of unbalance is also evident by the increased amplitude for the parameter for order no. 1.

#### **4. Conclusions**

This paper presents the problem of diagnosing machines operating under variable conditions. A diagnostic experiment was conducted on a testing rig for diagnosing a planetary gearbox. The effects of varying load and oil temperature on vibration acceleration signals measured on the body of a planetary gear that was part of the drive system were analyzed.

For the applied variable load, the use of order spectrum was not sufficient to diagnose the introduced drive train damages. This paper shows that the analysis of the dependence of the order amplitudes on the changes in rotational speed (caused by varying load) allows observing significant changes.

A method to evaluate the drive train technical condition using a set of artificial deep learning neural networks was proposed. The input data to the network were values of order amplitudes, temperature, current, and corresponding values of rotational speed. A new approach was proposed to train a set of neural networks using only data recorded for a drive train without faults. The response of the network to given sets of data from a system with introduced faults, i.e., misalignment, unbalance, and simultaneous misalignment and unbalance, was then tested.

The *ω*<sup>2</sup> parameter was proposed to evaluate the deviation of the network results for introduced faults from the result for the no-fault condition. This parameter is determined for each order individually. Therefore, damage can be identified on the basis of previous diagnostic knowledge by observing the orders carrying information about each damage.

The use of artificial deep learning neural networks made it possible to take into account the effects of varying load and temperature on the values of the amplitudes of the characteristic orders.

**Author Contributions:** Conceptualization, P.P.; methodology, P.P., K.K. and B.P.; software, P.P. and K.K; validation, P.P., K.K. and B.P.; formal analysis, P.P. and K.K.; investigation, P.P.; resources, P.P. and K.K.; data curation, P.P. and K.K.; writing—original draft preparation, P.P., K.K. and B.P.; writing review and editing, P.P., K.K. and B.P.; visualization, P.P.; supervision, P.P.; project administration, P.P.; funding acquisition, P.P. and B.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Polish Ministry of Science and Higher Education, grant number 16.16.130.942, and by the Fund for Science and Research of Lublin University of Technology.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Classification Trees in the Assessment of the Road–Railway Accidents Mortality**

**Edward Kozłowski 1, Anna Borucka 2,\*, Andrzej Swiderski ´ <sup>3</sup> and Przemysław Skoczy ´nski <sup>3</sup>**


**Abstract:** A special element of road safety research is accidents at the interface of the road and rail system. Due to their low share in the total number of incidents, they are not a popular subject of analyses but rather an element of collective studies, whereas the specificity of the road–rail accidents requires a separate characteristic, allowing, on the one hand, to categorize these types of incidents, and on the other, to specify the factors that affect them, along with an assessment of the strength of this impact. It is important to include in such analyses all potential predictors, both qualitative and quantitative. Moreover, the literature considers most often a number of accidents while, according to the authors, it does not fully reflect the scale of the danger. A better evaluation would be the victim's degree of injury. Therefore, the purpose of this article is to assess the likelihood of occurrence of various effects of road–rail accidents in the aspect of selected factors. Due to the ordinal form of the dependent variable, the classification trees method was used. The results obtained not only allow the characterization and assessment of the danger but also constitute guidelines for taking preventive actions.

**Keywords:** road–railway accidents; classification trees; road safety; transport means; accidents victims

#### **1. Introduction**

Railroad transport safety is an important factor taken into account when evaluating the operation of this branch of transport. Due to the importance, scope and consequences for society and economy of the low level of traffic safety, it is the subject of many studies and analyses and is systematically evaluated [1–3]. According to the latest annual reports/statistics from the International Union of Railways (UIC), the number of railroad accidents is decreasing [4,5]. The European Union Agency for Railways (ERA) reports are similarly optimistic. The latest reported safety level is historically the highest, although ERA points out that while safety levels have steadily improved, the rate of improvement has slowed down [6].

World statistics are derived from research, and thus, the presented trends are very important for decision-making also at the national level. It should be noted that the level of safety should be shaped primarily at the local level. In Poland, according to the latest data [7], both the overall number of railroad accidents and fatalities decreased. However, the rates of accidents at railroad crossings have not decreased. For example, in 2019, 11 more people died there than the year before. This shows that despite the overall positive indicators (both at the global and national level), there are areas for safety improvement and analysis in a smaller scope, in this case concerning only road–railway crossings. This became the genesis of this publication. Additionally, the UIC report shows that at the international level, railway crossing accidents account for as much as 15% of

**Citation:** Kozłowski, E.; Borucka, A.; Swiderski, A.; Skoczy ´ ´ nski, P. Classification Trees in the Assessment of the Road–Railway Accidents Mortality. *Energies* **2021**, *14*, 3462. https://doi.org/10.3390/en14123462

Academic Editor: Grzegorz Peru ´n

Received: 11 May 2021 Accepted: 9 June 2021 Published: 11 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

all diagnosed causes of railroad accidents, being the second most common cause (after persons trespassing on railroad infrastructure) estimated at 75% [5]. This further supports the analysis presented in the article. Another important argument is that, according to the ERA report [6], the overall level of safety at railroad crossings in Europe has improved. The average annual decrease in accidents between 2010 and 2018 was 3%, and 4% for fatalities. This shows that Poland does not fit into the general trend in Europe in this regard. Therefore, detailed analyses for the country are necessary.

Rail–road level crossings are an important element of the road infrastructure, enabling an intersection of the road and rail vehicles' tracks. There are over 14,000 such intersections in Poland [8] at which, in the period under investigation (the years 2014–2020), almost 8000 incidents (collisions and accidents) were recorded. Approximately 259 people were killed, and 404 were injured. Such accidents, apart from the tragic consequences, are associated with high costs, especially regarding the repair of rail vehicles. That is why the research is important, aimed to increase the level of safety and reduce the scale of the danger.

The accidents at the meeting point of road and rail transport are not a popular subject of analyses because they are relatively rare. This is shown in Figure 1, which shows the number of total road accidents during the study period and the number of road– railway accidents. The differences are so large that two scales had to be used to make the figure visible.

**Figure 1.** Number of total traffic accidents and road–railway accidents during the studied period.

Calculations show that road–rail accidents account for only 5% of all traffic incidents. Therefore, the interest in this issue is mainly due to the potential severity of their consequences. The research in the literature concerns, for example, predicting the likelihood of accidents, injuries and fatalities using logistic regression [9]. Ghomi and others in [10] used the ordered probit model, CART and association rules to evaluate the factors that most strongly affect the risk of an accident, which turned out to be train speed, age and gender of the incident participant.

The large-scale study was conducted in [11] and in [12]. The first concerns the analysis of the railroad level crossings in Great Britain over 64 years (1946–2009), while the second one concerns Australia (Victoria) in the years 1969–1974. However, the mentioned research is not new. The dynamic development of motorization, the constantly increasing number of road users, as well as the development of road safety systems [13] make it necessary to update them. This conclusion indicates that new analyses are needed in this area.

The accessibility of information does not help such analyses, which in many countries is very limited [14,15], which is why the data from simulators are used [14].

The most frequently studied element are factors affecting the number of accidents or their consequences. They differ from one country to another. In Israel [14], the warning device category, vehicle traffic intensity, train traffic intensity, and visibility conditions were considered important. In Ethiopia [16], the studies have shown that most accidents are due to human error, followed by technical problems and non-compliance with operational procedures. In [17], Ling et al. evaluated the derailments of Australian passenger trains as a result of a collision with heavy road trucks.

Classification models are popular both for the analysis of railroad accidents and all accidents in general [18,19]. The decision to use them is primarily influenced by the form of the dependent variable, as well as the nature of the factors that may affect the number of such incidents or the injuries caused by them. These are often qualitative variables, which limits the availability of some methods of mathematical analysis.

Most commonly used are logistic regression and decision trees, as the most popular tools for assessing the impact of qualitative predictors [20–24]. Decision trees were used, for example, in the investigation of car accidents from the years 2005 to 2006 in Taiwan, where the key factors determining the effects of injuries turned out to be driving under the influence of alcohol, not using seat belts, type of vehicle, type of collision, number of vehicles involved in the accident and location of the accident [25].

The authors [24] also used such a method to analyze 4-year observations regarding road accidents in India, indicating that in order to improve safety on motorways, first of all, it is necessary to properly design and control motorway entrances and limit the speed achieved by vehicles. The CART algorithm was used to investigate accidents in 2001 in Taiwan. This made it possible to evaluate the relationship between the severity of injuries and driver characteristics, vehicle type and conditions during an accident. The results indicate that the most important variable related to the severity of the collision is the vehicle type. It has also been identified that pedestrians, motorcyclists and cyclists run a greater risk of injury than other drivers in road accidents [26].

The logistic regression, in turn, was used in the study of falling asleep at the wheel or fatigue as factors contributing to an accident [27]. In [20], the probability of death was examined, indicating a significant impact of the location and cause of the accident. On the other hand, the authors [28] used polynomial logistic regression to identify significant factors of risk of accidents, indicating elements related to the road infrastructure, driver's characteristics and vehicle type.

Guided by the experiences resulting from reviewing the literature in terms of the tools used and the research gap identified regarding the railroad accidents analysis, the authors decided to use classification trees to evaluate factors influencing the effects of such an incident. Additionally, other approaches like random forest [29,30] and boosting technique [31–33] were used for improving the results obtained from the decision tree. All research was performed in the R environment.

#### **2. Data for the Study**

In Poland, information on traffic incidents is collected by the Police in the Accident and Collision Records System (SEWiK). Based on this, the Polish Road Safety Observatory (POBR), operating at the Motor Transport Institute (ITS), develops databases that have become the genesis of this study. The files made available contain a number of detailed characteristics for each incident, including:


This study uses factors related to:


The data from 2014 to 2020 was examined. The total number of incidents per month, with the participation of rail vehicles in this period, is shown in Figure 1 (green line).

Almost 11,000 people took part in these accidents, most of whom have not been injured. Other victims, according to the type of injuries sustained, were divided into three categories [34]:


Due to the fact that almost 93% of participants in incidents did not suffer any injuries during the period under investigation, it was decided to further analyze only those who suffered health damage or died.

Results of the analysis of road traffic safety in Poland indicate that in the studied period there were 8474 accidents and collisions. Participants in them were 10,960 people, of which 210 died. This is less than 2% of all victims. For comparison, the total number of all road accidents and collisions in this period was 219,863. As many as 18,527 people died in them. Reducing the number of people injured in road accidents is the main goal of all actions undertaken in the area of road safety improvement. For this reason, the authors found it necessary to focus only on accidents and their worst effects. This will allow identifying the most important causes (factors) of these accidents and thus the necessary preventive actions (making the right decisions).

#### **3. Decision Trees**

We consider the feature *Y* (called the injured state), which depends on the value of the features (independent variables) *X*1, ... , *Xm* presented in the previous chapter. One of the possible ways to determine the relationship between features is to construct a decision or regression tree. In the presented case, the *Y* feature is qualitative, so in order to analyze the impact of incident circumstances on the injured person's condition, decision trees have been used.

Let *D* = *x*(*i*), *yi* : *x*(*i*) ∈ *R*<sup>1</sup> × ... × *Rm* , *yi* ∈ *A*, 1 ≤ *i* ≤ *n* be the learning set. For any 1 ≤ *j* ≤ *m* set *Rj* denotes the possible realization of *Xj* feature, but set *A* denotes

the set of possible realizations of a response variable, where cardinality #*A* = *h* > 0 (power of the *A* set or the number of possible classes is equal *h*). For the *i*−th observation, 1 ≤ *i* ≤ *n* the vector *x*(*i*) ∈ *R*<sup>1</sup> × ... × *Rm* denotes the realizations of independent (input) variables (usually feature values that influence to output variable) but *yi* ∈ *A* denotes the value of the response variable. Our task is to define a model, where based on observations *x* ∈ *R*<sup>1</sup> × ... × *Rm* we should predict a victim condition. To assess the features influence on the sufferer state, we will apply the decision tree model.

The tree-based method consists of partition (splitting, division) of the feature space *S* = *R*<sup>1</sup> × ... × *Rm* into a set of separable regions and fitting values of a response variable to appropriate regions. Below, we consider a decision problem for response variable *<sup>Y</sup>*. We split the entire *<sup>S</sup>* feature space into *<sup>S</sup>*1, *<sup>S</sup>*2, ... , *Sk* regions, where *Si* <sup>∩</sup> *Sj* <sup>=</sup> <sup>∅</sup> for 1 ≤ *i* = *j* ≤ *k*. Based on input vector *x* ∈ *S*, we predict the output variable *Y* as follows:

$$f(\mathbf{x}) = \sum\_{j=1}^{k} c\_j I\_j(\mathbf{x}),\tag{1}$$

where,

$$d\_{\dot{j}}(\mathbf{x}) = \begin{cases} 1, & \text{for } \mathbf{x} \in S\_{\dot{j}} \\ 0, & \text{for } \mathbf{x} \notin S\_{\dot{j}} \end{cases} \tag{2}$$

and value *cj* ∈ *A* for 1 ≤ *j* ≤ *k* denotes the most commonly occurring class of response variable in the *Sj* region. From (1), we see that the main task during decision tree building consists of splitting the entire space of features into separated regions.

The regression tree is usually presented in graphic form. The internal tree nodes describe how the division was made, while the leaves correspond to the classes to which the objects belong. The tree edges, in turn, represent the values of the features based on which the division was made.

For each *Sj* region we estimate the classification rates 0 ≤ *pj*1, ... , *pjh* corresponding to elements from set *A* (possible realizations of response variable) where *pj*<sup>1</sup> + ... + *pjh* = 1. The value *pji* represents the proportion of observations in the *j*−th region that are from the *i*−th class. The classification error rate is a fraction of observations in this region that do not belong to the most common class

$$error\_{\hat{\jmath}} = 1 - \max\_{1 \le i \le h} p\_{ji} \tag{3}$$

The decision tree building method consists of portioning of an appropriate region by minimizing the Gini index

$$G\_{\vec{j}} = \sum\_{i=1}^{h} p\_{\vec{j}i} \left(1 - p\_{\vec{j}i}\right) \tag{4}$$

or entropy

$$E\_{\vec{j}} = -\sum\_{i=1}^{h} p\_{ji} \log p\_{ji} \tag{5}$$

From (4) and (5) we can see that the Gini index and entropy take on a small value when the classification rates *pj*1, ... , *pjh* are close to zero or one. Both the Gini index and entropy are referred to as purity of *j*−th node and typically used to assess the quality of a particular split of a region.

The restrictions that can be applied during the division of *S* feature space are: the minimum cardinality of node subject to dividing, the minimum cardinality of the node resulting from dividing, the maximum number of tree levels. Selecting the right tree size can also be adjusted by pruning the original model.

For this purpose, there are selected algorithms being used. Among the most popular are: CART and C4.5 (and then C5.0) algorithms. Additionally, CHAID [35], QUEST, THAID and others [36] can be used. In our analysis, we employ the CART algorithm to select the important features' influence on accidents result.

The decision trees suffer from high variance. One of the possible techniques to improve the predictions obtained from decision trees is bagging [33]. The main idea depends on creating an ensemble of decision trees based on several bootstrapped training sets. These training sets are chosen randomly with replacement from the data set and are used to train the decision trees. The variance is reduced by aggregating a set of predictions obtained from an ensemble of trees. For classification trees, we take a majority vote from the obtained class predicted by each tree.

The random forest is an extension of the bagging method [29,30]. The main difference is that during the making of the training set for each tree we randomly choose the set of features from a full set of features. Thus, we make an ensemble of random trees. A multitude of random trees is called a random forest. This technique avoids the problem of selecting the dominant predictor in the split of space for each tree. The predictions obtained from trees with randomly selected features are less correlated, thereby making the average of the predictions obtained from regression trees or majority vote from classification trees less variable and more reliable.

Another technique to improve the results obtained from the decision trees is boosting [33]. Boosting, like bagging, depends on creating an ensemble of decision trees. For bagging, we adapt the trees to training sets chosen randomly from the data set. By applying the boosting technique, the trees are made sequentially, i.e., the current tree is built based on information from the previously grown tree and the response variable in the current tree is defined as residuals (not explained outcomes) from the previous tree.

The adaptation of a large decision tree to the data can be hard and potentially overfitting. The boosting approach results in the learning process being slow. By adding the current decision tree into the ensemble of trees in order to update the residuals, we define the model that explains the dependences between outcomes and features. Each of these trees can be small but by adapting the small trees to the residuals, we improve the outcomes (response variable) in areas where this does not work well. It is the main benefit of this method. In general, the learning process is slow and sequential but tends to explain the dependences well. In our analysis, we employ the XGB (eXtreme Gradient Boosting) algorithm [31,32] to select the influence of important features on the accidents' result.

Various measures are used to evaluate the classifier. Most often, the basis for their definition is the confusion matrix. The columns of this matrix determine the actual decision classes while rows determine the decisions predicted by the model. The *Nij* value at the intersection of the *i*−th verse and *j*−th column specifies the number of observations of *j*−th class classified into the *i*−th class, 1 ≤ *i*, *j* ≤ *h*. In general, the case has the form presented in Table 1.


**Table 1.** Form of the confusion matrix.

For each possible realization, we estimate the basic values. For the *j*−th class (1 ≤ *j* ≤ *h*), the TP (true positive) denotes a number of outcomes (instances) that are correctly classified for this class, the FP (false positive) is the number of outcomes that are classified for the class but they do not belong to it, the FN (false negative) is the number of outcomes that belong to the class but are incorrectly classified, the TN (true negative) is the number of correctly classified outcomes that do not belong to the class. According to the notation presented in Table 1 for the *j*−th class (1 ≤ *j* ≤ *h*), we determine the basic values as follows:

$$TP = N\_{\vec{j}\vec{j}\prime} \cdot FP = \sum\_{\substack{\vec{i} = 1, \\ \vec{i} \neq \vec{j}}}^{h} \mathbf{i} = \mathbf{1}\_{\prime} \cdot N\_{\vec{j}\prime} \tag{6}$$

$$FN = \sum\_{\substack{i=1 \\ i \neq j}}^h i = 1, \quad N\_{ij}, \quad TN = \sum\_{\substack{i,j=1 \\ i \neq j}}^h N\_{ij} - TP - FP - FN \tag{7}$$

Additionally, for each class, we estimate the following basic metrics:

1. Sensitivity (Recall, True Positive Rate—*TPR*), indicating to what extent the truly positive class has been classified as positive:

$$\text{TPR} = \frac{TP}{TP + FN} \tag{8}$$

2. Specificity (True-Negative Rate—*TRN*), indicating to what extent the truly negative class has been classified as negative:

$$\text{TNR} = \frac{TN}{TN + FP} \tag{9}$$

3. Positive Predictive Value (*PPV*), indicating with what certainty we can trust positive predictions, i.e., in what percentage are the positive predictions confirmed by the truly positive state:

$$PPV = \frac{TP}{TP + FP} \tag{10}$$

4. Negative Predictive Value (*NPV*), indicating with what certainty can we trust negative predictions, i.e., in what percentage the negative predictions are confirmed by the truly negative state:

$$\text{NPV} = \frac{TN}{TN + FN} \tag{11}$$

5. *Prevalence* is the fraction of cases possessing the examined feature (it shows how often the positive class occurs in the sample).

$$\text{Prevalence} = \frac{TP + FN}{TP + TN + FP + FN} \tag{12}$$

6. *Detection rate* shows the number of correct positive class predictions as a proportion of all of the predictions made.

$$\text{Detection rate} = \frac{TP}{TP + TN + FP + FN} \tag{13}$$

7. *Detection prevalence* or predicted positive condition rate (PPCR) is the percentage of observations that the classifier predicted as positive (it illustrates the feasibility of the model in practice).

$$\text{Detection Percentage} = \frac{TP + FP}{TP + TN + FP + FN} \tag{14}$$

8. *Balanced accuracy* is an average arithmetic sensitivity and specificity, specifying the average number of predictions for each class, correctly classified by the model (it finds better use when we have just one test set, and it is not balanced).

$$\text{Ballanced Accuracy} = \frac{TPR + TNR}{2} \tag{15}$$

Additionally, for the entire classifier, we determine accuracy (*ACC*), which denotes the fraction of all instances that are correctly categorized:

$$\text{ACC} = \frac{\sum\_{i=1}^{l} N\_{ii}}{\sum\_{i,j=1}^{l} N\_{ij}} \tag{16}$$

#### **4. Results**

The CART algorithm was used for the construction of decision trees. The influence of the characteristics of the province and the time of the incident on the condition of the victim was investigated. There were 631 observations used for the construction. The following symbols were adopted for individual provinces: B—Podlaskie, C—Kujawsko-Pomorskie, D—Dolno´sl ˛askie, E—Łódzkie, F—Lubuskie, G—Pomorskie, K—Małopolskie, L—Lubelskie, N—Warmi ´nsko-Mazurskie, O—Opolskie, P—Wielkopolskie, R—Podkarpackie, S—Sl ˛ ´ askie, T—Swi˛ ´ etokrzyskie, W—Mazowieckie, Z—Zachodnio-Pomorskie. Figure 2 presents the classification tree with maximum depth equalling 5. This tree contains only seven rules.

**Figure 2.** Classification tree for the variables of place and time of the accident.

The accuracy of the model taking into account only two variables is not satisfactory, with the value of ACC = 0.517. The accuracy of the predictions is presented by the confusion matrix—Table 2. The elements on the main diagonal indicate correctly classified observations.

**Table 2.** Confusion matrix for the decision tree based on location of incident and time.


Because the quality of the prediction is not satisfactory, a decision tree was constructed that includes a higher number of predictors. The following variables were taken into account: time of day, month, type of vehicle, type of participant, existence of traffic lights at the level crossing, age of the victim, area (developed, undeveloped), and location of the incident (province).

The classifier includes 49 decision rules. Its extensive form prevents legible, graphical presentation. Therefore, only a descriptive characteristic of the model was made using a matrix of errors, basic measures of states and a graph with variable importance.

The inclusion of additional predictors has improved the quality of the classifier. The accuracy is ACC = 0.679. Table 3 shows the confusion matrix. On the main diagonal, there are correctly classified observations.


**Table 3.** Confusion matrix for the extended decision tree.

The predictors, ranked by their importance, are shown in Figure 3. This figure shows the percentage of decrease of the Gini index (interpreted as gain) for the construction of the classification tree. The evaluation of the importance of the predictors' impact on the dependent variable has indicated a significant impact primarily of the location of the accident (province) and the time of the incident. A detailed evaluation of the model was made by analyzing measures for each of the singled-out injury levels. The sensitivity and specificity take values exceeding 73% (except for the "serious injury" class for which sensitivity is about 49%). The precision of positive and negative prediction is high (up to 75% except for the "death" class). The remaining results are presented in Table 4.

**Figure 3.** Variable importance ranking for the extended decision tree.

**Table 4.** Detailed measures for the individual injury classes.


In order to verify the influence of each feature, a random forest was also constructed, consisting of 50 trees, where three features were randomly selected for the learning set. As before, we present the results in the form of a confusion matrix (Table 5), a matrix of the basic measures for each state (Table 6) and the variable importance plot (Figure 4).


**Table 5.** Confusion matrix for the random forest.

**Table 6.** Detailed measures of the random forest for the individual injury classes.


**Figure 4.** Predictor importance ranking for the random forest.

By comparing Tables 3 and 5, we can see that the quality of the classifier for the random forest is better than for the extended decision tree. For the random forest, the accuracy is equal to 0.913.

From Figure 4, we can see that the predictors: age of the victim, time, province and month have the greatest impact on the variable describing the state of the injured person. Basic metrics for the random forest are presented in Table 6.

From Table 6, we see a significant improvement in metrics sensitivity, specificity, balanced accuracy, PPV and NPV.

The classifier was also constructed using the boosting technique. In this case, it is assumed that the maximum depth of a tree equals 3, and the maximum number of boosting iterations is 50. As before, we present the results in the form of a confusion matrix (Table 7), matrix of the basic measures for each state (Table 8) and a variable importance plot (Figure 5).


**Table 8.** Detailed measures of boosting tree model.


By comparing Tables 3, 5 and 7, we can see that the quality of the classifier for the boosting tree model is the best. For this model, the accuracy equals 0.972.

From Figure 5, we can see that the province predictor dominates the others. The time, month and vehicle type predictors have a significant impact on the variable describing the state of the injured person. Basic metrics for the boosting tree model are presented in Table 8.

From Table 8, we see that the sensitivity, specificity, balanced accuracy, PPV and NPV metrics are over 0.95.

The most powerful predictor, in the case of decision trees and the boosting tree model, is the location of the accident, in this case defined by the province. This can be due to two reasons. The first is the discrepancy in the state and quality of both linear and point infrastructure in the individual regions of Poland. These elements remain in the sphere of administration of the local government units. This condition applies to as much as 95.4% of all roads in the country and means different management, financing and control. Because the impact of infrastructure on both the number of accidents and injuries is significant [37], the impact of location in the presented study is also significant. Another reason influencing this result is the different densities of the railroad network in individual regions of Poland. In eastern Poland, it is smaller than in western Poland. However, central Poland has the largest number of railroad lines. Moreover, the number of road users as well as their mobility vary between provinces. This activity, different not only in particular areas but also during the times of the day, results in the next factor strongly influencing the model being the time of the event.

Another important predictor related to time is the month. The increased number of traffic accidents is probably influenced by the increase in traffic associated with vacation activity from May to September. It may also be due to the reduced alertness of drivers focused on resting. The fact that some drivers are not daily users of cars and have little experience in driving also contributes to the accidents. The fourth most important is the type of vehicle. Among the established rules, the most common means of transport is the bicycle, the passenger car and the truck. Additionally, in many situations, cyclists' deaths are equal to or very close to 100%.

In the case of the random forest, the age of the victim was the first factor contributing to injury in accidents. The other factors with the highest influence were the same but ranked in a different order (time, province, month).

The influence of the remaining predictors was significant but smaller. The existence of traffic lights and developed areas is conducive to the tragic consequences of accidents. The drivers are more likely to die than passengers. The influence of alcohol is also an important factor, but it should be emphasized that this result is affected by a small number of accidents in which such a violation was noted. In the analyzed sample, there is less than 1.2% of them.

#### **5. Conclusions**

The classification trees are a flexible, user-friendly, and easy-to-interpret tool for analyzing large sets of observations, consisting of many variables. Their biggest advantages include transparency and readability of the result presented in the form of rules, as well as no requirements for the form and distribution of variables. Additionally, it is necessary to emphasize their insensitivity to the occurrence of non-typical observations and deficiencies in the data set.

However, in the analyzed case, the application of this method resulted in an accuracy of 68%. This result was improved by applying the boosting tree model for which the highest accuracy of 97% was achieved. In both cases, the result was the same. The most important predictors were: province, time, month and vehicle time. An additional method proposed was random forest, for which ACC = 91%. This model indicated a different order of predictors, placing the age of the victim first.

The results obtained, in addition to indicating the factors increasing the risk of certain injuries in an accident, also indicate the need to develop comprehensive solutions for the entire country in terms of improving road safety. Such a large impact of location may result from the different functioning of individual local government units and the differences in administration of the governed infrastructure. Therefore, it is necessary to develop and implement common standards and equalize the differences between individual regions, particularly in relation to the condition of the road, its surroundings and road equipment.

The obtained measurable results of measurements concerning the influence of the examined factors on traffic safety at railroad crossings provide information on such actions, which are conducive to improving road–rail traffic safety through:


A limitation of the analysis presented in this paper is, first of all, the qualitative form of most of the variables. It limits the possibility of research to classificatory methods. The quality of the presented research is also affected by the number of recorded factors. Especially in the framework of further considerations, the authors would like to take into account the volume of traffic. This factor is very important from the point of view of road– railway traffic safety, but it is not monitored at most of the railroad crossings. Generally speaking, traffic volume monitoring in Poland concerns mainly selected regions (mostly intersections of big cities). However, the dynamic development of smart transportation systems is conducive to obtaining the necessary information [38], so analyses that take this factor into account for a smaller area will probably be possible soon.

**Author Contributions:** Conceptualization, A.B., E.K. and A.S.; methodology, A.B. and E.K. software, ´ A.B. and E.K.; validation, A.B. and E.K.; formal analysis, A.B. and E.K.; investigation, A.B. and E.K.; resources, A.S. and P.S.; data curation, A. ´ S. and P.S.; writing—original draft preparation, A.B., E.K. ´ and A.S.; writing—editing, A.B., E.K. and A. ´ S.; visualization, A.B. and E.K. All authors have read ´ and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in a publicly accessible Mendeley Data repository: http://dx.doi.org/10.17632/7vwx2pnbx5.1 (accessed on 11 May 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Optimal Pricing of Vehicle-to-Grid Services Using Disaggregate Demand Models**

**Charilaos Latinopoulos \*, Aruna Sivakumar and John W. Polak**

Department of Civil and Environmental Engineering, Imperial College London, London SW7 2BU, UK; a.sivakumar@imperial.ac.uk (A.S.); j.polak@imperial.ac.uk (J.W.P.)

**\*** Correspondence: charilaos.latinopoulos10@imperial.ac.uk

**Abstract:** The recent revolution in electric mobility is both crucial and promising in the coordinated effort to reduce global emissions and tackle climate change. However, mass electrification brings up new technical problems that need to be solved. The increasing penetration rates of electric vehicles will add an unprecedented energy load to existing power grids. The stability and the quality of power systems, especially on a local distribution level, will be compromised by multiple vehicles that are simultaneously connected to the grid. In this paper, the authors propose a choice-based pricing algorithm to indirectly control the charging and V2G activities of electric vehicles in non-residential facilities. Two metaheuristic approaches were applied to solve the optimization problem, and a comparative analysis was performed to evaluate their performance. The proposed algorithm would result in a significant revenue increase for the parking operator, and at the same time, it could alleviate the overloading of local distribution transformers and postpone heavy infrastructure investments.

**Keywords:** electric vehicle charging; vehicle-to-grid; genetic algorithms; particle swarm optimization; demand-side management; discrete choice theory; revenue management

**1. Introduction**

Advances in battery technology, the low emission factors, the low operation costs and the high fuel economy of Battery Electric Vehicles (BEVs) and Plugged-in Hybrid Electric Vehicles (PHEVs) are some of the reasons that the family of Electric Vehicles (EVs) has attracted a lot of attention over the last few years. New models with extended capabilities and longer electric ranges are presented every year by major automobile manufacturers [1]. The latest generation of EVs shifts the paradigm towards new markets by providing extended full electric ranges of over 300 km and significantly reducing the problems associated with "range anxiety" [2].

The adoption of EVs in private transportation could ultimately lead to a replacement of crude oil with cleaner energy sources. At the same time, they can be transformed from unidirectional devices that draw power from the grid to bidirectional assets that transfer power back. Vehicle-to-Grid (V2G) may enable drivers to provide ancillary services to the grid in exchange for financial returns, as well as to contribute in alleviating peak power demand [3]. Kempton and Tomic [3,4] have compared existing grid services (spinning frequency reserves, peak power supply and regulation) with Vehicle-to-Grid (V2G) support and concluded that using EVs for regulation can offer the most substantial returns to vehicle owners.

The accelerated growth in electric mobility also demands the development of new methods to address their implications for the power grid. The stability of the power system is at stake, particularly when charging events cluster in space and time. Without charging coordination, the variations in charging demand could have a great impact on the electricity market. Peak power demand could be deteriorated without investment in charging infrastructure at working places and throughout cities.

**Citation:** Latinopoulos, C.; Sivakumar, A.; Polak, J.W. Optimal Pricing of Vehicle-to-Grid Services Using Disaggregate Demand Models. *Energies* **2021**, *14*, 1090. https:// doi.org/10.3390/en14041090

Academic Editors: Albert Y.S. Lam and Javier Contreras

Received: 3 November 2020 Accepted: 12 February 2021 Published: 19 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Public or non-residential private parking facilities are characteristic examples of places where large numbers of EVs can concentrate in short periods of time. In order to avoid system disruptions, the aggregated load in the parking facility should be closely coordinated. Techniques that achieve this are widely known as "smart charging".

Smart charging can be either centralized or decentralized. In centralized approaches, EVs transmit a signal with the required State of Charge (SOC) and the desired parking duration, and a control unit allocates charging times and sends the information back to the charger/inverter of each vehicle [3–7]. In decentralized smart charging, the information on spatiotemporal demand is transmitted in the form of incentives that are processed by the in-vehicle controller, which optimizes the charging intervals of the vehicle [8–18].

This process can be facilitated by a Charging Service Provider (CSP), which coordinates charging events with the objective of optimizing one or more from the list below: power losses, transformer overloading, system operation costs, generation costs, vehicle integration, the cost of the power supply, costs for individual drivers, balancing demand and supply, and revenue for the CSP.

For this study, it was assumed that CSPs are contracted as intermediate agents to carry out the task of charging and V2G coordination for parking operators in their control area. As a result, they have a threefold role: (a) to provide EV drivers with their desired SOCs at departure time, (b) to exchange electricity in two directions by respecting the local grid constraints from the Distribution System Operators (DSOs) and (c) to maximize revenue for contracted parking operators. If the parking operators were acting directly as Charging Point Managers (CPMs), there would be no need for the intermediate services of the CSP. These services can be classified as Business-to-Business (B2B) and Business-to-Customer (B2C), and they are schematically presented in Figure 1.

**Figure 1.** Business-to-Business (B2B) and Business-to-Customer (B2C) services of Charging Service Providers (CSPs) in public and private parking facilities with Vehicle-to-Grid (V2G)-enabled charging infrastructure.

As they grow in size, EV fleets offer greater load flexibility to the CSP. For example, if an individual vehicle needs a lot of energy over a limited period, this energy can be balanced from drivers who are more flexible in their demand and are willing to postpone charging or even provide V2G services. Load flexibility is valuable for charging control, and price incentives can be applied to promote longer charging events with lower power rates and discourage short, power-intense intervals [19].

The work presented in this paper has contributed to the existing literature by developing a Revenue Management (RM) framework for charging coordination. In this framework,

EV drivers reserve, in advance, a parking-and-charging bundle with certain characteristics, such as a charging location, start time and duration, as well as a charging rate. For the users, this approach offers more transparency and control over the charging parameters, and a better understanding of the underlying mechanisms compared to typical scheduling algorithms. For example, if they wish to leave the charging facility earlier, they can estimate, with precision, the final SOC at that time. However, this transition of control from the CSP to the EV driver does not necessarily mean a loss of flexibility. On the contrary, CSPs can vary prices to incentivize low-power bundles; they can better predict EV arrivals, segment their users and optimize their operations in advance.

The modelling framework was evaluated in a microsimulation framework with synthesized activity patterns from a London-based travel diary and choice parameters adopted from a stated preferences experiment [20]. Different EV penetration rates were simulated for a commercial and a shopping area in the city centre.

The rest of the paper is structured as follows: Section 2 discusses the literature review that is relevant to the context and the methodological approach of this study, while the proposed optimization algorithm is presented in Section 3. In Section 4, a brief overview of the data sources that were used for this study is provided along with a demonstration of the simulation framework. Section 5 presents the results and a comparative analysis. Finally, the outcomes of the study are discussed in Section 6.

#### **2. Literature Review**

#### *2.1. State of the Art in EV Scheduling and V2G Optimization for Non-Residential Facilities*

Several studies have investigated the optimal charging of electric vehicles and the provision of V2G services for peak power and ancillary markets [21–33]. In these studies, the objectives of the control algorithms vary significantly between maximizing economic factors and minimizing the impact on the power network.

The addition of V2G services on top of regular charging complicates the decision process for EV owners. The fast recovery of SOC is desirable because it means that smart charging does not interfere significantly with the daily schedules of the owners. On the other hand, arbitrage techniques based on wholesale electricity prices can generate profits for them at the expense of recharging speed [4].

Rotering and Ilic [21] present a dynamic programming approach to solving this V2G optimization problem, where the objective is to maximize the profit for the EV owners and the decision variable is the daily SOC curve. Zhang et al. [22] examined the potential application of V2G for ancillary services by developing optimal methods for voltage regulation and reactive power control. Shokrzadeh et al. [23] present optimal scheduling strategies for minimizing harmful operating conditions for distribution transformers at a neighbourhood level. DeForest et al. [24] present a centralized method for optimizing EV charging and bid capacity via V2G while maximizing the profit and minimizing the operation costs for the aggregator. Vandael et al. [25] propose a multiagent system for charging coordination with the objective of minimizing imbalance in a smart grid.

Non-residential parking facilities such as parking garages or supermarket parking lots are characteristic examples of places where large numbers of EVs can concentrate in short periods of time. As a result, there is an increasing volume of studies that are shifting their focus away from home recharging.

Yao et al. [26] investigated the optimal integration of EVs in a parking station with the distribution grid according to two different objectives: an economic and a technical one. Shafie-khah et al. [27] present an optimization problem for a parking operator that satisfies demand while curtailing loads for a power utility. They suggest that a constant power rate over the charging session, which is in line with the present paper implementation, can prolong the battery service time. Zhang et al. [28] also modelled the parking operator as a demand response aggregation agent, but they found the optimal level of participation in a set of demand response programs (price-based and incentive-based) instead of focusing on one. Su and Chow [29] developed an intelligent energy-management system for EVs that

charge at a municipal parking site, which they solved using probabilistic model-building genetic algorithms.

Finally, the studies that explore the interaction between utilities and parking lots, when V2G is available, are more limited.

Mehta et al. [30] implemented a water-filling algorithm to minimize the variance of the load profile for workplace car parks as a means of reducing peak demand. Babic et al. [31] propose a framework where the parking operator is a broker with two distinct roles: (a) an energy retailer and (b) a player in a target electricity market. The electricity trading functionalities of the operator were evaluated within an agent-based simulation. Moradijoz et al. [32] used genetic algorithms to solve a multi-objective algorithm that optimizes the sites and sizes of parking lots that provide V2G services to the grid.

Probably the research work most closely related to the present paper is that of Hashimoto et al. [33]. In their study, drivers could reserve charging and V2G services through an auction-based system, and the improvements in revenue for the operators were evaluated. It also presents similar methodological assumptions such as discretizing hour intervals for parking reservations and billing by parking duration. In addition, a probability distribution for parking users' willingness to pay was derived from the completion of a questionnaire addressed to them. Finally, the arrival and departure patterns were modelled according to real parking data.

Summarizing the state of the art in EV scheduling methodological approaches, while much effort has been devoted to assessing the economical and network impact of charging and V2G, the current practice largely relies on simplistic representations of charging and parking demand.

#### *2.2. Representation of Charging Demand*

In the majority of relevant studies, as was also pointed out by Daina et al. [34], smart charging appraisals adopt predefined charging scenarios and exogenous EV use patterns. The attributes that affect the charging process (the start and end times of charging sessions, initial SOC, subsequent trip duration, parking duration, etc.) are calculated by drawing values from typical probability distributions. Fazelpour et al. [35] used probability distributions of arrival rates and arrival times in a movie theatre parking lot in Tehran, in order to optimize the charging rates of the vehicles. Vandael et al. [25] used fixed charging scenarios, and they assumed that the charging rate is constant during the charging process.

In some rare cases [28], these values have been validated with historical parking information. When real-world charging data are available and are used to deduce charging flexibility [29], these data are not adequate for capturing the elasticity of drivers to charging parameters, because their choices are constrained by the limitations of the provided options.

Recently, there have been some efforts to predict charging behaviour with machinelearning methods. The impact of accurate predictions on charging scheduling has been demonstrated in [36], where the authors suggest a potential 27% decrease in peak load. However, when using historical data of parking durations and energy consumption to predict user behaviour, there is a lack of understanding of individual users' preferences for service attributes, such as the location of the charger and the charging rates.

The prominent novelty of the choice-based pricing optimization that was developed in the present paper is the representation of the charging behaviour in a random utility context and the use of parameters that were empirically estimated from user-tailored choice experiments for charging choices. In a previous study by the authors, the results from these experiments were used under expected and non-expected utility frameworks to understand how people perceive price probabilities and how risk averse they are when they book charging events in advance [37].

Our model bridges the gap in the literature by capturing the heterogeneity in activitytravel and charging preferences and transferring this disaggregate information to a pricebased control mechanism.

One of the first studies that modelled the implication of discrete choice models for smart charging services was [34], but in the context of home-based charging activities. In [8,11], the authors developed an activity-based microsimulation where the electric vehicles were controlled with smart charging. Nevertheless, the willingness-to-pay assumptions are not backed by empirical estimation based on revealed or stated preference data. The parameters in the utility function are somewhat arbitrarily tuned using the difference between the forecasted electricity price for next day and the current equivalent price of gasoline. This trade-off makes sense in the decision process for a PHEV driver who can run in both electric and gasoline modes, yet it does not adequately capture the behaviour of a BEV driver.

The integration of charging coordination with the demand response of EV users is achieved through the implementation of revenue management. Before proceeding to the methodological framework in Section 3, there is a brief review of RM and its existing applications in the context of parking and charging.

#### *2.3. Revenue Management for Electric Vehicle Charging*

Revenue management is a widely adopted method for the allocation and pricing of non-storable services and perishable goods in the service industry. It made its first appearance in the 1970s, when airline companies were deregulated. Data analytics were adopted to differentiate the fares for seats located in the same cabin, and the operators started making dynamic decisions based on predicted demand [38]. Today, we encounter RM techniques in car rental services, hotels, hospitality and most of the industries where there is an inventory with capacity constraints [39].

The first revenue-management approach for a car parking lot was presented in Guadix et al. [40]. In a later study, car parking revenue maximization was achieved by finding the optimal balance between early subscriptions and last-minute drive-in users [41]. In [42], the authors pursued the goal of maximizing revenue for a parking lot in a slightly different manner. A fuzzy-logic-based intelligent parking system with learning capabilities decided, in real time, which reservations to accept and which to reject.

To the best of the authors' knowledge, the first study where revenue management was applied for electric vehicle charging was that of Flath et al. [43]. The optimization objective was to minimize disruption at the distribution network level and to balance energy demand with supply. In this approach, there were two main differentiations from conventional RM methods: (a) instead of discrete inventory units (e.g., hotel rooms, airplane seats, etc.), the energy provided to the EV drivers was continuous, and (b) limited and highvalue transactions were replaced by frequent and low-cost ones. In terms of customer segmentation, it is assumed that there are two types of users: drivers who charge their vehicles on a regular schedule (e.g., at home or work) and drivers who have a spontaneous need for topping-up their batteries.

Compared to [43], our methodological approach adds the physical dimension of charging-post availability, moving from a single-resource (energy amount) to a dualresource allocation. Most importantly, it uses a sophisticated representation of charging demand, and it enables customer segmentation using both quantitative and qualitative choice parameters.

#### **3. Choice-Based Price Optimization**

#### *3.1. Charging Offer Set*

Charging services in the rest of the paper are represented as "bundles" that combine a parking place with the electricity for recharging. As with existing revenue-management applications, this study developed an online reservation system where EV drivers can book their charging bundles up to 1 h before arrival. This system should display all the available options at the time of reservation, including their prices and other service attributes. By packaging out-of-home charging with other parking services, overall prices could become even more appealing than home energy tariffs and attract sufficient charging events to boost investment in public infrastructure.

This is not the first study where users have had the option to choose between different charging offers. In [44], the authors suggest a menu-based pricing system where the users select among contracts of fixed energy quantity and time windows.

The charging bundle offer set for the EV customers was designed by the CSP, and it could either include or not V2G services. The optimal pricing algorithm that is presented here used as input the demand for parking and charging along with the elasticities to the characteristics of each bundle. Given that the size of the price menu escalated exponentially with additional hours of operations, multiple periods of four-hour slots were evaluated.

Moreover, the various bundles were categorized according to their charging rates, which ranged between 3 and 12 kW. It was decided that rapid chargers should not be included in the analysis, due to their relatively low availability at the time that the research was undertaken. The charging preferences of EV customers were estimated under a Latent Class (LC) model.

The optimization problem has two capacity constraints. The first one is the physical constraint of the available charging posts across the parking lots. The second one is the power constraint that is defined by the remaining capacity available to the DSO. While the objective of the algorithm is revenue maximization for the CSP and the contracted parking facilities, the demand-driven management of charging events has the potential for peak shaving and the alleviation of bottlenecks in the local distribution network.

One of the ethical concerns around dynamic pricing is the fairness of the prices. For this reason, the fairest approach is to differentiate prices for EV services based on the impact they have on the grid. Therefore, we need to make sure that we apply accurate bounds for each charging bundle. In order to achieve that, the following equation was formulated:

$$p\_{\rangle}^{\*} = p\_{\text{hour}} + p\_{\text{base}} \mathcal{R}\_{\text{j}} T\_{\text{j}} \mathcal{C} D\_{\text{j}} A r\_{\text{j}} \tag{1}$$

where *p*∗ *<sup>j</sup>* is representative of the specific charging service, and the upper bound is defined at 1.5 *p*∗ *<sup>j</sup>* ; *phour* is the parking price for an hour, *pbase* is the price for baseload electricity, *Rj* is the factor that penalizes power-intensive services, *Tj* is the factor that penalizes peak-load time intervals, *CDj* is the charging duration and *Arj* is the factor that penalizes parking lots with high occupancy. All these factors were normalized in a way meaning that the electricity price varied between the base price (10 p/kWh) and a maximum of 55 p/kWh.

In the UK power market, after the actual delivery of electricity, the differences between demand and supply are resolved with an imbalance settlement. This settlement recovers the costs for the system operator by compensating every entity that produced an energy surplus and charging every entity that produced a deficit [45]. Therefore, if the CSP agrees with the DSO for a certain amount of power supply, and the actual demand for EV charging is lower than the expected one, each unit of deficit will have to be reimbursed according to the market index for imbalance costs.

Imbalance prices should be higher for peak-load periods because these are translated to higher costs for the TSO; as a simplification of the complexity of real-market trading values, the imbalance price for each charging bundle was estimated using the *Tj* factor from Equation (1). Subsequently, these prices were used to calculate the costs from excess power capacity, which were subtracted from the charging service profits to calculate the net revenue for the CSP.

The analysis went a step further by incorporating V2G services in the bundles offered by the CSP. Preferences for V2G services were not estimated because there was a high risk of compromising the estimation validity by augmenting the stated preferences experiments conducted in [20] with V2G scenarios and increasing the complexity for the respondents. Therefore, an assumption was made that the sensitivities to the selling price and discharging amount are identical to those for the buying price and charging amount. The only difference was that the marginal utilities would then have the opposite signs.

While charging and V2G behaviour are assumed to be symmetric, it is likely that this is not the case in reality. For example, the marginal disutility from discharging could be higher for the drivers because of the degradation of the battery or range anxiety. As there are increasing examples of V2G trials around the world, future research could explore these behavioural nuances.

When V2G services are provided, the offer set is extended from 46 to 60 charging bundles. Therefore, EV drivers with low energy requirements (<1 kWh/day) are presented with an extended choice set, where all 14 discharging alternatives deliver a 6 kWh discrete energy quantity to the grid, using different combinations of discharging rates and plug-in durations.

As Moradijoz et al. [32] elaborate, the revenues from V2G power depend on the type of the electricity market that it is sold to. For example, there are markets that pay for energy such as the peak-power market and markets that pay for the available capacity and only require having the vehicle plugged in, such as ancillary services [4]. In the following analysis, only peak-power services were taken into account.

In the next section, we show that by incorporating a latent class model, within the choice-based formulation, it is possible to capture the taste heterogeneity among EV users.

#### *3.2. Utility Specification*

The utility of the individual user n selecting the charging bundle j under a discrete choice model specification is the following:

$$dI\_{jn} = A S C\_j + \beta\_X X\_{jn} + \beta\_{XY} X\_{jn} Y\_n + \beta\_{XZ} X\_{jn} Z\_n \tag{2}$$

where *ASCj* is the alternative specific constant of the charging bundle j, *β<sup>X</sup>* is a set of parameters to be estimated, *X* is a vector of charging-bundle-specific attributes, *βXY* and *βXZ* are sets of parameters for interaction terms, *Y* is a vector of the travel and charging patterns of the individual n, and *Z* is a vector of individual attributes of the decision maker such as age, marital status, employment type, income, gender and parental status. In particular, the vector *X* consists of the following characteristic parameters of the charging service:


The last two parameters (i.e., *CISDEjn* and *CISDLjn*) are based on the theory behind time-of-travel-choice modelling. In particular, the methodology developed in Vickrey's seminal paper [46] suggests that when a commuter choses what time to leave for work, this decision comes after a trade-off between travel time and the measures Schedule Delay Early (SDE) and Schedule Delay Late (SDL). These two measures were defined as follows:

$$SDE = \max\left(PAT - (t\_d + TT(t\_d)), \, 0\right) \tag{3}$$

$$SDL = \max\left(t\_d + TT(t\_d) - PAT\_{b\prime}, 0\right) \tag{4}$$

where *PAT* is the preferred arrival time, *td* is the time of departure from home and *TT*(*td*) is the travel time, which is a function of the departure time. We defined *CISDEjn* and *CISDLjn* in a similar fashion to the disutility of starting the activity earlier or later, respectively, due to the starting time of the charging event. To better clarify these terms, they are visually demonstrated in Figure 2 for a hypothetical daily scenario.

**Figure 2.** Graphical representation of charging-induced schedule delay for four scenarios: (**a**) there is no schedule disutility, as the charging and the parking episodes are identical; (**b**) there is no schedule disutility, as the charging episode is a subset of the parking episode; (**c**) there is Charging-Induced Schedule Delay Early (CISDE) because both episodes start before the Preferred Arrival Time (PAT); and (**d**) there is Charging-Induced Schedule Delay Late (CISDL) for the next activity because both episodes finish later than the typical departure time. The red dotted lines in (**c**,**d**) represent the actual arrival times that are different from the preferred arrival times.

The majority of the above parameters were estimated in [20]. The parameter for the energy amount *β<sup>E</sup>* was based on a previous estimation of EV drivers' sensitivity to postcharging SOC [47], and the parameter for CISDL was obtained from the CISDE coefficient using the SDE-to-SDL ratio estimated in another study for London commuters [48].

#### *3.3. Latent Class Specification*

Latent Class (LC) models are typically applied for segmentation, since they identify classes of users with distinct choice behaviours. As a result, the objective of these models is to achieve intrasegment homogeneity and intersegment heterogeneity based on a set of attributes. Individuals are attributed to each of the classes probabilistically, by estimating class-membership probabilities. A typical LC model formulation is the following:

$$P\_n\left(j\middle|X\_{\text{jn}}, Y\_{\text{n}}, Z\_{\text{n}}, \beta\right) = \sum\_{\mathbf{x}=1}^{K} P\_n\left(j\middle|X\_{\text{jn}}, Y\_{\text{n}}, Z\_{\text{n}}, \beta\_{\mathbf{x}}; \mathbf{x}\right) P\_n(\mathbf{x}|z\_n) \tag{5}$$

where *K* represents the total number of classes, *zn* is the set of attributes that makes the behaviour across the segments distinctive and *Pn*(*κ*|*zn*) is the probability that the user *n* belongs to class κ, conditional on *zn*, widely known as the class-membership probability. It is deduced that ∑ *<sup>K</sup> <sup>κ</sup>*=1*Pn*(*κ*|*zn*) = 1. Additionally, *Pn j Xjn*,*Yn*, *Zn*, *βκ*; *κ* is the classspecific probability calculated in (6).

$$SP\_n\left(j\middle|X\_{\bar{\jmath}n\prime}\,\,\,Y\_{n\prime}\,\,Z\_{n\prime}\,\beta\_{n\prime};\kappa\right) = \frac{e^{\mathcal{U}\_{\bar{\jmath}n}}}{\sum\_{j=1}^{l}e^{\mathcal{U}\_{\bar{\jmath}n}} + e^{\mathcal{U}\_{\bar{\jmath}0}}}\tag{6}$$

where *J* is the total set of bundles and *U*0*<sup>n</sup>* is the utility gained from the "no buy" choice. Since "no buy" was not an option in the choice experiments, this utility was approximated by calibrating the alternative specific constant of Equation (2) in a way implying that a small share of the EV drivers did not buy any of the charging bundles.

The typical formulation for a class-specific probability is the multinomial logit (MNL) model; however, other Generalized Extreme Value (GEV) models, such as nested logit or cross-nested logit, can also be adopted to relax some of the hard assumptions of MNL.

The empirical estimation of the latent class model in [20] identified two latent classes: a class where the common characteristic was the high elasticity to price, and a class that was more sensitive to time coefficients, such as walking time and charging duration. The class-membership model is shown in Equation (7).

$$P\_n(\kappa|z\_n) = \frac{\mathfrak{e}^{\delta\_\kappa + \gamma\_\kappa z\_n}}{\sum\_{l=1}^K \mathfrak{e}^{\delta\_l + \gamma\_l z\_n}} \tag{7}$$

where *δκ* is a constant that is specific to the *k* class and *γκ* are the parameters to be estimated.

#### *3.4. Price Optimization*

In revenue management, it is commonly assumed that the demand is homogeneous. The introduction of discrete choice models as a means to better capture customer behaviour was first attempted with the Choice-Based Deterministic Linear Program (CDLP) [49]. MNL models were replaced by more sophisticated specifications such as nested logit [50] and LC models [51] in subsequent studies.

The objective of the optimization problem developed in this paper is to maximize the expected revenue of the CSP for each four-hour period. The final solution is a menu-based pricing strategy, which should satisfy the constraints for the two-dimensional capacity (charging-post and power availability). The decision variables of this problem, *pj*, are the prices of the 46 (or 60 when V2G is available) charging bundles. The latent-class optimization problem is formulated as follows:

$$\max\_{p\_{\parallel}} \text{Revenue} = D^{EV} \left[ \sum\_{j=1}^{I} \left\{ \sum\_{k=1}^{K} P\_{\mathbf{t}} \left( j \left| \mathbf{X}\_{j\mathbf{t}\tau} \mathbf{Y}\_{k\tau} \mathbf{Z}\_{\mathbf{t}\tau} \mathbf{g}\_{\mathbf{t}\tau} \mathbf{x} \right\} P\_{\mathbf{t}\mathbf{t}} (\mathbf{x} | \mathbf{z}\_{\mathbf{t}}) p\_{j} \right) \right\} - \left[ \mathbf{Y}^{\mathcal{C}} - D^{EV} B \sum\_{k=1}^{K} P\_{\mathbf{t}} \left( j \left| \mathbf{X}\_{j\mathbf{t}\tau} \mathbf{Y}\_{k\tau} \mathbf{Z}\_{\mathbf{t}\tau} \mathbf{g}\_{\mathbf{t}\tau} \mathbf{x} \right\} P\_{\mathbf{t}\mathbf{t}} (\mathbf{x} | \mathbf{z}\_{\mathbf{t}}) p\_{j} \right) p^{\mathbf{t}} \right] \right]$$

subject to:

• Capacity constraints:

$$D^{EV}A\sum\_{\mathbf{x}=1}^{K}P\_{\mathbf{n}}\left(j\middle|X\_{\mathbf{j}\mathbf{n}\prime}\,\mathbf{Y}\_{\mathbf{n}\prime}\,Z\_{\mathbf{n}\prime}\,\beta\_{\mathbf{x}\prime};\kappa\right)P\_{\mathbf{n}}\left(\kappa\vert z\_{\mathbf{n}\prime}\right)p\_{\mathbf{j}}\leq X^{c}\tag{9}$$

$$\Pr^{\mathbb{E}V}B\sum\_{\mathbf{x}=1}^{K}P\_{\mathfrak{n}}\big(j\big|X\_{\mathfrak{j}\mathfrak{n}},\chi\_{\mathfrak{n}},Z\_{\mathfrak{n}},\boldsymbol{\beta}\_{\mathbf{x}};\mathbf{x}\big)P\_{\mathfrak{n}}(\mathbf{x}|z\_{\mathfrak{n}})p\_{j}\leq \mathcal{Y}^{\mathbb{E}}\tag{10}$$

• Price-policy constraints:

$$p\_j^- \le p\_j \le p\_j^+ \tag{11}$$

where *DEV* is the total number of electric vehicles that arrive at the parking facilities for recharging or discharging, *X<sup>c</sup>* is the total number of charging posts, *Y<sup>c</sup>* is the power in kW supplied to the CSP for the contracted facilities, *p<sup>I</sup>* are the imbalance prices, A and B are the incidence matrices for the two capacity dimensions, and finally, *p*− *<sup>j</sup>* and *<sup>p</sup>*<sup>+</sup> *<sup>j</sup>* are the lower and upper bounds for the decision variables, which are calculated using Equation (1). The methodological approach is visually demonstrated in Figure 3.

**Figure 3.** Decomposed elements of the choice-based revenue-management approach. The choice probability for a charging service for an electric vehicle driver affects the revenue outcome for the CSP. Simultaneously, the price, which is the decision variable of the optimization problem, can alter the choice probability, creating a closed-loop formulation with a nonlinear objective function.

Equation (8) can be decomposed in two terms. The first term is the generated profit that is calculated by multiplying the number of EVs by the latent class probability of choosing a charging bundle j and the price of this bundle. The second term is the imbalance cost that is calculated by multiplying the imbalance price for each hour slot with the respective deficit power. The maximum revenue for the CSP is equal to the difference between the generated profit and the imbalance cost for the optimal price menu.

The capacity constraints (9) and (10) correspond to the number of charging posts and the supplied power, respectively. Price-policy constraints (11) vary across charging and V2G services. The minimum price for charging bundles is equal to the maximum price for V2G bundles, and it is assumed to be zero. On the other hand, the maximum prices for charging bundles and minimum prices for V2G bundles reflect their main characteristics; i.e., power-intensive bundles are allowed to have higher prices.

#### **4. Data and Simulation Approach**

A simulation approach was employed for the demonstration of the developed pricing algorithm. Two areas with distinct activity patterns and increased travel demand levels were selected for the simulation: a large shopping mall (Westfield Shopping Centre) and a busy commercial area (Canary Wharf). Similar assumptions were made in Battistelli et al. [52], where two garages with EV parking spaces were modelled, serving an office and a residential area. The characteristics of the trips for these areas were extracted from an annual household survey, which combined personal and household information with data from travel diaries: the London Travel Demand Survey (LTDS) [53].

The number of parking spaces used for the simulation corresponded to the existing off-street infrastructure within a 1 km2 radius of the centroid of each area. It was assumed that these spaces and the hypothetical charging posts were concentrated in two large parking lots for both areas. In terms of data preprocessing, the steps below were followed:


• If the number of stops in the area of interest was higher than one, the parking event with the highest duration was identified, and that is where the charging event was assumed to take place.

All the electric vehicles in the simulation were assumed to be BEVs, because of range anxiety and their higher likelihood of depending on out-of-home charging infrastructure. At the time of the research, one of the most competitive BEVs in the market was the Nissan Leaf; hence, it was used for the estimation of electricity consumption [54]. For combined city and highway driving, this was set equal to 30 kWh/100 miles, and given a battery capacity of 24 kWh, it corresponds to a driving range of 83 miles. Finally, the energy efficiency of the charging posts was assumed to be 80% and to remain constant at any time step.

The number of charging posts for the simulation was assumed to be equal to the number of parking spaces. While this could be considered a very optimistic scenario for the evolution of electromobility and its infrastructure, there is a rapid deployment of charging posts in large urban centres, which is going to be further facilitated by economies of scale. Furthermore, it should be noted that the areas examined are of high economic interest, with the potential to attract infrastructure investments.

Along with travel demand, these areas have increased electricity demand, especially during hours of peak occupancy. Some examples of appliances that contribute to the aforementioned peaks are personal computers, display lighting and air conditioning units. In order to model the base load of electricity without electric vehicles, first, we identified the baseload profiles for a typical winter weekday for both domestic and nondomestic users. Then, the average population density of residents, the number of employees and the percentages of domestic and nondomestic consumption were used to scale up from a personal profile to the daily distribution. The resulting load curves are presented in Figure 4.

**Figure 4.** Typical winter weekday baseload curves for the areas of analysis (dashed lines represent installed capacity for the scenario of 20% overload capacity).

The overload capacity can generally fluctuate between local distribution networks. Typically, distribution transformers are replaced when the peak demand grows to be almost equal to the installed capacity. Figure 4 demonstrates the capacities that were selected for the simulation, which in both cases, were 20% higher than the peak value of the base load.

The trip data from LTDS only represent a sample of the population and not the actual demand. Thus, the respondents were used to generate a synthetic population that would allow the exploration of network effects for the two areas. The specifics of the population synthesis algorithm are presented in [20]. The energy requirements for the synthetic EV drivers were calculated based on the reported mileage for all the daily trips. However, since out-of-home charging events were expected to follow a top-up pattern, there was an asymmetric draw from the lower end of the distribution (1–5 kWh).

In order to account for different levels of EV penetration, three scenarios were evaluated: a mid-term scenario (25%), a long-term scenario (50%) and a full-electrification (100%) scenario. As a preliminary step, the incoming demand was satisfied with an uncoordinated charging strategy. This allowed an initial estimation of the spatiotemporal allocation; thus, it could be used for an informed pre-allocation of the supplied power capacity amongst the parking facilities. Some basic assumptions for the uncoordinated scenario were the following: (a) recharging starts as soon as the vehicle is plugged in and (b) the charging event has a constant rate and is evenly distributed over the parking dwell time.

Combining the data sources and the synthetic information that has been described so far in this section, we present all the variables that were used for simulation and optimization in this study, in Table 1. Scaling up the trip sample dataset using aggregate statistics for the areas of interest, we ended up with 10,852 trips for Canary Wharf and 14,360 trips for Westfield. Then, depending on the scenario for EV penetration levels, the total numbers of trips in the simulation are presented at the end of the table.


**Table 1.** Description of the input data used in the simulation.

The first step of the methodological approach was to examine an uncontrolled charging scenario. The parking and charging characteristics in the top left of Table 1 were deduced from the trip dataset following a set of assumptions and rules. For example, it was assumed that the driver will park at the facility that is closer to the final destination and that the energy requirements will depend on the subsequent trips of the day and the remaining SOC. Subsequently, the initial SOC depends on the distance driven so far and the energy consumption. Additional context variables that were necessary for the simulation are depicted in the middle-right part of the table.

The "dumb charging" strategy is useful for understanding the spatiotemporal allocation of demand and applying the simple area-based and time-based fixed prices that are explained in Equation (1). It also enables the higher allocation of power capacity to the busiest parking facility as a strategic decision. In the next step of our methodology, the charging events were driven by actual choices of the users, which in turn, were probabilistic outcomes of a decision process. Individuals try to maximize their utility (Equation (2)), which is a linear combination of factors from all the sections of Table 1. Using the choicebased formulation described in Equation (8), the prices of the charging bundles were optimized with respect to CSP revenue. Then, they were used to rerun the simulation and evaluate other key metrics such as the load factors and demand–supply imbalance.

EV scheduling problems are typically characterized by large numbers of variables and constraints that are not continuously differentiable and increase the related models' execution times for finding an optimal solution. Metaheuristic algorithms (MAs) are very popular in the relevant literature as a means for solving these NP-hard (nondeterministic polynomial-time hard) problems [55,56]. The main disadvantage of MAs is that they are not guaranteed to reach the global optimum, due to the stochasticity in the process. However, V2G scheduling problems are typically highly dimensional, nonconvex and nonlinear optimization problems, and MAs are arguably one of the best options for both binary and real-valued problems.

The formulation in this paper is, in fact, a constrained nonlinear problem, and the two main categories of MAs that are applied to solve such are Genetic Algorithms (GAs) and Particle Swarm Optimization (PSO).

The reader is referred to [55,56] for a detailed review of GAs, PSO and other metaheuristics applied in Unit Commitment (UC) problems and EV charging/discharging coordination. The studies [57,58] are typical applications of the two methods. In [57], the authors used a GA in a game theoretic analysis of EV charging coordination. On the other hand, Hutson et al. [58] applied binary PSO to find the optimal buying and selling times for a fleet of vehicles in a parking lot.

The genetic algorithm creates a population of candidate price vectors for the charging bundles, and the best candidate approaches the optimal price vector. Each population is generated after applying certain stochastic operators to the previous population. In particular, the revenue for the operator, which is the fitness score in this application, is calculated for every price vector in the population using Equation (8). If the score of the vector is higher than the average fitness score, it is selected to be transitioned to the next population. This process is typically known as selection. If the score is lower than the average fitness score, random changes are applied to the parent price vectors in order to generate children for the new population. This process of adding diversity in the creation of new offspring is typically known as mutation. If some of the prices in a price vector lead to solutions that violate capacity constraints (Equations (9) and (10)) or price-policy constraints (Equation (11)), a penalty score is incremented by the maximum price. In this way, infeasible price candidates are less likely to be selected for the next population.

The algorithm is presented in detail in Figure 5. First, we initialize the number of iterations (generations) as *G* = 100. We randomly generate an initial population P0 of size *P* = 100. Each individual in the population is represented by a D-dimensional vector, where D is the number of charging (or charging and discharging) bundles:

$$p\_i = (p\_{i1,} p\_{i2,} \dots, p\_{iD}) \tag{12}$$

and the genetic encoding is a direct value encoding the price. This value can be between 0 and the upper bound for the charging bundles or between the lower bound and 0 for the discharging bundles. The fitness function (the choice-based revenue function in Equation (8)) is evaluated for all the individuals in P0. Then, as long as the stopping criterion is not satisfied, we iteratively create subsequent populations. We calculate the score (revenue) for each individual and the average score of the population. We select the individuals with scores higher than the average population score to move to the next generation, and we remove the remaining ones. We also mutate the genes (prices) of the successful individuals by adding random noise, in order to create new price vectors and always have a population of size 100. At each generation *t,* we evaluate the fitness function Pt , and we follow the same steps. Finally, when the stopping criterion is satisfied, the iteration is terminated, and the individual that generates the maximum revenue is selected.

**Figure 5.** Genetic algorithm solution for price optimization.

The second nature-inspired algorithm that was adopted in this study for optimization is the PSO. Like the genetic algorithm, each individual in a population (here referred to as swarm because it is based on the information exchange of birds in a swarm) is updated in an iterative manner. The exploration of the problem search space is guaranteed by introducing, again, a certain stochasticity in the transitions. Upon initialization, each individual particle in the swarm creates a randomized position solution, i.e., vector of prices, and a randomized velocity within a uniform range of values. Then, the initial positions are assigned to the particles' best-known positions. For each iteration, the algorithm updates the velocity of the price vectors using their best individual positions and the best position of the swarm. A cognitive constant c1 limits the influence that the particle's best-known positions have on their new velocity, while a social constant c2 limits the influence the best price vector of the swarm has on the other vectors.

The details of the PSO solution are presented in Figure 6. The algorithm starts by initializing a set of parameters including the acceleration constants *c*<sup>1</sup> and *c*2. The size of the swarm is assumed to be *P =* 100. The position of each solution (particle) *i* of the swarm is represented by a D-dimensional vector, where D is the number of charging (or charging and discharging) bundles, as depicted earlier in Equation (12). Simultaneously, each particle is randomly assigned an initial velocity, which is given by:

$$
v\_{i} = (v\_{i1,}v\_{i2,} \dots\_{\prime}, v\_{iD})\tag{13}$$

**Figure 6.** Particle Swarm Optimization (PSO) solution for price optimization.

The fitness value (Equation (8)) is evaluated for each particle's position *f(xi).* The current position is assigned as the best local position (x\* i), while the position with the highest fitness value is assigned as the best global position (*gbest*). Then, as long as the termination criterion is not satisfied, the positions and velocities of the swarm particles are updated based on (a) their own best local positions and (b) the global best positions in their neighbourhood:

$$p\_i^{(t+1)} = p\_i^{(t)} + v\_i^{(t+1)} \tag{14}$$

$$v\_{i}^{(t+1)} = v\_{i}^{(t)} + c\_{1}r\_{i1} \left(\mathbf{x}\_{i}^{\*(t)} - \mathbf{x}\_{i}^{(t)}\right) + c\_{2}r\_{i2} \left(gbest - \mathbf{x}\_{i}^{(t)}\right) \tag{15}$$

where *t* is the iteration count, while *r*<sup>1</sup> and *r*<sup>2</sup> are random vectors that take values between 0 and 1. The fitness value is evaluated for the new position, and the local and global best solutions are updated. At the end, when the termination criterion is satisfied (*t =* 100)*,* the best particle is selected as the optimal price vector.

Both the GA and PSO algorithms were developed in Python, which was also used to build the simulation framework.

The simulation framework, which is demonstrated in Figure 7, considered four control scenarios: (a) fixed pricing based on the time of day (FP), (b) fixed pricing based on the time of day and typical spatial demand (FP2), (c) optimal pricing with the GA, and (d) optimal pricing with PSO. All these control scenarios were compared with an uncontrolled charging scenario where it was assumed that charging demand was equally allocated for the parking duration.

**Figure 7.** Simulation framework for charging and V2G coordination. The three levels capture (**a**) the synthetic approach to building a representative demand for travel and energy, (**b**) the technology and behavioural assumptions and (**c**) the dynamic parameters that were evaluated under different scenarios.

> All the simulation runs account for a 12 h operating window between 9:00 and 21:00 with overlapping subsequent 4 h scheduling windows. By running the simulation for all the possible combinations, a total of 48 cases were modelled. Breaking down the problem into 4 h subproblems led to near-optimal solutions but, simultaneously, did not become computationally expensive, which would be prohibitive for the numerous scenarios that were analysed.

#### **5. Results**

The main metrics that were evaluated after each simulation run are the revenue for the charging service provider, and the load factors for both the physical dimension of charging posts and the power supplied by the DSO.

Figures 8 and 9 demonstrate the results for the simulation across two dimensions: the percentage of EVs and the overload capacity. Figure 8 corresponds to the heavy business area, while Figure 9 corresponds to the commercial area. For each case study, we examined the load curve against a transformer capacity that was designed based on the maximum value of the baseline curve (first column) and on an overload capacity of 20% (second column). Finally, the three rows are associated with an increasing percentage of EVs from top to bottom.

**Figure 8.** Charging allocation analysis for Electric Vehicle (EV) market penetration and overload capacity (Canary Wharf area). The red dotted line represents the nominal capacity of the distribution transformer; the grey line, the baseline load demand; the blue dotted line, the uncoordinated charging scenario; and the green line, the charging allocation with Fixed Pricing based on time of day (FP).

**Figure 9.** Charging allocation analysis for EV market penetration and overload capacity (Westfield Shopping Centre area). The red dotted line represents the nominal capacity of the distribution transformer; the grey line, the baseline load demand; the blue dotted line, the uncoordinated charging scenario; and the green line; the charging allocation with Fixed Pricing based on time of day (FP).

Since the peak demand for the study areas is already high, a 20% overload capacity is a significant increase in the available power, and it is sufficient, even for the full-electrification scenario. The baseline load distribution is quite similar for the two areas, with the busiest time being around midday. For the uncoordinated-charging scenario, there are several periods where demand exceeds the available capacity. The most extreme case is the scenario for Westfield, with 100% of EVs and 0% overload capacity.

The FP control algorithm takes into account historical driving and activity data to penalize the hour slots with the highest charging demand. The price for each charging bundle is calculated based on Equation (1) without the area factor. Each charging bundle is allocated only if it is not constrained by power availability and the number of free charging posts. This pricing incentive initiates a behavioural shift, with some EV drivers charging later in the day, while some drivers with low SOC needs are discouraged from charging at all.

Figures 8 and 9 do not include V2G services for parking customers. The impact of V2G services can be observed in Figure 10. From this point on, only the Canary Wharf results are presented because the relative effects of the parameters on the results were similar for the Westfield area. The difference is that now, the overload capacity was kept fixed at 0%, and it is replaced in the graphs by the V2G availability parameter.

It is interesting to observe that the final load curve is below the baseline curve for the majority of the day. This can be explained by the fact that several drivers prefer to sell electricity back to the grid instead of charging their cars, even taking into account the disutility from the reduced SOC for the rest of their daily trips. This is extremely useful for periods of peak demand because it allows other drivers with higher charging needs to refill their batteries. Nevertheless, it incurs additional imbalance costs to the CSP during nonpeak periods, when the V2G services are not required.

The net revenue for the CSP is optimized with the choice-based pricing algorithm that was presented in the previous sections. Figure 11 shows the load curves under the four different control approaches, with and without V2G availability. FP2 is similar to FP, but now, the area factor is included in the calculation of Equation (1). In this way, busy parking is penalized, and drivers are incentivized to plug in their cars in a more distant location.

In terms of overall load scheduling, this method has very similar results to the previous one. On the other hand, the two solution algorithms for our optimization problem have a more profound effect. When V2G is not available, a higher portion of charging events is shifted from peak to nonpeak hour slots by setting prices that are closer to the drivers' willingness to pay. To better understand what happens when V2G services are provided, one can have a look at Table 2.

**Figure 10.** Charging allocation analysis for EV market penetration and V2G availability (Canary Wharf area and 0% overload capacity). The red dotted line represents the nominal capacity of the distribution transformer; the grey line, the baseline load demand; the blue dotted line, the uncoordinated charging scenario; and the green line, the charging allocation with Fixed Pricing based on time of day (FP).

**Figure 11.** Charging allocation analysis for V2G availability and control algorithm (Canary Wharf area and 0% overload capacity). The red dotted line represents the nominal capacity of the distribution transformer; the grey line, the baseline load demand; the blue dotted line, the uncoordinated charging scenario; and the green line, the charging allocation with the respective algorithm.


**Table 2.** Revenue performance and parking load factors for various parameter combinations.

When drivers are allowed to sell electricity to the grid using V2G technology, all the methods result in losses for the CSP because the expenses from the "selling" bundles exceed the income from the "buying" bundles. The choice-based optimization reduces the losses, especially when the GA solution is applied. The reduced parking load factors indicate that upon decreasing the sale prices to avoid excessive V2G and reduce the imbalance costs, the V2G activities are spread throughout the day.

The trade-offs between charging and discharging in the choice model heavily rely on the customers' sensitivity to price. As was highlighted earlier, the class-specific parameters for the two latent classes were estimated by the authors in previous work. Therefore, the estimates were bound to the tariffs used in the choice experiments and do not necessarily reflect future fluctuations in electricity prices. The effect of this behavioural uncertainty on the outcome of the optimization problem was addressed by performing a sensitivity analysis regarding the price coefficients. The outcomes are demonstrated in Table 3 for the GA solution. The relative differences were similar for the PSO solution.

**Table 3.** Analysis of revenue and parking load factor sensitivity to price parameters.


The original specification was defined as "Medium Price Sensitivity", and the price parameters were halved and doubled, respectively, for the "Decreased" and the "Increased" scenarios. When drivers are less sensitive to charging prices, their utility is overruled by

their energy needs and their willingness to walk, so there is an increase in drivers that charge in parking lot 1, and the overall revenue becomes positive for the CSP. On the contrary, when drivers are more sensitive to charging prices, the number of V2G events increases in both parking facilities. Consequently, the imbalance costs are increased by the excess electricity, and the optimal solution leads to an overall loss.

The following results indicate the significance of the demand parameters in the developed framework. If similar datasets become available in the future, a cross-validation could be useful since some elements of the choice experiments that were used are not established at the moment (e.g., workplace charging services), and some properties were approximated (e.g., the sensitivity to energy quantity). At the same time, the estimated parameters should be treated with caution when applied to other geographic locations, since it is likely that they are correlated with some idiosyncratic preferences of British drivers.

The next step in the analysis was to understand how the optimal prices were related to the attributes of the charging bundles. Figure 12 demonstrates a classification of the prices that were generated with the GA solution by power and charging duration. It is observed that short-duration, high-power bundles tend to be more dispersed around 0 compared to long-duration, low-power bundles. This is essentially an indirect reward to users that strain the power network less by spreading their charging demand over time.

**Figure 12.** Classification of GA-optimized prices for charging and V2G bundles by power and charging duration.

To conclude this section, we performed a comparative analysis of the two algorithms' performance in terms of computational time. The run time of the optimization was evaluated for three different parameters: (a) the number of iterations, (b) the number of populations (GA) or the number of particles (PSO) and (c) the effect of scaling up the

number of EVs. All the parameters were found to have linear effects on computational time as is shown in Table 4.


**Table 4.** Comparative analysis of computational times for solution algorithms.

The metaheuristics' overall performance can be summarized as follows:


#### **6. Conclusions**

This paper shows how a choice-based revenue-management problem can be integrated with the parking and charging choices of electric vehicle drivers. In particular, an integrated latent class and nonlinear framework was developed to optimize prices for charging services provided by a charging service provider.

There are two innovative elements of the developed methodological framework in comparison to existing research.

First, the endogenous relationship between the sensitivity of EV drivers to charging characteristics and the charging coordination methods applied by the operator was captured with a sophisticated disaggregate model of demand. Most importantly, the parameters of this model were empirically estimated from user-tailored choice experiments for charging choices.

The second contribution of this paper is the development of a framework that can have direct implementation in the charging service industry. Revenue management allows the closed-loop integration of supply and demand and has proven to be financially beneficial in several service industries. The excessive number of papers that have been written in the last decade aiming to accommodate the increasing charging needs of EV drivers cannot always be aligned with practice-ready business models.

A microsimulation framework was used to implement the pricing model for a synthesized network of two distinctive regions. Baseload electricity curves were modelled by scaling up typical load profiles, and survey data for travel behaviour were scaled up by using synthetic population methods. The charging behaviour of the simulated drivers was modelled using an advanced discrete choice model, and different scenarios were established for EV penetration rates, the overload capacity and the V2G availability.

One limitation of this online reservation system is that it cannot capture last-minute stochastic arrivals. As was explained in the introduction, this approach reduces the uncertainty both for the users who need to be reassured of a certain level of SOC for their subsequent trip and for the parking operator who wants to achieve a smooth allocation of charging demand. However, it results in a conservative lower-bound estimation of the optimal revenue. Future research could combine a revenue-management approach, which by definition, has to be resolved in a preservice booking period, with a more dynamic application that considers last-minute arrivals.

The results suggest that the revenue-management framework simultaneously maximizes revenue and assists in the prevention of local transformer overloading. Charging peaks are alleviated especially when V2G services are adopted to provide energy back to

the grid. In addition, it accommodates drivers during the peak hours, while before, they were unable to charge their vehicles because of network constraints.

The outcomes of this research could potentially be interesting for retail operators that host charging infrastructure and want to understand revenue opportunities from merging charging and retailing services. The charging bundles described could be extended to incorporate a point system or the exchange of electricity with retail products. At the same time, the power constraints could reflect the energy needs of a retail building, such as, for example, lighting, heating and cooling for a supermarket in high-demand hours. It is significant to highlight here the rapid increase in studies that are exploring the integration of V2G with energy-management systems in buildings [59–61].

**Author Contributions:** Conceptualization, C.L., A.S. and J.W.P.; methodology, C.L., A.S. and J.W.P.; software, C.L.; validation, C.L.; formal analysis, C.L.; investigation, C.L., A.S., and J.W.P.; resources, A.S. and J.W.P.; data curation, C.L.; writing—original draft preparation, C.L.; writing—review and editing, C.L. and A.S.; visualization, C.L.; supervision, A.S. and J.W.P.; project administration, A.S. and J.W.P.; funding acquisition, A.S. and J.W.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by the Grantham Institute for Climate Change and Climate KIC.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in Dryad at https://doi.org/10.5061/dryad.c59zw3r68.

**Acknowledgments:** The authors would like to thank Transport for London for providing the LTDS dataset that was indispensable for the analysis. We would like to dedicate this paper to the recently deceased J.W.P. for his invaluable inspiration and support towards this research.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


#### *Article*

## **Multi-Criteria Stochastic Selection of Electric Vehicles for the Sustainable Development of Local Government and State Administration Units in Poland**

#### **Paweł Ziemba**

Faculty of Economics, Finance and Management, University of Szczecin, Cukrowa 8, 71-004 Szczecin, Poland; pawel.ziemba@usz.edu.pl

Received: 2 November 2020; Accepted: 27 November 2020; Published: 29 November 2020

**Abstract:** Increasing the popularity of electric vehicles is one way of reducing greenhouse gas emissions and making the economy more sustainable. In Poland, the use of electric vehicles is to be increased by the adoption of the Act on Electromobility and Alternative Fuels. This Act obliges local government units and state administration to expand the electric vehicle fleet. The expansion of the fleet should be carried out on a planned basis, based on rational decisions supported by economic analyses. Therefore, the aim of this article is to provide a recommendation of an electric vehicle that meets the needs of local and state administration to the greatest extent possible. The aim has been achieved using the multi-criteria decision analysis method called PROSA-C (PROMETHEE for Sustainability Assessment—Criteria) combined with the Monte Carlo method. The PROSA-C method allows promoting more sustainable vehicles with high technical, economic, environmental and social parameters. The Monte Carlo method, on the other hand, is a stochastic simulation tool that allows for taking into account the uncertainty of parameters describing vehicles. As a result of the research, the most and least attractive vehicles were identified from the perspective of the needs of local government units and state administration. Moreover, the conducted research allowed confirming the effectiveness and usefulness of the research methodology proposed in the article and the procedural approach combining the PROSA-C and Monte Carlo methods.

**Keywords:** electric vehicles; PROSA; PROMETHEE for Sustainability Assessment; MCDA; Multi-Criteria Decision Analysis; stochastic analysis; Monte Carlo; uncertainty

#### **1. Introduction**

Global energy demand is almost steadily increasing, the only exception being the global lockdown in the first half of 2020, which caused a temporary drop in demand for energy production [1]. Together with increased energy production, the number of greenhouse gases emitted into the atmosphere is also increasing: at present, more than 25 billion tons of CO2 are released annually as a result of human activity [2]. One of the most important options for reducing CO2 emissions into the atmosphere is the development of electric vehicles [3], because, according to the International Energy Agency, around 14% of greenhouse gases in the world are produced by the transport sector [4]. Therefore, the electrification of transport, together with an increase in the use of renewable energy sources, offers the possibility of significantly reducing greenhouse gas emissions [3]. Of course, the development of electric vehicles is not the only option for reducing CO2, emissions associated with transport. Another interesting option is, for example, the design of urban space in such a way as to maximize the possibility of walking [5,6]. However, the need to use the vehicle cannot always be eliminated, so electrifying transport remains the main way to reduce emissions contributing to the sustainability of the transport system.

In order to increase the share of electric vehicles in transport, the Polish government has adopted the Act on Electromobility and Alternative Fuels [7] imposing obligations on state and local government units in terms of:


According to this legal act, state administration authorities and local government units with more than 50,000 inhabitants must gradually increase the share of electric vehicles in the vehicle fleet. For state administration bodies, the share of battery electric vehicles in the fleet of used vehicles should be: from 1 January 2022 at least 10%, from the beginning of 2023 at least 20%, and in 2025 at least 50%. For larger municipal districts and counties, i.e., with more than 50,000 inhabitants, the share should be at least 10% from 1 January 2022, and at least 30% in 2025. In addition, local government units are obliged to provide public transport services using a fleet partly composed of zero-emission buses: from the beginning of 2021 such buses are to constitute 5%, from 2023 10%, from 2025 20%, and on 1 January 2028 it is to be 30% of the bus fleet. As far as the expansion of the electric vehicle charging network is concerned, the number of charging points required in 2021 depends on the number of inhabitants and the number of cars in each municipal district. The detailed requirements are shown in Table 1 [7,8].

**Table 1.** Required number of electric vehicle charging points installed by 31 March 2021 depending on the size of the municipal district.


Increasing the share of electric vehicles in the fleets of state and local government units is expected to contribute to the number of 1 million electric vehicles in use in Poland in 2025 [9]. Therefore, a critical problem is the appropriate expansion of the fleets of state and local administration vehicles. The expansion of fleets should be well thought-out and carried out in a planned manner, and decisions taken in this area must be rational and supported by analyses. The expansion of fleets of electric vehicles cannot take place chaotically, without a plan and research, because this can lead to wasteful spending of state and local government funds, with criminal sanctions associated with this. This is all the more important because the cost of purchasing electric vehicles is much higher than that of conventional vehicles, so ill-considered purchases of electric vehicles result in greater financial losses than conventional vehicles. Selecting suitable vehicles to expand the fleet is even more difficult due to the fact that electric vehicles, even belonging to the same class, differ in a number of parameters. It is important to properly select such vehicles in order to minimize the inconvenience associated with, among other things, the low range and long charging time of such vehicles, while maintaining a reasonable price of expanding the fleet of electric vehicles.

The research problem addressed in the article, and at the same time its practical objective, is to analyze the electric vehicles available in Poland and to present recommendations of vehicles with the highest utility for institutional users (state and local government bodies). This is of particular importance in the context of the previously indicated statutory obligation of state and local government units to expand the fleet of electric vehicles and the expected increase in demand for such vehicles.

The methodological contribution is the use of MCDA (Multi-Criteria Decision Analysis) method called PROSA-C (PROMETHEE for Sustainability Assessment–Criteria) [10,11], which allows taking into account the sustainability of decision-making alternatives in the solution of a decision problem. In the case of electric vehicles, the PROSA-C method allows promoting vehicles with more balanced economic, technical, social and environmental characteristics. What is important, in this article the method has been developed with elements of a stochastic simulation, which takes into account the uncertainty of individual parameters of electric vehicles. This uncertainty should be taken into account in the solution of the decision problem and is due, among other things, to the fact that certain operational parameters depend on the user and how the vehicle is used, e.g., the same vehicle may have different energy consumption and range depending on the driving style.

Section 2 contains a literature discussion on the applications of MCDA methods in decisionmaking problems related to electric vehicles. Section 3 discusses the PROSA-C method, a stochastic analysis using the Monte Carlo method, and a research procedure based on these methods. Section 4 presents the results of research on the selection of electric vehicles for local government units and state bodies in Poland. The article ends with a summary and indication of further research directions.

#### **2. Literature Review**

Electric vehicles are characterized by many parameters, such as: electric motor size, weight, range, battery capacity, energy consumption, charging time, etc. [12,13]. These parameters are often in conflict with each other, so improving one of them causes deterioration of another, e.g., higher battery capacity means longer charging time. Moreover, many parameters are characterized by uncertainty, e.g., charging time depends on the charger used. Therefore, the comparison and recommendation of electric vehicles is a multi-criteria problem characterized by uncertainty and consisting in the search for pareto-optimal solutions. MCDA methods are used to solve these types of decision-making problems [14]. These methods are used both to solve decision-making problems at the strategic level (e.g., selecting policies for the development of electric vehicles and their infrastructure) and at the tactical and operational level (e.g., selecting specific vehicles or locations of charging stations). Barfod et al. [15] studied the opportunities, risks and policies for the widespread use of commercial electric vehicles in Denmark. Adhikari et al. [16] examined the limitations and challenges of using electric vehicles in the context of Nepal. Both Zhang et al. [17] and Liu and Wei [18] examined the risk factors for building an electric vehicle charging infrastructure in a public-private partnership, with Zhang et al. [17] conducting the study from a Chinese perspective, and Liu and Wei [18] considered the three Chinese provinces separately, building their ranking. Fazeli et al. [19] assessed the potential impact of various fiscal policies on the acceptance of electric vehicles by consumers and the Government of Iceland over the next 30 years. Erbas et al. [20] and Ju et al. [21] examined the potential locations of electric vehicle charging stations and built a ranking, with Erbas et al. [20] conducting a survey for Ankara and Ju et al. [21] examining the locations in Beijing. Xu et al. [22] ranked the various electric vehicle sharing programs used in Beijing. Sałabun, Karczmarczyk [13] and W ˛atróbski et al. [23] compared different electric cars in their rankings. The decision-making problems considered in individual studies and the MCDA methods used are presented in Table 2.

In the above mentioned works, various MCDA methods were used, starting from methods based on single synthesizing criterion: AHP and Fuzzy AHP, TOPSIS and Fuzzy TOPSIS, VIKOR, SMARTER, DEMATEL, COMET, and ending with the outranking PROMETHEE II method. It should be noted here that the AHP method in each of the quoted tests was only used to determine the weights of the criteria. A detailed review of these and many other MCDA methods can be found, for instance, in the papers of [14,24]. In the context of the research objective, it should be noted that MCDA methods may have high or low compensation of criteria and therefore a correspondingly weak and strong sustainability. Depending on the adopted paradigm (strong or weak sustainability), different MCDA methods can be used to solve sustainability decision-making problems. Generally speaking, it is recognized that methods based on a single synthesizing criterion, such as AHP, SMARTER, TOPSIS, VIKOR, DEMATEL, COMET, among others, are characterized by high compensation and therefore only allow for weak sustainability. On the other hand, methods based on outranking, thanks to the use of indifference,

preference and sometimes also veto thresholds, are characterized by lower compensation and thus stronger sustainability [25,26]. In the light of the purpose of the research, in addition to the degree of compensation, another important feature of MCDA methods is their ability to take into account the uncertainty of the criteria. The MCDA methods are often used to deal with *ex-ante* decision-making problems, so the decision-maker or analyst is not able to fully and certainly foresee all its consequences at the time of the decision. So this is a decision taken in conditions of uncertainty. A natural tool for taking into account uncertainty in decision making problems is fuzzy MCDA methods [27]. In turn, the MCDA methods using crisp data can take into account uncertainty by using a stochastic approach, such as the Monte Carlo method [28].


**Table 2.** Decision-making problems related to electric vehicles and MCDA methods used.

Analyzing the quoted articles, several important research gaps can be observed in the studies on the selection of electric vehicles for public administration and local authorities. First of all, the presented literature review indicates a small number of works in which electric vehicles were compared in a multi-criteria way. It can also be noted that to the best of the author's knowledge, no scientific research on the selection or recommendation of electric vehicles for administrative and local government units has been published so far. On the other hand, the quoted research on the Polish electric vehicle market is out of date, because this market is changing dynamically and new vehicle models appear on it every year, and current vehicles are updated. As regards the MCDA methods used, it should be noted that among the approaches taking into account the uncertainty of parameters of electric vehicles, the only methods used are Fuzzy TOPSIS and COMET, operating on fuzzy sets. Unfortunately, both methods are characterized by high compensation [24] and therefore weak sustainability [14]. This means that a high

value of one of the vehicle's parameters can fully compensate for a low value of another parameter [29], so, for example, a high maximum speed can compensate for high energy consumption. In other words, in the case of high compensation, a profit for one criterion may compensate for losses for another criterion, while low or no compensation reduces or eliminates this possibility altogether [30]. There is another research gap here, because the use of highly compensated MCDA methods results in solutions obtained using the indicated methods are not sustainable.

The research gaps identified directly determine the objectives and contribution of this research. The aim is a multi-criteria analysis of electric vehicles currently available in Poland, which meet the needs of state and local administration bodies. The practical effect will be to identify the most useful vehicles for these administrative units. As far as the scientific contribution is concerned, in this article, the PROSA-C method was used to evaluate electric vehicles available in Poland. It is a method designed to solve multi-criteria decision-making problems, enabling sustainable solutions. It allows adjusting the balance between the criteria influencing the expected degree of sustainability of the solution. This article develops the PROSA-C method by introducing elements of a stochastic analysis, thanks to which the uncertainty of data describing characteristics of electric vehicles is taken into account.

#### **3. Materials and Methods**

#### *3.1. PROSA-C Method*

PROSA is used to consider discrete decision-making problems, where a set of *A* = {*a*, *b*, ...} consisting of *m* alternatives is considered. Alternatives are considered in terms of *n* criteria belonging to a set *C* = {*c*1, *c*2, ... , *cn*}. The calculation procedure of the PROSA-C method [10,11] consists of 7 stages, with the initial 4 stages taken directly from the PROMETHEE method [31,32], on which the PROSA method is based. These steps are based on the approach using single criterion net flows [33] (pp.161–162) [31] (pp.200). The subsequent stages are written below using mathematical formulas.

1. Determination of deviations based on pair-wise comparisons:

$$d\_j(a,b) := c\_j(a) - c\_j(b) \tag{1}$$

where *dj*(*a*, *b*) denotes the difference between the performances of *a* and *b* on a *j*-th criterion.

2. Application of the preference function:

$$P\_j(a,b) := F\_j[d\_j(a,b)]\tag{2}$$

where *Pj*(*a*, *b*) denotes the preference of alternative *a* with regard to alternative *b* on each criterion, as a function *F* of *dj*(*a*, *b*).

3. Calculation of a single criterion net outranking flow:

$$\phi\_j(a) = \frac{1}{m-1} \sum\_{i=-1}^{m} \left[ P\_j(a, b\_i) - P\_j(b\_i, a) \right] \tag{3}$$

where φ*j*(*a*) denotes the criterion net flow of *a* over any other alternative, and *m* describes the number of alternatives.

4. Calculation of a global (overall) net outranking flow:

$$\phi\_{\text{net}}(a) \;= \sum\_{j=1}^{n} \phi\_{j}(a) \; w\_{j} \tag{4}$$

where φ*net*(*a*) is the weighted sum of net flow for each criterion, *wj* is the weight of *j*-th criterion, and weights are normalized to 1 ( *<sup>n</sup> j* = 1 *wj* = 1). The above steps allow obtaining the PROMETHEE II solution, while the subsequent steps lead to the PROSA-C solution.

5. Determination of a single criterion absolute deviation:

$$AD\_{\dot{j}}(a) \;= \left| \phi\_{\text{net}}(a) - \phi\_{\dot{j}}(a) \right| \mathbf{s}\_{\dot{j}} \tag{5}$$

where *sj* denotes the sustainability (compensation) coefficient for a *j*-th criterion. As it can be easily noticed, *sj* is a sort of a weight coefficient, and *ADj*(*a*) is a weighted distance of a solution φ*net*(*a*) from solutions φ*j*(*a*) obtained for individual criteria. The greater the value *sj*, the more preferred are alternatives strongly sustainable with regard to the *j*-th criterion, therefore, the compensation degree for the criterion *cj*(*a*) is smaller.

6. Calculation of a single criterion PROSA net sustainable value:

$$PSV\_j(a) \;=\; \phi\_j(a) - AD\_j(a) \tag{6}$$

where *PSVj*(*a*) describes the sustainability of alternative *a* and with regard to the *j-*th criterion. 7. Calculation of a global PROSA net sustainable value:

$$PSV\_{\text{net}}(a) \;= \sum\_{j=-1}^{n} PSV\_j(a) \; w\_j \tag{7}$$

where *PSVnet*(*a*) is the weighted sum of the PROSA net sustainable value for each criterion.

The PROSA-C method allows conducting an analysis of the sustainability of criteria for individual decision alternatives. It distinguishes three sustainability/compensation relationships.


Relations *Cd* and *Cs* are relations indicating the unsustainability of the alternative *a* with regard to the criterion *cj*(*a*). The << and >> operators denote contractual relations "much less than" and "much greater than", expressing the subjective views of the decision-maker about the difference between the compared values. These relations can provide a hint to the decision-maker about the expected values of the coefficient *sj*. For example, if the decision-maker wants to increase the impact of sustainability on the obtained solution, a lower *sj* value can be adopted for more sustainable criteria, and higher *sj* values can be adopted for less sustainable criteria.

#### *3.2. Stochastic Analysis*

A stochastic analysis is based on random variables defined in the probability space. In this article, a stochastic simulation based on the Monte Carlo method is used. In the Monte Carlo method, *K* iterations are carried out, and during each of them *L* independent random variables *r* with a specific distribution *D* are drawn [34]:

$$r\_l^k \stackrel{\text{i.i.d.}}{\sim} D(v) \tag{8}$$

where *rk <sup>l</sup>* is an *l*-th independent random variable (*l* = 1,2, ... ,*L*) drawn in a *k*-th iteration (*k* = 1,2, ... ,*K*), *v* is a set of distribution parameters *D* (e.g., range of values), *i.i.d.* means independent and identically distributed. This approach allows considering solutions based on different values of random variables, covering the whole probability space, including the specified distribution.

In this article, the Monte Carlo method has been used to generate various possible values for criteria describing uncertain parameters of electric vehicles. Random variables represent the values of individual criteria for each of the alternatives *cj*(*a*). Therefore, the number of random variables generated in each iteration should be equal to the product of the number of criteria and alternatives considered (*L* = *n\*m*). However, each random variable may have a different distribution and distribution parameters, depending on which alternative and which criterion it represents. Therefore, in this article, the stochastic simulation is based on a formula:

$$c\_j(a\_i)^k = r\_l^k \stackrel{i.i.d.}{\sim} D(v)\_{i,j} \tag{9}$$

where *D*(*v*)*i*,*<sup>j</sup>* denotes the distribution and parameters adopted for a *j*-th criterion of an *i*-th alternative.

Further elements of the stochastic analysis are based on elements of SMAA (Stochastic Multicriteria Acceptability Analysis) [35]. On the basis of random variables generated in each iteration, rankings of alternatives are built using the PROSA-C method. The results obtained in subsequent iterations allow us to determine the statistic *Bir* indicating the number of iterations, in which the *i*-th alternative obtained an *r*-th position in the ranking. On this basis, after *K* iterations, rank acceptability indices [35] are estimated:

$$b\_i^r \approx \frac{B\_{ir}}{K} \ast 100\% \tag{10}$$

In practice, the rank acceptability indices *br <sup>i</sup>* show the probability that the *i*-th alternative will get the *r-*th position in the ranking. This probability is, of course, determined by random variables generated on the basis of the specified distributions *D*(*v*)*i*,*<sup>j</sup>* . The obtained values of the rank acceptability indices are characterized by a certain precision and an assumed confidence interval, depending on the number of iterations. The expected precision of the rank acceptability indices can be determined from the formula [36]:

$$
\varepsilon = \frac{z\_a}{\sqrt{4L}} \tag{11}
$$

where *z*<sup>α</sup> denotes a standard score for a confidence level 1 − α. In the SMAA method and Monte Carlo simulations 104–106 iterations [36,37] are usually used, which gives a precision of 0.01–0.001 for 95% confidence respectively.

#### *3.3. Research Procedure*

Some elements of the research procedure have already been mentioned in Section 3.2, while this section discusses the complete procedure applied to the decision-making problem of recommending electric vehicles for state and local authorities. A diagram of the procedure is presented in a graphical form in Figure 1.

**Figure 1.** Research procedure.


#### **4. Results**

At the outset, the most frequent needs of state and local government units with regard to electric vehicles were identified. The analysis of Internet sources [39–42] as well as the expert interview revealed that both the state administration and local government in Poland most often use electric hatchback, B-segment (small cars), C-segment (medium cars) and J-segment (SUVs) cars with appropriate electric motor power, energy consumption and range. Therefore, electric cars of these segments with electric motor power of not less than 130PS, a range of at least 200 km and energy consumption of not more than 250 Wh/km are considered in this research. Moreover, only electric vehicles available on the Polish market were considered [38]. The vehicles concerned and their characteristics are shown in Table 3.


**Table 3.** Characteristics of the electric vehicles concerned [38,43,44].

**Table 3.** *Cont.*


<sup>1</sup> 3.7 kW max; <sup>2</sup> 6.6 kW max; <sup>3</sup> 7.4 kW max; <sup>4</sup> 11 kW max; <sup>5</sup> optional 6.6 kW On-board Charger (3.6 kW is standard); <sup>6</sup> optional 11 kW On-board Charger (7.4 kW is standard); <sup>7</sup> 40 kW max; <sup>8</sup> 44 kW max; <sup>9</sup> 46 kW max; <sup>10</sup> 50 kW max; <sup>11</sup> 77 kW max; <sup>12</sup> 100 kW max.

On the basis of Table 3, the decision-making criteria were defined, which were: C1–Top Speed, C2–Acceleration, C3–Total Power, C4–Total Torque, C5–Cargo Volume, C6–Cargo Volume–Seats Folded, C7–Seats, C8–Battery Capacity, C9–Price, C10–Range, C11–Energy Consumption, C12–Charging Time, C13–Fast Charging Time, C14–Safety. It is easy to notice that criteria C9–C14 are uncertain and their values may vary depending on the situation. The other criteria (C1–C8) are certain and their values remain constant. Based on Table 3, the ranges of uncertain criteria values are defined in Table 4. Table 4 therefore shows the distribution parameters *D*(*v*)*i*,*<sup>j</sup>* used in the Monte Carlo simulation.

In subsequent iterations of the procedure, new values for uncertain criteria were generated on the basis of Table 4, using continuous uniform distribution for each criterion. These values, together with the values of the certain criteria, were recorded in the performance table and constituted the input for the PROSA-C method. In the PROSA-C method, in each iteration the same preference model was used, specifying the functions and directions of preferences and the weights of criteria. The weights may be defined directly or relative to other criteria [45], with the PROSA-C method expressing them directly. In the preference model, the most difficult to build is to select the preference function and set the values of the indifference and preference thresholds [46,47]. It is assumed that for quantitative criteria one of the functions should be used: V-shape criterion, V-shape criterion with indifference area, or Gaussian criterion. For qualitative criteria, the Usual criterion or Level criterion is most often used [48]. As regards the values of indifference (*q*) and preference (*p*) thresholds used in the V-shaped functions, Roy states that the threshold values should be between reliable minimum and maximum values for a given criterion. He also suggests that the values of these thresholds may be based on certain characteristics of the value of the criteria, such as mean value, standard deviation, minimum, maximum, etc. [49]. Deshmukh, on the other hand, notes in the context of the Gaussian criterion that the Gaussian threshold (σ) should be placed between the values of thresholds *q* and *p* [48]. Amponsah et al. interpret the threshold σ as the standard deviation of the value of a given criterion [50]. Based on these observations, it can be assumed that the threshold *q* should be less than the standard deviation of the criterion value and the threshold *p* should be greater than the standard deviation. In addition, Podvezko and Podvezko propose an approach where the values of thresholds *q*

and *p* should be between a minimum and a maximum value for the gap between the values of the alternatives on a given criterion [51,52]. The last parameter of the preference model in the PROSA-C method is the values of sustainability coefficients. Research on the PROSA-C method has shown that in order to maintain the performance scale [−1.1] in accordance with the classic PROMETHEE II method for a problem consisting of 14 criteria, sustainability coefficients with values no greater than 0.3 should be used [10]. Taking into account the above observations, a preference model presented in Table 5 was defined. The weights and functions of preferences were expertly defined taking into account assumptions about the applicability of the preferences function [48]. The preference thresholds were based on the population standard deviation σ*<sup>j</sup>* calculated from all the values of a given criterion (for certain criteria) and all the minimum values of a given criterion (for uncertain criteria). The indifference threshold was set at *q* = 0.5σ*<sup>j</sup>* whereas the preference threshold at *p* = 2σ*<sup>j</sup>* [53]. Such threshold values are in line with previous considerations, and at the same time meet the request of Podviezko and Podviezko for threshold values to be included between a minimum and a maximum range of alternatives for a given criterion. Moreover, based on the analysis in [10], the values of sustainability coefficients were set as *sj* = 0.3 for each criterion.


**Table 5.** Preference model.



**Table 5.** *Cont.*

**Preference function**: 5–V-shape with indifference, 1–Usual criterion, 3–V-shape criterion.

Based on the conducted simulation, the rank acceptability indices *br <sup>i</sup>* contained in Table 6 were calculated. Additionally, the values of the rank acceptability indices are graphically shown in Figure 2. One million iterations allowed obtaining accuracy of results at 0.1% with 95% confidence level.

**Table 6.** Rank acceptability indices for individual alternatives.


**Figure 2.** Plot of rank acceptability indices for individual alternatives A1–A12 (and zoomed fragment of the plot).

The analysis of Table 6 and Figure 2 shows that the best alternative, which most often comes first in the simulations, is A6, i.e., Nissan LEAF e+. Slightly worse results are obtained by the group of alternatives A11 (Peugeot e-208), A3 (Hyundai KONA Electric 64 kWh), A10 (Opel Corsa-e), but they are also in the lead and their profiles, determined by the ranks obtained in the simulations, are similar. The rankings are usually followed by the A5 (Nissan LEAF) and A4 (Hyundai IONIQ Electric) alternatives, which are followed by a group of alternatives with similar ranks: A1 (DS 3 CROSSBACK E-TENSE), A2 (Hyundai KONA Electric 39.2 kWh), A12 (Renault ZOE R135). The last positions in the rankings are usually very similar to A7 (BMW i3), A8 (BMW i3s) and A9 (Mini Cooper SE) alternatives, with Mini Cooper SE achieving the worst rank in over half of the simulation. The results obtained are, of course, still uncertain, i.e., it is impossible to determine the order of alternatives with unknown values for uncertain criteria. However, this order is largely predictable. Furthermore, it is possible to identify a group of vehicles whose purchase is worthwhile and a group of vehicles which should not be taken into account. The first group certainly includes A6—Nissan LEAF e+, and the following vehicles can also be considered: A3—Hyundai KONA Electric 64 kWh, A10—Opel Corsa-e and A11—Peugeot e-208. The second group should certainly include the following: A1—DS 3 CROSSBACK E-TENSE, A2—Hyundai KONA Electric 39.2 kWh, A12—Renault ZOE R135, A7—BMW i3, A8—BMW i3s, A9—Mini Cooper SE.

#### **5. Discussion**

The high rankings achieved by alternatives A3 and A6 are due to their good performance in terms of numerous criteria. The values of the criteria for the individual alternatives are shown in Figure 3. Bearing in mind that the preference direction for C1, C3–C8, C10, C14 is maximum and for C2, C9, C11–C13 the minimum is preferred, it is easy to observe that alternatives A3 and A6 have relatively high values of criteria C1–C8, C10, C14. The results are worse for criteria C12–C13 and values for the other criteria are average for the alternatives discussed. This allows alternatives A3 and A6 to obtain high values for φ*net* (compare formula 4). Moreover, these alternatives are relatively balanced on criteria (C12–C13 is an exception). This results in low absolute deviation values (compare formula 5), so they do not receive large penalties for not reaching a sustainable balance of criteria. As a result, they have high PSV values.

As far as the highly rated alternatives A10 and A11 are concerned, they also score well on criteria C7, C9-C10, C13 and, for most of the other criteria, do not deviate significantly from the average. This gives them quite high scores for φ*net*, but they gain higher profits in the PROSA-C method because they are more sustainable than alternatives A3 and A6. This observation can be confirmed by comparing the ranking of acceptability indices obtained in the PROSA-C method (Table 6) with the values obtained with the same assumptions (1 million iterations, accuracy 0.1%, 95% confidence level) in the classic PROMETHEE II method, presented in Table 7.

Comparing Tables 6 and 7, it is easy to see that in the case of alternatives A3 and A6, the PROSA-C sustainability study makes them ranked first less frequently compared to PROMETHEE, while alternatives A10 and A11 are more likely to win the PROSA-C rankings than in PROMETHEE II. Tables 8 and 9 show the rank acceptability indices obtained in the PROSA-C method with sustainability coefficients values of *sj* = 0.5 and *sj* = 1 respectively. The aim of this study was to check whether the results would change as the impact of the sustainability of alternatives on the achieved solution increased.

The analysis of Tables 6–9 shows that as the value of the sustainability coefficient increases, the alternatives A10 and A11 as well as A1, A2, A5 gain. In turn, the alternatives A3, A4, A6, A7, A8, A9, A12 lose. This comparison therefore allows dividing the alternatives into two groups, which include more and less sustainable alternatives.

**Figure 3.** Values of the criteria C1-C14 for the different alternatives A1–A12.




**Table 8.** Rank acceptability indices for individual alternatives obtained by the PROSA-C method, *sj* = 0.5.

**Table 9.** Rank acceptability indices for individual alternatives obtained by the PROSA-C method, *sj* = 1.


#### **6. Conclusions**

The article considers the problem of analyzing and recommending electric vehicles which are most useful for local government and state administration units. For this purpose, based on publicly available information about tenders for local governments and state administration, the most typical demand of such units for electric vehicles was determined. Next, the vehicles available on the Polish market were analyzed, indicating vehicles meeting the needs of local and state units. Based on the relatively new MCDA method called PROSA-C and on the Monte Carlo method, the most useful vehicles were identified, taking into account the uncertainty of parameters describing individual vehicles. The conducted research indicated that the best choice, based on a defined preference model, is currently Nissan LEAF e+.

As regards the generalized conclusions of the studies carried out, there is a wide range of electric vehicles in B, C and J segments. Twelve such vehicles were included in the studies carried out, but a few others were initially rejected on the grounds that they did not meet the requirements for electric motor power or range. In addition, it is easy to see that many of the parameters characterizing electric vehicles are uncertain and dependent on the use of the vehicles or on external factors such as consumption or the types of charging station available. In relation to the methodological approach used in the studies, its usefulness is obviously not limited to electric vehicles, but the methodology is much more universal. The methodological contribution of the article consists in developing the PROSA method by a stochastic analysis based on the Monte Carlo method. This allows the PROSA method, naturally designed to work on crisp data, to take into account the uncertainty of the input data. In addition, the PROSA method, indicating the most useful solution, takes into account the sustainability of the alternatives considered and through the option of changing the values of the sustainability coefficients *sj* allows seeking more or less sustainable solutions.

The limitations of the studies presented in the article are mainly related to the fact that one preference model, presented in Table 5, has been applied. Therefore, the study presented captures the uncertainty of the parameters of electric vehicles, but does not take into account the possible uncertainty of the weights of the criteria, as well as the possibility of applying other preference functions and other values of the indifference and preference thresholds. Another limitation refers to the constant changes on the Polish electric vehicle market. These changes are related, among other things, to the development of the electric vehicle market and the introduction of new vehicle models on it, and to changes in the legal environment of that market. These restrictions result in further directions of research.

An important methodological research direction will be to develop the PROSA method into Fuzzy PROSA, capable of capturing uncertainty by means of fuzzy numbers, as other MCDA fuzzy methods do [54–56]. As far as practical aspects are concerned, it should be noted that subsidies for individual users have been introduced in Poland for the purchase of electric vehicles [57,58]. This may significantly change the situation on the vehicle market and make electric vehicles more attractive and accessible to the public. It would be interesting to carry out a study to address this situation and consider the attractiveness of electric and hybrid vehicles compared to conventional (combustion) vehicles for individual users.

**Funding:** This work was supported by the National Science Centre, Poland, Grant no. 2019/35/D/HS4/02466. **Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Substantiation of Loading Hub Location for Electric Cargo Bikes Servicing City Areas with Restricted Traffic†**

**Vitalii Naumov**

Faculty of Civil Engineering, Cracow University of Technology, Warszawska 24, 31155 Krakow, Poland; vnaumov@pk.edu.pl; Tel.: +48-12-374-30-83

† The present work is the extended version of the paper "Choosing the localisation of loading points for the cargo bicycles system in the Krakow Old Town" presented at RelStat 2018 Conference, Riga, Lavia, 17–20 October 2018; and published in Lecture Notes in Networks and Systems.

**Abstract:** Electric cargo bicycles have become a popular mode of transport for last-mile goods deliveries under conditions of restricted traffic in urban areas. The indispensable elements of the cargo bike delivery systems are loading hubs: they serve as intermediate points between vans and bikes ensuring loading, storage, and e-vehicle charging operations. The choice of the loading hub location is one of the basic problems to be solved when designing city logistics systems that presume the use of electric bicycles. The paper proposes an approach to justifying the location of a loading hub based on computer simulations of the delivery process in the closed urban area under the condition of stochastic demand for transport services. The developed mathematical model considers consignees and loading hubs as vertices in the graph representing the transport network. A single request for transport services is described based on the set of numeric parameters, among which the most significant are the size of the consignment, its dimensions, and the time interval between the current and the previous requests for deliveries. The software implementation of the developed model in Python programming language was used to simulate the process of goods delivery by e-bikes for two cases—the synthetically generated rectangular network and the real-world case of the Old Town district in Krakow, Poland. The loading hub location was substantiated based on the simulation results from a set of alternative locations by using the minimum of the total transport work as the efficiency criterion. The obtained results differ from the loading hub locations chosen with the use of classical rectilinear and center-of-gravity methods to solve a simple facility location problem.

**Keywords:** cargo bicycles; loading hub; facility location problem; computer simulations; Python programing

#### **1. Introduction**

The growth of e-commerce during the last decade is one of the main reasons for the exponentially increased demand for last-mile deliveries that are characterized by cargo diversity and low utilization of vehicles' capacity. Commercial vehicles delivering the packages directly to customers may significantly contribute to the increase of traffic in urban areas. Especially, this concerns the central city areas with historic buildings and low capacity of the roads: commercial vehicles are the main source of noise and air pollution in such areas, they cause congestions and reduce the public space. For these reasons, nowadays, the restriction of traffic in the selected districts is provided in many cities: the motorized vehicles may enter the area only during the specified time window (usually these are night or early-morning hours). Such restrictions, however, negatively influence the businesses located in the areas with restricted traffic by increasing the storage costs and causing the loss of profit due to the lack of goods at the given time moment. In such a situation, cargo distribution systems based on cargo bikes could be considered as the alternative to conventional last-mile deliveries by motorized vehicles.

**Citation:** Naumov, V. Substantiation of Loading Hub Location for Electric Cargo Bikes Servicing City Areas with Restricted Traffic. *Energies* **2021**, *14*, 839. https://doi.org/10.3390/ en14040839

Academic Editor: Albert Y.S. Lam Received: 5 January 2021 Accepted: 2 February 2021 Published: 5 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Loading hubs are necessary elements of the urban distribution system based on cargo bicycles: loads are transported by vans to the storage point located at the bound of the area with restricted traffic and after are delivered to consignees. As examples of such hubs, the following real-world implementations may be listed: a trailer or a semi-trailer, a cargo container unit used as a mobile point, the dedicated transshipment bay, a parcel locker, or a storage space having the character of a permanent loading point. As the core function of the loading hubs, besides the transshipment operations and short-term storage, the e-bikes charging operations should be mentioned.

Both from practical and theoretical points of view, choosing the loading hub location constitutes a significant problem. Practically, such alternative locations should be found that satisfy the legislative restrictions and are characterized by availability for conventional transport and e-bikes. Then, theoretically, by applying a chosen methodology, such the hub location should be distinguished from the set of alternative variants that guarantees the most effective operation of the distribution system.

The current study presents the novel simulation-based methodology to choose the location of a loading hub for a cargo bicycle system given the set of alternative locations. The developed approach, unlike the existing simulation-based methods, considers stochasticity of demand for deliveries in the selected urban area as well as allows the implementation of the routing procedures as part of the technology of goods delivering by electric cargo bikes.

The paper has the following structure: The second part presents a brief review of recent publications related to the research problem; the proposed methodology to substantiate the loading hub location for a cargo bikes delivery system is described in the third part; the fourth part contains the introductive description of software implementing the developed model; the fifth section introduces case studies of choosing the loading point location for a synthetic rectangular network and the cargo bikes delivery system in the Old Town district of Krakow, Poland; the last part offers brief conclusions and directions of future research.

#### **2. Literature Review**

Many recent studies underline the advantages of cargo bicycles as the sustainable mean of transport under urban conditions [1–3]. The author of the paper [4] concludes that the use of cargo bikes contributes to the reduction of traffic congestion and the improvement of air quality. According to the publication [5], both operational and external transport costs can be reduced if the delivery scheme with transshipment hubs and cargo bikes is used for the distribution of e-commerce goods in Antwerp (Belgium). The authors of the research [6] point out that the use of electric cargo bikes in urban logistics activities in Porto (Portugal) reach up to 25% of reductions in external costs. Numeric results regarding the decrease of CO2 emissions due to the use of bicycles for cargo deliveries also show the significant impact: According to [6], CO2 emissions can be reduced by up to 73%, while the authors of the research [7] state that the use of electrically assisted cargo tricycles results in the CO2 emissions decrease by 54% per parcel delivered. Authors of the publication [6] also conclude that cargo bikes can replace up to 10% of the conventional vans without changing the overall network efficiency. The study [8] shows that the expected travel time for delivery distances up to 20 km is on average just 6 min longer if cargo bikes are used instead of traditional vans.

The transport of loads by bikes also has many limitations related to the delivery distance, the consignment size, the vehicles' speed, and the safety of the goods. However, these limitations are not subject to the case of deliveries under the urban conditions when electrically powered bicycles are used: Delivery distances are relatively short, the e-bike's capacity is comparable with the capacity of light commercial vehicles, the speed of conventional vans does not exceed the average e-bikes' speed due to road traffic restrictions, and the freight is protected from the weather conditions when the bikes with the covered bins are used.

The contemporary scientific literature describes the set of operational problems to be solved while designing the e-bike delivery system in an urban area: vehicle routing [9–11], synchronization of bikes with conventional vans [12], choice of vehicle type [10,13], vehicles scheduling [14,15], shaping the recharging strategy [15–17], etc. However, the most significant practical and theoretical issue when setting up the regional e-bike delivery system is the substantiation of the transshipment hub(s) parameters, including the type and location of the hub(s).

As it pointed out in the study [13], contemporary knowledge about planning urban transshipment points is very limited. Substantiation of the loading hub location refers to the branch of the operations research known as the facility location problem (FLP) [18]. The simplest version of FLP (the Weber problem) assumes the choice of a single facility location for the given set of customers' locations. For the multiple hub locations, the FLP becomes NP-hard and is usually solved by using different approximation algorithms that find the solution within a reasonable computational time. In real-world applications of FLP, the uncertainty of demand for deliveries, as well as the stochastic parameters of the servicing process and resource restrictions, should be considered [19–23]; it significantly complicates the process of searching for a reliable solution.

To solve FLP, various approaches are used by scholars and practitioners. The most popular mathematical apparatus for substantiation of the loading hubs location is the mixed-integer linear programming [9,14,15,21,24], although the fuzzy-logic-based models are used as well [17,25]. Another direction for finding the solution of FLP is the simulationbased approaches [5,8,19,26]: The delivery and servicing processes are simulated for a set of alternative locations and the best option is being substantiated based on the simulation results. The simulation-based approaches assume that the adequate model of the transport system is developed as the base for simulations. The model is used to run multiple simulations of the transport system to consider the overall influence of random parameters on the simulations' results and to evaluate statistically significant values of the efficiency criteria. The authors of the study [27] use the robust optimization approach to alleviate the impact of uncertainty on obtained solutions.

Different heuristics are used to obtain the solution for FLP. The authors of the paper [28], to place bicycle stations in a way that minimizes the distance between clients and the closest bike station in Malaga (Spain), have studied the comparative advantages of the following algorithms: genetic algorithm, iterated local search, particle swarm optimization, simulated annealing, and variable neighborhood search. Genetic algorithms (GA) are quite a popular heuristic for solving FLP: e.g., the optimal location and size of charging stations are substantiated based on the developed GA in the study [17], the location of a transshipment hub is being chosen based on the proposed GA in the paper [29] considering the vehicle routing problem in the multistage distribution system.

Although the contemporary literature proposes a variety of methods to substantiate the loading hub location, it should be mentioned that existing approaches are usually case-study-sensitive and allow solving the problem for the specific situation or the given initial data. Considering the presented literature review, the following research gaps should be filled by the proposed simulation-based approach:


#### **3. Simulation-Based Methodology for the Substantiation of a Loading Hub Location**

As the basis for estimating the alternative location of the loading hub, the simulation model of the transportation system is proposed to be used. Such model simulates demand for transport services, as well as the technological processes: handling and charging

operations in loading hubs, delivery of consignments to consignees located in the serviced area, as well as loading and unloading operations at customers' locations and in a hub. To run the model, its proper program implementation should be developed. Based on the simulation results and considered efficiency criteria, the expediency of the loading hub locations is estimated.

The process of goods delivery by electric bikes is implemented within the logistic system of the considered urban area. As basic structural elements of this system, the following subsystems should be listed:


Thus, the mathematical model **M***DS* of the delivery system in a general form can be presented as the tuple of three above listed elements:

$$\mathbf{M}\_{\rm DS} = \langle \Omega, \mathbf{D}, \Phi \rangle,\tag{1}$$

where **Ω** is the transport network; **D** is demand for transport services; **Φ** is the servicing subsystem.

#### *3.1. Transport Network Model*

The most commonly used approach to modeling transport networks is the use of mathematical structures based on graph models. The transport network can be formalized as a pair of subsets—the nodes and the edges:

$$\mathbf{\Omega} = \langle \{\eta\_i\}, \{\lambda\_j\} \rangle, \forall \eta\_i \in \mathbf{N}, \forall \lambda\_j \in \mathbf{A}\_\prime \tag{2}$$

where *η<sup>i</sup>* is the *i*-th node being the element of the network nodes set **N**; *λ<sup>j</sup>* is the *j*-th edge being the element of the network edges set **Λ**.

In the proposed mathematical model, as the nodes of the network, the locations of potential customers of transport services are presented, while the edges (links) are used to reflect the relevant sections of the road network connecting the nodes. Locations of the loading hubs, from which the cargoes are delivered by bikes, can be also represented as the network vertices. Additionally, for the more detailed networks, the nodes can be used to model the road intersections.

The basic characteristics of a node are its geographical coordinates and lists of ingoing and outgoing edges:

$$
\eta = \langle \mathbf{x}, \ y, \ \lambda\_{\rm in}, \ \lambda\_{\rm out} \rangle,\tag{3}
$$

where *x* and *y* are the coordinates characterizing the node location (latitude and longitude may be used as such coordinates); λ*in* and λ*out* are the sets of ingoing and outgoing edges for the given node *η* : λ*in* ⊂ **Λ**, λ*out* ⊂ **Λ**.

The obligatory set of parameters describing the graph edge includes its weight and end vertices:

$$
\lambda = \langle w, \eta\_{\rm out}, \eta\_{\rm in} \rangle,\tag{4}
$$

where *w* is the edge weight (e.g., the length of the corresponding road section, the travel time, the transport costs, etc.); *ηout* and *ηin* are the beginning and the ending nodes of the edge: *ηout* ∈ **N**, *ηin* ∈ **N**.

Note that in a more advanced version of the network model, the set of node and edge parameters could be extended depending on the problem being solved.

#### *3.2. Model of Demand for Goods Delivery*

The core unit shaping transport demand is a request for the delivery of goods understood as the customer's need for services, supported by his purchasing ability, and

presented on the market to be satisfied. A set of requests for the services of a given company forms the demand for the services of a transport enterprise. Each request could be quantified based on a set of numerical parameters.

In the general form, the demand model for goods delivery can be presented as an ordered set of requests:

$$\mathbf{D} = \{\rho\_1, \rho\_2, \dots, \rho\_N\},\tag{5}$$

where *ρ<sup>i</sup>* is the *i*-th request in the flow: *ρ<sup>i</sup>* ≺ *ρi*+<sup>1</sup> if *ti* ≤ *ti*+1, *ti* is the moment of appearance of the *i*-th request; *N* is the number of requests in a flow.

A single request for delivery is characterized by the following parameters:

$$
\rho = \langle \eta\_o, \eta\_{d\prime} \nsubseteq \omega, \ \langle \theta\_l, \theta\_w, \theta\_h \rangle \rangle,\tag{6}
$$

where *η<sup>o</sup>* and *η<sup>d</sup>* are the transport network nodes defining the location of a sender and a recipient: *η<sup>o</sup>* ∈ **N**, *η<sup>d</sup>* ∈ **N**; *ζ* is the time interval between the moments of appearance of the given and previous requests [min]; *ω* is the consignment weight [kg]; *θl*, *θw*, and *θ<sup>h</sup>* are dimensions of the load unit—its length, width, and height [cm].

In the Equation (6), the time interval *ζ* characterizes the intensity of the demand for deliveries for the given consignor: the more the value of the interval, the less intense the demand for transport services.

For a flow of requests with a finite number of elements, the subsetting of requests by sender and recipient can be presented in the form of a travel matrix **Δ**. An element *δij* of such the matrix reflects the number of requests for which the sender is located in the node *η<sup>i</sup>* (origin node), while the recipient—in the node *η<sup>j</sup>* (destination node).

For a single request, its numerical parameters are deterministic, but for the requests flow, these parameters are random, and can be described by stochastic variables. It follows that the demand model for cargo transport can be presented as a set of random variables characterizing the numerical parameters of requests for cargo deliveries supported by the origin-destination matrix defining spatial distribution of the requests:

$$\mathbf{D} = \left\langle \mathbf{A}, \widetilde{\zeta}, \widetilde{\omega}, \left\langle \widetilde{\theta}\_{l}, \widetilde{\theta}\_{w}, \widetilde{\theta}\_{h} \right\rangle \right\rangle,\tag{7}$$

where *<sup>x</sup>* is a random variable characterizing some numeric parameter *<sup>x</sup>* of demand for cargo deliveries.

Given the set of ordered requests for the cargo deliveries characterize the demand, the task of demand modeling can be presented as the task of generating numerical parameters of the requests in a flow. Implementation of the demand model as a flow of *N* elements is a set of requests in the form (5), where numerical parameters are values from samples, generated for the respective random variables, and the locations of senders and recipients are defined by the matrix **Δ**.

#### *3.3. Model of the Servicing Subsystem*

Basic components of the servicing subsystem are the fleet of vehicles (cargo bicycles) and a set of loading hubs:

$$\Phi = \langle \{ b\_i \}, \{ p\_j \} \rangle, \forall b\_i \in \mathbf{B}, \forall p\_j \in \mathbf{P}, \tag{8}$$

where *bi* is the *i*-th cargo bicycle being the element of the vehicles fleet **B**; *pj* is the *j*-th loading hub being the element of all the loading hubs' set **P** (in case, when the single hub location problem is solved, the set **P** contains just one element).

The basic parameters of cargo bikes as elements of the servicing system are their load capacity, dimensions of the cargo space, and technical speed:

$$b = \langle q\_{b\prime}, v\_{b\prime} \,\,\langle l\_{b\prime} \,\, w\_{b\prime} \,\, h\_{b} \rangle \rangle,\tag{9}$$

where *qb* is the load capacity of a cargo bicycle [kg]; *vb* is the average technical speed of a cargo bicycle [km/h] (the technical speed may be also considered as a random variable); *lb*, *wb*, and *hb* are dimensions of the bicycle cargo space—its length, width, and height [cm].

Note that the battery capacity may be considered as the key parameter for electric bikes, if it is used as the restriction in the problem being solved or if it is needed for the estimation of the resulting efficiency criteria. The dimensions of the loading space are considered in the model as the additional restriction to the load capacity (these parameters may be used if the solution of the knapsack problem is being considered in the procedure implementing the loading operations).

Loading hubs within the frame of the proposed model are short-term (temporary) storage points to which cargo is transported by other means of transport (usually—by delivery vans) to deliver them to the final recipient. The basic characteristics of loading hubs are their location and capacity:

$$p = \langle \eta\_{p'} q\_p \rangle,\tag{10}$$

where *η<sup>p</sup>* is the node of the transport network representing the location of the loading hub, *η<sup>p</sup>* ∈ **N**; *qp* is the loading hub capacity (possible maximum amount of cargo that can be stored at the hub) [t or m3].

#### *3.4. Shaping the Routes for Delivery of Cargoes by Electric Bikes*

In most cases, the deliveries of goods by bicycles are carried out under conditions of combined consignments: By aiming the minimization of the total distance, the delivery routes are being formed for the requests that are received during the considered time window (in such situations, the total weight of the combined consignment should not exceed the vehicle's capacity). Assuming that servicing companies behave rationally and shape rational (or optimal) delivery routes while servicing the clients, the simulation model should approximate the real-world behavior of transport operators, and thus—it should contain the implementation of routing procedures.

As input data in the routing algorithms, the matrix of the shortest distances between the vertices of the transport network is used. To estimate such matrix, any known method of searching for the shortest paths should be used (e.g., Dijkstra algorithm or Floyd–Warshall algorithm [30]). For the generated flow of requests, the task of shaping the delivery routes is solved as the traveling salesman problem (TSP), where the location of the load sender (the parameter *ηo*) is defined as the location of a loading hub, and locations of recipients (the parameter *ηd*)—as the locations of the corresponding nodes in the transport network model. Known heuristic methods can be used for solving the TSP (e.g., Clarke–Wright algorithm, simulated annealing method, methods based on genetic algorithms, ant colony optimization methods, etc. [31]). The result of the routing procedure is a set of delivery routes, being the ordered sets of vertices of the graph representing the network model. The first and last element in the set of nodes shaping the route are the vertices defining the location of a loading hub.

The basic characteristics of delivery routes, calculated based on given parameters of the mathematical model, are the route length (total distance covered), the total weight of transported cargo, and the total transport work. These technological characteristics may be considered as key indicators to estimate the efficiency of the routing procedures, as well as the efficiency of the delivery system for alternative facility locations.

Note that time windows may be added to the model as additional restrictions: for each consignor, the time window when a parcel should be delivered may be considered. More advanced optimization techniques should be used within the routing procedure in this case, as far as the solution for the capacitated vehicle routing problem with time windows should be found.

#### *3.5. The Objective Function for the Hub Location Problem*

Special scientific literature proposes various efficiency criteria for solving FLP. The most widely used criterion is total costs: In the research [5], the e-commerce distribution system based on cargo bikes and delivery points is assessed with total operational and external costs, the authors of the paper [9] propose to select cost-minimizing locations of parcel lockers for goods deliveries by cargo bikes, the study [15] uses total costs as the objective function to substantiate the location of recharging stations for electrical vehicles. However, some recently developed approaches define facility locations considering environmental pollution together with operational costs: The model developed in [10] uses the total costs and the environmental impacts to assess alternative configurations of urban consolidation centers, while the study [27] considers the combination of total costs and parameters of environmental pollution to select locations of refueling stations.

There are also other efficiency criteria mentioned in recent studies and used to substantiate the facility location. Authors of the publication [20] use total system travel time and total system net energy consumption for optimal positioning of dynamic wireless charging infrastructure for battery electric vehicles, while the minimization of the power loss in a distribution network is considered as the objective function in the study [17] when solving the problem of the charging stations location. The paper [24] proposes solving the multimodal capacitated hub location problem based on the profitability of alternative locations. The authors of the study [7] use the total distance traveled and the CO2 emissions per parcel delivered to substantiate the efficiency of an urban micro-consolidation center located in the delivery area where the deliveries are performed by electrically-assisted cargo tricycles and electric vans.

For solving the hub location problem based on the simulations of the delivery process, the minimization of total transport work can be used as the core objective function reflecting the technological operations performed within the logistic system of the considered urban area:

$$\mathcal{W}\_{tkm}(\eta\_H) = \sum\_{i=1}^{N\_R} \sum\_{j=1}^{N\_{S(i)}} q\_{ij} \times d\_{ij} \to \min,\tag{11}$$

where *η<sup>H</sup>* is the node of the transport network **Ω** that represents the loading hub location, *η<sup>H</sup>* ∈ **NH**, **NH** is the set of possible loading hub locations, **NH** ⊂ **N**; *NR* is the number of delivery routes formed to service the transport demand in the considered city area; *NS*(*i*) is the number of segments at the *i*-th route, *NS*(*i*) ≥ 2; *qij* is the total weight carried by a vehicle at the *j*-th segment of the *i*-th route [tons]; *dij* is the distance covered by a vehicle at the *j*-th segment of the *i*-th route [km].

Note that <sup>∑</sup>*NS*(*i*) *<sup>j</sup>*=<sup>1</sup> *qij* × *dij* represents the total transport work performed by a vehicle at the *i*-th delivery route.

The proposed objective function may be used for solving the multiple hub location problem as well. In such a case, the subsets of locations representing alternative solutions should be formed prior (if the number of available alternative locations *n* is greater than the number *k* of hubs to be located, then the number of subsets is equal to the number of

possible combinations *<sup>n</sup> k* ), and the total transport work should be estimated for each of the combinations.

Criterion (11) is the resulting technological parameter that may be used as the base for the calculation of other indicators. The performed ton-kilometers estimated at the level of delivery routes can serve for evaluation of total costs, environmental pollution, and energy consumption for a heterogeneous fleet of vehicles (total transport work may also be used to estimate the values of these indicators for homogenous fleet).

#### **4. Software Implementation**

To implement simulation models of the systems of goods transport by cargo bikes, a library of core classes was created. The library has been developed by using the Python programming language (Version 3.9, Manufacturer–Python Software Foundation), and

it is available in an open-source repository [32]. The structure of the library is presented in Figure 1.

The developed library contains the following core classes that are used to create a simulation model of the cargo bicycle transport system:


The *Net* class contains methods for generating demand as a flow of requests for the given random variables defining parameters of requests (consignment weight, the time interval between requests), the methods for calculating the shortest distance matrix based on the Floyd–Warshall and Dijkstra algorithm, as well as the methods for shaping delivery routes that implement the Clarke–Wright algorithm, the simulated annealing method, and the GA-based heuristic.

#### **5. Case Studies and Discussion**

The use of the proposed methodology of substantiating the loading hub location is presented for a synthetic transport network with two alternative locations and the realworld case study of the Old Town district in Krakow (Poland) with a set of feasible loading hub locations. The considered case studies represent examples of solving the FLP problem for a single facility.

#### *5.1. Synthetic Network Example*

Let us consider the rectangular (square) transport network, where the nodes define the locations of clients. To generate such the network, the dedicated procedure is implemented within the class *Net* [32]: the method takes as arguments the size of the network (the side of the square in nodes) and the random variable representing the length of the network edges. The Figure 2 shows the configuration of the square network with the square side equal to 7 nodes.

**Figure 2.** Synthetic network and alternative locations of loading hubs.

As alternative locations of the loading hub, the nodes connected to the corner node (location A) and the mid-side node (location B) are considered. The hub nodes are connected to the network with the edge that has the weight generated based on the same random variable that was used for the network generation.

To choose the loading hub locations out of two alternatives, the following simulation experiment was conducted:


The results of the computer simulations are shown in Figure 3 as the distributions of random variables representing the total transport work conducted for each of the alternative variants of a loading hub location. The normal distribution of the random variables for both locations is confirmed with the Kolmogorov–Smirnov test for the probability of confidence equal to 0.95.

**Figure 3.** Distributions of the total transport work for the considered alternative hub locations in the synthetic network.

The obtained results of simulations show that the average total transport work for location A (0.757 tkm) is lower than the average value of this indicator for location B (0.789 tkm). On this basis, it can be concluded that the location of the loading hub in the corner of the rectangular network allows reducing the total transport work compared to the option of locating the hub in the middle of the rectangle side. However, it should be noted that such configurations of the network sections length and the parameters of demand are possible when the corner location of the transshipment point is characterized by a greater value of the objective function (in the conducted experiment, the 46 largest values of transport work for location A are greater than the corresponding lowest values of the objective function for location B).

A sufficient number of observations was evaluated for each of the experiment series based on the normal distribution of the objective function. For the significance level equal to 0.05, the sufficient number of the model runs is 29 for the series with location A and 35 for the series where a loading hub is located in node B. As the number of completed runs of the simulation model is bigger than the sufficient number of observations, it can be stated that the obtained results are statistically significant for the considered level of significance (the corresponding probability of confidence equal to 0.95).

The solution for the single-facility location problem obtained based on the rectilinear location model [18] for the considered example is the node located in the center of the network (that is node 24 highlighted at the net presented in Figure 2). Accordingly, when choosing between locations A and B, the node located closer to the center of the network would be a more preferable alternative (evidently, that is node B). As we can see, the solution obtained based on the proposed methodology differs from the conventional approach. This result may be explained by the effect of the routing procedures considered in the simulation model.

#### *5.2. Kraków Old Town Case*

The Old Town district of Krakow is the area with restricted traffic: only 3 streets of this district are not closed for motorized vehicles. At the same time, Krakow Old Town is the main touristic attraction of the region, where hundreds of restaurants, shops, and other touristic objects are located (locations of 730 commercial enterprises out of 954 objects registered in Google Maps are shown in Figure 4).

**Figure 4.** Locations of potential clients in the Old Town district of Krakow.

As far as the district is closed for heavy vehicles, the supply of the objects located in the district is allowed in the morning from 8 p.m. to 9.30 a.m. under the condition that only light good vehicles with the carrying capacity not exceeding 1.5 tons deliver the goods. Such restrictions lead to the increase of logistics expenses due to additional storage costs and the lack of the ability to implement on-time deliveries. The use of cargo bicycles is a solution that allows reducing logistics costs in this situation while preserving the tourist appeal of the district and low level of environmental pollution.

The location of a loading hub for the bikes delivery system in the Old Town of Krakow should be chosen from the set of alternative variants that satisfy the conditions of transport accessibility (both for conventional vehicles and cargo bicycles) and consider the architectural features of historic buildings in the district. The set of such alternative locations is presented in Table 1.


**Table 1.** The alternative locations of a loading hub in the Old Town district of Krakow.

Using the developed software, the transport network model that considers roads available for bicycles was prepared for the district of the Krakow Old Town. Initial data for the network model (locations of the objects and intersections, the type of the objects) were obtained by the means of Google Maps API. Enterprises and non-commercial objects located in the Krakow Old Town (including hotels, shops, restaurants, cafés, museums, and theaters) and the set of alternative locations of a loading hub were considered as the nodes of the graph representing the network model. The graph of the obtained network model is presented in Figure 5.

**Figure 5.** Locations of clients, intersections, and loading hubs in the Krakow Old Town.

To choose the best location of the loading hub, the Monte-Carlo simulations were performed based on the developed model. For each element of the considered set of possible hub locations, 30 runs of the following procedure were conducted:


For the demand simulation procedure, the probability of the request appearance was taken equal to 0.1 for each client and the random variable of the consignment weight was generated as the normally distributed variable with the average value equal to 30 kg and standard deviation equal to 5 kg.

As the result, the samples characterizing random variables of the total transport work were obtained for the alternative loading hub locations (distributions of the total ton-kilometers performed at the delivery routes are shown in Figure 6).

**Figure 6.** Distribution of the total transport work for the considered loading hub locations.

More detailed characteristics of the random variables representing the total transport work for the considered locations are presented in Table 2. To confirm that the obtained samples are big enough to make statistically significant conclusions, the sufficient number of observations was calculated for the level of significance equal to 0.05, assuming that the random variables representing the total transport work have a normal distribution. As it follows from the data presented in Table 2 (the number of performed observations is bigger than the corresponding sufficient number), the obtained results should be considered as statistically significant with the confidence probability equal to 0.95.

Based on the obtained results of computer simulations, locations C and E are the best options in the Krakow Old Town case, as they are characterized by the smallest mean value of the total transport work.


**Table 2.** Results of computer simulations for the considered loading hub locations.

According to the center-of-gravity approach [18], for the set of potential clients located in the Old Town district of Krakow, the center of gravity is the point with coordinates (50.06178, 19.93846). Thus, based on the set of considered alternative locations, the solution of the single-facility location problem is the point closest to the center of gravity. The distances obtained from Google Maps (biking mode) and Euclidean distances between the gravity center and alternative hub locations are shown in Table 3.


**Table 3.** Distances from the gravity center to alternative hub locations.

As follows from the data presented in Table 3, both Euclidean and Google Maps distances to the gravity center are the smallest for location C, while location E is not the best alternative according to the conventional approach. Note that location A is evaluated as the worse possible option by the proposed methodology, as well as by the center-ofgravity approach.

The solution obtained by using the proposed simulation-based approach is superior to the one obtained based on the center-of-gravity approach: by choosing location E, we guarantee the minimum of the average total transport work (with 95% level of the probability of confidence), while by choosing location C, the minimum of the averaged values of the objective function cannot be guaranteed.

#### **6. Conclusions**

The proposed approach to the modeling of goods deliveries by cargo bicycles allows considering the stochastic nature of the demand for transport services and the used technology of the delivery process. The numerical parameters described in the mathematical model are the core characteristics, but the proposed model can be extended to be adapted for solving other optimization problems. Unlike the existing approaches, the proposed method allows taking into account the use of rational delivery routes, which increases the adequacy of the obtained results and the validity of the choice for the loading hub location problem. Since the main tool for choosing the loading hub location is the logistic system simulation, the proposed approach can be expanded by including other technological procedures in the program model in addition to routing.

By using the developed model and its software implementation, the problem of choosing the location for a loading point in the cargo bikes delivery system was solved for the Krakow Old Town district. It should be noted that the obtained results are preliminary: The detailed studies of demand and its simulation in the frame of the proposed model are needed for the real-world implementation of the system of goods delivery by cargo bicycles.

The use of the proposed approach in practice has the following implications:


However, the use of the proposed methodology may have certain limitations placed upon its practical implementation:


As the directions of further research on the developed methodology, the following topics should be mentioned:


**Funding:** This research was funded by European Union's Horizon 2020 research and innovation program, grant agreement No. 769086 (CityChangerCargoBike project).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The numeric results of the described simulations can be found at https://github.com/naumovvs/cargo-bikes-system.

**Acknowledgments:** The description of the developed model and the implemented software is reprinted by permission from Springer Nature Customers Service Center GmbH: Springer Lecture Notes in Networks and System, Choosing the localisation of loading points for the cargo bicycles system in the Krakow Old Town, Naumov V., Starczewski J. © 2019.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Energies* Editorial Office E-mail: energies@mdpi.com www.mdpi.com/journal/energies

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel: +41 61 683 77 34

www.mdpi.com

ISBN 978-3-0365-4454-0