**6. Discussion**

In this paper, we proposed a synthetic data generator (SDG) that creates samples of realistic EV session data. Each session is defined by its arrival time, departure time and required energy. We described two modeling methodologies for generating arrivals, both assuming that inter-arrival times follow an exponential distribution, and tested different methods for modeling the daily profile of the parameter *λ*. For connection times and required energy, mixture models were trained to estimate the probability distributions. We trained the SDG on our real-world dataset and generated multiple samples of session data.

The Kolmogorov-Smirnov (KS) test results supported the assumption that inter-arrival times follow an exponential distribution. Wilcoxon tests were used to compare daily and hourly EV arrivals in the generated samples with those in the real-world data (Figure 4).
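
As an illustration of the exponentiality check, the following minimal sketch computes a one-sample KS statistic against an exponential distribution. It is not the code used in this study: the rate `true_rate`, the sample size and the synthetic inter-arrival times are hypothetical, and the critical value is the standard large-sample 5% approximation, which is conservative when the rate is estimated from the same sample. In practice a library routine such as `scipy.stats.kstest` would normally be used instead.

```python
import math
import random


def ks_statistic_exponential(samples, rate):
    """One-sample KS statistic of `samples` against Exp(rate)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)  # exponential CDF at x
        # Largest gap between the empirical CDF (a step function) and the model CDF
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(0)
true_rate = 4.0  # hypothetical arrival rate (arrivals per hour)
iat = [rng.expovariate(true_rate) for _ in range(2000)]  # synthetic inter-arrival times

rate_hat = 1.0 / (sum(iat) / len(iat))  # maximum-likelihood estimate of the rate
d_stat = ks_statistic_exponential(iat, rate_hat)
critical_05 = 1.358 / math.sqrt(len(iat))  # asymptotic 5% critical value

# Counter-example: uniformly distributed gaps should be clearly rejected
uniform_gaps = [rng.random() for _ in range(2000)]
uniform_rate = 1.0 / (sum(uniform_gaps) / len(uniform_gaps))
d_uniform = ks_statistic_exponential(uniform_gaps, uniform_rate)

print(f"exponential sample: D = {d_stat:.4f}, 5% critical value = {critical_05:.4f}")
print(f"uniform sample:     D = {d_uniform:.4f}")
```

A statistic below the critical value means the exponential assumption is not rejected at that level; it does not prove that the assumption holds.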

Arrival count (AC) models performed better than inter-arrival time (IAT) models, and the negative binomial AC model outperformed all other models for generating EV arrivals. Samples generated by the IAT models exhibited high variance, which was introduced during the scale treatment and randomization step (Section 4.1.1). The regression-based IAT models also missed the morning peak in hourly arrivals, because the regression curves could not follow the sharp increase in the number of arrivals. Indeed, the polynomial model (IAT:poly) and the localized regression model (IAT:loess) generated very few arrivals during the morning peak (*ts* = 7, 8). In contrast, the AC models captured both the morning and evening peaks, as can be seen in Figure 7, as well as the variance in the number of arrivals across all *ts*. During night hours, we noticed a difference between the generated and real-world data. For practical purposes, however, this difference can be neglected because the average number of arrivals at night is very low. The negative binomial model performed best for both daily and hourly generation, which makes it well suited to both short-term and long-term data generation.
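
To make the arrival count idea concrete, the sketch below draws hourly arrival counts from a negative binomial distribution, written as a gamma-Poisson mixture so that only the Python standard library is needed. It is a minimal illustration under stated assumptions, not the fitted model from this study: the mean/dispersion parameterization, the `hourly_mean` profile and the `DISPERSION` value are all hypothetical.

```python
import random


def sample_poisson(rng, lam):
    """Poisson(lam) draw: count exponential inter-arrival times within one time unit."""
    if lam <= 0.0:
        return 0
    count = 0
    t = rng.expovariate(lam)
    while t < 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def sample_negative_binomial(rng, mean, dispersion):
    """Negative binomial draw as a gamma-Poisson mixture.

    The count has expectation `mean` and variance mean + mean**2 / dispersion.
    """
    lam = rng.gammavariate(dispersion, mean / dispersion)  # shape, scale
    return sample_poisson(rng, lam)


# Hypothetical hourly mean arrivals with morning and evening peaks
hourly_mean = {ts: 1.0 for ts in range(24)}
hourly_mean.update({7: 12.0, 8: 12.0, 17: 6.0, 18: 6.0})
DISPERSION = 3.0  # hypothetical; smaller values give more overdispersion

rng = random.Random(42)
days = [
    [sample_negative_binomial(rng, hourly_mean[ts], DISPERSION) for ts in range(24)]
    for _ in range(2000)
]

peak = [day[8] for day in days]  # generated counts for ts = 8
peak_mean = sum(peak) / len(peak)
peak_var = sum((c - peak_mean) ** 2 for c in peak) / (len(peak) - 1)
print(f"ts = 8: mean = {peak_mean:.2f}, variance = {peak_var:.2f}")
```

Because the variance exceeds the mean, this kind of model can reproduce day-to-day spread in the counts that a plain Poisson model with the same mean would understate.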

Gaussian mixture models (GMMs) captured the peaks of the conditional probability distributions well. We clustered EV sessions based on arrival times and connection times, and each peak in the conditional probability distribution of connection times corresponds to one session cluster. The two peaks in the required energy distribution represent the morning and evening demand. For the connection times, all three session cluster peaks were reproduced in the generated data (Figure 8). In the case of required energy, both the morning and evening peaks were reproduced (Figure 9). Since the generated data reproduced the probability distributions of *tc* and *E*, we conclude that GMM-based mixture models are suitable for fitting these conditional distributions.
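
The mixture modeling step can be illustrated with a small one-dimensional example. The sketch below fits a two-component GMM with expectation-maximization and then samples from it. It is an assumption-laden illustration rather than the implementation used here: the synthetic "required energy" values, the component count and the helper names `fit_gmm_1d` and `sample_gmm_1d` are hypothetical, and the conditioning on arrival time is omitted. A library implementation such as scikit-learn's `GaussianMixture` would be the usual choice in practice.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, iters=100):
    """Fit a k-component 1D Gaussian mixture with EM; returns (weights, means, sds)."""
    n = len(data)
    xs = sorted(data)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile-based start
    mean_all = sum(data) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in data) / n)
    sds = [sd_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp_sum = [0.0] * k
        x_sum = [0.0] * k
        xx_sum = [0.0] * k
        # E-step: accumulate responsibilities and weighted moments
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            for j in range(k):
                r = dens[j] / total
                resp_sum[j] += r
                x_sum[j] += r * x
                xx_sum[j] += r * x * x
        # M-step: update weights, means and standard deviations
        for j in range(k):
            nj = max(resp_sum[j], 1e-12)
            weights[j] = nj / n
            means[j] = x_sum[j] / nj
            sds[j] = math.sqrt(max(xx_sum[j] / nj - means[j] ** 2, 1e-6))
    return weights, means, sds


def sample_gmm_1d(rng, weights, means, sds, size):
    """Draw `size` values: pick a component by weight, then sample its Gaussian."""
    comps = rng.choices(range(len(weights)), weights=weights, k=size)
    return [rng.gauss(means[j], sds[j]) for j in comps]


rng = random.Random(1)
# Hypothetical required energy (kWh): a lower peak and a higher peak
energy = [rng.gauss(8.0, 2.0) for _ in range(600)] + [rng.gauss(25.0, 4.0) for _ in range(400)]

weights, means, sds = fit_gmm_1d(energy, k=2)
generated = sample_gmm_1d(rng, weights, means, sds, size=1000)
for w, m, s in sorted(zip(weights, means, sds), key=lambda c: c[1]):
    print(f"weight = {w:.2f}, mean = {m:.2f}, sd = {s:.2f}")
```

Gaussian components can produce negative values, so a generator for physical quantities such as energy or duration would need to truncate or resample such draws.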

On weekdays during some spring and summer months, we observed a very high variance in the number of daily arrivals in the real-world data (Figure 6). The reason is that May, July and August contain multiple holidays with very low numbers of arrivals. In retrospect, we found that many of these holidays that fall on weekdays have arrival profiles similar to those of weekends. As a result, arrival models trained on weekdays are unable to capture this variance. Introducing holidays as another daytype (*dt*), or modeling holidays as weekends, should help to overcome this limitation.
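
The two proposed remedies amount to a small change in how each day is labeled before the arrival models are trained. A minimal sketch follows; the `day_type` helper, the label strings and the placeholder dates in `HOLIDAYS` are hypothetical and do not come from our dataset.

```python
from datetime import date

# Placeholder holiday calendar; a real one would come from the region under study
HOLIDAYS = {date(2019, 5, 1), date(2019, 8, 15)}


def day_type(d, holidays=HOLIDAYS, holidays_as_weekend=False):
    """Return the daytype label for date `d`.

    With holidays_as_weekend=False, holidays form their own daytype;
    with True, they are pooled with weekends.
    """
    if d in holidays:
        return "weekend" if holidays_as_weekend else "holiday"
    return "weekend" if d.weekday() >= 5 else "weekday"  # 5 = Saturday, 6 = Sunday


print(day_type(date(2019, 5, 1)))                            # separate holiday daytype
print(day_type(date(2019, 5, 1), holidays_as_weekend=True))  # pooled with weekends
print(day_type(date(2019, 5, 2)))                            # ordinary weekday
```

Pooling holidays with weekends keeps more training data per daytype, whereas a separate holiday daytype avoids assuming that the two profiles are identical.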

We modeled the data under the assumption that the number of active charging stations remains constant over the period in which the EV session data are collected. Consequently, a generated sample is representative of the EV sessions that might occur on this constant number of active charging stations. The proposed methodology therefore does not model the effect of changing the number of charging stations, which is left for future research.
