**3. Dataset**

The data used here were obtained from ElaadNL (https://www.elaad.nl/), a knowledge and innovation center associated with providers of charging infrastructure for the grid, which aims to prepare for a future with electric mobility and sustainable charging. Operating since 2009, it has established a network of approximately 2000 public charging stations across the Netherlands. The EV session data collected by ElaadNL are not publicly available; we obtained them under an agreement. ElaadNL was not involved in the study and acted only as a data provider. Readers interested in the dataset are encouraged to contact us. In this section, we detail the data cleaning and processing steps and the session clustering steps (Figure 1a).

#### *3.1. SDG Training Data*

The EV sessions' time series data were prepared for training the SDG (the training process is detailed in Section 4). These data contain: the date *d*, month *m*, type of day *dt*, arrival time *tarr*, arrival timeslot *ts*, connection time *tc* and required energy *E*, as shown in Table 1.

Timeslots take values from 1 to 24, where 1 indicates the timespan 00:00–00:59, 2 indicates 01:00–01:59, etc. Further, *tarr* and *tc* are real numbers (∈ [0, 24)) expressed in hours; e.g., an arrival time of 1.5 means 01:30 A.M. More than 98% of the sessions have *tc* under 24 h, so we assumed a maximum connection time of 24 h and removed data points with *tc* > 24. In practice, some sessions end with the EV departing before it is fully charged. However, the collected data do not include the charging load that remained unmet when the EV departed. Lacking such information, we assumed that the measured charging load represents a full charge of the EV. We denote this charging load, i.e., the required energy, by *E* (in kWh).
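The timeslot convention above can be illustrated with a minimal sketch. The function names `hour_of_day` and `arrival_timeslot` are hypothetical and chosen only for this example; the sketch assumes that arrival timestamps are available as Python `datetime` objects, which the text does not state.

```python
import math
from datetime import datetime


def hour_of_day(stamp: datetime) -> float:
    """Express a timestamp as a real-valued hour in [0, 24), e.g. 01:30 -> 1.5."""
    return stamp.hour + stamp.minute / 60 + stamp.second / 3600


def arrival_timeslot(t_arr: float) -> int:
    """Map an arrival time in [0, 24) to a timeslot in 1..24.

    Slot 1 covers 00:00-00:59, slot 2 covers 01:00-01:59, and so on.
    """
    if not 0 <= t_arr < 24:
        raise ValueError("arrival time must lie in [0, 24)")
    return math.floor(t_arr) + 1


# Hypothetical arrival at 01:30 on an arbitrary day in 2015.
t_arr = hour_of_day(datetime(2015, 3, 1, 1, 30))
slot = arrival_timeslot(t_arr)
```

Under these assumptions, an arrival at 01:30 yields an arrival time of 1.5 and falls into timeslot 2.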

Further, the training data were cleaned by removing sessions with impractical or incorrect parameters (where *E* < 0, or where the arrival time *tarr* equals the departure time *td*).
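The cleaning rules stated in this section could be expressed as a simple filter, sketched below. The `Session` record and its field names are hypothetical, as is the assumption that raw sessions carry arrival and departure timestamps. Only the rules named in the text are encoded (negative energy, arrival equal to departure, and connection time above 24 h); any further checks applied to the real data are not shown.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    """Hypothetical raw session record; field names are illustrative only."""
    arrival: datetime
    departure: datetime
    energy_kwh: float


def connection_hours(session: Session) -> float:
    """Connection time t_c in hours, computed as departure minus arrival."""
    return (session.departure - session.arrival).total_seconds() / 3600


def is_valid(session: Session) -> bool:
    """Apply the three removal rules described in the text."""
    if session.energy_kwh < 0:                  # E < 0
        return False
    if session.arrival == session.departure:    # arrival equals departure
        return False
    if connection_hours(session) > 24:          # t_c > 24 h
        return False
    return True


def clean_sessions(sessions: list[Session]) -> list[Session]:
    """Keep only the sessions that pass all rules."""
    return [s for s in sessions if is_valid(s)]
```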


**Table 1.** Processed session data. Each row corresponds to an EV session.

#### *3.2. Charging Stations Analysis*

The full ElaadNL dataset contains 1.8 million sessions from January 2012 to June 2018. The infrastructure consists of charging stations of 10 different types, distinguished by manufacturer, charging speed and other factors. In 2016, the stations of EVnetNL (the infrastructure provider associated with ElaadNL) were upgraded to integrate smart charging capability: the hardware and software of the charging stations (poles) were updated depending on the station type. In 2017, more than 50% of the EVnetNL stations were taken over by other charging station operators. Due to these two factors, we observed a sudden drop in the number of daily active charging stations in 2016 and 2017 (Figure 2). The years prior to 2014 show very steep growth in the number of active poles, while from 2016 onwards the number of active poles becomes unpredictable because of market factors.

As we wanted our model to reflect charging behavior and not be influenced by infrastructure changes, we selected the training data from the reasonably stable year 2015. The data used for training our SDG thus span January to December 2015 of the ElaadNL dataset and contain 365,000 sessions. In 2015, 1677 poles were used, of which 1645 were active both before and after 2015. We used the data from these 1645 poles for our analysis. We thereby considered a constant number of poles when constructing our SDG model and avoided the effects of a changing number of EV charging stations.

**Figure 2.** Number of used poles per day, from 2012 to 2018. Each boxplot represents data for 1 month. The *y*-axis represents the number of daily active poles.
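The pole selection described above can be sketched as a set intersection. This is an illustration under stated assumptions, not the procedure used in the study: the function name `stable_poles` and the `(pole_id, session_date)` input format are hypothetical, and "active before and after 2015" is interpreted here as having at least one session in an earlier year and at least one in a later year.

```python
from datetime import date


def stable_poles(sessions, year=2015):
    """Return poles used in `year` that were also used before and after it.

    `sessions` is an iterable of (pole_id, session_date) pairs.
    """
    before, during, after = set(), set(), set()
    for pole_id, day in sessions:
        if day.year < year:
            before.add(pole_id)
        elif day.year == year:
            during.add(pole_id)
        else:
            after.add(pole_id)
    # A pole qualifies only if it appears in all three periods.
    return during & before & after


# Toy data: pole "A" is active throughout, "B" first appears in 2015,
# and "C" disappears after 2015.
toy_sessions = [
    ("A", date(2014, 6, 1)), ("A", date(2015, 6, 1)), ("A", date(2016, 6, 1)),
    ("B", date(2015, 3, 1)), ("B", date(2016, 3, 1)),
    ("C", date(2014, 9, 1)), ("C", date(2015, 9, 1)),
]
selected = stable_poles(toy_sessions)
```

In this toy example only pole "A" is retained; the training sessions would then be the 2015 sessions recorded at the retained poles.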
